Professor Trevor Hastie
Department of Statistics, Stanford University
Division of Biostatistics, Department of Health Research and Policy, Stanford School of Medicine
(Joint work with Hui Zou)
One challenge with genomic data is building predictive models from
thousands of genes. We want not only good predictors, but also a small
subset of genes that are instrumental to the prediction.
The same problem arises in many other settings, all characterized by
"p >> n": far more predictors (p) than observations (n). In this talk we
review the lasso, which has a severe shortcoming when p >> n: it selects at
most n variables.
We then propose the ElasticNet, which overcomes this limitation and can
select groups of correlated variables together.
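The lasso/ElasticNet contrast described in the abstract can be illustrated with a small sketch. This is not the method presented in the talk. It assumes the "naive" elastic-net criterion 0.5*||y - Xb||^2 + lam1*||b||_1 + 0.5*lam2*||b||^2, where lam2 = 0 recovers the lasso. It omits the rescaling of this naive estimate that the full ElasticNet applies, and it fits the model with plain cyclic coordinate descent. The function names (soft_threshold, elastic_net_cd, count_nonzero), the penalty values and the toy data are all hypothetical illustrations.

```python
import random


def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_cd(X, y, lam1, lam2, n_sweeps=500):
    """Hypothetical helper: cyclic coordinate descent for the assumed
    naive elastic-net objective

        0.5 * ||y - X b||^2 + lam1 * ||b||_1 + 0.5 * lam2 * ||b||^2

    X is a list of n rows of p floats. lam2 = 0 gives the lasso.
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]  # column-major copy
    sq_norms = [sum(v * v for v in col) for col in cols]
    b = [0.0] * p
    r = list(y)  # current residual y - X b (b starts at zero)
    for _ in range(n_sweeps):
        for j in range(p):
            denom = sq_norms[j] + lam2
            if denom == 0.0:  # all-zero column with lam2 = 0: leave b[j] at 0
                continue
            col, old = cols[j], b[j]
            # x_j^T (y - X b + x_j b_j): correlation with the partial residual
            rho = sum(c * ri for c, ri in zip(col, r)) + sq_norms[j] * old
            new = soft_threshold(rho, lam1) / denom
            if new != old:
                delta = new - old
                r = [ri - c * delta for c, ri in zip(col, r)]  # keep r in sync
                b[j] = new
    return b


def count_nonzero(b, tol=1e-10):
    """Number of coefficients whose magnitude exceeds tol."""
    return sum(1 for v in b if abs(v) > tol)


if __name__ == "__main__":
    # Toy data (assumed): two identical predictors, y = 2 * x.
    X = [[1.0, 1.0], [2.0, 2.0], [0.0, 0.0], [-1.0, -1.0]]
    y = [2.0, 4.0, 0.0, -2.0]
    print("lasso:      ", elastic_net_cd(X, y, lam1=3.0, lam2=0.0))
    print("elastic net:", elastic_net_cd(X, y, lam1=3.0, lam2=6.0))

    # Random p >> n data (assumed sizes); prints how many coefficients survive.
    rng = random.Random(0)
    n, p = 10, 50
    Xr = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    yr = [rng.gauss(0.0, 1.0) for _ in range(n)]
    lam_max = max(abs(sum(Xr[i][j] * yr[i] for i in range(n))) for j in range(p))
    for lam2 in (0.0, 1.0):
        b = elastic_net_cd(Xr, yr, lam1=0.1 * lam_max, lam2=lam2, n_sweeps=1000)
        print(f"lam2={lam2}: {count_nonzero(b)} nonzero of p={p} (n={n})")
```

On the duplicated-predictor toy data, the lasso puts all the weight (1.5) on one copy and exactly zero on the other. The elastic net splits the weight evenly, with about 0.5 on each copy, which is a small instance of selecting a group of correlated variables together. The random p >> n demo only prints nonzero counts. For generic data, the converged lasso should keep at most n of them.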