High Dimensional Analytics
Modern data sets in epidemiology and biostatistics are commonly ultra-high dimensional, comprising millions of exposure variables. In contrast to the data sets of classical statistics, they also tend to have far more exposure variables than observations. Examples include genome-wide association study (GWAS) data, mammographic imaging, and DNA methylation data. Classical multivariate analysis methodology is not readily applicable in this setting, which has led to a new field of statistics research referred to as high dimensional statistics.
The High Dimensional Analytics unit was introduced to provide statistical and computing expertise for the analysis of modern data. We provide practical advice and statistical expertise in:
- Sparse regression methods (for example, LASSO and the elastic net)
- Modern Bayesian regression methods and corresponding MCMC sampling algorithms (for example, the Bayesian LASSO, the horseshoe framework, and the Bayesian bridge)
- Image processing algorithms and implementation (for example, image segmentation)
- High-dimensional statistical and parallel computing
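As a rough illustration of the sparse regression setting listed above, the following sketch fits a LASSO model by cyclic coordinate descent to simulated data with more variables than observations. It is not the unit's own methodology or software: the function names (`soft_threshold`, `lasso_coordinate_descent`, `simulate`), the simulated data, and the penalty value `lam=0.3` are all assumptions chosen for illustration. It uses only the Python standard library so that it runs anywhere; real analyses would normally rely on an established package.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # residual = y - X b, and b starts at zero
    col_scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the partial residual that excludes variable j
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_scale[j] * beta[j]
            new_beta = soft_threshold(rho, lam) / col_scale[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


def simulate(n=30, p=100, seed=0):
    """Hypothetical data: p > n, with only the first three variables truly associated."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    true_beta = [0.0] * p
    true_beta[0], true_beta[1], true_beta[2] = 3.0, -2.0, 1.5
    y = [sum(X[i][j] * true_beta[j] for j in range(3)) + rng.gauss(0.0, 0.1)
         for i in range(n)]
    return X, y, true_beta


X, y, true_beta = simulate()
beta_hat = lasso_coordinate_descent(X, y, lam=0.3)
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
print("non-zero coefficients:", len(selected), "of", len(beta_hat))
```

The L1 penalty sets many coefficients exactly to zero, which is why a fit remains possible even though ordinary least squares has no unique solution when there are more variables than observations. The elastic net adds a squared (L2) penalty to the same objective.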
The unit also develops novel methodology for high dimensional statistics. For example, we have recently developed new techniques for the analysis of genome-wide association study data that offer significant advantages over the conventional analysis approach in this setting.
Dr Enes Makalic (unit head)
- A novel approach to discovering genes associated with cancer risk from GWAS and EWAS data
- A new automated measure of breast cancer risk from digital and film mammography (Cirrus)
- Novel statistical methodology for the analysis of medical data
- The Ark: Cloud-based bioinformatics tools
For further information about this research, please contact Dr Enes Makalic.