High Dimensional Analytics

Research Overview

Modern data sets in epidemiology and biostatistics are commonly ultra-high dimensional, comprising millions of exposure variables. Unlike the data sets of classical statistics, they also tend to have far more exposure variables than observations. Examples include genome-wide association study (GWAS) data, mammographic imaging, and DNA methylation data. Classical multivariate methods are not readily applicable in this setting, which has recently led to a new field of statistical research known as high-dimensional statistics.

The High Dimensional Analytics unit was established to provide statistical and computing expertise for the analysis of modern data. We provide practical advice and statistical expertise in:

  • Sparse regression methods (for example, LASSO and the elastic net)
  • Modern Bayesian regression methods and corresponding MCMC sampling algorithms (for example, the Bayesian LASSO, the horseshoe framework, and the Bayesian bridge)
  • Image processing algorithms and implementation (for example, image segmentation)
  • High-dimensional statistical and parallel computing
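
As an illustration of the first item, the following is a minimal sketch of LASSO regression fitted by cyclic coordinate descent on simulated data with more variables than observations. It is not the unit's own software: the function names (soft_threshold, lasso_cd), the objective scaling (1/(2n))·||y − Xβ||² + λ·||β||₁, the simulated data, and the value of λ are all assumptions chosen for illustration, and it uses only the Python standard library.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z towards zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for the LASSO (illustrative sketch).

    Minimises (1/(2n)) * ||y - X beta||^2 + lam * ||beta||_1,
    where X is a list of n rows, each a list of p floats.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the partial residual
            # (the residual with variable j's contribution added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n
            rho += col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Simulated example (assumed values): n = 30 observations, p = 100 variables,
# of which only the first three have a non-zero effect.
rng = random.Random(0)
n, p = 30, 100
true_beta = [2.0, -3.0, 1.5] + [0.0] * (p - 3)
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(X[i][j] * true_beta[j] for j in range(p)) + rng.gauss(0.0, 0.5)
     for i in range(n)]

beta_hat = lasso_cd(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
print("variables with non-zero coefficients:", selected)
```

Ordinary least squares has no unique solution when the number of variables exceeds the sample size; the L1 penalty makes the problem well-posed and sets many coefficients exactly to zero. In practice λ would be chosen by cross-validation or an information criterion rather than fixed by hand.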

The unit is also responsible for developing novel methodology for high-dimensional statistics. For example, we have recently developed techniques for the analysis of genome-wide association study data that offer significant advantages over the conventional analysis approach in this setting.

Staff

Dr Enes Makalic (unit head)

Dr Adrian Bickerstaffe

Dr James Dowty

Dr Mirek (Miroslaw) Kapuscinski

Thilina Ranaweera

Dr Daniel Schmidt

Cameron Wellard