Minimum message length inference with application to genome-wide association studies data
Principal supervisor: Dr Enes Makalic
Cosupervisors: Prof John Hopper, Dr Daniel Schmidt, Guoqi Qian
The objective of genome-wide association studies (GWAS) is to discover genetic markers, called single nucleotide polymorphisms (SNPs), that are associated with a disease or trait. This is an extremely challenging data-analysis problem because the predictor variables (i.e., SNPs) are high-dimensional and strongly correlated. The conventional statistical approach, marginal hypothesis testing, suffers from a low detection rate for two reasons: it tests each SNP on its own and so cannot detect joint effects, and it requires a stringent p-value threshold to control the number of false positives.
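To see why the threshold must be so stringent, consider a Bonferroni correction, one common way to control the family-wise error rate at level α across m marginal tests. Assuming, for illustration only, α = 0.05 and roughly one million independent tests, each SNP must individually reach:

```latex
\alpha^{*} = \frac{\alpha}{m} = \frac{0.05}{10^{6}} = 5 \times 10^{-8}
```

This matches the 5 × 10^-8 genome-wide significance threshold commonly used in GWAS. A SNP with a modest effect, or one that matters only jointly with other SNPs, can easily fall short of it.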
The minimum message length (MML) principle is an approach to statistical inference with a strong foundation in information theory. In this framework, the optimal model is the one that yields the shortest two-part message encoding both the model parameters and the data given those parameters. MML has a number of attractive statistical properties (e.g., invariance, consistency) and allows the practitioner to easily compare statistical models with different structures. The focus of my PhD is on developing an MML-based model selection toolbox that can be applied in many settings, such as linear regression and hypothesis testing. My preliminary findings suggest that, compared with conventional methods, the new MML toolbox can improve the ranking of SNPs that are truly associated with a disease or trait.
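The sketch below illustrates how message lengths let two models with different structures be compared directly. It is not the toolbox described above. It uses the textbook Wallace-Freeman (MML87) approximation for a single Bernoulli parameter θ, assuming a uniform prior on θ, and compares it against a "null" model that fixes θ = 0.5 and therefore states no parameter. The function names are hypothetical and chosen for illustration. All lengths are in nats.

```python
import math


def neg_log_lik(s: int, n: int, theta: float) -> float:
    """Negative log-likelihood (nats) of s successes in n Bernoulli trials."""
    return -(s * math.log(theta) + (n - s) * math.log(1.0 - theta))


def msglen_null(s: int, n: int) -> float:
    """Message length of the null model with theta fixed at 0.5.

    No parameter is transmitted, so the message is just the cost of the data.
    """
    return neg_log_lik(s, n, 0.5)


def msglen_mml87(s: int, n: int) -> float:
    """MML87 two-part message length for a free theta in (0, 1).

    Assumes a uniform prior h(theta) = 1, so the -log h(theta) term is zero.
    """
    theta = (s + 0.5) / (n + 1.0)          # MML87 estimate under a uniform prior
    fisher = n / (theta * (1.0 - theta))   # Fisher information for n trials
    kappa1 = 1.0 / 12.0                    # 1-D optimal quantising lattice constant
    # First part: cost of stating theta to the precision the data warrant.
    assertion = 0.5 * math.log(fisher) + 0.5 * (1.0 + math.log(kappa1))
    # Second part: cost of the data given the stated theta.
    return assertion + neg_log_lik(s, n, theta)


if __name__ == "__main__":
    for s in (50, 80):
        null, free = msglen_null(s, 100), msglen_mml87(s, 100)
        winner = "null" if null < free else "free theta"
        print(f"s={s}: null={null:.2f}, MML87={free:.2f} -> prefer {winner}")
```

With 50 successes in 100 trials the extra cost of stating θ is not repaid, so the simpler null model wins. With 80 successes the shorter data encoding outweighs that cost. The same trade-off between parameter cost and fit underlies MML model selection in richer settings such as regression over many SNPs, although the formulas there differ.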
PhD scholarships and funding body:
- Melbourne International Research Scholarship
- Melbourne International Fee Remission Scholarship