A novel approach to discovering genes associated with cancer risk from GWAS and EWAS data

Project Details

The project has two primary objectives:

  1. To apply novel analytical and computational approaches and high-performance computing to cancer GWAS and EWAS data to provide new insights into the genetic and environmental causes, aetiology and biology of cancer.
  2. To develop a comprehensive risk prediction model for cancer using GWAS data.

Project Summary

The aim of the project is to gain new insights into the causes of cancer by using machine learning and high-performance computing to analyse ultra-high-dimensional genomic data. The project is novel in two respects: (a) it will pioneer the combined use of existing Genome-Wide Association Study (GWAS) data and recently generated Epigenome-Wide Association Study (EWAS) data to discover new genomic regions associated with cancer risk; and (b) the GWAS and EWAS data will be analysed using our new statistical methodology and the supercomputing capabilities available at The University of Melbourne. Our approach conducts an ultra-high-dimensional linkage analysis across genomic regions, rather than studying individual single-nucleotide polymorphisms (SNPs) one at a time. We have shown that our method can discover genes associated with the risk of breast cancer and melanoma. By plotting the germline (GWAS) and somatic (EWAS) signals across the genome on top of one another, we have a new and powerful tool for finding genes implicated in cancer aetiology. Furthermore, combining EWAS analyses with epidemiological risk factors and exposures has the potential to reveal new insights into the interactions between environmental and lifestyle exposures, DNA methylation changes and cancer risk.
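As a rough illustration of why looking across a genomic region can give a different picture from testing SNPs one at a time, the Python sketch below sums per-SNP association statistics within fixed-width sliding windows. It is a hypothetical, deliberately naive example: the function name `window_scores`, the window width and step, and the toy statistics are all assumptions. It ignores correlation between nearby SNPs and does not assess significance, and it is not the project's statistical methodology, which this page does not describe.

```python
from typing import List, Tuple

# Hypothetical illustration only: a naive windowed sum of per-SNP statistics.
# It ignores linkage disequilibrium (correlation between nearby SNPs) and does
# not calibrate region scores against a null distribution.


def window_scores(
    snps: List[Tuple[int, float]], width: int, step: int
) -> List[Tuple[int, int, float]]:
    """Return (start, end, score) for each non-empty half-open window [start, end).

    `snps` holds (base-pair position, association statistic) pairs, for example
    1-df chi-square values from single-SNP tests. The score of a window is the
    sum of the statistics of the SNPs it contains.
    """
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    if not snps:
        return []
    ordered = sorted(snps)
    first, last = ordered[0][0], ordered[-1][0]
    results = []
    start = first
    while start <= last:
        end = start + width
        in_window = [stat for pos, stat in ordered if start <= pos < end]
        if in_window:  # skip windows that contain no SNPs
            results.append((start, end, sum(in_window)))
        start += step
    return results


if __name__ == "__main__":
    # Toy data: four moderate signals clustered at positions 100-250,
    # one stronger isolated signal at 5000, and a weak one at 9000.
    toy = [(100, 4.0), (150, 6.0), (200, 5.0), (250, 5.0), (5000, 12.0), (9000, 0.5)]
    windows = window_scores(toy, width=500, step=500)
    best_snp = max(toy, key=lambda s: s[1])
    best_window = max(windows, key=lambda w: w[2])
    print("strongest single SNP:", best_snp)
    print("highest-scoring window:", best_window)
```

With this toy data the strongest single SNP is at position 5000 (statistic 12.0), whereas the highest-scoring window is [100, 600) with a summed score of 20.0, because it contains several moderate signals. A real region-based method would also need to account for correlation between SNPs and assess significance properly.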

We will perform computationally intensive interaction analyses, targeted analyses of major cancer pathways, and stratified analyses based on different molecular subtypes of cancer. The results of our analyses will also be used to develop a comprehensive risk prediction model that includes all known risk factors.

This project requires no further genotyping and will utilise existing knowledge, computing and data resources developed by NHMRC Project APP1033452. This innovative proposal has the potential to further our understanding of the biology and aetiology of cancer by identifying novel genetic and epigenetic risk loci.

Researchers

Professor John Hopper

Dr Enes Makalic

Dr Daniel Schmidt

Dr Miroslaw Kapuscinski

Dr Adrian Bickerstaffe

Dr Minh Bui

Dr Guoqi Qian (Department of Mathematics and Statistics, Faculty of Science, The University of Melbourne)

Dr Danny Park (Department of Pathology, Melbourne Medical School, The University of Melbourne)

Professor Justin Zobel (Department of Computer Science and Software Engineering, Melbourne School of Engineering, The University of Melbourne)

Dr Adam Kowalczyk (NICTA)

Dr Adam Freeman

Thilina Ranaweera



Research Group

High Dimensional Analytics


School Research Themes

Prevention and management of non-communicable diseases (including cancer), and promotion of mental health

Data science, health metrics and disease modeling

Key Contact

For further information about this research, please contact the research group leader.

Department / Centre

Centre for Epidemiology and Biostatistics
