Entry for: The Peer Prize for Women in Science 2017
1. Please give a brief summary of your work.
Epigenetic marks such as DNA methylation can influence how genes function and are one of the mechanisms by which the environment can interact with the genome. DNA methylation is the most widely studied epigenetic mark and is known to be essential to normal development. It is also frequently disrupted in diseases such as imprinting disorders and cancer. Consequently, there is extensive interest in being able to study methylation across a variety of contexts. Our research has produced a suite of methods for analysing methylation data measured across the human genome using array technology. We have developed and published robust statistical methods to clean up noisy data, detect methylation sites of interest and aid in the biological interpretation of results. Recently, our team packaged these methods into an accessible R software package, called missMethyl, that is freely available for the analysis of human methylation array data.
2. Describe your approach and broader findings.
The missMethyl team consists of Drs Belinda Phipson, Jovana Maksimovic and Alicia Oshlack, who have been working together for more than 4 years. As research bioinformaticians we bring together scientific skills in computer science, statistics and molecular biology in order to understand biological processes of medical importance. Our work in the field of methylation analysis has resulted in several widely cited publications, as it enables specialised analysis of human methylation array data in a variety of biological contexts. missMethyl is a software package that brings together four different analysis methods. SWAN is a normalization approach that removes a technical bias that is known to exist in the data. DiffVar is the first method designed to detect methylation sites that vary differently between conditions, which is of interest because variability in methylation signals has been observed in both cancer and aging. RUVm is a method that can remove unwanted variation from large data sets, such as those that are used to investigate association between methylation and disease across populations. Finally, gometh and gsameth are methods that facilitate better biological interpretation of results by grouping important genes together and testing them as a set, whilst accounting for biases that exist due to the design of methylation arrays.
missMethyl encapsulates all of these methods in one easy-to-use software package, developed in the R statistical programming language and hosted as part of the widely used Bioconductor project. This means that missMethyl can easily interface with other R and Bioconductor software. Of the thousands of packages available from Bioconductor, missMethyl is one of the few that have been developed by an all-female team. The individual methods available in missMethyl have been cited between 20 and 300 times, indicating their importance in the analysis of methylation array data. The missMethyl package itself has already been cited 17 times since publication, highlighting its utility in a wide range of studies.
3. What is the wider contribution, or impact, to your scientific field(s)?
The missMethyl software package is a convenient means of disseminating our novel methodological approaches for analysing methylation data. By coupling research papers with high-quality, up-to-date software we ensure that other researchers across the globe are empowered to use our innovative methods. The additional step of contributing our software to the Bioconductor project ensures that our software meets a high standard, undergoes external curation, and is easy to install. The package has been downloaded more than 5000 times, demonstrating the wide impact of the approach. Our methods have been applied in a wide variety of contexts including colorectal cancer, Crohn’s disease, pregnancy and childhood development, stress reactivity and type 1 diabetes.
4. Are there any potential ideas you would like to explore to take this research further?
The missMethyl package was originally developed to analyse DNA methylation arrays; however, the use of sequencing technologies to measure methylation will increase as the cost inevitably drops. We would like to adapt our methods specifically for analysing data generated using sequencing methods, which may show different biases.
Currently, our methods for identifying methylation sites of interest while removing unwanted variation, and for finding sites that vary in their signal between groups of interest, each focus on a single methylation locus at a time. However, it is known that methylation sites that are adjacent to each other in the genome have correlated methylation levels. Extending both methods to identify groups of multiple adjacent methylation sites would be a major advance. In addition, our methods for testing groups of genes associated with different methylation signals are limited to single loci. A useful advance would be the ability to use methylation signals across regions encompassing multiple loci to find enrichment of sets of genes.