Entry for: The Bioinformatics Peer Prize
DNA methylation is an important mechanism of epigenetic regulation in development and disease.
New-generation sequencers allow genome-wide measurement of methylation status by reading short stretches of the DNA sequence (Methyl-seq). Several software tools for methylation analysis have been proposed in recent years. However, current sequencers, and those expected in the near future, yield reads of increasing length, which makes these tools inefficient and obsolete. New software tools are needed to handle huge datasets composed of very long sequences.
We propose a new strategy for methylation analysis that greatly reduces the execution time of mapping tools while improving sensitivity, particularly for datasets composed of long reads. It combines two independent techniques. First, we use a bidirectional implementation of the Burrows-Wheeler Transform (BWT) that tries to map each read onto the reference genome starting from both read ends simultaneously and proceeding toward the center of the read. Unlike other implementations of the bidirectional BWT, it allows up to two errors, insertions, or deletions (EIDs). Second, we propose a parallel pipeline that differs from previous ones: it merges several stages into a single, more flexible BWT-based stage that yields fewer but very likely correct regions of the genome where each read can be mapped. As a result, the pipeline invokes the Smith-Waterman algorithm (SWA) far less often, and on much shorter read segments. Since the computational cost of the SWA depends on the read length, the proposed strategy greatly improves the performance of methylation tools, allowing them to scale linearly with read length.
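To illustrate the basic operation behind BWT-based mapping, the following minimal Python sketch shows exact-match backward search on a naive FM-index. It is not the implementation of the tool described in this entry: it searches in one direction only, tolerates no errors, insertions, or deletions, and builds the index in a deliberately simple (slow) way. The function names build_fm_index and backward_search and the toy reference sequence are hypothetical and chosen only for illustration.

```python
def build_fm_index(text):
    """Build a naive FM-index for `text` (illustrative, not memory-efficient).

    Returns the suffix array, the C table (count of smaller characters)
    and the Occ table (prefix counts of each character in the BWT).
    """
    text += "$"  # sentinel; sorts before A, C, G and T in ASCII order
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT is the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))

    # c[ch]: number of characters in text strictly smaller than ch
    c, total = {}, 0
    for ch in alphabet:
        c[ch] = total
        total += text.count(ch)

    # occ[ch][i]: number of occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c, occ


def backward_search(pattern, sa, c, occ):
    """Return the sorted start positions of exact matches of `pattern`."""
    lo, hi = 0, len(sa)  # half-open interval of suffix-array rows
    for ch in reversed(pattern):  # extend the match one character leftwards
        if ch not in c:
            return []
        lo = c[ch] + occ[ch][lo]
        hi = c[ch] + occ[ch][hi]
        if lo >= hi:  # interval became empty: no occurrence
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"  # toy reference, hypothetical
    index = build_fm_index(reference)
    print(backward_search("ACGT", *index))
    print(backward_search("TAG", *index))
```

Each step of backward_search costs constant time regardless of the reference size, which is why index-based candidate location is cheap compared with dynamic-programming alignment over long read segments.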
The result is a new software tool, based on this strategy, for methylation analysis of Methyl-seq sequencing data. It requires much shorter execution times while achieving better sensitivity, particularly for datasets composed of long reads. The strategy can be ported to other methylation, DNA, and RNA analysis tools.
The developed software tool achieves execution times one order of magnitude shorter than those of existing tools, while yielding equal sensitivity for short reads and even better sensitivity for long reads.
5. Future ideas/collaborators needed to further research?
We would like to collaborate with researchers who need to apply DNA methylation analysis in their research. We are also developing a tool for DNA hydroxymethylation analysis, to be applied in the genomic study of type 2 diabetes mellitus.
6. Please share a link to your paper