Entry for: Bioinformatics Peer Prize III
Proteomics has enabled the broad-scale identification and study of post-translational modifications (PTMs), revealing that PTMs play key functional roles in many biological processes. However, proteomics studies of PTMs can be hampered by high false discovery rates (FDRs). One method of reducing these FDRs is to label PTM sites with heavy isotopes. By mixing unlabeled (light) and labeled (heavy) samples, and co-analysing them with liquid chromatography-tandem mass spectrometry (LC-MS/MS), true positive PTM sites can be identified from heavy/light peptide pairs. However, software designed to validate these heavy/light peptide pairs has yet to be developed.
Numerous software platforms have been developed to analyse heavy/light peptide pairs in proteomics data. However, these software platforms have been designed for quantitative proteomics experiments, in which all peptides are assumed to exist in heavy/light pairs. They therefore have limited utility when applied to samples prepared for PTM site validation using heavy isotopes, in which not all peptides exist in heavy/light pairs. To fill this software gap, we developed MethylQuant.
MethylQuant can specifically and sensitively identify heavy/light peptide pairs in proteomics datasets in which not all peptides exist in such pairs. To do this, it takes lists of peptide identifications (e.g., putative PTM-containing peptides) derived from a standard analysis of LC-MS/MS data (i.e., a sequence database search). The raw LC-MS/MS data are then searched to determine which of these peptides are associated with putative heavy/light peptide pairs. The validity of each putative heavy/light peptide pair is then assessed using three measures: the correlation between the heavy and light peptide isotope distributions, the correlation between their elution profiles, and their abundance ratio.
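The three measures described above can be illustrated with a minimal sketch. This is not MethylQuant's implementation: the function names, the use of Pearson correlation, and the use of summed intensity as a stand-in for peak area are all assumptions made for illustration, and the intensities are made-up values.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation between two equal-length intensity vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.size != b.size or a.size < 2:
        raise ValueError("vectors must have equal length of at least 2")
    if a.std() == 0 or b.std() == 0:
        return 0.0  # a flat profile carries no shape information
    return float(np.corrcoef(a, b)[0, 1])


def pair_features(light_isotopes, heavy_isotopes, light_xic, heavy_xic):
    """Compute the three pair-level measurements named in the text.

    *_isotopes: intensities of the isotopic peaks (M, M+1, M+2, ...)
    *_xic:      intensities sampled at the same retention-time points
    (hypothetical input layout, assumed for this sketch)
    """
    light_area = float(np.sum(light_xic))
    heavy_area = float(np.sum(heavy_xic))
    # Summed intensity is used here as a simple stand-in for peak area.
    ratio = heavy_area / light_area if light_area > 0 else float("nan")
    return {
        "isotope_corr": pearson(light_isotopes, heavy_isotopes),
        "elution_corr": pearson(light_xic, heavy_xic),
        "heavy_light_ratio": ratio,
    }


# Made-up intensities for one co-eluting pair: the heavy peptide is
# exactly half as abundant as the light peptide at every point.
features = pair_features(
    light_isotopes=[100, 60, 20],
    heavy_isotopes=[50, 30, 10],
    light_xic=[0, 10, 40, 10, 0],
    heavy_xic=[0, 5, 20, 5, 0],
)
```

In this toy pair the heavy and light signals have identical shapes, so both correlations are close to 1 and the heavy/light ratio is 0.5. A spurious pair would be expected to show weaker correlations.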
MethylQuant’s heavy/light peptide pair assessment criteria were developed using several high-quality reference ‘heavy-methyl SILAC’ datasets, in which heavy/light peptide pairs derived from peptides carrying post-translational methylation (methylpeptides) had previously been unambiguously characterised using low-throughput methods.
MethylQuant is implemented in Python and is accompanied by a user-friendly graphical user interface (GUI). The tool is freely available at https://bitbucket.org/aidantay/methylquant/src.
In designing MethylQuant’s heavy/light peptide pair assessment criteria, logistic regression analyses revealed that heavy and light peptide isotope distribution correlations, elution profile correlations and abundance ratios are all statistically significant predictors of whether a heavy/light peptide pair is a true or false positive.
Based on these analyses, two MethylQuant confidence indicators for heavy/light peptide pairs were developed: ‘MethylQuant Confidence’ and ‘MethylQuant Score’. MethylQuant Confidence provides a confidence ranking for putative heavy/light peptide pairs, while the MethylQuant Score, which is derived from a logit model based on the above predictors, allows users to define FDR thresholds for heavy/light peptide pairs.
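As a rough sketch of how a logit-based score can support FDR thresholds, the example below combines the three predictors in a logistic model and then picks a score cutoff from a labelled reference set. The coefficients, the function names, and the way the abundance ratio enters the model (as a log2 deviation from an expected ratio) are hypothetical choices for illustration; they are not MethylQuant's fitted model.

```python
import math

# Hypothetical coefficients, for illustration only.
COEF = {
    "intercept": -6.0,
    "isotope_corr": 4.0,
    "elution_corr": 4.0,
    "log2_ratio_dev": -1.0,
}


def logit_score(isotope_corr, elution_corr, heavy_light_ratio, expected_ratio=1.0):
    """Probability-like score from a logistic model of the three predictors."""
    # Assumption: the ratio is penalised by its log2 distance from expected_ratio.
    log2_ratio_dev = abs(math.log2(heavy_light_ratio / expected_ratio))
    z = (
        COEF["intercept"]
        + COEF["isotope_corr"] * isotope_corr
        + COEF["elution_corr"] * elution_corr
        + COEF["log2_ratio_dev"] * log2_ratio_dev
    )
    return 1.0 / (1.0 + math.exp(-z))


def threshold_for_fdr(scored_pairs, max_fdr):
    """Return the lowest score cutoff whose accepted set has FDR <= max_fdr.

    scored_pairs: list of (score, is_true_positive) from a reference dataset.
    Returns None if no cutoff satisfies the requested FDR.
    """
    ranked = sorted(scored_pairs, key=lambda p: p[0], reverse=True)
    best = None
    false_positives = 0
    for accepted, (score, is_true) in enumerate(ranked, start=1):
        if not is_true:
            false_positives += 1
        if false_positives / accepted <= max_fdr:
            best = score
    return best


# Made-up reference set: (score, known true/false positive status).
reference = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.5, False)]
cutoff = threshold_for_fdr(reference, max_fdr=0.25)
```

With this toy reference set, accepting every pair scoring at least 0.6 yields one false positive among four accepted pairs, so the cutoff for a 25% FDR is 0.6. A real cutoff would be calibrated on reference datasets such as those described above.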
When evaluating these confidence indicators, we found that MethylQuant consistently identifies true positive heavy/light peptide pairs with high sensitivity and specificity. For example, in one particularly complex heavy-methyl SILAC reference dataset, MethylQuant identified 882 of 1165 true positive methylpeptide spectrum matches (i.e., >75% sensitivity) at high specificity (<2% FDR), and achieved near-perfect specificity at 41% sensitivity. These results show that MethylQuant can validate heavy/light peptide pairs with high sensitivity while keeping FDRs acceptably low.
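The sensitivity figure quoted above follows directly from the reported counts. The short check below uses only those two numbers; the function name is illustrative.

```python
def sensitivity(true_positives_found, true_positives_total):
    """Fraction of known true positives that were recovered."""
    return true_positives_found / true_positives_total


s = sensitivity(882, 1165)
print(f"{s:.1%}")  # 75.7%
```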
This work shows that MethylQuant offers an automated means for validating heavy/light peptide pairs in proteomics datasets. This provides an avenue by which isotopically labelled PTM sites can be validated in broad-scale proteomics experiments, filling a noteworthy gap in proteomics software.
These results are particularly noteworthy for the study of post-translational methylation. It has been shown that methylpeptide FDRs of >70% can be expected in proteomics experiments that do not label methylation sites with heavy isotopes. MethylQuant's ability to validate heavy/light methylpeptide pairs therefore clears a path toward routine, high-accuracy characterisation of the methylproteome using heavy-methyl SILAC.
5. Future ideas/collaborators needed to advance research?
MethylQuant can be applied not only to PTM-specific labelling workflows, but also to workflows that utilise isotopically labelled peptide pairs of any type. For example, because MethylQuant's outputs offer a universally applicable means of reporting the quality of heavy/light peptide pairs, low-quality peptide pairs from SILAC-based quantitative proteomics experiments can be filtered out using MethylQuant’s confidence indicators. MethylQuant therefore offers a means of improving overall quantification accuracy in SILAC experiments.