A new software platform to analyze large scale ‘omic’ data according to their metabolic machinery: the case of the biogeochemical sulfur cycle





1. Background

The rapid growth in the number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, to an unprecedented level of detail. Yet inferring metabolic capabilities from large omic datasets remains biologically and computationally challenging. Here we propose a new Multigenomic Entropy Based Score (MEBS), which condenses the information derived from complex metabolic pathways into a single score. To test MEBS we focused on the biogeochemical sulfur cycle, because studies that integrate all of its microbiological and geochemical transformations and their corresponding metabolic pathways at a global scale are lacking.

2. Method

MEBS is a software platform written in Bash, Perl and Python that has been tested under Linux environments. The first step of MEBS is the systematic manual acquisition and curation of the molecular and ecological information required to describe the metabolic machinery of interest, for example sulfur metabolism. This information is represented by two input files: a list of microorganisms and a multi-FASTA file of proteins. MEBS then evaluates the presence/absence patterns of the input proteins in a genomic dataset (Gen) containing 2,107 non-redundant, completely sequenced genomes. Next, the expected versus observed pattern in the input organisms is obtained for each input protein using the mathematical framework of relative entropy (H’). The last step is the summation of the entropies of all input proteins present in the omic data to be evaluated (either genomes or metagenomes) to obtain the final Entropy Score. MEBS was thoroughly tested for its ability to capture the importance of the biogeochemical sulfur (S) cycle in 935 metagenomes and 2,107 genomes. The performance, reproducibility and robustness of MEBS were evaluated using several approaches, including a random sampling test, linear regression models and ROC curves.
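The entropy and summation steps described above can be illustrated with a minimal Python sketch. It is not the MEBS implementation: the abstract does not state the formula, so the sketch assumes the single-term form H’ = P · log2(P / Q), where P is the observed frequency of a protein domain among the input organisms and Q is its expected frequency across the whole genomic dataset. The function names and the toy presence/absence data are hypothetical.

```python
import math


def relative_entropy(p_obs, q_exp):
    """Assumed single-term relative entropy: P * log2(P / Q).

    Returns 0.0 when the domain is never observed in the input organisms.
    """
    if p_obs == 0:
        return 0.0
    if q_exp == 0:
        raise ValueError("expected frequency must be > 0 when observed > 0")
    return p_obs * math.log2(p_obs / q_exp)


def domain_entropies(presence, input_organisms):
    """Compute H' for every domain.

    presence        -- dict: genome name -> set of domains found in it
    input_organisms -- genome names curated as relevant to the pathway
    """
    genomes = list(presence)
    domains = set().union(*presence.values())
    entropies = {}
    for domain in domains:
        # Expected frequency: fraction of all genomes carrying the domain.
        q = sum(domain in presence[g] for g in genomes) / len(genomes)
        # Observed frequency: fraction of input organisms carrying it.
        p = sum(domain in presence[g] for g in input_organisms) / len(input_organisms)
        entropies[domain] = relative_entropy(p, q)
    return entropies


def entropy_score(domains_present, entropies):
    """Sum the H' values of the input domains present in one (meta)genome."""
    return sum(entropies.get(d, 0.0) for d in set(domains_present))


# Toy data (hypothetical): four genomes, two of them curated input organisms.
presence = {
    "g1": {"A", "B"},
    "g2": {"A", "B"},
    "g3": {"B"},
    "g4": {"B", "C"},
}
entropies = domain_entropies(presence, ["g1", "g2"])
score = entropy_score({"A", "B"}, entropies)
```

In this toy case, domain A is found in every input organism but in only half of all genomes, so it contributes a positive H’, while the ubiquitous domain B contributes nothing to the score.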

3. Results

We present MEBS, a new open source software platform that quantitatively evaluates, compares and infers the metabolic machinery of interest in large ‘omic’ datasets, including complex metabolic pathways such as entire biogeochemical cycles. MEBS is free, open source and available at https://github.com/eead-csic-compbio/metagenome_Pfam_score. The curation effort reported here represents the first comprehensive inventory of the genes, enzymes, pathways, compounds and organisms involved in the sulfur cycle. The input protein domains enriched among sulfur-based microorganisms were identified with the relative entropy (H’) mathematical framework. Clustering the 112 H’ values of the input sulfur proteins, obtained from a large collection of non-redundant genomes, highlights the possibility of using 12 informative sulfur domains as sulfur cycle marker genes in metagenomic data. Finally, the summation of the 112 H’ values in a given genome or metagenome yields the final MEBS score (Sulfur Score: SS). The SS values in the genomic and metagenomic data collections highlight the broad applicability of our proposed algorithm for accurately detecting the sulfur cycle metabolic machinery at large omic scale in a fast and simple manner.
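As an illustration of how a single score can be checked for its ability to separate sulfur-related datasets from the rest, the following Python sketch computes the area under a ROC curve from scores and binary labels. It is a generic rank-based calculation under stated assumptions, not the evaluation code used for MEBS; the function name, scores and labels are hypothetical toy values.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive item scores
    higher than a randomly chosen negative one (ties count as 0.5).
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for five genomes; label 1 marks a genome assumed
# to be involved in the sulfur cycle, label 0 any other genome.
toy_scores = [8.2, 6.5, 1.0, 3.0, 6.5]
toy_labels = [1, 1, 0, 0, 0]
auc = roc_auc(toy_scores, toy_labels)
```

An AUC close to 1 would indicate that the score ranks the labelled genomes above the others, while a value near 0.5 would indicate no discriminative power.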

4. Conclusions

Our sulfur cycle benchmark using the MEBS software platform indicates that summarizing the metabolic machinery of interest in a single informative score has the potential to dramatically change how metabolic capabilities are inferred in the present omic era. We have demonstrated that MEBS accurately detects and classifies genomes and metagenomes known to be closely involved in the sulfur cycle. This suggests several applications, such as predicting metabolic capabilities in uncultivated or unexplored taxa and generating a measurable score for evaluating any given metabolic pathway or cycle at large (meta)genomic scale.

5. Future ideas/collaborators needed to further research?

In this study we focused on evaluating the sulfur cycle, but we are currently preparing the manuscript for the carbon, nitrogen, oxygen, phosphorus and iron cycles. Furthermore, we are working on improving the MEBS algorithm so that it requires only a list of microorganisms of interest, avoiding the exhaustive manual curation of the proteins involved in the metabolic pathway of interest. We look forward to collaborating with and helping other researchers interested in integrating this software platform into large-scale analyses (e.g. climate change or bioremediation studies).


Angelo Aquino
about 1 year ago

Oh wow you won the prize?? This is such a cool topic..

Valerie De Anda
7 months ago

We have updated the main MEBS script so that a single script computes the importance of the main biogeochemical cycles (C, N, O, Fe and S) in metagenomic and genomic data. Please have a look at:
Main Software page: https://eead-csic-compbio.github.io/metagenome_Pfam_score/
Readme: https://eead-csic-compbio.github.io/metagenome_Pfam_score/READMEv1.html
Paper: https://academic.oup.com/gigascience/article/6/11/1/4561660


PhD student at Ecology Institute UNAM. Using 'omic' approaches to understand the individual reactions that make life possible on Earth, focusing on how the genes involved in biogeochemical cycle...

Round: Open Peer Voting
Category: Student Prize





