2021
DOI: 10.1101/2021.04.15.440088
Preprint

Decoding microbiome and protein family linkage to improve protein structure prediction

Abstract: Information extracted from microbiome sequences through deep-learning techniques can significantly improve protein structure and function modeling. However, the model training and metagenome search were largely blind with low efficiency. Built on 4.25 billion microbiome sequences from four major biomes (Gut, Lake, Soil and Fermentor), we proposed a MetaSource model to decode the inherent link of microbial niches with protein homologous families. Large-scale protein family folding experiments showed that a targ…

Cited by 2 publications (3 citation statements)
References 38 publications
“…Meanwhile, incorrectly collected MSAs, despite having a high number of homologous sequences, can negatively impact the modeling results as witnessed in the CASP experiments [31]. The use of a targeted MSA generation protocol that focuses on searching sequences related to the target protein’s biome represents a promising strategy for improving the speed and quality of the MSA generation and the accuracy of the final 3D structure modeling [32].…”
Section: Discussion (mentioning)
confidence: 99%
“…Here, we mainly report the results from two servers, "Zhang-Server" […] analyzed the metagenome assisted Pfam family structure modeling data and found that there is an inherent linkage between the microbiome niches and their homologous protein families [55]. When using MSAs constructed from an individual biome that is the most closely linked with the target protein family, the amount of memory requested and searching speed of MSA construction was significantly improved, compared to the more expensive MSA search from the whole set of the combined microbiome genome database. Meanwhile, the quality of the 3D structure modeling of the Pfam families was simultaneously improved compared to the latter.…”
Section: Discussion (mentioning)
confidence: 99%
“…For instance, the MSA construction by DeepMSA from the MetaClust (~100 GB) database took around 1 h using 1 CPU for a 150‐residue protein, while it took around 4 h using 50 CPUs for the same length protein by searching the 5 TB IMG/M metagenome database in DeepMSA2. Most recently, we analyzed the metagenome assisted Pfam family structure modeling data and found that there is an inherent linkage between the microbiome niches and their homologous protein families [55]. When using MSAs constructed from an individual biome that is the most closely linked with the target protein family, the amount of memory requested and searching speed of MSA construction was significantly improved, compared to the more expensive MSA search from the whole set of the combined microbiome genome database.…”
Section: Discussion (mentioning)
confidence: 99%
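
The cost comparison quoted in the last citation statement can be made concrete with a small worked calculation. The sketch below is illustrative only: it treats the "around" figures in the quote as exact, assumes 5 TB = 5000 GB, and uses hypothetical names (`runs`, `cpu_hours`) that do not come from any of the cited tools.

```python
# Figures taken from the quoted citation statement; the "around" values are
# treated as exact here, and 5 TB is assumed to mean 5000 GB.
runs = {
    "DeepMSA on MetaClust": {"hours": 1, "cpus": 1, "db_gb": 100},
    "DeepMSA2 on IMG/M": {"hours": 4, "cpus": 50, "db_gb": 5000},
}


def cpu_hours(run):
    """Total compute cost of one MSA search: wall-clock hours times CPUs used."""
    return run["hours"] * run["cpus"]


small = runs["DeepMSA on MetaClust"]
large = runs["DeepMSA2 on IMG/M"]

# Ratio of compute cost and of database size between the two searches.
cost_ratio = cpu_hours(large) / cpu_hours(small)
size_ratio = large["db_gb"] / small["db_gb"]

for name, run in runs.items():
    print(f"{name}: {cpu_hours(run)} CPU-hours over {run['db_gb']} GB")
print(f"compute cost ratio: {cost_ratio:.0f}x, database size ratio: {size_ratio:.0f}x")
```

Under these assumptions the whole-database search costs roughly 200 CPU-hours against 1 CPU-hour for the smaller database, which is the kind of overhead that the biome-targeted search described in the statements is reported to reduce.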