2023
DOI: 10.1101/2023.11.09.566411
Preprint

ProkBERT Family: Genomic Language Models for Microbiome Applications

Balázs Ligeti,
István Szepesi-Nagy,
Babett Bodnár
et al.

Abstract: Machine learning offers transformative capabilities in microbiology and microbiome analysis, deciphering intricate microbial interactions, predicting functionalities, and unveiling novel patterns in vast datasets. This enriches our comprehension of microbial ecosystems and their influence on health and disease. However, the integration of machine learning in these fields contends with issues like the scarcity of labeled datasets, the immense volume and complexity of microbial data, and the subtle interactions …

Citations: cited by 2 publications (4 citation statements)
References: 81 publications
“…We initiated the evaluation by collecting promoter and non-promoter sequences from the ProkBERT dataset [31]. Prokaryotic promoter sequences typically span 81 base pairs.…”
Section: Fine Tuning Of Scorpio Embeddings For Bacterial Promoter Pre... (mentioning)
confidence: 99%
“…In this study, we utilized the promoter dataset provided by Ligeti et al. [31] for training and testing our promoter prediction models. The promoter dataset by Ligeti et al. consists of experimentally validated promoter sequences primarily drawn from the Prokaryotic Promoter Database (PPD), which includes sequences from 75 different organisms.…”
Section: Promoter Dataset (mentioning)
confidence: 99%
“…This trend presents both significant opportunities and new challenges for researchers in the field of biological sciences. For instance, a specific workflow for the processing whole-genome sequencing (WGS) data, involves utilization of dozens of software tools [3] and necessitating researchers to possess fundamental skills in software installation, parameter invocation, and troubleshooting capabilities. Furthermore, the in-depth analyses demand proficiency in an expanded toolkit of coding and visualization.…”
Section: Introduction (mentioning)
confidence: 99%
“…Bioinformatics is an interdisciplinary discipline, by leveraging cutting-edge computational methodologies and algorithmic strategies, it plays an irreplaceable role in many fields such as biology [1], medical science [2], microbiology [3], etc. Bioinformatics fosters a holistic understanding of biological processes by facilitating the integration and interpretation of multi-level data, from genomics to transcriptomics, proteomics, metabolomics, and beyond [4].…”
Section: Introduction (mentioning)
confidence: 99%