2023
DOI: 10.1038/s41467-023-36476-2

Multilingual translation for zero-shot biomedical classification using BioTranslator

Abstract: Existing annotation paradigms rely on controlled vocabularies, where each data instance is classified into one term from a predefined set of controlled vocabularies. This paradigm restricts the analysis to concepts that are known and well-characterized. Here, we present the novel multilingual translation method BioTranslator to address this problem. BioTranslator takes a user-written textual description of a new concept and then translates this description to a non-text biological data instance. The key idea o…

Cited by 10 publications (7 citation statements)
References 84 publications
“…While LLMs have seen marked success in the realms of DNA analysis [25] and biomedical NLP [26, 27], their application in single-cell research remains largely uncharted. There is a limited number of robust pre-trained models (known as single-cell LLMs) capable of managing multiple tasks in single-cell research.…”
Section: Introduction; type: mentioning
confidence: 99%
“…The ability to integrate and explore these different types of data across different temporal and spatial scales will be powerfully amplified by AI and machine learning-enabled approaches [124][125][126]. Low-hanging fruit is already in sight with the identification of new associations among incompletely mapped or annotated data starting from simple text-based queries [127], where even simple gene expression profiling data (e.g., Figure 6) can provide useful and revealing points of departure.…”
Section: New Conceptual and Experimental Growing Points; type: mentioning
confidence: 99%
“…Cross Text-protein Modalities: ProteinDT (Liu et al, 2023a) is a multi-modal framework that uses semantically-related text for protein design. BioTranslator (Xu et al, 2023a) is a cross-modal translation system specifically designed for annotating biological instances, such as gene expression vectors, protein networks, and protein sequences, based on user-written text. Cross Three or More Biology Modalities: Galactica (Taylor et al, 2022) is a general GPT-based large language model trained on various scientific domains, including scientific paper corpus, knowledge bases (e.g., PubChem (Kim et al, 2023) molecules, UniProt (uni, 2023) protein), codes, and other sources.…”
Section: Cross-modal Models In Biology; type: mentioning
confidence: 99%