2023
DOI: 10.1021/acssynbio.3c00310

Generative Artificial Intelligence GPT-4 Accelerates Knowledge Mining and Machine Learning for Synthetic Biology

Zhengyang Xiao,
Wenyu Li,
Hannah Moon
et al.

Abstract: Knowledge mining from synthetic biology journal articles for machine learning (ML) applications is a labor-intensive process. The development of natural language processing (NLP) tools, such as GPT-4, can accelerate the extraction of published information related to microbial performance under complex strain engineering and bioreactor conditions. As a proof of concept, we proposed prompt engineering for a GPT-4 workflow pipeline to extract knowledge from 176 publications on two oleaginous yeasts (Yarrowia lipo…

citations
Cited by 26 publications
(4 citation statements)
references
References 41 publications
“…Only 5 papers ( Kather et al, 2022 ; Grinbaum and Adomaitis, 2023 ; Morris, 2023 ; Ray, 2023 ; Xiao et al, 2023 ), a popular science article ( Tarasava, 2023 ), and an editorial ( Generating ‘smarter’ biotechnology, 2023 ) discussed the impact of generative AI on synthetic biology. This is expected to increase dramatically quite soon, given the success of this latest wave of AI technology and the platform aspects of its spread.…”
Section: Results
Citation type: mentioning
confidence: 99%
“…Although biorisk is subject to an established governance regime ( Mampuys and Brom, 2018 ; Wang and Zhang, 2019 ), and scientists generally adhere to biosafety protocols if they receive the appropriate training and build a culture of responsibility ( Perkins et al, 2019 ), even experimental, but legitimate use by scientists could lead to unexpected developments ( O’Brien and Nelson, 2020 ). Additionally, recent advances in chatbots enabled by generative AI, technology capable of producing convincing real-world content, including text, code, images, music, and video, based on vast amounts of training data ( Feuerriegel et al, 2024 ), accelerates knowledge mining in biology ( Xiao et al, 2023 ) but has revived fears that advanced biological insight can get into the hands of malignant individuals or organizations ( Grinbaum and Adomaitis, 2023 ). It also further blurs the boundary between our understanding of living and non-living matter ( Deplazes and Huppenbauer, 2009 ).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…These outcomes supported the rationale for constructing larger phylochemical maps with a text mining approach but suggested that methods to reduce the false positive rate (other than laborious manual inspection) should be explored. Recent studies in natural language processing have highlighted the abilities of large language models to answer questions about the meaning of text strings, including text describing chemical and biological concepts, 3739 and such models are now readily accessible. Accordingly, we next evaluated the abilities of large language models to categorize candidate compound-species associations by using the manually curated dataset as a ground truth set ( Figure 4A ).…”
Section: Results
Citation type: mentioning
confidence: 99%
“…NEKO can compile massive literature reports, fill knowledge gap, remove redundant data, and connect information streams, which can be used to collect both features and targets from literature for developing standardized datasets 15 . NEKO can be widely used for Synthetic biology research.…”
Section: Discussion
Citation type: mentioning
confidence: 99%