2023
DOI: 10.1101/2023.10.29.564479
Preprint

Simplifying bioinformatics data analysis through conversation

Zhengyuan Dong,
Han Zhou,
Yifan Jiang
et al.

Abstract: The burgeoning field of bioinformatics has been revolutionized by the rapid growth of omics data, providing insights into various biological processes. However, the complexity of bioinformatics tools and the rapidly evolving nature of data analysis pipelines present significant challenges for researchers, especially those lacking extensive programming expertise. To address these challenges, we introduce BioMANIA, an artificial intelligence-driven, natural language-oriented bioinformatics data analysis pipeline…

Cited by 3 publications (3 citation statements)
References 41 publications
“…Lastly, the core piece of our architecture: the analysis subsystem, had mixed performance results which varied depending on factors such as instruction complexity, degree of human involvement, and dataset size. As has been noted by others [13,27] code generation performance is meaningfully affected by instruction complexity. The first issue with making requests involving any level of complexity is that it results in increasingly inconsistent outputs.…”
Section: Results (mentioning)
confidence: 81%
“…The architecture we implored to answer this question is illustrated in Figure 1, of note is that numerous works exist [12][13][14] demonstrating the capabilities of LLMs to perform certain steps of this pipeline, and given the superior performance of GPT-4 across a range of benchmarks [15], we restricted ourselves to models built on OpenAI's platform. Specifically, we applied a system of text-generation and data analysis agents [14,15] to the task of replicating the methodology of a high impact journal article [16].…”
Section: Introduction (mentioning)
confidence: 99%
“…To this end, there are multiple proposed LMM-based solutions for data analysis that either benchmarked LLMs or built LMM-based solutions (12,13). In addition, there are specialized bioinformatics solutions relying on LLMs that bring prompt based solutions to enhance bioinformatics tasks (14)(15)(16)(17). Our main differentiating factor is building a more complete and ready use solution that integrates advanced prompt engineering solutions as well as interactivity.…”
Section: Discussion (mentioning)
confidence: 99%