2022
DOI: 10.1093/bioinformatics/btac764

Active learning for efficient analysis of high-throughput nanopore data

Abstract: As the third-generation sequencing technology, nanopore sequencing has been used for high-throughput sequencing of DNA, RNA, and even proteins. Recently, many studies have begun to use machine learning technology to analyze the enormous data generated by nanopores. Unfortunately, the success of this technology is due to the extensive labeled data, which often suffer from enormous labor costs. Therefore, there is an urgent need for a novel technology that can not only rapidly analyze nanopore data with high-thr…

Citations: cited by 8 publications (7 citation statements)
References: 54 publications (75 reference statements)
“…Applying methods that have been proved successful in other areas to biobased molecules and materials is expected to revolutionize the development of nature-inspired computational materials for a circular economy. Examples are the use of active learning for free energy calculations, 217 efficient analysis of high-throughput nanopore data, 218 chemical dynamics simulations of interfacial systems, 219 or physics-informed ML models. 220 …”
Section: The Role Of Multiscale Modeling AI and ML On Biomass Valoriz...
Citation type: mentioning
confidence: 99%
“…Data labels in this context are numeric or categorical properties, which an ML model is trained to predict. Obtaining those labels, called annotation, might require a domain expert categorisation, an experiment, a measurement or a simulation [4][5][6]. Active learning (AL) is an ML method that applies when we have a large pool of fully characterised feature data and a far smaller amount of annotated label data.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…In contrast to current approaches that rely on a single round of training data, active learning offers a way to iteratively improve models by selecting new training examples based on their potential to improve the model. Active learning has been successfully applied to model metabolic networks (47), optimize cell culture media (48), perform in silico drug screens (49-52), identify TFs that drive cellular differentiation (53), select optimal training data for nanopore base calling (54), and design Perturb-seq experiments (55). Here we apply active learning to the problem of cis-regulation for the first time, using it to iteratively train models of a cell type-specific cis-regulatory grammar in the early postnatal mouse retina, an experimentally accessible portion of the developing central nervous system.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…By prioritizing these perturbations at each iteration, we efficiently sample the space of possible sequences for informative examples, and thereby train accurate machine learning models with less data [40][41][42]. Active learning has been successfully applied to model metabolic networks 43, optimize cell culture media 44, perform in silico drug screens [45][46][47][48], improve text and image classifiers 42,49, discover energy-efficient materials 50,51, identify TFs that drive cellular differentiation 52, and select optimal training data for nanopore base calling 53. However, active learning has not yet been applied to train models of cis-regulatory grammars.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
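
Several of the statements above describe the same setting: a large pool of unlabeled feature data, a small annotated set, and a loop that repeatedly picks the examples most likely to improve the model. The following is a minimal sketch of that loop using uncertainty sampling, one common selection rule. It is not the method of the cited article or of any citing paper. The synthetic two-cluster data, the `oracle` function standing in for an annotator, and all function names and parameter values are assumptions made for illustration.

```python
import numpy as np


def fit_logistic(X, y, steps=500, lr=0.5):
    """Fit a logistic regression (with bias term) by plain gradient descent."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * (Xb.T @ (p - y)) / len(y)
    return w


def predict_proba(w, X):
    """Predicted probability of class 1 for each row of X."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-(Xb @ w)))


def active_learning_loop(X_pool, oracle, init_idx, rounds=5, batch=5):
    """Pool-based active learning with uncertainty sampling.

    `oracle(i)` is a hypothetical stand-in for the costly annotation step
    (an expert, an experiment, or a simulation) and returns the label of
    pool item i.
    """
    labeled = [int(i) for i in init_idx]
    labels = {i: oracle(i) for i in labeled}
    for _ in range(rounds):
        y = np.array([labels[i] for i in labeled], dtype=float)
        w = fit_logistic(X_pool[labeled], y)
        p = predict_proba(w, X_pool)
        # Probabilities near 0.5 are the least certain predictions.
        score = -np.abs(p - 0.5)
        score[labeled] = -np.inf  # never query an item twice
        for i in np.argsort(score)[-batch:]:
            labels[int(i)] = oracle(int(i))
            labeled.append(int(i))
    # Final fit on everything annotated so far.
    y = np.array([labels[i] for i in labeled], dtype=float)
    return fit_logistic(X_pool[labeled], y), labeled


# Synthetic pool (assumption): two Gaussian clusters, 100 points each.
rng = np.random.default_rng(0)
X_pool = np.vstack([
    rng.normal(-1.5, 1.0, size=(100, 2)),
    rng.normal(1.5, 1.0, size=(100, 2)),
])
y_true = np.array([0] * 100 + [1] * 100)

# Start from two annotated examples per class, then query 5 rounds of 5.
w, labeled = active_learning_loop(
    X_pool, oracle=lambda i: y_true[i], init_idx=[0, 1, 100, 101]
)
accuracy = float(np.mean((predict_proba(w, X_pool) > 0.5) == y_true))
print(f"annotated {len(labeled)} of {len(X_pool)} pool items, accuracy {accuracy:.2f}")
```

Only the queried items are ever passed to `oracle`, which is the point of the approach: annotation cost scales with the number of queries rather than with the size of the pool. Other selection rules (for example, expected model change or query by committee) would replace only the `score` computation.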