2021
DOI: 10.26508/lsa.202101113

Statistical guidelines for quality control of next-generation sequencing techniques

Abstract: More and more next-generation sequencing (NGS) data are made available every day. However, the quality of this data is not always guaranteed. Available quality control tools require profound knowledge to correctly interpret the multiplicity of quality features. Moreover, it is usually difficult to know if quality features are relevant in all experimental conditions. Therefore, the NGS community would highly benefit from condition-specific data-driven guidelines derived from many publicly available experiments,…

Citations: cited by 5 publications (1 citation statement)
References: 36 publications
“…In previous studies, we used 2642 quality-labeled FASTQ files from the ENCODE project to derive statistical features with different bioinformatics tools well known in the scientific community. We showed that these features have explanatory power over the quality of the data from which they were derived, and built a machine learning classification tool that uses these features as input [11,12]. With a grid search of multiple machine learning algorithms, from logistic regression to ensemble methods and multilayer perceptrons, we were able to provide a robust prediction of quality in FASTQ files.…”
Section: Introduction
Citation type: mentioning
confidence: 99%