2021
DOI: 10.1186/s12864-021-07841-6

A machine learning approach for accurate and real-time DNA sequence identification

Abstract: Background: The all-electronic Single Molecule Break Junction (SMBJ) method is an emerging alternative to traditional polymerase chain reaction (PCR) techniques for genetic sequencing and identification. Existing work indicates that the current spectra recorded in SMBJ experiments contain unique signatures that can identify known sequences in a dataset. However, the spectra are typically extremely noisy due to the stochastic and complex interactions between the substrate, sample, environment, …
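The abstract's premise is that each known sequence leaves a recognizable signature in otherwise noisy current spectra. The toy sketch below illustrates that idea only; it is not the paper's method. It assumes each SMBJ trace has already been reduced to a list of conductance readings. The hypothetical helper `to_histogram` bins those readings into a normalized histogram, the histograms for each known sequence are averaged into a centroid, and a new trace is labeled by the nearest centroid (Euclidean distance). The synthetic traces, the peak values, the bin range, and the Gaussian noise model are all made up for illustration.

```python
import math
import random

def to_histogram(readings, lo=0.0, hi=1.0, bins=20):
    """Bin conductance readings into a normalized histogram (hypothetical feature)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for r in readings:
        if lo <= r < hi:  # readings outside the range are ignored
            counts[min(int((r - lo) / width), bins - 1)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def centroid(vectors):
    """Element-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def nearest(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

def synthetic_trace(peak, rng, n=500, noise=0.08):
    """Toy stand-in for an SMBJ trace: Gaussian noise around one conductance peak."""
    return [rng.gauss(peak, noise) for _ in range(n)]

rng = random.Random(0)
peaks = {"seqA": 0.25, "seqB": 0.50, "seqC": 0.75}  # hypothetical signatures
train = {label: [to_histogram(synthetic_trace(p, rng)) for _ in range(10)]
         for label, p in peaks.items()}
centroids = {label: centroid(hs) for label, hs in train.items()}

query = to_histogram(synthetic_trace(0.50, rng))
print(nearest(centroids, query))  # seqB
```

Nearest-centroid matching is chosen here only because it is short and transparent. Real SMBJ spectra overlap far more than these synthetic peaks do, which is why a stronger learned classifier is needed.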

Cited by 14 publications (9 citation statements). References: 21 publications.
“…According to the existence of 5'-end primer, 3'-end primer, or polyA tail, these sequences were divided into FLNC reads and non-full-length sequences. The former was clustered by iterative clustering in Iterative Clustering for Error Correction (Wang et al, 2021) algorithm software to generate the cluster consensus sequences. Subsequently, we corrected the polished consensus sequences of the TGS data through LoRDEC software (PacBio) with default parameters, and any redundancy in corrected consensus reads was removed by CD-hit (version 4.7) (Li and Godzik, 2006) to obtain the final transcripts for subsequent analysis.…”
Section: TGS Library Construction (mentioning, confidence: 99%)
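The quoted pipeline removes redundant transcripts with CD-HIT. As a rough sketch of the greedy incremental idea (this is not CD-HIT's implementation, which uses short-word filtering and alignment-based identity), sequences are sorted longest first. Each sequence then joins the first existing representative it matches at or above a `threshold` identity, and otherwise becomes a new representative. Identity is approximated here with the standard library's `difflib.SequenceMatcher.ratio()`, a stand-in chosen only for convenience. The function names and the 0.95 default threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher

def identity(a, b):
    """Approximate identity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def greedy_cluster(seqs, threshold=0.95):
    """Greedy incremental clustering: longer sequences become representatives first."""
    reps = []       # representatives, in the order they were chosen
    clusters = {}   # representative -> member sequences (including itself)
    for s in sorted(seqs, key=len, reverse=True):
        for r in reps:
            if identity(s, r) >= threshold:
                clusters[r].append(s)  # redundant: absorbed by an existing representative
                break
        else:
            reps.append(s)             # no close match: start a new cluster
            clusters[s] = [s]
    return clusters

transcripts = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACG",   # near-duplicate of the first
    "TTTTGGGGCCCCAAAATTTT",
]
clusters = greedy_cluster(transcripts, threshold=0.9)
print(list(clusters))  # ['ACGTACGTACGTACGTACGT', 'TTTTGGGGCCCCAAAATTTT']
```

Processing longest sequences first means the retained representative of each cluster is its longest member, which is the usual choice when the goal is a non-redundant set of full-length transcripts.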
“…The TGS technology is also called the single-molecule real-time (SMRT) sequencing technology, which is developed by Pacific Biosciences (PacBio) with longer sequencing lengths, full-length transcripts, direct sequencing without fragmentation or post-sequencing assembly, and easy analysis of alternative splicing (AS) (Kuang et al, 2019). The TGS can provide not only nucleotide sequences of the target molecules but also information regarding epigenetic modifications for systematical investigations of gene expression in various Conus species (Levene et al, 2003;Wang et al, 2021).…”
Section: Introduction (mentioning, confidence: 99%)
“…Thus, to differentiate between all possible targets including the complementary and mismatched ones, and to automate and speed up the diagnosis process, we apply a machine learning algorithm previously developed by our group based on the XGBoost algorithm [49][50][51] . This approach decreases the number of conductance measurements required while improving the detection and differentiation accuracy.…”
Section: Results (mentioning, confidence: 99%)
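The statement above attributes the classifier to XGBoost, a regularized gradient-boosting library. The following is a stdlib-only sketch of plain gradient boosting, the idea XGBoost builds on. It is neither XGBoost nor the authors' model. One-feature decision stumps are fitted to the negative gradient of the logistic loss, which is `y - p`, so that a binary label can be predicted, for example complementary (1) versus mismatched (0) target. The per-trace feature values, the labels, and the `rounds`/`lr` settings are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, residuals):
    """Single-threshold split on one feature minimizing squared error to the residuals."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    if best is None:  # all feature values identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv

def boost(xs, ys, rounds=50, lr=0.3):
    """Gradient boosting on logistic loss; returns a probability predictor."""
    f = [0.0] * len(xs)  # current raw scores (log-odds)
    stumps = []
    for _ in range(rounds):
        residuals = [y - sigmoid(fi) for y, fi in zip(ys, f)]  # negative gradient
        h = fit_stump(xs, residuals)
        stumps.append(h)
        f = [fi + lr * h(x) for fi, x in zip(f, xs)]
    return lambda x: sigmoid(sum(lr * h(x) for h in stumps))

xs = [0.10, 0.15, 0.20, 0.25, 0.60, 0.65, 0.70, 0.75]  # hypothetical per-trace features
ys = [0, 0, 0, 0, 1, 1, 1, 1]                          # 1 = complementary (toy labels)
predict = boost(xs, ys)
print(predict(0.68) > 0.5)  # True
```

XGBoost adds second-order gradient information, regularization on tree complexity, and multi-feature trees on top of this loop. The sketch only shows why each new weak learner targets the errors left by the previous ones.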
“…In supervised learning, the model is provided with both the training data and the correct answer, and it uses this information to predict the objective variable for unknown data. Regression, [132][133][134][135] which predicts continuous values, and classification, 33,[136][137][138][139][140][141][142][143][144][145][146][147] which predicts categorical values, such as chemical species prediction, are included in supervised learning.…”
Section: Supervised Learning (mentioning, confidence: 99%)
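The distinction drawn in the last statement can be made concrete with a minimal toy sketch. In both cases the model learns from inputs paired with correct answers. Ordinary least squares predicts a continuous value (regression), and a 1-nearest-neighbour rule predicts a category (classification). The function names, data points, and labels are illustrative assumptions, not taken from the cited works.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (regression: continuous output)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def predict_label(train, x):
    """1-nearest-neighbour rule (classification: categorical output)."""
    _feature, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label

# Regression: (input, correct number) pairs in, a number out for unseen input.
a, b = fit_line([1, 2, 3, 4], [2, 4, 6, 8])
print(a * 5 + b)  # 10.0

# Classification: (input, correct category) pairs in, a label out for unseen input.
train = [(0.1, "species_A"), (0.2, "species_A"), (0.8, "species_B"), (0.9, "species_B")]
print(predict_label(train, 0.75))  # species_B
```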