2023
DOI: 10.1101/2023.10.19.563060
Preprint

Identification of single nucleotide genetic polymorphism sites using machine learning methods

Mikalai M. Yatskou,
Elizabeth V. Smolyakova,
Victor V. Skakun
et al.

Abstract: The paper presents an algorithm for simulation modelling of nucleotide variations in genomic DNA molecules. To identify single nucleotide genetic polymorphisms, it proposes machine learning methods trained on simulated data. A comparative analysis of effective classical and machine learning algorithms for identifying single nucleotide polymorphisms was performed on simulated data. The optimal method for identifying single nucleotide genetic polymorphisms in DNA molecules at various experi…
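The abstract outlines a pipeline of simulating nucleotide variation, then training a classifier on the simulated, labelled sites. The Python sketch below illustrates that idea under simple assumptions that are not taken from the paper: a diploid genome, heterozygous SNPs (each read drawn from either allele with probability 0.5), a uniform per-read sequencing error rate, a single feature (fraction of reads carrying the second most frequent base), and a decision stump as the learner. All function names and parameter values (`simulate_site`, `train_stump`, coverage 30, error rate 0.01) are hypothetical and chosen only for illustration.

```python
import random


def simulate_site(coverage, is_snp, error_rate, rng):
    """Simulate (A, C, G, T) read counts at one site of a diploid genome.

    Hypothetical model: a SNP site is heterozygous, so each read comes from the
    reference or the alternative allele with probability 0.5. Any read is
    replaced by one of the three other bases with probability error_rate.
    """
    ref = rng.randrange(4)
    alt = rng.choice([b for b in range(4) if b != ref])
    counts = [0, 0, 0, 0]
    for _ in range(coverage):
        base = alt if (is_snp and rng.random() < 0.5) else ref
        if rng.random() < error_rate:
            # Sequencing error: substitute one of the other three bases.
            base = rng.choice([b for b in range(4) if b != base])
        counts[base] += 1
    return counts


def minor_fraction(counts):
    """Feature: share of reads carrying the second most frequent base."""
    total = sum(counts)
    return sorted(counts, reverse=True)[1] / total if total else 0.0


def make_dataset(n_sites, coverage, error_rate, rng):
    """Labelled data: every other simulated site is a SNP."""
    features, labels = [], []
    for i in range(n_sites):
        is_snp = i % 2 == 0
        features.append(minor_fraction(simulate_site(coverage, is_snp, error_rate, rng)))
        labels.append(is_snp)
    return features, labels


def train_stump(features, labels):
    """Decision stump: threshold t minimising errors of the rule 'SNP if feature >= t'."""
    best_t, best_err = 0.0, len(labels) + 1
    for t in sorted(set(features)):
        err = sum((f >= t) != y for f, y in zip(features, labels))
        if err < best_err:
            best_t, best_err = t, err
    return best_t


rng = random.Random(0)
train_x, train_y = make_dataset(200, 30, 0.01, rng)
t = train_stump(train_x, train_y)
test_x, test_y = make_dataset(200, 30, 0.01, rng)
accuracy = sum((x >= t) == y for x, y in zip(test_x, test_y)) / len(test_y)
print(f"threshold={t:.3f} accuracy={accuracy:.2f}")
```

The stump stands in for the richer machine learning methods the paper compares; it only shows how a decision rule can be learned entirely from simulated data and then applied to new sites.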

References: 18 publications
Cited by 1 publication (3 citation statements):
“…We included in the comparative analysis the two most effective existing SNP identification algorithms: the binomial distribution and entropy-based tests [2,4]. An efficient software implementation of the binomial distribution test (BDT) has been developed; one of its features is automatic selection of the threshold value used to identify SNP sites.…”
Section: Organization of Computational Experiments (mentioning)
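The binomial distribution test with an automatically chosen threshold can be sketched as follows, using the rule quoted in the statement below (threshold $10^{-k}$, with k the average site coverage). This is an illustrative Python sketch, not the authors' BDT implementation. Assumed, not from the source: the alternative allele is the second most frequent base at the site, the test is a one-sided binomial tail probability, and `error_rate` (hypothetical default 0.001) is the probability that a read shows the alternative base purely by sequencing error. The names `binomial_tail` and `bdt_call_sites` are hypothetical.

```python
import math
from statistics import mean


def binomial_tail(n, m, p):
    """P(X >= m) for X ~ Binomial(n, p): one-sided p-value of m or more alternative reads."""
    if m <= 0:
        return 1.0
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m, n + 1))


def bdt_call_sites(site_counts, error_rate=0.001):
    """Return indices of sites called as SNPs by a binomial test.

    site_counts: (A, C, G, T) read counts per site.
    error_rate: hypothetical probability that a read reports the alternative
        base by sequencing error alone (null hypothesis: no SNP).
    """
    k = mean(sum(c) for c in site_counts)  # average site coverage
    threshold = 10.0 ** (-k)               # automatic threshold 10^(-k)
    calls = []
    for idx, counts in enumerate(site_counts):
        n = sum(counts)
        alt_reads = sorted(counts, reverse=True)[1]  # second most frequent base
        if binomial_tail(n, alt_reads, error_rate) < threshold:
            calls.append(idx)
    return calls


sites = [(5, 5, 0, 0), (9, 1, 0, 0), (10, 0, 0, 0)]
print(bdt_call_sites(sites))  # -> [0]
```

With an average coverage of 10 the threshold is $10^{-10}$: the balanced site (p-value about $2.5 \times 10^{-13}$) is called, while a single alternative read (p-value about 0.01) is not. Because the threshold shrinks exponentially with coverage, the outcome is sensitive to the assumed error rate.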
“…It is proposed to use the value $10^{-k}$ as the probability threshold, where k is the average site coverage estimated from the simulated or experimental dataset. The published software implementation [4] is used for the entropy-based test (EBT). The thresholds for identifying SNP sites are entropy E > 0.21 and p-value < 0.5.…”
Section: Organization of Computational Experiments (mentioning)
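The entropy criterion E > 0.21 quoted above can be illustrated with a per-site Shannon entropy over the four base frequencies. This Python sketch assumes entropy in bits (base-2 logarithm), which the statement does not specify, and it implements only the entropy condition: the p-value < 0.5 condition belongs to the published EBT implementation [4] and is not reproduced here. The names `site_entropy` and `ebt_entropy_filter` are hypothetical.

```python
import math


def site_entropy(counts):
    """Shannon entropy (assumed in bits) of the base distribution at one site."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return max(0.0, h)  # clamp -0.0 (monomorphic site) to 0.0


def ebt_entropy_filter(site_counts, e_threshold=0.21):
    """Indices of sites whose entropy exceeds e_threshold.

    Only the E > 0.21 condition is shown; the p-value condition of the
    published EBT [4] is not reproduced.
    """
    return [i for i, c in enumerate(site_counts) if site_entropy(c) > e_threshold]


sites = [(5, 5, 0, 0), (49, 1, 0, 0), (10, 0, 0, 0)]
print([round(site_entropy(c), 3) for c in sites])  # -> [1.0, 0.141, 0.0]
print(ebt_entropy_filter(sites))                   # -> [0]
```

Entropy alone does not account for coverage (for example, 9 of 10 reads on one base already gives E of about 0.47 bits), which is presumably why the EBT pairs it with a significance condition.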