2016
DOI: 10.1016/j.jbi.2016.03.018

SD-MSAEs: Promoter recognition in human genome based on deep feature extraction

Abstract: The prediction and recognition of promoters in the human genome play an important role in DNA sequence analysis. Shannon entropy, from information theory, is a versatile tool in bioinformatic analysis. Relative entropy estimators based on statistical divergence (SD) are used to extract meaningful features that distinguish different regions of DNA sequences. In this paper, we choose context features and use a set of SD methods to select the most effective n-mers distinguishing promoter re…
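As a rough illustration of the idea in the abstract, the sketch below scores n-mers by their contribution to the relative entropy (Kullback-Leibler divergence) between n-mer frequencies in two sets of sequences. This is a minimal sketch under stated assumptions, not the paper's method: the truncated abstract does not say which divergences are used, and the function names, the pseudocount, and the toy sequences are all hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def nmer_counts(seqs, n):
    """Count overlapping n-mers across a list of DNA strings."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    return counts


def kl_terms(pos_seqs, neg_seqs, n=2, pseudo=1.0):
    """Return {n-mer: p * log2(p / q)}; the values sum to D(P || Q) in bits.

    P and Q are n-mer frequencies in the two sets, smoothed with a
    pseudocount (an assumption) so that no probability is zero.
    """
    nmers = ["".join(t) for t in product("ACGT", repeat=n)]
    pos, neg = nmer_counts(pos_seqs, n), nmer_counts(neg_seqs, n)
    pos_total = sum(pos[m] for m in nmers) + pseudo * len(nmers)
    neg_total = sum(neg[m] for m in nmers) + pseudo * len(nmers)
    terms = {}
    for m in nmers:
        p = (pos[m] + pseudo) / pos_total
        q = (neg[m] + pseudo) / neg_total
        terms[m] = p * log2(p / q)
    return terms


def top_nmers(pos_seqs, neg_seqs, n=2, k=5):
    """Pick the k n-mers with the largest divergence contribution."""
    terms = kl_terms(pos_seqs, neg_seqs, n)
    return sorted(terms, key=terms.get, reverse=True)[:k]


# Hypothetical toy data, for illustration only.
promoters = ["TATAAAGG", "CTATAAAT"]
background = ["GCGCGCGC", "CCGGCCGG"]
print(top_nmers(promoters, background, n=2, k=3))
```

Ranking by the per-n-mer term favors n-mers that are both frequent in the first set and rare in the second; other divergences could be substituted by changing the expression assigned to `terms[m]`.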

Cited by 22 publications (10 citation statements)
References 18 publications
“…Although a DNA sequence composed of nucleotides does not show any distinguishing property when inspected visually, deep learning based models are powerful enough to infer various distinguishing features from local patterns if we represent such a sequence with an appropriate mathematical representation (Umarov and Solovyev (2017), Singh et al (2016), Xu et al (2016)). A DNA sequence contains four types of monomers: A, T, C and G. So, our monomer representation of each DNA sample is an 81 × 4 two-dimensional matrix (each sequence is 81 nucleotides long in our dataset).…”
Section: Mathematical Formulation Of DNA Sequence (mentioning)
confidence: 99%
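The 81 × 4 monomer representation described in the statement above can be illustrated with a one-hot encoding, one row per nucleotide and one column per base. This is a sketch under assumptions: the quote does not state that the matrix is one-hot, and the column order, the handling of unknown symbols, and the function name `one_hot` are hypothetical choices.

```python
import numpy as np

ALPHABET = "ATCG"  # column order follows the quote (A, T, C, G); an assumption


def one_hot(seq: str) -> np.ndarray:
    """Return a len(seq) x 4 matrix with at most a single 1 per row."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    mat = np.zeros((len(seq), len(ALPHABET)), dtype=np.float32)
    for row, base in enumerate(seq.upper()):
        col = index.get(base)
        if col is not None:  # unknown symbols such as N stay all-zero (assumption)
            mat[row, col] = 1.0
    return mat


# Hypothetical 81-nucleotide sequence, matching the length given in the quote.
example = one_hot("ACGT" * 20 + "A")
print(example.shape)  # (81, 4)
```

A matrix in this form can be fed to a convolutional layer, whose filters then act as position-independent detectors of short local patterns.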
“…The latter extracted features according to various promoter properties, such as CpG content [9], free energy [10], consensus sequence [11], and global descriptor [10], and built the prediction programs based on machine learning approaches, such as Fisher’s linear discriminant [10], decision tree [12], support vector machine (SVM) [13], Hidden Markov Model [11], neural network [14], pattern-based nearest neighbor search approach [15], and so on. Recently, deep learning has been used to grasp complex promoter sequence characteristics [16, 17] and related bioinformatics identification problems [18, 19, 20, 21, 22]. Although existing algorithms have exhibited encouraging performance, most of those predictors focused on only one species, and there is still space for prediction performance improvement.…”
Section: Introduction (mentioning)
confidence: 99%
“…Another work built a convolutional neural network model to investigate the activities of transcription factors and histone modifications during E2-induced G1e differentiation [50]. Other examples of deep learning models in genomics include models to predict protein contact map [51, 52], protein residue-residue contacts [53, 54], protein sequence labeling [55], protein disorderedness [56, 57], protein structures [58–61], protein properties [62], protein fold recognition [63], the functional effect of non-coding variants [64], the pathogenicity of variants [65], and the regulatory code of genomes [66, 67].…”
Section: Introduction (mentioning)
confidence: 99%