2019
DOI: 10.1186/s40537-019-0241-0

Feature selection methods and genomic big data: a systematic review

Abstract: With advances in computational techniques, the amount of genomic data has risen exponentially [1], making it hard to use such data in the medical field without appropriate pre-processing, which in turn leads to more complexity and veracity issues [2] and eventually creates complications in storage, analysis, privacy and security. Genomic data may therefore look easy to handle in terms of its volume, but it actually requires quite a complicated process due to the complexity,…

Cited by 106 publications (58 citation statements)
References 46 publications
“…The amount of data generated by high-throughput sequencing technologies [115] represents a challenge in genomic prediction, particularly due to the difficulty of working with high-dimensional datasets, i.e., the 'large p, small n' problem [116]. This increase in the amount of available information makes the task of directly applying these marker data in genomic analyses more difficult and necessitates appropriate preprocessing steps [117]. In this study, we proposed the use of FS techniques to select a smaller set of SNPs with more predictive power than the entire dataset and closer associations with the brown rust phenotype to assist the identification of regions associated with disease status.…”
Section: Discussion (mentioning)
confidence: 99%
“…Feature selection (filter, wrapper and embedded) [7][8][9] and feature extraction [10][11][12] (supervised and unsupervised) are dimensionality reduction approaches that have been established, these approaches have overcome several problems such as performance enhancement, yet there is need for improvements hybrid model and optimization for getting better results [13]. Finding an optimal subset of genes proficient at handling high dimension optimization difficulties with reasonable solutions is required [5].…”
Section: Introduction (mentioning)
confidence: 99%
“…Whenever the needed number of training examples cannot be provided, reducing features decreases the size of the needed training examples and hence increases the overall yield shape of the classification algorithm. In the previous years, two methods for dimensional reduction were presented: feature selection and feature extraction [4,5]. Feature selection (FS) seeks for a relevant subset of existing features, while features are designed for a new space of lower dimensionality in the feature extraction method.…”
Section: Introduction (mentioning)
confidence: 99%
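
The distinction drawn in the citation statement above, between selecting a subset of existing features and constructing new features in a lower-dimensional space, can be illustrated with a minimal sketch. It is not taken from the reviewed paper or from any citing paper: the function names, the correlation-based filter score, the fixed projection weights, and the toy data are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def select_features(rows, labels, k):
    """Filter-style feature selection (hypothetical helper): keep the k
    existing columns with the highest absolute correlation to the label."""
    n_features = len(rows[0])
    scores = [
        abs(pearson([row[j] for row in rows], labels))
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def extract_feature(rows, weights):
    """Feature extraction (hypothetical helper): build one NEW feature as a
    weighted sum of all original columns."""
    return [sum(w * v for w, v in zip(weights, row)) for row in rows]


# Toy data, made up for illustration: 4 samples, 3 features.
# Feature 0 tracks the label, feature 1 is constant, feature 2 is noise.
rows = [[0, 5, 1], [1, 5, 0], [2, 5, 1], [3, 5, 0]]
labels = [0, 0, 1, 1]

selected = select_features(rows, labels, k=1)      # indices of kept columns
projected = extract_feature(rows, [0.5, 0.0, 0.5])  # values of a new column
```

Here `selected` holds indices into the original columns, so the retained features stay interpretable (for example as individual SNPs or genes), whereas `projected` is a new variable that mixes several original columns. In practice the weights would be learned (for instance by a projection method) rather than fixed by hand as they are in this sketch.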