Malaria parasites pass through distinct life stages, not yet fully resolved, as they develop in different environments within the mosquito vector. Transcriptomes of thousands of individual parasites exist. Ribonucleic acid sequencing (RNA-seq) is a widely used method for measuring gene expression and has improved the understanding of genetic questions. RNA-seq quantifies the transcripts of expressed genes. Analyzing RNA-seq data calls for improved machine learning techniques, and researchers have proposed several learning approaches for analyzing biological data. In this study, the PCA feature extraction algorithm is used to extract latent components from a high-dimensional malaria vector RNA-seq dataset, and classification performance on the extracted components is evaluated using the KNN and Decision Tree classification algorithms. The approach is validated on an Anopheles gambiae mosquito RNA-seq dataset. The experiment achieved classification accuracies of 86.7% and 83.3% for KNN and Decision Tree, respectively.
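The workflow described in this abstract (PCA feature extraction followed by KNN and Decision Tree classification) can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the authors' implementation: the data are a synthetic stand-in for an expression matrix, and the number of components, the neighbor count, the train/test split, and the `evaluate` helper are all assumptions made for the example. The accuracies it prints will not match the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for an RNA-seq expression matrix (assumption):
# rows are samples, columns are genes, far more genes than samples.
X, y = make_classification(
    n_samples=120, n_features=2000, n_informative=20,
    n_redundant=0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def evaluate(classifier, n_components=10):
    """Fit scaler -> PCA -> classifier on the training split; return test accuracy."""
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components, random_state=0),
        classifier,
    )
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


# Hyperparameters below are illustrative choices, not values from the study.
accuracies = {
    "KNN": evaluate(KNeighborsClassifier(n_neighbors=5)),
    "Decision Tree": evaluate(DecisionTreeClassifier(random_state=0)),
}
for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```

Wrapping the scaler and PCA in a pipeline means both are fitted on the training split only, so no information from the test samples leaks into the extracted components.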
Malaria, spread by Anopheles mosquitoes, is a major cause of death worldwide. Gene expression is a fundamental level at which the effects of otherwise hidden but informative genes and developmental systems become evident, making it possible to detect differences among malaria infections and to understand the underlying biological processes in humans. RNA sequencing (RNA-seq) provides large-scale, quantitative transcriptional profiling data that support a variety of applications, including scientific and clinical studies. A fundamental limitation of RNA-seq data is that they are high-dimensional, sparse, and noisy, which makes the classification of genes challenging. Several approaches have been proposed to address the curse of dimensionality, but they still need improvement, and obtaining accurate results remains critical. This study proposes a hybrid dimensionality reduction technique in which an optimized genetic algorithm selects a relevant subset of features from the data. The selected features are then passed separately to principal component analysis and independent component analysis, based on their class variants, to transform them into a lower-dimensional space. Kernel support vector machine classifiers are then applied to the reduced malaria vector dataset to assess the classification performance of the experiment.
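The hybrid pipeline described in this abstract (genetic-algorithm feature selection, then PCA and ICA applied separately, then kernel SVM classification) might look roughly like the sketch below. It is a toy illustration under stated assumptions, not the authors' optimized algorithm: the data are synthetic, and `fitness`, `genetic_select`, the population size, mutation rate, number of components, and the two kernels are hypothetical choices made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, FastICA
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for a gene expression matrix (assumption).
X, y = make_classification(n_samples=100, n_features=200, n_informative=10,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)


def fitness(mask):
    """Cross-validated accuracy of a linear SVM on the features kept by mask."""
    if mask.sum() < 5:
        return 0.0  # penalize masks that keep too few features
    scores = cross_val_score(SVC(kernel="linear"), X_train[:, mask], y_train, cv=3)
    return scores.mean()


def genetic_select(n_features, pop_size=12, generations=5, mutation_rate=0.02):
    """Toy genetic algorithm over boolean feature masks (illustrative only)."""
    population = rng.random((pop_size, n_features)) < 0.3
    for _ in range(generations):
        scores = np.array([fitness(m) for m in population])
        # Selection: keep the fitter half of the population as parents.
        parents = population[np.argsort(scores)[::-1][: pop_size // 2]]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.choice(len(parents), size=2, replace=False)]
            cut = rng.integers(1, n_features)                 # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n_features) < mutation_rate   # bit-flip mutation
            children.append(child)
        population = np.vstack([parents, children])
    scores = np.array([fitness(m) for m in population])
    return population[np.argmax(scores)]


mask = genetic_select(X_train.shape[1])
k = min(5, int(mask.sum()))  # number of components to keep (assumption)

results = {}
for reducer_name in ("PCA", "ICA"):
    for kernel in ("linear", "rbf"):
        if reducer_name == "PCA":
            reducer = PCA(n_components=k, random_state=0)
        else:
            reducer = FastICA(n_components=k, random_state=0, max_iter=1000)
        model = make_pipeline(StandardScaler(), reducer, SVC(kernel=kernel))
        model.fit(X_train[:, mask], y_train)
        results[(reducer_name, kernel)] = model.score(X_test[:, mask], y_test)

for (reducer_name, kernel), acc in results.items():
    print(f"GA + {reducer_name} + SVM ({kernel}): {acc:.3f}")
```

The genetic algorithm only ever sees the training split, and PCA and ICA are fitted in separate pipelines, so the two reductions can be compared on the same held-out samples.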