Background: Malaria risk analysis across multiple populations is important, but it is subject to several limitations. However, the exponential growth in the diversity and volume of genetic variation data obtained from malaria-infected patients through Genome-Wide Association Studies (GWAS) opens up unprecedented opportunities to explore significant differences in genetic markers (risk factors), particularly those linked to populations' resistance or susceptibility to malaria. This study therefore proposes using statistical tests to analyse large-scale genetic variation data comprising 20,854 samples from 11 populations across three continents: Africa, Oceania, and Asia. Methods: Although statistical tests have been used in case–control studies since the 1950s to link risk factors to particular diseases, several challenges remain, including the choice of data (ordinal vs. non-ordinal) and test (parametric vs. non-parametric). This study addresses these challenges by adopting the Mann–Whitney U test to analyse large-scale genetic variation data, to explore the statistical significance of markers between populations, and to identify highly differentiated markers. Results: The findings revealed a significant difference in genetic markers between populations (p < 0.01) in all case groups and most control groups. For the highly differentiated genetic markers, a significant difference (p < 0.01) was present for most markers, with p-values varying between populations in the case and control groups. Moreover, several genetic markers showed highly significant differences (p < 0.001) across all populations, while others differed only between specific populations. Several genetic markers showed no significant difference between populations.
Conclusions: These findings further support the view that genetic markers contribute differently between populations to malaria resistance or susceptibility, and hence to differences in the likelihood of malaria infection. In addition, this study demonstrated the robustness of the Mann–Whitney U test for analysing genetic markers in large-scale genetic variation data, indicating an alternative method for exploring genetic markers in other complex diseases. The findings hold great promise for genetic marker analysis, and the pipeline described in this study can be fully reproduced to analyse new data.
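The abstract names the Mann–Whitney U test but not its implementation. The following is a minimal pure-Python sketch of a two-sided Mann–Whitney U test using the tie-corrected normal approximation, assuming each marker is coded as a per-individual value such as an allele dosage (0/1/2); that coding and the example arrays are illustrative assumptions, not the study's data. Genotype data contain many ties, which is why average ranks and the tie correction matter here.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test with the tie-corrected normal approximation.

    Returns (U statistic for sample x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        # Every value is identical: no evidence of a difference.
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


# Hypothetical allele dosages for one marker in two populations.
pop_a = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
pop_b = [2, 1, 2, 2, 1, 2, 2, 2, 1, 2]
u, p = mann_whitney_u(pop_a, pop_b)
```

The normal approximation is rough for very small samples; for real analyses, `scipy.stats.mannwhitneyu` offers exact and asymptotic methods and is the more practical choice.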
In recent malaria research, the complexity of the disease has been explored using machine learning models trained on blood smear images, environmental data, and even RNA-Seq data. However, a machine learning model based on genetic variation data is still required to fully explore individual malaria risk. Furthermore, many Genome-Wide Association Studies (GWAS) have associated specific genetic markers, i.e., single nucleotide polymorphisms (SNPs), with malaria. The present study therefore improves the current state-of-the-art genetic risk score by incorporating SNP mutation location into large-scale genetic variation data obtained from GWAS. Nevertheless, hyperparameter optimization becomes computationally expensive on large-scale datasets. This study therefore proposes a machine learning model that incorporates mutation location and uses a Genetic Algorithm (GA) to optimize hyperparameters. In addition, a deep learning model is proposed as an alternative approach for predicting individual malaria risk. The analysis is performed on the Malaria Genomic Epidemiology Network (MalariaGEN) dataset comprising 20,817 individuals from 11 populations. The findings demonstrated that the proposed GA could overcome the curse of dimensionality and improve resource efficiency compared with commonly used methods. In addition, incorporating the mutation location significantly improved the machine learning models in predicting individual malaria risk, achieving a Mean Absolute Error (MAE) of 8.00E−06. Moreover, the deep learning model obtained MAE scores close to those of the machine learning models, indicating a viable alternative approach. Thus, this study provides insights into genetic and technical considerations that can improve state-of-the-art methods for predicting individual malaria risk.
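The abstract does not describe the GA's encoding, operators, or fitness function. As a rough illustration of GA-based hyperparameter search, here is a minimal sketch assuming a discrete search space, tournament selection, uniform crossover, per-gene mutation, and elitism. The hyperparameter names, values, and `toy_fitness` are all hypothetical; in practice the fitness would be the validation MAE of a trained model.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical search space: the names and values are illustrative only.
SEARCH_SPACE: Dict[str, List[float]] = {
    "n_estimators": [50, 100, 200, 400],
    "max_depth": [2, 4, 6, 8, 10],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
}

Genome = Dict[str, float]


def random_genome(rng: random.Random) -> Genome:
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    # Uniform crossover: each gene is inherited from either parent with p = 0.5.
    return {name: a[name] if rng.random() < 0.5 else b[name] for name in SEARCH_SPACE}


def mutate(g: Genome, rate: float, rng: random.Random) -> Genome:
    # Each gene is resampled from its allowed values with probability `rate`.
    return {name: rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value
            for name, value in g.items()}


def tournament(pop: List[Genome], scores: List[float], k: int,
               rng: random.Random) -> Genome:
    # Pick k random individuals and return the one with the lowest (best) score.
    contenders = rng.sample(range(len(pop)), k)
    return pop[min(contenders, key=lambda i: scores[i])]


def genetic_search(fitness: Callable[[Genome], float], pop_size: int = 12,
                   generations: int = 15, mutation_rate: float = 0.2,
                   elite: int = 2, seed: int = 0) -> Tuple[Genome, List[float]]:
    """Minimise `fitness` (e.g. a validation MAE) over SEARCH_SPACE."""
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        scores = [fitness(g) for g in pop]
        ranked = sorted(range(pop_size), key=lambda i: scores[i])
        history.append(scores[ranked[0]])
        # Elitism: the best `elite` genomes survive unchanged.
        next_pop = [pop[i] for i in ranked[:elite]]
        while len(next_pop) < pop_size:
            child = crossover(tournament(pop, scores, 3, rng),
                              tournament(pop, scores, 3, rng), rng)
            next_pop.append(mutate(child, mutation_rate, rng))
        pop = next_pop
    scores = [fitness(g) for g in pop]
    best = min(range(pop_size), key=lambda i: scores[i])
    history.append(scores[best])
    return pop[best], history


def toy_fitness(g: Genome) -> float:
    # Stand-in for training a model and returning its validation MAE.
    return (abs(g["max_depth"] - 6) + abs(g["learning_rate"] - 0.1)
            + abs(g["n_estimators"] - 200) / 1000)


best_params, best_history = genetic_search(toy_fitness)
```

Because the elite genomes are carried over unchanged, the best score per generation can never get worse. The GA evaluates only `pop_size * generations` configurations instead of the full grid, which is the resource saving such searches aim for.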
Background: Publicly available genome data provide valuable information on genetic variation patterns across different modern human populations. Neuropeptide genes are crucial to the nervous, immune, and endocrine systems and to physiological homeostasis, as they play an essential role in communicating information in neuronal functions. It remains unclear how evolutionary forces, such as natural selection and random genetic drift, have affected neuropeptide genes among human populations. To date, there are over 100 known human neuropeptides among the over 1000 predicted peptides encoded in the genome. The purpose of this study is to analyze and explore the genetic variation in continental human populations across all known neuropeptide genes by examining highly differentiated SNPs between African and non-African populations. Results: We identified a total of 644,225 SNPs in 131 neuropeptide genes in 6 worldwide population groups from a public database. Of these, 5163 SNPs with ΔDAF |(African-non-African)| ≥ 0.20 were identified and fully annotated. A total of 20 outlier SNPs, comprising 19 moderate-impact missense SNPs and one high-impact stop-lost SNP, were identified in 16 neuropeptide genes. Overall, strong population differentiation was observed, with the non-African populations having a higher derived allele frequency for 15 of these 20 SNPs. Highly differentiated SNPs in four genes were particularly striking: NPPA (rs5065) with a high-impact stop-lost variant; CHGB (rs6085324, rs236150, rs236152, rs742710 and rs742711) with multiple moderate-impact missense variants; and IGF2 (rs10770125) and INS (rs3842753) with moderate-impact missense variants that are in linkage disequilibrium.
Phenotype and disease annotations of these differentiated SNPs linked them to hypertension and diabetes and highlighted the pleiotropic effects of these neuropeptides and their role in maintaining physiological homeostasis in humans. Conclusions: We compiled a list of 131 human neuropeptide genes from multiple databases and a literature survey. We detected significant population differentiation in the derived allele frequencies of variants in several neuropeptide genes between African and non-African populations. The results highlight SNPs in these genes that may also contribute to population disparities in the prevalence of diseases such as hypertension and diabetes.
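The ΔDAF filter above can be sketched as follows. The abstract does not say how the six population groups are combined into "African" and "non-African" frequencies, so this sketch assumes the unweighted mean of the group DAFs on each side. The group labels, SNP IDs, and frequencies are hypothetical toy values, not data from the study.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class SnpFreqs:
    snp_id: str
    daf: Dict[str, float]  # derived allele frequency per population group


def delta_daf(snp: SnpFreqs, african: Sequence[str],
              non_african: Sequence[str]) -> float:
    # Assumption: each side is summarised by the unweighted mean of its group DAFs.
    afr = sum(snp.daf[p] for p in african) / len(african)
    non_afr = sum(snp.daf[p] for p in non_african) / len(non_african)
    return afr - non_afr


def highly_differentiated(snps: Sequence[SnpFreqs], african: Sequence[str],
                          non_african: Sequence[str],
                          threshold: float = 0.20) -> List[Tuple[str, float]]:
    """Keep SNPs with |ΔDAF(African - non-African)| >= threshold."""
    hits = []
    for snp in snps:
        d = delta_daf(snp, african, non_african)
        if abs(d) >= threshold:
            hits.append((snp.snp_id, d))  # negative d: non-African DAF is higher
    return hits


# Hypothetical toy frequencies for three SNPs.
snps = [
    SnpFreqs("snp_1", {"AFR": 0.10, "EUR": 0.50, "EAS": 0.60}),
    SnpFreqs("snp_2", {"AFR": 0.30, "EUR": 0.35, "EAS": 0.25}),
    SnpFreqs("snp_3", {"AFR": 0.80, "EUR": 0.40, "EAS": 0.50}),
]
hits = highly_differentiated(snps, ["AFR"], ["EUR", "EAS"])
```

Keeping the sign of ΔDAF, rather than only its absolute value, is what allows a count such as "non-African populations had the higher DAF for 15 of 20 SNPs".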
In recent years, author identification has become an active research area, where the major differences in writing style arise from the medium (paper or online), the mode of entry, and the target audience. Much research has been devoted to analyzing writing styles in handwritten, word-processed, and online social network (OSN) texts. Word processing editors, which typically include spell and grammar checkers, may influence writing style because they allow an individual to edit a piece of text to perfection. Thus, similarities may exist between OSN and word-processed texts. Moreover, no study to date has made a detailed comparison of writing styles across multidisciplinary factors. This paper attempts to close the gap between writing styles in the pre- and post-Internet periods and to provide an in-depth comparison of writing styles in OSN texts across three major factors: demographics, personality & behavior, and cybersecurity. The aim is to learn from past literature as we extend these techniques to OSN texts. In this paper, we also propose a novel machine learning prediction model based on tense morphology to classify age and gender from English blogs and the PAN 2013 dataset. This model achieves an accuracy of 94%-98% and 95%-97% for age and gender, respectively. INDEX TERMS Online social networks, survey, writing styles.
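The abstract does not specify how tense morphology is extracted or which classifier is used. As a hedged illustration only, the sketch below turns a text into relative frequencies of a few surface tense cues using regular expressions. The pattern names and definitions are assumptions, and the approach is crude: it misses irregular past forms and overcounts words such as "need" or the noun "will". A part-of-speech tagger would give more reliable tense features.

```python
import re
from typing import Dict

# Crude surface cues for tense/aspect; all patterns are illustrative assumptions.
TENSE_PATTERNS = {
    "past_ed": re.compile(r"\b[a-z]{3,}ed\b"),
    "progressive_ing": re.compile(r"\b[a-z]{3,}ing\b"),
    "future_will": re.compile(r"\b(?:will|shall|gonna)\b|\b\w+'ll\b"),
    "perfect_have": re.compile(r"\b(?:have|has|had)\s+[a-z]+(?:ed|en)\b"),
}


def tense_features(text: str) -> Dict[str, float]:
    """Return each cue's match count divided by the number of word tokens."""
    lowered = text.lower()
    n_tokens = max(1, len(re.findall(r"\b\w+\b", lowered)))
    return {name: len(pattern.findall(lowered)) / n_tokens
            for name, pattern in TENSE_PATTERNS.items()}


features = tense_features("I walked home. I will be writing tomorrow.")
```

Feature vectors like this one, computed per blog post, could then be fed to any standard classifier for age or gender. The choice of classifier here is left open because the source does not state it.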
Predicting Mild Cognitive Impairment (MCI) is currently a challenge, as existing diagnostic criteria rely on neuropsychological examinations. Automated Machine Learning (ML) models trained on the verbal utterances of MCI patients can aid diagnosis. Using a combination of skip-gram features, our model learned several linguistic biomarkers to distinguish between 19 patients with MCI and 19 healthy control individuals from the DementiaBank language transcript clinical dataset. Results show that a model using a compound of skip-grams achieves a better AUC and could help ML prediction on small MCI data samples.
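Skip-gram features of this kind can be sketched as follows. A k-skip-n-gram is a sequence of n tokens, in order, with at most k tokens skipped in total. The abstract does not define "compound of skip-grams", so this sketch assumes it means pooling counts across several (n, k) settings. The chosen settings and the example tokens are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Tuple


def skip_grams(tokens: Sequence[str], n: int = 2, k: int = 1) -> List[Tuple[str, ...]]:
    """All k-skip-n-grams: n tokens in order with at most k tokens skipped in total."""
    grams = []
    for i in range(len(tokens)):
        # The last chosen index may be at most i + (n - 1) + k.
        window = range(i + 1, min(len(tokens), i + n + k))
        for rest in combinations(window, n - 1):
            grams.append(tuple(tokens[j] for j in (i, *rest)))
    return grams


def compound_skip_gram_features(tokens: Sequence[str],
                                settings=((2, 0), (2, 1), (3, 1))) -> Counter:
    # Assumption: "compound" = pooled counts over several (n, k) settings,
    # keyed by setting so the same token tuple from different settings stays distinct.
    features: Counter = Counter()
    for n, k in settings:
        for gram in skip_grams(tokens, n, k):
            features[f"{n}-{k}:" + "_".join(gram)] += 1
    return features


tokens = ["the", "boy", "takes", "a", "cookie"]
features = compound_skip_gram_features(tokens)
```

Skips let the features capture word pairs separated by fillers or hesitations, which may matter for transcribed speech. This is offered as a plausible motivation, not as the study's stated rationale.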