While caGrid 1.0 is designed to address use cases in cancer research, the requirements associated with discovery, analysis and integration of large scale data, and coordinated studies are common in other biomedical fields. In this respect, caGrid 1.0 is the realization of a framework that can benefit the entire biomedical community.
Background: A unique archive of Big Data on Parkinson’s Disease is collected, managed and disseminated by the Parkinson’s Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson’s disease (PD) risk to trauma, genetics, environment, co-morbidities, or lifestyle. The defining characteristics of Big Data (large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources) all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data.
Methods and Findings: Collective representation of the multi-source data facilitates the aggregation and harmonization of complex data elements. This enables joint modeling of the complete data, leading to the development of Big Data analytics, predictive synthesis, and statistical validation. Using heterogeneous PPMI data, we developed a comprehensive protocol for end-to-end data characterization, manipulation, processing, cleaning, analysis and validation. Specifically, we (i) introduce methods for rebalancing imbalanced cohorts, (ii) utilize a wide spectrum of classification methods to generate consistent and powerful phenotypic predictions, and (iii) generate reproducible machine-learning based classification that enables the reporting of model parameters and diagnostic forecasting based on new data.
We evaluated several complementary model-based predictive approaches, which failed to generate accurate and reliable diagnostic predictions. However, the results of several machine-learning based classification methods indicated significant power to predict Parkinson’s disease in the PPMI subjects (consistent accuracy, sensitivity, and specificity exceeding 96%, confirmed using statistical n-fold cross-validation). Clinical (e.g., Unified Parkinson’s Disease Rating Scale (UPDRS) scores), demographic (e.g., age), genetics (e.g., rs34637584, chr12), and derived neuroimaging biomarker (e.g., cerebellum shape index) data all contributed to the predictive analytics and diagnostic forecasting.
Conclusions: Model-free Big Data machine learning-based classification methods (e.g., adaptive boosting, support vector machines) can outperform model-based techniques in terms of predictive precision and reliability (e.g., forecasting patient diagnosis). We observed that statistical rebalancing of cohort sizes yields better discrimination of group differences, specifically for predictive analytics based on heter...
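The abstract above mentions rebalancing imbalanced cohorts and reporting accuracy, sensitivity and specificity. As a minimal sketch of those two ideas, the Python below uses plain random oversampling, which is a stand-in technique and not necessarily the rebalancing method the authors used. The function names `oversample_indices` and `diagnostic_metrics`, the toy cohort, and the toy predictions are all hypothetical.

```python
import random
from collections import Counter


def oversample_indices(labels, seed=0):
    """Return row indices in which every class is as frequent as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    target = max(len(rows) for rows in by_class.values())
    balanced = []
    for rows in by_class.values():
        balanced.extend(rows)
        # Draw extra rows with replacement until this class reaches the target size.
        balanced.extend(rng.choice(rows) for _ in range(target - len(rows)))
    return balanced


def diagnostic_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity; assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical toy cohort: 8 patients (label 1) and 2 controls (label 0).
labels = [1] * 8 + [0] * 2
balanced = oversample_indices(labels)
print(Counter(labels[i] for i in balanced))

# Hypothetical predictions for five held-out subjects.
print(diagnostic_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

When rebalancing is combined with cross-validation, it is generally safer to oversample only the training portion of each fold, so that duplicated rows do not leak into the held-out subjects used for the reported metrics.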
To identify pan-ancestry and ancestry-specific loci associated with attempting suicide among veterans, we conducted a genome-wide association study (GWAS) of suicide attempts within a large, multi-ancestry cohort of U.S. veterans enrolled in the Million Veterans Program (MVP). Cases were defined as veterans with a documented history of suicide attempts in the electronic health record (EHR; N = 14,089) and controls were defined as veterans with no documented history of suicidal thoughts or behaviors in the EHR (N = 395,064). GWAS was performed separately in each ancestry group, controlling for sex, age and genetic substructure. Pan-ancestry risk loci were identified through meta-analysis and included two genome-wide significant loci on chromosomes 20 (p = 3.64×10⁻⁹) and 1 (p = 3.69×10⁻⁸). A strong pan-ancestry signal at the Dopamine Receptor D2 locus (p = 1.77×10⁻⁷) was also identified and subsequently replicated in a large, independent international civilian cohort (p = 7.97×10⁻⁴). Additionally, ancestry-specific genome-wide significant loci were detected in African-Americans, European-Americans, Asian-Americans, and Hispanic-Americans. Pathway analyses suggested overrepresentation of many biological pathways with high clinical significance, including oxytocin signaling, glutamatergic synapse, cortisol synthesis and secretion, dopaminergic synapse, and circadian rhythm. These findings confirm that the genetic architecture underlying suicide attempt risk is complex and includes both pan-ancestry and ancestry-specific risk loci. Moreover, pathway analyses suggested many commonly impacted biological pathways that could inform development of improved therapeutics for suicide prevention.
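The abstract calls the chromosome 20 and chromosome 1 loci genome-wide significant, but describes the Dopamine Receptor D2 locus only as a strong signal. The short Python sketch below shows why, assuming the conventional GWAS threshold of p < 5×10⁻⁸. The abstract does not state this threshold, and the dictionary keys are illustrative labels rather than identifiers from the study.

```python
# Assumed conventional genome-wide significance threshold; not stated in the abstract.
GENOME_WIDE_ALPHA = 5e-8

# Pan-ancestry p-values quoted in the abstract; the keys are illustrative labels.
pan_ancestry_signals = {
    "chromosome 20 locus": 3.64e-9,
    "chromosome 1 locus": 3.69e-8,
    "Dopamine Receptor D2 locus": 1.77e-7,
}


def is_genome_wide_significant(p_value, alpha=GENOME_WIDE_ALPHA):
    """True when the p-value falls below the assumed genome-wide threshold."""
    return p_value < alpha


for name, p_value in pan_ancestry_signals.items():
    status = "genome-wide significant" if is_genome_wide_significant(p_value) else "below threshold"
    print(f"{name}: p = {p_value:.2e} ({status})")
```

Under that assumed threshold, the first two loci pass and the Dopamine Receptor D2 signal does not, which matches the abstract's wording and helps explain why its replication in an independent cohort is reported separately.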