This work represents the first attempt to provide an overview of how to approach data integration as the result of a dialogue between neuroscientists and computer scientists. Data integration is fundamental for studying complex multifactorial diseases, such as neurodegenerative diseases. This work aims to warn readers of common pitfalls and critical issues in both the medical and data science fields. In this context, we define a road map for data scientists who are approaching data integration in the biomedical domain for the first time, highlighting the challenges that inevitably emerge when dealing with heterogeneous, large-scale and noisy data, and proposing possible solutions. Here, we discuss data collection and statistical analysis, usually seen as parallel and independent processes, as cross-disciplinary activities. Finally, we provide an exemplary application of data integration to Alzheimer's Disease (AD), which is the most common multifactorial form of dementia worldwide. We critically discuss the largest and most widely used datasets in AD, and demonstrate how the emergence of machine learning and deep learning methods has had a significant impact on knowledge of the disease, particularly with a view to early AD diagnosis.
Clinical decision support systems based on machine-learning algorithms are widely applied in the diagnosis of neurodegenerative diseases (NDDs). While recent models yield robust classifications in supervised two-class problems, accurately separating Parkinson's disease (PD) from healthy control (HC) subjects, few works have looked at prodromal stages of NDDs. Idiopathic Rapid-eye Movement (REM) sleep behavior disorder (iRBD) is considered a prodromal stage of PD with a high chance of phenoconversion, but its heterogeneous symptoms hinder accurate disease prediction. Machine learning (ML) based methods can be used to develop personalized trajectory models, but these require large numbers of observational points with homogeneous features, which significantly reduces the possible imaging modalities to non-invasive and cost-effective techniques such as high-density electrophysiology (hdEEG). In this work, we aimed to quantify the increase in accuracy and robustness of the classification model when network-based metrics are included, compared to the classical Fourier-based power spectral density (PSD). We performed a series of analyses to quantify significance in cohort-wise metrics, the performance of classification tasks, and the effect of feature selection on model accuracy. We report that amplitude correlation spectral profiles show the largest difference between iRBD and HC subjects, mainly in the delta and theta bands. Moreover, the inclusion of amplitude correlation and phase synchronization improves the classification performance by up to 11% compared to using PSD alone. Our results show that hdEEG features alone can be used as potential biomarkers in classification problems using iRBD data and that large-scale network metrics improve the performance of the model.
This evidence suggests that large-scale brain network metrics should be considered important tools for investigating prodromal stages of NDDs, as they yield more information without harming the patient, allowing for constant and frequent longitudinal evaluation of patients at high risk of phenoconversion.
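As a rough illustration of the Fourier-based PSD baseline mentioned above, the following sketch computes band-limited power for a synthetic signal. It is not the authors' pipeline: the sampling rate, the band edges, the test signal and the function names `periodogram` and `band_power` are all assumptions chosen for the example, and a naive DFT stands in for whatever spectral estimator the study used.

```python
import cmath
import math

def periodogram(signal, fs):
    """Power at non-negative frequencies via a naive DFT (O(n^2), demo only)."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / (fs * n))
    return freqs, power

def band_power(freqs, power, lo, hi):
    """Sum the spectrum over the half-open band [lo, hi) in Hz."""
    return sum(p for f, p in zip(freqs, power) if lo <= f < hi)

# Assumed values for illustration: 64 Hz sampling, 2 s of data.
FS = 64
N = 128
# Synthetic signal: a strong 6 Hz (theta) component plus a weak 20 Hz one.
signal = [math.sin(2 * math.pi * 6 * t / FS)
          + 0.2 * math.sin(2 * math.pi * 20 * t / FS)
          for t in range(N)]

# Conventional band edges, assumed here; the study's exact bands may differ.
BANDS = {"delta": (1, 4), "theta": (4, 8), "beta": (13, 30)}

freqs, power = periodogram(signal, FS)
bands = {name: band_power(freqs, power, lo, hi)
         for name, (lo, hi) in BANDS.items()}

for name, value in bands.items():
    print(f"{name}: {value:.4f}")
```

In a classification setting, one such band-power value per channel and band would form the PSD feature vector, to which network-based metrics such as amplitude correlation could then be appended.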
Background: Drug-resistant focal epilepsy, defined by failure of two antiepileptic drugs, affects about 30% of patients with epilepsy. Epilepsy surgery may represent an alternative option for this population. However, defining the epileptogenic zone to be surgically removed requires highly specialised medical expertise as well as advanced technologies. The aim of this work is to build a cost-effective support system based on text, in particular on the semiological descriptions of the seizures, in order to predict the localization of seizure origin (temporal vs extratemporal lobe; right vs left hemisphere). Methods: From a population of 121 surgically treated, seizure-free patients with drug-resistant focal epilepsy, recruited at the Niguarda Hospital in Milan, we extracted a total of 509 seizure descriptions. After a data pre-processing phase, we used natural language processing tools to build numerical representations of the seizure descriptions, using both embedding and count-based methods. We then used machine learning models to perform binary classification into right/left and temporal/extra-temporal. Results: All predictive models perform better with representations based on embedding models than with count-based ones. Among all the combinations of representations and classifiers, the best performance obtained in terms of F1-score is 84.7% ± 0.6. Discussion: This preliminary work reached encouraging results on both localization tasks. The main advantage is that no specific knowledge about epilepsy is used to build the models, which makes our pipeline applicable in other scenarios as well. The major limitation is that the text is highly specific to the writer.
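To make the count-based route concrete, here is a minimal sketch of a bag-of-words representation, a nearest-centroid classifier and an F1-score computation. Everything in it is an assumption made for illustration: the toy descriptions are invented and do not come from the study's data, the nearest-centroid rule is a stand-in for the unspecified machine learning models, and all function names are hypothetical.

```python
from collections import Counter
import math

def count_vector(text, vocabulary):
    """Bag-of-words counts of `text` over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented toy descriptions, NOT taken from the study's dataset.
train = [
    ("staring lip smacking aura", "temporal"),
    ("aura staring chewing", "temporal"),
    ("sudden limb jerking posturing", "extratemporal"),
    ("limb posturing sudden onset", "extratemporal"),
]
test = [
    ("aura then lip smacking", "temporal"),
    ("sudden jerking of limb", "extratemporal"),
]

# Vocabulary comes from the training texts only; unseen words are ignored.
vocabulary = sorted({word for text, _ in train for word in text.split()})

# One centroid per class in count-vector space.
centroids = {
    label: centroid([count_vector(t, vocabulary) for t, l in train if l == label])
    for label in sorted({l for _, l in train})
}

def predict(text):
    """Assign the class whose centroid is most similar to the text."""
    vec = count_vector(text, vocabulary)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

y_true = [label for _, label in test]
y_pred = [predict(text) for text, _ in test]
print(y_pred, f1_score(y_true, y_pred, positive="temporal"))
```

An embedding-based variant would replace `count_vector` with vectors from a pretrained language model while leaving the classifier and the scoring unchanged, which is what makes the two kinds of representation directly comparable.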