Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., in multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting-state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.
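The supervised and unsupervised settings described above can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the data are synthetic random numbers standing in for masked brain images, and the array sizes, the injected effect, and the choice of logistic regression and k-means are ours, not taken from the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for masked brain images (assumption): 80 scans x 500 voxels.
n_samples, n_voxels = 80, 500
labels = np.repeat([0, 1], n_samples // 2)  # two hypothetical conditions
X = rng.normal(size=(n_samples, n_voxels))
X[labels == 1, :20] += 1.0  # inject a condition effect into 20 voxels

# Supervised learning (decoding): predict the condition from the voxel pattern,
# scored by 5-fold cross-validation.
decoder = LogisticRegression(max_iter=1000)
decoding_scores = cross_val_score(decoder, X, labels, cv=5)
print("mean decoding accuracy:", decoding_scores.mean())

# Unsupervised learning: group the scans without using the labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)
print("cluster sizes:", np.bincount(cluster_ids))
```

Both estimators expose the same fit-based interface, so swapping in another classifier or clustering method usually means changing a single line.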
Machine learning is a pervasive development at the intersection of statistics and computer science. While it can benefit many data-related applications, the technical nature of the research literature and the corresponding algorithms slows down its adoption. Scikit-learn is an open-source software project that aims to make machine learning accessible to all, whether in academia or in industry. It benefits from the general-purpose Python language, which is both broadly adopted in the scientific world and supported by a thriving ecosystem of contributors. Here we give a quick introduction to scikit-learn as well as to machine-learning basics.
Background: In the context of sensory and cognitive-processing deficits in ADHD patients, there is considerable evidence of altered event-related potentials (ERPs). Most of the studies, however, were conducted on children with ADHD. Using the independent component analysis (ICA) method, ERPs can be decomposed into functionally different components. Using a support vector machine as the classification method, this study investigated whether features of independent ERP components can be used to discriminate ADHD adults from healthy subjects. Methods: Two groups of age- and sex-matched adults (74 ADHD, 74 controls) performed a visual two-stimulus GO/NOGO task. ERP responses were decomposed into independent components by means of ICA. A feature selection algorithm defined a set of independent component features, which was entered into a support vector machine. Results: The feature set consisted of five latency measures in specific time windows, which were collected from four different independent components. The independent components involved were a novelty component, a sensory-related component, and two executive-function-related components. Using a 10-fold cross-validation approach, classification accuracy was 92%. Conclusions: This study was a first attempt to classify ADHD adults by means of a support vector machine, and it indicates that classification by means of non-linear methods is feasible in the context of clinical groups. Further, independent ERP components have been shown to provide features that can be used for characterizing clinical populations.
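The classification stage described in the Methods (feature selection, then a support vector machine, scored by 10-fold cross-validation) might look roughly like the sketch below in scikit-learn. The abstract does not name the feature selection algorithm or the kernel, so the univariate `SelectKBest` selector and the RBF kernel are assumptions. The data are synthetic, with 40 hypothetical candidate features per subject, and the ICA decomposition step is omitted. The accuracy this prints says nothing about the study's reported result.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for ERP component features (assumption):
# 74 + 74 subjects, 40 hypothetical candidate features each.
n_per_group, n_features = 74, 40
y = np.repeat([0, 1], n_per_group)  # 0 = control, 1 = ADHD (labels are illustrative)
X = rng.normal(size=(2 * n_per_group, n_features))
X[y == 1, :5] += 0.8  # inject a group difference into 5 features

# Feature selection sits inside the pipeline, so it is refit on the training
# part of every fold and never sees the held-out subjects.
pipeline = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=5),  # assumed selector; keeps 5 features
    SVC(kernel="rbf"),            # assumed non-linear kernel
)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv)
print(f"mean 10-fold accuracy: {scores.mean():.2f}")
```

Keeping the selector inside the pipeline matters: selecting features on the full dataset before cross-validation would leak information from the test folds and inflate the accuracy estimate.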