We present a novel approach to multitask learning in classification problems based on Gaussian process (GP) classification. The method extends previous work on multitask GP regression, constraining the overall covariance (across tasks and data points) to factorize as a Kronecker product. Fully Bayesian inference is possible but time-consuming using sampling techniques. We propose approximations based on the popular variational Bayes and expectation propagation frameworks, showing that both achieve excellent accuracy when compared to Gibbs sampling, in a fraction of the time. We present results on a toy dataset and two real datasets, showing improved performance against the baseline results obtained by learning each task independently. We also compare with a recently proposed state-of-the-art approach based on support vector machines, obtaining comparable or better results.
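The Kronecker-factorized covariance mentioned above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the squared-exponential input kernel, the function names `rbf_kernel` and `multitask_covariance`, and the example task-similarity matrix are all assumptions made for illustration, and the sketch assumes every task is observed at the same input points.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0):
    """Squared-exponential covariance between all pairs of rows of X (assumed kernel)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def multitask_covariance(K_tasks, X, lengthscale=1.0):
    """Covariance over all (task, point) pairs as a Kronecker product.

    K_tasks is a (T, T) positive semidefinite task-similarity matrix and
    X is an (N, D) array of inputs shared by all tasks. The result is a
    (T*N, T*N) matrix whose (t, t') block equals K_tasks[t, t'] * K_x.
    """
    K_x = rbf_kernel(X, lengthscale)
    return np.kron(K_tasks, K_x)

# Hypothetical example: two correlated tasks observed at four 2-D inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))
K_tasks = np.array([[1.0, 0.8],
                    [0.8, 1.0]])
K = multitask_covariance(K_tasks, X)
```

Setting the off-diagonal entries of `K_tasks` to zero makes the cross-task blocks vanish, which corresponds to the independent-task baseline the abstract compares against.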
We present a probabilistic framework for transferring learning across tasks and between labeled and unlabeled data. The approach is based on Gaussian process (GP) prediction and incorporates both the geometry of the data and the similarity between tasks within a GP covariance, allowing Bayesian prediction in a natural way. We discuss the transfer of learning in a multitask scenario in the two cases where the underlying geometry is assumed to be the same across tasks and where different tasks are assumed to have independent geometric structures. We demonstrate the method on a number of real datasets, indicating that the semisupervised multitask approach can result in very significant improvements in performance when very few labeled training examples are available.
Many properties of a compound depend directly on the dissociation constants of its acidic and basic groups. Significant effort has been invested in computational models to predict these constants. For linear regression models, compounds are often divided into chemically motivated classes, with a separate model for each class. However, sometimes too few measurements are available for a class to build a reasonable model, e.g., when investigating a new compound series. If data for related classes are available, we show that multitask learning can be used to improve predictions by utilizing data from these other classes. We investigate the performance of linear Gaussian process regression models (single-task, pooling, and multitask models) in the low sample size regime, using a published data set (n = 698, mostly monoprotic, in aqueous solution) divided beforehand into 15 classes. A multitask regression model using the intrinsic model of co-regionalization and incomplete Cholesky decomposition performed best in 85% of all experiments. The presented approach can be applied to estimate other molecular properties where few measurements are available.
We propose a novel approach to the automated discrimination of normal and ventricular arrhythmic beats. The method employs Gaussian Processes, a non-parametric Bayesian technique which is equivalent to a neural network with infinitely many hidden nodes. The method is shown to perform competitively with other approaches on the MIT-BIH Arrhythmia Database. Furthermore, its probabilistic nature allows confidence levels to be obtained on the predictions, which can be very useful to practitioners.

Introduction

Cardiac arrhythmias are one of the major causes of morbidity and mortality in the Western world. Their early diagnosis is often reliant on an analysis of electrocardiogram (ECG) traces, generally involving time-consuming manual annotation by expert physicians. Because of this, several automated methods to detect arrhythmic beats have been proposed, often achieving very good levels of performance [1][2][3].

We present a novel approach for the automatic classification of arrhythmic versus normal beats from ECG signals based on recent developments in Machine Learning. We use the framework of Gaussian Process (GP) classification [4], a non-parametric Bayesian technique which has been shown to be highly accurate on non-linear classification tasks while controlling complexity and avoiding the pitfalls of overfitting. GPs are a natural way to define probability distributions over spaces of functions; they can be viewed as a generalization of Neural Networks where the number of hidden nodes (basis functions) tends to infinity [5]. A key feature of GPs is their probabilistic nature, which means that predictions are always accompanied by an estimate of the associated uncertainty.
This is a key advantage over standard non-linear classifiers such as neural networks, which generally can only provide a hard assignment.

The method uses as input the spectral or wavelet transform of segmented individual beats from a recording, which requires much less manual annotation than methods based on interval estimation. We use an Automatic Relevance Determination (ARD) kernel for the classifier to automatically reduce dimensionality and extract the most discriminant features by optimising weights. We test the model on the MIT-BIH arrhythmia data set on the two-class problem of discriminating normal and premature ventricular contraction (PVC) beats. The results we report show that the method is competitive with the state of the art, obtaining predictive accuracies on test data which are frequently above 90%. This can be further increased by thresholding over posterior probabilities and retaining only predictions with high confidence; the model consistently has a higher accuracy for predictions made with higher posterior probability, indicating that the discriminant obtained from the training data mirrors the structure of the whole data set.

The rest of the paper is organised as follows: in the first section, we briefly review Gaussian Process classification. In the second section, we discuss the beat segmentation algorithm and the feature selection...
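Two ideas from the introduction above, the ARD kernel and thresholding over posterior probabilities, can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the excerpt does not give the exact kernel form, so a squared-exponential kernel with one non-negative weight per input feature is assumed, and the function names `ard_kernel` and `confident_predictions` as well as the 0.9 threshold are hypothetical.

```python
import numpy as np

def ard_kernel(X1, X2, weights, variance=1.0):
    """Assumed ARD squared-exponential kernel.

    Each input feature d has its own non-negative weight. A weight near
    zero means the feature barely affects the covariance, which is how
    ARD effectively switches off uninformative dimensions.
    """
    diff = X1[:, None, :] - X2[None, :, :]
    weighted_sq = np.sum(weights * diff**2, axis=-1)
    return variance * np.exp(-0.5 * weighted_sq)

def confident_predictions(probs, threshold=0.9):
    """Hard labels plus a mask of predictions whose confidence meets the threshold.

    probs holds the posterior probability of the positive class for each
    beat; confidence is the probability of the predicted class.
    """
    probs = np.asarray(probs, dtype=float)
    labels = (probs >= 0.5).astype(int)
    confidence = np.maximum(probs, 1.0 - probs)
    return labels, confidence >= threshold

# Hypothetical posterior probabilities for four beats.
labels, keep = confident_predictions([0.95, 0.6, 0.05, 0.4])
```

In this sketch only the first and third beats would be retained, since their predicted class has posterior probability of at least 0.9; the other two would be left for manual review.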