Abstract. We address the problem of building a clustering as a subset of a (possibly large) set of candidate clusters under user-defined constraints. In contrast to most approaches to constrained clustering, we do not constrain the way observations can be grouped into clusters, but the way candidate clusters can be combined into suitable clusterings. The constraints may concern the type of clustering (e.g., complete clusterings, overlapping or encompassing clusters) and the composition of clusterings (e.g., certain clusters excluding others). In this paper, we show that these constraints can be translated into integer linear programs, which can be solved by standard optimization packages. Our experiments with benchmark and real-world data investigate the quality of the clusterings and the running times depending on a variety of parameters.
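To illustrate how constraints of this kind might be written as an integer linear program, the following is one possible formulation, not necessarily the one used in the paper. It assumes candidate clusters C_1, ..., C_m over observations o_1, ..., o_n, a hypothetical quality score q_j for each candidate, and a binary variable x_j that is 1 when C_j is selected. All of these symbols are assumptions introduced for the sketch.

```latex
\begin{align*}
\max_{x \in \{0,1\}^m} \quad & \sum_{j=1}^{m} q_j x_j \\
\text{s.t.} \quad
& \sum_{j \,:\, o_i \in C_j} x_j = 1
  && \text{for all } i = 1, \dots, n
  && \text{(complete, non-overlapping clustering)} \\
& x_j + x_k \le 1
  && \text{for all excluded pairs } (j, k)
  && \text{(selecting } C_j \text{ excludes } C_k\text{)}
\end{align*}
```

Under these assumptions, replacing the equality by "at least 1" would permit overlapping clusters, and replacing it by "at most 1" would permit clusterings that leave some observations uncovered.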
Abstract. We propose a new approach to test selection based on the discovery of subgroups of patients sharing the same optimal test, and present its application to breast cancer diagnosis. Subgroups are defined in terms of background information about the patient. We automatically determine the best t subgroups a patient belongs to, and choose the test proposed by the majority of them. We introduce the concept of prediction quality to measure how accurately the test outcome reflects the disease status. The quality of a subgroup is then the best mean prediction quality of its members (choosing the same test for all). Incorporating the quality computation in the search heuristic enables a significant reduction of the search space. In experiments on breast cancer diagnosis data, we show that our approach is faster than the baseline algorithm APRIORI-SD while preserving its accuracy.
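The two rules described above (subgroup quality as the best mean prediction quality over a shared test, and a majority vote among the best t subgroups) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `pq` lookup table, and the test labels "A" and "B" are hypothetical, and the subgroup search itself is not shown.

```python
from collections import Counter
from statistics import mean


def subgroup_quality(members, tests, pq):
    """Return (best_test, quality) for one subgroup.

    pq maps (patient, test) to a prediction quality score (hypothetical
    data structure). The quality of the subgroup is the highest mean
    prediction quality achievable when all members take the same test.
    """
    best_test = max(tests, key=lambda t: mean(pq[(m, t)] for m in members))
    return best_test, mean(pq[(m, best_test)] for m in members)


def select_test(covering_subgroups, t):
    """Pick a test for a patient by majority vote.

    covering_subgroups is a list of (quality, proposed_test) pairs for the
    subgroups the patient belongs to. Only the t best subgroups vote.
    """
    top = sorted(covering_subgroups, key=lambda s: s[0], reverse=True)[:t]
    votes = Counter(test for _, test in top)
    return votes.most_common(1)[0][0]


# Hypothetical prediction qualities for two patients and two tests.
pq = {("p1", "A"): 1.0, ("p1", "B"): 0.5,
      ("p2", "A"): 0.5, ("p2", "B"): 0.75}
best_test, quality = subgroup_quality(["p1", "p2"], ["A", "B"], pq)

# Hypothetical subgroups covering one patient: (quality, proposed test).
covering = [(0.9, "B"), (0.8, "A"), (0.7, "A"), (0.1, "B")]
chosen = select_test(covering, t=3)
```

How ties in the vote should be broken is not stated in the abstract; the sketch leaves that to the order of the sorted subgroups.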
One of the goals of medical research in the area of dementia is to correlate images of the brain with clinical tests. Our approach is to start with the images and explain the differences and commonalities in terms of the other variables. First, we cluster positron emission tomography (PET) scans of patients to form groups sharing similar features in brain metabolism. To the best of our knowledge, this is the first time that clustering has been applied to whole PET scans. Second, we explain the clusters by relating them to non-image variables. To do so, we employ RSD, an algorithm for relational subgroup discovery, with the cluster membership of patients as the target variable. Our results enable interesting interpretations of differences in brain metabolism in terms of demographic and clinical variables. The approach was implemented and tested on an exceptionally large data collection of patients with different types of dementia. It comprises 10 GB of image data from 454 PET scans, and 42 variables from psychological and demographic data organized in 11 relations of a relational database. We believe that explaining medical images in terms of other variables (patient records, demographic information, etc.) is a challenging, new, and rewarding area for data mining research.
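To make the second step concrete, the sketch below scores a candidate subgroup description against cluster membership using weighted relative accuracy (WRAcc), a measure widely used in subgroup discovery. Whether RSD was configured with exactly this measure in the study is an assumption, and the `age` attribute, the threshold, and the tiny data set are hypothetical.

```python
def wracc(rows, condition, target_cluster):
    """Weighted relative accuracy of a subgroup description.

    rows is a list of (attributes, cluster_label) pairs, condition is a
    predicate over the attributes, and target_cluster is the cluster the
    description is meant to explain.
    WRAcc = P(cond) * (P(target | cond) - P(target)).
    """
    n = len(rows)
    covered = [label for attrs, label in rows if condition(attrs)]
    if n == 0 or not covered:
        return 0.0
    p_cond = len(covered) / n
    p_target = sum(1 for _, label in rows if label == target_cluster) / n
    p_target_given_cond = (
        sum(1 for label in covered if label == target_cluster) / len(covered)
    )
    return p_cond * (p_target_given_cond - p_target)


# Hypothetical patients: non-image attributes plus the PET cluster label.
patients = [
    ({"age": 70}, 0),
    ({"age": 75}, 0),
    ({"age": 50}, 1),
    ({"age": 55}, 1),
]
score = wracc(patients, lambda a: a["age"] >= 65, target_cluster=0)
```

A positive score means the description covers the target cluster more often than chance would suggest; a score of zero means it carries no information about cluster membership.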
Spontaneous electroencephalogram (EEG) and auditory evoked potentials (AEP) have been suggested for monitoring the level of consciousness during anesthesia. As the two signals reflect different neuronal pathways, a combination of parameters from both may provide broader information about the brain status during anesthesia. Appropriate parameter selection and combination into a single index is crucial to take advantage of this potential. The field of machine learning offers algorithms for both parameter selection and combination. In this study, several established machine learning approaches, including a method for the selection of suitable signal parameters and classification algorithms, are applied to construct an index that predicts responsiveness in anesthetized patients. The present analysis considers several classification algorithms, among them support vector machines, artificial neural networks, and Bayesian learning algorithms. Using a cross-validation approach on data from the transition between consciousness and unconsciousness, a combination of EEG and AEP signal parameters developed with automated methods provides a maximum prediction probability of 0.935, which is higher than 0.916 (for EEG parameters) and 0.880 (for AEP parameters). This suggests that machine learning techniques can successfully be applied to develop an improved combined EEG and AEP parameter to separate consciousness from unconsciousness.
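The abstract does not define prediction probability. Assuming it refers to the pairwise measure often written P_K in anesthesia-depth research, which for a two-level outcome coincides with the area under the ROC curve, it could be computed as in the sketch below. The function name and the example values are hypothetical, and the sketch assumes that higher index values indicate responsiveness.

```python
def prediction_probability(index_values, responsive):
    """Pairwise prediction probability for a binary outcome.

    Considers every pair of observations with different outcomes and counts
    how often the responsive observation has the higher index value.
    Ties in the index count as half a correct prediction.
    """
    concordant = discordant = tied = 0
    n = len(index_values)
    for i in range(n):
        for j in range(i + 1, n):
            if responsive[i] == responsive[j]:
                continue  # pairs with the same outcome carry no information
            hi, lo = (i, j) if responsive[i] else (j, i)
            if index_values[hi] > index_values[lo]:
                concordant += 1
            elif index_values[hi] < index_values[lo]:
                discordant += 1
            else:
                tied += 1
    total = concordant + discordant + tied
    if total == 0:
        raise ValueError("need at least one pair with different outcomes")
    return (concordant + 0.5 * tied) / total


# Hypothetical index values and responsiveness labels.
pk = prediction_probability([10, 30, 20, 40], [False, False, True, True])
```

A value of 1.0 means the index orders every responsive and unresponsive pair correctly, and 0.5 means it does no better than chance.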