In this paper we discuss the challenge of equitably combining continuous (quantitative) and categorical (qualitative) variables for the purpose of cluster analysis. Existing techniques require strong parametric assumptions, or difficult-to-specify tuning parameters. We describe the kamila package, which includes a weighted k-means approach to clustering mixed-type data, a method for estimating weights for mixed-type data (Modha-Spangler weighting), and an additional semiparametric method recently proposed in the literature (KAMILA). We include a discussion of strategies for estimating the number of clusters in the data, and describe the implementation of one such method in the current R package. Background and usage of these clustering methods are presented. We then show how the KAMILA algorithm can be adapted to a map-reduce framework, and implement the resulting algorithm using Hadoop for clustering very large mixed-type data sets.
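The weighted k-means idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the kamila package's implementation or API. It assumes one simple scheme: z-score the continuous variables, dummy-code the categorical ones, scale the two blocks by `w` and `1 - w`, and run ordinary k-means on the result. The function names (`encode`, `kmeans`), the weight `w`, and the toy data are all hypothetical.

```python
import math

def zscore(column):
    """Standardise one continuous column to mean 0 and unit standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / n) or 1.0
    return [(x - mean) / sd for x in column]

def encode(continuous, categorical, w):
    """One feature vector per row: w * z-scores followed by (1 - w) * 0/1 dummies."""
    cont_cols = [zscore(list(col)) for col in zip(*continuous)]
    levels = [sorted(set(col)) for col in zip(*categorical)]
    rows = []
    for i, cats in enumerate(categorical):
        vec = [w * col[i] for col in cont_cols]
        for value, lev in zip(cats, levels):
            vec.extend((1.0 - w) * (1.0 if value == l else 0.0) for l in lev)
        rows.append(vec)
    return rows

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, n_iter=100):
    """Plain Lloyd iterations with a deterministic farthest-first start."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centers[j])) for r in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy data: two continuous variables and one categorical variable.
continuous = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [5.0, 6.0], [5.1, 6.2], [4.9, 5.8]]
categorical = [["a"], ["a"], ["a"], ["b"], ["b"], ["b"]]
labels = kmeans(encode(continuous, categorical, w=0.5), k=2)
```

The choice of `w` is exactly the hard-to-specify tuning parameter the abstract refers to: values near 1 let the continuous variables dominate the distance, values near 0 let the categorical variables dominate.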
Leukodystrophies (LD) and lysosomal storage disorders (LSD) have generated increased interest recently as targets for newborn screening programs. Accurate epidemiological benchmarks are needed in the U.S. Age-specific mortality rates were estimated for Krabbe disease (KD) and nine related disorders. U.S. mortality records with the E75.2 cause of death code during 1999-2004 were collected from 11 open record states. All E75.2 deaths in the United States were apportioned among specific disease types based on the proportions observed in these states. Yearly population sizes were obtained from the CDC and averaged. Mortality rates (per million individuals per year) by age group for the specific diseases were (for <5 or ≥5 years): Pelizaeus-Merzbacher (0.037/0.033); sudanophilic leukodystrophy (SLD) (0.037/0.004); Canavan (0.037/0.011); Alexander (0.147/0.022); Krabbe (0.994/0.007); metachromatic leukodystrophy (0.331/0.135); Fabry (0.000/0.124); Gaucher (0.221/0.073); Niemann-Pick (NP) (0.442/0.088); multiple sulfatase (0.000/0.004). This is the first report of mortality rates for the LD/LSD diseases in the U.S. The approximate birth prevalence of the early infantile Krabbe phenotype (onset 0-6 months), derived from the <5 year old mortality rate, was one early infantile case per 244,000 births, which matches the 1 in 250,000 observed in the NYS newborn screening program as of 2011. Note, however, that the NYS calculation refers only to the early infantile phenotype and does not include the majority of babies identified in the program with low GALC and two mutations who have remained clinically normal. It is presumed that most, if not all, will develop later onset forms of the disease, but this is by no means certain.
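The rate calculation described in this abstract can be sketched as simple arithmetic, assuming the steps are: take the disease's share of E75.2 deaths in the open record states, apply that share to the national E75.2 death count, and divide by person-years. The function name and every input number below are hypothetical and chosen only to make the arithmetic easy to follow; they are not values from the study.

```python
def mortality_rate_per_million(state_disease_deaths, state_e752_deaths,
                               national_e752_deaths, mean_population, years):
    """Deaths per million individuals per year for one disease and age group."""
    # Share of E75.2 deaths attributed to the disease in the open record states.
    disease_share = state_disease_deaths / state_e752_deaths
    # Apply that share to all E75.2 deaths nationally.
    estimated_deaths = national_e752_deaths * disease_share
    # Person-years at risk: averaged yearly population times number of years.
    person_years = mean_population * years
    return estimated_deaths / person_years * 1_000_000

# Hypothetical inputs (not study data): a 6-year window such as 1999-2004.
rate = mortality_rate_per_million(
    state_disease_deaths=5,
    state_e752_deaths=20,
    national_e752_deaths=300,
    mean_population=20_000_000,
    years=6,
)
```

With these made-up inputs the estimated national deaths are 300 × 0.25 = 75 over 120 million person-years, giving 0.625 deaths per million per year.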
In spite of the abundance of clustering techniques and algorithms, clustering mixed interval (continuous) and categorical (nominal and/or ordinal) scale data remains a challenging problem. To identify the most effective approaches for clustering mixed-type data, we use both theoretical and empirical analyses to present a critical review of the strengths and weaknesses of the methods identified in the literature. Guidelines on approaches to use under different scenarios are provided, along with potential directions for future research.
Millennia ago Pythagoras noted a simple but remarkably powerful rule for the aesthetics of tone combinations: pairs of tones--intervals--with simple ratios such as an octave (ratio 2 : 1) or a fifth (ratio 3 : 2) were pleasant sounding (consonant), whereas intervals with complex ratios such as the major seventh (ratio 243 : 128) were harsh (dissonant). These Pythagorean ratio rules are the building blocks of Western classical music; however, their neurophysiologic basis is not known. Using functional MRI we have found the neurophysiologic correlates of the ratio rules. In musicians, the inferior frontal gyrus, superior temporal gyrus, medial frontal gyrus, inferior parietal lobule and anterior cingulate respond with progressively more activation to perfect consonances, imperfect consonances and dissonances. In nonmusicians only the right inferior frontal gyrus follows this pattern.
This review addresses difficulties arising in estimating epidemiological parameters of leukodystrophies and lysosomal storage disorders, with special focus on Krabbe disease. Although multiple epidemiological studies of Krabbe disease have been published, these studies are difficult to reconcile since they have used different study populations and varying methods of calculation. Confusion exists regarding which epidemiological parameters have been estimated; the current review shows that most previous estimates can be properly interpreted as lifetime risk at birth. One of the most common estimation methods is shown to be inaccurate, while two other methods are shown to be approximately accurate. Based on the results of the current paper, recommendations are made that are expected to improve the quality of future studies of Krabbe disease. It is anticipated that these recommendations will be applicable to epidemiological studies of other lysosomal storage disorders, as well as any other rare diseases diagnosed with enzymatic screening.