Diabetes mellitus is a chronic disease characterized by hyperglycemia, and it can cause many complications. Given the growing morbidity in recent years, the number of diabetic patients worldwide is projected to reach 642 million by 2040, which means that one in ten adults will have diabetes. This alarming figure demands attention. With its rapid development, machine learning has been applied to many aspects of medical health. In this study, we used decision tree, random forest, and neural network models to predict diabetes mellitus. The dataset consists of hospital physical examination data from Luzhou, China, and contains 14 attributes. Five-fold cross-validation was used to evaluate the models. To verify the general applicability of the methods, we selected the better-performing methods for independent test experiments. We randomly selected data from 68994 healthy people and diabetic patients, respectively, as the training set. Because the data were imbalanced, we repeated the random extraction five times, and the reported result is the average of these five experiments. We used principal component analysis (PCA) and minimum redundancy maximum relevance (mRMR) to reduce the dimensionality. The results showed that prediction with random forest reached the highest accuracy (ACC = 0.8084) when all the attributes were used.
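The evaluation protocol described above (five-fold cross-validation, repeated over five random balanced extractions and then averaged) could be sketched roughly as follows. This is only an illustration under assumptions: the abstract does not specify how the extraction was done, so the sketch assumes the larger class is undersampled to the size of the smaller one, and `evaluate`, `majority_ids`, and `minority_ids` are hypothetical names standing in for whatever model training and data handling the study actually used.

```python
import random

def k_fold_indices(n_samples, k, rng):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all indices
    for i in range(k):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, folds[i]

def balanced_subsample(majority_ids, minority_ids, rng):
    """Assumed balancing step: undersample the larger class to the smaller one's size."""
    return rng.sample(majority_ids, len(minority_ids)) + list(minority_ids)

def averaged_accuracy(evaluate, majority_ids, minority_ids, repeats=5, k=5, seed=0):
    """Average the accuracy over `repeats` random extractions, each scored by k-fold CV.

    `evaluate(train_ids, test_ids)` is a hypothetical callback that trains a
    model (e.g. a random forest) on train_ids and returns its accuracy on test_ids.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        ids = balanced_subsample(majority_ids, minority_ids, rng)
        for train, test in k_fold_indices(len(ids), k, rng):
            scores.append(evaluate([ids[j] for j in train], [ids[j] for j in test]))
    return sum(scores) / len(scores)
```

Passing the classifier in as a callback keeps the resampling logic independent of the model, so the same loop could score a decision tree, a random forest, or a neural network.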
In this paper, we consider the mixture of sparse linear regressions model. Let $\beta^{(1)}, \ldots, \beta^{(L)} \in \mathbb{C}^n$ be $L$ unknown sparse parameter vectors with a total of $K$ non-zero elements. Noisy linear measurements are obtained in the form $y_i = x_i^H \beta^{(\ell_i)} + w_i$, each of which is generated randomly from one of the sparse vectors with the label $\ell_i$ unknown. The goal is to estimate the parameter vectors efficiently with low sample and computational costs. This problem presents significant challenges, as one needs to simultaneously solve the demixing problem of recovering the labels $\ell_i$ and the estimation problem of recovering the sparse vectors $\beta^{(\ell)}$. Our solution to the problem leverages the connection between modern coding theory and statistical inference. We introduce a new algorithm, Mixed-Coloring, which samples the mixture strategically using query vectors $x_i$ constructed based on ideas from sparse graph codes. Our novel code design allows for both efficient demixing and parameter estimation. To find $K$ non-zero elements, it is clear that we need at least $\Theta(K)$ measurements, and thus the time complexity is at least $\Theta(K)$. In the noiseless setting, for a constant number of sparse parameter vectors, our algorithm achieves the order-optimal sample and time complexities of $\Theta(K)$. In the presence of Gaussian noise, for the problem with two parameter vectors (i.e., $L = 2$), we show that the Robust Mixed-Coloring algorithm achieves near-optimal $\Theta(K\,\mathrm{polylog}(n))$ sample and time complexities. When $K = O(n^{\alpha})$ for some constant $\alpha \in (0, 1)$ (i.e., $K$ is sublinear in $n$), we can achieve sample and time complexities both sublinear in the ambient dimension. In one of our experiments, to recover a mixture of two regressions with dimension $n = 500$ and sparsity $K = 50$, our algorithm is more than 300 times faster than the EM algorithm, with about one third of its sample cost.
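To make the measurement model above concrete, the following minimal sketch simulates a single mixed measurement: a hidden label picks one of the sparse vectors, and the observation is the inner product with the conjugated query vector plus optional Gaussian noise. It illustrates the model only, not the Mixed-Coloring algorithm. The function names and the uniform choice of label are assumptions, and the query vectors would be designed from sparse graph codes in the paper rather than drawn freely.

```python
import random

def sparse_vector(n, k, rng):
    """Return a length-n complex vector with k randomly placed non-zero entries."""
    beta = [0j] * n
    for idx in rng.sample(range(n), k):
        beta[idx] = complex(rng.gauss(0, 1), rng.gauss(0, 1))
    return beta

def mixed_measurement(x, betas, rng, noise_std=0.0):
    """Simulate one measurement x^H beta^(label) + w with a hidden label.

    The label is drawn uniformly here (an assumption) and returned only so a
    simulation can check itself; an estimator would never observe it.
    """
    label = rng.randrange(len(betas))
    inner = sum(xk.conjugate() * bk for xk, bk in zip(x, betas[label]))
    noise = complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
    return inner + noise, label

rng = random.Random(0)
betas = [sparse_vector(8, 2, rng), sparse_vector(8, 3, rng)]  # L = 2, K = 5
x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)]
y, hidden_label = mixed_measurement(x, betas, rng)
```

The difficulty the abstract describes is visible here: given only `x` and `y`, the estimator must work out which vector produced each measurement and what that vector is.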
We study the support recovery problem for compressed sensing, where the goal is to reconstruct the sparsity pattern of a high-dimensional $K$-sparse signal $x \in \mathbb{R}^N$, as well as the corresponding sparse coefficients, from low-dimensional linear measurements with and without noise. Our key contribution is a new compressed sensing framework through a new family of carefully designed sparse measurement matrices associated with minimal measurement costs and a low-complexity recovery algorithm. Specifically, the measurement matrix in our framework is designed based on the well-crafted sparsification through capacity-approaching sparse-graph codes, where the sparse coefficients can be recovered efficiently in a few iterations by performing simple error decoding over the observations. We formally connect this general recovery problem with sparse-graph decoding in packet communication systems, and analyze our framework in terms of the measurement cost, computational complexity and recovery performance. Specifically, we show that in the noiseless setting, our framework can recover any arbitrary $K$-sparse signal in $O(K)$ time using $2K$ measurements asymptotically with a vanishing error probability. In the noisy setting, when the sparse coefficients take values in a finite and quantized alphabet, our framework can achieve the same goal in time $O(K \log(N/K))$ using $O(K \log(N/K))$ measurements obtained from a measurement matrix with elements in $\{-1, 0, 1\}$. When the sparsity $K$ is sub-linear in the signal dimension, $K = O(N^{\delta})$ for some $0 < \delta < 1$, our results are order-optimal in terms of measurement costs and run-time, both of which are sub-linear in the signal dimension $N$. The sub-linear measurement cost and run-time can also be achieved with continuous-valued sparse coefficients, with a slight increment in the logarithmic factors.
More specifically, in the continuous alphabet setting, when $K = O(N^{\delta})$ and the magnitudes of all the sparse coefficients are bounded below by a positive constant, our algorithm can recover an arbitrarily large $(1-p)$-fraction of the support of the sparse signal using $O(K \log(N/K) \log\log(N/K))$ measurements and $O(K \log^{1+r}(N/K))$ run-time, where $r$ is an arbitrarily small constant. For each recovered sparse coefficient, we can achieve $O(\epsilon)$ error for an arbitrarily small constant $\epsilon$. In addition, if the magnitudes of all the sparse coefficients are upper bounded by $O(K^{c})$ for some constant $c < 1$, then we are able to provide a strong $\ell_1$ recovery guarantee for the estimated signal $\hat{x}$: $\|\hat{x} - x\|_1 \le \kappa \|x\|_1$, where the constant $\kappa$ can be arbitrarily small. This offers the desired scalability of our framework that can potentially enable real-time or near-real-time processing for massive datasets featuring sparsity, which are relevant to a multitude of practical applications.
X. Li is with Cubist Systematic Strategies.
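The iterative decoding idea mentioned in the abstract above, for the noiseless case, can be illustrated with a toy peeling decoder. This is a sketch under stated assumptions and not the paper's measurement design. Indices are hashed into bins by taking them modulo two coprime bin counts, each bin stores a plain sum and a phase-weighted sum, and a ratio test decides whether a bin holds exactly one non-zero coefficient. The binning rule, the ratio test, and all names are illustrative choices, and the toy uses far more than $2K$ measurements.

```python
import cmath

def measure(x, bin_counts):
    """For each stage, hash index j to bin j % b and keep two sums per bin."""
    n = len(x)
    w = cmath.exp(2j * cmath.pi / n)
    stages = []
    for b in bin_counts:
        bins = [[0j, 0j] for _ in range(b)]
        for j, v in enumerate(x):
            if v != 0:
                bins[j % b][0] += v           # plain sum of the coefficients
                bins[j % b][1] += v * w ** j  # sum weighted by an index-dependent phase
        stages.append(bins)
    return stages

def peel(stages, bin_counts, n, tol=1e-9):
    """Repeatedly find single-coefficient bins and subtract them everywhere.

    Modifies `stages` in place and returns {index: recovered value}.
    """
    w = cmath.exp(2j * cmath.pi / n)
    recovered = {}
    progress = True
    while progress:
        progress = False
        for b, bins in zip(bin_counts, stages):
            for k in range(b):
                y0, y1 = bins[k]
                if abs(y0) < tol:
                    continue  # empty bin, or contributions that cancel in the sum
                ratio = y1 / y0
                j = round(cmath.phase(ratio) * n / (2 * cmath.pi)) % n
                if abs(ratio - w ** j) > tol or j % b != k:
                    continue  # more than one coefficient landed in this bin
                recovered[j] = y0
                for b2, bins2 in zip(bin_counts, stages):  # peel j from every stage
                    bins2[j % b2][0] -= y0
                    bins2[j % b2][1] -= y0 * w ** j
                progress = True
    return recovered

signal = [0.0] * 20
signal[1], signal[5], signal[6] = 2.0, -1.5, 3.0
estimate = peel(measure(signal, (4, 5)), (4, 5), 20)
```

In this example, indices 1 and 5 collide in the first stage and indices 1 and 6 collide in the second, so neither stage resolves the signal alone. Peeling index 6 from the first stage and index 5 from the second leaves index 1 isolated, which is the kind of chain reaction that sparse-graph decoders rely on.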