Clustering is an important data mining technique that groups data into categories by similarity of features. Unlike classification, clustering is an unsupervised method. Two representative clustering algorithms are K-means and expectation maximization (EM). Logistic regression extends linear regression analysis to a categorical dependent variable: it is a statistical technique that predicts the probability of an event from a linear combination of independent variables. However, classifying all of the data with a single logistic regression analysis cannot guarantee the accuracy of the results. In this paper, logistic regression analysis is applied to the clusters produced by the EM and K-means methods for the quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
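The cluster-then-classify idea described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the data are synthetic toy points (not the red wine data), only K-means is shown (EM is omitted), and the helper names `kmeans`, `center`, `fit_logistic`, and `accuracy` are hypothetical. The toy labels are deliberately constructed so that the relationship between feature and class reverses from one cluster to the other.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def center(rows):
    """Subtract the column means (assumed preprocessing, keeps gradient descent stable)."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(r, means)] for r in rows]

def logit(w, x):
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Logistic regression by batch gradient descent; w[0] is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(logit(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def accuracy(w, X, y):
    """Share of rows whose predicted class (probability >= 0.5) matches the label."""
    hits = sum((1 if sigmoid(logit(w, x)) >= 0.5 else 0) == t for x, t in zip(X, y))
    return hits / len(X)

# Synthetic toy data: in group A a high second feature means class 1,
# in group B (shifted by 5 along the first feature) it means class 0.
group_a = [((0.0, 0.0), 0), ((1.0, 0.5), 0), ((0.5, 0.2), 0),
           ((0.0, 1.5), 1), ((1.0, 2.0), 1), ((0.5, 1.8), 1)]
group_b = [((x1 + 5.0, x2), 1 - t) for (x1, x2), t in group_a]
data = [row for pair in zip(group_a, group_b) for row in pair]  # interleave A and B
X = [list(x) for x, _ in data]
y = [t for _, t in data]

# One logistic model fitted to all of the data.
global_acc = accuracy(fit_logistic(center(X), y), center(X), y)

# One logistic model per K-means cluster.
_, labels = kmeans(X, k=2)
cluster_acc = {}
for c in sorted(set(labels)):
    Xc = center([x for x, l in zip(X, labels) if l == c])
    yc = [t for t, l in zip(y, labels) if l == c]
    cluster_acc[c] = accuracy(fit_logistic(Xc, yc), Xc, yc)

print("single model accuracy:", global_acc)
print("per-cluster accuracy:", cluster_acc)
```

On this constructed example no single linear decision boundary can classify every point, while each cluster is linearly separable on its own. Accuracy here is measured on the training points only; a real assessment would use held-out data.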
An important problem in frequency analysis is the selection of an appropriate probability distribution for given sample data. This selection is generally based on goodness-of-fit tests, which examine how well the sample data agree with an assumed probability distribution as their population. However, goodness-of-fit tests based on empirical distribution functions give equal weight to the differences between the empirical and theoretical distribution functions at all observations. To overcome this drawback, the modified Anderson-Darling test was suggested by Ahmad et al. (1988b). In this study, the critical values of the modified Anderson-Darling test statistics are revised using simulation experiments over an extended range of shape parameters for the generalized extreme value (GEV) and generalized logistic (GLO) distributions, and a power study is performed to evaluate the performance of the modified Anderson-Darling test. The results of the power study show that the modified Anderson-Darling test is more powerful than traditional tests such as the χ², Kolmogorov-Smirnov, and Cramér-von Mises tests. In addition, to compare the results of these goodness-of-fit tests, the modified Anderson-Darling test is applied to annual maximum rainfall data in Korea.
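To make the comparison of test statistics concrete, the sketch below computes the Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling statistics for a sample against a fully specified distribution, together with upper-tail and lower-tail modified Anderson-Darling statistics. Several things here are assumptions: the tail-weighted formulas are written in the form commonly attributed to Ahmad et al. and should be checked against the original reference; a Gumbel distribution (the GEV with zero shape parameter) stands in for the GEV and GLO cases; the location and scale values are hypothetical; and the sample is synthetic, not the Korean rainfall data. No critical values are computed.

```python
import math

def gumbel_cdf(x, loc, scale):
    """Gumbel CDF, i.e. the GEV with zero shape parameter."""
    return math.exp(-math.exp(-(x - loc) / scale))

def gumbel_quantile(p, loc, scale):
    """Inverse of gumbel_cdf for 0 < p < 1."""
    return loc - scale * math.log(-math.log(p))

def edf_statistics(sample, cdf):
    """EDF goodness-of-fit statistics for a sample and a fully specified CDF."""
    F = [cdf(x) for x in sorted(sample)]  # F[i-1] is the CDF at the i-th order statistic
    n = len(F)
    ranked = list(enumerate(F, start=1))
    # Kolmogorov-Smirnov: largest gap between empirical and theoretical CDF.
    ks = max(max(i / n - f, f - (i - 1) / n) for i, f in ranked)
    # Cramer-von Mises: equally weighted squared differences.
    cvm = 1 / (12 * n) + sum((f - (2 * i - 1) / (2 * n)) ** 2 for i, f in ranked)
    # Anderson-Darling: extra weight on both tails.
    ad = -n - sum((2 * i - 1) * (math.log(F[i - 1]) + math.log(1 - F[n - i]))
                  for i in range(1, n + 1)) / n
    # Modified Anderson-Darling (assumed form): weight on the upper tail only ...
    ad_upper = n / 2 - 2 * sum(F) - sum((2 - (2 * i - 1) / n) * math.log(1 - f)
                                        for i, f in ranked)
    # ... and on the lower tail only.
    ad_lower = -3 * n / 2 + 2 * sum(F) - sum((2 * i - 1) / n * math.log(f)
                                             for i, f in ranked)
    return {"ks": ks, "cvm": cvm, "ad": ad, "ad_upper": ad_upper, "ad_lower": ad_lower}

# Synthetic sample: Gumbel quantiles at the plotting positions (i - 0.5) / n.
loc, scale, n = 100.0, 30.0, 20  # hypothetical parameters
sample = [gumbel_quantile((i - 0.5) / n, loc, scale) for i in range(1, n + 1)]

stats = edf_statistics(sample, lambda x: gumbel_cdf(x, loc, scale))
for name, value in stats.items():
    print(name, value)
```

With these formulas the upper-tail and lower-tail statistics add up to the ordinary Anderson-Darling statistic, which gives a quick consistency check on any implementation. In flood and rainfall frequency analysis the upper-tail version is the natural choice, since the largest observations drive the estimated design quantiles.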