Interval-censored (IC) failure time data often occur in follow-up studies where subjects can only be followed periodically and the failure time can only be known to lie in an interval. In this paper we propose a modified log-rank test for the problem of comparing two or more IC samples. Our log-rank statistic and covariance matrix are calculated by a multiple imputation technique. Through simulation studies, we find that the performance of the proposed test is comparable to that of the test proposed by Finkelstein (Biometrics 1986; 42(4):845-854) and is better than that of the two existing log-rank type tests proposed by Sun (Lifetime Data Anal. 2001; 7:363-375) and Zhao and Sun (Stat. Med. 2004; 23(10):1621-1629) due to the differences in the method of multiple imputation and the covariance matrix estimation. Our covariance matrix estimator is similar to the estimator used by Follmann et al. (Biometrics 2003; 59:420-429) for clustered data. The proposed method is illustrated by means of an example involving patients with breast cancer.
It is critical to evaluate the quality of clusters for most cluster analysis. A number of cluster validity indexes have been proposed, such as the Silhouette and Davies-Bouldin indexes. However, these validity indexes cannot be used to process clusters with arbitrary shapes. Some researchers employ graph-based distance to cluster nonspherical data sets, but the computation of graph-based distances between all pairs of points in a data set is time-consuming. A potential solution is to select some representative points. Inspired by this idea, we propose a novel Local Cores-based Cluster Validity (LCCV) index to improve the performance of Silhouette index. Local cores, with local maximum density, are selected as representative points. Since graph-based distance is used to evaluate the dissimilarity between local cores, the LCCV index is effective for obtaining the optimal cluster number for data sets containing clusters with arbitrary shapes. Moreover, a hierarchical clustering algorithm based on the LCCV index is proposed. The experimental results on synthetic and real data sets indicate that the new index outperforms existing ones.
Cluster analysis aims at classifying objects into categories on the basis of their similarity and has been widely used in many areas such as pattern recognition and image processing. In this paper, we propose a novel clustering algorithm called QCC mainly based on the following ideas: the density of a cluster center is the highest in its K nearest neighborhood or reverse K nearest neighborhood, and clusters are divided by sparse regions. Besides, we define a novel concept of similarity between clusters to solve the complex-manifold problem. In experiments, we compare the proposed algorithm QCC with DBSCAN, DP and DAAP algorithms on synthetic and real-world datasets. Results show that QCC performs the best, and its superiority on clustering non-spherical data and complex-manifold data is especially large. Keywords Clustering • Center • Similarity • Neighbor • Manifold 1 Introduction Clustering is one of primary methods in data mining and data analysis. It aims at classifying objects into categories or clusters, on the basis of their similarity. The clusters are collections Editor: João Gama.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.