Data clustering is a difficult problem due to the complex and heterogeneous nature of multidimensional data. To improve clustering accuracy, we propose a scheme to capture the local correlation structures: associate each cluster with an independent weighting vector and embed it in the subspace spanned by an adaptive combination of the dimensions. Our clustering algorithm takes advantage of the known pairwise instance-level constraints. The data points in the constraint set are divided into groups through inference, and each group is assigned to the feasible cluster that minimizes the sum of squared distances between all the points in the group and the corresponding centroid. Our theoretical analysis shows that the probability of points being assigned to the correct clusters is much higher under the new algorithm than under conventional methods. This is confirmed by our experimental results, which indicate that our design produces clusters that are closer to the ground truth than clusters created by current state-of-the-art algorithms.
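The group-assignment step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "inference" means taking the transitive closure of must-link pairs, that each cluster's weighting vector scales the per-dimension squared differences, and that the set of feasible clusters is supplied by the caller. The names `must_link_groups`, `weighted_sq_dist`, and `assign_group` are hypothetical.

```python
from collections import defaultdict


def must_link_groups(n_points, must_link):
    """Group point indices by the transitive closure of must-link pairs.

    Assumption: this union-find closure stands in for the abstract's
    "inference" step, which the source does not spell out.
    """
    parent = list(range(n_points))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_link:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for i in range(n_points):
        groups[find(i)].append(i)
    return sorted(groups.values())


def weighted_sq_dist(x, centroid, weights):
    """Squared distance with one weight per dimension (assumed form)."""
    return sum(w * (xi - ci) ** 2 for xi, ci, w in zip(x, centroid, weights))


def assign_group(group, points, centroids, weights, feasible):
    """Pick the feasible cluster with the smallest summed distance for the group."""
    def cost(k):
        return sum(
            weighted_sq_dist(points[i], centroids[k], weights[k]) for i in group
        )
    return min(feasible, key=cost)


# Toy data: points 0 and 1 are tied together by a must-link constraint.
points = [(0.0, 0.0), (0.2, 5.0), (4.0, 4.0), (4.1, 0.1)]
centroids = [(0.0, 2.0), (4.0, 2.0)]
# Cluster 0 ignores the second dimension; cluster 1 weights both equally.
weights = [(1.0, 0.0), (0.5, 0.5)]

groups = must_link_groups(len(points), [(0, 1)])
chosen = assign_group(groups[0], points, centroids, weights, feasible=[0, 1])
```

In this toy setup the whole group moves together, so point 1 follows point 0 into cluster 0 even though its second coordinate is far from that centroid; the per-cluster weights make that dimension irrelevant for cluster 0. Restricting `feasible` to `[1]` models a case where a cannot-link constraint rules out cluster 0.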
In this paper, we propose Local and Global Structures Preserving Projection (LGSPP), which finds a small set of projection directions that preserve both the local and the global structure of a given dataset. Specifically, for each point in the dataset, we extract its local neighborhood together with a set of sampled points far away from it, which characterize the global structure. The embedding minimizes the distances between the points in each local neighborhood while dispersing them far from their corresponding remote points. In this way, the local-global relationships between data points are well preserved.
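The trade-off described above can be made concrete with a small scoring sketch. It does not solve for the LGSPP projection; it only evaluates a given candidate direction. Several choices here are assumptions not taken from the source: the k nearest points form the local neighborhood, the m farthest points (with m at least 1) stand in for the "sampled points far away", and the quality measure is the ratio of projected remote spread to projected local spread. The function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neighbors_and_remotes(points, i, k, m):
    """Return indices of the k nearest and m farthest points from point i.

    Assumption: the farthest points replace the sampling of remote points.
    """
    order = sorted(
        (j for j in range(len(points)) if j != i),
        key=lambda j: sq_dist(points[i], points[j]),
    )
    return order[:k], order[-m:]


def project(point, direction):
    """One-dimensional linear projection (dot product with the direction)."""
    return sum(p * d for p, d in zip(point, direction))


def local_global_score(points, direction, k=1, m=1):
    """Ratio of projected remote spread to projected local spread.

    Larger values mean neighbors stay close while remote points stay far.
    """
    local = remote = 0.0
    for i in range(len(points)):
        near, far = neighbors_and_remotes(points, i, k, m)
        yi = project(points[i], direction)
        local += sum((yi - project(points[j], direction)) ** 2 for j in near)
        remote += sum((yi - project(points[j], direction)) ** 2 for j in far)
    return remote / local if local > 0 else float("inf")


# Two tight pairs of points separated along the first axis.
points = [(0.0, 0.0), (0.5, 1.0), (10.0, 0.0), (10.5, 1.0)]
score_x = local_global_score(points, (1.0, 0.0))
score_y = local_global_score(points, (0.0, 1.0))
```

On this toy data, projecting onto the first axis keeps each pair together and the two pairs apart, so `score_x` exceeds `score_y`. A full method would search over directions, for example through an eigenvalue problem, rather than compare two hand-picked candidates.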
Query-by-example is the most popular query model in recent content-based image retrieval (CBIR) systems. A typical query image includes relevant objects (e.g., the Eiffel Tower), but also irrelevant image areas (including background). The irrelevant areas limit the effectiveness of existing CBIR systems. To overcome this limitation, the system must be able to determine similarity based on relevant regions alone. We call this class of queries region-of-interest (ROI) queries and propose a technique for processing them in a sampling-based matching framework. A new similarity model is presented and an indexing technique for this new environment is proposed. Our experimental results confirm that traditional approaches, such as Local Color Histogram and Correlogram, suffer from the involvement of irrelevant regions. Our method can handle ROI queries and provides significantly better performance. We also assessed the performance of the proposed indexing technique. The results clearly show that our retrieval procedure is effective for large image data sets. Index Terms: Image processing, image indexing and retrieval, regions of interest, arbitrary-shaped queries.
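The effect of irrelevant regions on matching can be shown with a toy sketch. It is not the similarity model or index from the paper, whose details the abstract does not give. It assumes each image has already been reduced to a small grid of sampled RGB values and that the ROI is a list of grid positions chosen by the user; `roi_distance` is a hypothetical helper.

```python
def roi_distance(query, candidate, roi):
    """Mean absolute channel difference over the sampled positions in roi only.

    Assumption: images are equal-sized grids of (r, g, b) samples, and roi
    is a non-empty list of (row, col) positions.
    """
    total = sum(
        abs(q - c)
        for (row, col) in roi
        for q, c in zip(query[row][col], candidate[row][col])
    )
    return total / (len(roi) * 3)


# 2x2 sample grids: the object of interest sits at position (0, 0).
query = [[(200, 0, 0), (0, 0, 255)], [(0, 0, 255), (0, 0, 255)]]
# Same object as the query, different background.
same_object = [[(200, 0, 0), (0, 255, 0)], [(0, 255, 0), (0, 255, 0)]]
# Same background as the query, different object.
same_background = [[(0, 0, 0), (0, 0, 255)], [(0, 0, 255), (0, 0, 255)]]

roi = [(0, 0)]
whole_image = [(r, c) for r in range(2) for c in range(2)]

# Whole-image matching is dominated by the background samples.
whole_same_object = roi_distance(query, same_object, whole_image)
whole_same_background = roi_distance(query, same_background, whole_image)

# ROI matching looks only at the object samples.
roi_same_object = roi_distance(query, same_object, roi)
roi_same_background = roi_distance(query, same_background, roi)
```

With whole-image matching, the candidate that shares only the background looks closer to the query; restricting the comparison to the ROI reverses the ranking in favor of the candidate that contains the same object.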