A content-based image retrieval (CBIR) framework for diverse collection of medical images of different imaging modalities, anatomic regions with different orientations and biological systems is proposed. Organization of images in such a database (DB) is well defined with predefined semantic categories; hence, it can be useful for category-specific searching. The proposed framework consists of machine learning methods for image prefiltering, similarity matching using statistical distance measures, and a relevance feedback (RF) scheme. To narrow down the semantic gap and increase the retrieval efficiency, we investigate both supervised and unsupervised learning techniques to associate low-level global image features (e.g., color, texture, and edge) in the projected PCA-based eigenspace with their high-level semantic and visual categories. Specially, we explore the use of a probabilistic multiclass support vector machine (SVM) and fuzzy c-mean (FCM) clustering for categorization and prefiltering of images to reduce the search space. A category-specific statistical similarity matching is proposed in a finer level on the prefiltered images. To incorporate a better perception subjectivity, an RF mechanism is also added to update the query parameters dynamically and adjust the proposed matching functions. Experiments are based on a ground-truth DB consisting of 5000 diverse medical images of 20 predefined categories. Analysis of results based on cross-validation (CV) accuracy and precision-recall for image categorization and retrieval is reported. It demonstrates the improvement, effectiveness, and efficiency achieved by the proposed framework.
The pervasiveness of location-aware devices has spawned extensive research in trajectory data mining, resulting in many important real-life applications.Yet, the privacy issue in sharing trajectory data among different parties often creates an obstacle for effective data mining. In this paper, we study the challenges of anonymizing trajectory data: high dimensionality, sparseness, and sequentiality. Employing traditional privacy models and anonymization
Set-valued data provides enormous opportunities for various data mining tasks. In this paper, we study the problem of publishing set-valued data for data mining tasks under the rigorous differential privacy model. All existing data publishing methods for set-valued data are based on partition-based privacy models, for example
k
-anonymity, which are vulnerable to privacy attacks based on background knowledge. In contrast, differential privacy provides strong privacy guarantees independent of an adversary's background knowledge and computational power. Existing data publishing approaches for differential privacy, however, are not adequate in terms of both utility and scalability in the context of set-valued data due to its high dimensionality.
We demonstrate that set-valued data could be efficiently released under differential privacy with guaranteed utility with the help of context-free taxonomy trees. We propose a probabilistic top-down partitioning algorithm to generate a differentially private release, which scales linearly with the input data size. We also discuss the applicability of our idea to the context of relational data. We prove that our result is (∈, δ)-useful for the class of counting queries, the foundation of many data mining tasks. We show that our approach maintains high utility for counting queries and frequent itemset mining and scales to large datasets through extensive experiments on real-life set-valued datasets.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.