Given a set of objects, clustering analysis (CA) techniques estimate dense regions of points separated by sparse regions in the space of the dimensions that describe these objects. Regardless of the nature of the data, whether structured or unstructured, we look for homogeneous clouds of points that define clusters, from which we want to extract some meaning. In other words, when doing CA, the analyst searches for underlying structures in a multidimensional space to which some meaning could be assigned. Broadly, carrying out a CA application involves two main activities: generating cluster configurations by means of an algorithm, and interpreting these configurations in order to approximate a solution that contributes to the objective of the CA application. Generating a cluster configuration is typically a computational task, while the interpretation task carries a strong burden of subjectivity. Many approaches for generating cluster configurations have been presented in the literature. Unfortunately, the interpretation task has not received as much attention, possibly due to the difficulty of modeling something that is subjective in nature. This chapter proposes a method to guide the interpretation of a cluster configuration. The inherent subjectivity is addressed directly by describing the process with the apparatus of the Ontology of Language. The main contribution of this chapter is a sound conceptual basis to guide the analyst in extracting meaning from the patterns found in a set of data, whether the data come from databases, collections of free texts, or sets of web pages.
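The idea of estimating dense regions separated by sparse regions can be made concrete with a density-based algorithm. The sketch below is a minimal, unoptimized DBSCAN in Python, chosen only as one well-known example of how a cluster configuration can be generated; the chapter does not prescribe any particular algorithm. The distance threshold `eps`, the density threshold `min_pts`, and the toy points are illustrative assumptions.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, ...]
NOISE = -1


def region_query(points: List[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Minimal DBSCAN: returns a cluster id (0, 1, ...) per point, or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # sparse region; may later become a border point
            continue
        # points[i] is a core point: open a new cluster and grow it.
        labels[i] = cluster_id
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joined, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is dense too: keep expanding
        cluster_id += 1
    return labels  # every entry is an int by now


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # dense cloud A
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),  # dense cloud B
          (10.0, 0.0)]                                      # isolated point
print(dbscan(points, eps=0.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The output is only a cluster configuration: two homogeneous clouds and one noise point. Deciding what those clouds mean is the interpretation task the chapter focuses on.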
The cluster analysis process comprises two broad activities: generating a set of clusters and extracting meaning from these clusters. The first refers to the application of algorithms to estimate, in the observed space, high-density areas separated by lower-density areas. In the second, the analyst examines the clusters, trying to make sense of them. The whole activity requires prior knowledge and carries a considerable burden of subjectivity. Previous works proposed some alternatives for taking background knowledge into account when creating the clusters. However, the subjectivity of the interpretation activity remains a challenge. Beyond sound domain knowledge from specialists, a consensual interpretation depends on conversational competences for which no support has been provided. We propose a method for cluster interpretation based on the categories of the Ontology of Language, aiming to reduce the gap between a cluster configuration and the effective extraction of meaning from it.
Abstract. Knowledge discovery from databases, in the descriptive approach, includes clustering analysis (CA) as a way to estimate how a set of objects is organized in the space of their dimensions. The main objective of this task is to find "natural" groups that could exhibit some meaning. Given the strong subjectivity that underlies this process, an important issue concerns the relationships among the CA players when looking for a model that fits the data. This work presents a model for action coordination that provides an order for conducting the relationships among CA players. The model is offered as a conceptual contribution towards the construction of a computational environment that supports effective conversations in a subjective context.
Knowledge Discovery in Databases (KDD) is the process by which previously unknown and useful knowledge and information are extracted, by automatic or semi-automatic methods, from large amounts of data. With the evolution of Information Technology and the rapid growth in the number and size of databases, the development of methodologies, techniques, and tools for data mining has become a major concern for researchers and has, in turn, led to applications in a variety of areas of human activity. Around 1997, the KDD community began to research the processes and techniques associated with cluster analysis with increasing intensity. Within a model intended to support decisions based on cluster analysis, prior knowledge about the data structure and the application domain can serve as important constraints that lead to better cluster configurations. This paper presents an application of cluster analysis in the area of public safety, using a scheme that takes into account prior knowledge acquired from statistical analysis of the data. This information was used as a bias for the k-means algorithm, which was applied to identify the dactyloscopic (fingerprint) profile of criminals in the Brazilian capital, also known as the Federal District. These results were then compared with a similar analysis that disregarded the prior knowledge. The analysis using prior knowledge generated clusters that were more coherent with the expert knowledge.
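The abstract does not specify how the prior knowledge was used to bias k-means. One common way to do so, assumed here purely for illustration, is to seed the algorithm with initial centroids derived from the prior statistical analysis instead of choosing them at random. The sketch below is a plain Lloyd's k-means in Python with an optional `init_centroids` argument; the function names, parameters, and toy data are hypothetical and are not the study's implementation or data.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty sequence of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def nearest(p: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points: List[Point], k: int,
           init_centroids: Optional[List[Point]] = None,
           max_iter: int = 100, seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means.

    init_centroids: hypothetical starting centroids derived from prior knowledge
    (the "bias"). If omitted, k data points are sampled at random (no prior).
    """
    if init_centroids is not None:
        if len(init_centroids) != k:
            raise ValueError("need exactly k initial centroids")
        centroids = [tuple(c) for c in init_centroids]
    else:
        centroids = random.Random(seed).sample(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append(centroid(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return labels, centroids


# Toy data: two well-separated groups (illustrative only, not the study's data).
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
# Hypothetical prior knowledge, e.g. group means from an earlier statistical analysis.
prior = [(0.5, 0.5), (9.5, 9.5)]
labels, centers = kmeans(data, k=2, init_centroids=prior)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Seeding only fixes the starting point: the data can still move the centroids, so the prior acts as a soft bias rather than a hard constraint. Calling `kmeans(data, k=2)` without `init_centroids` gives an unbiased baseline, analogous to the comparison described in the abstract.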