One of the biggest challenges in Big Data is the exploitation of Value from large volumes of data. To exploit value, one must focus on extracting knowledge from Big Data sources. In this paper we present a new, simple but highly scalable process to automatically learn the label hierarchy from huge sets of unstructured text. We aim to extract knowledge from these sources using a Hierarchical Multi-Label Classification process called Semantic HMC. The Semantic HMC process consists of five steps: Indexation, Vectorization, Hierarchization, Resolution and Realization. The first three steps construct the label hierarchy from the data sources. The last two steps classify new items according to the hierarchy labels. To perform the classification without relying heavily on the user, the process is unsupervised: no thesaurus or label examples are required. The process is implemented on a scalable and distributed platform to process Big Data.
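The five steps named in the abstract above can be illustrated with a small, single-machine sketch. The abstract does not say how each step is computed, so everything below is an assumption made for illustration: whitespace tokenization, document co-occurrence counts, a co-occurrence subsumption heuristic for the hierarchy (a term becomes a parent of another when it appears in at least `threshold` of the documents containing that other term), and a "label term occurs in the text" rule for classification. The function names are hypothetical, and none of this reflects the distributed implementation the authors describe.

```python
from collections import Counter
from itertools import combinations


def index_terms(docs):
    """Indexation (assumed): turn each document into a set of lowercase terms."""
    return [set(doc.lower().split()) for doc in docs]


def cooccurrence(term_sets):
    """Vectorization (assumed): document frequency per term and per term pair."""
    df = Counter()
    pair_df = Counter()
    for terms in term_sets:
        df.update(terms)
        # Sorted tuples give each unordered pair a single canonical key.
        pair_df.update(combinations(sorted(terms), 2))
    return df, pair_df


def build_hierarchy(df, pair_df, threshold=0.8):
    """Hierarchization (assumed heuristic): `parent` subsumes `child` when
    P(parent | child) >= threshold and P(child | parent) is strictly smaller."""
    parents = {}
    for (a, b), n in pair_df.items():
        for child, parent in ((a, b), (b, a)):
            p_parent_given_child = n / df[child]
            p_child_given_parent = n / df[parent]
            if (p_parent_given_child >= threshold
                    and p_child_given_parent < p_parent_given_child):
                parents.setdefault(child, set()).add(parent)
    return parents


def classify(text, parents):
    """Resolution + Realization (assumed rule): a label applies when its term
    occurs in the text; every ancestor of a matched label applies as well."""
    labels = set(parents) | {p for ps in parents.values() for p in ps}
    found = set(text.lower().split()) & labels
    stack = list(found)
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


# Toy corpus, invented for this example.
docs = [
    "programming python",
    "programming java",
    "programming python tutorial",
    "cooking pasta",
]
df, pair_df = cooccurrence(index_terms(docs))
hierarchy = build_hierarchy(df, pair_df)
print(sorted(classify("a java example", hierarchy)))  # ['java', 'programming']
```

Because a parent must have a strictly higher document frequency than its child under this heuristic, the resulting relation cannot contain cycles, which keeps the ancestor walk in `classify` simple.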
Analyzing Big Data can help corporations improve their efficiency. In this work we present a new vision for deriving Value from Big Data using a Semantic Hierarchical Multi-label Classification, called Semantic HMC, based on an unsupervised Ontology learning process. We also propose a Semantic HMC process that uses scalable Machine-Learning techniques and Rule-based reasoning.
One of the biggest challenges in Big Data is the exploitation of Value from large volumes of data that are constantly changing. To exploit value, one must focus on extracting knowledge from these Big Data sources. To extract knowledge and value from unstructured text, we propose using a Hierarchical Multi-Label Classification process called Semantic HMC, which uses ontologies to describe the predictive model, including the label hierarchy and the classification rules. To avoid overloading the user, this process automatically learns the ontology-described label hierarchy from a very large set of text documents. This paper presents a maintenance process for the relations of the ontology-described label hierarchy that, in the context of Big Data, incrementally updates the label hierarchy as a stream of unstructured text documents arrives.