This paper presents a novel methodology for topic ontology learning from text documents. The proposed methodology, named OntoTermExtraction, is based on OntoGen, a semi-automated tool for topic ontology construction, upgraded with an advanced terminology extraction tool in an iterative, semi-automated ontology construction process. This process consists of (a) document clustering to find the nodes in the topic ontology, (b) term extraction from document clusters, (c) populating the term vocabulary and keyword extraction, and (d) choosing the concept names by comparing the best-ranked terms with the extracted keywords. The approach is illustrated on a case study analysis of the ILPNet2 publications data.
Introduction
OntoGen [1, 2] is a semi-automated, data-driven ontology construction tool, focused on the construction and editing of topic ontologies. In a topic ontology, each node is a cluster of documents, represented by keywords (topics), and nodes are connected by relations (typically, the SubConcept-Of relation). The system combines text mining techniques with an efficient user interface, aiming to reduce the user's time and the complexity of ontology construction. In this way it presents a significant improvement over existing manual and relatively complex ontology editing tools, such as Protégé [3], whose use is hindered by the lack of ontology engineering skills of the domain experts constructing the ontology.
Concept naming suggestion (i.e., description of a document cluster through a set of relevant terms) plays a central role in the OntoGen system. Concept naming helps the user in evaluating clusters and organizing them hierarchically. This facility is provided by employing unsupervised and supervised methods for generating the suggestions. Despite the well-elaborated and user-friendly approach to concept naming currently provided by OntoGen, the approach has so far been limited to single-word keyword suggestions and has relied on very basic text lemmatization in the OntoGen text preprocessing phase.
This paper aims at improving the ontology construction process through improved concept naming, using terminology extraction as implemented in the advanced TermExtractor tool [4, 5]. The improved ontology construction process proposed in this paper consists of the following steps:
• document clustering to find the nodes in the topic ontology,
• terminology extraction from document clusters,