Declaration of Authorship
I, Zenun Kastrati, hereby declare that this thesis and the work presented in it is entirely my own. Where I have consulted the work of others, this is always clearly stated.
Signed: (Zenun Kastrati)
Date:
Summary
We are living in the age of the internet, where a massive amount of information is produced from various digital resources on a daily basis. This information is typically stored in unstructured textual formats such as reports, news, e-mails, and blogs; a proper classification and organization of this huge amount of information is therefore needed. In this regard, automatic classification, particularly ontology-based classification, plays an important role in helping people classify and organize information. An ontology-based classification system is an automatic system that uses an ontology to organize and classify knowledge in a more structured and formal way, thus providing better classification accuracy compared to a traditional keyword-based classification system.
The performance of an ontology-based document classification system can be affected by several aspects of the classification process, which generally consists of steps such as document collection and preprocessing, document representation, dimensionality reduction, and classification. It is almost impossible to address all of these aspects in a single dissertation, so we chose to work on those that we consider either rarely studied or crucial to the ontology-based classification system. Document representation is one of the main aspects affecting the performance of ontology-based document classification; thus, the first research aspect we investigated is enriching the document representation with semantics, using the background knowledge captured in ontologies. The background knowledge derived from an ontology is embedded in a document using a matching technique.
The idea behind this technique is to map the terms that occur in a document to the relevant ontology concepts by searching only for the presence of concept labels in that document. Searching only for concept labels limits the ability of the classification system to capture and exploit the entire conceptualization involved in that document, owing to the semantic gap issue, the lack of in-depth coverage of concepts, and the ambiguity problem. In this thesis, the focus is placed on the conceptual document representation, in which a document is associated with a set of concepts not only by looking for the appearance of concept labels, but also through the acquisition of lexical information integrated (linked) to the ontology to enrich its coverage with new concepts. In this respect, an automatic ontology concept enrichment model is developed to enrich ontologies with new concepts in order to provide a broader co...