Ontologies have gained a lot of popularity and recognition in the semantic web because of their extensive use in Internet-based applications. Ontologies are often considered a fine source of semantics and interoperability in all artificially smart systems. Exponential increase in unstructured data on the web has made automated acquisition of ontology from unstructured text a most prominent research area. Several methodologies exploiting numerous techniques of various fields (machine learning, text mining, knowledge representation and reasoning, information retrieval and natural language processing) are being proposed to bring some level of automation in the process of ontology acquisition from unstructured text. This paper describes the process of ontology learning and further classification of ontology learning techniques into three classes (linguistics, statistical and logical) and discusses many algorithms under each category. This paper also explores ontology evaluation techniques by highlighting their pros and cons. Moreover, it describes the scope and use of ontology learning in several industries. Finally, the paper discusses challenges of ontology learning along with their corresponding future directions.
Web contains a vast amount of data, which are accumulated, studied, and utilized by a huge number of users on a daily basis. A substantial amount of data on the Web is available in an unstructured format, such as Web pages, books, journals, and files. Acquiring appropriate information from such humongous data has become quite challenging and a time-consuming task. Trivial keyword-based information retrieval systems highly depend on the statistics of data, thus facing word mismatch problem due to inevitable semantic and context variations of a certain word. Therefore, this marks the desperate need to organize such massive data into a structured format so that information can be easily processed in a large context by taking data semantics into account. Ontologies are not only being extensively employed in the semantic Web to store unstructured information in an organized and structured way but it has also raised the performance of diverse information retrieval approaches to a great extent. Ontological information retrieval systems retrieve data based on the similarity of semantics between the user query and the indexed data. This paper reviews modern ontology-based information retrieval methods for textual, multimedia, and cross-lingual data types. Furthermore, we compare and categorize the most recent approaches used in the above-mentioned information retrieval methods along with their major drawbacks and advantages. INDEX TERMS Ontology, text retrieval, multimedia retrieval, cross lingual retrieval.
Biomedical information retrieval systems are becoming popular and complex due to massive amount of ever-growing biomedical literature. Users are unable to construct a precise and accurate query that represents the intended information in a clear manner. Therefore, query is expanded with the terms or features that retrieve more relevant information. Selection of appropriate expansion terms plays key role to improve the performance of retrieval task. We propose document frequency chi-square, a newer version of chi-square in pseudo relevance feedback for term selection. The effects of pre-processing on the performance of information retrieval specifically in biomedical domain are also depicted. On average, the proposed algorithm outperformed state-of-the-art term selection algorithms by 88% at pre-defined test points. Our experiments also conclude that, stemming cause a decrease in overall performance of the pseudo relevance feedback based information retrieval system particularly in biomedical domain.Database URL: http://biodb.sdau.edu.cn/gan/
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.