The development of information retrieval and extraction systems is still a challenging task. The occurrence of natural language limits the application of existing approaches. Therefore the approach of a new framework which combines natural language processing and semantic web technology is discussed.This paper focuses on ontology based knowledge modelling for semantic data extraction. Therefore, semantic verification techniques which can be used to improve the extraction are introduced.
More and more domain specific applications in the internet make use of Natural Language Processing (NLP) tools (e. g. Information Extraction systems). The output quality of these applications relies on the output quality of the used NLP tools. Often, the quality can be increased by annotating a domain specific corpus. However, annotating a corpus is a time consuming and exhaustive task. To reduce the annotation time we present a custom GraphicalUser Interface for different annotation layers.
Applications in the World Wide Web aggregate vast amounts of information from different data sources. The aggregation process is often implemented with Extract, Transform and Load (ETL) processes. Usually ETL processes require information for aggregation available in structured formats, e. g. XML or JSON. In many cases the information is provided in natural language text which makes the application of ETL processes impractical.Due to the fact that information is provided in natural language, Information Extraction (IE) systems have been evolved. They make use of Natural Language Processing (NLP) tools to derive meaning from natural language text. State-of-the-art NLP tools apply Machine Learning methods. These NLP tools perform on newspapers with good quality, but they drop accuracy in other domains.However, to improve the quality for IE systems in specific domains often NLP tools are trained on domain specific text which is a time consuming process. This paper introduces an approach using a Continuous Integration pipeline for organizing and monitoring the annotation process on domain specific corpora.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.