In this paper we focus on sentence retrieval, which is similar to document retrieval but uses a smaller unit of retrieval. Data pre-processing is generally considered useful in document retrieval; for sentence retrieval the situation is less clear. We use the TF-ISF (term frequency-inverse sentence frequency) method for sentence retrieval. As pre-processing steps, we use stop word removal and two language modeling techniques: stemming and lemmatization. We also experiment with different query lengths. The results show that data pre-processing with stemming and lemmatization is as useful for sentence retrieval as it is for document retrieval. Lemmatization produces better results with longer queries, while stemming produces worse results with longer queries. The experiments use data from the Text Retrieval Conference (TREC) novelty tracks.
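As a rough illustration of the ranking described above, the following Python sketch scores sentences against a query with one commonly cited TF-ISF formulation, R(s|q) = Σ_t log(tf_{t,q} + 1) · log(tf_{t,s} + 1) · log((n + 1) / (0.5 + sf_t)). The abstract does not state the exact formula, so this variant is an assumption, as are the tiny stop list, the tokenizer, and the hypothetical `normalize` hook that stands in for a stemmer or lemmatizer.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop list (an assumption; not the list used in the paper).
STOP_WORDS = {"a", "an", "the", "of", "in", "is", "to", "and", "on"}


def tokenize(text, remove_stop_words=True, normalize=None):
    """Lowercase, split on non-alphanumerics, optionally drop stop words.

    `normalize` is a hypothetical hook: any callable mapping a token to its
    stem or lemma can be passed in; by default tokens are left unchanged.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if normalize is not None:
        tokens = [normalize(t) for t in tokens]
    return tokens


def tf_isf_scores(query, sentences, remove_stop_words=True, normalize=None):
    """Return one TF-ISF score per sentence, in the order given."""
    tokenized = [tokenize(s, remove_stop_words, normalize) for s in sentences]
    n = len(tokenized)

    # sf[t] = number of sentences that contain term t at least once.
    sf = Counter()
    for tokens in tokenized:
        sf.update(set(tokens))

    query_tf = Counter(tokenize(query, remove_stop_words, normalize))

    scores = []
    for tokens in tokenized:
        sentence_tf = Counter(tokens)
        score = 0.0
        for term, q_count in query_tf.items():
            score += (
                math.log(q_count + 1)
                * math.log(sentence_tf[term] + 1)
                * math.log((n + 1) / (0.5 + sf[term]))
            )
        scores.append(score)
    return scores


sentences = ["The cat sat on the mat", "Dogs bark loudly", "A cat chased a dog"]
scores = tf_isf_scores("cat", sentences)
```

Passing a real stemmer or lemmatizer as `normalize` would allow the pre-processing variants to be compared under the same scoring function; sentences that share no term with the query score zero.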
Sentence retrieval consists of retrieving relevant sentences from a document base in response to a query. Question answering, novelty detection, summarization, opinion mining and information provenance all make use of sentence retrieval. Most sentence retrieval methods are trivial adaptations of document retrieval methods. However, some newer sentence retrieval methods based on the language modeling framework successfully use some kind of sentence context. By contrast, no improvement of the TF-ISF method that takes sentence context into account has so far been successful. In this paper we propose a recursive TF-ISF based method that takes into account the local context of a sentence, where the context consists of the previous and the next sentence of the current sentence. We compared the new method to the TF-ISF baseline and to an earlier unsuccessful method that also incorporates a similar context into TF-ISF. We obtained statistically significant improvements over both methods. An additional benefit of our method is its clear, explicit model of the context, which will allow us to automatically generate a document representation with context suitable for sentence retrieval; this is important for our future work.
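One possible reading of the recursive context idea is sketched below in Python. The abstract does not give the formula, so the linear mixing of a sentence's own score with the scores of its neighbours, the weight `lam`, the recursion `depth`, and the treatment of missing neighbours as zero are all assumptions made for illustration. The input is a list of baseline scores (for example TF-ISF scores) for the sentences of one document, in document order.

```python
def context_scores(base, lam=0.2, depth=1):
    """Mix each sentence's baseline score with its neighbours' scores.

    base  : baseline relevance scores of one document's sentences, in order.
    lam   : weight given to the context (assumed value, not from the paper).
    depth : how many times the context is applied recursively; at depth 0
            the baseline scores are returned unchanged.
    """
    if depth == 0:
        return list(base)

    # Neighbour scores come from the previous recursion level, so at
    # depth d a sentence is influenced by sentences up to d positions away.
    inner = context_scores(base, lam, depth - 1)

    mixed = []
    for i, own in enumerate(base):
        # A missing neighbour (document start or end) contributes zero.
        prev_score = inner[i - 1] if i > 0 else 0.0
        next_score = inner[i + 1] if i + 1 < len(base) else 0.0
        mixed.append((1 - lam) * own + lam * (prev_score + next_score))
    return mixed


baseline = [0.0, 1.0, 0.0]
with_context = context_scores(baseline, lam=0.2, depth=1)
```

In this toy input the middle sentence is the only one matching the query, and its neighbours receive a share of its score, which is the intended effect of modelling local context.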
In this paper we combine our previous research in the field of the Semantic Web, especially ontology learning and population, with sentence retrieval. To do this, we developed a new approach to sentence retrieval by modifying our previous TF-ISF method, which uses local context information, so that it takes into account only document-level information. This approach to sentence retrieval is presented for the first time in this paper and is compared to existing methods that use information from the whole document collection. With this approach and the methods developed for document-level sentence retrieval, it is possible to assess the relevance of a sentence using only information from the document that contains it, and to define a document-level OWL representation for sentence retrieval that can be populated automatically. In this way we support the idea of the Semantic Web through automatic and semi-automatic extraction of additional information from existing web resources. The additional information is stored in an OWL document that contains the relevance of the document's sentences for sentence retrieval.