An Efficient Approach for Inverted Index Pruning Based on Document Relevance

Vishwakarma, Santosh K.; Lakhtaria, Kamaljit I.; Bhatnagar, Divya; Sharma, Akhilesh

doi:10.1109/csnt.2014.103

Cited by 10 publications

(1 citation statement)

References 11 publications


“…In the various methods for managing inverted indexes, great emphasis is put on storage space reduction. For instance, a pruning algorithm based on term frequency-inverse document frequency (TF*IDF) can be used to minimize index size [19]. Yet, updating an inverted index is also a problem, because it is dependent on documents.…”

Section: Document Indexing (mentioning)

confidence: 99%

A Scalable Document-Based Architecture for Text Analysis

Truică; Darmont; Velcin

2016

Advanced Data Mining and Applications


Abstract. Analyzing textual data is a very challenging task because of the huge volume of data generated daily. Fundamental issues in text analysis include the lack of structure in document datasets, the need for various preprocessing steps and performance and scaling issues. Existing text analysis architectures partly solve these issues, providing restrictive data schemas, addressing only one aspect of text preprocessing and focusing on one single task when dealing with performance optimization. Thus, we propose in this paper a new generic text analysis architecture, where document structure is flexible, many preprocessing techniques are integrated and textual datasets are indexed for efficient access. We implement our conceptual architecture using both a relational and a document-oriented database. Our experiments demonstrate the feasibility of our approach and the superiority of the document-oriented logical and physical implementation.
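The citation statement quoted earlier mentions pruning an inverted index by TF*IDF to reduce its size, but it does not describe the algorithm. As a rough illustration only, the sketch below assumes one simple reading of that idea: each posting is scored as term frequency times log(N / document frequency), and postings under a fixed threshold are dropped. The names `build_index`, `prune_index`, and `min_score` are hypothetical and are not taken from the cited paper.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def prune_index(index, num_docs, min_score=0.5):
    """Drop postings whose TF*IDF score is below min_score.

    Assumption: score = tf * log(num_docs / document_frequency).
    Terms left with no postings are removed from the index entirely.
    """
    pruned = {}
    for term, postings in index.items():
        idf = math.log(num_docs / len(postings))
        kept = {d: tf for d, tf in postings.items() if tf * idf >= min_score}
        if kept:
            pruned[term] = kept
    return pruned


docs = {
    1: "the cat sat on the mat",
    2: "the dog sat",
    3: "the cat cat purred",
}
index = build_index(docs)
pruned = prune_index(index, num_docs=len(docs))

# Compare the number of postings before and after pruning.
print(sum(len(p) for p in index.values()), "postings before pruning")
print(sum(len(p) for p in pruned.values()), "postings after pruning")
```

In this toy collection a term that occurs in every document has an IDF of zero, so all of its postings are dropped. This is also where the update problem noted in the quote shows up: IDF depends on the whole collection, so adding or removing documents can change which postings pass the threshold.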
