2014
DOI: 10.5121/ijdms.2014.6602

An Effective Pre-Processing Algorithm for Information Retrieval Systems

Abstract: The Internet is probably the most successful distributed computing system ever. However

Cited by 15 publications (3 citation statements)
References 16 publications
“…The first component of EpiNews deals with the preprocessing of HealthMap articles through a series of preprocessing steps, such as removal of non-textual elements, tokenization [28, 29], lemmatization [30] and removal of stop words via BASIS Technologies’ Rosette Language Processing (RLP) tools [31, 32]. For more details on these steps, see Supplementary Section ‘HealthMap preprocessing’.…”
Section: Methods (mentioning)
confidence: 99%
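
The preprocessing steps named in the statement above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the RLP tools the quoted paper used. The stop-word list, the regular expressions, and the function names `strip_markup`, `tokenize` and `preprocess` are all assumptions made for this example, and lemmatization is left as a pluggable hook that defaults to doing nothing.

```python
import html
import re

# Hypothetical stop-word list for illustration; real systems use far larger ones.
STOP_WORDS = {"a", "an", "the", "of", "in", "and", "is", "are", "to"}

TAG_RE = re.compile(r"<[^>]+>")       # anything that looks like an HTML tag
TOKEN_RE = re.compile(r"[a-z0-9]+")   # runs of lowercase letters or digits


def strip_markup(raw):
    """Drop HTML tags and decode entities (a stand-in for 'non-textual elements')."""
    return html.unescape(TAG_RE.sub(" ", raw))


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def preprocess(raw, lemmatize=lambda token: token):
    """Strip markup, tokenize, lemmatize (identity by default), drop stop words."""
    tokens = tokenize(strip_markup(raw))
    lemmas = [lemmatize(t) for t in tokens]
    return [t for t in lemmas if t not in STOP_WORDS]


print(preprocess("<p>Cases of the flu are rising in Peru &amp; Chile.</p>"))
```

Stop words are filtered after lemmatization here so that inflected forms are normalized before the lookup; the order of the steps in the quoted pipeline is not specified beyond the list given.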
“…Tokenization and lemmatization. Tokenization [25, 26] is the process of segmenting a textual content into words, phrases, symbols or other meaningful elements commonly referred to as tokens. Lemmatization [27] is performed after tokenization and can be defined as the normalization process in which various inflected forms of a word are converted to the same underlying lemma so that they can be analyzed as a single term.…”
Section: /21 (mentioning)
confidence: 99%
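
To make the two definitions above concrete, here is a small sketch in Python. The lemma table `LEMMAS` is a toy dictionary invented for this example; production lemmatizers rely on morphological analysis or much larger lexicons, and nothing here reflects the tools used in the quoted paper.

```python
import re

# Toy lemma table, invented for this example only.
LEMMAS = {
    "ran": "run", "running": "run", "runs": "run",
    "mice": "mouse",
    "is": "be", "was": "be", "were": "be",
}


def tokenize(text):
    """Split text into word tokens and single punctuation symbols."""
    return re.findall(r"\w+|[^\w\s]", text)


def lemmatize(token):
    """Map an inflected form to its lemma; unknown tokens are only lowercased."""
    lowered = token.lower()
    return LEMMAS.get(lowered, lowered)


tokens = tokenize("The mice were running, then one ran.")
lemmas = [lemmatize(t) for t in tokens]
print(tokens)
print(lemmas)
```

In this sketch "running" and "ran" both map to the lemma "run", so a later counting or indexing step would treat them as a single term.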