2009 Seventh Brazilian Symposium in Information and Human Language Technology 2009
DOI: 10.1109/stil.2009.8

Evaluation of Stopwords Removal on the Statistical Approach for Automatic Term Extraction

Abstract: The construction of terminological products is important to the organization and spreading of knowledge. This task can be leveraged by the automatic extraction of terms, which has been considered a Natural Language Processing problem. In this paper, the interaction between the statistical approach to term extraction and the process of stopword removal is investigated. Experiments conducted on two corpora show that stopword removal improves performance when extracting bigram terms, no matter if the removal is d…
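As a rough illustration of the interaction the abstract describes, the sketch below counts candidate bigrams with and without stopword removal. It is a minimal example under stated assumptions: raw bigram frequency stands in for whatever statistical measures the paper actually evaluates, and the stop list, the sample sentence, and the names `STOPWORDS` and `candidate_bigrams` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop list, for illustration only (not taken from the paper).
STOPWORDS = {"the", "of", "a", "is", "in", "to", "and"}


def candidate_bigrams(text, stopwords, remove_stopwords=True):
    """Count adjacent token pairs, optionally dropping stopwords first.

    Raw frequency is an assumed stand-in for a statistical term measure.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in stopwords]
    # Pair each token with its successor to form bigrams.
    return Counter(zip(tokens, tokens[1:]))


text = "the extraction of terms is a problem in the extraction of terms"

with_removal = candidate_bigrams(text, STOPWORDS)
without_removal = candidate_bigrams(text, STOPWORDS, remove_stopwords=False)

# Most frequent candidates in each setting.
print(with_removal.most_common(3))
print(without_removal.most_common(3))
```

In this toy sentence, removing stopwords first lets "extraction terms" surface as a candidate bigram, whereas without removal the counts are dominated by pairs that contain function words.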

Citations: cited by 6 publications (6 citation statements)
References: 6 publications
“…Stopwords are words, generally function words, that should not be considered in forming the text [4]. The removal process consists of identifying them in the texts, something that can be done through a statistical approach, and removing them.…”
Section: B. Data Pre-processing (unclassified)
“…The removal process consists of identifying them in the texts, something that can be done through a statistical approach, and removing them. According to [4], this technique benefits model construction because it reduces the number of inputs to the artificial neural networks. This facilitates the machine learning process.…”
Section: B. Data Pre-processing (unclassified)
“…These words are removed to enhance computation, they don't actually relate to the information needs of the documents. Stop word removal improves performance when extracting bigram terms [3]. Stop words were removed by identifying a list of standard stop words, a table was created out of a static stop list, each token was matched against the table, hashing operation was done and the text were built into the lexical analyzer.…”
Section: Text Pre-processing (mentioning)
confidence: 99%
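The static stop-list lookup described in the statement above can be sketched as follows. This is only an illustration: the stop list contents and the names `STOP_LIST`, `STOP_TABLE`, and `remove_stop_words` are hypothetical, and a Python `frozenset` is used as the hashed table, which is an assumption about how the citing paper's table might be realized.

```python
# Hypothetical static stop list; the citing paper does not give its contents.
STOP_LIST = ("a", "an", "the", "of", "and", "to", "in", "is")

# A frozenset is hash-based, so each membership test is a hashed lookup
# rather than a scan of the whole list.
STOP_TABLE = frozenset(STOP_LIST)


def remove_stop_words(tokens, table=STOP_TABLE):
    """Return the tokens that do not match any entry in the stop table."""
    return [t for t in tokens if t.lower() not in table]


tokens = ["The", "marking", "of", "an", "essay"]
filtered = remove_stop_words(tokens)
print(filtered)
```

Building the table once and reusing it keeps the per-token cost constant on average, which is the usual reason for hashing a static stop list.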
“…Vector space model otherwise known as term vector model is an algebraic model for representing text documents as vectors of identifiers, such as index terms. It is used in information filtering, information retrieval, indexing and relevancy rankings [3]. In this study, the vector space model was used to implement the text representation of essay-type marking scheme and essay-type student script.…”
Section: Vector Space Model (mentioning)
confidence: 99%
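A minimal sketch of the vector space representation mentioned in the statement above: each text becomes a vector of term counts, and two vectors can then be compared. Cosine similarity is an assumption here, since the quoted passage does not say how the marking scheme and the student script were compared; the sample strings and function names are likewise hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a sparse vector of term frequencies."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty text has no direction to compare.
    return dot / (norm_a * norm_b)


# Hypothetical marking scheme and student script.
scheme = term_vector("stop words are removed before indexing")
script = term_vector("stop words are removed")
score = cosine_similarity(scheme, script)
print(round(score, 3))
```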