2015
DOI: 10.1002/asi.23452

A quantitative analysis of the temporal effects on automatic text classification

Abstract: Automatic text classification (TC) continues to be a relevant research topic and several TC algorithms have been proposed. However, the majority of TC algorithms assume that the underlying data distribution does not change over time. In this work, we are concerned with the challenges imposed by the temporal dynamics observed in textual data sets. We provide evidence of the existence of temporal effects in three textual data sets, reflected by variations observed over time in the class distribution, in the pair…

Citations: cited by 9 publications (3 citation statements)
References: 43 publications
“…The main evaluation measures applied in this study are the F1 measure, and rand index (RI). F1 measure is the harmonic mean between precision and recall, and is also widely used to measure clustering [45, 46], which is calculated in the standard approach, i.e., Eq (12). Precision and recall are widely used in classification and clustering tasks for measuring the relevance [47], and are defined as Eqs (13) and (14) respectively.…”
Section: Dataset and Experimental Results (mentioning)
confidence: 99%
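
The citing paper's Eqs (12) to (14) are not reproduced in this statement. As a rough illustration only, the standard textbook definitions of precision, recall, F1, and the pairwise Rand index can be sketched as follows. The function names and the convention of returning 0.0 for an undefined ratio are assumptions made for this sketch, not something taken from the citing paper.

```python
from itertools import combinations
from typing import Sequence, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts.

    Assumption for this sketch: an undefined ratio (zero denominator)
    is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def rand_index(true_labels: Sequence[int], pred_labels: Sequence[int]) -> float:
    """Fraction of item pairs on which two labelings agree.

    A pair counts as an agreement when both labelings put the two items
    together, or both put them apart.
    """
    pairs = list(combinations(range(len(true_labels)), 2))
    if not pairs:
        return 1.0  # assumption: fewer than two items is trivial agreement
    agreements = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


if __name__ == "__main__":
    print(precision_recall_f1(tp=8, fp=2, fn=4))
    print(rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

The pairwise loop is quadratic in the number of items, which is fine for illustration but not for large collections.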
“…With large‐scale, constantly increasing ecologically relevant information, it is important to rapidly comprehend and classify the new diversifications of wetland documents. Aiming to alleviate the human labor costs for analyzing large‐scale sets of wetland documents, the technology of automatic text modeling has emerged to deal with these issues (Salles et al, ). The methods consist automatic allocation of a text document to a set of predefined categories using machine learning tricks.…”
Section: Introduction (mentioning)
confidence: 99%
“…Today's large‐scale text data, our case study in this research, are also an important high‐dimensional application field where the space regularly contains at least several thousands of distinct terms and is very sparse and noisy. We focus on text classification, which is a useful subdiscipline of data mining and is an active research path nowadays. Given a set of text documents with known class labels, text classification intends to predict the class of new instances.…”
Section: Introduction (mentioning)
confidence: 99%