Building for Tomorrow: Assessing the Temporal Persistence of Text Classifiers

Alkhalifa, Rabab; Kochkina, Elena; Zubiaga, Arkaitz

doi:10.48550/arxiv.2205.05435

Cited by 2 publications

(2 citation statements)

References 0 publications

“…(2) Supervised machine learning methods use labeled data to classify and predict. However, these methods are sensitive to changes in the data distribution, such as shifts in the domain or temporal changes, as shown in recent research (Alkhalifa et al, 2022; AL-Sharuee et al, 2021; Bjerva et al, 2019, inter alia). This sensitivity may potentially affect the accuracy of our analysis, particularly since our questions focus on changes manifested in the data over time.…”

Section: Lexicons As Sentiment Classifiers (mentioning; confidence: 99%)

Rant or rave: variation over time in the language of online reviews

Ziser; Webber; Cohen (2023). Language Resources and Evaluation.

We examine how the language of online reviews has changed over the past 20 years. The corpora we use for this analysis consist of online reviews, each of which is paired with a numerical rating. This allows us to control for the perceived sentiment of a review when examining its linguistic features. Our findings show that reviews have become less comprehensive, and more polarized and intense. We further analyzed two subgroups to understand these trends: (1) reviews labeled “helpful” and (2) reviews posted by persistent users. These trends also exist for helpful reviews (albeit in a weaker form), suggesting that the nature of reviews perceived as helpful is also changing. A similar pattern can be observed in reviews by persistent users, suggesting that these trends are not simply associated with new users but represent changes in overall user behavior. Additional analysis of Booking.com reviews indicates that these trends may reflect the increasing use of mobile devices, whose interface encourages briefer reviews. Lastly, we discuss the implications for readers, writers, and online reviewing platforms.

“…It is motivated by recent research showing that the performance of the models drops as the test data becomes more distant, with respect to time, from the training data. This is true for classification [1,11,16], but also the research in information retrieval shows that deep neural network-based IR systems are dependent on the consistency between the train and test data [20]. To be able to study this, one needs several test collections created over sequential time periods, which would allow doing observations at different time stamps t, and most importantly, comparing the performance across different time stamps t and t′.…”

Section: Longeval Collections (mentioning; confidence: 99%)
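
The citation statement above describes measuring how performance changes as test data moves further in time from the training data, and comparing scores across time stamps t and t′. The following is a minimal sketch of that protocol under loudly stated assumptions: the word-weight classifier, the function names (`fit_lexicon`, `evaluate_over_time`, `relative_change`), and the review snippets with newly appearing slang are all hypothetical and do not come from any of the cited works. The model is trained once on the earliest period, scored on each later period, and the relative change between two time steps is reported.

```python
from collections import Counter
from typing import Dict, List, Mapping, Tuple

# HYPOTHETICAL toy setup: nothing below comes from the cited papers.
Example = Tuple[str, int]  # (text, label), label 1 = positive, 0 = negative


def fit_lexicon(train: List[Example]) -> Dict[str, int]:
    """Toy classifier: a word's weight is (#positive - #negative) occurrences."""
    weights: Counter = Counter()
    for text, label in train:
        for word in text.split():
            weights[word] += 1 if label == 1 else -1
    return dict(weights)


def predict(weights: Mapping[str, int], text: str) -> int:
    # Unseen words contribute 0; a total of 0 falls back to the negative class.
    total = sum(weights.get(word, 0) for word in text.split())
    return 1 if total > 0 else 0


def accuracy(weights: Mapping[str, int], data: List[Example]) -> float:
    correct = sum(predict(weights, text) == label for text, label in data)
    return correct / len(data)


def evaluate_over_time(train: List[Example],
                       test_sets: Mapping[int, List[Example]]) -> Dict[int, float]:
    """Train once on the earliest period, then score each time step in order."""
    model = fit_lexicon(train)
    return {t: accuracy(model, data) for t, data in sorted(test_sets.items())}


def relative_change(scores: Mapping[int, float], t: int, t_prime: int) -> float:
    """Relative change in score from t to t'; negative values mean degradation."""
    return (scores[t_prime] - scores[t]) / scores[t]


# Invented reviews in which new slang ("fire", "goated", "mid") appears over time.
train = [("great battery", 1), ("awesome battery", 1),
         ("terrible battery", 0), ("awful battery", 0)]
test_sets = {
    0: [("great battery", 1), ("awful battery", 0)],
    1: [("great battery", 1), ("fire battery", 1),
        ("awful battery", 0), ("mid battery", 0)],
    2: [("fire battery", 1), ("goated battery", 1),
        ("mid battery", 0), ("awful battery", 0)],
}

scores = evaluate_over_time(train, test_sets)
print(scores)                         # {0: 1.0, 1: 0.75, 2: 0.5}
print(relative_change(scores, 0, 2))  # -0.5
```

Keeping the model frozen after training on the earliest period is what isolates the temporal effect here: the drop comes only from vocabulary the model never saw. The tie-breaking rule in `predict` is an arbitrary choice for the toy example.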

LongEval-Retrieval: French-English Dynamic Test Collection for Continuous Web Search Evaluation

Deveaud; Gonzalez-Saez; Mulhem; et al. (2023). Preprint.

LongEval-Retrieval is a Web document retrieval benchmark that focuses on continuous retrieval evaluation. This test collection is intended to be used to study the temporal persistence of Information Retrieval systems and will be used as the test collection in the Longitudinal Evaluation of Model Performance Track (LongEval) at CLEF 2023. This benchmark simulates an evolving information system environment, such as the one a Web search engine operates in, where the document collection, the query distribution, and relevance all move continuously, while following the Cranfield paradigm for offline evaluation. To do that, we introduce the concept of a dynamic test collection that is composed of successive sub-collections, each representing the state of an information system at a given time step. In LongEval-Retrieval, each sub-collection contains a set of queries, documents, and soft relevance assessments built from click models. The data comes from Qwant, a privacy-preserving Web search engine that primarily focuses on the French market. LongEval-Retrieval also provides a "mirror" collection: it is initially constructed in the French language to benefit from the majority of Qwant's traffic, before being translated to English. This paper presents the creation process of LongEval-Retrieval and provides baseline runs and analysis.

CCS Concepts: Information systems → Test collections; Relevance assessment; Multilingual and cross-lingual retrieval.
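
The dynamic test collection described in this abstract can be pictured as an ordered sequence of sub-collections, each holding queries, documents, and soft relevance assessments for one time step. The sketch below is illustrative only: the `SubCollection` layout and its field names are hypothetical (they do not reproduce LongEval's actual file formats), relevance assessments are assumed to be floats, and nDCG@k with linear gains is an assumed metric choice, not necessarily the one used in LongEval. A fixed ranking is scored against two snapshots whose relevance assessments have shifted.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


# HYPOTHETICAL layout of one sub-collection; field names are illustrative only.
@dataclass
class SubCollection:
    time_step: str                      # label of the snapshot, e.g. "t" or "t'"
    queries: Dict[str, str]             # query id -> query text
    documents: Dict[str, str]           # doc id -> document text
    qrels: Dict[str, Dict[str, float]]  # query id -> {doc id: soft relevance}


def dcg(gains: List[float]) -> float:
    # Linear gains discounted by log2(rank + 1), with ranks starting at 1.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranking: List[str], judgments: Dict[str, float], k: int = 10) -> float:
    gains = [judgments.get(doc, 0.0) for doc in ranking[:k]]
    ideal = dcg(sorted(judgments.values(), reverse=True)[:k])
    return dcg(gains) / ideal if ideal > 0 else 0.0


def mean_ndcg(sub: SubCollection, run: Dict[str, List[str]], k: int = 10) -> float:
    """Average nDCG@k of a run over all queries in one sub-collection."""
    scores = [ndcg_at_k(run.get(qid, []), sub.qrels.get(qid, {}), k) for qid in sub.queries]
    return sum(scores) / len(scores) if scores else 0.0


# Two invented snapshots of the same query whose soft relevance labels have shifted.
docs = {"d1": "first document", "d2": "second document"}
sub_t = SubCollection("t", {"q1": "example query"}, docs, {"q1": {"d1": 1.0, "d2": 0.5}})
sub_t_prime = SubCollection("t'", {"q1": "example query"}, docs, {"q1": {"d1": 0.5, "d2": 1.0}})
run = {"q1": ["d1", "d2"]}  # the same frozen ranking evaluated at both time steps

for sub in (sub_t, sub_t_prime):
    print(sub.time_step, round(mean_ndcg(sub, run, k=10), 3))
```

Because the ranking is held fixed, the lower score at t′ reflects only the shift in relevance assessments; a fuller comparison would also re-run the system on each snapshot's documents. Linear gains are used for simplicity; the exponential 2^rel − 1 gain is another common variant.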
