Findings of the Association for Computational Linguistics: NAACL 2022
DOI: 10.18653/v1/2022.findings-naacl.56

Challenging America: Modeling language in longer time scales

Abstract: The aim of the paper is to apply, for historical texts, the methodology used commonly to solve various NLP tasks defined for contemporary data, i.e. pre-train and fine-tune large Transformer models. This paper introduces an ML challenge, named Challenging America (ChallAm), based on OCR-ed excerpts from historical newspapers collected from the Chronicling America portal. ChallAm provides a dataset of clippings, labeled with metadata on their origin, and paired with their textual contents retrieved by an OCR t…

Cited by 3 publications (5 citation statements)
References 15 publications
“…Several textual benchmarks concerning the date of text publication have been published in recent years. Challenging America [3] presents a set of three temporal tasks. Authors of [5] introduce a temporal question answering task and dataset, in which the query's answer depends on a year, e.g., Who is the current president of the USA?.…”
Section: A Temporal Language Datasets and Models (mentioning)
confidence: 99%
“…The final split ratio is illustrated in Table I. Precautions similar to those described in [3] have been taken to ensure that there is no detrimental overlap between the datasets.…”
Section: A Data Split (mentioning)
confidence: 99%
“…The competition dataset is based on the project "Challenging America" [16], which was initially created for three tasks. The first task, known as "RetroTemp", focused on temporal classification.…”
Section: B Dataset Description (mentioning)
confidence: 99%
“…[18] trained an SVM model to predict the date of text as a classification problem and [11] use approach of neologism based approach. Very recently [15] released temporal NLP challenges based on a large corpus of historic texts but didn't include downstream tasks, such as classification. The corpus consists of texts covering over 100 years.…”
Section: Related Work (mentioning)
confidence: 99%