2023
DOI: 10.1007/s10579-023-09664-1
Assessing linguistic generalisation in language models: a dataset for Brazilian Portuguese

Abstract: Much recent effort has been devoted to creating large-scale language models. Nowadays, the most prominent approaches are based on deep neural networks, such as BERT. However, they lack transparency and interpretability, and are often seen as black boxes. This affects not only their applicability in downstream tasks but also the comparability of different architectures or even of the same model trained using different corpora or hyperparameters. In this paper, we propose a set of intrinsic evaluation tasks that…

Cited by 1 publication (1 citation statement)
References 46 publications
“…were the first to propose such an architecture by considering all possible spans of text in the document and assigning coreference links based on the mention score between a pair of spans. There are also end-to-end coreference resolution systems for French, such as DeCOFre (Grobol, 2020) and coFR (Wilkens et al., 2020). DeCOFre is trained primarily on spontaneous spoken language (ANCOR corpus; Muzerelle et al., 2013), while coFR is trained on both spoken (ANCOR corpus) and written language (Democrat corpus; Landragin, 2016).…”
Section: Coreference Chains
confidence: 99%