2023
DOI: 10.1007/s10579-023-09664-1
Assessing linguistic generalisation in language models: a dataset for Brazilian Portuguese

Abstract: Much recent effort has been devoted to creating large-scale language models. Nowadays, the most prominent approaches are based on deep neural networks, such as BERT. However, they lack transparency and interpretability, and are often seen as black boxes. This affects not only their applicability in downstream tasks but also the comparability of different architectures or even of the same model trained using different corpora or hyperparameters. In this paper, we propose a set of intrinsic evaluation tasks that…

Cited by 1 publication (1 citation statement)
References 46 publications
“…were the first to propose such an architecture by considering all possible spans of text in the document and assigning coreference links based on the mention score between a pair of spans. There are also end-to-end coreference resolution systems for French, such as DeCOFre (Grobol, 2020) and coFR (Wilkens et al., 2020). DeCOFre is trained primarily on spontaneous spoken language (ANCOR corpus; Muzerelle et al., 2013), while coFR is trained on both spoken (ANCOR corpus) and written language (Democrat corpus; Landragin, 2016).…”
Section: Coreference Chains
confidence: 99%