Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022
DOI: 10.18653/v1/2022.acl-long.268
Detecting Unassimilated Borrowings in Spanish: An Annotated Corpus and Approaches to Modeling

Abstract: This work presents a new resource for borrowing identification and analyzes the performance and errors of several models on this task. We introduce a new annotated corpus of Spanish newswire rich in unassimilated lexical borrowings (words from one language that are introduced into another without orthographic adaptation) and use it to evaluate how several sequence labeling models (CRF, BiLSTM-CRF, and Transformer-based models) perform. The corpus contains 370,000 tokens and is larger, more borrowing-dense, OOV-r…

Cited by 4 publications (5 citation statements)
References 18 publications
“…While the aforementioned papers focus on corpus approaches to the collection and analysis of English words, some researchers (Görlach, 2001) resorted to non-corpus methods and sources such as questionnaires, personal interviews, and so on in their collection and analysis of English words in European languages. As English words are commonly found in the media, research naturally focuses on this register (Alvarez-Mellado, 2020; Boranijašević, 2018; Runjić-Stoilova & Pandža, 2010; Šabec, 2005).…”
Section: Related Work | Type: mentioning
confidence: 99%
“…Foreign words that have found their way into other world languages have been the focal point of many studies in the past few decades with special emphasis on English words due to their global spread through mass media and the internet (Alvarez-Mellado, 2020; Brdar, 2010; Furiassi, 2008). An English word or phrase, like any other foreign element, may undergo a level of adaptation to the language it is borrowed into, or, alternatively, may retain its original form in the recipient language.…”
Section: Introduction | Type: mentioning
confidence: 99%
“…There are multiple works related to Anglicisms detection in different languages, e.g. detecting Anglicisms in Spanish (Álvarez Mellado and Lignos, 2022). The article describes the creation of an annotated corpus of Spanish text containing examples of unassimilated borrowings, which can be used to train machine learning models to identify such borrowings in new texts.…”
Section: Related Work | Type: mentioning
confidence: 99%
“…The obtained results coincide with the work of (Leidig et al, 2014), where the authors tried the combination of several features (G2P confidence, grapheme perplexity, Google hits count) to detect Anglicisms in German and achieved a 0.75 F1 score. The work (Mellado et al, 2021) devoted to the same task for Spanish, presented in IberLef 2021, reported F1 scores ranging from 0.37 to 0.85. In addition, another research for the Norwegian language (Andersen, 2005) is devoted to Anglicism extraction using a combination of methods (rule-based, lexicon-based, and chargram-based).…”
Section: Anglicism Detection | Type: mentioning
confidence: 99%
“…As the global language, English has become the dominant donor language for many languages, including Croatian (Drljača Margić 2011). The influence of English has been observed in many languages worldwide (e.g., Greenall 2015; Kay 1995; Pulcini et al 2012) and in different functional styles (e.g., Alvarez-Mellado 2020; Čepon 2017; Mihaljević 2003) and domains (e.g., Matić 2017; Mykytka 2017), especially the media (e.g., Alvarez-Mellado 2020; Brdar 2010; Núñez Nogueroles 2016). The media has been recognized as an important factor in the shaping of a language and introducing new words (e.g., Drljača Margić 2009; Muhvić-Dimanovski and Skelin Horvat 2008).…”
Section: Introduction | Type: mentioning
confidence: 99%