2018
DOI: 10.1007/978-3-319-99722-3_3

A Corpus Study of Verbal Multiword Expressions in Brazilian Portuguese

Abstract: Verbal multiword expressions (VMWEs) such as to make ends meet require special attention in NLP and linguistic research, and annotated corpora are valuable resources for studying them. Corpora annotated with VMWEs in several languages, including Brazilian Portuguese, were made freely available in the PARSEME shared task. The goal of this paper is to describe and analyze this corpus in terms of the characteristics of annotated VMWEs in Brazilian Portuguese. First, we summarize and exemplify the criteria used to…

Cited by 5 publications (6 citation statements)
References 11 publications

“…Most errors are due to the previous modules of the processing pipeline, particularly the PoS tagger and the statistical and rule-based disambiguator. Manual correction of these errors made it possible to achieve, in the strictest criterion, a recall of 91.9% and 93.6% for each type of examples, and an overall performance of 92.5%. In the near future, we intend to integrate the corresponding lexicon-grammar of Brazilian Portuguese [19] and perform an extrinsic evaluation using (or adapting) the Portuguese corpus developed in-house [3] and that built for the PARSEME project 5 [15,16].…”
Section: Discussion (mentioning)
confidence: 99%
“…Processing multiword expressions, including verbal idioms, is essential to represent the meaning of a text in an adequate way. The low frequency of many verbal idioms in corpora makes spotting them a difficult task [13], and much prior work has been dedicated to identifying them in texts [14,15,16]. However, the focus of this paper will not be on identification (in a lexicographic perspective), but rather on the processing of an already built computational lexicon (a lexicon-grammar) of verbal idioms [2,3], particularly of transformationally-derived, equivalent sentence-forms (for lack of space, this lexicon-grammar will not be presented here).…”
Section: Introduction (mentioning)
confidence: 99%
“…Language-specific PARSEME corpus description papers not covered here can provide details, e.g. for Basque, Chinese (Jiang et al, 2018), English, Irish, Italian (Monti and di Buono, 2019), Polish, Portuguese (Ramisch et al, 2018b), Romanian (Barbu Mititelu et al, 2019), Turkish (Berk et al, 2018b; Ozturk et al, 2022), among others. Domains: Corpus domain may play an important role in MWEI.…”
Section: Corpus Constitution and Selection (mentioning)
confidence: 99%
“…To overcome this limitation, data-driven approaches opt instead to make use of frequency statistics in the corpus to address both candidate generation and quality estimation [9], [30], [34], [7], [10], [23]. They do not rely on complex linguistic feature generation, domain-specific rules or extensive labeling efforts.…”
Section: Related Work (mentioning)
confidence: 99%