Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment - 2003
DOI: 10.3115/1119282.1119289

Extracting multiword expressions with a semantic tagger

Abstract: Automatic extraction of multiword expressions (MWE) presents a tough challenge for the NLP community and corpus linguistics. Although various statistically driven or knowledge-based approaches have been proposed and tested, efficient MWE extraction still remains an unsolved issue. In this paper, we present our research work in which we tested approaching the MWE issue using a semantic field annotator. We use an English semantic tagger (USAS) developed at Lancaster University to identify multiword units which d…

Citations: cited by 24 publications (16 citation statements)
References: 8 publications
“…This is due, as observed by Piao et al (2003), to the fact that the extraction of the MWE candidates is based only on the frequency of the bigrams, and only after the extraction of these candidates we applied the linguistic information (classification in grammatical classes).…”
Section: Comparison Between the 500-best Candidates Of Each Tool (mentioning)
confidence: 99%
“…For their data, for example, Piao et al (2003) found that 81.88% of the recognized MWEs were bigrams. The current study uses CETENFolha (Corpus de Extractos de Textos Electrónicos/NILC Folha de São Paulo) as a Brazilian Portuguese corpus, available on the Linguateca Portuguesa website, which is part of a project on the automatic processing of the Portuguese language (Kinoshita et al, 2006).…”
Section: Acknowledgments (mentioning)
confidence: 99%
“…We found 49,589 bigrams in the selected excerpts of texts, and the manual evaluation of each one, in order to decide which one is a MWE, would take too much time. So, we estimated the amount of MWEs for the total 49,589 bigrams as in (Piao et al, 2003). Using 100 excerpts of text we generated all the bigrams, with all frequencies.…”
Section: Comparison Of Different Classification Algorithms (mentioning)
confidence: 99%
“…After this step, association measures are computed. Piao et al (2003) use, what they call, a semantic field annotator. They use a semantic tagger for the English language called USAS, developed in Lancaster University.…”
Section: Introduction (mentioning)
confidence: 99%
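
Several of the citing statements above describe a pipeline in which bigram candidates are first extracted by frequency and association measures are computed afterwards. A minimal Python sketch of that general idea follows. It is an illustration only: the function name `bigram_pmi`, the `min_freq` threshold, the toy sentence, and the choice of pointwise mutual information (PMI) as the association measure are all assumptions, not details taken from Piao et al. (2003) or from the citing papers, and the sketch does not reproduce the USAS semantic-tagger approach.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_freq=2):
    """Rank adjacent word pairs by PMI (hypothetical helper, illustration only)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scored = []
    for (w1, w2), freq in bigrams.items():
        # Step 1: frequency filter, candidates are kept on raw counts alone.
        if freq < min_freq:
            continue
        # Step 2: association measure, here PMI = log2(P(w1,w2) / (P(w1) * P(w2))).
        p_pair = freq / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scored.append(((w1, w2), freq, math.log2(p_pair / (p_w1 * p_w2))))

    # Highest-scoring candidates first.
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Toy input (an assumption for the example, not data from any cited study).
tokens = ("the prime minister met the prime minister of "
          "new zealand in new zealand").split()
candidates = bigram_pmi(tokens)
for pair, freq, score in candidates:
    print(pair, freq, round(score, 3))
```

On this toy input the frequency filter also lets through ("the", "prime"), which is not a multiword expression. That is the weakness one citing statement points to: when candidates are selected on bigram frequency alone, linguistic information can only be applied after the fact.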