Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016) 2016
DOI: 10.18653/v1/s16-1140

UFRGS&LIF at SemEval-2016 Task 10: Rule-Based MWE Identification and Predominant-Supersense Tagging

Abstract: This paper presents our approach towards the SemEval-2016 Task 10 - Detecting Minimal Semantic Units and their Meanings. Systems are expected to provide a representation of lexical semantics by (1) segmenting tokens into words and multiword units and (2) providing a supersense tag for segments that function as nouns or verbs. Our pipeline rule-based system uses no external resources and was implemented using the mwetoolkit. First, we extract and filter known MWEs from the training corpus. Second, we group input…
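
The first pipeline step described in the abstract, extracting and filtering known MWEs from the annotated training corpus, can be illustrated with a short sketch. This is not the mwetoolkit implementation; the corpus representation (per-token lemma, POS, and MWE group id) and the simple frequency filter are assumptions made for illustration only.

```python
# Hypothetical sketch of step (1): collecting known MWEs from an annotated
# training corpus as lemma sequences for later lookup. This is NOT the
# mwetoolkit API; the corpus format and field names are assumptions.
from collections import Counter

def extract_known_mwes(training_sentences, min_count=1):
    """training_sentences: list of sentences; each sentence is a list of
    (lemma, pos, mwe_group) tuples, where mwe_group is None for single
    words and a shared integer id for tokens of the same MWE."""
    counts = Counter()
    for sentence in training_sentences:
        groups = {}
        for lemma, pos, mwe_group in sentence:
            if mwe_group is not None:
                groups.setdefault(mwe_group, []).append(lemma.lower())
        for lemmas in groups.values():
            if len(lemmas) > 1:
                counts[tuple(lemmas)] += 1
    # Simple frequency filter: keep MWEs seen at least min_count times.
    return {mwe for mwe, c in counts.items() if c >= min_count}

# Example: "take a look" annotated as one MWE (group 1) plus single tokens.
train = [[("take", "VERB", 1), ("a", "DET", 1), ("look", "NOUN", 1),
          ("at", "ADP", None), ("this", "DET", None)]]
print(extract_known_mwes(train))  # {('take', 'a', 'look')}
```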

Cited by 10 publications (10 citation statements); references 12 publications.
“…A comparable result was observed in the 2016 DiMSUM shared task (Schneider et al., 2014), in which a rule-based baseline was ranked second. This system extracted MWEs from the training corpus and then annotated them in the test corpus based on lemma/part-of-speech matching and heuristics such as allowing a limited number of intervening words (Cordeiro et al., 2016).…”
Section: Progress Potential in Seen Data
Citation type: mentioning; confidence: 99%
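
The gappy-matching heuristic described in this statement, projecting a known MWE lemma sequence onto a new sentence while allowing a limited number of intervening words, can be sketched as follows. The function and parameter names (match_gappy, max_gap) are illustrative and not taken from the actual system.

```python
# Hedged sketch of the gappy-matching heuristic: a known MWE (lemma
# sequence) is matched in a sentence if its lemmas appear in order with
# at most `max_gap` intervening tokens between consecutive components.
def match_gappy(sentence_lemmas, mwe_lemmas, max_gap=2):
    """Return token indices of the first in-order match of mwe_lemmas in
    sentence_lemmas with at most max_gap intervening tokens, or None."""
    n = len(sentence_lemmas)
    for start in range(n):
        if sentence_lemmas[start] != mwe_lemmas[0]:
            continue
        indices, pos = [start], start
        for target in mwe_lemmas[1:]:
            found = None
            # Look ahead at most max_gap tokens beyond the previous match.
            for j in range(pos + 1, min(pos + 2 + max_gap, n)):
                if sentence_lemmas[j] == target:
                    found = j
                    break
            if found is None:
                break
            indices.append(found)
            pos = found
        else:
            return indices
    return None

sent = ["he", "take", "a", "quick", "look", "at", "it"]
print(match_gappy(sent, ["take", "look"], max_gap=2))  # [1, 4]
```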
“…The traditional CE techniques interpret any single and multiple token nominal chunk as a concept [32] or do a dictionary lookup, as, e.g., DBpedia Spotlight [5], which matches and links identified nominal chunks with DBpedia entries (6.6M entities, 13 billion RDF triples), based on the Apache OpenNLP models for phrase chunking and named entity recognition (NER). Given the large coverage of DBpedia, the performance of DBpedia Spotlight is rather competitive.…”
Section: Related Work
Citation type: mentioning; confidence: 99%
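
As a rough illustration of the dictionary-lookup style of concept extraction mentioned here, the sketch below queries DBpedia Spotlight's public annotation service and returns the surface forms it links to DBpedia URIs. The endpoint URL and JSON field names reflect the publicly documented demo service and may differ from the deployment used in the cited work; treat them as assumptions.

```python
# Minimal sketch of dictionary-lookup concept extraction via DBpedia
# Spotlight's public annotation endpoint. Endpoint and JSON field names
# are assumptions based on the public demo service, not the cited code.
import requests

def spotlight_annotate(text, confidence=0.5,
                       endpoint="https://api.dbpedia-spotlight.org/en/annotate"):
    resp = requests.get(endpoint,
                        params={"text": text, "confidence": confidence},
                        headers={"Accept": "application/json"},
                        timeout=10)
    resp.raise_for_status()
    data = resp.json()
    # Each resource links a surface form (nominal chunk) to a DBpedia URI.
    return [(r["@surfaceForm"], r["@URI"]) for r in data.get("Resources", [])]

if __name__ == "__main__":
    for surface, uri in spotlight_annotate("Multiword expressions are studied in NLP."):
        print(surface, "->", uri)
```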
“…Selection of n-grams as fragments of NP chunks that can form part of multiple token concepts. For this task, we formed the PoS-patterns based on the Penn Treebank tagset, which were inherited from the patterns for multiword expression detection introduced in [4] and expanded here, resulting in the following set: P = {N N, J N, V N, N J, J J, V J, N of N, N of DT N, N of J, N of DT J, N of V, N of DT V, CD N, CD J}, where N stands for "noun", i.e., NN|NNS|NNP|NNPS, J stands for "adjective", i.e., JJ|JJR|JJS, V for "verb" but limited to VBD|VBG|VBN, CD for "cardinal number", DT for "determiner", and "of" is the literal preposition. Each pattern matches an n-gram with two open-class lexical items and at most two auxiliary tokens between them.…”
Section: Compilation of the Training Corpus
Citation type: mentioning; confidence: 99%
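
The quoted pattern set can be made concrete with a small matcher over Penn Treebank tag sequences. The tag groupings below follow the quote; the matching code itself (extract_candidates and its windowing) is an assumption for illustration, not the cited authors' implementation.

```python
# Illustrative re-implementation of the quoted PoS patterns: each pattern
# pairs two open-class items (N, J, V, CD) with at most two auxiliary
# tokens ("of", DT) in between. Tag groupings follow the quote; the
# matcher itself is an assumption for illustration.
N = {"NN", "NNS", "NNP", "NNPS"}
J = {"JJ", "JJR", "JJS"}
V = {"VBD", "VBG", "VBN"}
CD = {"CD"}
DT = {"DT"}

PATTERNS = [
    # Each pattern is a sequence of slots: a tag set or a literal token.
    [N, N], [J, N], [V, N], [N, J], [J, J], [V, J], [CD, N], [CD, J],
    [N, "of", N], [N, "of", DT, N], [N, "of", J], [N, "of", DT, J],
    [N, "of", V], [N, "of", DT, V],
]

def slot_matches(slot, token, tag):
    return token.lower() == slot if isinstance(slot, str) else tag in slot

def extract_candidates(tagged_sentence):
    """tagged_sentence: list of (token, Penn-Treebank-tag) pairs."""
    spans = []
    for i in range(len(tagged_sentence)):
        for pattern in PATTERNS:
            window = tagged_sentence[i:i + len(pattern)]
            if len(window) == len(pattern) and all(
                    slot_matches(s, tok, tag)
                    for s, (tok, tag) in zip(pattern, window)):
                spans.append(" ".join(tok for tok, _ in window))
    return spans

sent = [("board", "NN"), ("of", "IN"), ("the", "DT"), ("directors", "NNS")]
print(extract_candidates(sent))  # ['board of the directors']
```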