2008
DOI: 10.1007/s10579-008-9075-7

Multilingual collocation extraction with a syntactic parser

Abstract: An impressive amount of work was devoted over the past few decades to collocation extraction. The state of the art shows that there is a sustained interest in the morphosyntactic preprocessing of texts in order to better identify candidate expressions; however, the treatment performed is, in most cases, limited (lemmatization, POS-tagging, or shallow parsing). This article presents a collocation extraction system based on the full parsing of source corpora, which supports four languages: English, French, Spani…

Cited by 39 publications (19 citation statements)
References: 20 publications

“…Once the target file has been found, the sentence that is likely to be the translation of the source sentence is identified using an in-house sentence alignment method (Nerima et al, 2003;Seretan, 2008). This method consists of comparing the relative lengths of paragraphs in the source and target documents in order to find a paragraph alignment, and then assuming a 1:1 match for sentences inside a paragraph.…”
Section: Sentence Alignment (mentioning)
confidence: 99%
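The statement above describes the alignment only in outline, so the following is a minimal sketch of that outline rather than the cited in-house method. It assumes paragraph length is measured in characters, that paragraphs are aligned by a simple dynamic program over relative lengths, and that the 1:1 sentence assumption is only trusted when both paragraphs contain the same number of sentences. The names `relative_lengths`, `align_paragraphs`, `find_translation`, and the `skip_cost` parameter are hypothetical.

```python
from typing import List, Optional, Tuple

Paragraph = List[str]  # a paragraph is represented as a list of sentences


def relative_lengths(doc: List[Paragraph]) -> List[float]:
    """Length of each paragraph as a fraction of the document length (in characters)."""
    sizes = [sum(len(sentence) for sentence in para) for para in doc]
    total = sum(sizes) or 1  # avoid division by zero on empty documents
    return [size / total for size in sizes]


def align_paragraphs(src: List[Paragraph], tgt: List[Paragraph],
                     skip_cost: float = 0.1) -> List[Tuple[int, int]]:
    """Monotone paragraph alignment that minimises relative-length differences.

    skip_cost is an assumed penalty for leaving a paragraph unaligned.
    """
    a, b = relative_lengths(src), relative_lengths(tgt)
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[""] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                c = cost[i - 1][j - 1] + abs(a[i - 1] - b[j - 1])
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, "match"
            if i > 0:
                c = cost[i - 1][j] + skip_cost
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, "skip_src"
            if j > 0:
                c = cost[i][j - 1] + skip_cost
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, "skip_tgt"
    # Trace back from the end to recover the matched paragraph pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "skip_src":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


def find_translation(src: List[Paragraph], tgt: List[Paragraph],
                     para_idx: int, sent_idx: int) -> Optional[str]:
    """Return the target sentence assumed to translate src[para_idx][sent_idx]."""
    for s, t in align_paragraphs(src, tgt):
        if s == para_idx:
            # Assumption: only trust the 1:1 match when sentence counts agree.
            if len(src[s]) == len(tgt[t]) and 0 <= sent_idx < len(tgt[t]):
                return tgt[t][sent_idx]
            return None
    return None
```

For example, with a two-paragraph source and a two-paragraph target of similar proportions, `find_translation(src, tgt, 0, 1)` returns the second sentence of the first target paragraph. Returning `None` when sentence counts differ is a conservative choice made for this sketch, not something the statement specifies.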
“…To evaluate the performance of our extractor based on deep parsing, we performed several cross-language evaluation experiments in which we compared the two approaches, i.e., the syntax-based approach vs. the syntax-free approach (Seretan, 2008). For example, in an experiment conducted on English, French, Italian and Spanish data consisting of 2,000 pairs sampled from different levels of the significance list - ranging from top to 10% of the list - we measured the extraction precision by taking into account reference annotations produced by expert linguists.…”
Section: Collocation Extraction Evaluation (mentioning)
confidence: 99%
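The precision measurement mentioned in this statement can be illustrated with a short sketch. It assumes each sampled pair carries a single boolean reference judgment (true collocation or not) and that samples are grouped by the level of the significance list they were drawn from. The function names and the grouping keys are hypothetical, and no data from the cited experiment is reproduced here.

```python
from typing import Dict, List


def precision(judgments: List[bool]) -> float:
    """Share of sampled pairs that the reference annotation marks as correct."""
    if not judgments:
        raise ValueError("precision is undefined for an empty sample")
    return sum(judgments) / len(judgments)


def precision_by_level(samples: Dict[str, List[bool]]) -> Dict[str, float]:
    """Precision computed separately for each sampled level of the ranked list."""
    return {level: precision(judged) for level, judged in samples.items()}
```

Running `precision_by_level` once on the syntax-based output and once on the syntax-free output would give the per-level figures that such a comparison relies on.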
“…The ability of Twic/TwicPen to handle expressions comes from the quality of the linguistic analysis provided by the multilingual Fips parser and of the collocation knowledge base (Seretan et al, 2004, 2008). A sample analysis is given in (7b), showing how extraposed elements are connected with canonical empty positions, as assumed by generative linguists.…”
Section: Fig 2 Example Of a Collocation (mentioning)
confidence: 99%
“…As we already mentioned, the Greek MWE extractor is part of FipsCo, a larger extraction system based on a symbolic parsing technology (Seretan, 2008) which we previously applied on text corpora in other languages. The recent development of the Greek parser enabled us to extend it and apply it to Greek.…”
Section: Extraction (mentioning)
confidence: 99%
“…The tool relies on a symbolic parsing technology, and is part of FipsCo, a larger extraction system (Seretan, 2008) which has previously been used to build MWE resources for other languages, including English, French, Spanish, and Italian. Its extension to Greek will ultimately enable the inclusion of this language in the list of languages supported by an in-house translation system.…”
Section: Introduction (mentioning)
confidence: 99%