Syntax-Based Extraction

Seretan, Violeta

doi:10.1007/978-94-007-0134-2_4

Cited by 4 publications

(8 citation statements)

References 18 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…(Osgood, 1952, pp. 54-5) Seretan (2011) ties these notions of frequency and predictability to key principles of statistics, namely: tendency and typicality. She highlights how scholars have defined a collocation as a typical, specific, and characteristic combination of words which are arbitrary, recurrent word combinations.…”

Section: Definitions Of Collocation and Methods Of Identificationmentioning

confidence: 99%

“…In many second language studies, it has been set at one or two words to only capture adjacent pairs (e.g., Bestgen, 2017;. Seretan (2011) recognises the dangers of a span approach because it may capture syntactic noisethat is, word pairs which have no syntactic relationship. This is shown in example [1], whereby a span approach might capture 'human rights', and 'human rights organisations' as pairings but also the unrelated 'human organisations'.…”

Section: Definitions Of Collocation and Methods Of Identificationmentioning

confidence: 99%

See 1 more Smart Citation

Shaping Writing Grades

McCallum

Durrant

2022

View full text Add to dashboard Cite

This Element explores relationships between collocations, writing quality, and learner and contextual variables in a first-year composition (FYC) programme. Comprising three studies, the Element is anchored in understanding phraseological complexity and its sub-constructs of sophistication and diversity. First, the authors look at sophistication through association measures. They tap into how these measures may tell us different types of information about collocation via a cluster analysis. Selected measures from this clustering are used in a cumulative links model to establish relationships between these measures, measures of diversity and measures of task, the language background of the writer and individual writer variation, and writing quality scores. A third qualitative study of the statistically significant predictors helps understand how writers use collocations and why they might be favoured or downgraded by raters. This Element concludes by considering the implications of this modelling for assessment.

show abstract

Section: Definitions Of Collocation and Methods Of Identificationmentioning

confidence: 99%

Section: Definitions Of Collocation and Methods Of Identificationmentioning

confidence: 99%

Shaping Writing Grades

McCallum

Durrant

2022

View full text Add to dashboard Cite

show abstract

“…To achieve this goal, we added a collocation database to each of the monolingual lexical databases, using the system for collocation extraction developed by Violeta Seretan and others at LATL (cf. Seretan and Wehrli, 2009 ; Seretan, 2011 ). This system extracts candidate-collocations from a corpus, filters those candidates using standard association measures.…”

Section: Treatment Of Mwes In a Linguistically-based Systemmentioning

confidence: 99%

“…Multiword expressions can be defined as complex lexical units made of more than one word (see Wehrli, 2000 , 2013 ; Sag et al, 2002 ; Seretan, 2011 ; Constant et al, 2017 , among many others), where word is taken, very crudely, as a minimal string of letters between spaces (or some punctuation characters), and lexical unit corresponds to a syntactic or semantic unit. The following examples illustrate the diversity of (English) multiword expressions, and the discussion below will make those definitions clear.…”

Section: Collocations and Multiword Expressionsmentioning

confidence: 99%

Collocations in Parsing and Translation

Wehrli

2022

Front. Artif. Intell.

View full text Add to dashboard Cite

Proper identification of collocations (and more generally of multiword expressions (MWEs), is an important qualitative step for several NLP applications and particularly so for translation. Since many MWEs cannot be translated literally, failure to identify them yields at best inaccurate translation. This paper is mostly be concerned with collocations. We will show how they differ from other types of MWEs and how they can be successfully parsed and translated by means of a grammar-based parser and translator.

show abstract

“…This latter topic has been one of extreme importance for the computational linguistics community [6], and has seen many approaches aside from the information-theoretic, including with part-of-speech taggers [7] (where categories, e.g., noun, verb, etc. are used to identify word combinations) and with syntactic parsers [8] (where rules of grammar are used to identify word combinations). However, almost all of these methods have the common issue of scalability [9], making them difficult to use for the extraction of phrases of more than two words.…”

mentioning

confidence: 99%

Identifying missing dictionary entries with frequency-conserving context models

et al. 2015

View full text Add to dashboard Cite

In an effort to better understand meaning from natural language texts, we explore methods aimed at organizing lexical objects into contexts. A number of these methods for organization fall into a family defined by word ordering. Unlike demographic or spatial partitions of data, these collocation models are of special importance for their universal applicability. While we are interested here in text and have framed our treatment appropriately, our work is potentially applicable to other areas of research (e.g., speech, genomics, and mobility patterns) where one has ordered categorical data, (e.g., sounds, genes, and locations). Our approach focuses on the phrase (whether word or larger) as the primary meaning-bearing lexical unit and object of study. To do so, we employ our previously developed framework for generating word-conserving phrase-frequency data. Upon training our model with the Wiktionary-an extensive, online, collaborative, and open-source dictionary that contains over 100, 000 phrasal-definitions-we develop highly effective filters for the identification of meaningful, missing phrase-entries. With our predictions we then engage the editorial community of the Wiktionary and propose short lists of potential missing entries for definition, developing a breakthrough, lexical extraction technique, and expanding our knowledge of the defined English lexicon of phrases. . We focus on a particular aspect of Shannon's work, namely joint probability distributions between word-types (denoted w ∈ W ), and their groupings by appearance-orderings, or, contexts (denoted c ∈ C). For a word appearing in text, Shannon's model assigned context according to the word's immediate antecedent. In other words, the sequence · · · w i−1 w i · · · places this occurrence of the word-type of w i in the context of w i−1 (uniquely defined by the word-type of w i−1 ), where " " denotes "any word". This experiment was novel, and when these transition probabilities were observed, he found a method for the automated production of language that far better resembled true English text than simple adherence to relative word frequencies.Later, though still early on in the history of modern computational linguistics and natural language processing, theory caught up with Shannon's work. My guess is that phrase-adaption and generative gap-filling are very roughly equally important in language production, as measured in processing time spent on each, or in constituents arising from each. One way of making such an intuitive estimate is simply to listen to what people actually say when they speak. An independent way of gauging the importance of the phrasal lexicon is to determine its size.Since then, with the rise of computation and increasing availability of electronic text, there have been numerous extensions of Shannon's context model. These models have generally been information-theoretic applications as well, mainly used to predict word associations [4] and to extract multi-word expressions (MWEs) [5]. This latter topic has been one of extre...

show abstract

Syntax-Based Extraction

Cited by 4 publications

References 18 publications

Shaping Writing Grades

Shaping Writing Grades

Collocations in Parsing and Translation

Identifying missing dictionary entries with frequency-conserving context models

Contact Info

Product

Resources

About