Verb Class Discovery from Rich Syntactic Data

Sun, Lin; Korhonen, Anna; Krymolowski, Yuval

doi:10.1007/978-3-540-78135-6_2

Cited by 24 publications

(34 citation statements)

References 15 publications

“…When this approach was evaluated against gold standards based on Verbnet [30,32], both containing hundreds of verbs in 15-20 classes, it achieved the highest performance (at around 80 F-measure) with deep linguistic features: SCFs refined with selectional preferences.…”

Section: Related Work (mentioning)

confidence: 99%

“…The gold standard was obtained by translating the Levin-based gold standard of Sun et al [32] from English to French, and a good correspondence was reported between the two gold standards. The authors reported the best results (64.5 F-measure) on high frequency verbs with the same combination of features (SCFs and selectional preferences) and the same clustering method (spectral clustering) as for English.…”

Section: Related Work (mentioning)

confidence: 99%

“…We used an approach similar to that earlier employed by Sun et al [25] for building a gold standard for French. They took a gold standard frequently used for evaluating verb clustering for English [32] and translated its 204 verbs and 17 classes to French. The majority of verbs and classes could be translated successfully.…”

Section: Gold Standard For Brazilian Portuguese (mentioning)

confidence: 99%

Verb Clustering for Brazilian Portuguese

Scarton, Sun, Kipper-Schuler, et al. 2014

Computational Linguistics and Intelligent Text Processing

Self Cite

Abstract. Levin-style classes, which capture the shared syntax and semantics of verbs, have proven useful for many Natural Language Processing (NLP) tasks and applications. However, lexical resources which provide information about such classes are only available for a handful of the world's languages. Because manual development of such resources is extremely time consuming and cannot reliably capture domain variation in classification, methods for automatic induction of verb classes from texts have gained popularity. However, to date such methods have been applied to English and a handful of other, mainly resource-rich languages. In this paper, we apply the methods to Brazilian Portuguese, a language for which no VerbNet or automatic class induction work exists yet. Since Levin-style classification is said to have a strong cross-linguistic component, we use unsupervised clustering techniques similar to those developed for English without language-specific feature engineering. This yields interesting results which line up well with those obtained for other languages, demonstrating the cross-linguistic nature of this type of classification. However, we also discover and discuss issues which require specific consideration when aiming to optimise the performance of verb clustering for Brazilian Portuguese and other less-resourced languages.

“…Advanced techniques for clustering verbs exist that can be used here (e.g. Vlachos et al 2009; Ó Séaghdha and Copestake 2008; Sun et al 2008).…”

Section: Discussion (mentioning)

confidence: 99%

Towards Unrestricted, Large-Scale Acquisition of Feature-Based Conceptual Representations from Corpus Data

Devereux, Pilkington, Poibeau, et al. 2009

Research on Language and Computation

Self Cite

In recent years a number of methods have been proposed for the automatic acquisition of feature-based conceptual representations from text corpora. Such methods could offer valuable support for theoretical research on conceptual representation. However, existing methods do not target the full range of concept-relation-feature triples occurring in human-generated norms (e.g. flute produce sound) but rather focus on concept-feature pairs (e.g. flute - sound) or triples involving specific relations only (e.g. is-a or part-of relations). In this article we investigate the challenges that need to be met in both methodology and evaluation when moving towards the acquisition of more comprehensive conceptual representations from corpora. In particular, we investigate the usefulness of three types of knowledge in guiding the extraction process: encyclopedic, syntactic and semantic. We first present a semantic analysis of existing, human-generated feature production norms, which reveals information about co-occurring concept and feature classes. We then introduce a novel method for large-scale feature extraction which uses the class-based information to guide the acquisition process. The method involves extracting candidate triples consisting of concepts, relations and features (e.g. deer have antlers, flute produce sound) from corpus data parsed for grammatical dependencies, and re-weighting the triples on the basis of conditional probabilities calculated from our semantic analysis. We apply this method to an automatically parsed Wikipedia corpus which includes encyclopedic information and evaluate its accuracy using a number of different methods: direct evaluation against the McRae norms in terms of feature types and frequencies, human evaluation, and novel evaluation in terms of conceptual structure variables. Our investigation highlights a number of issues which require addressing in both methodology and evaluation when aiming to improve the accuracy of unconstrained feature extraction further.

“…Recent research shows that it is possible to automatically induce lexical classes from corpora with promising accuracy (Schulte im Walde, 2006; Joanis et al, 2007; Sun et al, 2008). A number of machine learning (ML) methods have been applied to classify mainly syntactic features (e.g.…”

Section: Introduction (mentioning)

confidence: 99%

The choice of features for classification of verbs in biomedical texts

Korhonen, Krymolowski, Collier 2008

Proceedings of the 22nd International Conference on Computational Linguistics - COLING '08

Self Cite

We conduct large-scale experiments to investigate optimal features for classification of verbs in biomedical texts. We introduce a range of feature sets and associated extraction techniques, and evaluate them thoroughly using a robust method new to the task: a cost-based framework for pairwise clustering. Our best results compare favourably with earlier ones. Interestingly, they are obtained with sophisticated feature sets which include lexical and semantic information about selectional preferences of verbs. The latter are acquired automatically from corpus data using a fully unsupervised method.
