Zsolt Szántó scite author profile

Universal dependencies (UD) is a framework for morphosyntactic annotation of human language, which to date has been used to create treebanks for more than 100 languages. In this article, we outline the linguistic theory of the UD framework, which draws on a long tradition of typologically oriented grammatical theories. Grammatical relations between words are centrally used to explain how predicate–argument structures are encoded morphosyntactically in different languages while morphological features and part-of-speech classes give the properties of words. We argue that this theory is a good basis for cross-linguistically consistent annotation of typologically diverse languages in a way that supports computational natural language understanding as well as broader linguistic studies.

show abstract

Special Techniques for Constituent Parsing of Morphologically Rich Languages

Szántó¹,

Farkas²

2014

View full text Add to dashboard Cite

We introduce three techniques for improving constituent parsing for morphologically rich languages. We propose a novel approach to automatically find an optimal preterminal set by clustering morphological feature values and we conduct experiments with enhanced lexical models and feature engineering for rerankers. These techniques are specially designed for morphologically rich languages (but they are language-agnostic). We report empirical results on the treebanks of five morphologically rich languages and show a considerable improvement in accuracy and in parsing speed as well.

show abstract

Universal Dependencies and Morphology for Hungarian - and on the Price of Universality

Vincze

Simkó

Szántó

et al. 2017

View full text Add to dashboard Cite

In this paper, we present how the principles of universal dependencies and morphology have been adapted to Hungarian. We report the most challenging grammatical phenomena and our solutions to those. On the basis of the adapted guidelines, we have converted and manually corrected 1,800 sentences from the Szeged Treebank to universal dependency format. We also introduce experiments on this manually annotated corpus for evaluating automatic conversion and the added value of language-specific, i.e. non-universal, annotations. Our results reveal that converting to universal dependencies is not necessarily trivial, moreover, using languagespecific morphological features may have an impact on overall performance.

show abstract

HuSpaCy: an industrial-strength Hungarian natural language processing toolkit

Orosz¹,

Szántó²,

Berkecz³

et al. 2022

Preprint

View full text Add to dashboard Cite

Although there are a couple of open-source language processing pipelines available for Hungarian, none of them satisfies the requirements of today's NLP applications. A language processing pipeline should consist of close to state-of-the-art lemmatization, morphosyntactic analysis, entity recognition and word embeddings. Industrial text processing applications have to satisfy non-functional software quality requirements, what is more, frameworks supporting multiple languages are more and more favored. This paper introduces HuSpaCy, an industryready Hungarian language processing toolkit. The presented tool provides components for the most important basic linguistic analysis tasks. It is open-source and is available under a permissive license. Our system is built upon spaCy's NLP components resulting in an easily usable, fast yet accurate application. Experiments confirm that HuSpaCy has high accuracy while maintaining resource-efficient prediction capabilities.

show abstract

Find Me if You Can! Identification of Services on Websites by Human Beings and Artificial Intelligence

Stadlmann¹,

Berend

Fratrič³

et al. 2022

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Zsolt Szántó

Universal Dependencies

Special Techniques for Constituent Parsing of Morphologically Rich Languages

Universal Dependencies and Morphology for Hungarian - and on the Price of Universality

HuSpaCy: an industrial-strength Hungarian natural language processing toolkit

Find Me if You Can! Identification of Services on Websites by Human Beings and Artificial Intelligence

Contact Info

Product

Resources

About