It has been exactly a decade since the establishment of SPMRL, a research initiative unifying multiple research efforts to address the peculiar challenges of Statistical Parsing for Morphologically-Rich Languages (MRLs). Here we reflect on parsing MRLs in that decade, highlight the solutions and lessons learned for the architectural, modeling and lexical challenges in the pre-neural era, and argue that similar challenges re-emerge in neural architectures for MRLs. We then aim to offer a climax, suggesting that incorporating symbolic ideas proposed in SPMRL terms into today's neural architectures has the potential to push NLP for MRLs to a new level. We sketch strategies for designing Neural Models for MRLs (NMRL), and showcase preliminary support for these strategies by investigating the task of multi-tagging in Hebrew, a morphologically-rich, high-fusion language.
This work investigates the most basic units that underlie contextualized word embeddings such as BERT: the so-called word pieces. In Morphologically-Rich Languages (MRLs) that exhibit morphological fusion and non-concatenative morphology, the different units of meaning within a word may be fused and intertwined, and cannot be separated linearly. Therefore, when using word pieces in MRLs, we must consider that: (1) a linear segmentation into sub-word units might not capture the full morphological complexity of words; and (2) representations that leave morphological knowledge on sub-word units inaccessible might negatively affect performance. Here we empirically examine the capacity of word pieces to capture morphology by investigating the task of multi-tagging in Hebrew, as a proxy for evaluating the underlying segmentation. Our results show that, while models trained to predict multi-tags for complete words outperform models tuned to predict the distinct tags of word pieces, we can improve word-piece tag prediction by purposefully constraining the word pieces to reflect their internal functions. We conjecture that this is due to the naïve linear tokenization of words into word pieces, and suggest that linguistically-informed word-piece schemes, which make morphological knowledge explicit, might boost performance for MRLs.
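To make the contrast concrete, the sketch below (an illustrative assumption, not the paper's actual tokenizer, data, or tag set) compares a naive linear subword split of the fused Hebrew form ובבית ("and in the house") with its morphological segmentation, and shows the word-level multi-tag a model would be asked to predict for the complete word.

```python
# Illustrative sketch only: the subword split below is hypothetical, standing
# in for what a frequency-driven WordPiece vocabulary might produce; the
# morphological analysis of the fused form is standard, but the (UD-style)
# tag labels are our own choice for the example.

word = "ובבית"  # u-va-bayit: "and in the house", one orthographic token

# A naive linear subword split need not align with morpheme boundaries.
naive_wordpieces = ["וב", "##בית"]  # hypothetical WordPiece output

# The morphological segmentation separates every unit of meaning, including
# the definite article ה, which is absorbed into the preposition and never
# surfaces as a character of its own.
morph_segments = [
    ("ו", "CCONJ"),   # "and"
    ("ב", "ADP"),     # "in"
    ("ה", "DET"),     # "the" (covert, fused into the preposition)
    ("בית", "NOUN"),  # "house"
]

# Multi-tagging collapses the per-morpheme tags into a single composite
# label predicted for the whole word.
multi_tag = "+".join(tag for _, tag in morph_segments)

print("naive word pieces:      ", naive_wordpieces)
print("morphological segments: ", [seg for seg, _ in morph_segments])
print("word-level multi-tag:   ", multi_tag)  # CCONJ+ADP+DET+NOUN
```

The point of the contrast is that the naive split crosses morpheme boundaries and gives the definite article no piece at all, so per-piece tag prediction has nothing to attach the DET label to, whereas the word-level multi-tag keeps the full analysis recoverable.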
For languages with simple morphology, such as English, automatic annotation pipelines such as spaCy or Stanford's CoreNLP successfully serve projects in academia and industry. For many morphologically-rich languages (MRLs), similar pipelines show sub-optimal performance that limits their applicability for text analysis in research and industry. The sub-optimal performance is mainly due to errors in early morphological disambiguation decisions, which cannot be recovered later in the pipeline, yielding incoherent annotations overall. In this paper we describe the design and use of the ONLP suite, a joint morpho-syntactic parsing framework for processing Modern Hebrew texts. Joint inference over morphology and syntax substantially limits error propagation and leads to high accuracy. ONLP provides rich and expressive output that already serves diverse academic and commercial needs. Its accompanying online demo further serves educational activities, introducing the intricacies of Hebrew NLP to researchers and non-researchers alike.
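To ground what such an annotation pipeline produces, the snippet below runs spaCy on an English sentence and prints per-token morpho-syntactic annotations. This is only an illustration of the pipeline setting: the spaCy calls are standard, the example sentence is ours, and the snippet does not reproduce ONLP's own interface or its joint-inference architecture.

```python
# Minimal illustration of pipeline-style annotation with spaCy for English.
# Assumes the small English model has been installed beforehand with:
#   python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The books were placed in the house.")

for token in doc:
    # surface form, part of speech, morphological features, dependency relation
    print(token.text, token.pos_, str(token.morph), token.dep_, token.head.text)
```

For English, the tokens handed to the tagger and parser are essentially final; the argument of the paper is that for Hebrew the segmentation itself is ambiguous, which is why ONLP resolves morphology and syntax jointly rather than in a fixed early step.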
This brief survey chapter starts by characterizing the phonemic inventory of consonants and vowels in Modern Hebrew (MH). It then notes departures from earlier stages of the language, such as the full or partial merger of historical “emphatic” stops with plain stops, the loss of pharyngeal and glottal phonemes (“gutturals”), degemination, and the loss of active phonological rules, such as vowel lengthening and reduction, which together account for the much reduced inventory of both consonants and vowels in all present-day usage, including “Mizrahi” and more generally used pronunciations. Selected phonotactic features of MH phonology – syllable structure, CV alternations, consonant clusters, stress, and word length – are touched on. A final section deals with the essentially conservative Hebrew orthography, as compared with the dynamics of its phonology.