We present emtsv, a more efficient version of the e-magyar NLP pipeline for Hungarian. It integrates Hungarian NLP tools in a framework whose individual modules can be developed or replaced independently, and it allows new modules to be added. The design also allows convenient inspection and manual correction of the data flowing from one module to the next. The improvements we publish include efficient communication between the modules and support for using individual modules both within the chain and on their own. These goals are accomplished using extended TSV (tab-separated values) files, a simple, uniform, generic and self-documenting input/output format. Our vision is to maintain the system in the long term and to make it easier for external developers to fit their own modules into it, thus sharing existing competencies in the processing of Hungarian, a mid-resourced language. The source code is available under the LGPL 3.0 license.
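The idea of chaining independent modules over a self-documenting TSV stream can be illustrated with a minimal sketch. It is not the emtsv implementation or its API: the layout (a header row naming the columns, one token per row, a blank line between sentences) is an assumption made here, and the function names `read_tsv`, `add_column` and `write_tsv` as well as the column names `form`, `lower` and `length` are hypothetical.

```python
import io


def read_tsv(stream):
    """Return (header, sentences); each sentence is a list of rows (lists of fields)."""
    header = next(stream).rstrip("\n").split("\t")
    sentences, current = [], []
    for line in stream:
        line = line.rstrip("\n")
        if not line:  # assumed convention: a blank line closes a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(line.split("\t"))
    if current:
        sentences.append(current)
    return header, sentences


def add_column(header, sentences, source, target, func):
    """Toy 'module': look up the `source` column by name and append a `target` column."""
    idx = header.index(source)
    new_sentences = [[row + [func(row[idx])] for row in sent] for sent in sentences]
    return header + [target], new_sentences


def write_tsv(header, sentences):
    """Serialize back to the same layout so the output can feed another module."""
    lines = ["\t".join(header)]
    for sent in sentences:
        lines.extend("\t".join(row) for row in sent)
        lines.append("")
    return "\n".join(lines) + "\n"


# Two toy modules chained; each one finds its input column via the header.
raw = "form\nA\nkutya\nugat\n.\n\nJó\n.\n"
header, sents = read_tsv(io.StringIO(raw))
header, sents = add_column(header, sents, "form", "lower", str.lower)
header, sents = add_column(header, sents, "lower", "length", lambda s: str(len(s)))
output = write_tsv(header, sents)
```

Because every intermediate result is again a TSV table with a header, the output of any step in this sketch can be inspected, hand-corrected and fed back into the next step, or a single step can be run on its own.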
Conversational partners’ “ideal” information states – their knowledge (about the world and each other), their beliefs (with different degrees of certainty), their desires and intentions (of different degrees of intensity) – can be specified at any point in the conversation. The various elements making up an information state are sometimes standard-like, but often they serve as a basis for various possible deviations from the standard. We discuss some verbs that express particular deviations, including the extreme case of lying while telling the truth. Our analyses are presented in a formal interpretation system, which allows us to demonstrate how different meanings emerge when only the polarity parameters change from case to case. We thus intend to build a bridge between pragmatics and hardcore formal semantics.
This paper presents Manócska, a verb frame database for Hungarian. It is called unified because it was built by merging all available verb frame resources. To be able to merge these, we had to cope with their structural and conceptual differences. We then transformed them into two easy-to-use formats: a TSV file and an XML file. Manócska is open access: the whole resource and the scripts used to create it are available in a GitHub repository. This makes Manócska reproducible and easy to access, version, fix and develop in the future. During the merging process, several errors came to light. These were corrected as systematically as possible. Thus, by integrating and harmonizing the resources, we produced a Hungarian verb frame database of higher quality.
In this paper, we present our algorithm called nom-or-not, designed to resolve case ambiguity in Hungarian. By case, we mean an abstract syntactic case, a kind of syntactic role of the given token. Nouns and proper names, adjectives, participles and numerals without a case suffix are always tagged as Nom, although the lack of a case ending may represent various functions: it may mark the subject of the sentence, a possessor, the nominal part of a nominal predicate, or the vocative case. On top of that, a modifier of a nominal or a nominal combined with a postposition lacks a case suffix as well, and proper names consisting of two or more elements are also caseless. Our algorithm is motivated by the needs of a psycholinguistically motivated parser which aims to process sentences from left to right. Therefore, our case disambiguator follows the basic principles of the parser and analyses sentences from left to right, always making a decision based on the information from the previously processed elements and the elements in a two-token-wide look-ahead window. Our preliminary results, obtained from a manually annotated corpus of 500 sentences, suggest that the disambiguator can be improved by adding modifications and new rules and by running it on a more precisely annotated corpus.
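The left-to-right strategy with a two-token look-ahead window can be sketched as follows. This is only an illustration of the control flow, not the nom-or-not rule set: the `Token` fields, the `possessed` flag, the role labels and every rule in `decide` are hypothetical simplifications introduced here.

```python
from typing import NamedTuple


class Token(NamedTuple):
    form: str
    pos: str                 # toy tags: "Det", "N", "Adj", "Post", "V"
    case: str                # "Nom" for caseless nominals, "" if not applicable
    possessed: bool = False  # hypothetical flag: token carries a possessive suffix


LOOKAHEAD = 2  # width of the look-ahead window, as described in the abstract


def decide(tok, previous, window):
    """Toy rules (assumptions): use only earlier decisions and the look-ahead window."""
    nxt = window[0] if window else None
    if nxt is not None and nxt.pos == "Post":
        return "PostpArg"    # caseless nominal combined with a postposition
    if tok.pos == "Adj" and nxt is not None and nxt.pos == "N":
        return "Modifier"    # caseless modifier of a nominal
    if any(w.pos == "N" and w.possessed for w in window):
        return "Possessor"   # a possessed noun follows within the window
    if "Subject" in previous:
        return "Predicate"   # a subject was already found earlier
    return "Subject"


def disambiguate(tokens):
    """Single left-to-right pass; each decision is final once it is made."""
    decisions = []
    for i, tok in enumerate(tokens):
        if tok.case != "Nom":
            decisions.append(tok.case or "-")
            continue
        window = tokens[i + 1 : i + 1 + LOOKAHEAD]
        decisions.append(decide(tok, decisions, window))
    return decisions


# "A fiú kutyája ugat" ('The boy's dog barks'), with hand-assigned toy tags.
sentence = [
    Token("A", "Det", ""),
    Token("fiú", "N", "Nom"),
    Token("kutyája", "N", "Nom", possessed=True),
    Token("ugat", "V", ""),
]
roles = disambiguate(sentence)
```

In this sketch a decision never looks further than two tokens ahead and is never revised afterwards, which is the constraint the abstract attributes to the left-to-right parser.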