The paper presents the strategy and results of mapping adjective synsets between plWordNet (the wordnet of Polish, cf. Piasecki et al. 2009, Maziarz et al. 2013) and Princeton WordNet (cf. Fellbaum 1998). The main challenge of this enterprise has been the very different synset relation structures in the two networks: horizontal and dumbbell-model based in PWN, vertical and hyponymy-based in plWN. Moreover, the two wordnets differ in how they group adjectives into semantic domains and in the size of the adjective category. To handle these contrasts, a series of automatic prompt algorithms and a manual mapping procedure relying on corresponding synset and lexical unit relations as well as on inter-lingual relations between noun synsets were proposed in the pilot stage of mapping (Rudnicka et al. 2015). In the paper we discuss the final results of the mapping process and explain example mapping choices. Suggestions for further development of the mapping are also given.
Automatic Prompt System in the Process of Mapping plWordNet on Princeton WordNet
The paper offers a critical evaluation of the power and usefulness of an automatic prompt system based on the extended Relaxation Labelling algorithm in the (manual) mapping of plWordNet onto Princeton WordNet. To this end, the results of manual mapping, that is, inter-lingual relations between plWN and PWN synsets, are juxtaposed with the automatic prompts that were generated for the source language synsets to be mapped. We check the number and type of inter-lingual relations introduced on the basis of automatic prompts and the distance of the respective prompt synsets from the actual target language synsets.
One of the main research questions concerning multi-word expressions (MWEs) is which of them are transparent word combinations created ad hoc and which are multi-word lexical units (MWUs). In this paper, we use selected corpus-linguistic and machine-learning methods to determine which lexicalization criteria guide Polish and English lexicographers in deciding which MWEs (bigrams such as adjective+noun and noun+noun combinations) should be treated as lexical units recorded in dictionaries as MWUs. We analyzed two samples: MWEs extracted from Polish and English monolingual dictionaries, and those created by the annotators, and tested two custom-designed criteria, i.e., intuition and paraphrase, also using statistical methods (measures of collocational strength: PMI and Jaccard). We found that Polish lexicographers tend not to include compositional MWEs as lexical entries in their dictionaries and that the criteria of paraphrase and intuition are important for them: if MWEs are not clearly and unambiguously paraphrasable and compositional, then they are recorded in dictionaries. We also found that, in contrast to Polish lexicographers, English lexicographers tend to record compositional and partly compositional MWEs as well.
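The two measures of collocational strength named above can be illustrated with a minimal sketch. It assumes the common textbook formulations (PMI as the base-2 log ratio of the observed bigram probability to the product of the unigram probabilities, Jaccard as the bigram frequency divided by the number of occurrences of either word); the abstract does not state the exact variants, corpus, or preprocessing used, so the function name `collocation_scores` and the toy token list are purely illustrative.

```python
import math
from collections import Counter


def collocation_scores(tokens):
    """Return {(w1, w2): (pmi, jaccard)} for every adjacent token pair.

    Illustrative only: a simple maximum-likelihood estimate over one
    token sequence, not the procedure used in the paper.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), f12 in bigrams.items():
        # PMI: log2 of observed bigram probability over the probability
        # expected if the two words were independent.
        p12 = f12 / n_bi
        p1 = unigrams[w1] / n_uni
        p2 = unigrams[w2] / n_uni
        pmi = math.log2(p12 / (p1 * p2))
        # Jaccard: co-occurrences over occurrences of either word.
        jaccard = f12 / (unigrams[w1] + unigrams[w2] - f12)
        scores[(w1, w2)] = (pmi, jaccard)
    return scores


# Toy example (hypothetical data): "red tape" recurs as a unit,
# "red wine" occurs only once.
tokens = "red tape is red tape but red wine is wine".split()
scores = collocation_scores(tokens)
```

On this toy input, `("red", "tape")` scores higher than `("red", "wine")` on both measures, which is the kind of signal such measures contribute when deciding whether a bigram behaves like a fixed unit.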
Though interest in the use of wordnets for lexicography is (gradually) growing, no research has been conducted so far on equivalence between lexical units (or senses) in inter-linked wordnets. In this paper, we present and validate a procedure of sense-linking between plWordNet and Princeton WordNet. The proposed procedure employs a continuum of three equivalence types: strong, regular and weak, distinguished by a custom-designed set of formal, semantic and translational features. To validate the procedure, three independent samples of 120 sense pairs were manually analysed with respect to the features. The results show that synsets from the two wordnets linked by the interlingual synonymy relation have a greater number of equivalents than those linked through interlingual partial synonymy or interlingual hyponymy relations. Even synsets linked via interlingual synonymy may have pairs of lexical units which are only weak equivalents. More fine-grained sense linking enhances the usefulness of the mapped wordnets as a bilingual lexicon for translators or researchers.