Having access to large lexical and grammatical resources when creating a new language resource is essential for its enhancement and enrichment. This paper describes the interplay and interactive utilization of different language technology tools and resources, in particular the Swedish lexicon SALDO and Swedish Constructicon, in the creation of Swedish FrameNet. We show how integrating resources in a larger infrastructure is much more than the sum of the parts.
The PROMISE network of excellence organized a two-days brainstorming workshop on 30 th and 31 st May 2012 in Padua, Italy, to discuss and envisage future directions and perspectives for the evaluation of information access and retrieval systems in multiple languages and multiple media. This document reports on the outcomes of this event and provides details about the six envisaged research lines: search applications; contextual evaluation; challenges in test collection design and exploitation; component-based evaluation; ongoing evaluation; and signal-aware evaluation. The ultimate goal of the PROMISE retreat is to stimulate and involve the research community along these research lines and to provide funding agencies with effective and scientifically sound ideas for coordinating and supporting information access research.
Constructing a lexical resource for Swedish, where compounding is highly productive, requires a well-structured policy of encoding. This paper presents the treatment and encoding of a certain class of compounds in Swedish FrameNet, and proposes a new approach for the automatic analysis of Swedish compounds, i.e. one that leverages existing FrameNet (Ruppenhofer et al., 2010) and Swedish FrameNet (Borin et al. 2010), as well as proven techniques for automatic semantic role labeling (Johansson et al., 2012).
BackgroundTest collections for information retrieval are scarce. Domain specific test collections even more so, and medical test collections in the Swedish language non-existent prior to the making of the MedEval test collection. Most research in information retrieval has been performed in the English language, thus most test collections contain English documents. However, English is morphologically poor compared to many other European languages and a number of interesting and important aspects have not been investigated. Building a medical test collection in Swedish opens new research opportunities.MethodsThis article describes the making of and potential uses of MedEval, a Swedish medical test collection with assessments, not only for topical relevance, but also for target reader group: Doctors or Patients. A user of the test collection may choose if she wishes to search in the Doctors or the Patients scenario where the topical relevance assessments have been adjusted with consideration to user group, or to search in a scenario which regards only topical relevance.In addition to having three user groups, MedEval, in its present form, has two indexes, one where the terms are lemmatized and one where the terms are lemmatized and the compounds split and the constituents indexed together with the whole compound.ResultsDifferences discovered between the documents written for medical professionals and documents written for laypersons are presented. These differences may be utilized in further studies of retrieval of documents aimed at certain groups of readers. Differences between the groups of documents are, for example, that professional documents have a higher ratio of compounds, have a greater average word length and contain more multi-word expressions.An experiment is described where the user scenarios have been utilized, searching with expert terms and lay terms, separately and in combination in the different scenarios. The tendency discovered is that the medical expert gets best results using expert terms and the lay person best results using lay terms, but also quite good results using expert terms or lay and expert terms in combination.ConclusionsThe many features of MedEval gives a variety of research possibilities, such as comparing the effectiveness of search terms when it comes to retrieving documents aimed at the different user groups or to study the effect of compound decomposition in retrieval of documents. As Swedish, the language of MedEval, is a morphologically more complex language than English, it is possible to study additional aspects of the effect of natural language processing in information retrieval, for example utilizing different inflectional word forms in the retrieval of expert vs lay documents. MedEval is the first Swedish test collection of the medical domain.AvailabilityThe Department of Swedish at the University of Gothenburg is in the process of making the MedEval test collection available to academic researchers.
Creation of framenets for languages other than English based on Berkeley FrameNet has tested the hypothesis that semantic frames, to a certain extent, are language independent. This working hypothesis facilitated reuse of frames for new framenets, defining language specific frame evoking lemmas and annotating language specific sentences. The caveat is the bias towards creating what is possible, rather than typical, in a language. The reuse of frames allowed developing SweFN in a relatively short period of time. However, the goal to build a typical, not a possible Swedish framenet, necessitated some frame modifications. This chapter provides a comparison between the English and Swedish framenets regarding semantic annotation and representation, and socio-cultural factors, including how differences forced modification of the original structure.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.