This paper presents a novel method for semantic annotation and search of a target corpus using several knowledge resources (KRs). The method relies on a formal statistical framework in which KR concepts and corpus documents are homogeneously represented using statistical language models. Under this framework, we can perform all the operations necessary for efficient and effective semantic annotation of the corpus. First, we propose a coarse tailoring of the KRs with respect to the target corpus, with the main goal of reducing the ambiguity of the annotations and their computational overhead. Then, we propose the generation of concept profiles, which allow measuring the semantic overlap of the KRs as well as performing a finer tailoring of them. Finally, we propose how to semantically represent documents and queries in terms of the KR concepts and the statistical framework in order to perform semantic search. Experiments have been carried out with a corpus about web resources that includes several Life Sciences catalogues and Wikipedia pages related to web resources in general (e.g., databases, tools, services, etc.). Results demonstrate that the proposed method is more effective and efficient than state-of-the-art methods relying on either context-free annotation or keyword-based search.
This paper presents the ParlaMint corpora, which contain transcriptions of the sessions of 17 European national parliaments and total half a billion words. The corpora are uniformly encoded, contain rich metadata about 11 thousand speakers, and are linguistically annotated following the Universal Dependencies formalism and with named entities. Samples of the corpora and conversion scripts are available from the project’s GitHub repository, and the complete corpora are openly available via the CLARIN.SI repository for download, as well as through the NoSketch Engine and KonText concordancers and the Parlameter interface for on-line exploration and analysis.
Many important applications in scientific fields such as Bioinformatics depend on the management of large collections of heterogeneous XML documents containing complex domain-specific data. To allow advanced querying, these systems need to support techniques such as data exploration and approximate query processing by using multiple notions of similarity. The development of these kinds of information systems usually proceeds in costly ad-hoc ways, due to the lack of methodological guidance. This paper presents a model-based approach to guide the development of multi-similarity systems for XML document repositories. We describe the overall framework of our approach and, in particular, we focus on the requirements phase. We provide detailed guidance for capturing the users' requirements and for specifying them in a formal way by using an adaptation of the i* modeling framework. Finally, the usefulness of our approach is demonstrated by a Bioinformatics case study.
We introduce XTaGe (XML Tester and Generator), a system for the synthesis of XML collections meant for testing and microbenchmarking applications. In contrast with existing approaches, XTaGe focuses on complex collections, by providing a highly extensible framework to introduce controlled variability in XML structures. In this paper we present the theoretical foundation, internal architecture and main features of our generator; we describe its implementation, which includes a GUI to facilitate the specification of collections; we discuss how XTaGe's features compare with those in other XML generation systems; finally, we illustrate its usage by presenting a use case in the Bioinformatics domain.
A basic premise of this article is that the institutional teaching of translation studies has evolved in the past decades partly due to a growing connection between theory and teaching practice. The present article focuses on how seven proponents of various translation theories teach in classrooms, on why theory is important for the teaching of the profession, and on the nature of theory. This discussion leads to a fundamental concern for the training of future translators for professional work. It is argued that translation trainees should be exposed to a variety of approaches to translation that are inspired by and connected to different theoretical schools, so that students are taught to be flexible in their approach to texts and also learn theory in practical application.