This article presents a hybrid architecture which combines rule-based machine translation (RBMT) with phrase-based statistical machine translation (SMT). The hybrid translation system is guided by the rule-based engine. Before the transfer step, a varied set of partial candidate translations is calculated with the SMT system and used to enrich the tree-based representation with more translation alternatives. The final translation is constructed by choosing the most probable combination among the available fragments using monotone statistical decoding following the order provided by the rule-based system. We apply the hybrid model to a pair of distantly related languages, Spanish and Basque, and perform extensive experimentation on two different corpora. According to our empirical evaluation, the hybrid approach outperforms the best individual system across a varied set of automatic translation evaluation metrics. Following some output analysis to better understand the behaviour of the hybrid system, we explore the possibility of adding alternative parse trees and extra features to the hybrid decoder. Finally, we present a twofold manual evaluation of the translation systems studied in this paper, consisting of (i) a pairwise output comparison and (ii) a individual task-oriented evaluation using HTER. Interestingly, the manual evaluation shows some contradictory results with respect to the automatic evaluation; humans tend to prefer the translations from the RBMT system over the statistical and hybrid translations.Peer ReviewedPostprint (author’s final draft
The study presented here relies on the integrated use of different kinds of knowledge in order to improve first-guess accuracy in non-word context-sensitive correction for general unrestricted texts. State of the art spelling correction systems, e.g.ispell, apart from detecting spelling errors, also assist the user by offering a set of candidate corrections that are close to the misspelled word. Based on the correction proposals of ispell, we built several guessers, which were combined in different ways. Firstly, we evaluated all possibilities and selected the best ones in a corpus with artificially generated typing errors. Secondly, the best combinations were tested on texts with genuine spelling errors. The results for the latter suggest that we can expect automatic non-word correction for all the errors in a free running text with 80% precision and a single proposal 98% of the times (1.02 proposals on average).
This paper presents experiments performed on lexical knowledge acquisition in the form of verbal argumental information. The system obtains the data from raw corpora after the application of a partial parser and statistical filters. We used two different statistical filters to acquire the argumental information: Mutual Information, and Fisher's Exact test. Due to the characteristics of agglutinative languages like Basque, the usual classification of arguments in terms of their syntactic category (such as NP or PP) is not suitable. For that reason, the arguments will be classified in 48 different kinds of case markers, which makes the system fine grained if compared to equivalent systems that have been developed for other languages. This work addresses the problem of distinguishing arguments from adjuncts, this being one of the most significant sources of noise in subcategorization frame acquisition.
The frame-based knowledge representation model adopted in IDHS (Intelligent Dictionary Help System) is described in this paper. It is used to represent the lexical knowledge acquired automatically from a conventional dictionary. Moreover, the enrichment processes that have been performed on the Dictionary Knowledge Base and the dynamic exploitation of this knowledge -both based on the exploitation of the properties of lexical semantic relations-are also described.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.