This paper describes a work-in-progress prototype of a dialog system that simulates a virtual patient (VP) consultation. We report the challenges encountered during its development, especially in managing the interaction and the medical-domain vocabulary.
In this paper we propose a hybrid approach to aligning single words, compound words and idiomatic expressions in English-Arabic parallel corpora. The objective is to automatically develop, improve and maintain translation lexicons. The approach combines linguistic and statistical information in order to improve word alignment results. The linguistic information consists of an existing bilingual lexicon, named entity recognition, grammatical tag matching and the detection of syntactic dependency relations between words. The statistical information consists of the number of occurrences of repeated words, their positions in the parallel corpus and their lengths in characters. Single-word alignment uses the existing bilingual lexicon, named entity and cognate detection, and grammatical tag matching. Compound word alignment establishes correspondences between the compound words of the source sentence and those of the target sentence. A syntactic analysis is applied to the source and target sentences in order to extract dependency relations between words and to recognize compound words. Idiomatic expression alignment starts with monolingual term extraction for each of the source and target languages, which provides a list of sequences of repeated words and a list of potential translations. These sequences are represented as vectors that indicate their number of occurrences and the number of segments in which they appear. Translation relations between the source and target expressions are then evaluated with a distance metric. We evaluated the single-word and multiword expression aligners using two methods: a manual evaluation of the alignment quality on 1000 pairs of English-Arabic sentences, and an evaluation of the impact of this alignment on the translation quality of a machine translation system. The results show that these aligners generate a translation lexicon with around 85% precision and yield a gain of 0.20 in BLEU score for translation quality.
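The vector-based step for idiomatic expressions can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each candidate expression is represented by its per-segment occurrence counts over the aligned corpus, and that the distance metric is cosine distance, which the abstract does not specify. The function names and the toy segments (including the transliterated Arabic tokens) are hypothetical and serve only as an illustration.

```python
import math

def occurrence_vector(expression, segments):
    """Count how often `expression` (a tuple of tokens) occurs in each segment."""
    n = len(expression)
    vector = []
    for tokens in segments:
        count = sum(
            1
            for i in range(len(tokens) - n + 1)
            if tuple(tokens[i:i + n]) == expression
        )
        vector.append(count)
    return vector

def cosine_distance(u, v):
    """Assumed metric: 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

def best_translation(source_expr, target_candidates, source_segments, target_segments):
    """Return (distance, candidate) for the target candidate closest to source_expr."""
    src_vec = occurrence_vector(source_expr, source_segments)
    scored = [
        (cosine_distance(src_vec, occurrence_vector(cand, target_segments)), cand)
        for cand in target_candidates
    ]
    return min(scored, key=lambda pair: pair[0])

# Hypothetical toy corpus: segment i of the source is aligned with segment i of the target.
source_segments = [
    ["it", "rained", "cats", "and", "dogs"],
    ["he", "likes", "cats"],
    ["rain", "fell", "cats", "and", "dogs", "all", "day"],
]
target_segments = [
    ["amtarat", "bi", "ghazara"],
    ["yuhibbu", "al", "qitat"],
    ["hatala", "al", "matar", "bi", "ghazara", "tawal", "al", "yawm"],
]

distance, candidate = best_translation(
    ("cats", "and", "dogs"),
    [("bi", "ghazara"), ("al", "qitat")],
    source_segments,
    target_segments,
)
print(candidate, round(distance, 6))
```

In this toy corpus the source expression and the candidate `("bi", "ghazara")` occur in exactly the same segments, so their distance is close to zero, while the candidate that only appears in an unrelated segment gets the maximum distance of 1. On a real corpus a threshold on the distance would be needed to reject weak candidates; its value is not given in the abstract.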