Purpose of this paper: The purpose of this study is to evaluate the performance of freely available machine translation (MT) services in translating metadata records.
Design/methodology/approach: Randomly selected metadata records were translated from English into Chinese using the Google, Bing, and SYSTRAN MT systems. These translations were then evaluated on a five-point scale for both Fluency and Adequacy. Missing Count (words not translated) and Incorrect Count (words incorrectly translated) were also recorded.
Findings: For both Fluency and Adequacy, Google's and Bing's translations of more than 70% of the test data received scores of three or higher, corresponding to 'non-native Chinese' and 'much coverage,' respectively. SYSTRAN scored lowest on both measures. However, these differences were not statistically significant. A Pearson correlation analysis demonstrated a strong relationship (r = .86) between Fluency and Adequacy. Missing Count and Incorrect Count strongly correlated with Fluency and Adequacy.
Research limitations/implications: This study was conducted in a specific domain with a small sample size. The evaluation needs to be repeated with a larger, more representative test dataset. Other language pairs should also be evaluated using similar technologies.
Originality/value: Most existing digital collections can be accessed in English alone. Few digital collections in the United States support multilingual information access (MLIA), which enables users of differing languages to search, browse, recognize, and use information in the collections. Human translation is one solution, but it is neither time- nor cost-effective for most libraries. This study serves as a first step toward understanding the performance of current MT systems and designing effective and efficient MLIA services for digital collections.
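As an illustration of the two summary statistics reported in the abstract above (the share of records scoring three or higher, and the Pearson correlation between Fluency and Adequacy), the following is a minimal Python sketch. The function names `pearson_r` and `share_at_least` and the sample ratings are hypothetical and are not taken from the study; the sketch only shows how such figures could be computed from paired five-point ratings.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length rating lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / sqrt(var_x * var_y)


def share_at_least(scores, threshold=3):
    """Fraction of scores that are greater than or equal to the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(1 for s in scores if s >= threshold) / len(scores)


if __name__ == "__main__":
    # Hypothetical ratings on a 1-5 scale, one pair per translated record.
    # These values are made up for illustration and are NOT the study's data.
    fluency = [4, 3, 2, 5, 3, 1, 4, 3]
    adequacy = [4, 3, 3, 5, 2, 1, 4, 4]

    print("share of Fluency scores >= 3:", share_at_least(fluency))
    print("share of Adequacy scores >= 3:", share_at_least(adequacy))
    print("Pearson r (Fluency vs. Adequacy):", round(pearson_r(fluency, adequacy), 2))
```

The same `pearson_r` helper could be applied to Missing Count or Incorrect Count against either rating, assuming one count per translated record.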
One way to facilitate Multilingual Information Access (MLIA) for digital libraries is to generate multilingual metadata records by applying Machine Translation (MT) techniques. Current online MT services are available and affordable, but are not always effective for creating multilingual metadata records. In this study, we implemented three different MT strategies and evaluated their performance when translating English metadata records into Chinese and Spanish. These strategies included combining MT results from three online MT systems (Google, Bing, and Yahoo!) with and without additional linguistic resources, such as manually generated parallel corpora and metadata records in the two target languages obtained from international partners. The open-source statistical MT platform Moses was used to design and implement the three translation strategies. Human evaluation of the MT results for adequacy and fluency demonstrated that two of the strategies produced higher-quality translations than individual online MT systems for both languages. In particular, adding small, manually generated parallel corpora of metadata records significantly improved translation performance. Our study suggests an effective and efficient MT approach for providing multilingual services for digital collections.
This paper presents the background, research design, and current progress of a new project that explores the application of various machine translation strategies to support multilingual information access for digital collections.