Classifiers in Japanese-to-English machine translation

Bond, Francis; Ogura, Kentaro; Ikehara, Satoru

doi:10.3115/992628.992653

Cited by 18 publications

(27 citation statements)

References 10 publications

“…Annotators are asked to select which of two randomly-ordered translations they prefer, one from each system (Bond, Ogura, and Ikehara, 1995; Schwartz, Aikawa, and Quirk, 2003), often over a reference set of translation pairs (Ikehara, Shirai, and Ogura, 1994).…”

Section: Past and Current Methodologies (mentioning)

confidence: 99%

Can machine translation systems be evaluated by the crowd alone

et al. 2015


Crowd-sourced assessments of machine translation quality allow evaluations to be carried out cheaply and on a large scale. It is essential, however, that the crowd's work be filtered to avoid contamination of results through the inclusion of false assessments. One method is to filter via agreement with experts, but even amongst experts agreement levels may not be high. In this paper, we present a new methodology for crowd-sourcing human assessments of translation quality, which allows individual workers to develop their own individual assessment strategy. Agreement with experts is no longer required, and a worker is deemed reliable if they are consistent relative to their own previous work. Individual translations are assessed in isolation from all others in the form of direct estimates of translation quality. This allows more meaningful statistics to be computed for systems and enables significance to be determined on smaller sets of assessments. We demonstrate the methodology's feasibility in large-scale human evaluation through replication of the human evaluation component of WMT shared translation task for two language pairs, Spanish-to-English and English-to-Spanish. Results for measurement based solely on crowd-sourced assessments show system rankings in line with those of the original evaluation. Comparison of results produced by the relative preference approach and the direct estimate method described here demonstrate that the direct estimate method has a substantially increased ability to identify significant differences between translation systems.

“…In the Thai language, as well as in some other Asian languages such as Japanese and Chinese, classifiers find significant use in quantitative noun phrases (Sornlertlamvanich et al, 1994; Bond et al, 1996). From our study, we realized that classifiers do not only help in expressing quantitative noun phrases, but also play a very important role in forming many types of phrases, including relative pronoun phrases, noun phrases and adverb phrases (see (Sornlertlamvanich et al, 1994) for a detailed discussion).…”

Section: Structure Of the Text Corpus (mentioning)

confidence: 79%

Building a Thai part-of-speech tagged corpus (ORCHID).

Sornlertlamvanich, Takahashi, Isahara 1999

J. Acoust. Soc. Jpn. (E)


ORCHID (Open linguistic Resources CHanelled toward InterDisciplinary research) is an initiative project aimed at building linguistic resources to support research in, but not limited to, natural language processing. Based on the concept of an open architecture design, the resources must be fully compatible with similar resources, and software tools must also be made available. This paper presents one result of the project, the construction of a Thai part-of-speech (POS) tagged corpus, which is a preliminary stage in the construction of a Thai speech corpus. The POS-tagged corpus is the result of collaborative research between the Communications Research Laboratory (CRL) in Japan and the National Electronics and Computer Technology Center (NECTEC) in Thailand, with technical support from the Electrotechnical Laboratory (ETL) in Japan. In this paper, we propose a new tagset, based on the results of a prior multilingual machine translation project. The corpus is annotated on three levels: the paragraph, sentence, and word levels. Text information is maintained in the form of the text information lines and the number lines, which are both utilized in data retrieval. Both word segmentation and POS tagging were carried out by way of a probabilistic trigram model. Rules for syllable demarcation were additionally used to reduce the number of candidates in computing tagging probabilities. Some typical problems in POS assignment are also formalized to resolve ambiguity.


“…However, our framework was defined and tested on the restrictive domain of appointment scheduling. Most of the really difficult cases for article selection, as for example generics, do not occur in this domain, whilst both (Murata and Nagao, 1993) and (Bond et al, 1995) build their theories around the problem of identifying these. There are no statistics on the performance of their systems on a corpus that does not contain any generics.…”

Section: Comparison To Previous Approaches (mentioning)

confidence: 99%

“…This approach assigns the correct value in 85,5% of the cases when used with the training data, and 68,9% with unseen data. (Bond et al, 1995) show how the percentage of noun phrases generated with correct use of articles and number in a Japanese to English machine translation system can be increased by applying heuristic rules to distinguish between 'generic', 'referential' and 'ascriptive' uses of noun phrases. These rules are ordered in a hierarchical manner, with later rules over-ruling earlier ones.…”

Section: Introduction (mentioning)

confidence: 99%

Definiteness predictions for Japanese noun phrases

Heine 1998

Proceedings of the 17th International Conference on Computational Linguistics -


One of the major problems when translating from Japanese into a European language such as German or English is to determine definiteness of noun phrases in order to choose the correct determiner in the target language. Even though in Japanese, noun phrase reference is said to depend in large parts on the discourse context, we show that in many cases there also exist linguistic markers for definiteness. We use these to build a rule hierarchy that predicts 79,5% of the articles with an accuracy of 98,9% from syntactic-semantic properties alone, yielding an efficient pre-processing tool for the computationally expensive context checking.


Classifiers in Japanese-to-English machine translation

Abstract: This paper proposes an analysis of classifiers into four major types…
