Domain-relevant data and an adequate number of samples are necessary to properly evaluate the robustness of Machine Learning (ML) models. This is the case for ML models used in software localization translation. Neural Machine Translation (NMT) models are commonly used in software localization to automate the translation of textual content while accounting for specific linguistic and cultural aspects. However, unlike general machine translation, which can readily rely on large translation corpora for model training and testing, domain-specific machine translation faces a major obstacle: the scarcity of domain-specific translation data. To address this lack of data, this paper first presents a method to generate test samples using a text-generation Large Language Model (LLM). We then use the generated samples to assess the robustness of an NMT model. The evaluation indicates that human judgment remains important for checking whether the generated text is robust and coherent under different conditions. It also demonstrates that the generated samples were crucial for exposing limitations in the model’s effectiveness for software localization translation. In particular, we discuss issues with locale-specific elements such as date and time formats, numeric representations, and measurement units.
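As a minimal sketch of this workflow, assuming the LLM-generated test samples are already available as plain English strings and using the Hugging Face transformers translation pipeline with a Marian English-to-German checkpoint (chosen here purely for illustration; the paper's actual models, prompts, and language pairs are not specified in this abstract), one could probe whether locale-sensitive tokens such as dates and numbers survive translation:

```python
# Minimal sketch: probe an NMT model with LLM-generated localization test samples.
# Assumptions (not from the paper): the samples below stand in for LLM output
# covering dates, times, numbers, and units; the NMT model is the
# Helsinki-NLP Marian English->German checkpoint served via transformers.
import re
from transformers import pipeline

# Hypothetical LLM-generated test samples covering locale-sensitive elements.
samples = [
    "The meeting is scheduled for 03/04/2024 at 5:30 PM.",
    "The file size is 1,234.56 MB and the download takes 2.5 minutes.",
    "Set the margin to 0.75 inches and the temperature to 98.6 degrees Fahrenheit.",
]

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

# Naive robustness check: numeric tokens in the source should reappear in the
# translation; tokens that do not match verbatim are flagged for human review,
# since they may have been correctly reformatted (e.g. 1,234.56 -> 1.234,56)
# or actually dropped or corrupted by the model.
number_pattern = re.compile(r"\d+(?:[.,:/]\d+)*")

for source in samples:
    target = translator(source)[0]["translation_text"]
    source_numbers = number_pattern.findall(source)
    flagged = [n for n in source_numbers if n not in target]
    print(f"SRC: {source}")
    print(f"TGT: {target}")
    print(f"Tokens needing review: {flagged}\n")
```

The verbatim-matching check is deliberately naive: correct localization often requires reformatting dates and numbers rather than copying them, which is precisely why the evaluation relies on human judgment to decide whether a flagged sample reflects a genuine robustness failure.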