This study reports on the use of data mining techniques to differentiate translated Chinese from non-translated (original) Chinese using monolingual comparable corpora. We operationalized seven entropy-based metrics (character entropy; wordform unigram, bigram and trigram entropy; and part-of-speech (POS) unigram, bigram and trigram entropy) computed from two balanced Chinese comparable corpora (translated vs. non-translated). We then applied four classifiers, namely Support Vector Machines (SVMs), Linear Discriminant Analysis (LDA), Random Forest (RF) and Multilayer Perceptron (MLP), to distinguish translated Chinese from original Chinese based on these seven features. Our results show that the SVM is the most robust and effective classifier, yielding an AUC of 90.5% and an accuracy of 84.3%. These results affirm the hypothesis that translational language is categorically different from original language. Our research demonstrates that combining Shannon's entropy, an information-theoretic indicator, with machine learning techniques can provide a novel approach for studying translation as a unique communicative activity. This study offers new insights for corpus-based studies of translationese in the field of translation studies.
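The seven entropy features described above can be illustrated with a minimal sketch. It assumes the text has already been segmented into words and POS-tagged; the function names (`shannon_entropy`, `ngrams`, `entropy_features`) and the toy sentence are hypothetical and are not taken from the study, which does not specify its implementation.

```python
import math
from collections import Counter


def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def entropy_features(words, pos_tags):
    """Seven entropy features: character, wordform 1-3 grams, POS 1-3 grams.

    Assumption: `words` is a pre-segmented token list and `pos_tags` is the
    aligned list of POS labels; segmentation and tagging happen elsewhere.
    """
    feats = {"char": shannon_entropy("".join(words))}
    for n, name in ((1, "unigram"), (2, "bigram"), (3, "trigram")):
        feats[f"word_{name}"] = shannon_entropy(ngrams(words, n))
        feats[f"pos_{name}"] = shannon_entropy(ngrams(pos_tags, n))
    return feats


# Toy input (hypothetical): a four-word sentence with made-up POS labels.
words = ["我", "喜欢", "读", "书"]
pos_tags = ["r", "v", "v", "n"]
features = entropy_features(words, pos_tags)
```

Each text then yields one seven-dimensional feature vector, which is the kind of input a classifier such as an SVM would receive.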
Recently, deep neural networks have achieved remarkable progress in instance segmentation on class-balanced data. However, most real-world applications follow a long-tailed distribution, i.e., the majority of classes have only limited training examples. The long-tailed distribution causes a severe drop in instance segmentation performance because the gradients of the head classes suppress those of the tail classes, biasing the model towards the head classes. We propose LiCAM, a novel framework for long-tailed instance segmentation. It features an adaptive loss function named Moac Loss, which is adjusted during training according to the monitored classification accuracy. LiCAM also incorporates an oversampling technique named RFS, which alleviates the severe imbalance between head and tail classes. We conducted extensive experiments on the LVIS v1 dataset to evaluate LiCAM. With a coherent end-to-end training pipeline, LiCAM significantly outperforms the baselines.
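As a rough illustration of the oversampling step, the sketch below assumes that RFS refers to repeat factor sampling as commonly used with LVIS, where a category with image frequency f receives the repeat factor max(1, sqrt(t / f)) and each image takes the largest factor among its categories. The function name `repeat_factors`, the default threshold, and the toy data are assumptions for illustration; the abstract does not describe LiCAM's actual configuration.

```python
import math
from collections import Counter


def repeat_factors(image_categories, threshold=0.001):
    """Per-image repeat factors for repeat factor sampling.

    image_categories: one set of category ids per training image.
    threshold: frequency t below which a category is oversampled
               (assumed default; tune per dataset).
    """
    num_images = len(image_categories)
    # Number of images in which each category appears at least once.
    image_counts = Counter(c for cats in image_categories for c in set(cats))
    # Category-level factor: max(1, sqrt(t / f_c)), f_c = image frequency.
    cat_factor = {
        c: max(1.0, math.sqrt(threshold / (n / num_images)))
        for c, n in image_counts.items()
    }
    # Image-level factor: the largest factor among the image's categories.
    return [
        max((cat_factor[c] for c in cats), default=1.0)
        for cats in image_categories
    ]


# Toy data (hypothetical): one rare category appearing in 1 of 8 images.
images = [{"person"}] * 7 + [{"person", "unicorn"}]
factors = repeat_factors(images, threshold=0.5)
```

An image with factor 2.0 would be seen about twice per epoch, so images containing tail classes contribute more gradient updates than they would under uniform sampling.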
The blocks relocation problem (BRP) is a well-known and important combinatorial optimization problem in which the initial storage state and retrieval priority of containers are known, and the containers must be picked up in retrieval order with the goal of minimizing the number of container relocations. This paper studies how machine learning techniques can guide the solution of this NP-hard problem. Using our self-developed data generator, we generated initial-state stacking matrices and extracted 22 factors that influence container relocation operations. Supervised learning and an attribution technique were used to verify the relationship between the significant influencing features and the number of container relocations under the unrestricted and restricted BRP models. We characterize the potential patterns in the data based on these 22 features using four supervised learning models: random forest (RF), extra trees (ET), support vector machine (SVM), and logistic regression (LR). The experimental results demonstrate that RF achieves a classification accuracy of up to 94% on the restricted BRP model. The attribution technique identifies the features to which the number of container relocations is most sensitive. We integrate machine learning into the BRP and propose an interactive iterative framework that may provide a novel method for studying the problem.
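To make the quantity being predicted concrete, the following sketch counts relocations for a restricted BRP instance, where only containers stacked above the current target may be moved. It is not the authors' solver: the greedy destination rule, the absence of a stack height limit, and the names `choose_destination` and `count_relocations` are assumptions made for illustration. The result is a feasible relocation count, not necessarily the minimum.

```python
def choose_destination(stacks, src, block):
    """Pick a stack for a blocking container (illustrative greedy rule)."""
    candidates = [i for i in range(len(stacks)) if i != src]

    def lowest(i):
        # Most urgent priority in stack i; an empty stack never blocks.
        return min(stacks[i], default=float("inf"))

    # Stacks where `block` would not sit on top of a more urgent container.
    safe = [i for i in candidates if lowest(i) > block]
    if safe:
        return min(safe, key=lowest)
    # Otherwise postpone the next forced relocation as long as possible.
    return max(candidates, key=lowest)


def count_relocations(stacks):
    """Relocations needed to retrieve containers 1..N in order.

    stacks: list of lists, bottom to top, holding unique priorities 1..N
            (1 is retrieved first). Assumes at least two stacks and no
            height limit.
    """
    stacks = [list(s) for s in stacks]  # work on a copy
    total = sum(len(s) for s in stacks)
    relocations = 0
    for target in range(1, total + 1):
        src = next(i for i, s in enumerate(stacks) if target in s)
        # Restricted BRP: only containers above the target are moved.
        while stacks[src][-1] != target:
            block = stacks[src].pop()
            stacks[choose_destination(stacks, src, block)].append(block)
            relocations += 1
        stacks[src].pop()  # retrieve the target
    return relocations


# Toy instance (hypothetical): container 1 sits under containers 3 and 2.
moves = count_relocations([[1, 3, 2], [], []])
```

In a learning setup of the kind the abstract describes, a count like this would serve as the label, with features extracted from the initial stacking matrix as inputs.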