The tree model is well known for expressing the historical evolution of languages and has long been used to describe genetic relationships between them. Nevertheless, some researchers question the model's ability to predict the proximity between two languages, since it represents genetic relatedness rather than linguistic resemblance. Defining other language proximity models has been an active research area for many years. In this paper we explore a part-of-speech model for defining proximity between languages, using a multilingual language model that was fine-tuned on the task of cross-lingual part-of-speech tagging. We train the model on one language and evaluate it on another; the measured performance is then used to define the proximity between the two languages. By further developing the model, we show that it can reconstruct some parts of the tree model.
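The train-on-one, evaluate-on-another idea can be illustrated with a minimal Python sketch. It assumes the directed transfer scores (tagging accuracy when training on a source language and evaluating on a target language) have already been measured. The abstract does not state how the two directions are combined, so averaging them is an assumption made here for illustration. The function names, the language labels "a", "b", "c", and the numbers are hypothetical placeholders, not results from the paper.

```python
def proximity_matrix(transfer_scores, languages):
    """Average the two directed transfer scores of each language pair.

    transfer_scores[(src, tgt)] is the tagging accuracy obtained when
    training on `src` and evaluating on `tgt`. Averaging both directions
    is an assumption of this sketch, not the paper's definition.
    """
    prox = {}
    for i, a in enumerate(languages):
        for b in languages[i + 1:]:
            prox[(a, b)] = (transfer_scores[(a, b)] + transfer_scores[(b, a)]) / 2
    return prox


def nearest_language(prox, lang):
    """Return the language with the highest proximity to `lang`."""
    candidates = {}
    for (a, b), p in prox.items():
        if a == lang:
            candidates[b] = p
        elif b == lang:
            candidates[a] = p
    return max(candidates, key=candidates.get)


# Placeholder scores for three hypothetical languages (not real measurements).
toy_scores = {
    ("a", "b"): 0.9, ("b", "a"): 0.8,
    ("a", "c"): 0.5, ("c", "a"): 0.4,
    ("b", "c"): 0.6, ("c", "b"): 0.7,
}
toy_prox = proximity_matrix(toy_scores, ["a", "b", "c"])
```

A pairwise proximity table of this kind could then be passed to a hierarchical clustering routine to see which groupings of a language tree it recovers.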
Motivation: The log rank test is widely used to assess the statistical significance of observed differences in survival when comparing two or more groups. The validity of its calculations rests on several assumptions. One of them, usually left implicit, is that no errors occur in the labeling of the samples, that is, that the mapping between samples and groups is perfectly correct. In this work we investigate how test results may be affected when the original labeling contains some errors.
Results: We introduce and define the uncertainty that arises from labeling errors in the log rank test. To deal with this uncertainty, we develop a novel algorithm for efficiently calculating a stability interval around the original log rank p-value and prove its correctness. We demonstrate our algorithm on several datasets.
Availability: We provide a Python implementation, called LoRSI, for calculating the stability interval using our algorithm: https://github.com/YakhiniGroup/LoRSI.
Supplementary information: Supplementary data are available at Bioinformatics online.
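To make the question concrete, the following Python sketch computes a standard two-group log rank p-value and then, by brute force, the smallest and largest p-values reachable when up to k group labels are flipped. This is not the LoRSI algorithm, which the abstract describes as efficient and provably correct. It is an exhaustive illustration that only scales to very small samples, and it assumes that the stability interval means the range of p-values under at most k labeling errors. The function names and the parameter k are hypothetical.

```python
from itertools import combinations
from math import erfc, sqrt


def logrank_p(times, events, groups):
    """Two-group log rank p-value (chi-square with 1 degree of freedom).

    times: follow-up times; events: 1 if the event was observed, 0 if
    censored; groups: 0/1 group label of each sample.
    """
    obs_minus_exp = 0.0
    var = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [(ti, e, g) for ti, e, g in zip(times, events, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for _, _, g in at_risk if g == 1)
        d = sum(1 for ti, e, _ in at_risk if ti == t and e)
        d1 = sum(1 for ti, e, g in at_risk if ti == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0  # degenerate case, e.g. all samples in one group
    # Survival function of a chi-square(1) variable: erfc(sqrt(x / 2)).
    return erfc(sqrt(obs_minus_exp ** 2 / var / 2))


def stability_interval_bruteforce(times, events, groups, k):
    """Min and max p-value over all ways of flipping at most k labels.

    Exhaustive search (assumption of this sketch, not the LoRSI method).
    Returns (lowest p, original p, highest p).
    """
    p0 = logrank_p(times, events, groups)
    lo = hi = p0
    for r in range(1, k + 1):
        for idx in combinations(range(len(groups)), r):
            flipped = list(groups)
            for i in idx:
                flipped[i] = 1 - flipped[i]
            p = logrank_p(times, events, flipped)
            lo, hi = min(lo, p), max(hi, p)
    return lo, p0, hi
```

Calling `stability_interval_bruteforce(times, events, groups, k=1)` on a small cohort shows how far a single mislabeled sample can move the p-value in either direction. The number of label subsets grows combinatorially with k, which is the cost an efficient algorithm has to avoid.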