Natural Language Inference (NLI) datasets contain examples with highly ambiguous labels. Although much prior work overlooks this fact, several recent efforts, such as UNLI and ChaosNLI, acknowledge and embrace the existence of ambiguity. In this paper, we explore training directly on the estimated label distribution of the annotators in the NLI task, using a loss based on this ambiguity distribution instead of the gold-labels. We prepare AmbiNLI, a trial dataset obtained from readily available sources, and show that finetuning on this data can reduce ChaosNLI divergence scores, a promising first step towards learning how to capture linguistic ambiguity. Additionally, we show that training on the same amount of data but targeting the ambiguity distribution instead of gold-labels can result in models that achieve higher performance and learn better representations for downstream tasks.
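To make the idea of training on a label distribution concrete, the following is a minimal sketch and not the authors' implementation. It assumes the loss is a cross-entropy against the annotators' soft label distribution and that the divergence score is a Jensen-Shannon divergence; the abstract does not specify either choice. The function names and the example vote counts are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]

def soft_cross_entropy(logits, target_dist):
    """Cross-entropy between a soft target distribution and the model's
    softmax output. With a one-hot target this reduces to the usual
    gold-label cross-entropy."""
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(target_dist, log_probs))

def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by log(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical example: 100 annotators vote on
# (entailment, neutral, contradiction).
votes = [60, 30, 10]
target = [v / sum(votes) for v in votes]   # estimated label distribution
logits = [2.0, 1.0, 0.0]                   # hypothetical model scores

loss = soft_cross_entropy(logits, target)
predicted = [math.exp(lp) for lp in log_softmax(logits)]
divergence = js_divergence(target, predicted)
```

Under these assumptions, minimizing `soft_cross_entropy` pushes the predicted distribution towards the annotator distribution, which in turn lowers a divergence score such as `js_divergence`.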
Parkinson’s disease (PD) is often detected only in later stages, when about 50% of nigrostriatal dopaminergic projections have already been lost. Thus, there is a need for biomarkers to monitor the earliest phases, especially in those at higher risk. In this work, we explore the use of machine learning methods to diagnose PD by analyzing gait alterations via an inertial sensor system that study participants wear while walking down a 15 m long corridor in three different scenarios. To this end, we trained six well-known machine learning models: support vector machines, logistic regression, neural networks, k-nearest neighbors, decision trees and random forests. We thoroughly explored several ways to mitigate the problems derived from the small amount of available data. We found that, while accuracy rates of over 70% are quite common, the accuracy of the best model is only slightly above the 80% mark. This model has high precision and specificity (over 90%), but lower sensitivity (only 71%). We believe that these results are promising, especially given the size of the population sample (41 PD patients and 36 healthy controls), and that this research avenue should be further explored.
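The four reported metrics all derive from the same confusion matrix, which is why a model can combine high precision and specificity with lower sensitivity. The sketch below shows the standard definitions, taking PD as the positive class. The function name and the counts in the example are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts.

    tp: patients correctly flagged     fn: patients missed
    tn: controls correctly cleared     fp: controls wrongly flagged
    """
    def ratio(num, den):
        # Return NaN rather than raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fn + tn + fp),
        "precision": ratio(tp, tp + fp),      # flagged cases that are true patients
        "sensitivity": ratio(tp, tp + fn),    # patients that were detected (recall)
        "specificity": ratio(tn, tn + fp),    # controls that were correctly cleared
    }

# Hypothetical counts for illustration only (10 patients, 10 controls).
metrics = classification_metrics(tp=8, fn=2, tn=9, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A classifier that rarely flags healthy controls keeps `fp` small, which raises both precision and specificity, while missed patients (`fn`) lower only sensitivity and accuracy.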