Diachronic distributional models track changes in word use over time. In this paper, we propose a deep neural network diachronic distributional model. Instead of modeling lexical change via a time series as is done in previous work, we represent time as a continuous variable and model a word's usage as a function of time. Additionally, we have created a novel synthetic task, which quantitatively measures how well a model captures the semantic trajectory of a word over time. Finally, we explore how well the derivatives of our model can be used to measure the speed of lexical change.
We present a log-linear ranking model for interpreting questions in a virtual patient dialogue system and demonstrate that it substantially outperforms a more typical multiclass classifier model using the same information. The full model makes use of weighted and concept-based matching features that together yield a 15% error reduction over a strong lexical overlap baseline. The accuracy of the ranking model approaches that of an extensively handcrafted pattern matching system, promising to reduce the authoring burden and make it possible to use confidence estimation in choosing dialogue acts; at the same time, the effectiveness of the concept-based features indicates that manual development resources can be productively employed with the approach in developing concept hierarchies.
Insightful findings in political science often require researchers to analyze documents of a certain subject or type, yet these documents are usually contained in large corpora that do not distinguish between pertinent and nonpertinent documents. In contrast, we can find corpora that label relevant documents but have limitations (e.g., from a single source or era), preventing their use for political science research. To bridge this gap, we present adaptive ensembling, an unsupervised domain adaptation framework, equipped with a novel text classification model and time-aware training to ensure our methods work well with diachronic corpora. Experiments on an expert-annotated dataset show that our framework outperforms strong benchmarks. Further analysis indicates that our methods are more stable, learn better representations, and extract cleaner corpora for fine-grained analysis.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.