Previous research has demonstrated the utility of clustering in inducing semantic verb classes from undisambiguated corpus data. We describe a new approach which involves clustering subcategorization frame (SCF) distributions using the Information Bottleneck and nearest neighbour methods. In contrast to previous work, we particularly focus on clustering polysemic verbs. A novel evaluation scheme is proposed which accounts for the effect of polysemy on the clusters, offering us a good insight into the potential and limitations of semantically classifying undisambiguated SCF data.
We present a method for identifying corresponding themes across several corpora that are focused on related, but distinct, domains. This task is approached through simultaneous clustering of keyword sets extracted from the analyzed corpora. Our algorithm extends the informationbottleneck soft clustering method for a suitable setting consisting of several datasets. Experimentation with topical corpora reveals similar aspects of three distinct religions. The evaluation is by way of comparison to clusters constructed manually by an expert.
This work addresses the task of identifying thematic correspondences across subcorpora focused on different topics. We introduce an unsupervised algorithmic framework based on distributional data clustering, which generalizes previous initial works on this task. The empirical results reveal interesting commonalities of different religions. We evaluate the results through measuring the overlap of our clusters with clusters compiled manually by experts. The tested variants of our framework are shown to outperform alternative methods applicable to the task.
This article studies the task of discovering correspondences across related domains based on real-world data collections. We address this task through a designated extension of distributional data-clustering methods. The method is empirically demonstrated on synthetic data as well as on texts addressing different religions, where the goal is to identify commonalities shared by all religions. This article generalises and demonstrates the empirical improvement relative to our previous studies on this subject, as well as to other comparable methods.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.