Recent work on detecting synonymy through corpus analysis has used the Test of English as a Foreign Language (TOEFL) as a benchmark. However, this test involves as few as 80 questions, which raises doubts about the statistical significance of reported results. We overcome this limitation by using WordNet to generate a TOEFL-like test that contains thousands of questions and is composed only of words occurring with sufficient corpus frequency to support sound distributional comparisons. Experiments with this test lead us to a similarity measure that significantly outperforms the best proposed to date. Analysis suggests that a strength of this measure is its relative robustness against polysemy.
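The concern about an 80-question benchmark can be made concrete with a confidence interval on measured accuracy. The sketch below uses the standard Wilson score interval; the function name and the example scores (60 of 80, 7500 of 10000) are hypothetical illustrations and are not results from the paper.

```python
import math

def accuracy_interval(correct, total, z=1.96):
    """Wilson score interval for an accuracy of correct/total.

    z=1.96 corresponds to an approximate 95% interval.
    """
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half

# Hypothetical scores: the same 75% accuracy on a small and a large test.
small = accuracy_interval(60, 80)
large = accuracy_interval(7500, 10000)
print("80 questions:    %.3f to %.3f" % small)
print("10000 questions: %.3f to %.3f" % large)
```

At the same observed accuracy, the interval for the 80-question test is much wider than the interval for the larger test, so differences between similarity measures that look meaningful on 80 questions may fall within sampling noise.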
We present a fast yet highly effective stochastic algorithm, Simmered Greedy Optimization (SG(N)), for solving the co-clustering problem: to simultaneously cluster two finite sets by maximizing the mutual information between the clusterings. (Clustering one set by this criterion is a special case.) This is a combinatorial optimization problem of great interest for deriving maximally predictive feature sets. Co-clustering has found applications in many areas, particularly statistical natural language processing and bioinformatics. We report results of tests on a suite of statistical natural language problems, comparing SG(N) with simulated annealing and a publicly available implementation of co-clustering. In all cases we obtain superior results with far less computation using SG(N).
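The objective named above, the mutual information between two clusterings, can be sketched as follows. This computes only the quantity being maximized; it is not the SG(N) search procedure, which the abstract does not describe. The function name, the count-matrix input format, and the toy data are assumptions made for illustration.

```python
import math

def clustering_mutual_information(counts, row_clusters, col_clusters):
    """Mutual information (in bits) between row and column clusterings.

    counts[i][j] is an assumed co-occurrence count for row item i and
    column item j; row_clusters[i] and col_clusters[j] give cluster labels.
    """
    total = sum(sum(row) for row in counts)
    joint = {}
    # Aggregate the joint distribution over (row cluster, column cluster).
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n:
                key = (row_clusters[i], col_clusters[j])
                joint[key] = joint.get(key, 0.0) + n / total
    # Marginals of the clustered distribution.
    row_marg, col_marg = {}, {}
    for (r, c), p in joint.items():
        row_marg[r] = row_marg.get(r, 0.0) + p
        col_marg[c] = col_marg.get(c, 0.0) + p
    return sum(p * math.log2(p / (row_marg[r] * col_marg[c]))
               for (r, c), p in joint.items())

# Toy data (hypothetical): two blocks of items that only co-occur internally.
counts = [[5, 5, 0, 0],
          [5, 5, 0, 0],
          [0, 0, 5, 5],
          [0, 0, 5, 5]]
aligned = clustering_mutual_information(counts, [0, 0, 1, 1], [0, 0, 1, 1])
mixed = clustering_mutual_information(counts, [0, 1, 0, 1], [0, 0, 1, 1])
print(aligned, mixed)
```

A clustering that respects the block structure retains the information that rows carry about columns, while one that mixes the blocks discards it. A search procedure such as simulated annealing or SG(N) would move items between clusters to increase this score.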