Data imbalance, also known as the long-tail distribution of data, is an important challenge for data-driven models. In the Word Sense Disambiguation (WSD) task, the long-tail phenomenon in word sense distribution is especially common, making it difficult to effectively represent and identify Long-Tail Senses (LTSs). Exploring representation methods that do not rely heavily on training sample size is therefore an important way to combat LTSs. In quantum mechanics, many new states, namely superposition states, can be constructed from several known states; superposition states thus make it possible to obtain more accurate representations from the inferior representations learned from a small sample. Inspired by quantum superposition states, we propose a representation method in Hilbert space that reduces the dependence on large sample sizes and thereby combats LTSs. We theoretically prove the correctness of the method, verify its effectiveness under the standard WSD evaluation framework, and obtain state-of-the-art performance. We also test on a constructed LTS dataset and the latest cross-lingual datasets, and achieve promising results.