The paper presents a flexible system for extracting features and creating training and test examples for solving the all-words sense disambiguation (WSD) task. The system allows integrating word and sense embeddings as part of an example description. The system possesses two unique features distinguishing it from all similar WSD systems: the ability to construct a special compressed representation for word embeddings and the ability to construct training and test sets of examples with different data granularity. The first feature allows generation of data sets with quite small dimensionality, which can be used for training highly accurate classifiers of different types. The second feature allows generating sets of examples that can be used for training classifiers specialized in disambiguating a concrete word, words belonging to the same part-of-speech (POS) category or all open class words. Intensive experimentation has shown that classifiers trained on examples created by the system outperform the standard baselines for measuring the behaviour of all-words WSD classifiers.