In this paper, we introduce two advancements in the automatic keyphrase extraction (AKE) space: KeyGames and pke+. KeyGames is an unsupervised AKE framework that draws on evolutionary game theory and the consistent labelling problem to classify candidates consistently as keyphrases or non-keyphrases. pke+ is a Python-based pipeline built on top of the existing pke library that standardizes two AKE steps, candidate extraction and evaluation, so that the performance of AKE models can be analysed systematically and compared fairly. In the experiments section, we evaluate KeyGames on three publicly available datasets (Inspec 2001, SemEval 2010, DUC 2001) and compare it against existing state-of-the-art models, using both the results those models quote and their performance when reproduced with pke+. The results show that KeyGames outperforms most of the state-of-the-art systems while generalizing better across input documents of different domains and lengths. Further, pke+'s pre-processing improves on the quoted performance of several other systems as well.
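To make the standardized evaluation step concrete, the following is a minimal sketch of precision, recall, and F1 computed over the top-k predicted keyphrases. It is not the pke+ API: the function name `f1_at_k`, the normalization (lowercasing and whitespace collapsing only, with no stemming), and the toy phrases are all assumptions made for illustration.

```python
def f1_at_k(predicted, gold, k):
    """Precision, recall and F1 of the top-k predictions (hypothetical helper).

    Assumption: phrases match after lowercasing and collapsing whitespace.
    Real AKE evaluations often also stem the phrases; that is omitted here.
    """
    def norm(phrase):
        return " ".join(phrase.lower().split())

    # Deduplicate predictions while keeping their ranking, then cut at k.
    ranked = []
    for phrase in predicted:
        n = norm(phrase)
        if n not in ranked:
            ranked.append(n)
    top = ranked[:k]

    gold_set = {norm(g) for g in gold}
    if not top or not gold_set:
        return 0.0, 0.0, 0.0

    hits = sum(1 for p in top if p in gold_set)
    precision = hits / len(top)
    recall = hits / len(gold_set)
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Toy data, invented for illustration only.
predicted = ["Game Theory", "keyphrase extraction", "datasets", "game theory"]
gold = ["game theory", "keyphrase extraction", "evaluation"]
p, r, f = f1_at_k(predicted, gold, k=3)
```

Applying the same normalization and cut-off to every system is what makes the resulting scores comparable; changing either one changes the numbers, which is one reason quoted and reproduced results can differ.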
In this article, we propose a system called “UTTAM” for correcting spelling errors in Hindi text using supervised learning. Unlike other languages, Hindi has a large character set, inflected words, complex characters, and sets of phonetically similar characters. This complexity increases the possibility of confusion and occasionally leads to a wrong character being entered in a word. Spelling errors in text significantly decrease the accuracy of resources that rely on it, such as search engines and text editors. The proposed work is the first approach to correct non-word (out-of-vocabulary) errors and real-word errors simultaneously in a Hindi sentence. The proposed method investigates human behavior, i.e., the types and frequencies of spelling errors that people make in Hindi text. Based on these types and frequencies, heterogeneous data is collected in matrices, which are then used to generate suitable candidate words for each input word. After the candidate words are generated, the Viterbi algorithm finds the best sequence of candidate words to correct the input sentence. For Hindi, this work is the first attempt at real-word error correction. For non-word errors, the experiments show that “UTTAM” performs better than the existing systems SpellGuru and Saksham.
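The decoding step described above can be sketched as a standard Viterbi search over per-word candidate lists. This is a minimal illustration, not the UTTAM implementation: the function name `viterbi_correct`, the use of log-scores from a per-position candidate table and a bigram table, the fallback score for unseen bigrams, and the English toy sentence are all assumptions.

```python
import math


def viterbi_correct(candidates, emit_logp, trans_logp):
    """Return the highest-scoring candidate sequence (hypothetical helper).

    candidates[i]   : non-empty list of candidate words for the i-th input word
    emit_logp[i][w] : log-score of candidate w given the i-th observed word
    trans_logp(a, b): log-score of word b following word a
    """
    # best[i][w] = score of the best path ending in candidate w at position i.
    best = [{w: emit_logp[0][w] for w in candidates[0]}]
    back = [{}]
    for i in range(1, len(candidates)):
        best.append({})
        back.append({})
        for w in candidates[i]:
            prev, score = max(
                ((p, best[i - 1][p] + trans_logp(p, w)) for p in candidates[i - 1]),
                key=lambda pair: pair[1],
            )
            best[i][w] = score + emit_logp[i][w]
            back[i][w] = prev

    # Follow the back-pointers from the best final candidate.
    path = [max(best[-1], key=best[-1].get)]
    for i in range(len(candidates) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy example with invented scores: the observed sentence is "I red a book".
candidates = [["I"], ["red", "read"], ["a"], ["book", "boot"]]
emit_logp = [
    {"I": 0.0},
    {"red": math.log(0.6), "read": math.log(0.4)},
    {"a": 0.0},
    {"book": math.log(0.9), "boot": math.log(0.1)},
]
bigram = {
    ("I", "read"): math.log(0.5), ("I", "red"): math.log(0.01),
    ("read", "a"): math.log(0.3), ("red", "a"): math.log(0.05),
    ("a", "book"): math.log(0.2), ("a", "boot"): math.log(0.05),
}
UNSEEN = math.log(1e-4)  # assumed fallback score for unseen bigrams

corrected = viterbi_correct(candidates, emit_logp, lambda a, b: bigram.get((a, b), UNSEEN))
```

Because the observed word itself is kept among the candidates, the same search can leave a correct word unchanged or replace a valid but contextually wrong word, which is how a sequence model of this kind can address real-word errors as well as non-word errors.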