A digital library is a type of information retrieval (IR) system. Existing IR methods often handle keyword search poorly: some search engines cannot return results for partial matches or for keywords that contain typographical errors. A system is therefore needed that returns results relevant to the user's keywords even in these cases. We propose a model that addresses the problem by combining spell correction with query expansion. Searching begins with indexing document titles: the title of each incoming document is preprocessed, and all terms across the document collection are weighted using Term Frequency-Inverse Document Frequency (TF-IDF). During search, the Levenshtein Distance algorithm is used to correct keywords flagged as possible typos. Before the relevance between the keywords and the documents is calculated using Cosine Similarity, the keywords are expanded using Query Expansion to increase the number of documents retrieved. The Cosine Similarity scores are then combined with the Query Expansion weights to produce the final ranking. The results show an improvement over an IR system without spell checking and query expansion. The model was implemented as a web-based application and tested 50 times on a data set of 2,045 items. The system was able to correct typo-indicated keywords and retrieve documents with an average recall of 95.91%, an average precision of 63.82%, and an average Non-Interpolated Average Precision (NIAP) of 86.29%.
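The pipeline described above (TF-IDF indexing of titles, Levenshtein-based keyword correction, query expansion, and cosine-similarity ranking) can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names, the `max_distance` threshold, the `synonyms` table, and the `expansion_weight` value are hypothetical, and the way expansion terms are folded into the query vector is an assumption, since the abstract does not give the weighting formula.

```python
import math
from collections import Counter


def levenshtein(a, b):
    """Edit distance (insert, delete, substitute) computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def correct(word, vocabulary, max_distance=2):
    """Return the closest indexed term, or the word unchanged if none is near.

    max_distance is an assumed threshold, not a value from the paper.
    """
    if not vocabulary or word in vocabulary:
        return word
    best = min(sorted(vocabulary), key=lambda term: levenshtein(word, term))
    return best if levenshtein(word, best) <= max_distance else word


def tokenize(text):
    """Simplistic preprocessing: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(titles):
    """Compute an IDF table and one TF-IDF vector (a dict) per title."""
    docs = [Counter(tokenize(title)) for title in titles]
    doc_freq = Counter(term for doc in docs for term in doc)
    # The "+ 1.0" keeps terms that occur in every title from getting weight 0.
    idf = {t: math.log(len(docs) / n) + 1.0 for t, n in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query, titles, idf, vectors, synonyms=None, expansion_weight=0.5):
    """Correct the keywords, expand them, and rank titles by cosine similarity.

    synonyms is a hypothetical term -> related-terms table; expanded terms
    enter the query vector with a reduced weight (an assumed scheme).
    """
    synonyms = synonyms or {}
    weights = Counter()
    for word in tokenize(query):
        term = correct(word, idf)
        weights[term] += 1.0
        for extra in synonyms.get(term, []):
            weights[extra] += expansion_weight
    query_vec = {t: w * idf[t] for t, w in weights.items() if t in idf}
    scored = [(cosine(query_vec, vec), title)
              for vec, title in zip(vectors, titles)]
    return sorted((s for s in scored if s[0] > 0), reverse=True)
```

For example, after `idf, vectors = build_index(titles)`, calling `search("libary", titles, idf, vectors)` first corrects the misspelled keyword against the indexed vocabulary and then ranks the titles. Passing a `synonyms` table widens the set of matching titles, which raises recall at some cost in precision.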