Purpose Text mining is growing in importance proportionate to the growth of unstructured data and its applications are increasing day by day from knowledge management to social media analysis. Mapping skillset of a candidate and requirements of job profile is crucial for conducting new recruitment as well as for performing internal task allocation in the organization. The automation in the process of selecting the candidates is essential to avoid bias or subjectivity, which may occur while shuffling through thousands of resumes and other informative documents. The system takes skillset in the form of documents to build the semantic space and then takes appraisals or resumes as input and suggests the persons appropriate to complete a task or job position and employees needing additional training. The purpose of this study is to extend the term-document matrix and achieve refined clusters to produce an improved recommendation. The study also focuses on achieving consistency in cluster quality in spite of increasing size of data set, to solve scalability issues. Design/methodology/approach In this study, a synset-based document matrix construction method is proposed where semantically similar terms are grouped to reduce the dimension curse. An automated Task Recommendation System is proposed comprising synset-based feature extraction, iterative semantic clustering and mapping based on semantic similarity. Findings The first step in knowledge extraction from the unstructured textual data is converting it into structured form either as Term frequency–Inverse document frequency (TF-IDF) matrix or synset-based TF-IDF. Once in structured form, a range of mining algorithms from classification to clustering can be applied. The algorithm gives a better feature vector representation and improved cluster quality. The synset-based grouping and feature extraction for resume data optimizes the candidate selection process by reducing entropy and error and by improving precision and scalability. Research limitations/implications The productivity of any organization gets enhanced by assigning tasks to employees with a right set of skills. Efficient recruitment and task allocation can not only improve productivity but also cater to satisfy employee aspiration and identifying training requirements. Practical implications Industries can use the approach to support different processes related to human resource management such as promotions, recruitment and training and, thus, manage the talent pool. Social implications The task recommender system creates knowledge by following the steps of the knowledge management cycle and this methodology can be adopted in other similar knowledge management applications. Originality/value The efficacy of the proposed approach and its enhancement is validated by carrying out experiments on the benchmarked dataset of resumes. The results are compared with existing techniques and show refined clusters. That is Absolute error is reduced by 30 per cent, precision is increased by 20 per cent and dimensions are lowered by 60 per cent than existing technique. Also, the proposed approach solves issue of scalability by producing improved recommendation for 1,000 resumes with reduced entropy.
Implementing supervised machine learning on the Hindi corpus for classification and prediction of verses is an untouched and useful area. Classifying and predictions benefits many applications like organizing a large corpus, information retrieval and so on. The metalinguistic facility provided by websites makes Hindi as a major language in the digital domain of information technology today. Text classification algorithms along with Natural Language Processing (NLP) facilitates fast, cost-effective, and scalable solution. Performance evaluation of these predictors is a challenging task. To reduce manual efforts and time spent for reading the document, classification of text data is important. In this paper, 697 Hindi poems are classified based on four topics using four eager machine-learning algorithms. In the absence of any other technique, which achieves prediction on Hindi corpus, misclassification error is used and compared to prove the betterment of the technique. Support vector machine performs best amongst all.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.