This paper introduces a hidden topic-based framework for processing short and sparse documents (e.g., search result snippets, product descriptions, book/movie summaries, and advertising messages) on the Web. The framework focuses on solving two main challenges posed by these kinds of documents: 1) data sparseness and 2) synonyms/homonyms. The former leads to the lack of shared words and contexts among documents while the latter are big linguistic obstacles in natural language processing (NLP) and information retrieval (IR). The underlying idea of the framework is that common hidden topics discovered from large external data sets (universal data sets), when included, can make short documents less sparse and more topic-oriented. Furthermore, hidden topics from universal data sets help handle unseen data better. The proposed framework can also be applied for different natural languages and data domains. We carefully evaluated the framework by carrying out two experiments for two important online applications (Web search result classification and matching/ranking for contextual advertising) with large-scale universal data sets and we achieved significant results.
This paper presents a study on Predicting Student Performance (PSP) in academic systems. In order to solve the task, we have proposed and investigated different strategies. Specifically, we consider this task as a regression problem and a rating prediction problem in recommender systems. To improve the performance of the former, we proposed the use of additional features based on course-related skills. Moreover, to effectively utilize the outputs of these two strategies, we also proposed a combination of the two methods to enhance the prediction performance. We evaluated the proposed methods on a dataset which was built using the mark data of students in information technology at Vietnam National University, Hanoi (VNU). The experimental results have demonstrated that unlike the PSP in e-Learning systems, the regression-based approach should give better performance than the recommender system-based approach. The integration of the proposed features also helps to enhance the performance of the regression-based systems. Overall, the proposed hybrid method achieved the best RMSE score of 1.668. These promising results are expected to provide students early feedbacks about their (predicted) performance on their future courses, and therefore saving times of students and their tutors in determining which courses are appropriate for students’ ability.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.