The ease of creating digital content, coupled with technological advances, allows institutions and organizations to embrace distance learning further. Teaching materials have also received attention, because it is difficult for students to obtain adequate didactic material: doing so requires considerable effort as well as knowledge of both the material and the repository. This work presents a framework that enables automatic metadata generation for materials available in educational video repositories. Each module of the framework works autonomously and can be used in isolation, complemented by another technique, or replaced by an approach better suited to the application domain, such as repositories containing other types of media or other content.
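The claim that each module works autonomously and can be replaced can be illustrated with a minimal sketch. It assumes, purely for illustration, that a module is any function mapping a video record to a dictionary of metadata fields; `extract_title`, `extract_keywords`, and `run_pipeline` are hypothetical names with deliberately naive logic, not the framework's actual modules or API.

```python
from typing import Callable, Dict, List

# Hypothetical module signature: a module takes a video record (a dict)
# and returns only the metadata fields it generates.
Module = Callable[[Dict[str, str]], Dict[str, str]]


def extract_title(record: Dict[str, str]) -> Dict[str, str]:
    # Hypothetical placeholder: use the first transcript line as the title.
    first_line = record.get("transcript", "").strip().split("\n")[0]
    return {"title": first_line[:60]}


def extract_keywords(record: Dict[str, str]) -> Dict[str, str]:
    # Hypothetical placeholder: take the three longest distinct words as keywords.
    words = {w.lower().strip(".,;:") for w in record.get("transcript", "").split()}
    top = sorted(words, key=lambda w: (-len(w), w))[:3]
    return {"keywords": ", ".join(top)}


def run_pipeline(record: Dict[str, str], modules: List[Module]) -> Dict[str, str]:
    # Modules do not depend on each other; each output is merged into the
    # result, so a module can be removed, added, or swapped independently.
    metadata: Dict[str, str] = {}
    for module in modules:
        metadata.update(module(record))
    return metadata
```

Because modules only share the input record, passing `[extract_title]` alone still produces a title, and a better keyword extractor could replace `extract_keywords` without changing any other module.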
The growing number of videos available in educational content repositories makes searching difficult, and recommendation systems have been used to help students and teachers find content of interest. Speech is an important carrier of information in video lectures and is used by content-based video recommendation systems. Although automatic speech recognition (ASR) transcripts have been used in modern video recommendation systems, it is not clear how well annotation techniques perform on noisy text. This article presents an analysis of a set of semantic annotation techniques applied to text extracted from video lecture speech and of their impact on two tasks: annotation and similarity analysis. Experiments show that topic models perform well in this scenario. In addition, we created a new benchmark for this task that researchers can use to evaluate new techniques.
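To make concrete why ASR noise matters for similarity analysis, the following sketch compares two transcripts using term-frequency cosine similarity. This is a deliberately simple stand-in, not the semantic annotation techniques or topic models evaluated in the article. The tokenizer and the example transcripts are illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase, strip trailing punctuation, and keep purely alphabetic tokens.
    return [t for t in (w.strip(".,;:!?").lower() for w in text.split()) if t.isalpha()]


def cosine_similarity(a: str, b: str) -> float:
    # Build a term-frequency vector for each transcript.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Dot product over the terms the two transcripts share.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    reference = "we discuss neural networks"
    asr_output = "we discus neural net works"  # simulated recognition errors
    print(round(cosine_similarity(reference, asr_output), 3))  # prints 0.447
```

Misrecognized words ("discus", "net works") no longer match the reference vocabulary, so two transcripts of the same lecture look less similar than they are. Methods that capture latent topics aim to be less sensitive to this kind of surface noise.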
Author name ambiguity is one of the most challenging issues that can compromise information quality in a scholarly digital library. For years, researchers have searched for solutions to this problem. Despite the many methods already proposed, the question remains open. In this study, we address the problem of producing a more accurate disambiguation function by applying data augmentation to the training data. We also propose a SyGAR-based data augmentation approach and evaluate it on three collections commonly used in work on the author name disambiguation task. The experimental results reveal scenarios in which author name disambiguation can be improved. The proposed data augmentation approach outperforms another data augmentation approach and also improves some machine learning techniques that were not specifically designed for the author name disambiguation task.
The rapid growth of content repositories has created a need for better indexing and search mechanisms, including question answering systems. Users still have difficulty navigating the large volume of information on the Web. However, studies on automatic semantic annotation make it possible to identify content in repositories and support a variety of systems. This work proposes a BERT-based question processing method for the semantic annotation task, which adds DBpedia resources to the questions as context. Experimental results show improvements of up to 13% over the baseline.