We propose using active learning for extractive speech summarization in order to reduce human effort in generating reference summaries. Active learning selects a subset of samples to be labeled. We propose a combination of informativeness and representativeness criteria for selection. We further propose a semiautomatic method to generate reference summaries for presentation speech by using Relaxed Dynamic Time Warping (RDTW) alignment between presentation speech and its accompanying slides. Our summarization results show that the amount of labeled data needed for a given summarization accuracy can be reduced by more than 23% compared to random sampling.
ACM Reference Format: Zhang, J. J. and Fung, P. 2012. Active learning with semi-automatic annotation for extractive speech summarization.
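The abstract above names informativeness and representativeness as the two selection criteria but does not say how either is computed or combined. The following is a minimal sketch under stated assumptions, not the authors' method: informativeness is taken to be the entropy of a classifier's summary/non-summary probability, representativeness is the mean cosine similarity to the other unlabeled samples, and the two are mixed with a hypothetical `weight` parameter. The names `binary_entropy`, `cosine`, and `select_for_labeling` are illustrative only.

```python
import math


def binary_entropy(p):
    """Uncertainty of a summary/non-summary probability, in bits (0 to 1)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_for_labeling(features, probs, k, weight=0.5):
    """Return indices of the k unlabeled samples with the highest combined score.

    Assumed scoring (not taken from the paper):
      score = weight * informativeness + (1 - weight) * representativeness
    """
    n = len(features)
    scores = []
    for i in range(n):
        informativeness = binary_entropy(probs[i])
        others = [cosine(features[i], features[j]) for j in range(n) if j != i]
        representativeness = sum(others) / len(others) if others else 0.0
        scores.append(weight * informativeness
                      + (1.0 - weight) * representativeness)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
```

In an active learning loop, the selected indices would be sent for labeling, the classifier retrained, and the selection repeated on the remaining unlabeled pool. Setting `weight=1.0` reduces this sketch to plain uncertainty sampling.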
Extractive summarization of conference and lecture speech is useful for online learning and references. We show for the first time that deep(er) rhetorical parsing of conference speech is possible and helpful to the extractive summarization task. This type of rhetorical structure is evident in the corresponding presentation slide structures. We propose using Hidden Markov SVM (HMSVM) to iteratively learn the rhetorical structure of the speeches and summarize them. We show that a system based on HMSVM gives a 64.3% ROUGE-L F-measure, a 10.1% absolute increase in lecture speech summarization performance compared with the baseline system without rhetorical information. Our method likewise outperforms the baseline with a conventional discourse feature. Our proposed approach is more efficient than, and also improves upon, a previous method of using shallow rhetorical structure parsing [1].
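For readers unfamiliar with the metric reported above, ROUGE-L scores a candidate summary by the longest common subsequence (LCS) it shares with a reference, and the F-measure combines LCS-based precision and recall. The sketch below is a simplified sentence-level version with whitespace tokenization and `beta = 1`. The abstract does not state the exact ROUGE configuration used, so these choices are assumptions, and the function names are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-measure on whitespace tokens.

    precision = LCS / len(candidate), recall = LCS / len(reference),
    F = (1 + beta^2) * P * R / (R + beta^2 * P).
    """
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)
```

For example, `rouge_l_f("the cat sat on the mat", "the cat is on the mat")` shares the five-token subsequence "the cat on the mat" with the reference, so precision and recall are both 5/6 and the F-measure is 5/6.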