SUMMARY: Increasing the number of closed-captioned television programs is a social responsibility in terms of providing access to information. Because closed-captioned programs are currently created by hand, there is considerable hope that the time involved can be reduced and the burden on workers eased. The system the authors report on automates three processes in the creation of closed-captioned television programs: summarization, synchronization, and closed-caption screen creation. Starting from an electronic manuscript, it yields closed-caption data applicable to current closed-captioned broadcasts. The authors created closed captions for 12 types of news programs and one documentary program, confirming that the creation process could be completed in three to six times the program length, excluding creation of the electronic manuscript and testing/editing. The authors argue for the validity of their system on the grounds that creating closed captions with it took about 70% of the time needed to create them by hand, excluding testing and editing.
SUMMARY: This paper considers a technique for prerecorded TV programs in which captions for the hearing-impaired are automatically superimposed on the basis of the program VTR and the advance electronic script. It describes a method of determining the caption presentation timing by detecting the points at which the speech and the captions are synchronized. For broadcast speech with superimposed background sound, it is difficult to achieve high detection accuracy when timing detection relies only on a phoneme HMM word spotter. Consequently, this paper proposes the following method. For each sentence in the program script, multiple timing candidates are detected by word spotting. The optimal timing for the whole program is then selected by dynamic programming based on three scores (the time order in the manuscript, the ratio of the utterance times estimated from the manuscript, and the likelihood of being speech) in addition to the acoustic likelihood. An evaluation experiment was performed on 10 sessions of a documentary program. Assuming tolerable detection errors of 1 and 3 seconds, detection rates of 99.0% and 99.7% were obtained, respectively, indicating that the method is of practical value.
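The candidate-selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_timings`, the `(start_time, score)` candidate format, the `expected_gaps` input, and the `gap_weight` parameter are all assumptions introduced for illustration. The sketch treats time order as a hard constraint, uses a penalty on the deviation from an expected inter-sentence gap as a stand-in for the utterance-time ratio score, and folds the acoustic and speech likelihoods into a single per-candidate score.

```python
from typing import List, Tuple

# Hypothetical candidate format: (start_time_seconds, combined_score).
Candidate = Tuple[float, float]


def select_timings(candidates: List[List[Candidate]],
                   expected_gaps: List[float],
                   gap_weight: float = 1.0) -> List[float]:
    """Choose one start time per sentence so that times strictly increase
    and the summed score (minus gap penalties) is maximal."""
    n = len(candidates)
    if n == 0:
        return []
    if any(len(c) == 0 for c in candidates):
        raise ValueError("every sentence needs at least one candidate")
    if len(expected_gaps) != n - 1:
        raise ValueError("expected_gaps must have one entry per adjacent pair")

    neg_inf = float("-inf")
    # best[i][j]: best cumulative score of a path ending at candidate j of sentence i.
    best = [[neg_inf] * len(c) for c in candidates]
    # back[i][j]: index of the predecessor candidate on that best path.
    back = [[-1] * len(c) for c in candidates]

    for j, (_, score) in enumerate(candidates[0]):
        best[0][j] = score

    for i in range(1, n):
        for j, (t, score) in enumerate(candidates[i]):
            for k, (t_prev, _) in enumerate(candidates[i - 1]):
                # Skip unreachable predecessors and anything that breaks time order.
                if best[i - 1][k] == neg_inf or t_prev >= t:
                    continue
                penalty = gap_weight * abs((t - t_prev) - expected_gaps[i - 1])
                total = best[i - 1][k] + score - penalty
                if total > best[i][j]:
                    best[i][j] = total
                    back[i][j] = k

    # Backtrack from the best final candidate.
    j = max(range(len(best[-1])), key=lambda idx: best[-1][idx])
    if best[-1][j] == neg_inf:
        raise ValueError("no time-ordered path through the candidates")
    path = []
    for i in range(n - 1, -1, -1):
        path.append(candidates[i][j][0])
        j = back[i][j]
    path.reverse()
    return path


if __name__ == "__main__":
    # Made-up scores: the top-scoring candidate of each sentence (10.0, 2.0, 3.0)
    # is out of order, so a per-sentence greedy pick would fail.
    demo = [
        [(1.0, 0.9), (10.0, 0.95)],
        [(5.0, 0.8), (2.0, 0.85)],
        [(9.0, 0.7), (3.0, 0.9)],
    ]
    print(select_timings(demo, expected_gaps=[4.0, 4.0], gap_weight=0.1))
```

In the demo data, picking the best-scoring candidate for each sentence independently would produce times that run backward, whereas the whole-program search returns a consistent ordering. This is the kind of error that a global selection over all sentences is meant to avoid.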