Real-time captioning by groups of non-experts

Lasecki, Walter S.; Miller, Christopher D.; Sadilek, Adam; Abumoussa, Andrew; Borrello, Donato; Kushalnagar, Raja; Bigham, Jeffrey P.

doi:10.1145/2380116.2380122

Cited by 167 publications

(115 citation statements)

References 17 publications

Supporting

Mentioning

114

Contrasting

Unclassified

“…The same dataset used by (Lasecki et al, 2012) and (Naim et al, 2013). Each audio clip is transcribed by 10 non-expert human workers in real time.…”

Section: Results (mentioning)

confidence: 99%

“…Out of the systems in Figure 2, the first three systems consist of the sliding alignment window algorithm with different values of the keep-length parameter: (1) keep-length = 0.5; (2) keep-length = 0.67; and (3) keep-length = 0.85. The other systems are the graph-based algorithm of (Lasecki et al, 2012), the MUSCLE algorithm of (Edgar, 2004), and the most accurate fixed alignment window algorithm of (Naim et al, 2013). We set the heuristic weight parameter (w) to 3 and the chunk size parameter (c) to 5 seconds for all three sliding window systems and the fixed window system.…”

Section: Results (mentioning)

confidence: 99%
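
To make the roles of the chunk size (c) and keep-length parameters concrete, here is a minimal sketch of how a sliding window could be scheduled over an audio clip. It rests on an assumed reading of the statement above: each window spans c seconds, only the first keep-length fraction of its aligned output is committed, and the next window starts where the committed part ends. The function name `sliding_windows` and this scheduling rule are illustrative assumptions, not the authors' published implementation, and the alignment step itself is omitted.

```python
from typing import Iterator, Tuple


def sliding_windows(
    total_seconds: float,
    chunk_size: float = 5.0,    # c, in seconds (value quoted above)
    keep_length: float = 0.67,  # fraction of each window that is committed
) -> Iterator[Tuple[float, float, float]]:
    """Yield (start, end, keep_until) for each window.

    Assumption: the output aligned between `start` and `keep_until` is
    committed, and the next window begins at `keep_until`.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 < keep_length <= 1:
        raise ValueError("keep_length must be in (0, 1]")

    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_size, total_seconds)
        if end >= total_seconds:
            # Last window: nothing follows, so commit everything.
            keep_until = end
        else:
            keep_until = start + keep_length * chunk_size
        yield (start, end, keep_until)
        start = keep_until


if __name__ == "__main__":
    for window in sliding_windows(12, chunk_size=5.0, keep_length=0.5):
        print(window)
```

Under this reading, a smaller keep-length makes consecutive windows overlap more, so more of each window is re-aligned with later context before it is committed.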

“…This approach has been shown to dramatically outperform ASR in terms of both accuracy and Word Error Rate (WER) (Lasecki et al, 2012; Naim et al, 2013). Furthermore, recall of individual words irrespective of their order approached and even exceeded that of a trained expert stenographer with seven workers contributing, suggesting that the information is present to meet the performance of a stenographer (Lasecki et al, 2012). However, aligning these individual words in the correct sequential order remains a challenging problem.…”

Section: Introduction (mentioning)

confidence: 99%
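
For readers unfamiliar with the metric named above, the following is a minimal sketch of the standard Word Error Rate definition: the word-level edit distance (substitutions, insertions, deletions) between a hypothesis and a reference transcript, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the cited papers may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.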

“…However, aligning these individual words in the correct sequential order remains a challenging problem. Lasecki et al (2012) addressed this alignment problem using off-the-shelf multiple sequence alignment tools, as well as an algorithm based on incrementally building a precedence graph over output words. Improved results for the alignment problem were shown using weighted A* search by Naim et al (2013).…”

Section: Introduction (mentioning)

confidence: 99%
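
The statement above names weighted A* search, and an earlier statement mentions a heuristic weight w = 3. As a generic illustration only, the sketch below shows weighted A* on an arbitrary graph: nodes are expanded in order of g + w * h, so a larger w trusts the heuristic more and trades optimality for speed. The function name, signature, and toy graph are assumptions; this is not the alignment-specific search of Naim et al. (2013), whose state space and heuristic are not described here.

```python
import heapq
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple


def weighted_a_star(
    start: Hashable,
    is_goal: Callable[[Hashable], bool],
    neighbors: Callable[[Hashable], Iterable[Tuple[Hashable, float]]],
    heuristic: Callable[[Hashable], float],
    w: float = 3.0,
) -> Optional[List[Hashable]]:
    """Return a path from start to a goal node, or None if none exists."""
    counter = 0  # tie-breaker so the heap never compares nodes directly
    frontier = [(w * heuristic(start), counter, 0.0, start)]
    best_g: Dict[Hashable, float] = {start: 0.0}
    parent: Dict[Hashable, Optional[Hashable]] = {start: None}

    while frontier:
        _, _, g, node = heapq.heappop(frontier)
        if g > best_g[node]:
            continue  # stale entry: a cheaper route to node was found later
        if is_goal(node):
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt, cost in neighbors(node):
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                parent[nxt] = node
                counter += 1
                f = new_g + w * heuristic(nxt)  # weighted priority
                heapq.heappush(frontier, (f, counter, new_g, nxt))
    return None


if __name__ == "__main__":
    # Toy example: walk along the integers from 0 to 5 with unit step cost.
    path = weighted_a_star(
        start=0,
        is_goal=lambda n: n == 5,
        neighbors=lambda n: [(n + 1, 1.0), (n - 1, 1.0)],
        heuristic=lambda n: abs(5 - n),
        w=3.0,
    )
    print(path)
```

With w = 1 and an admissible heuristic this reduces to ordinary A*; with w > 1 the returned path is generally only guaranteed to be within a factor w of the optimal cost.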

Sliding Alignment Windows for Real-Time Crowd Captioning

Kazemi, Lavaee, Naim, et al. 2014

Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

The primary way of providing real-time speech-to-text captioning for hard-of-hearing people is to employ expensive professional stenographers who can type as fast as natural speaking rates. Recent work has shown that a feasible alternative is to combine the partial captions of ordinary typists, each of whom is able to type only part of what they hear. In this paper, we extend the state-of-the-art fixed-window alignment algorithm (Naim et al., 2013) for combining the individual captions into a final output sequence. Our method performs alignment on a sliding window of the input sequences, drastically reducing both the number of errors and the latency of the system to the end user compared with previously published approaches.

“…With the spread of ASR technology, it is now used to generate captions. Real-time captions, in turn, are used to render spoken audio as visual text for deaf and hard-of-hearing people in schools, meetings, everyday conversation, and other real-time settings [23].…”

Section: Speech Recognition Technology and Generating Captions Using (unclassified)

Canlı İnternet Yayınları İçin Otomatik Konuşma Tanıma Tekniği Kullanılarak Alt Yazı Oluşturulması (Generating Captions Using Automatic Speech Recognition Technique for Live Webcasts)

Koruyan 2015

BTD

Abstract: Captioning, the technique of displaying spoken language or its translation, or of giving information about images and sounds as text on television, in cinema, or in other visual media, has been in use since the early 1900s and has developed into its present form. Advances in computing have greatly contributed to captioning techniques; in particular, converting speech to text has become easier with speech recognition. Captions for the hearing impaired, especially when produced with speech recognition, are also used as an alternative to sign language in live broadcasts. The technique is mostly used with commercial, special-purpose hardware and software, which makes it costly for individuals and small organizations. Google Chrome's 2011 announcement of voice search, which also supports Turkish, was the starting point of this work. This study describes an application that, with the help of a media server, converts the speech in a video broadcast live on a web page to text using the Google-supported, open-source Web Speech API and displays it as live captions. The video is delivered on the web page through the HTML5 video element, and the web application is written in JavaScript and PHP with the jQuery library.

Keywords: Automatic Speech Recognition, Live Webcast, Live Captioning, HTML5, Internet

Assistive Technologies

Nicolau, Montague 2019

Human–Computer Interaction Series
