Speech activity detection (or endpointing) is an important processing step for applications such as speech recognition, language identification and speaker diarization. Both audio- and vision-based approaches have been used for this task in various settings, often tailored toward end applications. However, much of the prior work reports results in synthetic settings, on task-specific datasets, or on datasets that are not openly available. This makes it difficult to compare approaches and understand their strengths and weaknesses. In this paper, we describe a new dataset, AVA-Speech, which we will release publicly, containing densely labeled speech activity in YouTube videos, with the goal of creating a shared, available dataset for this task. The labels in the dataset annotate three different speech activity conditions: clean speech, speech co-occurring with music, and speech co-occurring with noise, which enable analysis of model performance in more challenging conditions based on the presence of overlapping noise. We report benchmark performance numbers on AVA-Speech using off-the-shelf, state-of-the-art audio and vision models that serve as a baseline to facilitate future research.
Since the Web Content Accessibility Guidelines 1.0 (WCAG) became a W3C recommendation in May 1999, the Web has changed dramatically. This paper describes some of the major issues encountered because of these changes and the approaches developed to address them in WCAG 2.0.
This paper addresses the problem of supporting accessibility in applications that run in multiple operating environments. It analyzes the commonalities of existing platform-specific accessibility APIs and defines a platform-independent accessibility API, the Accessible DOM. The Accessible DOM encompasses the features of existing APIs and overcomes their limitations in expressing dynamic, complex document content. The Accessible DOM can be used to support existing and future platform-specific accessibility APIs. It will also allow the development of platform-independent accessibility clients.