In this work, we propose a classifier for distinguishing device-directed queries from background speech in the context of interactions with voice assistants. Applications include rejecting false wake-ups or unintended interactions as well as enabling wake-word-free follow-up queries. Consider the example interaction: "Computer, play music", "Computer, reduce the volume". In this interaction, the user needs to repeat the wake-word (Computer) for the second query. To allow for more natural interactions, the device could immediately re-enter the listening state after the first query (without wake-word repetition) and accept or reject a potential follow-up as device-directed or background speech. The proposed model consists of two long short-term memory (LSTM) neural networks trained on acoustic features and automatic speech recognition (ASR) 1-best hypotheses, respectively. A feed-forward deep neural network (DNN) is then trained to combine the acoustic and 1-best embeddings, derived from the LSTMs, with features from the ASR decoder. Experimental results show that the ASR decoder features, acoustic embeddings, and 1-best embeddings yield equal error rates (EER) of 9.3%, 10.9%, and 20.1%, respectively. Combining the features results in a 44% relative improvement and a final EER of 5.2%.
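As a concrete illustration of the described architecture, the following sketch shows one way the combination could be wired up in PyTorch: one LSTM encodes the acoustic feature frames, another encodes the ASR 1-best word embeddings, and a feed-forward DNN scores the concatenation of the two utterance embeddings with the ASR decoder features. All layer sizes, feature dimensions, and names are illustrative assumptions and are not taken from the paper.

import torch
import torch.nn as nn

class UtteranceEncoder(nn.Module):
    """Runs an LSTM over a feature sequence; the final hidden state is the utterance embedding."""
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)

    def forward(self, x):                  # x: (batch, time, input_dim)
        _, (h_n, _) = self.lstm(x)         # h_n: (1, batch, hidden_dim)
        return h_n.squeeze(0)              # (batch, hidden_dim)

class DirectednessClassifier(nn.Module):
    """Feed-forward DNN over concatenated acoustic/1-best embeddings and decoder features."""
    def __init__(self, acoustic_dim=40, word_dim=100, decoder_dim=10, hidden_dim=64):
        super().__init__()
        self.acoustic_encoder = UtteranceEncoder(acoustic_dim, hidden_dim)
        self.word_encoder = UtteranceEncoder(word_dim, hidden_dim)
        self.dnn = nn.Sequential(
            nn.Linear(2 * hidden_dim + decoder_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),              # logit: device-directed vs. background speech
        )

    def forward(self, acoustic_frames, word_embeddings, decoder_features):
        a = self.acoustic_encoder(acoustic_frames)   # acoustic embedding
        w = self.word_encoder(word_embeddings)       # 1-best hypothesis embedding
        return self.dnn(torch.cat([a, w, decoder_features], dim=-1))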
New camera technology is allowing avian ecologists to perform detailed studies of avian behavior, nesting strategies, and predation in areas where it was previously impossible to gather data. Unfortunately, studies have shown mechanical triggers and a variety of sensors to be inadequate for capturing footage of small predators (e.g., snakes, rodents) or events in dense vegetation. Because of this, continuous camera recording is currently the most robust solution for avian monitoring, especially for ground-nesting species. However, continuous recording results in a data deluge: monitoring enough nests to make biologically significant inferences produces massive amounts of video that cannot be classified by humans alone. In the summer of 2012, Dr. Ellis-Felege gathered video footage from 63 sharp-tailed grouse (Tympanuchus phasianellus) nests, as well as preliminary interior least tern (Sternula antillarum) and piping plover (Charadrius melodus) nests, resulting in over 20,000 hours of video. To analyze this video effectively, a project combining crowdsourcing and volunteer computing was developed, in which volunteers can stream nesting video and report their observations, as well as have their computers download video for analysis by computer vision techniques. This provides a robust way to analyze the video, as user observations are validated by multiple views as well as by the results of the computer vision techniques. This work provides initial results analyzing the effectiveness of the crowdsourced observations and computer vision techniques.
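The abstract does not name the specific computer vision techniques run on volunteers' machines; purely as an illustrative stand-in, the sketch below flags frames containing significant foreground motion with OpenCV's MOG2 background subtractor, one common way to locate candidate events in fixed-camera nest footage. The function name and thresholds are assumptions for illustration only.

import cv2

def detect_motion_frames(video_path, min_area=500):
    """Return indices of frames with a large enough moving foreground region."""
    capture = cv2.VideoCapture(video_path)
    subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)
    hits, frame_idx = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        mask = subtractor.apply(frame)
        # Drop shadow pixels (value 127) and noise; keep confident foreground (255).
        _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if any(cv2.contourArea(c) > min_area for c in contours):
            hits.append(frame_idx)
        frame_idx += 1
    capture.release()
    return hits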
Wildlife@Home is a citizen science project developed to provide wildlife biologists with a way to swiftly analyze the massive quantities of data they can amass during video surveillance studies. The project has been active for two years, with over 200 volunteers who have participated in providing observations through a web interface where they can stream video and report the occurrences of various events within that video. Wildlife@Home is currently analyzing avian nesting video from three species: the sharp-tailed grouse (Tympanuchus phasianellus), an indicator species that plays a role in determining the effect of North Dakota's oil development on local wildlife; the interior least tern (Sternula antillarum), a federally listed endangered species; and the piping plover (Charadrius melodus), a federally listed threatened species. Video comes from 105 grouse, 61 plover, and 37 tern nests across multiple nesting seasons, and consists of over 85,000 hours (13 terabytes) of 24/7 uncontrolled outdoor surveillance video. This work describes the infrastructure supporting this citizen science project and examines the effectiveness of two different crowdsourcing interfaces: a simpler interface where users watch short clips of video and report whether an event occurred within that clip, and a more involved interface where volunteers watch entire videos and provide detailed event information, including beginning and ending times for events. User observations are compared against expert observations made by wildlife biology research assistants, and are shown to be quite effective given the strategies the project uses to promote accuracy and correctness.
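One simple way to quantify the comparison between volunteer and expert observations is to count, for each expert-marked event, whether some volunteer reported an event of the same type with sufficient temporal overlap. The sketch below is an illustrative assumption about how such a match rate could be computed; it is not the project's actual scoring code, and the event types and thresholds are hypothetical.

def interval_overlap(a_start, a_end, b_start, b_end):
    """Overlap, in seconds, between two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def expert_match_rate(volunteer_events, expert_events, min_overlap=1.0):
    """Fraction of expert events matched by a volunteer report of the same
    type with at least min_overlap seconds of temporal overlap."""
    if not expert_events:
        return 0.0
    matched = 0
    for e_type, e_start, e_end in expert_events:
        for v_type, v_start, v_end in volunteer_events:
            if v_type == e_type and interval_overlap(v_start, v_end, e_start, e_end) >= min_overlap:
                matched += 1
                break
    return matched / len(expert_events)

# Events are (type, start_seconds, end_seconds); the types here are hypothetical.
expert = [("predator", 120.0, 150.0), ("parent_return", 400.0, 410.0)]
volunteer = [("predator", 118.0, 149.0)]
print(expert_match_rate(volunteer, expert))   # 0.5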