This paper describes the technical and system building advances made to the Google Home multichannel speech recognition system, which was launched in November 2016. Technical advances include an adaptive dereverberation frontend, the use of neural network models that do multichannel processing jointly with acoustic modeling, and Grid-LSTMs to model frequency variations. On the system level, improvements include adapting the model using Google Home specific data. We present results on a variety of multichannel sets. The combination of technical and system advances results in a relative word error rate (WER) reduction of 8-28% compared to the current production system.
This paper reports our progress in developing techniques for "parsing" raw motion data from a simple surgical task into a labeled sequence of surgical gestures. The ability to automatically detect and segment surgical motion can be useful in evaluating surgical skill, providing surgical training feedback, or documenting essential aspects of a procedure. If processed online, the information can be used to provide context-specific information or motion enhancements to the surgeon. However, in every case, the key step is to relate recorded motion data to a model of the procedure being performed. Robotic surgical systems such as the da Vinci system from Intuitive Surgical provide a rich source of motion and video data from surgical procedures. The application programming interface (API) of the da Vinci outputs 192 kinematics values at 10 Hz. Through a series of feature-processing steps, tailored to this task, the highly redundant features are projected to a compact and discriminative space. The resulting classifier is simple and effective. Cross-validation experiments show that the proposed approach can achieve accuracies higher than 90% when segmenting gestures in a 4-throw suturing task, for both expert and intermediate surgeons. These preliminary results suggest that gesture-specific features can be extracted to provide highly accurate surgical skill evaluation.
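The projection step described above can be illustrated with a minimal sketch. The abstract does not name the projection method or the classifier, so the choices here are assumptions: Fisher linear discriminant analysis (LDA) for the projection and a nearest-centroid rule on individual frames for the classifier. The data are synthetic (three hypothetical gesture classes in 192 dimensions), not da Vinci recordings, and all function names are illustrative.

```python
import numpy as np

def fit_lda(X, y, n_components, reg=1e-3):
    """Return a (d, n_components) projection that maximizes between-class
    scatter relative to within-class scatter (Fisher LDA, an assumed choice)."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mean_all, mc - mean_all)
    # Redundant features make Sw nearly singular, so add a small ridge.
    Sw += reg * np.trace(Sw) / d * np.eye(d)
    # Whiten with the Cholesky factor (Sw = L L^T), then solve the symmetric
    # eigenproblem for L^{-1} Sb L^{-T}.
    L = np.linalg.cholesky(Sw)
    M = np.linalg.solve(L, np.linalg.solve(L, Sb).T)
    evals, V = np.linalg.eigh((M + M.T) / 2)
    top = V[:, np.argsort(evals)[::-1][:n_components]]
    return np.linalg.solve(L.T, top)  # map back: W = L^{-T} V

def fit_centroids(Z, y):
    """Mean of each class in the projected space."""
    classes = np.unique(y)
    return classes, np.stack([Z[y == c].mean(axis=0) for c in classes])

def predict(Z, classes, centroids):
    """Assign each projected frame to the nearest class centroid."""
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

def make_synthetic(n_per_class=300, n_features=192, n_latent=6, seed=0):
    """Synthetic stand-in: a few latent factors mixed into 192 redundant features."""
    rng = np.random.default_rng(seed)
    mixing = rng.normal(size=(n_latent, n_features))
    X, y = [], []
    for k in range(3):
        latent = rng.normal(size=(n_per_class, n_latent))
        latent[:, k] += 4.0  # class-specific offset in one latent factor
        noise = 0.1 * rng.normal(size=(n_per_class, n_features))
        X.append(latent @ mixing + noise)
        y.append(np.full(n_per_class, k))
    return np.vstack(X), np.concatenate(y)

def run_demo():
    X, y = make_synthetic()
    idx = np.random.default_rng(1).permutation(len(y))
    train, test = idx[:600], idx[600:]
    W = fit_lda(X[train], y[train], n_components=2)
    classes, centroids = fit_centroids(X[train] @ W, y[train])
    pred = predict(X[test] @ W, classes, centroids)
    return W.shape, float((pred == y[test]).mean())

if __name__ == "__main__":
    shape, accuracy = run_demo()
    print("projection shape:", shape, "held-out frame accuracy:", accuracy)
```

With C classes, LDA yields at most C - 1 useful directions, which is why the sketch keeps two components for three classes. The sketch labels each frame independently and ignores temporal structure, so it should be read as an illustration of the dimensionality reduction idea only, not as the pipeline evaluated in the paper.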