In this work we explore the latency and accuracy of keyword spotting (KWS) models in streaming and non-streaming modes on mobile phones. Converting a neural network model from non-streaming mode (the model receives the whole input sequence and then returns the classification result) to streaming mode (the model receives a portion of the input sequence and classifies it incrementally) may require manual model rewriting. We address this by designing a TensorFlow/Keras based library which allows automatic conversion of non-streaming models to streaming ones with minimal effort. With this library we benchmark multiple KWS models in both streaming and non-streaming modes on mobile phones and demonstrate different tradeoffs between latency and accuracy. We also explore novel KWS models with multi-head attention which reduce the classification error by 10% relative to the state of the art on the Google Speech Commands dataset V2. The streaming library, with all experiments, is open-sourced.
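To make the conversion concrete, here is a minimal sketch of the idea for a single causal Conv1D layer: in streaming mode the layer keeps the last `kernel_size - 1` input frames as external state and processes one frame per call, producing the same outputs as the non-streaming pass over the whole sequence. This illustrates the technique only; the names and structure below are illustrative, not the open-sourced library's actual API.

```python
import numpy as np
import tensorflow as tf

FEATURES, KERNEL = 40, 3
conv = tf.keras.layers.Conv1D(64, KERNEL, padding="causal")
conv.build((None, None, FEATURES))

# Non-streaming mode: the layer sees all 100 frames at once.
x = np.random.randn(1, 100, FEATURES).astype("float32")
full_out = conv(x).numpy()

# Streaming mode: feed one frame at a time, carrying the previous
# KERNEL - 1 frames as explicit state (the buffer an automatic
# conversion would insert on our behalf).
state = np.zeros((1, KERNEL - 1, FEATURES), dtype="float32")
outputs = []
for t in range(100):
    frame = x[:, t : t + 1, :]
    window = np.concatenate([state, frame], axis=1)  # last KERNEL frames
    outputs.append(conv(window).numpy()[:, -1:, :])  # output for frame t
    state = window[:, 1:, :]                         # shift the buffer

stream_out = np.concatenate(outputs, axis=1)
np.testing.assert_allclose(full_out, stream_out, atol=1e-5)  # same result
```

Both modes compute identical outputs; the streaming version simply amortizes the work over incoming frames, which is what makes incremental on-device classification possible.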
In this paper we propose a lightweight model for frequency bandwidth extension of speech signals, increasing the sampling frequency from 8kHz to 16kHz while restoring the high frequency content to a level almost indistinguishable from the 16kHz ground truth. The model architecture is based on SEANet (Sound EnhAncement Network), a wave-to-wave fully convolutional model, which uses a combination of feature losses and adversarial losses to reconstruct an enhanced version of the input speech. In addition, we propose a variant of SEANet that can be deployed on-device in streaming mode, achieving an architectural latency of 16ms. When profiled on a single core of a mobile CPU, processing one 16ms frame takes only 1.5ms. The low latency makes it viable for bi-directional voice communication systems.
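As a sanity check on those figures (using only the numbers quoted above, and assuming compute time simply adds to the architectural latency), the arithmetic works out to a real-time factor of roughly 0.09:

```python
# Back-of-the-envelope check of the latency figures quoted above.
SAMPLE_RATE_HZ = 16_000
FRAME_MS = 16.0    # architectural latency: one frame of lookahead
COMPUTE_MS = 1.5   # measured time per frame on one mobile CPU core

samples_per_frame = int(SAMPLE_RATE_HZ * FRAME_MS / 1000)  # 256 samples
real_time_factor = COMPUTE_MS / FRAME_MS                   # ~0.094
worst_case_latency_ms = FRAME_MS + COMPUTE_MS              # ~17.5 ms

print(f"{samples_per_frame} samples/frame, RTF={real_time_factor:.3f}, "
      f"end-to-end ~{worst_case_latency_ms:.1f} ms/frame")
```

A real-time factor well below 1 leaves ample headroom on the CPU core, which is why the model is viable for bi-directional voice communication.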
Quantization has become a popular technique to compress neural networks and reduce compute cost, but most prior work focuses on studying quantization without changing the network size. Many real-world applications of neural networks have compute cost and memory budgets, which can be traded off against model quality by changing the number of parameters. In this work, we use ResNet as a case study to systematically investigate the effects of quantization on inference compute cost-quality tradeoff curves. Our results suggest that for each bfloat16 ResNet model, there are quantized models with lower cost and higher accuracy; in other words, the bfloat16 compute cost-quality tradeoff curve is Pareto-dominated by the 4-bit and 8-bit curves, with models primarily quantized to 4-bit yielding the best Pareto curve. Furthermore, we achieve state-of-the-art results on ImageNet for 4-bit ResNet-50 with quantization-aware training, obtaining a top-1 eval accuracy of 77.09%. We demonstrate the regularizing effect of quantization by measuring the generalization gap. The quantization method we used is optimized for practicality: it requires little tuning and is designed with hardware capabilities in mind. Our work motivates further research into optimal numeric formats for quantization, as well as the development of machine learning accelerators supporting these formats. As part of this work, we contribute a quantization library written in JAX, which is open-sourced at https://github.com/google-research/google-research/tree/master/aqt.
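For intuition, the sketch below shows the core mechanism behind quantization-aware training: "fake" quantization, where values are rounded to a low-bit grid in the forward pass while gradients pass through unchanged via the straight-through estimator. It is written in JAX to match the contributed library, but it is a generic illustration of the technique, not the AQT library's actual API.

```python
"""Minimal sketch of symmetric per-tensor fake quantization for
quantization-aware training. Illustrative only; not the AQT API."""
import jax
import jax.numpy as jnp

def fake_quant(x, bits=4):
    """Round x to a signed `bits`-bit grid, then dequantize.

    The straight-through estimator lets gradients flow through the
    non-differentiable rounding step during training.
    """
    n_levels = 2 ** (bits - 1) - 1             # e.g. 7 for 4-bit
    scale = jnp.max(jnp.abs(x)) / n_levels     # per-tensor scale
    q = jnp.clip(jnp.round(x / scale), -n_levels, n_levels)
    dq = q * scale
    # Forward pass uses dq; backward pass sees the identity w.r.t. x.
    return x + jax.lax.stop_gradient(dq - x)

x = jnp.linspace(-1.0, 1.0, 8)
print(fake_quant(x, bits=4))  # values snapped to 15 signed levels
```

Because the rounding happens during training, the network learns weights that remain accurate on the low-bit grid, which is how 4-bit models can match or exceed the bfloat16 baseline on the cost-quality curve.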