2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2018.8462334
Accelerating Recurrent Neural Network Language Model Based Online Speech Recognition System

Abstract: This paper presents methods to accelerate recurrent neural network based language models (RNNLMs) for online speech recognition systems. First, a lossy compression of the past hidden layer outputs (history vector), combined with caching, is introduced to reduce the number of LM queries. Next, RNNLM computations are deployed in a CPU-GPU hybrid manner, which computes each layer of the model on the more advantageous platform. The overhead added by data exchanges between the CPU and GPU is compensated through a frame-w…
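The abstract's first idea can be illustrated with a minimal sketch: quantize (lossily compress) the RNNLM history vector before using it as a cache key, so that nearly identical histories collide on the same entry and repeated LM queries are answered from the cache. The class name, bin width, and quantization scheme below are illustrative assumptions, not the paper's exact method.

```python
# Sketch of history-vector caching via lossy quantization (illustrative
# assumptions; not the paper's exact compression scheme).
import numpy as np

class HistoryVectorCache:
    """Cache RNNLM outputs keyed by a coarsely quantized history vector."""

    def __init__(self, step=0.25):
        self.step = step      # quantization bin width (assumed value)
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def _key(self, word_id, hidden):
        # Lossy compression: round each dimension to the nearest bin,
        # then hash the resulting integer vector's bytes.
        q = np.round(np.asarray(hidden) / self.step).astype(np.int32)
        return (word_id, q.tobytes())

    def query(self, word_id, hidden, compute_fn):
        key = self._key(word_id, hidden)
        if key in self.cache:
            self.hits += 1            # served without an LM call
        else:
            self.misses += 1
            self.cache[key] = compute_fn(word_id, hidden)
        return self.cache[key]

# Usage: two slightly different history vectors map to one cached result.
cache = HistoryVectorCache(step=0.25)
fn = lambda w, h: float(np.sum(h))    # stand-in for the real RNNLM call
a = cache.query(7, np.array([0.10, 0.90]), fn)
b = cache.query(7, np.array([0.12, 0.88]), fn)  # quantizes to the same key
```

Coarser bins trade accuracy for a higher cache-hit rate; in a decoder this directly cuts the number of expensive RNNLM forward passes per frame.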

Cited by 18 publications (11 citation statements)
References 15 publications
“…RNNs have been used for a variety of tasks, both regression and classification, such as natural language processing (Li and Xu 2018), speech recognition (Lee et al 2018), in clinical application (Tomašev et al 2019), and more recently, activity recognition from accelerometer data (Edel and Köppe 2016; Guan and Plötz 2017) and modeling of long-term human activity (Kim et al 2017). Because PAEE is influenced by past activities (lag effect), RNNs could be a suitable modeling candidate for tackling the challenge of PAEE estimation.…”
Section: Modeling Architecture
confidence: 99%
“…Toshniwal et al (2018) proposed a single end-to-end speech recognition system that works on 9 different Indian languages. Lee et al (2018) presented methods to accelerate RNN language models for online speech recognition systems.…”
Section: Fig 1 Basic Structure Of RNNs
confidence: 99%
“…Despite using feed-forward neural LMs in decoding, empirical results showed significant relative improvements both in speed and accuracy. Other relevant contributions addressing one-pass decoding with neural LMs have focused on heuristics to reduce the number of queries to the model and caching network states [20], alternative one-pass decoding strategies such as on-the-fly rescoring [21], improving CPU-GPU communications [22] and, more recently, combining Gated Recurrent Units with more efficient objective functions, such as Noise Contrastive Estimation [23]. Certainly different from these contributions, other authors have explored the idea of converting neural LMs, either recurrent or not, into n-gram models that can thus be smoothly integrated into a conventional decoder [24], [25].…”
Section: Introduction
confidence: 99%