Continuous Punjabi speech recognition model based on Kaldi ASR toolkit

Guglani, Jyoti; Mishra, Achyuta Nand

doi:10.1007/s10772-018-9497-6

Cited by 38 publications

(8 citation statements)

References 18 publications

“…We chose 4 people, 3 women and 2 men, to pronounce 4 words of the Moroccan dialect: (ﺳﻼم) = (Hi), (ﻛﯿﺪاﯾﺮ) = (How are you), (ﻻﺑﺎس) = (There is nothing wrong), (ﺑﺨﯿﺮ) = (Fine), and we recorded the speakers' voices in (.wav) files. Later we began our work on recognizing the words of the Moroccan dialect with the Learning part [11], through "add a new sound from file", which invites the user to choose a (.wav) file and classify it by identity, from ID:1 to ID:5 [15]. We continued the training phase to build a database of (.wav) files with 4 classes, each class representing a well-defined speaker. The speech of eleven male speakers and nine female speakers is used for training, and the speech of one male speaker and three female speakers is used for testing.…”

Section: Results (mentioning)

confidence: 99%

“…These sections are called frames, and the motivation for this framing process is the quasi-stationary nature of the 1-D signals. However, if we examine the signal over discrete sections that are sufficiently short in duration, these sections can be considered stationary and exhibit stable characteristics [9][10][11]. To avoid loss of information, frame overlap is used.…”

Section: Extraction of MFCCs (mentioning)

confidence: 99%
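The framing with overlap described in the statement above can be illustrated with a minimal sketch. The function name `frame_signal` and the parameters `frame_len` and `hop_len` are hypothetical choices for this illustration and do not come from the cited paper; the sketch also assumes that trailing samples which do not fill a whole frame are simply dropped.

```python
def frame_signal(samples, frame_len, hop_len):
    """Split a 1-D sequence into overlapping frames of frame_len samples.

    Consecutive frames start hop_len samples apart, so neighbouring frames
    share frame_len - hop_len samples. Trailing samples that do not fill a
    whole frame are dropped (an assumption of this sketch).
    """
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame_len and hop_len must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(list(samples[start:start + frame_len]))
        start += hop_len
    return frames


# Toy signal of 10 samples; frames of 4 samples with a hop of 2 (50% overlap).
signal = list(range(10))
frames = frame_signal(signal, frame_len=4, hop_len=2)
for frame in frames:
    print(frame)
```

In a real front end the frame length and hop would be given in milliseconds and converted to samples using the sampling rate; the toy values here only make the overlap easy to see.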

Speech Recognition of Moroccan Dialect Using Hidden Markov Models

Bezoui

2019

IJ-AI

This paper addresses the development of an Automatic Speech Recognition (ASR) system for the Moroccan Dialect. Dialectal Arabic (DA) refers to the day-to-day vernaculars spoken in the Arab world. The Moroccan Dialect differs considerably from Modern Standard Arabic (MSA) because it is highly influenced by the French language. Throughout the Arab countries, standard Arabic is widely written and used for official speech, newspapers, public administration, and school but is not used in everyday conversation, whereas dialect is widely spoken in everyday life but almost never written. We propose to use Mel Frequency Cepstral Coefficient (MFCC) features to specify the best speaker identification system. The extracted speech features are quantized to a number of centroids using a vector quantization algorithm. These centroids constitute the codebook of that speaker. MFCCs are calculated in the training phase and again in the testing phase. Speakers uttered the same words once in a training session and once in a later testing session. The Euclidean distance between the MFCCs of each speaker in the training phase and the centroids of each individual speaker in the testing phase is measured, and the speaker is identified according to the minimum Euclidean distance. The code is developed in the MATLAB environment and performs the identification satisfactorily.
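The minimum-distance decision described in this abstract can be sketched as follows. This is one common reading of codebook-based speaker identification, not the authors' MATLAB code: the function names, the 2-D toy vectors standing in for MFCCs, and the use of average distortion as the score are all assumptions made for illustration, and codebook training (for example by k-means) is omitted, so the codebooks are taken as given.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest centroid."""
    total = 0.0
    for vec in features:
        total += min(euclidean(vec, centroid) for centroid in codebook)
    return total / len(features)


def identify_speaker(features, codebooks):
    """Return the speaker whose codebook gives the smallest average distortion."""
    return min(codebooks, key=lambda spk: average_distortion(features, codebooks[spk]))


# Hypothetical codebooks: two centroids per speaker, 2-D vectors instead of MFCCs.
codebooks = {
    "speaker_a": [(0.0, 0.0), (1.0, 1.0)],
    "speaker_b": [(5.0, 5.0), (6.0, 6.0)],
}
test_features = [(0.1, -0.1), (0.9, 1.2), (0.2, 0.1)]
best = identify_speaker(test_features, codebooks)
print(best)
```

Averaging the nearest-centroid distance over all test vectors keeps the score independent of utterance length, which is why it is used here instead of a plain sum.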

“…It is observed that the Tri3 models outperform the Tri2 models, the Tri2 models improve on the Tri1 models, all triphone models outperform the monophone models, and MFCC features work better than PLP features. The Tri2 and Tri3 models using MFCC achieved best WERs of 21.8% and 21.2% [7].…”

Section: Literature Survey (mentioning)

confidence: 98%

Performance of Isolated and Continuous Digit Recognition System using Kaldi Toolkit

2019

IJRTE

A digit recognition system is built for recognizing sequences of the digits 0-9. The system is evaluated on a speech corpus recorded in a room environment. The acoustic signal is converted to a feature representation using PLP and MFCC features. The system initially uses the conventional GMM-HMM framework, a state-of-the-art hybrid classifier, with a varied number of states to complete the speech recognition task: the system is first trained and tested using monophone models, and its recognition accuracy is then evaluated using triphone models (Triphone1, followed by Triphone2 and Triphone3). An N-gram language model is used for both monophone and triphone training. System performance with the MFCC and PLP parameterisation techniques is evaluated on the Kaldi toolkit using word error rate (WER) and word recognition accuracy (WRA). The proposed system can be utilized for building speech applications.
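Word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between the reference and the recognized transcript, divided by the number of reference words. The sketch below follows that standard definition; the function name `word_error_rate` and the digit strings are illustrative assumptions, and this is not the scoring code used by the cited paper or by Kaldi.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substitution ("two" -> "too") and one deletion ("four") over 4 reference words.
wer = word_error_rate("one two three four", "one too three")
print(f"WER = {wer:.2%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.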

“…These features are obtained using different feature extraction techniques that have been proposed and successfully used for ASR systems. For feature extraction and for reading waveforms, Kaldi supports generating standard Mel-Frequency Cepstral Coefficient (MFCC) features [32]. The technique for computing MFCCs is fundamentally based on short-term analysis.…”

Section: B.1 Feature Extraction (unclassified)

“…Therefore, n-grams cannot model the ordering of words in a long sentence; only a limited word history can be modeled. Because language models that use a feed-forward neural network do not carry the Markov assumption, they can model long-range dependencies between words [38]. A language model produces a statistical model of a language by modeling the structure and order of its words and sentences [39]. Put most simply, a language model models which words can follow a given word sequence and generates the possible orderings at decoding time.…”

unclassified
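The limited word history of an n-gram model can be seen in a minimal bigram sketch, where the probability of the next word depends only on the single preceding word. The function names, the toy corpus, and the unsmoothed maximum-likelihood estimate are assumptions for illustration; practical language models add smoothing and are not built this way in the cited work.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark the sentence start and end.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def bigram_prob(counts, prev, nxt):
    """Unsmoothed maximum-likelihood estimate of P(nxt | prev)."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0


# Toy corpus, purely illustrative.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train_bigram(corpus)
print(bigram_prob(model, "the", "cat"))
print(bigram_prob(model, "cat", "sat"))
```

Since only the previous word is conditioned on, the model assigns "sat" the same probability after "cat" regardless of anything earlier in the sentence, which is the restricted history the statement refers to.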

The Effect of Removal the Silence and Speech Parsing Processes on Turkish Automatic Speech Recognition

Oyucu, Polat, Sever

2020

Düzce Üniversitesi Bilim Ve Teknoloji Dergisi

Automatic Speech Recognition systems are developed mainly by making use of acoustic information. Paired speech and text data are used to obtain phoneme information from acoustic information. Acoustic models trained on these data cannot model all the acoustic information found in real life. Certain preprocessing steps are therefore needed to remove acoustic information that would reduce the performance of automatic speech recognition systems. In this study, a method is proposed for removing the silences that occur within speech. The aim of the proposed method is to eliminate silence information and to split into segments the utterances that create long dependencies in the acoustic information. The silence-free, segmented speech produced by the method was given as input to a Turkish Automatic Speech Recognition system. At the output of the system, the texts corresponding to the input speech segments were merged and presented. The experiments showed that removing silence and splitting speech into segments improves the performance of Automatic Speech Recognition systems.
