Metode Wavelet-MFCC dan Korelasi dalam Pengenalan Suara Digit
(Wavelet-MFCC and Correlation Methods for Spoken Digit Recognition)

Dyarbirru, Zaurarista; Hidayat, Syahroni

doi:10.35746/jtim.v2i2.99

Cited by 3 publications (2 citation statements)

References: 6 publications

“…Automatic speech recognition (ASR) is a technology that enables interaction between humans and computers through voice (Dyarbirru and Hidayat, 2020). Google voice search applies an example of ASR technology that converts voice into text to perform everyday commands on mobile devices.…”

Section: Automatic Speech Recognition (mentioning; confidence: 99%)

“…Common Voice is a multilingual dataset of transcribed, community-based, Creative Commons Zero (CC0) licensed audio talks built by Mozilla (Handoko and Suyanto, 2019). The Common Voice Indonesian dataset consists of 54 unique voices with a total of 5 hours of speech and 4 hours of validation (Dyarbirru and Hidayat, 2020). The data obtained is the result of crowdsourcing, for several languages the Mozilla Deep Speech and Common Voice models produce an average CER improvement of 5.99 ± 5.48 (Tachbelie et al, 2022).…”

Section: Audio and Language Models In Mozilla Deep Speech (mentioning; confidence: 99%)
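The quoted statement reports results as character error rate (CER). CER and its word-level counterpart, word error rate (WER), are conventionally computed as the Levenshtein edit distance (substitutions + deletions + insertions) between a reference transcript and a recognizer's hypothesis, divided by the length of the reference. The sketch below assumes that standard definition; it is not code from the cited papers, and the Indonesian example phrases are invented for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: the minimum number of substitutions, deletions
    and insertions needed to turn sequence `ref` into sequence `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free when tokens match)
            )
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


print(round(cer("tanam cabai", "tanam cabe"), 3))  # 0.182
print(wer("tanam cabai", "tanam cabe"))            # 0.5
```

Here "cabai" becomes "cabe" through one substitution and one deletion, so CER is 2/11, while WER is 1/2 because one of the two reference words is wrong. Because insertions are counted, either rate can exceed 1 for a hypothesis much longer than its reference.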

Exploration of Spontaneous Speech Corpus Development in Urban Agriculture Instructional Videos

Gelar, Nanda (2022). J. Soft. Eng. Inf. Comm. Tech.

Video transcriptions can be generated automatically from the video maker's speech in its original language, but their quality depends on the quality of the audio signal and on the speaker's natural voice. In this study, Deep Speech is used to predict letters from acoustic recognition alone, without knowledge of language rules. The Common Voice multilingual corpus enables Deep Speech to transcribe Indonesian. However, this corpus does not cover the specialized topic of urban agriculture, so an additional corpus is needed to build acoustic and language models for the urban agriculture domain. A total of 15 popular videos with closed captions and nine e-books on horticulture (fruit, vegetables, and medicinal plants) were curated. The videos were converted into audio and transcriptions matching the training-data specifications, while the agricultural texts were used to build language models that score the recognition results during prediction. The evaluation shows that the number of training epochs affects transcription performance. Using the language model score during prediction improved WER because it helped the decoder interpret agricultural terms. The model was, however, unable to predict short, informal words at the end of sentences.
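The abstract credits the language model score with the WER gain. One common way to combine the two models, often called shallow fusion, is to rank candidate transcripts by acoustic log-probability plus a weighted language-model log-probability plus a word-insertion bonus. The sketch below assumes that scoring form. The add-one-smoothed unigram model, the weights `alpha` and `beta`, the function names `train_unigram` and `rescore`, and the candidate transcripts are all hypothetical and do not reproduce the study's Deep Speech configuration.

```python
import math
from collections import Counter


def train_unigram(corpus_text):
    """Hypothetical toy language model: add-one-smoothed unigram
    log-probabilities estimated from domain text (e.g. horticulture e-books)."""
    counts = Counter(corpus_text.lower().split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves probability mass for unseen words

    def log_prob(word):
        return math.log((counts[word] + 1) / (total + vocab_size))

    return log_prob


def rescore(nbest, lm_log_prob, alpha=0.8, beta=1.0):
    """Return the hypothesis text that maximises
        acoustic_log_prob + alpha * LM log-probability + beta * word count.
    `nbest` is a list of (text, acoustic_log_prob) pairs; alpha and beta
    are illustrative values, not tuned ones."""
    def fused_score(item):
        text, acoustic = item
        words = text.lower().split()
        lm = sum(lm_log_prob(w) for w in words)
        return acoustic + alpha * lm + beta * len(words)

    return max(nbest, key=fused_score)[0]


lm = train_unigram("pupuk organik untuk tanaman cabai pupuk kandang untuk sayuran")
nbest = [("pupuk kandung", -4.0), ("pupuk kandang", -4.3)]
print(rescore(nbest, lm))             # pupuk kandang
print(rescore(nbest, lm, alpha=0.0))  # pupuk kandung
```

The acoustically preferred "pupuk kandung" loses to the domain term "pupuk kandang" (manure fertilizer) once the language model is weighted in, because "kandang" occurs in the domain text and "kandung" does not; with `alpha=0.0` the language model is switched off and the acoustic ranking wins again.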

Best wavelet decomposition channel determination for speech processing application using two-way ANOVA

Qudsi, Tajuddin, Hidayat, et al. (2023). Computational Intelligence and Network Security.

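Pengenalan Pola Fonem Vokal menggunakan Short Time Fourier Transform (STFT) dan Fitur Mel Frequency Cepstral Coefficient (MFCC)
(Vowel Phoneme Pattern Recognition Using the Short-Time Fourier Transform (STFT) and Mel-Frequency Cepstral Coefficient (MFCC) Features)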

Adriansyah, Prasetyo, Faruqi (2021). J. Teknologi Terpadu.

Phonemes are the units that make up every spoken language: every word and sentence that is uttered consists of one or more phonemes. To improve the accuracy of acoustic models, the researchers attempted to identify the patterns of Indonesian vowel phonemes using the STFT and MFCC features. In this study, the researchers analyzed 398 audio files recorded from 51 participants and explored how the patterns of the vowel phonemes a, i, u, e, and o differ. The features were classified and tested with an SVM and an artificial neural network (ANN; Indonesian JST). Testing yielded an accuracy of 93.8% using an SVM with a radial basis function kernel.
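The abstract names STFT framing and MFCC features without detailing the pipeline. A minimal pure-Python sketch of the standard MFCC recipe follows: pre-emphasis, then an STFT (overlapping frames, a Hamming window, and a per-frame DFT), then log energies from a triangular mel filterbank, then a DCT-II. The parameter defaults (25 ms frames with a 10 ms hop at 16 kHz, 26 mel filters, 13 coefficients, pre-emphasis 0.97) are common textbook values assumed here, not the authors' settings, and a naive DFT stands in for an FFT so the sketch has no dependencies.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13, pre_emphasis=0.97):
    """Return one list of `n_ceps` MFCCs per frame (textbook pipeline)."""
    # 1. Pre-emphasis boosts high frequencies: y[n] = x[n] - a * x[n-1].
    x = [signal[0]] + [signal[i] - pre_emphasis * signal[i - 1]
                       for i in range(1, len(signal))]
    # 2. Hamming window, reused for every frame.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    # 3. Triangular filter edges, equally spaced on the mel scale, as FFT bins.
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2)
    mel_points = [low + (high - low) * i / (n_filters + 1)
                  for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    n_bins = n_fft // 2 + 1

    features = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + n] * window[n] for n in range(frame_len)]
        frame += [0.0] * (n_fft - frame_len)  # zero-pad to n_fft samples
        # 4. Power spectrum of the frame via a naive DFT (slow, dependency-free).
        power = []
        for k in range(n_bins):
            re = sum(frame[n] * math.cos(2 * math.pi * k * n / n_fft)
                     for n in range(n_fft))
            im = -sum(frame[n] * math.sin(2 * math.pi * k * n / n_fft)
                      for n in range(n_fft))
            power.append((re * re + im * im) / n_fft)
        # 5. Log energy under each triangular mel filter.
        log_energies = []
        for m in range(1, n_filters + 1):
            left, centre, right = bins[m - 1], bins[m], bins[m + 1]
            energy = 0.0
            for k in range(left, centre):
                energy += power[k] * (k - left) / max(centre - left, 1)
            for k in range(centre, right):
                energy += power[k] * (right - k) / max(right - centre, 1)
            log_energies.append(math.log(energy + 1e-10))
        # 6. DCT-II decorrelates the log energies; keep the first n_ceps.
        features.append([
            sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_energies))
            for c in range(n_ceps)
        ])
    return features
```

Each row of the result is one frame's feature vector; in a setup like the one described, such vectors (or statistics over them) would be the input to the SVM or ANN classifier.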
