Frontier Research on Low-Resource Speech Recognition Technology

Slam, Wushour; Li, Yanan; Urouvas, Nurmamet

doi:10.3390/s23229096

Cited by 3 publications (2 citation statements)

References 117 publications (164 reference statements)

“…The formula for calculating the grayscale level in the histogram equalization algorithm is as shown in Equation (5).…”

Section: Image Enhancement Algorithms (mentioning)

confidence: 99%

“…Today, artificial intelligence has been widely applied in various fields, such as in the medical domain for tasks like image denoising [1], ultrasound image processing [2], and image classification [3]. Additionally, significant achievements have been made in areas like small object detection [4] and speech recognition [5]. Currently, most AI computational tasks rely on deployment on cloud and other large-scale computing platforms, but the significant physical distance between these resource-intensive platforms and smart endpoints limits the convenience of AI.…”

Section: Introduction (mentioning)

confidence: 99%

Research on Convolutional Neural Network Inference Acceleration and Performance Optimization for Edge Intelligence

Liang, Tan, Xie et al. 2023

Sensors

In recent years, edge intelligence (EI) has emerged, combining edge computing with AI, and specifically deep learning, to run AI algorithms directly on edge devices. In practical applications, EI faces challenges related to computational power, power consumption, size, and cost, with the primary challenge being the trade-off between computational power and power consumption. This has rendered traditional computing platforms unsustainable, making heterogeneous parallel computing platforms a crucial pathway for implementing EI. In our research, we leveraged the Xilinx Zynq 7000 heterogeneous computing platform, employed high-level synthesis (HLS) for design, and implemented two different accelerators for LeNet-5 using loop unrolling and pipelining optimization techniques. The experimental results show that when running at a clock speed of 100 MHz, the PIPELINE accelerator, compared to the UNROLL accelerator, experiences an 8.09% increase in power consumption but speeds up by 14.972 times, making the PIPELINE accelerator superior in performance. Compared to the CPU, the PIPELINE accelerator reduces power consumption by 91.37% and speeds up by 70.387 times, while compared to the GPU, it reduces power consumption by 93.35%. This study provides two different optimization schemes for edge intelligence applications through design and experimentation and demonstrates the impact of different quantization methods on FPGA resource consumption. These experimental results can provide a reference for practical applications, thereby providing a reference hardware acceleration scheme for edge intelligence applications.

Computer-Assisted Pronunciation Training System for Atayal, an Indigenous Language in Taiwan

Chuang, Hsu, Luu et al. 2024

2024 27th Conference of the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databa

Building a Speech Dataset and Recognition Model for the Minority Tu Language

Kong, Li, Fang et al. 2024

Applied Sciences

Speech recognition technology has many applications in our daily life. However, for many low-resource languages without written forms, acquiring sufficient training data remains a significant challenge for building accurate ASR models. The Tu language, spoken by an ethnic minority group in Qinghai Province in China, is one such example. Due to the lack of written records and the great diversity in regional pronunciations, there has been little previous research on Tu-language speech recognition. This work seeks to address this research gap by creating the first speech dataset for the Tu language spoken in Huzhu County, Qinghai. We first formulated the relevant pronunciation rules for the Tu language based on linguistic analysis. Then, we constructed a new speech corpus, named HZ-TuDs, through targeted data collection and annotation. Based on the HZ-TuDs dataset, we designed several baseline sequence-to-sequence deep neural models for end-to-end Tu-language speech recognition. Additionally, we proposed a novel SA-conformer model, which combines convolutional and channel attention modules to better extract speech features. Experiments showed that our proposed SA-conformer model can significantly reduce the character error rate from 23% to 12%, effectively improving the accuracy of Tu language recognition compared to previous approaches. This demonstrates the effectiveness of our dataset construction and model design efforts in advancing speech recognition technology for this low-resource minority language.
