Lip Reading in Cantonese

Xiao, Yewei; Teng, Lianwei; Zhu, Aosu; Liu, Xuanming; Tian, Picheng

doi:10.1109/access.2022.3204677

Cited by 5 publications

(4 citation statements)

References 20 publications

“…Comprising 50 frequently used Tibetan words, the dataset initially featured 20 native speakers, equally divided by gender, and was later augmented using data enhancement techniques to produce 36,000 videos, averaging 720 videos per sample. Recently, Teng [8] propose a word-level Cantonese lip-reading dataset called CLRW which contains 800 word classes with 400,000 samples. Dai et al [29] introduced a new Cantonese in-car audio-visual speech recognition (CI-AVSR) dataset for in-car command recognition in Cantonese.…”

Section: Lip-reading Datasets (mentioning)

confidence: 99%

“…The CLRS, CI-AVSR [29], and CLRW [8] datasets are all significant resources in the field of lipreading. The CLRS dataset is a multimodal corpus tailored for Cantonese sentence-level lipreading, comprising over 30,000 natural Cantonese sentences and recordings from more than 1,000 speakers.…”

Section: Comparison With Other Cantonese Datasets (mentioning)

confidence: 99%

“…The linguistic uniqueness of Cantonese, along with its extensive speakership, highlights the importance of investigating Cantonese lip-reading. In recent work, Teng et al [8] developed the first large-scale Cantonese word-level lip-reading dataset, Cantonese lip reading in the wild (CLRW). This dataset encompasses various factors such as age, gender, identity, program format, lighting, and facial resolution, containing 800 word classes and 400,000 word samples that are both challenging and realistic.…”

Section: Introduction (mentioning)

confidence: 99%

Cantonese sentence dataset for lip‐reading

Xiao, Liu, Teng et al. 2024. IET Image Processing

Self Cite

Lip‐reading deciphers speech by observing lip movements without relying on audio data. Rapid advances in deep learning have significantly improved lip‐reading for both English and Chinese; however, research on dialects such as Cantonese remains scarce. Most Chinese lip‐reading datasets focus on Mandarin, with only a few addressing Cantonese. To bridge this gap, a sentence‐level Cantonese lip‐reading dataset, designated Cantonese lip‐reading sentences, is introduced, comprising over 500 unique speakers and more than 30,000 samples. To ensure alignment with real‐world scenarios, no restrictions are imposed on factors such as gender, age, posture, lighting conditions, or speech rate. A comprehensive description of the pipeline used to collect and construct the dataset is provided, and an innovative visual frontend, 3D‐visual attention net, is introduced. This frontend combines the advantages of convolution and self‐attention mechanisms to extract fine‐grained lip region features. These features are subsequently input into the conformer backend for temporal sequence modelling, achieving comparable performance on the Chinese Mandarin lip reading dataset, lip reading sentences 2, lip reading sentences 3, and Cantonese lip‐reading sentences datasets. Benchmark tests on Cantonese lip‐reading sentences demonstrate the challenges it poses, providing a novel research foundation for dialect lip‐reading and fostering the advancement of Cantonese lip‐reading tasks.

“…Several approaches related to lip reading are briefly addressed in this paper. The authors [3] presented the method of detecting lips and using the cropped images as a dataset for the training set for Convolutional Neural Networks. Also, they discussed different methods of evaluation that can be used.…”

Section: Related Work (mentioning)

confidence: 99%

Application of convolutional neural networks to spoken words evaluation based on lip movements without accompanying sound signal

Perić, Maček, Bogdanoski 2022. J Comp Foren Sci

This paper proposes an approach to evaluate spoken words based on lip movements without accompanying sound signals using convolutional neural networks. The main goal of this research is to prove the efficiency of neural networks in the field, where all data is received from an array of images. The modeling and the hypotheses are validated based on the results obtained for a specific case study. Our study reports on speech recognition from only a sequence of images provided, where all crucial data and features are extracted, processed, and used in a model to create artificial consciousness.
