Lip-reading deciphers speech by observing lip movements without relying on audio data. Rapid advances in deep learning have significantly improved lip-reading for both English and Chinese; however, research on dialects such as Cantonese remains scarce, and most Chinese lip-reading datasets consequently focus on Mandarin, with only a few addressing Cantonese. To bridge this gap, a sentence-level Cantonese lip-reading dataset, designated Cantonese lip-reading sentences, is introduced, comprising over 500 unique speakers and more than 30,000 samples. To ensure alignment with real-world scenarios, no restrictions are imposed on factors such as gender, age, posture, lighting conditions, or speech rate. A comprehensive description of the pipeline employed to collect and construct the dataset is provided, and a novel visual frontend, the 3D visual attention net, is introduced. This frontend combines the advantages of convolution and self-attention mechanisms to extract fine-grained lip-region features. These features are then fed into a Conformer backend for temporal sequence modelling, achieving comparable performance on the Chinese Mandarin lip reading (CMLR), lip reading sentences 2 (LRS2), lip reading sentences 3 (LRS3), and Cantonese lip-reading sentences datasets. Benchmark tests on Cantonese lip-reading sentences demonstrate the challenges it poses, providing a novel research foundation for dialect lip-reading and fostering the advancement of Cantonese lip-reading tasks.
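The abstract does not specify the internals of the 3D visual attention net or the Conformer backend. As a rough, hypothetical illustration of the convolution-plus-self-attention idea it describes, the NumPy sketch below (toy tensor sizes, random weights, all names invented here) runs one 3D convolution over a grayscale lip-crop clip to capture short-range spatio-temporal cues, then applies scaled dot-product self-attention across the resulting frame sequence as a stand-in for temporal modelling:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the first axis of x."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    att = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return att @ v

def conv3d_valid(video, kernel):
    """Naive 'valid' 3D convolution (single channel, no padding, stride 1)."""
    T, H, W = video.shape
    t, h, w = kernel.shape
    out = np.empty((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(video[i:i+t, j:j+h, k:k+w] * kernel)
    return out

# Toy lip-crop clip: 12 frames of 16x16 pixels (illustrative sizes only).
video = rng.standard_normal((12, 16, 16))

# Frontend stand-in: one 3D conv kernel mixes information across space and time.
feat = conv3d_valid(video, rng.standard_normal((3, 3, 3)))  # shape (10, 14, 14)

# Flatten each frame's feature map into one vector per time step.
frames = feat.reshape(feat.shape[0], -1)                    # shape (10, 196)

# Backend stand-in: self-attention over the frame sequence (a real Conformer
# additionally interleaves convolution and feed-forward modules).
d_model = frames.shape[1]
Wq, Wk, Wv = (rng.standard_normal((d_model, 32)) for _ in range(3))
seq = self_attention(frames, Wq, Wk, Wv)                    # shape (10, 32)
print(seq.shape)
```

The sketch only demonstrates the data flow (video → spatio-temporal features → per-frame embeddings); real systems stack many such layers, use learned multi-head attention, and train end-to-end on transcribed video.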