Improving Ultrasound Tongue Image Reconstruction from Lip Images Using Self-supervised Learning and Attention Mechanism

Liu, Haiyang; Zhang, Jihan

doi:10.48550/arxiv.2106.11769

Cited by 1 publication

(2 citation statements)

References 30 publications


“…These transformations lead to a vector-based connection notation for cascaded features between $x^l$ and $g$ within the intermediary space of $\mathbb{R}^{F_{int}}$. The AG's output merges input elements with attention coefficients through elementwise multiplication, as formulaically represented in Equation (3). In this study, the AG calculates a singular scalar focus value for each pixel vector $x_i^l \in \mathbb{R}^{F_l}$, with $F_l$ indicating the number of feature maps at layer $l$.…”

Section: Gated Attention (mentioning)

confidence: 99%
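
The attention gate (AG) described in the statement above can be sketched in a few lines. This is a minimal illustration, not the citing paper's implementation: it assumes the common additive form of the gate (project $x^l$ and $g$ into $\mathbb{R}^{F_{int}}$, sum, apply ReLU, reduce to one scalar per pixel with a sigmoid), since Equation (3) itself is not reproduced here. The function name `attention_gate` and the parameters `W_x`, `W_g`, `b_g`, `psi`, and `b_psi` are hypothetical, and the 1x1 convolutions are written as per-pixel matrix products over flattened pixels.

```python
import numpy as np


def attention_gate(x, g, W_x, W_g, b_g, psi, b_psi):
    """Additive attention gate over flattened pixels (illustrative sketch).

    x:     (N, F_l)     pixel vectors x_i^l from the skip connection
    g:     (N, F_g)     gating vectors, assumed already resampled to N pixels
    W_x:   (F_l, F_int) projection of x into the intermediate space
    W_g:   (F_g, F_int) projection of g into the intermediate space
    b_g:   (F_int,)     bias added in the intermediate space
    psi:   (F_int,)     vector reducing each pixel to a single scalar
    b_psi: float        bias on that scalar
    """
    # Combine both inputs in the shared intermediate space R^{F_int}.
    q = np.maximum(x @ W_x + g @ W_g + b_g, 0.0)  # ReLU, shape (N, F_int)
    # One scalar attention coefficient per pixel, squashed into (0, 1).
    alpha = 1.0 / (1.0 + np.exp(-(q @ psi + b_psi)))  # shape (N,)
    # Elementwise multiplication: every feature of a pixel shares its alpha.
    return x * alpha[:, None], alpha


# Usage with random placeholder weights (not trained values).
rng = np.random.default_rng(0)
N, F_l, F_g, F_int = 6, 4, 3, 5
gated, alpha = attention_gate(
    rng.normal(size=(N, F_l)),
    rng.normal(size=(N, F_g)),
    rng.normal(size=(F_l, F_int)),
    rng.normal(size=(F_g, F_int)),
    np.zeros(F_int),
    rng.normal(size=F_int),
    0.0,
)
print(gated.shape, alpha.shape)
```

Because `alpha` holds a single value per pixel, the gate rescales whole pixel vectors rather than individual feature maps, which matches the "singular scalar focus value for each pixel vector" wording in the statement.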

“…Research indicates that tongue contours serve as an invaluable foundation for the quantitative analysis of speech, with data obtained from these contours facilitating the advancement and comprehension of speech models [3, 4]. Ultrasonic tongue contour extraction can dynamically capture the tongue’s position across various phonetic expressions and depict the movements responsible for sound transitions during articulation [5].…”

Section: Introduction (mentioning)

confidence: 99%


DAFT-Net: Dual Attention and Fast Tongue Contour Extraction Using Enhanced U-Net Architecture

Wang, Lu, Liu et al. 2024

Entropy


In most silent speech research, continuously observing tongue movements is crucial, thus requiring the use of ultrasound to extract tongue contours. Precisely and in real-time extracting ultrasonic tongue contours presents a major challenge. To tackle this challenge, the novel end-to-end lightweight network DAFT-Net is introduced for ultrasonic tongue contour extraction. Integrating the Convolutional Block Attention Module (CBAM) and Attention Gate (AG) module with entropy-based optimization strategies, DAFT-Net establishes a comprehensive attention mechanism with dual functionality. This innovative approach enhances feature representation by replacing traditional skip connection architecture, thus leveraging entropy and information-theoretic measures to ensure efficient and precise feature selection. Additionally, the U-Net’s encoder and decoder layers have been streamlined to reduce computational demands. This process is further supported by information theory, thus guiding the reduction without compromising the network’s ability to capture and utilize critical information. Ablation studies confirm the efficacy of the integrated attention module and its components. The comparative analysis of the NS, TGU, and TIMIT datasets shows that DAFT-Net efficiently extracts relevant features, and it significantly reduces extraction time. These findings demonstrate the practical advantages of applying entropy and information theory principles. This approach improves the performance of medical image segmentation networks, thus paving the way for real-world applications.


Section: Gated Attention (mentioning)

confidence: 99%