Abstract-Despite being studied extensively, the performance of blind source separation (BSS) remains limited, especially for sensor data collected in adverse environments. Recent studies show that this issue can be mitigated by incorporating multimodal information into the BSS process. In this paper, we propose a method for enhancing the target speech separated from sound mixtures by a BSS algorithm, using visual voice activity detection (VAD) and spectral subtraction. First, a classifier for visual VAD is trained in an off-line stage, using labelled features extracted from the visual stimuli. This classifier is then used to detect the voice activity of the target speech. Finally, we apply a multi-band spectral subtraction algorithm to enhance the BSS-separated speech signal based on the detected voice activity. We have tested our algorithm on mixtures generated artificially with mixing filters of different reverberation times, and the results show that it improves the quality of the separated target speech signal.
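To illustrate the final enhancement step described above, the sketch below shows one way a frame-level VAD mask could drive multi-band spectral subtraction, written in Python with NumPy/SciPy. The function name, band layout, over-subtraction schedule, and spectral floor are assumptions chosen for illustration; they do not reproduce the exact algorithm used in the paper.

```python
import numpy as np
from scipy.signal import stft, istft

def multiband_spectral_subtraction(x, vad_mask, fs=16000, nperseg=512, n_bands=4):
    """Sketch: enhance a BSS-separated signal x using a frame-level VAD mask.

    vad_mask is assumed to hold one boolean per STFT frame, True where the
    target speaker is active (e.g. as predicted by a visual VAD classifier).
    """
    f, t, X = stft(x, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(X), np.angle(X)

    # Estimate the noise magnitude spectrum from frames flagged as non-speech.
    mask = np.asarray(vad_mask, dtype=bool)[:mag.shape[1]]
    noise_frames = mag[:, ~mask] if (~mask).any() else mag
    noise_est = noise_frames.mean(axis=1, keepdims=True)

    # Split the spectrum into bands and subtract the noise estimate with a
    # band-dependent over-subtraction factor (assumed schedule, for illustration).
    enhanced = np.empty_like(mag)
    band_edges = np.linspace(0, mag.shape[0], n_bands + 1, dtype=int)
    for b in range(n_bands):
        lo, hi = band_edges[b], band_edges[b + 1]
        alpha = 4.0 - 3.0 * b / max(n_bands - 1, 1)   # heavier subtraction in low bands
        sub = mag[lo:hi] - alpha * noise_est[lo:hi]
        enhanced[lo:hi] = np.maximum(sub, 0.02 * mag[lo:hi])  # spectral floor

    # Reconstruct with the original phase and trim to the input length.
    _, y = istft(enhanced * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return y[:len(x)]
```

In this sketch the VAD mask replaces the usual energy-based noise estimation: noise statistics are collected only from frames the (visual) VAD marks as silent, which is what allows the enhancement to work even when the acoustic signal alone is unreliable.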