2021
DOI: 10.1007/978-3-030-69544-6_25

Visually Guided Sound Source Separation Using Cascaded Opponent Filter Network

Cited by 17 publications (45 citation statements)
References 51 publications
“…Then, PixelPlayer only considered semantic features extracted from the video frames. Appearance information is important as highlighted in [290], where the separation was guided with a single image, but higher performance is expected to be achieved when also motion information is exploited. Zhao et al [287] proposed to combine trajectory and semantic features to condition a source separation network.…”
Section: B Audio-visual Sound Source Separation For Non-speech Signals
mentioning
confidence: 99%
“…Recent works [64,17,63,25,65,66,21,67] have started to exploit visual information (e.g. talking face, playing instruments) to solve the sound separation task.…”
Section: Introduction
mentioning
confidence: 99%
“…While visual motions may be important under certain circumstances (e.g. separating similar type of sources), the single visual frame based approaches have demonstrated surprisingly well performance in [64,65,66]. In this paper, we focus on improving the single visual frame based sound separation.…”
Section: Introduction
mentioning
confidence: 99%
“…Previous works have proposed models to controllably generate e.g. images [13,17,38,45,48,51,55,57,73,76,77], videos [6,12,25,37,42,46,64,65,65,71], and audios [1,9,15,22,24,47,62,63], or separate sounds [18,19,79,80,84]. However, most of the audio works are music-related, and only a few attempts have been made to generate visually guided audio in an open domain setup [11,83].…”
mentioning
confidence: 99%