2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2012.6288368
Improving faster-than-real-time human acoustic event detection by saliency-maximized audio visualization

Abstract: We propose a saliency-maximized audio spectrogram as a representation that lets human analysts quickly search for and detect events in audio recordings. By rendering target events as visually salient patterns, this representation minimizes the time and effort needed to examine a recording. In particular, we propose a transformation of a conventional spectrogram that maximizes the mutual information between the spectrograms of isolated target events and the estimated saliency of the overall visual representation…
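The idea in the abstract can be illustrated, very loosely, with a toy contrast-based transform. This is not the paper's learned, mutual-information-trained mapping; `spectrogram`, `saliency_boost`, and every parameter here are invented for illustration only:

```python
import numpy as np

def spectrogram(x, n_fft=256, hop=128):
    """Magnitude spectrogram via a simple Hann-windowed STFT."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq, time)

def saliency_boost(spec, gamma=0.3, eps=1e-8):
    """Toy stand-in for a saliency-maximizing transform: log-compress,
    subtract a per-band background estimate so sparse target events
    stand out visually against stationary noise."""
    log_spec = np.log(spec + eps)
    background = np.median(log_spec, axis=1, keepdims=True)  # noise floor
    contrast = np.maximum(log_spec - background, 0.0)
    return contrast ** gamma  # gamma < 1 expands low-contrast detail

# Background noise plus one short tonal "event" to detect.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
x = 0.1 * rng.standard_normal(t.size)
x[6000:7000] += np.sin(2 * np.pi * 2000 * t[6000:7000])

S = saliency_boost(spectrogram(x))  # event region now visually dominant
```

In the displayed image `S`, the tonal burst occupies a bright, high-contrast region while the noise floor is pushed toward zero, which is the qualitative effect the paper's representation aims for.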

Cited by 12 publications (11 citation statements) | References 8 publications
“…We speculate that one possible solution to mitigate confusion errors would be to provide example recordings of sound classes to which annotators could refer while annotating. It is also possible that saliency maximization techniques such as the one proposed by Lin et al [27] could help reduce missed detection of events.…”
Section: Discussion (mentioning)
confidence: 99%
“…Lin et al [27] developed a saliency-maximized audio spectrogram to enable fast detection of sound events by human annotators. They then conducted a study on the effect of this alternative representation on audio annotation quality.…”
Section: Related Work (mentioning)
confidence: 99%
“…In an AED task, the user is not permitted to observe Y[n1, n2] directly; instead, he or she must observe X[n1, n2], the spectrogram of the mixed noisy signal. The background noise with spectrogram N[n1, n2] is irrelevant to the task (e.g., symphony music [Hasegawa-Johnson et al 2011] or speech [Lin et al 2012]). In order to help the user correctly identify the locations at which the target signal Y[n1, n2] is nonzero, we propose to transform the image prior to display, using a learned image…”
Section: Saliency-Maximized Audio Visualization (mentioning)
confidence: hi
confidence: 99%
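The signal model in the excerpt above, an observed spectrogram X of target Y mixed with task-irrelevant background N, can be sketched as follows. The sampling rate, event placement, and noise level are arbitrary assumptions, not values from the cited work:

```python
import numpy as np

rng = np.random.default_rng(1)
sr = 8000
t = np.arange(2 * sr) / sr

# Target event Y: a brief tonal burst the analyst must find.
y = np.zeros_like(t)
y[3000:4200] = np.sin(2 * np.pi * 1200 * t[3000:4200])

# Task-irrelevant background N (broadband noise standing in for
# speech or music in the excerpt's examples).
n = 0.3 * rng.standard_normal(t.size)

x = y + n  # the analyst only ever sees the spectrogram of x

def mag_spec(sig, n_fft=256, hop=128):
    """Magnitude spectrogram (Hann-windowed STFT)."""
    win = np.hanning(n_fft)
    frames = [sig[i:i + n_fft] * win
              for i in range(0, len(sig) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)).T

X, Y, N = mag_spec(x), mag_spec(y), mag_spec(n)
```

The display-side transform the excerpt describes would then be applied to X alone, since Y is never directly observable.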
“…where i and j are the row and column pixel number of the spectrogram image, respectively, and J is the total number of column pixels, calculate the local image saliency by using equation (6).…”
Section: The Local Saliency Feature of MFCC (mentioning)
confidence: 99%
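Equation (6) itself is not reproduced in the excerpt. A common choice for per-pixel local image saliency is center-surround contrast, sketched here under that assumption; `local_saliency` and `radius` are illustrative names and values, not the cited paper's definition:

```python
import numpy as np

def local_saliency(img, radius=2):
    """Center-surround local saliency: each pixel's absolute deviation
    from the mean of its (2*radius+1)^2 neighborhood (edge-padded)."""
    padded = np.pad(img, radius, mode="edge")
    h, w = img.shape
    sal = np.empty_like(img, dtype=float)
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            sal[i, j] = abs(img[i, j] - patch.mean())
    return sal

img = np.zeros((10, 10))
img[5, 5] = 1.0            # an isolated bright pixel
S = local_saliency(img)    # that pixel scores highest
```

A pixel that differs sharply from its surroundings (like a sound event against a stationary noise floor in a spectrogram image) receives a high saliency score, while uniform regions score near zero.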
“…Though the sound recognition work has been proved to be efficient by using the features mentioned above, however, these features are not visualized features which could be extracted automatically and complex process algorithm is needed. Some research work has been done recently by fusing both audio and visual signal information to do the recognition and perception work for robot or other platforms [6][7], but the image feature for fusion they used is the entire image, therefore, the fusion processing needs a lot of computing resource and the image features of the sound signal are still not saliency features.…”
(mentioning)
confidence: 99%