2020
DOI: 10.3390/app10062076

An Unsupervised Deep Learning System for Acoustic Scene Analysis

Abstract: Acoustic scene analysis has attracted a lot of attention recently. Existing methods are mostly supervised, which requires well-predefined acoustic scene categories and accurate labels. In practice, there exists a large amount of unlabeled audio data, but labeling large-scale data is both costly and time-consuming. Unsupervised acoustic scene analysis, on the other hand, does not require manual labeling but is known to have significantly lower performance and therefore has not been well explored. In this…

Cited by 6 publications (3 citation statements)
References 13 publications
“…Thus each channel yields a feature map $X^{(k,l+1)} = \delta\left(X^{(l)} * W^{(k,l)} + b^{(k,l)}\right)$, where $*$ denotes the convolution operator, $b^{(k,l)}$ denotes the bias for $k$th feature map, and $\delta(\cdot)$ is a nonlinear activation function. In this paper, Mish [42] function is applied to the 1D‐CNN as the activation function, which is defined as follows: $\delta(x) = x \cdot \tanh(\varsigma(x))$, where $\varsigma(x) = \ln(1 + e^{x})$ refers to the softplus [43] activation. More specially, the value of the conv1d operation at position $i$ is given by $X_i^{(k,l+1)} = \delta\left(\sum_{j=1}^{m^{(l)}} \sum_{c=1}^{c^{(l)}} X^{(l)}_{r^{(l)} \times (i-1)+1+j,\,c} \odot W^{(k,l)}_{j,c} + b^{(k,l)}\right)$, where $\odot$ refers to Hadamard product and $m^{(l)}$ is the window size of filters in the $l$th conv1d layer.…”
Section: Methods (mentioning)
confidence: 99%
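
As a rough illustration of the quoted definitions, the sketch below computes the Mish activation, x · tanh(softplus(x)), and the value of a single conv1d output position as a sum over the filter window and input channels plus a bias. It is not the cited authors' implementation. The function names, the list-of-lists layout, and the reading of r as the stride are assumptions made here, and the window offset uses 0-based indexing because the exact index convention in the extracted formula is uncertain.

```python
import math


def softplus(x: float) -> float:
    """ln(1 + e^x), written in a form that avoids overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def conv1d_position(X, W, b, i, stride=1):
    """Value of one output feature map at 1-based position i.

    X: input as a list of time steps, each a list of c channel values
    W: one filter as a list of m rows, each a list of c weights
    b: scalar bias for this feature map
    stride: step between windows (assumed meaning of r in the quoted text)
    """
    start = stride * (i - 1)  # 0-based index of the first row in the window
    total = b
    for j in range(len(W)):          # positions inside the filter window
        for c in range(len(W[j])):   # input channels
            total += X[start + j][c] * W[j][c]
    return mish(total)


if __name__ == "__main__":
    X = [[1.0], [2.0], [3.0]]   # three time steps, one channel (toy data)
    W = [[1.0], [-1.0]]         # window size 2, one channel (toy filter)
    print(conv1d_position(X, W, 0.0, 1))
```

The element-wise products inside the double loop play the role of the Hadamard product in the quoted formula; summing them over the window and channels gives the pre-activation value that Mish is applied to.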
“…denotes the convolution operator, $b^{(k,l)}$ denotes the bias for $k$th feature map, and $\delta(\cdot)$ is a nonlinear activation function. In this paper, Mish [42] function is applied to the 1D-CNN as the activation function, which is defined as follows:…”
Section: D Convolution (mentioning)
confidence: 99%
“…While reproducing the auditory model of binaural hearing may be a challenging problem, the past decade has seen a renewed interest in binaural approaches to sound localization, which has been applied in a wide area of research and development, including rescue and surveillance robots, animal acoustics, as well as human-robot interactions [29, 30, 31, 32, 33, 34, 35]. Unique or predetermined sound sources, for instance, can be embedded with search and rescue robots for ad-hoc localization in hazardous or cluttered environments, as well as for emergency signals in remote or unknown areas [36, 37].…”
Section: Introduction (mentioning)
confidence: 99%