2019
DOI: 10.1523/jneurosci.2914-18.2019

Cascaded Tuning to Amplitude Modulation for Natural Sound Recognition

Abstract: The auditory system converts the physical properties of a sound waveform to neural activities and processes them for recognition. During the process, the tuning to amplitude modulation (AM) is successively transformed by a cascade of brain regions. To test the functional significance of the AM tuning, we conducted single-unit recording in a deep neural network (DNN) trained for natural sound recognition. We calculated the AM representation in the DNN and quantitatively compared it with those reported in previous …
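The probing approach described in the abstract — treating DNN units like neurons in a single-unit recording experiment and characterizing their AM tuning — can be illustrated with a small sketch. The following is a minimal illustration, not the paper's code: it presents sinusoidally amplitude-modulated (SAM) noise at several modulation rates and summarizes one unit's responses as a rate modulation transfer function (MTF). All names and parameters here (FS, sam_noise, unit_activation, the set of modulation rates) are assumptions for illustration; in the actual analysis, unit_activation would instead read out the activation of a unit in the trained sound-recognition network.

    # Minimal sketch (not the paper's code): probing AM tuning of a model unit
    # by presenting SAM noise at several modulation rates and summarizing the
    # responses as a rate modulation transfer function (MTF).
    import numpy as np

    FS = 16000   # sampling rate (Hz), assumed
    DUR = 1.0    # stimulus duration (s), assumed

    def sam_noise(mod_rate, fs=FS, dur=DUR, depth=1.0, seed=0):
        """Sinusoidally amplitude-modulated white noise at a given rate (Hz)."""
        rng = np.random.default_rng(seed)
        t = np.arange(int(fs * dur)) / fs
        carrier = rng.standard_normal(t.size)
        envelope = 1.0 + depth * np.sin(2 * np.pi * mod_rate * t)
        return envelope * carrier

    def unit_activation(waveform, preferred_rate=16.0, fs=FS):
        """Hypothetical stand-in for one DNN unit: projects the rectified
        stimulus envelope onto a sinusoid at the unit's 'preferred' modulation
        rate. Replace with the activation of a unit in a trained network."""
        env = np.abs(waveform)
        env = env - env.mean()
        t = np.arange(env.size) / fs
        probe = np.exp(2j * np.pi * preferred_rate * t)
        return np.abs(np.vdot(probe, env)) / env.size

    mod_rates = [2, 4, 8, 16, 32, 64, 128, 256]   # modulation rates (Hz), assumed
    mtf = [unit_activation(sam_noise(r)) for r in mod_rates]
    best = mod_rates[int(np.argmax(mtf))]
    print("rate MTF:", [round(v, 4) for v in mtf])
    print("best modulation frequency (Hz):", best)

Repeating this measurement across units and layers gives the layer-wise AM representation that can then be compared with physiological tuning data.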


Cited by 30 publications (42 citation statements). References 75 publications.
“…Analogously, layer-wise correspondence has been found between CNNs trained for audio classification and the human auditory cortex 25 or the monkey peripheral auditory network 26. Although all these studies are positive in the generality of explanatory capabilities of goal-optimized neural networks, the same story might not go all the way through.…”
Section: Discussion
confidence: 74%
“…As mentioned in Introduction, more recent studies have argued that CNNs trained for image classification have layers similar to higher 2-4,7, intermediate 3-5, or lower 6 areas in the monkey or human visual ventral stream. Analogously, layer-wise correspondence has been found between CNNs trained for audio classification and the human auditory cortex 25 or the monkey peripheral auditory network 26. Although all these studies are positive in the generality of explanatory capabilities of goal-optimized neural networks, the same story might not go all the way through.…”
Section: View-identity Tuning
confidence: 71%
“…A very productive line of research put the emphasis on the temporal aspects of the speech structure and explored speech perception in terms of temporal-modulation processing (e.g., Houtgast and Steeneken, 1973; Plomp, 1983; Rosen, 1992; Drullman, 1995; Shannon et al., 1995; Zeng et al., 2005; Moore, 2008; Shamma and Lorenzi, 2013). Altogether, these studies demonstrated that (i) speech sounds convey salient modulations in amplitude (AM) and frequency (FM) resulting from the dynamic modulation of the vocal-tract geometric characteristics and vocal-fold vibrations (e.g., Varnet et al., 2017); (ii) the human auditory system is exquisitely sensitive to these modulation cues and certainly optimized to detect and discriminate modulation cues at the output of perceptual filters selectively tuned in the AM domain (Rodriguez et al., 2010; Koumura et al., 2019) and, in the case of slow FM carried by low-frequency sounds, due to temporal coding mechanisms using neural phase-locking to the temporal fine structure of narrowband signals at the output of cochlear filters (Paraouty et al., 2018); and (iii) the ability to identify speech in a variety of listening conditions is constrained by the ability to perceive accurately these relatively slow AM and FM components (e.g., Fu, 2002; Johannesen et al., 2016; Parthasarathy et al., 2020).…”
Section: Introduction
confidence: 88%
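For readers unfamiliar with the AM/FM decomposition discussed in the passage above, the following is a minimal sketch, not taken from any of the cited studies, of the standard Hilbert-transform approach: the analytic signal of a narrowband waveform yields its amplitude envelope (the AM component) and instantaneous frequency (the slow FM component). The sampling rate, carrier, and modulation parameters are arbitrary illustration values.

    # Minimal sketch (not from the cited studies): decompose a narrowband
    # signal into amplitude envelope (AM) and instantaneous frequency (FM)
    # via the analytic signal obtained from a Hilbert transform.
    import numpy as np
    from scipy.signal import hilbert

    fs = 16000                          # sampling rate (Hz), assumed
    t = np.arange(int(fs * 0.5)) / fs
    # Synthetic narrowband test tone: 500 Hz carrier, 4 Hz AM,
    # plus 5 Hz FM with a 20 Hz frequency excursion.
    am = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)
    fm_phase = 2 * np.pi * 500 * t + (20 / 5) * np.sin(2 * np.pi * 5 * t)
    x = am * np.cos(fm_phase)

    analytic = hilbert(x)
    envelope = np.abs(analytic)                          # AM component
    inst_phase = np.unwrap(np.angle(analytic))
    inst_freq = np.diff(inst_phase) * fs / (2 * np.pi)   # FM component (Hz)

    print("envelope range:", round(float(envelope.min()), 2),
          round(float(envelope.max()), 2))
    print("instantaneous frequency (Hz): mean %.1f, sd %.1f"
          % (inst_freq.mean(), inst_freq.std()))

In this toy example the recovered envelope fluctuates at 4 Hz and the instantaneous frequency oscillates around 500 Hz with roughly a 20 Hz excursion, mirroring the AM and slow-FM cues that the cited speech-perception studies discuss.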