Domestic canaries produce complex vocal patterns organized at several levels of abstraction. Studying this temporal organization is particularly relevant to understanding how animal brains represent and process vocal inputs such as language. However, such studies require large amounts of annotated data. We propose a fast and easy-to-train transducer model based on RNN architectures to automate part of the annotation process, a task similar to speech recognition. We demonstrate that RNN architectures can be applied efficiently to spectral features (MFCCs) to annotate songs at the time-frame level and at the phrase level. We achieve around 95% accuracy at the frame level on particularly complex canary songs, and Echo State Networks (ESNs) achieve around 5% word error rate (WER) at the phrase level. Moreover, this model can be built with only around 13 to 20 minutes of annotated songs. Training the ESN on 2 hours and 40 minutes of data takes only 35 seconds, making it possible to run experiments quickly without powerful hardware.
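To make the pipeline concrete, here is a minimal sketch, not the authors' code, of an ESN transducer mapping MFCC frames to per-frame phrase labels. It assumes the reservoirpy and librosa libraries; the file names, hyperparameters, phrase inventory, and placeholder training labels are all illustrative, and in practice the targets would come from human annotations.

```python
import numpy as np
import librosa
from reservoirpy.nodes import Reservoir, Ridge

N_MFCC = 13                              # assumption: 13 cepstral coefficients per frame
PHRASE_CLASSES = ["A", "B", "C", "SIL"]  # hypothetical phrase inventory

def mfcc_frames(wav_path):
    """Load a song and return a (n_frames, n_features) MFCC matrix."""
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)
    delta = librosa.feature.delta(mfcc)   # first-order derivatives
    return np.vstack([mfcc, delta]).T     # -> (frames, 2 * N_MFCC)

def one_hot(frame_labels):
    """Encode one phrase label per frame as a one-hot target matrix."""
    idx = [PHRASE_CLASSES.index(l) for l in frame_labels]
    out = np.zeros((len(idx), len(PHRASE_CLASSES)))
    out[np.arange(len(idx)), idx] = 1.0
    return out

# Echo State Network: a fixed random recurrent reservoir feeding a
# ridge-regression readout; only the readout is trained, which is why
# training takes seconds rather than hours.
reservoir = Reservoir(units=1000, sr=0.9, lr=0.3)  # placeholder hyperparameters
readout = Ridge(ridge=1e-6)
esn = reservoir >> readout

# Hypothetical training set: per-song inputs and per-frame targets.
X_train = [mfcc_frames(p) for p in ["song_001.wav", "song_002.wav"]]
Y_train = [one_hot(["A"] * x.shape[0]) for x in X_train]  # placeholder labels

esn.fit(X_train, Y_train)

# Frame-level decoding: argmax over classes at every time frame.
pred = esn.run(mfcc_frames("song_003.wav"))
frame_labels = [PHRASE_CLASSES[i] for i in np.argmax(pred, axis=1)]
```

A phrase-level sequence for WER scoring could then be obtained by collapsing runs of identical consecutive frame labels, though the exact decoding scheme used in the paper may differ.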