2020 3rd International Seminar on Research of Information Technology and Intelligent Systems (ISRITI) 2020
DOI: 10.1109/isriti51436.2020.9315380

Speech Gender Classification Using Bidirectional Long Short Term Memory

Cited by 13 publications
(5 citation statements)
References 33 publications
“…Initially, the mixed audio (mixture) undergoes STFT to obtain the time frequency spectrogram. Then, a neural network consisting of four layers of bidirectional long short-term memory (BLSTM) [21] and fully connected layers is used to record the mapping from the time frequency spectrogram to an embedding space. In this embedding space, the mapping of the spectrogram is represented using an embedding matrix V, and k represents the dimensionality of the embedding space.…”
Section: Time Frequency Domain-based Model
Citation type: mentioning
confidence: 99%
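
The first step in the statement above (STFT to a time frequency spectrogram) and the shape of the embedding matrix V can be sketched as follows. This is a minimal standard-library illustration, not the citing paper's implementation: the function names, the frame length, the hop size, and the value of k are all assumptions chosen for the example, and the BLSTM and fully connected layers that would actually produce V are not implemented here.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=8, hop=4):
    """Return a T x F magnitude spectrogram as a list of lists.

    frame_len and hop are illustrative values, not taken from the source.
    """
    # Periodic Hann window applied to every frame.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequency bins only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for b in range(n_bins):
            # Direct DFT of one bin; slow, but keeps the sketch dependency-free.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * b * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def embedding_shape(spectrogram, k):
    """Shape of an embedding matrix V with one k-dimensional row per
    time frequency bin (assumed layout: T*F rows, k columns)."""
    t = len(spectrogram)
    f = len(spectrogram[0])
    return (t * f, k)


# A sinusoid centred on bin 2 of an 8-sample frame.
signal = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
spec = stft_magnitude(signal)
print(len(spec), len(spec[0]))      # number of frames T, number of bins F
print(embedding_shape(spec, k=20))  # k=20 is a hypothetical embedding size
```

In a full model, the network would map each of the T*F spectrogram bins to a k-dimensional vector; the helper above only reports the resulting shape.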
“…The performance of MFCC has been compared with other extraction methods [11], and the results show that MFCC outperforms them. Studies [1,5,19] and several others also use MFCC as a feature extraction method. The stages of MFCC [2,6] can be seen in Figure 2.…”
Section: Feature Extraction
Citation type: mentioning
confidence: 99%
“…This reshaping technique gives 0.4% improvement with 1D to 3D signal preprocessing as CNN input. Another deep learning architecture is Bidirectional Long Short-Term Memory (BLSTM) with a division of training and testing datasets of 80:20 resulting in the highest accuracy of 90.5% [19].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
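
The 80:20 division of training and testing data mentioned above can be illustrated with a short sketch. The statement does not say how the split was drawn (random, stratified, or by speaker), so the seeded random shuffle below is an assumption, and the function name and parameters are hypothetical.

```python
import random


def split_train_test(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of items and split it into (train, test).

    A plain random split is assumed; the source does not state the method.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical utterance identifiers standing in for a speech dataset.
utterances = [f"utt_{i:03d}" for i in range(100)]
train, test = split_train_test(utterances)
print(len(train), len(test))
```

For speech data, a split that keeps each speaker entirely in one partition is often preferable, since a purely random split can place the same voice in both training and testing sets.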
“…BLSTM has several applications in voice recognition. Examples include speech gender classification [28], speech emotion recognition [29], and native language identification in brief speech utterances [30].…”
Section: Introduction
Citation type: mentioning
confidence: 99%