The Speaker and Language Recognition Workshop (Odyssey 2018)
DOI: 10.21437/odyssey.2018-40

BUT/Phonexia Bottleneck Feature Extractor

Abstract: This paper complements the public release of the BUT/Phonexia bottleneck (BN) feature extractor. Starting with a brief history of neural network (NN)-based and BN-based approaches to speech feature extraction, it describes the structure of the released software. It then presents the three provided NNs: the first two were trained on the US English Fisher corpus with monophone-state and tied-state targets, and the third was trained in a multilingual fashion on 17 Babel languages. The NNs were technically …

Cited by 23 publications (19 citation statements)
References: 16 publications
“…1) of the input speech sample. This block contains a pre-trained bottleneck feature (BNF) extractor at the front-end to convert the speech utterance into a sequence of 80-dimensional BNFs [14]. This BNF extractor was originally trained for identifying 3096 phone states from 17 languages.…”
Section: Feature Extractor Block for Obtaining Fixed-length U-vector
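The quoted description (80-dimensional BNFs read from a network trained to identify 3096 phone states) can be illustrated with a minimal sketch of how bottleneck features are taken from a classifier network. It assumes a plain feed-forward net with random, untrained weights; the input size, hidden width, tanh activations and the helper names `bottleneck_features` and `phone_state_posteriors` are hypothetical and are not the released extractor's architecture. Only the bottleneck width (80) and the number of targets (3096) come from the quote.

```python
import numpy as np

# Only BN_DIM and N_TARGETS come from the quoted text; the other sizes are
# hypothetical placeholders, and all weights are random (untrained).
INPUT_DIM = 24 * 11   # assumed: 24 filter-bank coefficients x 11-frame context
HIDDEN_DIM = 1500     # assumed hidden-layer width
BN_DIM = 80           # bottleneck width from the quote
N_TARGETS = 3096      # phone-state targets from the quote

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Small random weights and zero biases for one affine layer."""
    return rng.standard_normal((n_in, n_out)) * 0.01, np.zeros(n_out)

W1, b1 = init_layer(INPUT_DIM, HIDDEN_DIM)
W2, b2 = init_layer(HIDDEN_DIM, BN_DIM)      # narrow, linear bottleneck layer
W3, b3 = init_layer(BN_DIM, HIDDEN_DIM)
W4, b4 = init_layer(HIDDEN_DIM, N_TARGETS)   # output layer over phone states

def bottleneck_features(x):
    """Map (frames, INPUT_DIM) inputs to (frames, BN_DIM) bottleneck outputs."""
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2   # stop at the bottleneck; upper layers are not needed here

def phone_state_posteriors(x):
    """Full forward pass, as used when training against phone-state targets."""
    h = np.tanh(bottleneck_features(x) @ W3 + b3)
    logits = h @ W4 + b4
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)
```

The layers above the bottleneck exist only to force the narrow layer to encode phonetic information during training; at extraction time they are discarded, and `bottleneck_features(frames)` returns one 80-dimensional vector per input frame.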
“…The second baseline is an i-vector system which is trained using Mel frequency cepstral coefficients (MFCCs) [20], [40]. The third baseline is an i-vector system, which is trained using the BUT/phonexia bottleneck features [41]. The second and third system will be referred to as the i-vector MFCC system and the i-vector BUT-BNF system, respectively.…”
Section: B. Baseline Configurations
“…In the latter, BNFs are extracted from a multi-lingual phone recognizer neural network. For our experiments, we considered a pretrained phone recognizer from BUT/phonexia [41] which was trained using 17 Babel languages. For both systems, 100-dimensional i-vectors are extracted using 256 Gaussian mixture components and the obtained i-vectors are transformed by a whitening transformation to be used in the dialect prediction [20], [40].…”
Section: B. Baseline Configurations
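The whitening step mentioned in the quote can be sketched as follows. The quote does not say which whitening variant was used; this sketch assumes PCA whitening estimated on a set of training i-vectors, and the function names `fit_whitening` and `apply_whitening` are hypothetical. The 100-dimensional i-vector size matches the quote; any data passed in would be supplied by the caller.

```python
import numpy as np

def fit_whitening(ivectors, eps=1e-10):
    """Estimate the mean and a PCA-whitening matrix from (n, d) training i-vectors."""
    mu = ivectors.mean(axis=0)
    cov = np.cov(ivectors - mu, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # cov is symmetric
    # Rotate onto the eigenvectors and scale each axis to unit variance;
    # eps guards against division by (near-)zero eigenvalues.
    W = eigvecs / np.sqrt(eigvals + eps)
    return mu, W

def apply_whitening(ivectors, mu, W):
    """Center and whiten (n, d) i-vectors with a previously fitted transform."""
    return (ivectors - mu) @ W
```

The transform is fitted once on training i-vectors and the same `mu` and `W` are then applied to every i-vector fed to the dialect classifier, so the training set ends up with approximately zero mean and identity covariance.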
“…The feature extraction module performs stacked bottleneck feature (sBNF) computation following the BUT/Phonexia approach [125], both for queries and utterances. To do so, three different neural networks are … The operation of BUT/Phonexia sBNF extractors requires an external VAD module providing speech/nonspeech information.…”
Section: Feature Extraction
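The quote raises two practical points: sBNFs are built by stacking, and the extractor relies on an external VAD. A minimal sketch of both follows, under assumptions that are not taken from the quoted text: first-stage bottleneck outputs are already available per frame, the VAD is a per-frame boolean mask, non-speech frames are dropped and context is stacked only within each contiguous speech segment, edge frames are repeated at segment borders, and the second-stage network input concatenates first-stage outputs at frame offsets -10, -5, 0, +5 and +10. The names `OFFSETS`, `stack_context`, `speech_segments` and `stacked_features` are hypothetical, and the second-stage network itself is omitted.

```python
import numpy as np

# Assumed frame offsets for the second-stage input (hypothetical here).
OFFSETS = (-10, -5, 0, 5, 10)

def stack_context(bn, offsets=OFFSETS):
    """Concatenate first-stage BN vectors at the given frame offsets.

    Frames beyond the segment edges are replaced by the first/last frame.
    """
    n = bn.shape[0]
    idx = np.arange(n)
    cols = [bn[np.clip(idx + o, 0, n - 1)] for o in offsets]
    return np.concatenate(cols, axis=1)

def speech_segments(vad_mask):
    """Return (start, end) pairs of contiguous speech runs, end exclusive."""
    mask = np.asarray(vad_mask, dtype=bool)
    padded = np.concatenate(([False], mask, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(changes[::2], changes[1::2]))

def stacked_features(bn, vad_mask, offsets=OFFSETS):
    """Stack context within each speech segment; non-speech frames are dropped."""
    parts = [stack_context(bn[s:e], offsets) for s, e in speech_segments(vad_mask)]
    if not parts:
        return np.empty((0, bn.shape[1] * len(offsets)))
    return np.concatenate(parts, axis=0)
```

Stacking inside each speech segment, rather than over the whole recording, keeps context vectors from mixing speech with silence; whether the released extractor handles segment borders this way is not stated in the quote.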