Interspeech 2018
DOI: 10.21437/interspeech.2018-1456

ESPnet: End-to-End Speech Processing Toolkit

Abstract: This paper introduces a new open source platform for end-to-end speech processing named ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition (ASR), and adopts widely used dynamic neural network toolkits, Chainer and PyTorch, as its main deep learning engines. ESPnet also follows the Kaldi ASR toolkit style for data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments. This paper explains a major architectu…

Cited by 1,075 publications (626 citation statements)
References 31 publications
“…All the proposed end-to-end multi-speaker speech recognition models are implemented with the ESPnet framework [29] using the Pytorch backend. Some basic parts are the same for all the models.…”
Section: Methods
confidence: 99%
“…We use TED-LIUM2 as a second dataset to investigate how the performance evolves when using a larger training set with a higher number of speakers. We use the ESPnet toolkit [30] to implement and investigate our proposed methods. I-vectors are extracted using the Kaldi toolkit [31].…”
Section: Methods
confidence: 99%
“…Audio segmentation is an essential technique for saving computational resources and complexity. ESPnet [30] was used as an E2E-ASR toolkit through the experiments. The following four methods were compared:…”
Section: Methods
confidence: 99%