2014
DOI: 10.4218/etrij.14.0213.0181

Intra- and Inter-frame Features for Automatic Speech Recognition

Abstract: In this paper, alternative dynamic features for speech recognition are proposed. The goal of this work is to improve speech recognition accuracy by deriving the representation of distinctive dynamic characteristics from a speech spectrum. This work was inspired by two temporal dynamics of a speech signal. One is the highly non-stationary nature of speech, and the other is the inter-frame change of a speech spectrum. We adopt the use of a sub-frame spectrum analyzer to capture very rapid spectral changes within…
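The intra-frame idea in the abstract, analyzing sub-frame spectra inside a single analysis frame to catch very rapid spectral change, can be illustrated with a short sketch. This is a minimal illustration under assumed parameters (sub-frame count, FFT size, Hann taper), not the paper's actual analyzer:

```python
import numpy as np

def subframe_spectra(frame, num_subframes=4, n_fft=256):
    """Split a single analysis frame into overlapping sub-frames and return
    one magnitude spectrum per sub-frame, so spectral movement *within* the
    frame (intra-frame dynamics) can be observed. Sub-frame count, FFT size,
    and the Hann taper are illustrative choices, not the paper's settings."""
    frame = np.asarray(frame, dtype=float)
    hop = len(frame) // num_subframes
    win_len = 2 * hop                                  # 50% sub-frame overlap
    window = np.hanning(win_len)
    spectra = [np.abs(np.fft.rfft(frame[s:s + win_len] * window, n=n_fft))
               for s in range(0, len(frame) - win_len + 1, hop)]
    return np.stack(spectra)                           # (num_subframes - 1, n_fft//2 + 1)

def intra_frame_change(spectra):
    """One simple intra-frame dynamic measure: the spectral difference
    between the last and first sub-frame of the same analysis frame."""
    return spectra[-1] - spectra[0]
```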

Cited by 15 publications (12 citation statements) · References 7 publications (4 reference statements)
“…We first extracted speech feature vectors at every 10 ms for a 20 ms analysis window (Lee et al 2014). Then, we trained separately native acoustic models (AMs) and non-native AMs by using native and non-native utterances, respectively (Young et al 2009).…”
Section: Non-native-optimized Speech Recognition
mentioning
confidence: 99%
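The framing setup quoted above (a 20 ms analysis window advanced every 10 ms) can be sketched as follows; the 16 kHz sample rate and Hamming taper are illustrative assumptions, not details taken from the cited papers:

```python
import numpy as np

def frame_signal(signal, sample_rate=16000, win_ms=20.0, hop_ms=10.0):
    """Slice a waveform into overlapping analysis frames: 20 ms windows
    advanced every 10 ms, matching the quoted setup. The 16 kHz sample rate
    and Hamming taper are assumptions for illustration only."""
    win = int(sample_rate * win_ms / 1000)    # 320 samples at 16 kHz
    hop = int(sample_rate * hop_ms / 1000)    # 160 samples at 16 kHz
    n_frames = 1 + max(0, (len(signal) - win) // hop)
    frames = np.stack([signal[i * hop : i * hop + win] for i in range(n_frames)])
    return frames * np.hamming(win)

# One second of audio at 16 kHz yields 99 frames of 320 samples each.
print(frame_signal(np.zeros(16000)).shape)    # (99, 320)
```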
“…1 shows widely used decision logic [2]. This condition is usually based on heuristic knowledge, and Fig.…”
Section: Conventional Endpoint Detection
mentioning
confidence: 99%
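The "widely used decision logic" referred to here is conventionally an energy-and-duration heuristic; a minimal sketch under assumed thresholds (the exact rule in [2] is not reproduced):

```python
def heuristic_endpoint(frame_energies_db, speech_threshold_db=-35.0,
                       min_trailing_silence=30):
    """Declare an endpoint once `min_trailing_silence` consecutive low-energy
    frames follow detected speech. Thresholds and frame counts are
    illustrative assumptions, not values from the cited work."""
    seen_speech = False
    silence_run = 0
    for t, energy in enumerate(frame_energies_db):
        if energy > speech_threshold_db:
            seen_speech = True
            silence_run = 0
        else:
            silence_run += 1
            if seen_speech and silence_run >= min_trailing_silence:
                return t          # frame index at which the utterance ends
    return None                   # no endpoint detected yet
```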
“…where F is a frame-level WFST representing equations (1) and (2), and where U is an utterance-level WFST representing equation (4) with two additional output symbols to mark the begin-of-utterance (BOU) and end-of-utterance (EOU). The endpoint is detected if the output symbol of the last transition of the best path P, o(e_t), satisfies the following condition:…”
Section: WFST-based Endpoint Detection
mentioning
confidence: 99%
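The quoted condition, that the output symbol of the last transition of the best path equals the end-of-utterance marker, can be expressed with a hypothetical arc structure (not the cited system's actual WFST API):

```python
from typing import List, NamedTuple

class Arc(NamedTuple):
    """One transition on a decoded path (hypothetical stand-in for a WFST arc,
    not the cited system's actual data structure)."""
    ilabel: str    # input symbol
    olabel: str    # output symbol
    weight: float

BOU, EOU = "<bou>", "<eou>"       # begin/end-of-utterance marker symbols

def endpoint_detected(best_path: List[Arc]) -> bool:
    """True when the output symbol of the last transition of the current best
    path is the end-of-utterance marker, i.e. the condition o(e_t) = EOU
    described in the quoted passage."""
    return bool(best_path) and best_path[-1].olabel == EOU
```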
“…After filtering with the proposed blind normalization filtering approach, 13 MFCCs were extracted by taking a discrete cosine transform. We then derived 39 dynamic feature vectors (inter-frame features) and one intra-log energy measure from the 13 MFCC features [2]. For our speech recognition experiments, we used 53 feature vector sequences (13 MFCCs + 39 dynamic features + 1 intra-log energy).…”
Section: Speech Database
mentioning
confidence: 99%
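The feature assembly described above (13 MFCCs + 39 dynamic features + 1 intra-log energy = 53 dimensions per frame) can be approximated with a conventional pipeline. Note that the 39 dynamic coefficients below are standard higher-order deltas used only as a stand-in; the cited work's intra-/inter-frame features [2] are not reproduced here:

```python
import numpy as np
import librosa

def features_53dim(wav_path):
    """Build a 53-dimensional per-frame vector: 13 MFCCs + 39 dynamic
    coefficients + 1 log-energy term. The 39 dynamic coefficients here are a
    conventional stand-in (first-, second-, and third-order deltas of the
    MFCCs); the cited work derives its own intra-/inter-frame features [2],
    which this sketch does not reproduce."""
    y, sr = librosa.load(wav_path, sr=16000)
    n_fft, hop = int(0.020 * sr), int(0.010 * sr)      # 20 ms window, 10 ms shift
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=n_fft, hop_length=hop)
    d1 = librosa.feature.delta(mfcc, order=1)
    d2 = librosa.feature.delta(mfcc, order=2)
    d3 = librosa.feature.delta(mfcc, order=3)
    rms = librosa.feature.rms(y=y, frame_length=n_fft, hop_length=hop)
    log_energy = np.log(rms + 1e-10)
    n = min(mfcc.shape[1], log_energy.shape[1])
    feats = np.vstack([mfcc[:, :n], d1[:, :n], d2[:, :n], d3[:, :n],
                       log_energy[:, :n]])
    return feats.T                                     # shape: (num_frames, 53)
```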