Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Hu 2018
DOI: 10.18653/v1/n18-1007

Parsing Speech: a Neural Approach to Integrating Lexical and Acoustic-Prosodic Information

Abstract: In conversational speech, the acoustic signal provides cues that help listeners disambiguate difficult parses. For automatically parsing spoken utterances, we introduce a model that integrates transcribed text and acoustic-prosodic features using a convolutional neural network over energy and pitch trajectories coupled with an attention-based recurrent neural network that accepts text and prosodic features. We find that different types of acoustic-prosodic features are individually helpful, and together give s…

Cited by 40 publications (83 citation statements); references 25 publications.

“…Fundamental frequency (F0) and Energy (E). Similar to Tran et al (2018), we use three F0 features and three energy features. The three F0 features include normalized cross correlation function (NCCF), log-pitch weighted by probability of voicing (POV), and the estimated delta of log pitch.…”
Section: Prosodic Cues (mentioning)
confidence: 99%
“…This work was motivated in part by a prior study showing that transcription errors impact findings related to the usefulness of prosodic features in parsing [18], i.e., a significant fraction of the cases where prosody seems to hurt parsing are associated with transcription errors. The availability of the new disfluency annotations will make it possible to explore this question for disfluencies.…”
Section: Discussion (mentioning)
confidence: 99%
“…Our statistical analysis is hence conservative and points out only the most direct and most relevant correlations. Feature combinations instead of simple flat representations have led to a breakthrough in parsing [9]. Hence, we expect many more and more complex interplays in the syntax-prosody interface to be found in future work building on more complex notions of prosodic and syntactic features.…”
Section: Discussion (mentioning)
confidence: 99%