Proceedings of the Workshop on Speech-Centric Natural Language Processing 2017
DOI: 10.18653/v1/w17-4604

Parsing transcripts of speech

Abstract: We present an analysis of parser performance on speech data, comparing word type and token frequency distributions with written data, and evaluating parse accuracy by length of input string. We find that parser performance tends to deteriorate with increasing length of string, more so for spoken than for written texts. We train an alternative parsing model with added speech data and demonstrate improvements in accuracy on speech-units, with no deterioration in performance on written text.
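
The length-based evaluation mentioned in the abstract can be pictured with a short sketch. The snippet below is not the paper's code: it simply bins unlabeled attachment score (UAS) by input length, with illustrative data structures and a hypothetical bin size.

```python
# Minimal sketch of the abstract's evaluation idea: binning attachment
# accuracy by input length. Function and field names are illustrative,
# not taken from the paper's implementation.

from collections import defaultdict

def attachment_scores_by_length(gold_sents, pred_sents, bin_size=10):
    """Compute unlabeled attachment score (UAS) per length bin.

    Each sentence is a list of (head, deprel) tuples, with gold and
    predicted sentences in parallel order.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for gold, pred in zip(gold_sents, pred_sents):
        bin_id = len(gold) // bin_size  # bin 0 covers lengths 0-9, bin 1 covers 10-19, ...
        for (g_head, _), (p_head, _) in zip(gold, pred):
            total[bin_id] += 1
            correct[bin_id] += int(g_head == p_head)
    return {b: correct[b] / total[b] for b in sorted(total)}

# Toy example: two short "sentences" with gold vs. predicted heads.
gold = [[(2, "nsubj"), (0, "root")], [(3, "det"), (0, "root"), (2, "obj")]]
pred = [[(2, "nsubj"), (0, "root")], [(2, "det"), (0, "root"), (2, "obj")]]
print(attachment_scores_by_length(gold, pred))  # {0: 0.8}
```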

Cited by 5 publications (5 citation statements) · References 23 publications (20 reference statements)

“…In addition, no significant improvement is gained if the written data is modified so as to exclude punctuation (ssj no-punct) or perform lowercasing (ssj lc), which even worsens the results. Somewhat surprisingly, no definite conclusion can be drawn on the joint training model based on both spoken and written data (sst+ssj), as the parsers give significantly different results: while the Stanford parser substantially outperforms the baseline result when adding written data to the model (similar to the findings by Caines et al. (2017)), this addition has a negative effect on UDPipe. This could be explained by the fact that global, exhaustive, graph-based parsing systems are more capable of leveraging the richer contextual information gained with a larger training set than local, greedy, transition-based systems (McDonald and Nivre, 2007).…”
Section: Modifications of UD Annotation
confidence: 97%
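
The joint spoken+written training setup (sst+ssj) discussed in this statement amounts to concatenating two CoNLL-U treebanks into one training file before running a parser's standard training step. Below is a minimal, hypothetical sketch of that preparation step; the file names are placeholders, and the cited experiments may have combined the data differently.

```python
# Hedged sketch of building a combined spoken+written CoNLL-U training
# file, to be handed to a parser's usual training command. File names
# are placeholders, not paths from the cited work.

def concat_conllu(paths, out_path):
    """Concatenate CoNLL-U treebanks, preserving the blank line that
    separates sentence blocks."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8") as f:
                text = f.read().strip()
            if text:
                out.write(text + "\n\n")  # CoNLL-U sentences end with a blank line

concat_conllu(["sst-spoken.conllu", "ssj-written.conllu"], "sst+ssj.conllu")
# The combined file can then be passed to, e.g., UDPipe's or the
# Stanford parser's standard training entry point.
```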
“…Nevertheless, apart from research on speech-specific parsing systems, very little research has been dedicated to other, data-related aspects of spoken language parsing. To our knowledge, with the exception of Caines et al. (2017) and Nasr et al. (2014), who investigate the role of different types of training data used for parsing transcripts of speech, there have been no other systematic studies on the role of spoken data representations, such as transcription or annotation conventions, in spoken language parsing.…”
Section: Related Work
confidence: 99%
“…This poses a particular challenge, as most models used in data pre-processing and representation learning have been trained on written, not spoken, texts (Caines et al., 2017). Furthermore, most existing approaches to speech grading do have access to audio features, and indeed extract a large number of prosodic or duration-based features (Zechner et al., 2009; Higgins et al., 2011; Loukina et al., 2017; Wang et al., 2018a).…”
Section: Related Work
confidence: 99%
“…Unlike written discourse, speech is full of disfluencies, which make discovering the underlying syntactic structure challenging, as they interrupt the syntactic structure of the utterance (Caines et al., 2017). For example, according to Meteer and Taylor (1995), 17% of the tokens in the Switchboard telephone conversations are disfluencies of various kinds.…”
Section: Related Work
confidence: 99%
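
As a rough illustration of what disfluencies look like at the token level, the sketch below flags two easy cases, filled pauses and immediate word repetitions, in a toy utterance. This is a deliberately crude heuristic, not the Switchboard annotation scheme of Meteer and Taylor (1995) or a method from any of the cited papers.

```python
# Illustrative sketch only: a crude heuristic for flagging two common
# disfluency types (filled pauses and immediate word repetitions).
# Real disfluency annotation is far richer than this.

FILLED_PAUSES = {"uh", "um", "uh-huh", "hm"}

def flag_disfluencies(tokens):
    """Return a parallel list of booleans: True = likely disfluent token."""
    flags = []
    prev = None
    for tok in tokens:
        low = tok.lower()
        flags.append(low in FILLED_PAUSES or low == prev)
        prev = low
    return flags

utterance = "i i uh think the the parser um struggles here".split()
flags = flag_disfluencies(utterance)
print([t for t, f in zip(utterance, flags) if f])  # ['i', 'uh', 'the', 'um']
print(f"disfluency rate: {sum(flags) / len(flags):.0%}")  # 40%
```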