Automatic evaluation of spoken summaries: the case of language assessment

Loukina, Anastassia; Zechner, Klaus; Chen, Lei

doi:10.3115/v1/w14-1809

Cited by 9 publications

(8 citation statements)

References 17 publications

“…We use significance test to prove that similarity metric is reliable even though the numerical difference of similarity scores in experiment is little. Because the similarity scores of generated summaries do not follow normal distribution, we take Kruskal-Wallis test (Loukina et al, 2014;Albert, 2017) as our significance test to measure that the difference of similarity results of three methods is significant or not. As shown in Table 9, all pvalues are less than 0.05.…”

Section: Significance Test On Similarity Results (mentioning)

confidence: 99%

Controlling Length in Abstractive Summarization Using a Convolutional Neural Network

Liu, Luo, Zhu

2018

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Convolutional neural networks (CNNs) have met great success in abstractive summarization, but they cannot effectively generate summaries of desired lengths. Because generated summaries are used in different scenarios which may have space or length constraints, the ability to control the summary length in abstractive summarization is an important problem. In this paper, we propose an approach to constrain the summary length by extending a convolutional sequence-to-sequence model. The results show that this approach generates high-quality summaries with user-defined length, and outperforms the baselines consistently in terms of ROUGE score, length variations and semantic similarity.

“…In addition to the relatively straightforward method of using CVA models and cosine similarity calculations to produce the content features, additional approaches have been investigated for scoring spontaneous speech. Some of these include using latent semantic analysis (LSA; Metallinou & Cheng, ), pointwise mutual information (Xie, Evanini, & Zechner, ), and the ROUGE summarization evaluation metric (Lin & Rey, ; Loukina, Zechner, & Chen, ).…”

Section: Discussion (mentioning)

confidence: 99%

Automated Scoring of Nonnative Speech Using the SpeechRater℠ v. 5.0 Engine

Chen, Zechner, Yoon, et al.

2018

ETS Research Report Series

Self Cite

This research report provides an overview of the R&D efforts at Educational Testing Service related to its capability for automated scoring of nonnative spontaneous speech with the SpeechRater℠ automated scoring service since its initial version was deployed in 2006. While most aspects of this R&D work have been published in various venues in recent years, no comprehensive account of the current state of SpeechRater has been provided since the initial publications following its first operational use in 2006. After a brief review of recent related work by other institutions, we summarize the main features and feature classes that have been developed and introduced into SpeechRater in the past 10 years, including features measuring aspects of pronunciation, prosody, vocabulary, grammar, content, and discourse. Furthermore, new types of filtering models for flagging nonscorable spoken responses are described, as is our new hybrid way of building linear regression scoring models with improved feature selection. Finally, empirical results for SpeechRater 5.0 (operationally deployed in 2016) are provided.

“…Since the early 2000s, several groups have built systems for scoring less constrained and more unpredictable speaking items, which incorporated additional sources of information for scoring, for example, diversity of vocabulary or grammatical complexity (Bernstein et al, ; Chen & Zechner, ; Strik, Van De Loo, Van Doremalen, & Cucchiarini, ; Yoon, Bhat, & Zechner, ; Zechner, Higgins, Xi, & Williamson, ). Recent work has also looked at evaluating the content relevance of spoken responses (Loukina, Zechner, & Chen, ; Somasundaran, Lee, Chodorow, & Wang, ; Xie, Evanini, & Zechner, ).…”

Section: Overview Of Item Types Used (mentioning)

confidence: 99%

Performance of Automated Speech Scoring on Different Low‐ to Medium‐Entropy Item Types for Low‐Proficiency English Learners

Loukina, Zechner, Yoon, et al.

2017

ETS Research Report Series

Self Cite

This report presents an overview of the SpeechRater℠ automated scoring engine model building and evaluation process for several item types with a focus on a low‐English‐proficiency test‐taker population. We discuss each stage of speech scoring, including automatic speech recognition, filtering models for nonscorable responses, and scoring model building and evaluation, and compare how the performance at each step differs between different item types. We conclude by discussing the effect of item type on automated scoring performance. We also give recommendations about what considerations should be taken into account when developing tests for low‐proficiency English speakers to obtain reliable scores from an automatic scoring engine.
