Though people rarely speak in complete sentences, punctuation confers many benefits on readers of transcribed speech. Unfortunately, most ASR systems do not produce punctuated output. To address this, we propose a solution for automatic punctuation that is both cost-efficient and easy to train. Our solution builds on the recent trend of fine-tuning transformer-based language models. We also modify the typical framing of this task by predicting punctuation for sequences rather than individual tokens, which makes for more efficient training and inference. Finally, we find that aggregating predictions across multiple context windows improves accuracy even further. Our best model achieves a new state of the art on benchmark data (TED Talks) with a combined F1 of 83.9, a 48.7% relative improvement (15.3 absolute) over the previous state of the art.
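As a purely illustrative sketch of the overlap-aggregation idea described in this abstract (not the authors' implementation), the Python snippet below runs a hypothetical per-window punctuation predictor over overlapping context windows and majority-votes the label for each token. All names, the label set, and the window/stride values are assumptions.

# Hypothetical sketch of aggregating sequence-level punctuation predictions
# across overlapping context windows; names are illustrative only.
from collections import Counter
from typing import Callable, List

def aggregate_predictions(
    tokens: List[str],
    predict_window: Callable[[List[str]], List[str]],  # one punctuation label per token
    window_size: int = 64,
    stride: int = 16,
) -> List[str]:
    """Run the model over overlapping windows and majority-vote per token."""
    votes = [Counter() for _ in tokens]
    for start in range(0, max(len(tokens) - window_size, 0) + 1, stride):
        window = tokens[start:start + window_size]
        labels = predict_window(window)          # e.g. "O", ",", ".", "?"
        for offset, label in enumerate(labels):
            votes[start + offset][label] += 1
    # Tokens never covered by a window fall back to "O" (no punctuation).
    return [v.most_common(1)[0][0] if v else "O" for v in votes]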
Cognitive tests have traditionally relied on standardized testing materials in the name of equality and because of the onerous nature of creating test items. This approach ignores participants' diverse language experiences, which can significantly affect testing outcomes. Here, we seek to explain our prior finding of significant performance differences on two cognitive tests (reading span and SPiN) between clusters of participants grouped by their media consumption. We model the language contained in these media sources with an LSTM trained on a corpus of each cluster's media sources to predict target words, and we model the semantic similarity of test items to each cluster's corpus using skip-thought vectors. We find robust, significant correlations between performance on the SPiN test and the LSTM and skip-thought models presented here, but not for the reading span test.
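To illustrate one way the corpus-similarity modeling described in this abstract could be framed (a sketch under assumptions, not the authors' code), the snippet below scores a test item's semantic similarity to a cluster's media corpus as the cosine similarity between the item embedding and the corpus centroid. The encode function stands in for any sentence encoder (the paper uses skip-thought vectors); its interface here is an assumption.

# Illustrative sketch: semantic similarity of a test item to a cluster's corpus.
import numpy as np
from typing import Callable, List

def item_corpus_similarity(
    test_item: str,
    corpus_sentences: List[str],
    encode: Callable[[str], np.ndarray],   # hypothetical sentence encoder
) -> float:
    """Cosine similarity between a test item and the corpus centroid."""
    item_vec = encode(test_item)
    corpus_vecs = np.stack([encode(s) for s in corpus_sentences])
    centroid = corpus_vecs.mean(axis=0)
    return float(
        item_vec @ centroid / (np.linalg.norm(item_vec) * np.linalg.norm(centroid))
    )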
Product reviews and satisfaction surveys seek customer feedback in the form of ranked scales. In these settings, widely used evaluation metrics such as F1 and accuracy ignore the rank ordering of the responses (e.g., 'very likely' is closer to 'likely' than to 'not at all'). In this paper, we hypothesize that the order of class values is important for evaluating classifiers on ordinal target variables and should not be disregarded. To test this hypothesis, we compared Multi-class Classification (MC) and Ordinal Regression (OR) by applying both to benchmark tasks with ordinal target variables, using the same underlying model architecture. Experimental results show that while MC outperformed OR on some datasets in accuracy and F1, OR is significantly better than MC at minimizing the error between prediction and target on all benchmarks, as revealed by error-sensitive metrics such as mean-squared error (MSE) and Spearman correlation. Our findings motivate the need for consistent, error-sensitive metrics when evaluating benchmarks with ordinal target variables, and we hope this work stimulates interest in exploring alternative losses for ordinal problems.
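The contrast this abstract draws between nominal and error-sensitive metrics can be made concrete with a small example. The snippet below (an illustration, not the paper's evaluation code) compares two prediction sets with the same number of mistakes: accuracy and macro-F1 treat them alike, while MSE and Spearman correlation penalize the predictions that miss by several ranks. The 0-3 label encoding is an assumption.

# Sketch: nominal metrics ignore how far a prediction is from the true rank;
# error-sensitive metrics (MSE, Spearman) do not.
from scipy.stats import spearmanr
from sklearn.metrics import accuracy_score, f1_score, mean_squared_error

y_true      = [3, 3, 2, 1, 0, 2]
y_pred_near = [2, 3, 2, 1, 1, 2]   # errors are off by one rank
y_pred_far  = [0, 3, 2, 1, 3, 2]   # same number of errors, but far off

for name, y_pred in [("near-miss", y_pred_near), ("far-miss", y_pred_far)]:
    rho, _ = spearmanr(y_true, y_pred)
    print(
        name,
        "acc=%.2f" % accuracy_score(y_true, y_pred),
        "macroF1=%.2f" % f1_score(y_true, y_pred, average="macro"),
        "MSE=%.2f" % mean_squared_error(y_true, y_pred),
        "Spearman=%.2f" % rho,
    )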
In the real world, speech perception frequently occurs under adverse listening conditions. Laboratory studies have identified distinct phenomena associated with more or less constant sources of noise (energetic masking) and with competing verbal information (informational masking). One issue complicating direct comparisons between them is that common paradigms for studying energetic and informational masking differ along many dimensions; in particular, informational masking is almost always measured using linguistically meaningful information. We have developed a paradigm that uses temporally patterned noise, with the goal of comparing energetic and informational masking under more comparable conditions. We hypothesized that listeners would be able to take advantage of the structure in the masking noise, providing a processing advantage over energetic masking. The initial experiment provides strong evidence for this hypothesis, but conceptual replications did not produce the same pattern of results, at least with respect to measures of central tendency. A direct replication of the first experiment did not reproduce the large differences in the means, but a final experiment designed to strengthen the effect did. Interestingly, however, exploratory analyses across all five experiments reveal robust evidence that patterned noise conditions produce increased individual variability. Further, we observed strong correlations specifically between the patterned conditions. We attribute these findings to as-yet-unidentified differences in cognitive ability that allow some participants to benefit from the additional temporal information while others are hurt by the addition of unusable distracting information. However, hypothesized predictive measures of task performance, such as working memory, inhibitory control, and musical experience, did not correlate with performance.