Sreyan Ghosh scite author profile

Sreyan Ghosh

5 Publications

56 Citation Statements Received

71 Citation Statements Given


Affiliations

University of Maryland, College Park, Christ University, Indian Institute of Technology Madras

Publications


End-to-End Named Entity Recognition from English Speech

Yadav, Ghosh, et al. 2020


Named entity recognition (NER) from text has been a widely studied problem and usually extracts semantic information from text. Until now, NER from speech is mostly studied in a two-step pipeline process that includes first applying an automatic speech recognition (ASR) system on an audio sample and then passing the predicted transcript to a NER tagger. In such cases, the error does not propagate from one step to another as both the tasks are not optimized in an end-to-end (E2E) fashion. Recent studies confirm that integrated approaches (e.g., E2E ASR) outperform sequential ones (e.g., phoneme based ASR). In this paper, we introduce a first publicly available NER annotated dataset for English speech and present an E2E approach, which jointly optimizes the ASR and NER tagger components. Experimental results show that the proposed E2E approach outperforms the classical two-step approach. We also discuss how NER from speech can be used to handle out of vocabulary (OOV) words in an ASR system.


Accident Detection Using Convolutional Neural Networks

Ghosh, Sunny, Roney 2019


End-to-end Named Entity Recognition from English Speech

Yadav, Ghosh, Chen, et al. 2020

Preprint


DeLoRes: Decorrelating Latent Spaces for Low-Resource Audio Representation Learning

Ghosh, Seth, Umesh 2022

Preprint


Inspired by the recent progress in self-supervised learning for computer vision, in this paper, through the DeLoRes learning framework, we introduce two new general-purpose audio representation learning approaches, the DeLoRes-S and DeLoRes-M. Our main objective is to make our network learn representations in a resource-constrained setting (both data and compute), that can generalize well across a diverse set of downstream tasks. Inspired from the Barlow Twins objective function, we propose to learn embeddings that are invariant to distortions of an input audio sample, while making sure that they contain non-redundant information about the sample. To achieve this, we measure the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of an audio segment sampled from an audio file and make it as close to the identity matrix as possible. We call this the DeLoRes learning framework, which we employ in different fashions with the DeLoRes-S and DeLoRes-M. We use a combination of a small subset of the large-scale AudioSet dataset and FSD50K for self-supervised learning and are able to learn with less than half the parameters compared to state-of-the-art algorithms. For evaluation, we transfer these learned representations to 11 downstream classification tasks, including speech, music, and animal sounds, and achieve state-of-the-art results on 7 out of 11 tasks on linear evaluation with DeLoRes-M and show competitive results with DeLoRes-S, even when pre-trained using only a fraction of the total data when compared to prior art. Our transfer learning evaluation setup also shows extremely competitive results for both DeLoRes-S and DeLoRes-M, with DeLoRes-M achieving state-of-the-art in 4 tasks.
In addition to being simple and intuitive, our pre-training procedure is amenable to compute through its inherent nature of construction and does not require careful implementation details to avoid trivial or degenerate solutions like prior art in this domain. Furthermore, we conduct extensive ablation studies on our training algorithm, model architecture and results, and make all our code and pre-trained models publicly available.


Analyzing the factors affecting usefulness of Self-Supervised Pre-trained Representations for Speech Recognition

Prasad, Seth, Ghosh, et al. 2022

Preprint



