Abstract - Advances in formulating spoken document retrieval for a new National Gallery of the Spoken Word (NGSW) are addressed. NGSW is the first large-scale repository of its kind, consisting of speeches, news broadcasts, and recordings from the 20th century. After presenting an overview of the audio stream content of the NGSW, with sample audio files from U.S. Presidents from 1893 to the present, an overall system diagram is proposed with a discussion of critical tasks associated with effective audio information retrieval. These include advanced audio segmentation, speech recognition model adaptation for acoustic background noise and speaker variability, and information retrieval using natural language processing for text query requests that include document and query expansion. For segmentation, a new evaluation criterion entitled fused error score (FES) is proposed, followed by application of the CompSeg segmentation scheme on DARPA Hub4 Broadcast News (30.5% relative improvement in FES) and NGSW data. Transcript generation is demonstrated for a six-decade portion of the NGSW corpus. Novel model adaptation using structure maximum likelihood eigenspace mapping shows a relative 21.7% improvement. Issues regarding copyright assessment and metadata construction are also addressed for the purposes of a sustainable audio collection of this magnitude. Advanced parameter-embedded watermarking is proposed with evaluations showing robustness to correlated noise attacks. Our experimental online system entitled "SpeechFind" is presented, which allows for audio retrieval from a portion of the NGSW corpus. Finally, a number of research challenges such as language modeling and lexicon for changing time periods, speaker trait and identification tracking, as well as new directions, are discussed.
Fake news has itself become a prominent news topic in recent years. This ASIST President's Invited Panel will focus on the need for and roles filled by information professionals in preparing the public to become more critical consumers of information products and services, as well as discuss research around the development of tools and algorithmic solutions that filter, detect and flag fake stories.
Purpose - This editorial seeks to examine the definition of a "digital library" to see whether one can be constructed that usefully distinguishes a digital library from other types of electronic resources. Design/methodology/approach - The primary methodology compares definitions from multiple settings, including formal institutional settings, working definitions from articles, and a synthesis created in a seminar at Humboldt University in Berlin. Findings - At this point, digital libraries are evolving too fast for any lasting definition. Definitions that users readily understand are too broad and imprecise, and definitions with more technical precision quickly grow too obscure for common use. Originality/value - A functional definition of a digital library would add clarity to a burgeoning field, especially when trying to evaluate a resource. The student perspective provides a fresh look at the problem.
Purpose - To introduce the special theme issue on "Content management systems". Design/methodology/approach - Each of the articles in the theme issue is described in brief. Findings - The articles cover a range of topics from implementation to interoperability, object-oriented database management systems, and research about meeting user needs. Originality/value - Libraries have only just begun to realize that their web presence is potentially as rich and complex as their online catalogs, and that it needs an equal amount of management to keep it under control.