Very deep CNNs achieve state-of-the-art results in both computer vision and speech recognition, but are difficult to train. The most popular way to train very deep CNNs is to use shortcut connections (SC) together with batch normalization (BN). Inspired by Self-Normalizing Neural Networks [1], we propose the self-normalizing deep CNN (SNDCNN) based acoustic model topology, obtained by removing the SC/BN and replacing the typical ReLU activations with scaled exponential linear units (SELU) in ResNet-50. SELU activations make the network self-normalizing and remove the need for both shortcut connections and batch normalization. Compared to ResNet-50, we can achieve the same or lower word error rate (WER) while at the same time improving both training and inference speed by 60%-80%. We also explore other model inference optimizations to further reduce latency for production use.
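As a rough illustration of the self-normalizing property the abstract relies on, the sketch below implements SELU with the constants published in the Self-Normalizing Neural Networks paper [1] and pushes a random vector through a stack of plain fully connected layers. This is a toy setup, not the SNDCNN topology: the `propagate` helper, its `depth`, `width`, and `seed` parameters, and the zero-mean, variance 1/width weight initialization are assumptions chosen for the example.

```python
import math
import random

# SELU constants as published in the Self-Normalizing Neural Networks paper [1].
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805


def selu(x):
    """Scaled exponential linear unit for a single float."""
    if x > 0.0:
        return SCALE * x
    # expm1(x) computes exp(x) - 1 accurately for small |x|.
    return SCALE * ALPHA * math.expm1(x)


def propagate(depth=8, width=256, seed=0):
    """Hypothetical helper: run a standard-normal input through `depth`
    dense layers with SELU and no shortcut connections or batch norm.

    Weights are drawn with mean 0 and variance 1/width (an assumption of
    this sketch). Returns the mean and variance of the final activations.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    std = 1.0 / math.sqrt(width)
    for _ in range(depth):
        # Each output unit gets its own freshly sampled weight row.
        x = [selu(sum(rng.gauss(0.0, std) * xj for xj in x))
             for _ in range(width)]
    mean = sum(x) / width
    var = sum((v - mean) ** 2 for v in x) / width
    return mean, var


if __name__ == "__main__":
    mean, var = propagate()
    print(f"mean={mean:.3f} var={var:.3f}")
```

In this toy setting the activation statistics should stay near zero mean and unit variance across layers without any explicit normalization, which is the behavior that motivates dropping SC and BN. How well this carries over to a convolutional acoustic model is what the paper itself evaluates.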
A special section on Enterprise Resource Planning (ERP) covers experiences planning and implementing large-scale projects across diverse global organizations. Articles appearing in the section address ERP issues such as enterprise application package componentization, business process models, reengineering, customization, and system migrations. April feature article topics include the ethics of safety-critical systems, e-cataloging systems, parallel computing, intrusion detection systems, and multisensor data fusion.
Rough 'n' Ready: A Meeting Recorder and Browser. John Makhoul and Francis Kubala, BBN Technologies, 10 Moulton Street, Cambridge, Massachusetts 02138. Final report, September 2002; dates covered: Jun 97 - Jun 00. Sponsoring/monitoring agency: Defense...
The objective of this effort is to integrate and enhance existing technologies in speech recognition, speaker identification, and topic classification to provide cost-effective transcription, structural summarization, and retrieval of user-specified aspects of meetings. A software system consisting of a meeting recorder and browser was designed and developed to provide a higher-level view of collaborative meetings, co-located or distributed, and a way to browse through and listen to those parts which are most relevant to the user. At the outset of this contract, we set four major goals, among them: integration of several diverse speech and language processing components to extract a rich-content representation of audio data; and transfer of Rough'n'Ready technology to real-world applications for the U.S. Government. Over the original 36-month period of this contract, we made substantial and demonstrable progress against each of these four objectives.
Component Integration. Eight advanced speech and language technologies were successfully taken from the research environment and developed into runtime components that work together to produce a rich-content representation of audio. By the end of the contract, we had successfully integrated the following components into a unified system:
• Speaker Change Detection
Integration of such diverse speech and language technologies is a novel approach for extracting information from speech and audio sources. Rough'n'Ready has clearly demonstrated that there is considerable benefit in using all available acoustic and linguistic extraction technology at once. Individually, each technology is imperfect, but together, they offer a rich and redundant representation of speech that is useful for many important applications.
Browser Presentation. The rich-content meta-data extracted by Rough'n'Ready is stored in a relational database and is made available to users using a common browser over the In...