The first part of this paper surveys and classifies the best-performing currently available lossless compression algorithms for stereo-CD-quality digital audio signals sampled at 44.1 kHz and quantized to 16 bits. This study finds that these algorithms appear to have reached a limit in compression that is very modest compared to what can be achieved with lossy audio coding technology. With this limit as a target, we designed a computationally efficient algorithm for lossless audio compression (which we call AudioPaK). This new lossless compression algorithm uses only a small number of integer arithmetic operations and performs as well as, or better than, most state-of-the-art lossless compression algorithms. The main operations of the algorithm are prediction with integer coefficients and Golomb-Rice coding. The second part of the paper presents the complete architecture of AudioPaK, including suggestions on parallelizing parts of the algorithm using the MMX instruction set.
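The two core operations named in the abstract (prediction with integer coefficients and Golomb-Rice coding) can be pictured with a short sketch. This is not the AudioPaK implementation itself: the first-order fixed predictor, the zigzag mapping of signed residuals, and the bit-string output are illustrative assumptions about how such a coder is commonly assembled.

```python
def zigzag(n):
    # Map signed prediction residuals to non-negative integers:
    # 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
    return (n << 1) if n >= 0 else (-n << 1) - 1

def rice_encode(n, k):
    # Golomb-Rice code with parameter k (divisor 2**k):
    # unary-coded quotient, then a k-bit binary remainder.
    q, r = n >> k, n & ((1 << k) - 1)
    bits = "1" * q + "0"             # unary part, "0" terminates it
    if k:
        bits += format(r, f"0{k}b")  # fixed-width remainder
    return bits

def encode_frame(samples, k=2):
    # First-order fixed integer predictor: residual e[n] = x[n] - x[n-1],
    # then Rice-code each (zigzag-mapped) residual.
    prev, out = 0, []
    for x in samples:
        out.append(rice_encode(zigzag(x - prev), k))
        prev = x
    return "".join(out)
```

Small residuals, which dominate for well-predicted audio, yield short codewords; the parameter `k` would in practice be chosen per frame from the residual statistics.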
This is the accepted version of the paper. This version of the publication may differ from the final published version.

Keywords: Photo hull, image-based rendering, 3D photography, new view synthesis, voxel coloring, space carving, color consistency, view-dependent scene reconstruction.
Multilingual ASR technology simplifies model training and deployment, but its accuracy is known to depend on the availability of language information at runtime. Since language identity is seldom known beforehand in real-world scenarios, it must be inferred on-the-fly with minimum latency. Furthermore, in voice-activated smart assistant systems, language identity is also required for downstream processing of ASR output. In this paper, we introduce streaming, end-to-end, bilingual systems that perform both ASR and language identification (LID) using the recurrent neural network transducer (RNN-T) architecture. On the input side, embeddings from pretrained acoustic-only LID classifiers are used to guide RNN-T training and inference, while on the output side, language targets are jointly modeled with ASR targets. The proposed method is applied to two language pairs: English-Spanish as spoken in the United States, and English-Hindi as spoken in India. Experiments show that for English-Spanish, the bilingual joint ASR-LID architecture matches monolingual ASR and acoustic-only LID accuracies. For the more challenging (owing to within-utterance code switching) case of English-Hindi, English ASR and LID metrics show degradation. Overall, in scenarios where users switch dynamically between languages, the proposed architecture offers a promising simplification over running multiple monolingual ASR models and an LID classifier in parallel.
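On the output side, "language targets jointly modeled with ASR targets" can be pictured as augmenting each training label sequence with language-ID tokens. The tagging schemes below (a single utterance-level tag, or a tag per token to accommodate within-utterance code switching) are a hypothetical illustration, not the paper's exact recipe.

```python
def add_language_targets(asr_tokens, lang_id, per_token=False):
    """Augment an ASR target sequence with language-ID targets.

    Illustrative schemes: one utterance-level tag (per_token=False),
    or a tag before every token for code-switched speech.
    """
    tag = f"<{lang_id}>"
    if not per_token:
        return [tag] + asr_tokens
    out = []
    for tok in asr_tokens:
        out.extend([tag, tok])  # interleave tag with each ASR token
    return out
```

Either way, the RNN-T decoder emits the language tags alongside the transcript, so LID falls out of the same beam search that produces the ASR hypothesis.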
We present a multi-resolution space carving algorithm that reconstructs a 3D model of a visual scene photographed by a calibrated digital camera placed at multiple viewpoints. Our approach employs a level set framework for reconstructing the scene. Unlike most standard space carving approaches, our level set approach produces a smooth reconstruction composed of manifold surfaces. Our method outputs a polygonal model, instead of a collection of voxels. We texture-map the reconstructed geometry using the photographs, and then render the model to produce photo-realistic new views of the scene.
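The carving decision in space carving methods rests on a per-voxel color-consistency test: project the voxel into every photo that sees it and check that the sampled colors agree, carving the voxel away when they do not. The sketch below is a minimal illustration of that test; the per-channel standard-deviation measure and the threshold value are assumptions, not the paper's specific criterion.

```python
import statistics

def photo_consistent(samples, threshold=10.0):
    """Voxel color-consistency test.

    samples: list of (r, g, b) tuples, one per camera from which the
    voxel is visible. The voxel is kept if the observed colors agree,
    i.e. the per-channel spread stays under the threshold (assumed).
    """
    if len(samples) < 2:
        return True  # too few observations to refute consistency
    # Population standard deviation per channel; consistent if all small.
    return all(statistics.pstdev(ch) <= threshold for ch in zip(*samples))
```

A voxel on a true surface projects to nearly the same color in every image, so it passes; a free-space voxel projects onto unrelated surface points, its samples disagree, and it is carved.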