Abstract. Automatically identifying the person you are talking with using continuous audio sensing has the potential to enable many pervasive computing applications from memory assistance to annotating life logging data. However, a number of challenges, including energy efficiency and training data acquisition, must be addressed before unobtrusive audio sensing is practical on mobile devices. We built SpeakerSense, a speaker identification prototype that uses a heterogeneous multi-processor hardware architecture that splits computation between a low power processor and the phone's application processor to enable continuous background sensing with minimal power requirements. Using SpeakerSense, we benchmarked several system parameters (sampling rate, GMM complexity, smoothing window size, and amount of training data needed) to identify thresholds that balance computation cost with performance. We also investigated channel compensation methods that make it feasible to acquire training data from phone calls and an automatic segmentation method for training speaker models based on one-to-one conversations.
No abstract
A major hurdle to frequently performing mobile computer vision tasks is the high power consumption of image sensing. In this work, we report the first publicly known experimental and analytical characterization of CMOS image sensors. We find that modern image sensors are not energy-proportional: energy per pixel is in fact inversely proportional to frame rate and resolution of image capture, and thus image sensor systems fail to provide an important principle of energy-aware system design: trading quality for energy efficiency.We reveal two energy-proportional mechanisms, supported by current image sensors but unused by mobile systems: (i) using an optimal clock frequency reduces the power up to 50% or 30% for low-quality single frame (photo) and sequential frame (video) capturing, respectively; (ii) by entering low-power standby mode between frames, an image sensor achieves almost constant energy per pixel for video capture at low frame rates, resulting in an additional 40% power reduction. We also propose architectural modifications to the image sensor that would further improve operational efficiency. Finally, we use computer vision benchmarks to show the performance and efficiency tradeoffs that can be achieved with existing image sensors. For image registration, a key primitive for image mosaicking and depth estimation, we can achieve a 96% success rate at 3 FPS and 0.1 MP resolution. At these quality metrics, an optimal clock frequency reduces image sensor power consumption by 36% and aggressive standby mode reduces power consumption by 95%.
Can we automatically design a Convolutional Network (Con-vNet) with the highest image classification accuracy under the latency constraint of a mobile device? Neural architecture search (NAS) has revolutionized the design of hardware-efficient ConvNets by automating this process. However, the NAS problem remains challenging due to the combinatorially large design space, causing a significant searching time (at least 200 GPU-hours). To alleviate this complexity, we propose Single-Path NAS, a novel differentiable NAS method for designing hardware-efficient ConvNets in less than 4 hours. Our contributions are as follows: 1. Single-path search space: Compared to previous differentiable NAS methods, Single-Path NAS uses one singlepath over-parameterized ConvNet to encode all architectural decisions with shared convolutional kernel parameters, hence drastically decreasing the number of trainable parameters and the search cost down to few epochs. 2. Hardware-efficient ImageNet classification: Single-Path NAS achieves 74.96% top-1 accuracy on ImageNet with 79ms latency on a Pixel 1 phone, which is state-of-the-art accuracy compared to NAS methods with similar inference latency constraints (≤ 80ms). 3. NAS efficiency: Single-Path NAS search cost is only 8 epochs (30 TPUhours), which is up to 5,000× faster compared to prior work. 4. Reproducibility: Unlike all recent mobile-efficient NAS methods which only release pretrained models, we open-source our entire codebase at: https://github.com/dstamoulis/single-path-nas.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.