Standard deep neural network-based acoustic models for automatic speech recognition (ASR) rely on hand-engineered input features, typically log-mel filterbank magnitudes. In this paper, we describe a convolutional neural network -deep neural network (CNN-DNN) acoustic model which takes raw multichannel waveforms as input, i.e. without any preceding feature extraction, and learns a similar feature representation through supervised training.By operating directly in the time domain, the network is able to take advantage of the signal's fine time structure that is discarded when computing filterbank magnitude features. This structure is especially useful when analyzing multichannel inputs, where timing differences between input channels can be used to localize a signal in space. The first convolutional layer of the proposed model naturally learns a filterbank that is selective in both frequency and direction of arrival, i.e. a bank of bandpass beamformers with an auditory-like frequency scale. When trained on data corrupted with noise coming from different spatial locations, the network learns to filter them out by steering nulls in the directions corresponding to the noise sources. Experiments on a simulated multichannel dataset show that the proposed acoustic model outperforms a DNN that uses log-mel filterbank magnitude features under noisy and reverberant conditions.
Unsupervised word translation from nonparallel inter-lingual corpora has attracted much research interest. Very recently, neural network methods trained with adversarial loss functions achieved high accuracy on this task. Despite the impressive success of the recent techniques, they suffer from the typical drawbacks of generative adversarial models: sensitivity to hyper-parameters, long training time and lack of interpretability. In this paper, we make the observation that two sufficiently similar distributions can be aligned correctly with iterative matching methods. We present a novel method that first aligns the second moment of the word distributions of the two languages and then iteratively refines the alignment. Extensive experiments on word translation of European and Non-European languages show that our method achieves better performance than recent state-of-the-art deep adversarial approaches and is competitive with the supervised baseline. It is also efficient, easy to parallelize on CPU and interpretable.
Anomaly detection methods require high-quality features. One way of obtaining strong features is to adapt pre-trained features to anomaly detection on the target distribution. Unfortunately, simple adaptation methods often result in catastrophic collapse (feature deterioration) and reduce performance. DeepSVDD combats collapse by removing biases from architectures, but this limits the adaptation performance gain. In this work, we propose two methods for combating collapse: i) a variant of early stopping that dynamically learns the stopping iteration ii) elastic regularization inspired by continual learning. In addition, we conduct a thorough investigation of Imagenet-pretrained features for one-class anomaly detection. Our method, PANDA, outperforms the state-of-the-art in the one-class and outlier exposure settings (CIFAR10: 96.2% vs. 90.1% and 98.9% vs. 95.6%).
Deep anomaly detection methods learn representations that separate between normal and anomalous samples. Very effective representations are obtained when powerful externally trained feature extractors (e.g. ResNets pre-trained on Ima-geNet) are fine-tuned on the training data which consists of normal samples and no anomalies. However, this is a difficult task that can suffer from catastrophic collapse, i.e. it is prone to learning trivial and non-specific features. In this paper, we propose a new loss function which can overcome failure modes of both center-loss and contrastive-loss methods. Furthermore, we combine it with a confidence-invariant angular center loss, which replaces the Euclidean distance used in previous work, that was sensitive to prediction confidence. Our improvements yield a new anomaly detection approach, based on Mean-Shifted Contrastive Loss, which is both more accurate and less sensitive to catastrophic collapse than previous methods. Our method achieves state-of-the-art anomaly detection performance on multiple benchmarks including 97.5% ROC-AUC on the CIFAR-10 dataset 1 . IntroductionAnomaly detection is a fundamental task for intelligent agents that aims to detect if an observed pattern is normal or anomalous (unusual or unlikely). Anomaly detection has broad applications in scientific and industrial tasks such as detecting new physical phenomena (black holes, supernovae) or genetic mutations, as well as production line inspection and video surveillance. Due to the significance of the task, many efforts have been focused on automatic anomaly detection, particularly on statistical and machine learning methods. A common paradigm used by many anomaly detection methods is measuring the probability of samples and assigning high-probability samples as normal and low-probability samples as anomalous. The quality of the density estimators is closely related to the quality of features used to represent the data. Classical methods used statistical estimators such as K nearest-neighbors (kNN) or Gaussian mixture models (GMMs) on raw features, however this often results in sub-optimal results on high-dimensional data such as images. Many recent methods, learn features in a self-supervised way and use them in order to detect anomalies. Their main weakness is that anomaly detection datasets are typically small and do not include anomalous samples resulting in weak features. An alternative direction, which achieved better results, is to transfer features learned from auxiliary tasks on large-scale external datasets such as ImageNet classification. It was found that fine-tuning the pre-trained features on the normal training data can result in significant performance improvements, however it is quite challenging. The main issue with fine-tuning on one-class classification (OCC) tasks such as anomaly detection is catastrophic collapse i.e. after an initial improvement in efficacy, the features degrade and become uninformative. This phenomenon is caused by trivial solutions allowed by OCC tasks such as t...
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.