Given the size of modern cities in an urbanising age, it is beyond the perceptual capacity of most people to develop a good knowledge of the beauty and ugliness of the city at every street corner. Correspondingly, it is also difficult for planners to accurately answer questions such as 'where are the worst-looking places in the city that regeneration should prioritise?' or 'how is the appearance of fast-urbanising cities changing?'. To address this issue, we present a computer vision method for the large-scale, automatic evaluation of the urban visual environment, leveraging state-of-the-art machine learning techniques and wide-coverage street view images. Of the various factors at work, we choose two key features, the visual quality of street façades and the continuity of street walls, as the starting point of this line of analysis. To test the validity of this method, we compare the machine ratings with ratings collected on site from 752 passers-by at fifty-six locations. We show that the machine learning model produces a good estimate of people's real visual experience and holds much potential for various tasks, such as urban design evaluation and culture identification.
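The abstract does not say which statistic was used to compare the machine ratings with the on-site ratings. As a minimal sketch, assuming the passers-by ratings are first averaged per location and then compared with the machine ratings using a Pearson correlation (our illustrative choice, not necessarily the authors'), the comparison could look like this. The function names are hypothetical, and the location names and values in the example are made up for illustration only.

```python
import math
from typing import Dict, List, Sequence


def mean_rating_per_location(on_site: Dict[str, List[float]]) -> Dict[str, float]:
    """Average the ratings collected from passers-by at each location."""
    return {loc: sum(r) / len(r) for loc, r in on_site.items()}


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length rating lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("ratings with zero variance have no defined correlation")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Hypothetical values for three locations, for illustration only.
    machine = {"loc_a": 3.1, "loc_b": 4.2, "loc_c": 2.0}
    on_site = {"loc_a": [3, 4], "loc_b": [4, 5], "loc_c": [2, 2]}
    human = mean_rating_per_location(on_site)
    locations = sorted(machine)
    r = pearson_r([machine[loc] for loc in locations], [human[loc] for loc in locations])
    print(f"Pearson r = {r:.3f}")
```

Averaging per location first keeps each location weighted equally, regardless of how many passers-by were surveyed there; a rank correlation would be an equally reasonable alternative if only the ordering of locations matters.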
Recent progress in acoustic modeling with deep neural networks has significantly improved the performance of automatic speech recognition systems. However, how to rapidly adapt these networks with limited, unsupervised data remains an open problem. Most existing methods for adapting a neural network involve modifying a large number of parameters, so rapid adaptation is not possible with these schemes. In this paper, we propose the multi-basis adaptive neural network, a new neural network configuration that requires very few parameters for adaptation. By modifying the topology of a single multi-layer perceptron, we introduce a set of sub-networks with restricted connectivity that collaboratively capture different acoustic properties. The outputs of these sub-networks are combined using speaker-dependent interpolation weights. In addition, the complete system can be optimized in an adaptive training fashion when non-homogeneous training data are used. The performance of unsupervised adaptation is evaluated on two datasets: on both the Broadcast News English and the AURORA-4 tasks, the proposed network outperforms the speaker-independent hybrid DNN-HMM baseline.
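Combining the sub-network outputs with speaker-dependent interpolation weights is what keeps adaptation cheap: only one weight per sub-network changes for each speaker. A minimal sketch of that idea follows, assuming each sub-network is an arbitrary frozen function of the input frame and that adaptation minimises a squared-error loss against (possibly hypothesised) targets. The function names, the loss and the learning rate are illustrative assumptions, not the paper's training recipe.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Basis = Callable[[Vector], Vector]  # one sub-network: input frame -> output vector


def _mix(outputs: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Weighted sum of precomputed basis outputs."""
    dim = len(outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(dim)]


def combine_bases(x: Vector, bases: Sequence[Basis], weights: Sequence[float]) -> Vector:
    """Interpolate the sub-network (basis) outputs with speaker-dependent weights."""
    if len(bases) != len(weights):
        raise ValueError("one interpolation weight is needed per basis")
    return _mix([f(x) for f in bases], weights)


def adapt_weights_step(
    x: Vector,
    target: Vector,
    bases: Sequence[Basis],
    weights: Sequence[float],
    lr: float = 0.1,
) -> List[float]:
    """One gradient-descent step on squared error that updates ONLY the
    interpolation weights; the sub-networks themselves stay frozen."""
    outputs = [f(x) for f in bases]
    err = [y - t for y, t in zip(_mix(outputs, weights), target)]
    # d/dw_k of sum_i err_i^2 is 2 * sum_i err_i * out_k[i]
    grads = [2.0 * sum(e * o for e, o in zip(err, out)) for out in outputs]
    return [w - lr * g for w, g in zip(weights, grads)]
```

With K sub-networks, each speaker needs only K adapted numbers, which is why this kind of scheme can be estimated from a small amount of unsupervised data.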
This paper proposes Emformer, an efficient memory transformer for low-latency streaming speech recognition. In Emformer, the long-range history context is distilled into an augmented memory bank to reduce the computational complexity of self-attention. A cache mechanism saves the computation of the keys and values in self-attention for the left context. Emformer applies parallelized block processing in training to support low-latency models. We carry out experiments on the LibriSpeech benchmark. Under an average latency of 960 ms, Emformer achieves a WER of 2.50% on test-clean and 5.62% on test-other. Compared with a strong baseline, the augmented memory transformer (AM-TRF), Emformer obtains a 4.6-fold training speedup and an 18% relative real-time factor (RTF) reduction in decoding, with relative WER reductions of 17% on test-clean and 9% on test-other. In a low-latency scenario with an average latency of 80 ms, Emformer achieves a WER of 3.01% on test-clean and 7.09% on test-other. Compared with the LSTM baseline with the same latency and model size, Emformer obtains relative WER reductions of 9% and 16% on test-clean and test-other, respectively.
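To make the block-processing and caching ideas concrete, here is a simplified sketch, assuming fixed-size center blocks, a left context whose key/value projections are reused from a cache, and a look-ahead right context that is recomputed for every block. It ignores the augmented memory bank and the attention computation itself and only counts key/value projections. `make_blocks`, `kv_projections` and the context sizes are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Span = Tuple[int, int]  # half-open frame range [start, end)


@dataclass
class Block:
    left: Span    # left-context frames; their keys/values can come from a cache
    center: Span  # frames this block produces outputs for
    right: Span   # look-ahead (right-context) frames


def make_blocks(num_frames: int, chunk: int, left_ctx: int, right_ctx: int) -> List[Block]:
    """Split an utterance into fixed-size blocks with left and right context."""
    if chunk <= 0 or left_ctx < 0 or right_ctx < 0:
        raise ValueError("chunk must be positive and context sizes non-negative")
    blocks = []
    for start in range(0, num_frames, chunk):
        end = min(start + chunk, num_frames)
        blocks.append(Block(
            left=(max(0, start - left_ctx), start),
            center=(start, end),
            right=(end, min(end + right_ctx, num_frames)),
        ))
    return blocks


def _size(span: Span) -> int:
    return span[1] - span[0]


def kv_projections(blocks: List[Block], use_cache: bool) -> int:
    """Count key/value projections computed across all blocks.

    With the cache, left-context keys/values are reused from earlier blocks,
    so each block only projects its center and right-context frames.
    """
    total = 0
    for b in blocks:
        total += _size(b.center) + _size(b.right)
        if not use_cache:
            total += _size(b.left)
    return total
```

For example, 10 frames split into blocks of 4 with 2 frames of left context and 1 frame of right context need 12 projections with the cache versus 16 without; in this simplified count, the saving grows with the left-context size.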
Background: No data are available on in-flight transmission of SARS-CoV-2. Here, we report an in-flight transmission cluster of COVID-19 and describe the clinical characteristics of these patients. Methods: After a flight, laboratory-confirmed COVID-19 was reported in 12 patients. Ten patients were admitted to the designated hospital. Data were collected from 25th January to 28th February 2020. Clinical information was collected retrospectively. Results: All patients were passengers; none were flight attendants. The median age was 33 years, and 70% were female. No patient was admitted to an intensive care unit, and no patients had died by 28th February. The median incubation period was 3.0 days, and the time from illness onset to hospital admission was 2 days. The most common symptom was fever. Two patients were asymptomatic and had normal chest CT scans throughout their hospital stay. On admission, the initial RT-PCR was positive in 9 patients, and the initial chest CT was positive in half of the patients. The median lung 'total severity score' on chest CT was 6. 'Crazy-paving' patterns, pleural effusions, and ground-glass nodules were seen. Conclusion: COVID-19 can potentially be transmitted on aeroplanes, but the symptoms in our patients were mild. Passengers and attendants must be protected during flights.
Transformer-based acoustic modeling has achieved great success for both hybrid and sequence-to-sequence speech recognition. However, it requires access to the full sequence, and its computational cost grows quadratically with the input sequence length. These factors limit its adoption for streaming applications. In this work, we propose a novel augmented memory self-attention, which attends to a short segment of the input sequence and a bank of memories. The memory bank stores the embedding information for all processed segments. On the LibriSpeech benchmark, our proposed method outperforms all existing streamable transformer methods by a large margin and achieves over 15% relative error reduction compared with the widely used LC-BLSTM baseline. Our findings are also confirmed on several large internal datasets.
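The following sketch illustrates the augmented memory idea under strong simplifying assumptions: a single attention head with identity query, key and value projections, and each segment summarised into the memory bank by mean pooling of its input frames. The original method learns these projections and its summary mechanism may differ; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vec, keys: Sequence[Vec], values: Sequence[Vec]) -> Vec:
    """Scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [_dot(query, k) * scale for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


def augmented_memory_attention(
    frames: List[Vec], segment_len: int
) -> Tuple[List[Vec], List[Vec]]:
    """Process frames segment by segment; each frame attends to the memory
    bank (one vector per earlier segment) plus the frames of its own segment."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    memory_bank: List[Vec] = []
    outputs: List[Vec] = []
    for start in range(0, len(frames), segment_len):
        segment = frames[start:start + segment_len]
        context = memory_bank + segment  # earlier-segment memories, then this segment
        outputs.extend(attend(q, context, context) for q in segment)
        # Assumed summary: mean-pool the segment into one memory vector.
        dim = len(segment[0])
        memory_bank.append([sum(f[i] for f in segment) / len(segment) for i in range(dim)])
    return outputs, memory_bank
```

In this sketch each frame attends to one memory vector per earlier segment plus the frames of its own segment, rather than to every frame seen so far, which is the source of the saving over full-sequence self-attention.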