Given an arbitrary face image and an arbitrary speech clip, the proposed work attempts to generate a talking face video with accurate lip synchronization. Existing works either do not consider temporal dependency across video frames, yielding abrupt facial and lip movement, or are limited to generating talking face video for a specific person and thus lack generalization capacity. We propose a novel conditional recurrent generation network that incorporates both image and audio features in the recurrent unit to model temporal dependency. To achieve both image- and video-realism, a pair of spatial-temporal discriminators is included in the network for better image and video quality. Since accurate lip synchronization is essential to the success of talking face video generation, we also construct a lip-reading discriminator to boost the accuracy of lip synchronization. We further extend the network to model the natural pose and expression of the talking face on the Obama Dataset. Extensive experimental results demonstrate the superiority of our framework over the state of the art in terms of visual quality, lip-sync accuracy, and smooth transitions in both lip and facial movement.
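The following is a minimal sketch, not the authors' released code, of the core idea in the abstract: a conditional recurrent generator that fuses a static identity-image embedding with per-frame audio features inside a GRU and decodes each hidden state into a video frame. All layer shapes, frame sizes, and module names here are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation) of a conditional
# recurrent talking-face generator: image + audio features enter the recurrent unit.
import torch
import torch.nn as nn

class RecurrentTalkingFaceGenerator(nn.Module):
    def __init__(self, img_feat_dim=256, audio_feat_dim=128, hidden_dim=256):
        super().__init__()
        # Identity encoder: maps the input face image to a fixed embedding.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, img_feat_dim),
        )
        # Recurrent unit consumes concatenated image + audio features at every step.
        self.gru = nn.GRU(img_feat_dim + audio_feat_dim, hidden_dim, batch_first=True)
        # Frame decoder: upsamples each hidden state to a toy 64x64 RGB frame.
        self.frame_decoder = nn.Sequential(
            nn.Linear(hidden_dim, 64 * 8 * 8), nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, face_image, audio_features):
        # face_image: (B, 3, H, W); audio_features: (B, T, audio_feat_dim)
        B, T, _ = audio_features.shape
        img_feat = self.image_encoder(face_image)               # (B, img_feat_dim)
        img_seq = img_feat.unsqueeze(1).expand(B, T, -1)        # repeat per frame
        h, _ = self.gru(torch.cat([img_seq, audio_features], dim=-1))
        frames = self.frame_decoder(h.reshape(B * T, -1))       # (B*T, 3, 64, 64)
        return frames.view(B, T, 3, 64, 64)

if __name__ == "__main__":
    gen = RecurrentTalkingFaceGenerator()
    video = gen(torch.randn(2, 3, 64, 64), torch.randn(2, 25, 128))
    print(video.shape)  # torch.Size([2, 25, 3, 64, 64])
```

In the full framework this generator would be trained adversarially against the spatial-temporal and lip-reading discriminators described above; those losses are omitted here.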
Large volumes of data from material characterization call for rapid and automatic data analysis to accelerate materials discovery. Herein, we report a convolutional neural network (CNN) trained on theoretical data and very limited experimental data for fast identification of experimental X-ray diffraction (XRD) spectra of metal-organic frameworks (MOFs). To augment the data for training the model, noise was extracted from experimental spectra and shuffled, then merged with the main peaks extracted from theoretical spectra to synthesize new spectra. For the first time, one-to-one material identification was achieved. The optimized model showed the highest identification accuracy of 96.7% for the top-5 ranking among a dataset of 1012 MOFs. Neighborhood components analysis (NCA) on the experimental XRD spectra shows that spectra from the same material cluster into groups in the NCA map. Analysis of the class activation maps of the last CNN layer further discloses the mechanism by which the CNN model successfully identifies individual MOFs from their XRD spectra. This CNN model, trained with the data-augmentation technique, not only opens numerous potential applications for identifying XRD spectra of different materials, but also paves the way toward autonomous data analysis with other characterization tools such as FTIR, Raman, and NMR.
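Below is a minimal sketch, under assumptions, of the data-augmentation idea the abstract describes: keep the main peaks of a theoretical XRD pattern, extract the low-intensity background from an experimental pattern, shuffle it, and merge the two into a new synthetic spectrum. The threshold value and function names are hypothetical, not taken from the paper.

```python
# Minimal sketch (not the authors' pipeline): synthesize XRD training spectra by
# merging theoretical main peaks with shuffled experimental background noise.
import numpy as np

def extract_noise(experimental_spectrum, peak_threshold=0.05):
    """Treat everything below a relative-intensity threshold as background noise."""
    spec = experimental_spectrum / experimental_spectrum.max()
    return np.where(spec < peak_threshold, spec, 0.0)

def synthesize_spectrum(theoretical_spectrum, experimental_spectrum, rng,
                        peak_threshold=0.05):
    """Merge theoretical main peaks with shuffled experimental noise."""
    theo = theoretical_spectrum / theoretical_spectrum.max()
    main_peaks = np.where(theo >= peak_threshold, theo, 0.0)   # keep only main peaks
    noise = extract_noise(experimental_spectrum, peak_threshold)
    rng.shuffle(noise)                                          # shuffle the noise floor
    synthetic = main_peaks + noise
    return synthetic / synthetic.max()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theoretical = np.exp(-0.5 * ((np.arange(900) - 300) / 3.0) ** 2)  # one toy peak
    experimental = theoretical + 0.02 * rng.random(900)               # peak + noise
    augmented = [synthesize_spectrum(theoretical, experimental, rng) for _ in range(5)]
    print(len(augmented), augmented[0].shape)
```

Each call produces a different shuffled noise floor, so a small set of experimental spectra can be expanded into many labeled training examples for the CNN classifier.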
Machine lipreading is a special type of automatic speech recognition (ASR) that transcribes human speech by visually interpreting the movement of related face regions, including the lips, face, and tongue. Recently, deep neural network based lipreading methods have shown great potential and have exceeded the accuracy of experienced human lipreaders on some benchmark datasets. However, lipreading is still far from being solved, and existing methods tend to have high error rates on in-the-wild data. In this paper, we propose LCANet, an end-to-end deep neural network based lipreading system. LCANet encodes input video frames using a stacked 3D convolutional neural network (CNN), a highway network, and a bidirectional GRU network. The encoder effectively captures both short-term and long-term spatio-temporal information. More importantly, LCANet incorporates a cascaded attention-CTC decoder to generate output text. By cascading CTC with attention, it partially eliminates the defect of CTC's conditional independence assumption within the hidden neural layers, yielding notable performance improvement as well as faster convergence. Experimental results show that the proposed system achieves a 1.3% CER and 3.0% WER on the GRID corpus, a 12.3% improvement over state-of-the-art methods.
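The sketch below illustrates, under assumptions, the encoder side of such a system: a stacked 3D CNN front-end for short-term spatio-temporal lip motion followed by a bidirectional GRU and a CTC output head. The highway layers and the cascaded attention decoder from LCANet are omitted for brevity, and all layer sizes are illustrative rather than the published configuration.

```python
# Minimal sketch (illustrative, not the LCANet release): 3D-CNN + BiGRU lipreading
# encoder with a per-frame CTC character head.
import torch
import torch.nn as nn

class LipreadingEncoderCTC(nn.Module):
    def __init__(self, num_chars=28, hidden_dim=256):
        super().__init__()
        # 3D convolutions capture short-term spatio-temporal lip motion.
        self.cnn3d = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)), nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=(3, 5, 5), padding=(1, 2, 2)), nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
        )
        # Bidirectional GRU models long-term temporal dependencies.
        self.gru = nn.GRU(64, hidden_dim, batch_first=True, bidirectional=True)
        self.ctc_head = nn.Linear(2 * hidden_dim, num_chars)  # characters + CTC blank

    def forward(self, video):
        # video: (B, 3, T, H, W) mouth-region crops
        feat = self.cnn3d(video)                      # (B, C, T, H', W')
        feat = feat.mean(dim=(3, 4)).transpose(1, 2)  # pool space -> (B, T, C)
        h, _ = self.gru(feat)                         # (B, T, 2*hidden)
        return self.ctc_head(h).log_softmax(-1)       # per-frame character log-probs

if __name__ == "__main__":
    model = LipreadingEncoderCTC()
    logp = model(torch.randn(2, 3, 75, 64, 128))      # 75 frames, 64x128 crops
    print(logp.shape)                                  # torch.Size([2, 75, 28])
```

These log-probabilities would be trained with `nn.CTCLoss`; the paper's contribution is to cascade an attention decoder on top of this CTC path, which is not reproduced here.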
The analysis of nuclear magnetic resonance (NMR) spectra for the comprehensive and unambiguous identification and characterization of peaks is a difficult, but critically important step in all NMR analyses of complex biological molecular systems. Here, we introduce DEEP Picker, a deep neural network (DNN)-based approach for peak picking and spectral deconvolution which semi-automates the analysis of two-dimensional NMR spectra. DEEP Picker includes 8 hidden convolutional layers and was trained on a large number of synthetic spectra of known composition with variable degrees of crowdedness. We show that our method is able to correctly identify overlapping peaks, including ones that are challenging for expert spectroscopists and existing computational methods alike. We demonstrate the utility of DEEP Picker on NMR spectra of folded and intrinsically disordered proteins as well as a complex metabolomics mixture, and show how it provides access to valuable NMR information. DEEP Picker should facilitate the semi-automation and standardization of protocols for better consistency and sharing of results within the scientific community.
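As a rough illustration of the general approach, and not the published DEEP Picker architecture, the sketch below shows a small convolutional network with 8 hidden layers that scores every point of a spectral trace as peak center versus background, the kind of model one could train on synthetic spectra of known composition. Layer widths, kernel sizes, and the toy training example are all assumptions.

```python
# Minimal sketch (an assumption about the general idea, not DEEP Picker itself):
# a 1D CNN that assigns a per-point peak score to a spectral trace.
import torch
import torch.nn as nn

class ConvPeakPicker(nn.Module):
    def __init__(self, channels=32, num_hidden_layers=8):
        super().__init__()
        layers = [nn.Conv1d(1, channels, kernel_size=7, padding=3), nn.ReLU()]
        for _ in range(num_hidden_layers - 1):
            layers += [nn.Conv1d(channels, channels, kernel_size=7, padding=3), nn.ReLU()]
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Conv1d(channels, 1, kernel_size=1)  # per-point peak logit

    def forward(self, trace):
        # trace: (B, 1, N) spectral intensities -> (B, 1, N) peak logits
        return self.classifier(self.features(trace))

if __name__ == "__main__":
    # Toy synthetic input: two overlapping Lorentzian peaks plus noise.
    x = torch.linspace(0, 1, 512)
    trace = 1 / (1 + ((x - 0.40) / 0.01) ** 2) + 0.6 / (1 + ((x - 0.43) / 0.01) ** 2)
    trace = (trace + 0.01 * torch.randn(512)).view(1, 1, -1)
    model = ConvPeakPicker()
    print(model(trace).shape)  # torch.Size([1, 1, 512])
```

In practice such a model would be trained against ground-truth peak positions from the synthetic spectra, then applied to cross-sections of experimental 2D spectra for deconvolution.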