The task of talking head generation is to synthesize a lip-synchronized talking head video from an arbitrary face image and an audio clip. Most existing methods ignore the local driving information of the mouth muscles. In this paper, we propose a novel recurrent generative network that uses both audio and speech-related facial action units (AUs) as the driving information. AU information related to the mouth can guide mouth movement more accurately. Since speech is highly correlated with speech-related AUs, we propose an Audio-to-AU module in our system to predict speech-related AU information from speech. In addition, we use an AU classifier to ensure that the generated images contain correct AU information, and we construct a frame discriminator for adversarial training to improve the realism of the generated faces. We verify the effectiveness of our model on the GRID and TCD-TIMIT datasets, and we conduct an ablation study to verify the contribution of each component. Quantitative and qualitative experiments demonstrate that our method outperforms existing methods in both image quality and lip-sync accuracy.

Recently, talking head generation has attracted increasing attention in academia and industry, as it is essential in applications such as human-computer interaction, film making, virtual reality, and computer games. This research explores how to generate a talking head video from anyone's image, used as an identity image, together with driving information related to mouth movement, e.g., speech audio or text. Before deep learning became popular, much early work relied on Hidden Markov Models (HMMs) to capture the dynamic relationship between audio and lip motion.
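As a rough illustration of how such an Audio-to-AU module could be wired up, the sketch below maps per-frame audio features (assumed here to be MFCCs) to speech-related AU intensities. The module name, layer sizes, and feature dimensions are illustrative assumptions, not the paper's actual implementation.

# Minimal sketch of an Audio-to-AU predictor (names and dimensions are
# illustrative assumptions, not the paper's reported architecture).
import torch
import torch.nn as nn

class AudioToAU(nn.Module):
    """Predicts speech-related AU activations from per-frame audio features."""
    def __init__(self, audio_dim=28, hidden_dim=128, num_aus=8):
        super().__init__()
        # GRU captures short-term temporal context in the audio stream.
        self.encoder = nn.GRU(audio_dim, hidden_dim, batch_first=True)
        # Sigmoid keeps predicted AU intensities in [0, 1].
        self.head = nn.Sequential(nn.Linear(hidden_dim, num_aus), nn.Sigmoid())

    def forward(self, audio_feats):
        # audio_feats: (batch, time, audio_dim), e.g. MFCC frames
        h, _ = self.encoder(audio_feats)
        return self.head(h)  # (batch, time, num_aus)

# Usage: the predicted AU sequence would be fed to the recurrent generator
# alongside the audio embedding and the identity image features.
au_pred = AudioToAU()(torch.randn(2, 75, 28))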
Background: To provide appropriate surgical training guidance, several skill evaluation and safety detection methods have been developed. However, it is difficult for these methods to provide predictive information to trainees. This paper proposes a new approach for real-time trajectory prediction of the laparoscopic instrument tip to improve surgical training and patient safety. Methods: This paper proposes a real-time trajectory prediction model of the laparoscopic instrument tip based on a long short-term memory (LSTM) neural network. In addition, motion state is introduced to capture more motion information of the instrument tip and improve model performance. Results: The feasibility, effectiveness, and generalisation ability of the proposed model are preliminarily verified. The model shows satisfactory prediction accuracy for the trajectory of the laparoscopic instrument tip. Conclusion: The LSTM neural network can accurately predict the movement trajectory of the laparoscopic instrument tip. The prediction model can enable advance perception of operational risk and can be used in laparoscopic surgery training.
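As a rough illustration of the kind of model described, the sketch below predicts the next tip position from a short window of past positions augmented with a velocity-based motion state. The window length, layer sizes, and the specific motion-state definition are illustrative assumptions rather than the paper's exact configuration.

# Minimal sketch of an LSTM tip-trajectory predictor (layer sizes, window
# length, and the velocity-based "motion state" are illustrative assumptions).
import torch
import torch.nn as nn

class TipTrajectoryLSTM(nn.Module):
    """Predicts the next 3-D tip position from a window of past positions
    augmented with a motion state (here, per-step velocity)."""
    def __init__(self, hidden_dim=64):
        super().__init__()
        # Input per time step: xyz position (3) + xyz velocity (3)
        self.lstm = nn.LSTM(input_size=6, hidden_size=hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 3)

    def forward(self, positions):
        # positions: (batch, window, 3) past tip coordinates
        velocity = torch.diff(positions, dim=1, prepend=positions[:, :1])
        x = torch.cat([positions, velocity], dim=-1)   # append motion state
        out, _ = self.lstm(x)
        return self.head(out[:, -1])                   # predicted next position (batch, 3)

# Usage: at each frame, a sliding window of the last N tracked tip positions
# is fed in to obtain the predicted position for the next frame.
next_pos = TipTrajectoryLSTM()(torch.randn(4, 30, 3))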