Recent Transformer-based contextual word representations, including BERT and XLNet, have shown state-of-the-art performance in multiple disciplines within NLP. Fine-tuning the trained contextual models on task-specific datasets has been the key to achieving superior performance downstream. While fine-tuning these pre-trained models is straightforward for lexical applications (applications with only the language modality), it is not trivial for multimodal language (a growing area in NLP focused on modeling face-to-face communication). Pre-trained models do not have the necessary components to accept the two additional modalities of vision and acoustics. In this paper, we propose an attachment to BERT and XLNet called Multimodal Adaptation Gate (MAG). MAG allows BERT and XLNet to accept multimodal nonverbal data during fine-tuning. It does so by generating a shift to the internal representations of BERT and XLNet, a shift conditioned on the visual and acoustic modalities. In our experiments, we study the commonly used CMU-MOSI and CMU-MOSEI datasets for multimodal sentiment analysis. Fine-tuning MAG-BERT and MAG-XLNet significantly boosts sentiment analysis performance over previous baselines as well as language-only fine-tuning of BERT and XLNet. On the CMU-MOSI dataset, MAG-XLNet achieves human-level multimodal sentiment analysis performance for the first time in the NLP community.
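The shift mechanism described above can be sketched in a few lines. The version below is a simplified illustration, not the paper's exact implementation: the gating form, weight shapes, and the scaling factor `beta` are assumptions chosen only to show how a language representation can be displaced by visual and acoustic inputs while the size of the displacement stays bounded relative to the original representation.

```python
import math
import random

def relu(x):
    return [max(0.0, v) for v in x]

def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

def mag_shift(z, v, a, Wgv, Wga, Wv, Wa, beta=0.5):
    """Shift a language representation z using visual (v) and acoustic (a) inputs.

    Returns z' = z + alpha * H, where H is a gated combination of the projected
    nonverbal vectors and alpha caps the shift magnitude relative to ||z||.
    """
    gv = relu(matvec(Wgv, z + v))   # gate from language + visual (list concat)
    ga = relu(matvec(Wga, z + a))   # gate from language + acoustic
    H = [g1 * h1 + g2 * h2
         for g1, h1, g2, h2 in zip(gv, matvec(Wv, v), ga, matvec(Wa, a))]
    alpha = min(beta * norm(z) / (norm(H) + 1e-9), 1.0)
    return [zi + alpha * hi for zi, hi in zip(z, H)]

# Toy usage with small, random weights (dimensions are illustrative).
random.seed(0)
d, dv, da = 4, 3, 2
Wgv = [[random.uniform(-0.1, 0.1) for _ in range(d + dv)] for _ in range(d)]
Wga = [[random.uniform(-0.1, 0.1) for _ in range(d + da)] for _ in range(d)]
Wv = [[random.uniform(-0.1, 0.1) for _ in range(dv)] for _ in range(d)]
Wa = [[random.uniform(-0.1, 0.1) for _ in range(da)] for _ in range(d)]
z = [0.5, -0.2, 0.1, 0.3]   # internal language representation
v = [1.0, 0.0, -1.0]        # visual features
a = [0.2, 0.8]              # acoustic features
z_shifted = mag_shift(z, v, a, Wgv, Wga, Wv, Wa)
```

The key design point is the cap `alpha`: it guarantees the nonverbal shift never displaces the representation by more than a fraction `beta` of its own norm, so the pre-trained language space is perturbed, not overwritten.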
Background Depression and anxiety disorders among the global population have worsened during the COVID-19 pandemic. Yet, current methods for screening these two conditions rely on in-person interviews, which can be expensive, time-consuming, and hindered by social stigma and quarantines. Meanwhile, how individuals engage with online platforms such as Google Search and YouTube has undergone drastic shifts due to COVID-19 and subsequent lockdowns. Such ubiquitous daily behaviors on online platforms have the potential to capture and correlate with clinically alarming deteriorations in the depression and anxiety profiles of users in a noninvasive manner. Objective The goal of this study is to examine, among college students in the United States, the relationships of deteriorating depression and anxiety conditions with changes in user behaviors when engaging with Google Search and YouTube during COVID-19. Methods This study recruited a cohort of undergraduate students (N=49) from a US college campus during January 2020 (prior to the pandemic) and measured the anxiety and depression levels of each participant. The anxiety level was assessed via the General Anxiety Disorder-7 (GAD-7). The depression level was assessed via the Patient Health Questionnaire-9 (PHQ-9). This study followed up with the same cohort during May 2020 (during the pandemic), and the anxiety and depression levels were assessed again. The longitudinal Google Search and YouTube history data of all participants were anonymized and collected. From individual-level Google Search and YouTube histories, we developed 5 features that quantify shifts in online behaviors during the pandemic. We then assessed the correlations of deteriorating depression and anxiety profiles with each of these features. Finally, we demonstrated the feasibility of using the proposed features to build predictive machine learning models.
Results Of the 49 participants, 49% (n=24) reported an increase in the PHQ-9 depression scores, and 53% (n=26) reported an increase in the GAD-7 anxiety scores. The results showed that a number of online behavior features were significantly correlated with deteriorations in the PHQ-9 scores (r ranging between –0.37 and 0.75, all P values less than or equal to .03) and the GAD-7 scores (r ranging between –0.47 and 0.74, all P values less than or equal to .03). Simple machine learning models were shown to be useful in predicting the change in anxiety and depression scores (mean squared error ranging between 2.37 and 4.22, R2 ranging between 0.68 and 0.84) with the proposed features. Conclusions The results suggested that deteriorating depression and anxiety conditions have strong correlations with behavioral changes in Google Search and YouTube use during the COVID-19 pandemic. Though further studies are required, our results demonstrate the feasibility of using pervasive online data to establish noninvasive surveillance systems for mental health conditions that bypass many disadvantages of existing screening methods.
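The statistics reported above (Pearson r, mean squared error, R2) can be illustrated with a minimal sketch. The data below are made-up toy values, not the study's features or scores, and a simple least-squares line stands in for whatever models the authors actually used; the sketch only shows how each reported quantity is computed.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def fit_and_score(x, y):
    """Ordinary least-squares line y ~ a*x + b; returns (mse, r2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    b = my - a * mx
    pred = [a * xi + b for xi in x]
    ss_res = sum((p - yi) ** 2 for p, yi in zip(pred, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    mse = ss_res / n
    r2 = 1 - ss_res / ss_tot
    return mse, r2

# Hypothetical toy data: per-participant change in one online-behavior
# feature vs. change in PHQ-9 score between the two assessments.
feature_delta = [0.0, 1.0, 2.0, 3.0]
phq9_delta = [1.0, 3.0, 5.0, 7.0]
r = pearson_r(feature_delta, phq9_delta)
mse, r2 = fit_and_score(feature_delta, phq9_delta)
```

With the perfectly linear toy data above, r and R2 are both 1 and the MSE is 0; real behavioral data would of course be noisier, as the reported ranges reflect.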
We present the design of an online social skills development interface for teenagers with autism spectrum disorder (ASD). The interface is intended to enable private conversation practice anywhere, anytime using a web browser. Users converse informally with a virtual agent, receiving feedback on nonverbal cues in real time, along with summary feedback. The prototype was developed in consultation with an expert UX designer, two psychologists, and a pediatrician. Using data from 47 individuals, feedback and dialogue generation were automated using a hidden Markov model and a schema-driven dialogue manager capable of handling multi-topic conversations. We conducted a study with nine high-functioning ASD teenagers. Through a thematic analysis of post-experiment interviews, we identified several key design considerations, notably: 1) Users should be fully briefed at the outset about the purpose and limitations of the system, to avoid unrealistic expectations. 2) An interface should incorporate positive acknowledgment of behavior change. 3) Realistic appearance and responsiveness of a virtual agent are important in engaging users. 4) Conversation personalization, for instance in prompting laconic users for more input and reciprocal questions, would help the teenagers stay engaged longer and increase the system's utility. CCS CONCEPTS • Human-centered computing → Empirical studies in HCI.
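A hidden Markov model of the kind mentioned above infers a hidden behavioral state from observable cues. The sketch below is a generic Viterbi decoder over invented states and probabilities ("engaged"/"distracted", eye-contact observations); it is not the paper's model, only an illustration of how an HMM can turn a stream of nonverbal cues into feedback-ready state estimates.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for an observation sequence (log-space)."""
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    path = {s: [s] for s in states}
    for t in range(1, len(obs)):
        V.append({})
        new_path = {}
        for s in states:
            # Best predecessor state for s at time t.
            best_prev, best_score = max(
                ((p, V[t - 1][p] + math.log(trans_p[p][s]) + math.log(emit_p[s][obs[t]]))
                 for p in states),
                key=lambda kv: kv[1])
            V[t][s] = best_score
            new_path[s] = path[best_prev] + [s]
        path = new_path
    best_final = max(states, key=lambda s: V[-1][s])
    return path[best_final]

# Invented toy model: hidden engagement state, observed gaze behavior.
states = ("engaged", "distracted")
start_p = {"engaged": 0.6, "distracted": 0.4}
trans_p = {"engaged": {"engaged": 0.8, "distracted": 0.2},
           "distracted": {"engaged": 0.3, "distracted": 0.7}}
emit_p = {"engaged": {"eye_contact": 0.7, "looking_away": 0.3},
          "distracted": {"eye_contact": 0.2, "looking_away": 0.8}}
obs = ["eye_contact", "eye_contact", "looking_away", "looking_away"]
decoded = viterbi(obs, states, start_p, trans_p, emit_p)
```

A real-time feedback system could trigger a prompt whenever the decoded state dwells in "distracted" for several consecutive steps.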
Communication is a core component of effective healthcare that impacts many patient and doctor outcomes, yet it is complex and challenging to both analyse and teach. Human-based coding and audit systems are time-intensive and costly; thus, there is considerable interest in the application of artificial intelligence to this topic, through machine learning using both supervised and unsupervised learning algorithms. In this article we introduce health communication, its importance for patient and health professional outcomes, and the need for rigorous empirical data to support this field. We then discuss historical interaction coding systems and recent developments in applying artificial intelligence (AI) to automate such coding in the health setting. Finally, we discuss available evidence for the reliability and validity of AI coding, the application of AI in training and audit of communication, as well as limitations and future directions in this field. In summary, recent advances in machine learning have allowed accurate textual transcription, and analysis of prosody, pauses, energy, intonation, emotion, and communication style. Studies have established moderate to good reliability of machine learning algorithms, comparable with human coding (or better), and have identified some expected and unexpected associations between communication variables and patient satisfaction. Finally, the application of artificial intelligence to communication skills training has been attempted, to provide audit and feedback, and through the use of avatars. This approach looks promising for providing confidential and easily accessible training, but may be best used as an adjunct to human-based training.
Remote and objective assessment of the motor symptoms of Parkinson’s disease is an area of great interest particularly since the COVID-19 crisis emerged. In this paper, we focus on a) the challenges of assessing motor severity via videos and b) the use of emerging video-based Artificial Intelligence (AI)/Machine Learning techniques to quantitate human movement and its potential utility in assessing motor severity in patients with Parkinson’s disease. While we conclude that video-based assessment may be an accessible and useful way of monitoring motor severity of Parkinson’s disease, the potential of video-based AI to diagnose and quantify disease severity in the clinical context is dependent on research with large, diverse samples, and further validation using carefully considered performance standards.