Before the COVID-19 pandemic, video was already one of the main media used on the internet. During the pandemic, videoconferencing services became even more important, becoming one of the main instruments enabling most social and professional human activities. Given social distancing policies, people are spending more time using these online services for work, learning, and leisure. Videoconferencing software became the standard means of communication for home offices and remote learning. Nevertheless, many issues on these platforms remain to be addressed, and many aspects, such as ethics and user experience, remain to be reexamined or investigated. We argue that many current state-of-the-art Artificial Intelligence (AI) techniques may help enhance video collaboration services, particularly Deep Learning methods such as face analysis, sentiment analysis, and video classification. In this paper, we present a vision of how AI techniques may contribute to the upcoming videoconferencing age.
Methods based on Deep Learning (DL) have become state-of-the-art in several multimedia challenges. However, industry faces a shortage of professionals able to apply DL. This chapter presents the fundamentals and technologies for developing DL methods for video analysis. In particular, we seek to enable the reader to: (1) understand key DL-based models, more specifically Convolutional Neural Networks (CNNs); and (2) apply DL models to video tasks such as video classification, multi-label video classification, object detection, and pose estimation. The Python programming language is presented together with the TensorFlow library for implementing DL models.
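To make the idea of applying a CNN to video classification concrete, here is a minimal sketch of one common approach: run a small image CNN on every frame and average the per-frame class probabilities (late fusion). It uses plain NumPy rather than TensorFlow so that each operation is visible. The weights are random and untrained, and all function names, shapes, and parameter values are hypothetical choices for illustration, not the chapter's own models.

```python
import numpy as np

def conv2d_valid(image, kernels, bias):
    """Cross-correlate an (H, W, C_in) image with (k, k, C_in, C_out) kernels,
    stride 1 and 'valid' padding, as Conv2D layers in DL frameworks do."""
    k = kernels.shape[0]
    h_out = image.shape[0] - k + 1
    w_out = image.shape[1] - k + 1
    out = np.empty((h_out, w_out, kernels.shape[3]))
    for i in range(h_out):
        for j in range(w_out):
            patch = image[i:i + k, j:j + k, :]  # (k, k, C_in) window
            # Sum over the window and input channels for every output channel.
            out[i, j, :] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2])) + bias
    return out

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify_video(video, params):
    """video: (T, H, W, C) array of frames. Returns class probabilities
    obtained by averaging the per-frame softmax outputs (late fusion)."""
    frame_probs = []
    for frame in video:
        features = relu(conv2d_valid(frame, params["kernels"], params["conv_bias"]))
        pooled = features.mean(axis=(0, 1))  # global average pooling -> (C_out,)
        logits = pooled @ params["dense_w"] + params["dense_b"]
        frame_probs.append(softmax(logits))
    return np.mean(frame_probs, axis=0)

# Hypothetical untrained parameters: 8 conv filters of size 3x3, 3 output classes.
rng = np.random.default_rng(0)
num_classes = 3
params = {
    "kernels": rng.normal(scale=0.1, size=(3, 3, 3, 8)),
    "conv_bias": np.zeros(8),
    "dense_w": rng.normal(scale=0.1, size=(8, num_classes)),
    "dense_b": np.zeros(num_classes),
}
video = rng.random((4, 16, 16, 3))  # 4 RGB frames of 16x16 pixels
probs = classify_video(video, params)
print(probs.shape, probs.sum())
```

Averaging frame-level predictions is only one simple way to turn an image CNN into a video classifier. In TensorFlow, the same structure would typically be built from trained `Conv2D` and pooling layers instead of hand-written loops. For multi-label video classification, a per-class sigmoid would usually replace the softmax, so that several labels can be active at once.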