Visual understanding has become increasingly important for gathering information in many real-life applications. For a human, understanding the content of a visual scene is trivial; for a machine, it remains a challenging task. Generating captions for images and videos is gaining importance because of its wide range of applications in assistive technologies, automatic video captioning, video summarization, subtitling, navigation aids for the blind, and so on. A visual understanding framework analyses the content of a video to generate a semantically accurate caption for it. Beyond understanding the visual situation, the extracted semantics must be expressed in a natural language such as English, which requires a language model; ensuring the semantic and grammatical correctness of the generated sentences is therefore a further challenge. The generated description of a video should capture not only the objects present in the scene but also how those objects relate to each other through the activity depicted, making the entire process a complex task for a machine. This work surveys methods for video captioning using deep learning, the datasets widely used for these tasks, and the evaluation metrics used for performance comparison. The insights gained from our preliminary work and the extensive literature review enable us to propose a practical, efficient deep learning architecture for video captioning that utilizes audio cues, external knowledge, and attention context to improve the captioning process. Quantum deep learning architectures can bring about extraordinary results in object recognition tasks and convolution-based feature extraction.
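To make the attention context concrete, the sketch below shows a minimal attention-based encoder-decoder captioner of the kind this survey covers: pre-extracted per-frame CNN features are attended over at each decoding step while an LSTM emits caption words. This is an illustrative sketch, not the architecture proposed in the paper; all class names, dimensions, and the vocabulary size are assumptions.

```python
# A minimal sketch (assumed design, not the paper's architecture) of an
# attention-based encoder-decoder video captioner in PyTorch.
import torch
import torch.nn as nn

class AttentiveVideoCaptioner(nn.Module):
    def __init__(self, feat_dim=2048, hid_dim=512, vocab_size=10000):
        super().__init__()
        self.hid_dim = hid_dim
        self.embed = nn.Embedding(vocab_size, hid_dim)
        # Additive attention: scores each frame against the decoder state.
        self.attn = nn.Linear(feat_dim + hid_dim, 1)
        self.proj = nn.Linear(feat_dim, hid_dim)
        self.lstm = nn.LSTMCell(hid_dim * 2, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, frame_feats, captions):
        # frame_feats: (B, T, feat_dim) pre-extracted per-frame CNN features
        # captions:    (B, L) token ids, teacher-forced during training
        B, T, _ = frame_feats.shape
        h = frame_feats.new_zeros(B, self.hid_dim)
        c = frame_feats.new_zeros(B, self.hid_dim)
        logits = []
        for t in range(captions.size(1)):
            scores = self.attn(torch.cat(
                [frame_feats, h.unsqueeze(1).expand(B, T, -1)], dim=-1))
            weights = torch.softmax(scores, dim=1)        # (B, T, 1)
            context = (weights * frame_feats).sum(dim=1)  # (B, feat_dim)
            step_in = torch.cat(
                [self.embed(captions[:, t]), self.proj(context)], dim=-1)
            h, c = self.lstm(step_in, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                 # (B, L, vocab_size)

# Example: 2 clips of 20 frames each, 12-token captions.
model = AttentiveVideoCaptioner()
feats = torch.randn(2, 20, 2048)
caps = torch.randint(0, 10000, (2, 12))
print(model(feats, caps).shape)  # torch.Size([2, 12, 10000])
```

In a fuller design of the kind the abstract proposes, audio features or external knowledge embeddings could be fused into the same per-step context vector.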
In this research we have studied several human factors problems that are connected to the deployment of speaker verification technology in telecommunication services. We investigate the perceived safety of a calling card service when it is protected by speaker verification on the 14-digit card number, and compare it to the perceived safety of speaker verification combined with a PIN. Moreover, we compare a voice-based interface to the service with a DTMF-based interface. The results are crucial for guiding the introduction and deployment of speaker verification technology in actual applications.
Artificial Intelligence (AI), as a mainstream science today, has the potential to significantly improve human wellbeing and wellness. This study develops an automated caretaking system that enables continuous monitoring of people with minimal human intervention. To do so, a wide range of human movements and varied viewpoints must be accounted for in real-time contexts. The proposed system, coined "Eye-Tact", integrates a vision-based multimodal architecture with wearable sensors to identify poses and detect falls. For people with Parkinson's disease (PD), this patient-specific, vision-based keypoint analysis model has been successfully deployed for person identification and aberrant activity recognition. The proposed Multi Model Ensemble Technique (MMET) employs a variety of sensors to acquire physiological and other parameters necessary for fall prediction and evaluation. The system is evaluated using precision, recall, F1 score, and support; these metrics are used to compare different models, including XGBoostClassifier, CatBoostClassifier, and RandomForestClassifier. The results reveal that the RandomForestClassifier outperforms the other classifiers with 97% accuracy. The proposed work demonstrates the capacity to build a system that carefully understands and analyses heterogeneous data using state-of-the-art technologies.
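As an illustration of this evaluation protocol, the sketch below trains the three named classifier families and reports precision, recall, F1 score, and support on a held-out split. The synthetic dataset is a stand-in assumption for the paper's wearable-sensor features, which are not available here; XGBClassifier and CatBoostClassifier are the actual class names exposed by the third-party xgboost and catboost packages.

```python
# A minimal sketch of the evaluation described above: fit several
# classifiers and score them with precision, recall, F1, and support.
# Requires: pip install scikit-learn xgboost catboost
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from catboost import CatBoostClassifier

# Synthetic stand-in for fall / no-fall samples from sensor readings.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
    "XGBClassifier": XGBClassifier(random_state=0),
    "CatBoostClassifier": CatBoostClassifier(verbose=0, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # classification_report prints per-class precision, recall, F1, support.
    print(name)
    print(classification_report(y_te, model.predict(X_te)))
```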