Thanh Dinh Ngo scite author profile

In this paper, we present a deep-learning-based multimodal system for classifying daily life videos. To train the system, we propose a two-phase training strategy. In the first training phase (Phase I), we extract the audio and visual (image) data from the original video and train an independent deep learning model on each modality. After training, we obtain audio embeddings and visual embeddings by extracting feature maps from the pretrained models. In the second training phase (Phase II), we train a fusion layer to combine the audio and visual embeddings and a dense layer to classify the combined embedding into the target daily scenes. In extensive experiments on the DCASE (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) 2021 Task 1B Development dataset, the system achieved best classification accuracies of 80.5%, 91.8%, and 95.3% with audio data only, visual data only, and both audio and visual data, respectively. The highest accuracy of 95.3% is an improvement of 17.9% over the DCASE baseline and is very competitive with state-of-the-art systems.
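The Phase II step described in this abstract (fuse the two embeddings, then classify with a dense layer) can be illustrated with a minimal sketch. The abstract does not say how the fusion layer works, so plain concatenation is assumed here. The dimensions, the random untrained weights, and the function names `fuse`, `dense_softmax`, and `classify` are all hypothetical and are not taken from the paper. Only the forward pass is shown, not training.

```python
import math
import random


def fuse(audio_emb, visual_emb):
    """Assumed fusion: concatenate the audio and visual embeddings."""
    return list(audio_emb) + list(visual_emb)


def dense_softmax(x, weights, bias):
    """Dense layer followed by softmax.

    weights has shape [num_classes][len(x)], bias has shape [num_classes].
    """
    logits = [
        sum(w * v for w, v in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(audio_emb, visual_emb, weights, bias):
    """Return (predicted class index, class probabilities)."""
    probs = dense_softmax(fuse(audio_emb, visual_emb), weights, bias)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Illustrative sizes only; real embeddings would come from the Phase I models.
AUDIO_DIM, VISUAL_DIM, NUM_CLASSES = 4, 6, 3
rng = random.Random(0)
audio_emb = [rng.uniform(-1, 1) for _ in range(AUDIO_DIM)]
visual_emb = [rng.uniform(-1, 1) for _ in range(VISUAL_DIM)]
weights = [
    [rng.uniform(-1, 1) for _ in range(AUDIO_DIM + VISUAL_DIM)]
    for _ in range(NUM_CLASSES)
]
bias = [0.0] * NUM_CLASSES

label, probs = classify(audio_emb, visual_emb, weights, bias)
print(label, [round(p, 3) for p in probs])
```

In a real system the weights and bias would be learned in Phase II while the Phase I embedding extractors stay fixed, which is what lets the two modalities be trained independently first.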

Communication-model based embedded mapping of dataflow actors on heterogeneous MPSoC

Ngo, Sepulveda, Martin, et al. 2014

Industrial LoRaWAN Network for Danang City: Solution for Long-Range and Low-Power IoT Applications

Ngo, Ferrero, Doan, et al. 2021

Compa backend: A dynamic runtime for the execution of dataflow programs onto multi-core platforms

Martin, Eustache, Diguet, et al. 2015

In this demo we will present a design flow for multi-core based embedded systems. Namely, we implement a kernel capable of modifying the system at run time to increase data throughput. The design flow starts with the dynamic dataflow and RVC-CAL (Reconfigurable Video Coding CAL Actor Language) descriptions of an application and goes up to the deployment of the system onto the hardware platform. As a use case, we implement an MPEG-4 decoder algorithm onto a multi-core heterogeneous system deployed onto the Zynq platform from Xilinx.


Move Based Algorithm for Runtime Mapping of Dataflow Actors on Heterogeneous MPSoCs

Proposed Smart University Model as a Sustainable Living Lab for University Digital Transformation
