This research addresses an interaction technique that uses a virtual hand in an Augmented Reality (AR) environment. The interaction studied relies on a marker-based tracker library to recognize human hand gestures. The research produces a virtual hand design that can be used to interact with virtual objects. The supported interactions are grabbing a virtual object and dropping the grabbed object at a desired place, and the virtual hand is designed to resemble a real hand. The virtual hand is therefore expected to offer an alternative way of interacting in the world of augmented reality.
Keywords: Augmented Reality, Virtual Hand, Natural Gesture
INTRODUCTION
As technology continues to advance, the field of computer vision is affected as well. In general terms, computer vision is the science and technology of how a machine or system sees. The input to a computer-vision-based system is an image. Image data can take the form of video sequences, camera images, and so on. Tasks handled by computer vision include recognition, motion, scene reconstruction, and image restoration.
Examples of computer vision applications include controlling processes, detecting events, organizing information, modeling objects or environments, and interaction (human-computer interaction). Concepts from computer vision are also used in the field of augmented reality. Augmented reality is a concept that combines the real world with computer graphics. Its goal is to add understanding and information to the real world: an augmented reality system takes the real world as its basis and combines several technologies, adding contextual data so that a person's understanding becomes clearer. Kaufmann identifies three characteristics inherent to augmented reality: it combines the real and the virtual, its interaction takes place in real time, and it is rendered in 3D. The contextual data can take the form of audio commentary, location data, historical context, or in the form of-
Spatiotemporal and motion feature representations are key to video action recognition. Typical previous approaches use 3D CNNs to handle both spatial and temporal features, but they incur a high computational cost. Other approaches use (1+2)D CNNs to learn spatial and temporal features efficiently, but they neglect the importance of motion representations. To address these problems, we propose a novel block that captures spatial and temporal features more faithfully and learns motion features efficiently. The proposed block includes Motion Excitation (ME), Multi-view Excitation (MvE), and Densely Connected Temporal Aggregation (DCTA). ME encodes feature-level frame differences; MvE adaptively enriches spatiotemporal features with multiple view representations; and DCTA models long-range temporal dependencies. We inject the proposed building block, which we refer to as the META block (or simply “META”), into 2D ResNet-50. Through extensive experiments, we demonstrate that our proposed architecture outperforms previous CNN-based methods in terms of the “Val Top-1 %” measure on the Something-Something v1 and Jester datasets, while META yields competitive results on the Moment-in-Time Mini dataset.
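The idea of feature-level frame differences mentioned for ME can be illustrated with a minimal sketch. This is not the ME module from the abstract, which gives no implementation details; it only shows plain temporal differencing. The function name `frame_differences`, the flattened per-frame feature layout, and the zero padding of the last step are all assumptions made for illustration.

```python
from typing import List

# Hypothetical layout: one flattened feature vector per time step.
Frame = List[float]


def frame_differences(features: List[Frame]) -> List[Frame]:
    """Return differences between each feature frame and its successor.

    The output has the same number of time steps as the input. The last
    step has no successor, so it is filled with zeros (an assumption of
    this sketch, chosen only to keep the temporal length unchanged).
    """
    if not features:
        return []
    diffs = [
        [nxt - cur for cur, nxt in zip(features[t], features[t + 1])]
        for t in range(len(features) - 1)
    ]
    diffs.append([0.0] * len(features[-1]))
    return diffs


# Three time steps, two feature channels each.
clip = [[0.0, 1.0], [0.5, 1.0], [2.0, 0.0]]
print(frame_differences(clip))  # [[0.5, 0.0], [1.5, -1.0], [0.0, 0.0]]
```

Channels that do not change between frames produce zeros, so the difference signal emphasizes the parts of the feature map where motion occurs.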
Current action recognition studies enjoy the benefits of two neural network branches, spatial and temporal. This work extends previous work by introducing a fusion of the spatial and temporal branches to provide superior action recognition capability for multi-label, multi-class classification problems. In this paper, we propose three fusion models with different fusion strategies. We first build several efficient temporal Gaussian mixture (TGM) layers to form spatial and temporal branches that learn a set of features. In addition to these branches, we introduce a new deep spatio-temporal branch consisting of a series of TGM layers to learn the features that emerge from the existing branches. Each branch produces a temporal-aware feature that helps the model understand the underlying action in a video. To verify the performance of our proposed models, we performed extensive experiments using the well-known MultiTHUMOS benchmarking dataset. The results demonstrate the importance of our proposed deep fusion mechanism, which contributes to the overall score while keeping the number of parameters small.
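The abstract does not describe how a TGM layer is built, so the following is only a rough sketch under a stated assumption: that the temporal kernel is formed as a weighted mixture of Gaussians over frame positions and then applied along the time axis. The function names, the per-Gaussian normalization, and the zero padding are hypothetical choices for illustration, not the authors' layer.

```python
import math
from typing import List


def gaussian_mixture_kernel(
    centers: List[float], widths: List[float], weights: List[float], length: int
) -> List[float]:
    """Build a temporal kernel with `length` taps as a weighted sum of Gaussians.

    Each Gaussian is normalized to sum to 1 over the taps before mixing,
    so mixing weights that sum to 1 give a kernel that sums to 1.
    Widths are assumed to be strictly positive.
    """
    kernel = [0.0] * length
    for mu, sigma, w in zip(centers, widths, weights):
        taps = [math.exp(-0.5 * ((t - mu) / sigma) ** 2) for t in range(length)]
        total = sum(taps)
        for t in range(length):
            kernel[t] += w * taps[t] / total
    return kernel


def temporal_filter(signal: List[float], kernel: List[float]) -> List[float]:
    """Apply the kernel along time with zero padding, keeping the input length."""
    half = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + k - half
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


# One Gaussian centered on the middle tap of a 3-tap kernel.
kernel = gaussian_mixture_kernel(centers=[1.0], widths=[1.0], weights=[1.0], length=3)
smoothed = temporal_filter([1.0, 1.0, 1.0, 1.0, 1.0], kernel)
print(len(smoothed))
```

Under this assumption a kernel of any temporal length is described by a few centers, widths, and mixing weights rather than one free value per tap, which is one way such a layer could keep the parameter count small.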