Human Movement Direction Prediction using Virtual Reality and Eye Tracking

Pettersson, Jimmy; Falkman, Petter

doi:10.1109/icit46573.2021.9453581

Cited by 5 publications

(7 citation statements)

References 24 publications


“…The VRE designed to collect the data consists of four stages: language selection where the test participant selects whether the written instructions in the VRE should be given in Swedish or English, ET calibration, an information form where the participant enters age, gender, and whether they are right handed or not, and the last stage is the test itself. The test stage, Figure 1 , is an alteration of the test in Pettersson and Falkman ( 2021 ), see below.…”

Section: Methods (mentioning)

confidence: 99%

“…The test sequence was randomized as suggested in future improvements by Pettersson and Falkman ( 2021 ).…”

Section: Methods (mentioning)

confidence: 99%

“…The test is launched when the test participant presses the start button in the environment. Data is then collected, in the same manner as in Pettersson and Falkman ( 2020 ) and Pettersson and Falkman ( 2021 ), i.e., the data between two pressed cubes is saved as one data point, and using the same parameters ( Table 1 ). The data that is collected from each test participant, each test, and at each timestamp, shown in Table 1 , are: the eye gaze direction vector for each eye (EyeDirection), the coordinate in the virtual room where the gaze hits (EyeHitpoint), which object is gazed upon (EyeHitObject) as well as the size and position of the pupils (PupilDiameter, Pupilposition).…”

Section: Methods (mentioning)

confidence: 99%
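
The citation statement above describes saving the gaze data recorded between two pressed cubes as one data point. A minimal sketch of that segmentation follows, assuming each sample carries a timestamp and that the press times are known. The `GazeSample` type, its field names, and `split_between_presses` are hypothetical illustrations, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GazeSample:
    # Hypothetical record; the field names loosely mirror the quoted parameters.
    timestamp: float
    eye_direction: Tuple[float, float, float]
    eye_hit_object: str


def split_between_presses(
    samples: List[GazeSample], press_times: List[float]
) -> List[List[GazeSample]]:
    """Group samples into one data point per interval between consecutive presses.

    A sample belongs to the interval [start, end) of the two presses around it.
    Samples before the first press or at/after the last press are discarded.
    """
    segments = []
    for start, end in zip(press_times, press_times[1:]):
        segments.append([s for s in samples if start <= s.timestamp < end])
    return segments
```

With N recorded presses this yields N - 1 data points, one per movement between cubes.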


Comparison of LSTM, Transformers, and MLP-mixer neural networks for gaze based human intention prediction

Pettersson

Falkman

2023

Front. Neurorobot.

Self Cite


Collaborative robots have gained popularity in industries, providing flexibility and increased productivity for complex tasks. However, their ability to interact with humans and adapt to their behavior is still limited. Prediction of human movement intentions is one way to improve the robots' adaptation. This paper investigates the performance of using Transformers and MLP-Mixer based neural networks to predict the intended human arm movement direction, based on gaze data obtained in a virtual reality environment, and compares the results to using an LSTM network. The comparison will evaluate the networks based on accuracy on several metrics, time ahead of movement completion, and execution time. It is shown in the paper that there exist several network configurations and architectures that achieve comparable accuracy scores. The best performing Transformers encoder presented in this paper achieved an accuracy of 82.74%, for predictions with high certainty, on continuous data and correctly classifies 80.06% of the movements at least once. The movements are, in 99% of the cases, correctly predicted the first time, before the hand reaches the target and more than 19% ahead of movement completion in 75% of the cases. The results show that there are multiple ways to utilize neural networks to perform gaze based arm movement intention prediction and it is a promising step toward enabling efficient human-robot collaboration.


“…These systems are wearable, cumbersome, user-unfriendly and the electrodes are biase at some positions. The second group is a multi-modal-based sensors, which combined two or more sensors to capture other input features that can assist in detecting or recognizing some events in the gesture as in [13], [14], [15], [16] which generally: required calibration of the sensors first which make the system to be complex and unfriendly. The third group is Video oculography (VOG) which is the most adopted nowadays because, it can capture the images of the subject eyes, estimate eyes positions, and point of gaze (POG) i.e where the user is looking [17], [18].…”

Section: Related Work (mentioning)

confidence: 99%

Boosted Gaze Gesture Recognition Using Underlying Head Orientation Sequence

et al. 2023


People find it challenging to control smart systems with complex gaze gestures due to the vulnerability of eye saccades. Instead, the existing works achieved good recognition accuracy of simple gaze gestures because of sufficient eye gaze points but simple gaze gestures have limited applications compared to complex gaze gestures. Complex gaze gestures need a composition of multiple subunits of eye fixation to contain a sequence of gaze points that are clustered and rotated with an underlying complex head orientation relationship. This paper proposes a new set of eye gaze points and head orientation angles as new sequences to recognize complex gaze gestures. Eye gaze points and head orientation angles have a powerful influence on gaze gesture formation. The new sequence was obtained by aligning clustered gaze points and head orientation angles with a simple moving average (SMA) to denoise and interpolate the gap between successive eye fixations. The aligned new sequence of complex gaze gestures was utilized to train sequential machine learning (ML) algorithms. To evaluate the performance of the proposed method, we recruited and recorded the eye gaze and head orientation features of ten participants using an eye tracker. The results show that Boosted Hidden Markov Models (HMM) using Random Subspace methods achieved the best accuracies of 94.72% and 98.1% for complex, and simple gestures respectively, which outperformed the conventional methods.
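
The abstract above mentions a simple moving average (SMA) used to denoise the gaze sequence. As a rough illustration only, the sketch below applies a generic SMA to a list of 2D gaze points. The function name, the window size, and the `(x, y)` tuple representation are assumptions, and the paper's alignment with head orientation angles is not reproduced here.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def simple_moving_average(points: List[Point], window: int) -> List[Point]:
    """Smooth a sequence of (x, y) gaze points with a sliding mean.

    Returns len(points) - window + 1 averaged points, or an empty list
    when the sequence is shorter than the window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(points) - window + 1):
        chunk = points[i:i + window]
        mean_x = sum(p[0] for p in chunk) / window
        mean_y = sum(p[1] for p in chunk) / window
        smoothed.append((mean_x, mean_y))
    return smoothed
```

A larger window removes more jitter but also blurs short fixations, so the window size would need tuning against the tracker's sampling rate.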


“…Sequential illustration of the shooting scenario for the data collection. Source: [33]. also been used to automatically extract feature maps from joints connected spatially between each other, as well as temporally through time [15].…”

Section: Action Classification (mentioning)

confidence: 99%

Explaining rifle shooting factors through multi-sensor body tracking

Flyckt

Andersson

Westphal

et al. 2023

IDA


There is a lack of data-driven training instructions for sports shooters, as instruction has commonly been based on subjective assessments. Many studies have correlated body posture and balance to shooting performance in rifle shooting tasks, but have mostly focused on single aspects of postural control. This study has focused on finding relevant rifle shooting factors by examining the entire body over sequences of time. A data collection was performed with 13 human participants carrying out live rifle shooting scenarios while being recorded with multiple body tracking sensors. A pre-processing pipeline produced a novel skeleton sequence representation, which was used to train a transformer model. The predictions from this model could be explained on a per sample basis using the attention mechanism, and visualised in an interactive format for humans to interpret. It was possible to separate the different phases of a shooting scenario from body posture with a high classification accuracy (80%). Shooting performance could be detected to an extent by separating participants using their strong and weak shooting hand. The dataset and pre-processing pipeline, as well as the techniques for generating explainable predictions presented in this study have laid the groundwork for future research in the sports shooting domain.
