Where Are They Going? Predicting Human Behaviors in Crowded Scenes

Zhang, Bo; Zhang, Rui; Bisagno, Niccolo; Conci, Nicola; De Natale, Francesco G. B.; Liu, Hongbo

doi:10.1145/3449359

Cited by 13 publications

(4 citation statements)

References 34 publications

“…Alabdulkarim et al [23] defined crowd management as the practice of controlling crowd activities before, during, and after events, including handling all elements such as personnel, venues, facilities, data, and technology. In terms of management strategies, scholars have studied various aspects such as sociology, psychology, and computer science, including crowd evacuation [24], crowd behavior [25][26][27][28], and crowd modeling [29][30][31][32]. Traditional crowd management strategies need to be integrated with technological means to provide accurate crowd-related information for optimal management [33].…”

Section: Safety Of Highly Aggregated Tourist Crowds (mentioning)

confidence: 99%

Early warning on safety risk of highly aggregated tourist crowds based on VGGT-Count network model

Liu, Wu, Liu, 2024

PLoS ONE

In the era of mass tourism, more and more people are attracted to internet-famous sites. As demand for travel has surged, tourists gather at a single scenic spot with numbers doubling, which easily leads to high concentrations of tourists and uncontrollable security risks. This needs to be taken seriously by tourism departments. Monitoring and issuing warnings for crowd density in scenic areas with Highly Aggregated Tourist Crowds (HATCs) is an urgent challenge that needs to be addressed. In this paper, HATCs are taken as the research object, and a VGGT-Count network model is proposed to forecast the density of HATCs. The experimental outcomes demonstrated a substantial improvement in counting accuracy for the ShanghaiTech B and UCF-QNRF datasets. Furthermore, the model allows for real-time monitoring of tourist attractions, enabling advance prediction of high concentrations in scenic areas. This timely information can alert relevant authorities to implement preventive measures such as crowd control and flow regulation, thereby minimizing safety hazards.

“…Although these RNN-based approaches performed an interesting exploration, one can still observe unsatisfactory aspects in the predicted motion sequences. In order to fix these limitations, several works use feed-forward networks other than RNNs to model human pose [3,23,26,33,34,40,48]. For example, Butepage et al [3] proposed a deep learning fully-connected network that investigates different strategies to encode temporal, and historical information and generalizes well to new, unseen motions.…”

Section: Related Work (mentioning)

confidence: 99%

Bidirectional Transformer GAN for Long-term Human Motion Prediction

Zhao, Tang, Xie et al. 2023

ACM Trans. Multimedia Comput. Commun. Appl.

The mainstream motion prediction methods usually focus on short-term prediction, and their predicted long-term motions often fall into an average pose, i.e. the freezing forecasting problem [27]. To mitigate this problem, we propose a novel Bidirectional Transformer-based Generative Adversarial Network (BiTGAN) for long-term human motion prediction. The bidirectional setup leads to consistent and smooth generation in both forward and backward directions. Besides, to make full use of the history motions, we split them into two parts. The first part is fed to the Transformer encoder in our BiTGAN while the second part is used as the decoder input. This strategy can alleviate the exposure problem [37]. Additionally, to better maintain both the local (i.e., frame-level pose) and global (i.e., video-level semantic) similarities between the predicted motion sequence and the real one, the soft dynamic time warping (Soft-DTW) loss is introduced into the generator. Finally, we utilize a dual-discriminator to distinguish the predicted sequence at both frame and sequence levels. Extensive experiments on the public Human3.6M dataset demonstrate that our proposed BiTGAN achieves state-of-the-art performance on long-term (4 s) human motion prediction, and reduces the average error of all actions by 4%.

“…Since these patches retains no information about the position, therefore, positional information is added in the form of as demonstrated in Figure 3(a). The final sequence of patches with token achieved because of these operations is given in the Equation (4).…”

Section: Active Model Description (mentioning)

confidence: 99%

“…Data acts as fuel for Deep Learning (DL) models and in today's era, the availability of huge amount of multimedia data enable DL models to be applied in various realms of life [1,2]. Different industrial domains such as surveillance [3,4], medical sciences [5,6], remote sensing [7], disaster management [8], defense [9], transportation [10], and entertainment [11,12] have been tremendously flourished with the advancements in DL [13]. However, the acquisition of properly annotated data for training is deemed as a substantial challenge to the wider adoption of DL models in the industry.…”

Section: Introduction (mentioning)

confidence: 99%

PMAL: A Proxy Model Active Learning Approach for Vision Based Industrial Applications

Khan, Haq, Hussain et al. 2022

ACM Trans. Multimedia Comput. Commun. Appl.

Deep Learning models’ performance strongly correlates with the availability of annotated data; however, massive data labelling is laborious, expensive, and error-prone when performed by human experts. Active Learning (AL) effectively handles this challenge by selecting the uncertain samples from an unlabeled data collection, but the existing AL approaches involve repetitive human feedback for labelling uncertain samples, thus rendering these techniques infeasible to deploy in industry-related real-world applications. In the proposed Proxy Model based Active Learning technique (PMAL), this issue is addressed by replacing the human oracle with a deep learning model, where human expertise is reduced to labelling only two small subsets of data for training the proxy model and initializing the AL loop. In the PMAL technique, firstly, the proxy model is trained with a small subset of labeled data, and subsequently acts as an oracle for annotating uncertain samples. Secondly, the active model's training, uncertain sample extraction via uncertainty sampling, and annotation through the proxy model are carried out for a predefined number of iterations to achieve higher accuracy and more labelled data. Finally, the active model is evaluated using testing data to verify the effectiveness of our technique for practical applications. The correct annotations by the proxy model are ensured by employing the potential of explainable artificial intelligence. Similarly, an emerging vision transformer is used as the active model to achieve maximum accuracy. Experimental results reveal that the proposed method outperforms the state of the art in terms of minimum labelled data usage and improves the accuracy by 2.2%, 2.6%, and 1.35% on the Caltech-101, Caltech-256, and CIFAR-10 datasets, respectively. Since the proposed technique offers a highly reasonable solution for exploiting huge multimedia data, it can be widely used in different evolutionary industrial domains.
