Privacy Protection for Social Video via Background Estimation and CRF-Based Videographer's Intention Modeling

Nakashima, Yuta; Babaguchi, Noboru; Fan, Jianping

doi:10.1587/transinf.2015edp7378

Cited by 5 publications

(2 citation statements)

References 29 publications

“…compared our approach to raw support vector machine decisions (SVM) and the previous work in [20] (SVM-CRF), which uses a support vector machine to obtain decision values and applies CRF to them together with features. Since the results in [20] shows that the improvement by the temporal consistency term in their model is not very large, we employed an SVM-CRF model simplified by removing the temporal consistency term. We tuned the hyperparameters of our DNN-based models and SVM-based models (i.e., learning rate, dropout ratio, weight decay ratio, and unit size of hidden layer N for DNN-based models, and γ and C of SVM with the radial basis function) with Bayesian optimization.…”

Section: Results (mentioning)

confidence: 99%

“…Recent progress in large scale datasets [11], [12] and DNN techniques have significantly improved the performance of various vision tasks, such as object classification [13]-[15] and semantic segmentation [16]-[19]. In this work, we also develop a deep model to classify people into important or unimportant ones, which is an extension of [5], [20]. As in these work, we uses a CRF built upon a deep model.…”

Section: Related Work (mentioning)

confidence: 99%

Finding Important People in a Video Using Deep Neural Networks with Conditional Random Fields

Otani, Nishida, Nakashima, et al. 2018

IEICE Trans. Inf. & Syst.

Self Cite

Finding important regions is essential for applications such as content-aware video compression and video retargeting, which automatically crops a region of a video for small screens. Since people are one of the main subjects when taking a video, some methods for finding important regions use a visual attention model based on face/pedestrian detection to incorporate the knowledge that people are important. However, such methods usually do not distinguish important people from passers-by and bystanders, which results in false positives. In this paper, we propose a deep neural network (DNN)-based method that classifies each person as important or unimportant, given a video that contains multiple people in a single frame and is captured with a hand-held camera. Intuitively, the important/unimportant labels of people are highly correlated when their spatial motions are similar. Based on this assumption, we propose to boost the performance of our important/unimportant classification by using conditional random fields (CRFs) built upon the DNN, which can be trained in an end-to-end manner. Our experimental results show that our method successfully classifies important people and that the use of a DNN with CRFs improves the accuracy.
