2019
DOI: 10.3390/electronics8121394

Zero-Shot Deep Learning for Media Mining: Person Spotting and Face Clustering in Video Big Data

Abstract: The analysis of frame sequences in talk show videos, which is necessary for media mining and television production, requires significant manual efforts and is a very time-consuming process. Given the vast amount of unlabeled face frames from talk show videos, we address and propose a solution to the problem of recognizing and clustering faces. In this paper, we propose a TV media mining system that is based on a deep convolutional neural network approach, which has been trained with a triplet loss minimization…
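
The abstract names triplet loss minimization but is cut off before giving any details. As a rough illustration only, the sketch below assumes the widely used FaceNet-style formulation (squared Euclidean distance between embeddings with a hinge margin). The function names, the toy 2-D embeddings, and the margin value of 0.2 are assumptions made for this example and are not taken from the paper.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss for a single (anchor, positive, negative) triplet.

    The loss is zero once the anchor is closer to the positive (same identity)
    than to the negative (different identity) by at least `margin`.
    The default margin of 0.2 is an assumed value, not one reported in the source.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


# Toy embeddings (hypothetical): two frames of the same face, one of another face.
same_a = [1.0, 0.0]
same_b = [1.0, 0.0]
other = [0.0, 1.0]

well_separated = triplet_loss(same_a, same_b, other)   # negative is far away
badly_separated = triplet_loss(same_a, other, same_b)  # roles swapped on purpose
```

With these toy vectors the first triplet already satisfies the margin and contributes no loss, while the deliberately swapped triplet is penalized. Training on many such triplets pushes embeddings of the same face together and those of different faces apart, which is what makes the embeddings usable for clustering faces that were never labeled.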

Citations: cited by 9 publications (10 citation statements)
References: 33 publications
“…The results showed effectiveness on large datasets, with the model outperforming other state-of-the-art approaches on very low-resolution images and images with some disguises. Abdallah et al. [16] proposed a zero-shot learning model consisting of 19 CNN layers for person spotting and face clustering in video stream data. The proposed network extracts face feature vectors similar to FaceNet-extracted embeddings from the pre-whitening processed video frames.…”
Section: Related Work (mentioning)
confidence: 99%
“…Recently, face image clustering has attracted a lot of interest from researchers [7][8][9][10][11]. L. Zhang et al. [7] presented a clustering method based on a spectral clustering and self-organizing feature mapping (SOM) neural network.…”
Section: Introduction (mentioning)
confidence: 99%
“…M.S. Abdallah et al. [9] proposed a TV media mining system based on a DCNN to rapidly identify a specific individual in real-time processing video data. I. Ahn et al. [10] proposed a multiple segmentation technique combined with constrained spectral clustering to label facial images containing objects with complicated boundaries.…”
Section: Introduction (mentioning)
confidence: 99%
“…All wirelessly connected devices collect petabytes of data that allow detecting objects and processes on an unprecedented scale [3]. The most notable examples are automatic face recognition [4], classification of hyperspectral data [5], automatic object detection and classification [6][7][8], remote gesture sensing [9][10][11], wireless detection, and location [12][13][14], all powered by internet data. Many of these applications are the kind of regression problem for which DNN is the right solution [15].…”
Section: Introduction (mentioning)
confidence: 99%