Tracking Human Faces in Real-Time

Yang, Jie; Waibel, Alex

doi:10.21236/ada303256

Cited by 67 publications

(22 citation statements)

References 34 publications


“…9,10 However, for our research we used the Gaussian model in (R, G, B) space because this model is more sensitive to the skin color's brightness, and thus much more suitable for the model tailored for each face sequence.…”

Section: Skin-color Model Extraction and Tracking (mentioning)

confidence: 99%

Name-It: naming and detecting faces in news videos

Satoh

1999


We developed Name-It, a system that associates faces and names in news videos. It processes information from the videos and can infer possible name candidates for a given face or locate a face in news videos by name. To accomplish this task, the system takes a multimodal video analysis approach: face sequence extraction and similarity evaluation from videos, name extraction from transcripts, and video-caption recognition.

The Name-It system [1,2] associates names and faces in news videos. Assume that we're watching a TV news program. When persons we don't know appear in the news video, we can eventually identify most of them by watching only the video. To do this, we detect faces from a news video, locate names in the sound track, and then associate each face to the correct name. For face-name association, we use as many hints as possible based on structure, context, and meaning of the news video. We don't need any additional knowledge such as newspapers containing descriptions of the persons or biographical dictionaries with pictures. Similarly, Name-It can associate faces in news videos with their right names without using an a priori face-name association set. In other words, Name-It extracts face-name correspondences only from news videos.

Name-It takes a multimodal approach to accomplish this task. For example, it uses several information sources available from news videos: image sequences, transcripts, and video captions. Name-It detects face sequences from image sequences and extracts name candidates from transcripts. It's possible to obtain transcripts from audio tracks by using the proper speech recognition technique with an allowance for recognition errors. However, most news broadcasts in the US already have closed captions. (In the near future, the worldwide trend will be for broadcasts to feature closed captions.) Thus we use closed-caption texts as transcripts for news videos. In addition, we employ video-caption detection and recognition. We used "CNN Headline News" as our primary source of news for our experiments.

Given image sequences, transcripts, and video captions as information sources, Name-It associates extracted faces with extracted name candidates using the correlation of their timing information and face similarity information. Video captions are also taken into account as supplementary information. To associate faces and names, Name-It integrates several advanced image processing and natural-language processing techniques: face sequence extraction and similarity evaluation from videos, name extraction from transcripts, and video-caption recognition. Although these technologies aren't always highly accurate, integrating these results will help the system achieve more accurate output.

With respect to face-name association, the Piction system [3] works similarly to Name-It. Piction identifies faces within a given captioned newspaper photograph by extracting faces from the photograph and analyzing the caption to obtain geometric constraints among faces. The system then labels each face with a name. ...
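The citation statement for this entry mentions a Gaussian skin-color model in (R, G, B) space that is tailored to each face sequence. A minimal sketch of that idea follows, assuming the model is a single 3-D Gaussian fitted to sample pixels and that a pixel is accepted when its squared Mahalanobis distance falls below a cutoff. The function names, the regularization term, and the cutoff value are illustrative assumptions, not details taken from the cited papers.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B) values
Matrix = List[List[float]]


def fit_gaussian(samples: Sequence[Pixel]) -> Tuple[List[float], Matrix]:
    """Estimate the mean and sample covariance of RGB pixels from one face sequence."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two sample pixels")
    mean = [sum(p[i] for p in samples) / n for i in range(3)]
    cov = [
        [
            sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in samples) / (n - 1)
            for j in range(3)
        ]
        for i in range(3)
    ]
    # Assumption: a tiny diagonal term keeps the covariance invertible.
    for i in range(3):
        cov[i][i] += 1e-6
    return mean, cov


def invert_3x3(m: Matrix) -> Matrix:
    """Invert a 3x3 matrix using the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("covariance matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def mahalanobis_sq(pixel: Pixel, mean: Sequence[float], inv_cov: Matrix) -> float:
    """Squared Mahalanobis distance of a pixel from the model mean."""
    diff = [pixel[k] - mean[k] for k in range(3)]
    return sum(diff[r] * inv_cov[r][c] * diff[c] for r in range(3) for c in range(3))


def is_skin(pixel: Pixel, mean: Sequence[float], inv_cov: Matrix,
            cutoff: float = 9.0) -> bool:
    """Accept a pixel as skin when it lies close to the model (cutoff is hypothetical)."""
    return mahalanobis_sq(pixel, mean, inv_cov) <= cutoff


# Made-up sample pixels standing in for one face sequence.
samples = [
    (200, 150, 120), (210, 160, 125), (190, 140, 118),
    (205, 152, 130), (195, 148, 115), (215, 158, 128),
]
mean, cov = fit_gaussian(samples)
inv_cov = invert_3x3(cov)
print(is_skin((202, 151, 122), mean, inv_cov))
print(is_skin((20, 40, 220), mean, inv_cov))
```

Because the model is fitted to raw (R, G, B) values rather than to brightness-normalized chromaticities, changes in brightness move pixels away from the mean, which is consistent with the sensitivity to brightness that the citation statement describes.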


“…The image sensor network consists of twelve image sensors to be uniformly arranged in the multichannel playback environment and it has 30-degree resolution for detecting the direction of the human face. To estimate the direction of the human face, we used the normalized RGB (red, green, and blue) and the HSV (hue, saturation, and value) calculated from the images obtained by the image sensor network [12][13][14]. It is because the normalized RGB and the HSV are useful for detecting the human skin region in the images.…”

Section: Introduction (mentioning)

confidence: 99%

Automatic Sound Scene Control Using Image Sensor Network

Cho, Park, Kim

2014

International Journal of Distributed Sensor Networks


We proposed the automatic sound scene control system using the image sensor network for preserving the constant sound scene without respect to the users' movement. In the proposed system, the image sensor network detects the human location in the multichannel playback environment and the SSC (sound scene control) module automatically controls the sound scene of the multichannel audio signals according to the estimated human location which is the angle information. To estimate the direction of the human face, we used the normalized RGB (red, green, and blue) and the HSV (hue, saturation, and value) calculated from the images obtained by the image sensor network. The direction of the human face can be easily decided as the image sensor to capture the image with the highest number of pixels to satisfy the thresholds of the normalized RGB and the HSV. The estimated direction of the human face is directly fed to the SSC module, and the controlled sound scene can be simply generated. Experimental results show that the image sensor network successfully detected the human location with the accuracy of about 98% and the controlled sound scene by the SSC according to the detected human location was perceived as the original sound scene with the accuracy of 95%.
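The selection rule in this abstract (pick the image sensor whose image contains the most pixels satisfying the normalized RGB and HSV thresholds) can be sketched roughly as below. This is only an illustration: the threshold values, the function names, and the mapping from sensor index to angle in 30-degree steps are assumptions for the example, since the abstract does not give the actual thresholds or sensor layout details.

```python
import colorsys
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B)

# Hypothetical thresholds, chosen only for illustration.
R_NORM_RANGE = (0.36, 0.60)   # normalized red:   R / (R + G + B)
G_NORM_RANGE = (0.25, 0.37)   # normalized green: G / (R + G + B)
HUE_MAX = 50.0 / 360.0        # hue upper bound on a 0..1 scale
SAT_RANGE = (0.15, 0.75)      # saturation bounds


def is_skin(pixel: Pixel) -> bool:
    """Check a pixel against both the normalized RGB and the HSV thresholds."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        return False  # black pixel: normalized RGB is undefined
    r_norm, g_norm = r / total, g / total
    hue, sat, _value = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return (
        R_NORM_RANGE[0] <= r_norm <= R_NORM_RANGE[1]
        and G_NORM_RANGE[0] <= g_norm <= G_NORM_RANGE[1]
        and hue <= HUE_MAX
        and SAT_RANGE[0] <= sat <= SAT_RANGE[1]
    )


def count_skin_pixels(image: Sequence[Pixel]) -> int:
    """Count the pixels of one sensor image that pass the skin test."""
    return sum(1 for pixel in image if is_skin(pixel))


def estimate_direction(images: Sequence[Sequence[Pixel]],
                       step_degrees: float = 30.0) -> float:
    """Return the angle of the sensor with the most skin pixels.

    Assumes sensor k faces k * step_degrees; ties go to the lowest index.
    """
    counts = [count_skin_pixels(image) for image in images]
    best = max(range(len(counts)), key=counts.__getitem__)
    return best * step_degrees


# Three tiny made-up "images", each a flat list of pixels.
images = [
    [(20, 40, 220), (20, 40, 220), (20, 40, 220)],        # sensor 0: no skin
    [(200, 150, 120), (210, 160, 125), (128, 128, 128)],  # sensor 1: two skin pixels
    [(200, 150, 120), (20, 40, 220), (128, 128, 128)],    # sensor 2: one skin pixel
]
print(estimate_direction(images))
```

With twelve sensors spaced uniformly, a 30-degree step covers the full circle, which matches the 30-degree resolution stated in the citation statement.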


“…These properties are particularly important for a real-time face/human tracking system. Successful applications of color-based algorithms include some state-of-the-art face/human tracking systems, such as Pfinder [5] and Yang's face tracker [6]. In this work, we first study the properties of various skin-color filters, which are typically used for face detection tasks.…”

Section: Introduction (mentioning)

confidence: 99%