Families in Wild Multimedia: A Multimodal Database for Recognizing Kinship

Robinson, Joseph P.; Khan, Zaid; Yin, Yu; Shao, Ming; Fu, Yun

doi:10.1109/tmm.2021.3103074

Cited by 14 publications

(7 citation statements)

References 69 publications

“…The facial video kinship datasets are the ones that only the facial information is available, including UvA-NEMO Smile [32], KFVW [33], and KIVI [34]. The video and audio kinship datasets include the TALKIN dataset [20] and FIW MM dataset [35]. The TALKIN dataset is the first audio-visual kinship dataset.…”

Section: Related Work (mentioning)

confidence: 99%

Audio-Visual Kinship Verification: A New Dataset and a Unified Adaptive Adversarial Multimodal Learning Approach

Zhang, López, et al. 2024

IEEE Trans. Cybern.

Facial kinship verification refers to automatically determining whether two people have a kin relation from their faces. It has become a popular research topic due to potential practical applications. Over the past decade, many efforts have been devoted to improving the verification performance from human faces only while lacking other biometric information, for example, speaking voice. In this article, to interpret and benefit from multiple modalities, we propose for the first time to combine human faces and voices to verify kinship, which we refer it as the audio-visual kinship verification study. We first establish a comprehensive audio-visual kinship dataset that consists of familial talking facial videos under various scenarios, called TALKIN-Family. Based on the dataset, we present the extensive evaluation of kinship verification from faces and voices. In particular, we propose a deep-learning-based fusion method, called unified adaptive adversarial multimodal learning (UAAML). It consists of the adversarial network and the attention module on the basis of unified multimodal features. Experiments show that audio (voice) information is complementary to facial features and useful for the kinship verification problem. Furthermore, the proposed fusion method outperforms baseline methods. In addition, we also evaluate the human verification ability on a

“…The facial video kinship datasets are the ones that only the facial information is available, including UvA-NEMO Smile [32], KFVW [33] and KIVI [34]. The video and audio kinship datasets include TALKIN dataset [20] and FIW MM dataset [35]. The TALKIN dataset is the first audiovisual kinship dataset.…”

Section: Related Work (mentioning)

confidence: 99%

Audio-Visual Kinship Verification: a New Dataset and a Unified Adaptive Adversarial Multimodal Learning Approach

Wu, Feng, Zhang, et al. 2022

Preprint

Facial kinship verification refers to automatically determining whether two people have a kin relation from their faces. It has become a popular research topic due to potential practical applications, such as finding missing children, family photo organization, or criminal investigations. Over the past decade, many efforts have been devoted to improving the verification performance of human faces only while lacking other biometric information, e.g., speaking voice. In this paper, to interpret and benefit from multiple modalities, we propose for the first time to combine human faces and voices to verify kinship, which we refer it as the audio-visual kinship verification study. Since there is still no standard and public audiovisual kinship dataset, we first establish a comprehensive audio-visual kinship dataset that consists of familial talking facial videos under various scenarios, called TALKIN-Family. Based on the dataset, we present the extensive evaluation of kinship verification from faces and voices. In particular, we propose a deep learning-based fusion method, named Unified Adaptive Adversarial Multimodal Learning (UAAML). It consists of the adversarial network and the attention module on the basis of unified multi-modal features. First, the modality adversarial learning eliminates the cross-modality variations by confusing the discriminator. The attention module quantifies the importance of kinship interested features. The overall multimodal fusion network is trained in Siamese fashion to encourage the compactness of kinship and separation of non-kinship. Experiments show that audio (voice) information is complementary to facial features and useful for the kinship verification problem. Further, the proposed fusion method outperforms baseline methods. In addition, we also evaluate the human kinship verification ability on a sub-set of TALKIN-Family. It indicates that human has higher accuracy when they have access to both faces and voice. The machine learning methods could effectively and efficiently outperform human ability. Finally, we include the future work and research opportunities with the TALKIN-Family dataset.

“…The large-scale RFIW dataset [17] was the only one shown to be unbiased, consisting of kin pairs cropped from different photos. The novel FIW in Multimedia (FIW-MM) database, recently introduced by Robinson et al. [18], is a large-scale multi-modal kin verification dataset, consisting of video, audio, and contextual transcripts…”

Section: Introduction (mentioning)

confidence: 99%

A Unified Approach to Kinship Verification

Dahan, Keller 2020

Preprint

In this work, we propose a deep learning-based approach for kin verification using a unified multi-task learning scheme where all kinship classes are jointly learned. This allows us to better utilize small training sets that are typical of kin verification. We introduce a novel approach for fusing the embeddings of kin images, to avoid overfitting, which is a common issue in training such networks. An adaptive sampling scheme is derived for the training set images to resolve the inherent imbalance in kin verification datasets. A thorough ablation study exemplifies the effectivity of our approach, which is experimentally shown to outperform contemporary state-of-the-art kin verification results when applied to the Families In the Wild, FG2018, and FG2020 datasets.
