Multi-Modal Entity Alignment Method Based on Feature Enhancement

Wang, Huansha; Liu, Qinrang; Huang, Ruiyang; Zhang, Jianpeng

doi:10.3390/app13116747

Cited by 1 publication

(1 citation statement)

References 21 publications

“…With the development of cross-disciplinary research between knowledge engineering and multimodal learning, multimodal knowledge graphs (KG) [1] have become increasingly crucial as a means to assist computers in understanding the entity background knowledge in many artificial intelligence applications, such as question answering systems [2], recommendation systems [3], natural language understanding [4], and scene graph generation [5]. In recent years, many researchers have constructed numerous multimodal knowledge graphs targeting different domains and languages.…”

Section: Introduction (mentioning)

confidence: 99%

MDSEA: Knowledge Graph Entity Alignment Based on Multimodal Data Supervision

Fang, Yan

2024

Applied Sciences

With the development of social media, the internet, and sensing technologies, multimodal data are becoming increasingly common. Integrating these data into knowledge graphs can help models better understand and use these rich sources of information. Existing entity alignment methods for knowledge graphs extract features from different modalities, such as structure, text, attributes, and images, fuse them, and compute the similarity of entities across knowledge graphs from the fused features. However, the structures, attributes, images, and text descriptions of different knowledge graphs often differ significantly, so directly integrating information from different modalities can easily introduce noise and degrade entity alignment. To address these issues, this paper proposes a knowledge graph entity alignment method based on multimodal data supervision. First, a Transformer is used to obtain encoded representations of knowledge graph entities. Then, a multimodal supervised method is used to learn the entity representations so that the entity vectors contain rich multimodal semantic information, which improves the generalization ability of the learned representations. Finally, the information from different modalities is mapped to a shared low-dimensional subspace in which similar entities lie closer together, improving entity alignment. In experiments on the DBP15K dataset, the method achieves the best results compared with methods such as MTransE, JAPE, EVA, and DNCN.
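The last two steps of the abstract (mapping modal features into a shared subspace, then matching entities by similarity) can be illustrated with a minimal sketch. This is not the MDSEA implementation: there is no Transformer encoder and no training. The projection matrices `W_STRUCT` and `W_IMAGE`, the averaging fusion in `fuse`, the cosine similarity, and the toy entities are all assumptions chosen only to make the idea concrete.

```python
import math

def project(vec, matrix):
    """Linearly map a feature vector into the shared subspace (one matrix row per output dim)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse(modal_vecs):
    """Average the L2-normalized per-modality vectors (one simple, assumed fusion choice)."""
    normed = [normalize(v) for v in modal_vecs]
    dim = len(normed[0])
    return [sum(v[i] for v in normed) / len(normed) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def align(source, target):
    """Match each source entity to the target entity with the highest cosine similarity."""
    return {
        s: max(target, key=lambda t: cosine(s_vec, target[t]))
        for s, s_vec in source.items()
    }

# Hypothetical, hand-picked projections into a 2-d shared subspace.
# In a real system these would be learned, not fixed.
W_STRUCT = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # 3-d structure features -> 2-d
W_IMAGE = [[1.0, 0.0], [0.0, 1.0]]             # 2-d image features -> 2-d

def embed(struct_vec, image_vec):
    """Project both modalities into the shared subspace and fuse them."""
    return fuse([project(struct_vec, W_STRUCT), project(image_vec, W_IMAGE)])

# Toy entities from two knowledge graphs (made-up feature values).
kg1 = {
    "en:Paris": embed([1.0, 0.1, 0.5], [0.9, 0.2]),
    "en:Berlin": embed([0.1, 1.0, 0.3], [0.2, 0.8]),
}
kg2 = {
    "fr:Paris": embed([0.9, 0.2, 0.1], [1.0, 0.1]),
    "fr:Berlin": embed([0.2, 0.9, 0.7], [0.1, 1.0]),
}

matches = align(kg1, kg2)
print(matches)
```

Nearest-neighbor matching on cosine similarity is only one way to read "similar entities closer in the subspace"; the abstract does not state which similarity measure or fusion scheme the paper uses.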
