Peixi Xiong scite author profile

Peixi Xiong

5 Publications

16 Citation Statements Received

136 Citation Statements Given


Affiliations

Northwestern University

Publications


Visual Query Answering by Entity-Attribute Graph Matching and Reasoning

Xiong, Zhan, Wang, et al. 2019


Visual Query Answering (VQA) offers people great convenience: one can ask a question about an image, whether about the details of objects or about a high-level understanding of the scene. This paper proposes a novel method to address the VQA problem. In contrast to prior works, our method, which targets single-scene VQA, relies on graph-based techniques and involves reasoning. In a nutshell, our approach is centered on three graphs. The first graph, referred to as the inference graph $G_I$, is constructed by learning over labeled data. The other two graphs, referred to as the query graph $Q$ and the entity-attribute graph $G_{EA}$, are generated from the natural language query $Q_{nl}$ and the image $Img$ issued by users, respectively. As $G_{EA}$ often does not contain sufficient information to answer $Q$, we develop techniques to infer the missing information of $G_{EA}$ with $G_I$. Based on $G_{EA}$ and $Q$, we provide techniques to find matches of $Q$ in $G_{EA}$, which serve as the answer to $Q_{nl}$ over $Img$. Unlike commonly used VQA methods that are based on end-to-end neural networks, our graph-based method shows well-designed reasoning capability and is thus highly interpretable. We also create a dataset of soccer matches (Soccer-VQA) with rich annotations. The experimental results show that our approach outperforms the state-of-the-art method and has high potential for future investigation.
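The abstract describes two steps: completing missing information in $G_{EA}$ using $G_I$, then finding matches of $Q$ in $G_{EA}$. The toy sketch below illustrates those two steps under our own assumptions only: the inference graph is reduced to attribute-implication rules applied until a fixpoint, and matching is a plain backtracking subgraph search. The names `infer_missing` and `match` and the soccer example data are hypothetical and are not taken from the paper.

```python
# Toy sketch (our assumptions, not the paper's data model):
# an entity-attribute graph is a dict {entity: set_of_attributes} plus a set of
# (source, relation, target) edges; the inference graph is reduced to
# attribute-implication rules {attribute: set_of_implied_attributes}.

def infer_missing(nodes, rules):
    """Forward-chain implied attributes on every entity until nothing changes."""
    out = {entity: set(attrs) for entity, attrs in nodes.items()}
    changed = True
    while changed:
        changed = False
        for attrs in out.values():
            for attr in list(attrs):  # iterate over a snapshot while extending attrs
                implied = rules.get(attr, set()) - attrs
                if implied:
                    attrs |= implied
                    changed = True
    return out


def match(query_nodes, query_edges, nodes, edges):
    """Backtracking search for injective maps from query variables to entities.

    query_nodes: {variable: required_attributes}
    query_edges: set of (variable, relation, variable)
    """
    variables = list(query_nodes)
    results = []

    def edges_ok(assign):
        # Check only query edges whose two endpoints are already assigned.
        return all(
            (assign[s], r, assign[d]) in edges
            for s, r, d in query_edges
            if s in assign and d in assign
        )

    def search(i, assign):
        if i == len(variables):
            results.append(dict(assign))
            return
        var = variables[i]
        for entity, attrs in nodes.items():
            if entity in assign.values() or not query_nodes[var] <= attrs:
                continue
            assign[var] = entity
            if edges_ok(assign):
                search(i + 1, assign)
            del assign[var]

    search(0, {})
    return results


# Hypothetical soccer scene: "Which team does the player kicking the ball play for?"
nodes = {"p1": {"player", "red_jersey"}, "p2": {"player", "blue_jersey"}, "b": {"ball"}}
edges = {("p1", "kicks", "b")}
rules = {"red_jersey": {"team_A"}, "blue_jersey": {"team_B"}}

completed = infer_missing(nodes, rules)
answers = match({"x": {"player"}, "y": {"ball"}}, {("x", "kicks", "y")}, completed, edges)
for a in answers:
    print(a, completed[a["x"]] & {"team_A", "team_B"})
```

Separating completion from matching keeps each answer traceable: the printed team comes from an explicit rule firing on an explicit matched entity, which is the kind of interpretability the abstract contrasts with end-to-end networks.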


Towards Better Driver Safety: Empowering Personal Navigation Technologies with Road Safety Awareness

Xu, Zhang, Zhao, et al. 2020

Preprint


SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering

Xiong, You, Yu, et al. 2022

Preprint


Visual Question Answering (VQA) attracts much attention from both industry and academia. As a multi-modality task, it is challenging since it requires not only visual and textual understanding, but also the ability to align cross-modality representations. Previous approaches extensively employ entity-level alignments, such as the correlations between visual regions and their semantic labels, or the interactions across question words and object features. These attempts aim to improve the cross-modality representations while ignoring their internal relations. Instead, we propose to apply structured alignments, which work with graph representations of visual and textual content, aiming to capture the deep connections between the visual and textual modalities. However, it is nontrivial to represent and integrate graphs for structured alignments. In this work, we attempt to solve this issue by first converting entities from the different modalities into sequential nodes and an adjacency graph, and then incorporating them for structured alignments. As demonstrated in our experimental results, such a structured alignment improves reasoning performance. In addition, our model exhibits better interpretability for each generated answer. The proposed model, without any pretraining, outperforms the state-of-the-art methods on the GQA dataset and surpasses the non-pretrained state-of-the-art methods on the VQA-v2 dataset.
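As a rough illustration of "sequential nodes and an adjacency graph", the sketch below places visual nodes and then textual nodes in one sequence, builds a joint adjacency matrix from intra-modality edges and cross-modality edges, and uses it to mask dot-product self-attention. This is an assumption-laden toy, not the SA-VQA architecture: the edge sources (scene-graph relations, dependency arcs), the single-head attention with no learned projections, and the function names `build_joint_graph` and `masked_attention` are all ours.

```python
import math


def build_joint_graph(n_vis, n_txt, vis_edges, txt_edges, cross_edges):
    """Place visual nodes first, then textual nodes, in one sequence and return
    a symmetric boolean adjacency matrix (with self-loops) over that sequence."""
    n = n_vis + n_txt
    adj = [[i == j for j in range(n)] for i in range(n)]

    def link(i, j):
        adj[i][j] = adj[j][i] = True

    for i, j in vis_edges:      # visual-visual, e.g. assumed scene-graph relations
        link(i, j)
    for i, j in txt_edges:      # word-word, e.g. assumed dependency-parse arcs
        link(n_vis + i, n_vis + j)
    for i, j in cross_edges:    # (visual index, word index) pairs
        link(i, n_vis + j)
    return adj


def masked_attention(x, adj):
    """Single-head dot-product self-attention without learned projections, in
    which node i only attends to nodes j with adj[i][j] True."""
    d = len(x[0])
    out = []
    for i, xi in enumerate(x):
        scores = [
            sum(a * b for a, b in zip(xi, xj)) / math.sqrt(d) if adj[i][j] else -math.inf
            for j, xj in enumerate(x)
        ]
        m = max(scores)                        # finite thanks to the self-loop
        w = [math.exp(s - m) for s in scores]  # masked entries become exp(-inf) == 0.0
        z = sum(w)
        out.append([sum(wj * xj[k] for wj, xj in zip(w, x)) / z for k in range(d)])
    return out


# Two image regions followed by two question words (toy 2-d features).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
adj = build_joint_graph(2, 2, vis_edges=[(0, 1)], txt_edges=[], cross_edges=[(0, 0)])
y = masked_attention(x, adj)
print(y)
```

In this toy, the last word has no edges except its self-loop, so its output equals its input; the graph structure, not the raw feature similarity, decides which nodes can exchange information.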


Visual question answering by pattern matching and reasoning

Zhan, Xiong, Wang, et al. 2022

Neurocomputing


MGA-VQA: Multi-Granularity Alignment for Visual Question Answering

Xiong, Shen, Jin 2022

Preprint


Learning to answer visual questions is a challenging task since the multi-modal inputs lie in two different feature spaces. Moreover, reasoning in visual question answering requires the model to understand both the image and the question and to align them in the same space, rather than simply memorize statistics about question-answer pairs. Thus, it is essential to find the connections between components across modalities and within each modality to achieve better attention. Previous works learned attention weights directly on the features. However, the improvement is limited since the two modality features lie in different domains: image features are highly diverse and lack the structure and grammatical rules of language, while natural language features are more likely to miss detailed information. To better learn the attention between vision and text, we focus on how to construct input stratification and embed structural information to improve the alignment between components at different levels. We propose the Multi-Granularity Alignment architecture for the Visual Question Answering task (MGA-VQA), which learns intra- and inter-modality correlations by multi-granularity alignment and outputs the final result through a decision fusion module. In contrast to previous works, our model splits alignment into different levels to learn better correlations without needing additional data or annotations. Experiments on the VQA-v2 and GQA datasets demonstrate that our model significantly outperforms non-pretrained state-of-the-art methods on both datasets without extra pretraining data or annotations. Moreover, it even achieves better results than pre-trained methods on GQA.
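To make "input stratification", "multi-granularity alignment" and "decision fusion" concrete, the following is a minimal sketch under stated assumptions, not the MGA-VQA model. Mean pooling over hand-picked spans stands in for the paper's stratification, cosine-similarity attention over image regions stands in for its alignment modules, and a weighted average of per-level softmax distributions stands in for its decision fusion module. All function names, span boundaries and the per-level logits are hypothetical placeholders.

```python
import math


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    z = sum(e)
    return [x / z for x in e]


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def mean_pool(feats, spans):
    """Build a coarser granularity level by averaging vectors over [start, end) spans."""
    return [[sum(col) / (e - s) for col in zip(*feats[s:e])] for s, e in spans]


def align(text_units, regions):
    """Attend from each text unit over image regions using cosine similarity."""
    d = len(regions[0])
    out = []
    for t in text_units:
        w = softmax([cosine(t, r) for r in regions])
        out.append([sum(wi * r[k] for wi, r in zip(w, regions)) for k in range(d)])
    return out


def decision_fusion(level_logits, weights):
    """Late fusion: weighted average of per-level answer distributions."""
    probs = [softmax(logits) for logits in level_logits]
    total = sum(weights)
    return [sum(w * p[k] for w, p in zip(weights, probs)) / total for k in range(len(probs[0]))]


# Hypothetical example: 4 word vectors grouped into 2 phrases and 1 sentence.
words = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
levels = {
    "word": words,
    "phrase": mean_pool(words, [(0, 2), (2, 4)]),
    "sentence": mean_pool(words, [(0, 4)]),
}
regions = [[1.0, 0.1], [0.0, 1.0]]
aligned = {name: align(units, regions) for name, units in levels.items()}

# Placeholder per-level answer logits over 2 candidate answers; a trained
# classifier head would compute these from `aligned` in a real model.
fused = decision_fusion([[2.0, 0.5], [1.5, 1.0], [0.2, 0.1]], weights=[1.0, 1.0, 1.0])
print(fused)
```

Fusing at the decision level, as sketched here, lets each granularity contribute its own answer distribution, so a level that aligns poorly for a given question can be outvoted rather than corrupting a single shared representation.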


