Yunsheng Bai scite author profile

Graph similarity search is among the most important graph-based applications, e.g. finding the chemical compounds that are most similar to a query compound. Graph similarity/distance computation, such as Graph Edit Distance (GED) and Maximum Common Subgraph (MCS), is the core operation of graph similarity search and many other applications, but very costly to compute in practice. Inspired by the recent success of neural network approaches to several graph applications, such as node or graph classification, we propose a novel neural network based approach to address this classic yet challenging graph problem, aiming to alleviate the computational burden while preserving a good performance. The proposed approach, called SimGNN, combines two strategies. First, we design a learnable embedding function that maps every graph into an embedding vector, which provides a global summary of a graph. A novel attention mechanism is proposed to emphasize the important nodes with respect to a specific similarity metric. Second, we design a pairwise node comparison method to supplement the graph-level embeddings with fine-grained node-level information. Our model achieves better generalization on unseen graphs, and in the worst case runs in quadratic time with respect to the number of nodes in two graphs. Taking GED computation as an example, experimental results on three real graph datasets demonstrate the effectiveness and efficiency of our approach. Specifically, our model achieves smaller error rate and great time reduction compared against a series of baselines, including several approximation algorithms on GED computation, and many existing graph neural network based models. Our study suggests SimGNN provides a new direction for future research on graph similarity computation and graph similarity search 1 .

show abstract

Learning-Based Efficient Graph Similarity Computation via Multi-Scale Convolutional Set Matching

Bai

Ding²,

et al. 2020

AAAI

View full text Add to dashboard Cite

Graph similarity computation is one of the core operations in many graph-based applications, such as graph similarity search, graph database analysis, graph clustering, etc. Since computing the exact distance/similarity between two graphs is typically NP-hard, a series of approximate methods have been proposed with a trade-off between accuracy and speed. Recently, several data-driven approaches based on neural networks have been proposed, most of which model the graph-graph similarity as the inner product of their graph-level representations, with different techniques proposed for generating one embedding per graph. However, using one fixed-dimensional embedding per graph may fail to fully capture graphs in varying sizes and link structures—a limitation that is especially problematic for the task of graph similarity computation, where the goal is to find the fine-grained difference between two graphs. In this paper, we address the problem of graph similarity computation from another perspective, by directly matching two sets of node embeddings without the need to use fixed-dimensional vectors to represent whole graphs for their similarity computation. The model, Graph-Sim, achieves the state-of-the-art performance on four real-world graph datasets under six out of eight settings (here we count a specific dataset and metric combination as one setting), compared to existing popular methods for approximate Graph Edit Distance (GED) and Maximum Common Subgraph (MCS) computation.

show abstract

Unsupervised Inductive Graph-Level Representation Learning via Graph-Graph Proximity

Bai

Ding

Qiao

et al. 2019

View full text Add to dashboard Cite

We introduce a novel approach to graph-level representation learning, which is to embed an entire graph into a vector space where the embeddings of two graphs preserve their graph-graph proximity. Our approach, UGRAPHEMB, is a general framework that provides a novel means to performing graph-level embedding in a completely unsupervised and inductive manner. The learned neural network can be considered as a function that receives any graph as input, either seen or unseen in the training set, and transforms it into an embedding. A novel graph-level embedding generation mechanism called Multi-Scale Node Attention (MSNA), is proposed. Experiments on five real graph datasets show that UGRAPHEMB achieves competitive accuracy in the tasks of graph classification, similarity ranking, and graph visualization.

show abstract

GHashing: Semantic Graph Hashing for Approximate Similarity Search in Graph Databases

Qin

Bai

Sun

2020

View full text Add to dashboard Cite

A Comprehensive Typing System for Information Extraction from Clinical Narratives

Caufield

Zhou

Bai

et al. 2019

Preprint

View full text Add to dashboard Cite

We have developed ACROBAT (Annotation for Case Reports using Open Biomedical Annotation Terms), a typing system for detailed information extraction from clinical text. This resource supports detailed identification and categorization of entities, events, and relations within clinical text documents, including clincal case reports (CCRs) and the free-text components of electronic health records. Using ACROBAT and the text of 200 CCRs, we annotated a wide variety of real-world clinical disease presentations. The resulting dataset, MACCROBAT2018, is a rich collection of annotated clinical language appropriate for training biomedical natural language processing systems.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Yunsheng Bai

SimGNN

Learning-Based Efficient Graph Similarity Computation via Multi-Scale Convolutional Set Matching

Unsupervised Inductive Graph-Level Representation Learning via Graph-Graph Proximity

GHashing: Semantic Graph Hashing for Approximate Similarity Search in Graph Databases

A Comprehensive Typing System for Information Extraction from Clinical Narratives

Contact Info

Product

Resources

About