Edge-Aware Graph Representation Learning and Reasoning for Face Parsing

Te, Gusi; Liu, Yinglu; Hu, Wei; Shi, Hailin; Mei, Tao

doi:10.1007/978-3-030-58610-2_16

Cited by 49 publications

(33 citation statements)

References 27 publications


“…To explore high-order relations between the lower-level local features from CIM and the higher-level cues from CFM, we introduce the non-local [77], [78] operation under the graph convolution domain [79] to implement our similarity aggregation module (SAM). As a result, SAM can inject detailed appearance features into high-level semantic features using global attention.…”

Section: E Similarity Aggregation Module (mentioning)

confidence: 99%

“…K^T is the transpose of K and f is the correlation attention map. After obtaining the correlation attention map f, we multiply it with the feature map Q, and the result features are fed to the graph convolutional layer [78] GCN(·), leading to G ∈ R^{4×4×16}. Same to [78], we calculate the inner product between f and G as Eqn.…”

Section: E Similarity Aggregation Module (mentioning)

confidence: 99%

“…After obtaining the correlation attention map f, we multiply it with the feature map Q, and the result features are fed to the graph convolutional layer [78] GCN(·), leading to G ∈ R^{4×4×16}. Same to [78], we calculate the inner product between f and G as Eqn. 9, reconstructing the graph domain features into the original structural features:…”

Section: E Similarity Aggregation Module (mentioning)

confidence: 99%
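
The attention-then-graph-convolution flow described in the quoted statements above can be sketched roughly as follows. This is a minimal NumPy illustration, not the citing paper's implementation: the flattened shapes, the softmax normalization, the uniform adjacency `A`, the weight matrix `W`, and the function name `similarity_aggregation` are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def similarity_aggregation(Q, K, A, W):
    """Hypothetical sketch: Q, K are (N, C) flattened feature maps,
    A is an assumed (N, N) adjacency, W an assumed (C, C) weight."""
    f = softmax(Q @ K.T, axis=-1)    # correlation attention map, (N, N)
    X = f @ Q                        # attention map multiplied with Q, (N, C)
    G = np.maximum(A @ X @ W, 0.0)   # one graph convolution layer with ReLU
    return f @ G                     # product of f and G, back to (N, C)

rng = np.random.default_rng(0)
N, C = 16, 8                         # e.g. a 4x4 spatial grid, flattened
Q = rng.standard_normal((N, C))
K = rng.standard_normal((N, C))
A = np.full((N, N), 1.0 / N)         # assumed: fully connected, row-normalized
W = rng.standard_normal((C, C))
out = similarity_aggregation(Q, K, A, W)
```

The output keeps the input's shape, which is what allows the graph-domain features to be reshaped back to the original spatial layout.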


Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers

Dong, Wang, Fan et al. 2021

Preprint

Most polyp segmentation methods use CNNs as their backbone, leading to two key issues when exchanging information between the encoder and decoder: 1) taking into account the differences in contribution between different-level features; and 2) designing an effective mechanism for fusing these features. Different from existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations. In addition, considering the image acquisition influence and elusive properties of polyps, we introduce three novel modules, including a cascaded fusion module (CFM), a camouflage identification module (CIM), and a similarity aggregation module (SAM). Among these, the CFM is used to collect the semantic and location information of polyps from high-level features, while the CIM is applied to capture polyp information disguised in low-level features. With the help of the SAM, we extend the pixel features of the polyp area with high-level semantic position information to the entire polyp area, thereby effectively fusing cross-level features. The proposed model, named Polyp-PVT, effectively suppresses noise in the features and significantly improves their expressive capabilities. Extensive experiments on five widely adopted datasets show that the proposed model is more robust to various challenging situations (e.g., appearance changes, small objects) than existing methods, and achieves new state-of-the-art performance. The proposed model is available at https://github.com/DengPingFan/Polyp-PVT.

“…We perform face landmark detection on the videos before training. We optionally use a state-of-the-art face parsing algorithm [44] to obtain parsing maps of the videos. During training, we randomly select pairs of source and driving images from each training video.…”

Section: Training Losses (mentioning)

confidence: 99%

Sparse to Dense Motion Transfer for Face Image Animation

Zhao, Wu, Guo 2021

Preprint

Face image animation from a single image has achieved remarkable progress. However, it remains challenging when only sparse landmarks are available as the driving signal. Given a source face image and a sequence of sparse face landmarks, our goal is to generate a video of the face imitating the motion of landmarks. We develop an efficient and effective method for motion transfer from sparse landmarks to the face image. We then combine global and local motion estimation in a unified model to faithfully transfer the motion. The model can learn to segment the moving foreground from the background and generate not only global motion, such as rotation and translation of the face, but also subtle local motion such as the gaze change. We further improve face landmark detection on videos. With temporally better aligned landmark sequences for training, our method can generate temporally coherent videos with higher visual quality. Experiments suggest we achieve results comparable to the state-of-the-art image driven method on the same identity testing and better results on cross identity testing.

“…Recently, graph convolution [11] has been incorporated into computer vision tasks for global reasoning, which can be generally summarized as two kinds of approaches: feature space graph convolution and coordinate space graph convolution. The feature space graph convolution captures interdependencies along the channel dimensions of the feature map, which projects the feature into a non-coordinate space [12][13][14][15]; while coordinate space graph convolution explicitly models the spatial relationships between pixels [16][17][18][19][20], which projects the feature into a new coordinate space, to produce coherent prediction between the disjoint infections.…”

Section: Introduction (mentioning)

confidence: 99%
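
The distinction drawn in the statement above, channels as graph nodes versus pixels as graph nodes, can be illustrated with a small NumPy sketch. It is only a rough illustration under stated assumptions: the softmax affinity construction, the identity weight `W`, and the function names `feature_space_gconv` and `coordinate_space_gconv` are hypothetical and do not come from any of the cited works.

```python
import numpy as np

def row_softmax(x):
    # Normalize each row into non-negative weights that sum to 1.
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

def feature_space_gconv(X, W):
    # Nodes are channels: the affinity is (C, C), so propagation mixes
    # channels independently at every pixel.
    A = row_softmax(X.T @ X)
    return (X @ A.T) @ W

def coordinate_space_gconv(X, W):
    # Nodes are pixels: the affinity is (N, N), so propagation mixes
    # spatial positions for every channel.
    A = row_softmax(X @ X.T)
    return (A @ X) @ W

rng = np.random.default_rng(0)
N, C = 12, 4                       # N flattened pixels, C channels
X = rng.standard_normal((N, C))
W = np.eye(C)                      # assumed identity weight, to isolate propagation
out_feat = feature_space_gconv(X, W)
out_coord = coordinate_space_gconv(X, W)
```

Both variants return an (N, C) array; they differ only in which axis the affinity matrix is built over, and therefore in whether information flows between channels or between spatial positions.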

Graph-Based Pyramid Global Context Reasoning With a Saliency-Aware Projection for Covid-19 Lung Infections Segmentation

Huang, Cai, Lin et al. 2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Coronavirus Disease 2019 spread rapidly in 2020, prompting a large number of studies on lung infection segmentation from CT images. Though many methods have been proposed, the task remains challenging because infections of various sizes appear in different lobe zones. To tackle these issues, we propose a Graph-based Pyramid Global Context Reasoning (Graph-PGCR) module, which is capable of modeling long-range dependencies among disjoint infections as well as adapting to size variation. We first incorporate graph convolution to exploit long-term contextual information from multiple lobe zones. Different from previous average pooling or maximum object probability, we propose a saliency-aware projection mechanism to pick up infection-related pixels as a set of graph nodes. After graph reasoning, the relation-aware features are projected back to the original coordinate space for the downstream tasks. We further construct multiple graphs with different sampling rates to handle the size variation problem. In this way, distinct multi-scale long-range contextual patterns can be captured. Our Graph-PGCR module is plug-and-play and can be integrated into any architecture to improve its performance. Experiments demonstrate that the proposed method consistently boosts the performance of state-of-the-art backbone architectures on both public and our private COVID-19 datasets.
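
The project, reason, and project-back pipeline outlined in this abstract might look roughly like the NumPy sketch below. It is an illustration under assumptions, not the authors' method: top-k selection stands in for the saliency-aware projection, a residual scatter stands in for the reverse projection, averaging stands in for multi-graph fusion, and the names `saliency_graph_reasoning` and `pyramid_reasoning` are hypothetical.

```python
import numpy as np

def row_softmax(x):
    # Normalize each row into non-negative weights that sum to 1.
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

def saliency_graph_reasoning(X, saliency, k, W):
    """X: (N, C) pixel features, saliency: (N,) scores, k: node count."""
    idx = np.argsort(saliency)[-k:]          # assumed: top-k salient pixels become nodes
    nodes = X[idx]                           # (k, C) node features
    A = row_softmax(nodes @ nodes.T)         # assumed affinity among nodes
    updated = np.maximum(A @ nodes @ W, 0.0) # one graph convolution with ReLU
    out = X.copy()
    out[idx] += updated                      # assumed residual write-back to pixel space
    return out, idx

def pyramid_reasoning(X, saliency, ks, W):
    # Assumed fusion: average the outputs of graphs built at several node counts.
    outs = [saliency_graph_reasoning(X, saliency, k, W)[0] for k in ks]
    return np.mean(outs, axis=0)

rng = np.random.default_rng(0)
N, C = 20, 6
X = rng.standard_normal((N, C))
s = rng.random(N)                            # stand-in saliency scores
W = rng.standard_normal((C, C))
out, idx = saliency_graph_reasoning(X, s, 5, W)
pyr = pyramid_reasoning(X, s, [2, 5], W)
```

In this sketch, pixels that are never selected as nodes pass through unchanged, so the module only modifies features at salient locations.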
