2021
DOI: 10.48550/arxiv.2109.10057
Preprint

LOTR: Face Landmark Localization Using Localization Transformer

Ukrit Watchareeruetai, Benjaphan Sommana, Sanjana Jain, et al.

Abstract: This paper presents a novel Transformer-based facial landmark localization network named Localization Transformer (LOTR). The proposed framework is a direct coordinate regression approach leveraging a Transformer network to better utilize the spatial information in the feature map. An LOTR model consists of three main modules: 1) a visual backbone that converts an input image into a feature map, 2) a Transformer module that improves the feature representation from the visual backbone, and 3) a landmark predict…
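
The abstract's three-module breakdown (backbone, Transformer module, coordinate-regression head) can be illustrated with a small PyTorch sketch. The snippet below is only a minimal sketch under stated assumptions: the ResNet-18 backbone, layer sizes, omission of positional encodings, and the use of learnable per-landmark queries decoded against the encoded feature map are illustrative choices, not the configuration reported in the paper; only the overall three-module structure and the direct (x, y) regression follow the abstract.

# Minimal sketch of an LOTR-style pipeline (illustrative assumptions:
# ResNet-18 backbone, layer sizes, learnable per-landmark queries;
# positional encodings are omitted for brevity).
import torch
import torch.nn as nn
from torchvision.models import resnet18


class LOTRSketch(nn.Module):
    def __init__(self, num_landmarks=68, d_model=256, nhead=8, num_layers=3):
        super().__init__()
        # 1) Visual backbone: image -> spatial feature map
        backbone = resnet18(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Conv2d(512, d_model, kernel_size=1)

        # 2) Transformer module: refine the flattened feature map
        enc_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)

        # Learnable queries, one per landmark, decoded against the features
        self.queries = nn.Parameter(torch.randn(num_landmarks, d_model))
        dec_layer = nn.TransformerDecoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)

        # 3) Landmark prediction head: direct (x, y) coordinate regression
        self.head = nn.Linear(d_model, 2)

    def forward(self, images):                    # images: (B, 3, H, W)
        feat = self.proj(self.backbone(images))   # (B, d_model, h, w)
        tokens = feat.flatten(2).transpose(1, 2)  # (B, h*w, d_model)
        memory = self.encoder(tokens)
        queries = self.queries.unsqueeze(0).repeat(images.size(0), 1, 1)
        decoded = self.decoder(queries, memory)
        return self.head(decoded).sigmoid()       # normalized coords in [0, 1]


if __name__ == "__main__":
    model = LOTRSketch()
    coords = model(torch.randn(1, 3, 256, 256))
    print(coords.shape)  # torch.Size([1, 68, 2])

Feeding a 256x256 crop yields a (1, 68, 2) tensor of normalized landmark coordinates; the training objective and benchmark results reported for LOTR are not reproduced by this sketch.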

Cited by 1 publication (1 citation statement)
References 41 publications

“…ViT [9] proposes a transformer encoder architecture for image classification, which directly splits the image into patches and introduces a learnable classification token to aid in performing the task. Recently, several methods have proposed variant forms of transformers for landmark detection [45,18,50]. Our method is motivated by these recent works, in that we regress the object keypoints as well as affine matrices for image animation.…”
Section: Related Work
confidence: 99%