Proceedings of the 30th ACM International Conference on Multimedia 2022
DOI: 10.1145/3503161.3548424
A Tree-Based Structure-Aware Transformer Decoder for Image-To-Markup Generation

Cited by 20 publications (15 citation statements) · References 18 publications
“…To further improve encoder performance, DenseWAP [14] built on WAP by using a more efficient DenseNet encoder and proposed a multi-scale attention model, which handles mathematical symbols of different scales well and retains details otherwise lost to pooling operations. The same authors later proposed the DenseWAP-TD [15] model, which replaces the string decoder of DenseWAP with a tree-structured decoder to improve the parsing of complex structures. Li et al. [16] proposed an HMER method with scale enhancement and drop attention.…”
Section: Hmer Methodsmentioning
confidence: 99%
“…At the inference stage, beam search is applied to find the output with the maximum probability; the beam size is set to 4. To verify the effectiveness of the proposed ClipMath, we compare it with other state-of-the-art methods, including DenseWAP [14], PAL-v2 [30], WS-WAP [31], DenseWAP-MSA [14], DenseWAP-TD [15], BTTR [11], and DATWAP [32]. We take DenseWAP as the baseline and use no data augmentation in the experiments.…”
Section: Experimental Configurationmentioning
confidence: 99%
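The beam-search decoding described in the quote (beam size 4, keep the highest-probability hypotheses each step) can be sketched as follows; the `step` function, token names, and toy model are hypothetical illustrations, not ClipMath's actual decoder:

```python
import math

def beam_search(step, start_token, end_token, beam_size=4, max_len=20):
    """step(seq) -> list of (token, log_prob) candidates for the next token."""
    beams = [([start_token], 0.0)]  # (sequence, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, logp in step(seq):
                candidates.append((seq + [tok], score + logp))
        # Keep only the beam_size highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == end_token:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    # Return the sequence with the maximum cumulative probability.
    return max(finished + beams, key=lambda c: c[1])[0]

# Toy model: always prefers token "a", terminates after 3 tokens.
def toy_step(seq):
    if len(seq) >= 3:
        return [("</s>", 0.0)]
    return [("a", math.log(0.6)), ("b", math.log(0.4))]

print(beam_search(toy_step, "<s>", "</s>"))  # ['<s>', 'a', 'a', '</s>']
```

Sorting then truncating to `beam_size` is the pruning step that distinguishes beam search from exhaustive search.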
“…To improve the encoder's effectiveness, we apply sequence self-attention and ASG self-attention to the encoder to highlight dependencies between every node pair and every adjacent ASG node pair. Following existing work [10], [18]-[20], we use two vectors/matrices ("ASG Node Sequence" and "Adjacency Matrix" in Figure 2) to represent each ASG, which are input formats that a deep-learning encoder can take. The ASG Node Sequence, denoted {n_i} = (n_1, n_2, …
Section: Encodermentioning
confidence: 99%
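The node-sequence-plus-adjacency-matrix graph encoding described in the quote can be sketched as follows; the function name and the toy graph are hypothetical, not the cited paper's actual code:

```python
def encode_graph(nodes, edges):
    """nodes: list of node labels (the "ASG Node Sequence");
    edges: list of (i, j) index pairs, turned into a symmetric adjacency matrix."""
    n = len(nodes)
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1  # mark both directions: undirected adjacency
    return nodes, adj

# A toy 3-node graph: one root connected to two children.
seq, adj = encode_graph(["root", "child1", "child2"], [(0, 1), (0, 2)])
print(seq)  # ['root', 'child1', 'child2']
print(adj)  # [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
```

The two outputs are exactly the fixed-shape tensors an encoder can consume: a token sequence for self-attention and a matrix restricting attention to adjacent node pairs.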
“…A context matching loss is adopted to constrain the intra-class distance and enhance the discriminative power of the model. Lately, Zhang et al. [46] devised a tree-based decoder to parse mathematical expressions. At each step, a parent-child node pair is generated, and the relation between the parent node and the child node reflects the structure type.…”
Section: Related Workmentioning
confidence: 99%
“…As shown in Tables 1 and 2, we compare our proposed method with DWAP [42], DWAP-TD [46], BTTR [51], ABM [3], SAN [39] and CAN [17] on the HME100K dataset. SAM-DWAP and SAM-CAN clearly achieve the best performance.…”
Section: Comparison With State-of-the-artmentioning
confidence: 99%