2021
DOI: 10.48550/arxiv.2105.08194
Preprint

Visual FUDGE: Form Understanding via Dynamic Graph Editing

Abstract: We address the problem of form understanding: finding text entities and the relationships/links between them in form images. The proposed FUDGE model formulates this problem on a graph of text elements (the vertices) and uses a Graph Convolutional Network to predict changes to the graph. The initial vertices are detected text lines and do not necessarily correspond to the final text entities, which can span multiple lines. Also, initial edges contain many false-positive relationships. FUDGE edits the graph str…
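
The two observations in the abstract (detected lines may need to be combined into one entity, and many initial edges are false positives) can be illustrated with a small sketch. This is not the FUDGE implementation: the names `TextGraph`, `edit_graph`, `merge_score`, and `link_score` are hypothetical, the two scoring functions are hand-written stand-ins for a learned model such as a Graph Convolutional Network, and the threshold and sample form are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TextGraph:
    """Vertices are detected text lines; edges are candidate relationships."""
    vertices: dict[int, str]
    edges: set[frozenset[int]] = field(default_factory=set)


def edit_graph(graph, merge_score, link_score, threshold=0.5):
    """Apply one round of edits: merge some vertex pairs, prune some edges.

    merge_score and link_score are hypothetical stand-ins for a learned
    model; each maps (lower-index text, higher-index text) to [0, 1].
    """
    parent = {v: v for v in graph.vertices}

    def find(v):
        # Follow parent pointers up to the representative of v's group.
        while parent[v] != v:
            v = parent[v]
        return v

    kept = set()
    for edge in graph.edges:
        a, b = sorted(edge)
        text_a, text_b = graph.vertices[a], graph.vertices[b]
        if merge_score(text_a, text_b) >= threshold:
            # Same entity: union the groups, keeping the smallest id as root.
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[max(root_a, root_b)] = min(root_a, root_b)
        elif link_score(text_a, text_b) >= threshold:
            kept.add(edge)
        # Any other edge is treated as a false positive and dropped.

    # Join the text of each group in vertex-id order.
    merged = {}
    for v in sorted(graph.vertices):
        root = find(v)
        text = graph.vertices[v]
        merged[root] = f"{merged[root]} {text}" if root in merged else text

    # Re-point surviving edges at the merged vertices, dropping self-loops.
    new_edges = set()
    for edge in kept:
        a, b = (find(v) for v in edge)
        if a != b:
            new_edges.add(frozenset((a, b)))
    return TextGraph(merged, new_edges)


# Toy form: "John" and "Smith" are two detected lines of one answer entity.
form = TextGraph(
    vertices={0: "Name:", 1: "John", 2: "Smith", 3: "Date:"},
    edges={frozenset((0, 1)), frozenset((1, 2)), frozenset((1, 3))},
)


def merge_score(a, b):
    # Illustrative rule only: treat "John" + "Smith" as one entity.
    return 1.0 if (a, b) == ("John", "Smith") else 0.0


def link_score(a, b):
    # Illustrative rule only: a label ending in ":" links to a non-label.
    return 1.0 if a.endswith(":") and not b.endswith(":") else 0.0


result = edit_graph(form, merge_score, link_score)
print(result.vertices)
print(result.edges)
```

In this toy run the two name lines collapse into a single vertex and the edge between "John" and "Date:" is pruned, leaving one relationship between the label and the merged entity. A learned model would replace both hand-written scoring rules.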

Cited by 2 publications (3 citation statements)
References 14 publications
“…capturing K-Nearest Neighbours. GNN-based models can be used for various document-grounded tasks such as text classification [25,27] or key information extraction [2,26]. However, their performance still lags behind that of layout language models.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Sub-fields including Named-Entity Recognition (NER) [2], layout understanding [7] and document classification [22] all seek to extract meaningful information from documents. Another sub-field of VrDU, relation extraction (RE) offers the possibility of linking named entities in documents so that a paired relationship can be identified [3,5,6,11,23]. Typically, relations are defined in a question-answer (Q/A) format and the RE task is to define a function which predicts if a pair of entities in a document are related or not [11,23].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…These approaches enable learning of joint representations in a single end-to-end training procedure with the aim of maximising the total information in a document. Although transformer-based architectures are prominent in this field [13,23], other methods for optimizing RE tasks, such as graph neural networks [3,6], have been reported.…”
Section: Introduction
Citation type: mentioning
confidence: 99%