2021
DOI: 10.46298/jdmdh.6107

Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

Abstract: The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research work seeking to automatically process facsimiles and extract information thereby are multiplying with, as a first essential step, document layout analysis. If the identification and categorization of segments of interest in document images have seen significant progress over the last years thanks to deep learning techniques, many challenges remain with…

Cited by 33 publications (19 citation statements)
References 33 publications
“…CV-based methods introduce document semantics through text embedding maps, and model layout analysis as object detection or segmentation task. MFCN [39] introduces sentence granularity semantics and inserts the text embedding maps at the decision-level (end of network), while dhSegment T 2 [3] introduces character granularity semantics and inserts text embedding maps at the input-level. Though showing great success, the above methods also bear the following limitations: limited semantics used, simple modality fusion strategy and lack of relation modeling between components.…”
Section: Related Work (mentioning)
confidence: 99%
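
The difference between input-level and decision-level insertion of a text embedding map, as described in the statement above, can be illustrated with a small sketch. This is not code from MFCN or dhSegment: `toy_backbone` is a hypothetical stand-in for a CNN (it just averages channels), and all function names are assumptions made for illustration. Only the position of the concatenation relative to the network is the point.

```python
from typing import List

Grid = List[List[List[float]]]  # height x width x channels


def concat_channels(a: Grid, b: Grid) -> Grid:
    """Concatenate two grids of equal height and width along the channel axis."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("grids must share height and width")
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def toy_backbone(x: Grid) -> Grid:
    """Hypothetical stand-in for a CNN: one output channel, the per-pixel channel mean."""
    return [[[sum(px) / len(px)] for px in row] for row in x]


def input_level_fusion(image: Grid, text_map: Grid) -> Grid:
    """Text channels are appended to the image before the network sees it."""
    return toy_backbone(concat_channels(image, text_map))


def decision_level_fusion(image: Grid, text_map: Grid) -> Grid:
    """Text channels are appended to the visual features at the end of the network."""
    return concat_channels(toy_backbone(image), text_map)
```

With input-level fusion the network can mix pixel and text channels in every layer, whereas with decision-level fusion the text channels reach the final classifier unchanged.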
“…CNN is known to be good at learning deep features. However, previous multimodal layout analysis works [39,3] only apply it to extract visual features. Text embedding maps are directly used as semantic features.…”
Section: Two-stream Convnets (mentioning)
confidence: 99%
“…Logical layout analysis systems applied to historical documents must then account for the diachronic aspect of their layouts and adapt to the changes. [22] propose a system that goes beyond usual logical labels by labelling physical block as either Serial, Weather Forecast, Death Notice and Stock Exchange Table. To do so, their system combines visual and textual features using the word-embedding representation of each word and its coordinates on the page. Their results show that combining textual and visual features provide better results in most cases than using just one of them.…”
Section: Introduction (mentioning)
confidence: 99%
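
One way to picture "the word-embedding representation of each word and its coordinates on the page" is a text embedding map: a grid aligned with the page image in which every pixel covered by a word holds that word's embedding vector. The sketch below is a minimal illustration under stated assumptions, not the cited system's implementation. The function name, the box convention (exclusive lower-right corner), and the zero vector for pixels without text are all hypothetical choices.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels; x1 and y1 are exclusive


def build_text_embedding_map(
    words: Sequence[Tuple[str, Box]],
    embeddings: Dict[str, List[float]],
    height: int,
    width: int,
    dim: int,
) -> List[List[List[float]]]:
    """Return a height x width x dim grid; pixels without text stay all-zero."""
    grid = [[[0.0] * dim for _ in range(width)] for _ in range(height)]
    for token, (x0, y0, x1, y1) in words:
        vec = embeddings.get(token)
        if vec is None:
            continue  # assumption: out-of-vocabulary words are simply skipped
        if len(vec) != dim:
            raise ValueError(f"embedding for {token!r} has length {len(vec)}, expected {dim}")
        # Clip the box to the page so stray OCR coordinates cannot index out of range.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                grid[y][x] = list(vec)
    return grid
```

The resulting grid has the same height and width as the page image, so it can be stacked with the image channels or with visual feature maps.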