2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) 2017
DOI: 10.1109/icdar.2017.75

Fully Convolutional Neural Networks for Newspaper Article Segmentation

Abstract: Segmenting newspaper pages into articles that semantically belong together is a necessary prerequisite for article-based information retrieval on print media collections such as archives and libraries. It is challenging due to vastly differing layouts of papers, various content types, and different languages, but commercially very relevant, e.g., for media monitoring. We present a semantic segmentation approach based on the visual appearance of each page. We apply a fully convolutional neural network (FCN) that…
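The abstract frames article segmentation as semantic segmentation with an FCN. What makes a network "fully convolutional" is that every layer is applied at every position of the page, so the output is a per-pixel score map whose size follows the input size. A minimal pure-Python sketch of that idea, assuming one hand-set 3x3 kernel per class followed by a per-pixel argmax; the kernels, class meanings, and the names `conv2d_same` and `segment` are illustrative assumptions, not the trained network from the paper.

```python
from typing import List

Image = List[List[float]]


def conv2d_same(image: Image, kernel: Image) -> Image:
    """2D cross-correlation with zero padding, so the output has the input's size."""
    h, w = len(image), len(image[0])
    k = len(kernel)  # assumes a square kernel with odd side length
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - r, x + dx - r
                    if 0 <= yy < h and 0 <= xx < w:  # zero padding outside the page
                        acc += image[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out


def segment(image: Image, class_kernels: List[Image]) -> List[List[int]]:
    """One score map per class, then a per-pixel argmax (ties go to the lowest class index)."""
    scores = [conv2d_same(image, k) for k in class_kernels]
    h, w = len(image), len(image[0])
    return [
        [max(range(len(scores)), key=lambda c: scores[c][y][x]) for x in range(w)]
        for y in range(h)
    ]
```

Calling `segment` on a 4x5 page or a 7x3 page returns a label map of exactly that shape: because there is no fully connected layer, nothing ties the network to one input size, which matters for newspaper pages of varying dimensions.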

Cited by 34 publications (11 citation statements). References: 13 publications.

Citation statements:
“…This indicates manual segmentation, which involves much less effort than OCR postcorrection, is a worthy target when some manual annotation resources are available. Arguably, segmentation can also be improved further by the inclusion of visual features (Meier et al., 2017), which appears a promising direction for future research.…”
Section: Discussion (mentioning; confidence: 99%)
“…The task tackled in this paper can be split into two sub-tasks: the detection of the different articles and the clustering of parts of the same article. Most previous work performs the segmentation of newspaper pages directly at the image level (Hebert et al., 2014; Meier et al., 2017). This strategy avoids having to deal with spelling errors arising from OCR.…”
Section: Related Work (mentioning; confidence: 99%)
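The split into detection and clustering quoted above can be illustrated with the simplest possible clustering step. A minimal sketch, assuming a binary mask in which 1 marks pixels predicted as belonging to some article and 0 marks separators or background, and assuming that touching pixels belong to the same article; `label_regions` is a hypothetical helper. Real pages, where articles continue across columns or are interrupted by images, generally need more than 4-connected component labelling.

```python
from collections import deque
from typing import List, Tuple


def label_regions(mask: List[List[int]]) -> Tuple[List[List[int]], int]:
    """4-connected component labelling via BFS.

    Returns (region map, number of regions); region ids start at 1, 0 = background.
    """
    h, w = len(mask), len(mask[0])
    regions = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not regions[sy][sx]:
                count += 1  # unvisited article pixel: start a new region
                regions[sy][sx] = count
                queue = deque([(sy, sx)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not regions[ny][nx]:
                            regions[ny][nx] = count
                            queue.append((ny, nx))
    return regions, count
```

Each resulting region id would then be treated as one candidate article; merging regions that belong to the same story is exactly the harder part of the clustering sub-task that this baseline ignores.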
“…The fully convolutional neural network used in our second approach is built with three logical parts (cf. Meier et al. (2017) and Figure 8). Initially, feature extraction is done the same way as with a standard CNN.…”
Section: CNN-based Pixel Classification vs. One-pass FCNs (mentioning; confidence: 94%)
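The quote only confirms that the first of the three parts is CNN-style feature extraction. Assuming the generic FCN layout (downsampling feature extraction, upsampling back to the input resolution, then per-pixel classification), the data flow can be sketched as below. This is not Meier et al.'s architecture: convolutions are omitted so that only pooling remains in the feature extractor, nearest-neighbour upsampling stands in for a learned deconvolution, and a fixed threshold stands in for the trained classifier. All function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def max_pool2(x: Grid) -> Grid:
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    return [
        [max(x[y][c], x[y][c + 1], x[y + 1][c], x[y + 1][c + 1]) for c in range(0, len(x[0]), 2)]
        for y in range(0, len(x), 2)
    ]


def upsample2(x: Grid) -> Grid:
    """Nearest-neighbour upsampling by a factor of 2 in both directions."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))  # repeat the row vertically
    return out


def classify(x: Grid, threshold: float = 0.5) -> List[List[int]]:
    """Per-pixel decision: 1 if the feature exceeds the threshold, else 0."""
    return [[int(v > threshold) for v in row] for row in x]


def fcn_forward(image: Grid) -> List[List[int]]:
    features = max_pool2(image)     # part 1: feature extraction (downsampling)
    restored = upsample2(features)  # part 2: upsampling to input resolution
    return classify(restored)       # part 3: pixel-wise classification
```

The output has the input's height and width, but at the coarser granularity of the pooled features; in practice FCNs often add skip connections from earlier layers to recover sharper boundaries.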
“…Only a few recent works attempt to make use of image and/or localized, two-dimensional text information. Meier et al. [2017] use an FCN based on image and OCR output information in order to detect articles in newspaper images (no further segment types). In this case, text is reduced to a binary feature (a pixel has text or not) and the lexical and semantic dimensions are not taken into account.…”
Section: Related Work (mentioning; confidence: 99%)
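The quote above says the OCR output is reduced to a binary per-pixel feature (a pixel has text or not). A minimal sketch of building such a channel, assuming OCR yields axis-aligned word boxes in pixel coordinates; the box format `(x0, y0, x1, y1)` with exclusive end coordinates and the names `text_mask` and `stack_channels` are hypothetical, and how the original work actually fuses this channel with the image is not stated here.

```python
from typing import List, Tuple

# Hypothetical OCR box format: (x0, y0, x1, y1), end coordinates exclusive.
Box = Tuple[int, int, int, int]


def text_mask(height: int, width: int, boxes: List[Box]) -> List[List[int]]:
    """Binary mask: 1 where any OCR box covers the pixel, else 0 (boxes clipped to the page)."""
    mask = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = 1
    return mask


def stack_channels(gray: List[List[float]], mask: List[List[int]]) -> List[List[Tuple[float, int]]]:
    """Pair each grayscale value with its text flag, giving a 2-channel input per pixel."""
    return [[(g, m) for g, m in zip(grow, mrow)] for grow, mrow in zip(gray, mask)]
```

Because the mask keeps only "text or not", two words with very different meanings produce identical input, which is the limitation the citing authors point out.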