Proceedings of the 2003 ACM Symposium on Document Engineering (2003)
DOI: 10.1145/958220.958249

Structured multimedia document classification

Abstract: We propose a new statistical model for the classification of structured documents and consider its use for multimedia document classification. Its main originality is its ability to take into account both the structural and the content information of a structured document simultaneously, and to cope with different types of content (text, images, etc.). We present experiments on the classification of multilingual pornographic HTML pages using text and image data. The system accurately classifies porn site…
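The abstract describes the model only at a high level. The following is a minimal, hypothetical sketch of the general idea of scoring structure and multimedia content jointly, not the authors' exact model. It assumes each document is a tree of nodes, each carrying a structural tag and typed content; structure is modeled with P(tag | parent tag, class), and content with one add-one-smoothed unigram model per media type. It also assumes image content has already been converted into discrete visual codes. `Node`, `ClassModel` and `classify` are illustrative names.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """One element of a structured (e.g. HTML/XML) document."""
    tag: str                      # structural label, e.g. "html", "p", "img"
    content: list                 # word tokens, or discrete visual codes for images
    media: str = "text"           # content type, e.g. "text" or "image"
    children: list = field(default_factory=list)


class ClassModel:
    """Hypothetical per-class generative model: P(tag | parent tag) for the
    structure plus one smoothed unigram model per media type for the content."""

    def __init__(self):
        self.pairs = Counter()            # (parent tag, tag) counts
        self.parents = Counter()          # parent tag counts
        self.tags = set()
        self.tokens = defaultdict(Counter)  # media type -> token counts
        self.totals = Counter()           # number of tokens seen per media type

    def fit(self, roots):
        for root in roots:
            self._count(root, "<root>")
        return self

    def _count(self, node, parent_tag):
        self.pairs[(parent_tag, node.tag)] += 1
        self.parents[parent_tag] += 1
        self.tags.add(node.tag)
        self.tokens[node.media].update(node.content)
        self.totals[node.media] += len(node.content)
        for child in node.children:
            self._count(child, node.tag)

    def log_likelihood(self, root):
        """Sum of per-node log-probabilities. Each node is visited once, so the
        cost grows linearly with the number of nodes and tokens."""
        n_tags = len(self.tags) + 1       # +1 reserves mass for unseen tags
        total, stack = 0.0, [(root, "<root>")]
        while stack:
            node, parent_tag = stack.pop()
            # Structural term: P(tag | parent tag), add-one smoothed.
            total += math.log((self.pairs[(parent_tag, node.tag)] + 1)
                              / (self.parents[parent_tag] + n_tags))
            # Content term: unigram model chosen by the node's media type.
            counts = self.tokens[node.media]
            denom = self.totals[node.media] + len(counts) + 1
            for token in node.content:
                total += math.log((counts[token] + 1) / denom)
            stack.extend((child, node.tag) for child in node.children)
        return total


def classify(models, priors, root):
    """Return the class c maximizing log P(c) + log P(document | c)."""
    return max(models, key=lambda c: math.log(priors[c]) + models[c].log_likelihood(root))
```

Because the log-likelihood is a sum over nodes, text nodes and image nodes contribute through different content models while sharing the same structural term, so handling another media type only requires another content model.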

Cited by 15 publications (7 citation statements)
References 16 publications
“…Some of the methods utilize relevant text information to enhance image features [7]. In [8], the authors introduce an approach that combines the textual and visual statistics as a single feature vector. The approach applies color histograms and a dominant orientation histogram to represent the images into vectors, then combines the vectors with a text vector generated from Latent Semantic Indexing.…”
Section: B Multi-model Document (mentioning)
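The fusion described in this statement (often called early fusion) can be sketched as follows. This is an illustration under assumptions, not the method of [8]: it uses only a joint RGB color histogram (the dominant orientation histogram is left out), and it assumes an LSI basis `basis` (k rows of term weights) has already been computed, for example by a truncated SVD of a training term-document matrix. The function names and the `text_weight` parameter are hypothetical.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram; pixels are (r, g, b) tuples in 0..255."""
    def quantize(v):
        return v * bins_per_channel // 256   # maps 0..255 to 0..bins_per_channel-1

    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        idx = (quantize(r) * bins_per_channel + quantize(g)) * bins_per_channel + quantize(b)
        hist[idx] += 1.0
    n = len(pixels) or 1                     # avoid division by zero for an empty image
    return [h / n for h in hist]


def lsi_project(term_counts, basis):
    """Project a term-count vector onto k latent axes; `basis` holds k rows of
    |V| term weights, assumed precomputed by a truncated SVD."""
    return [sum(w * c for w, c in zip(row, term_counts)) for row in basis]


def fused_vector(pixels, term_counts, basis, text_weight=1.0):
    """Concatenate visual and textual features into a single vector."""
    text_part = [text_weight * x for x in lsi_project(term_counts, basis)]
    return color_histogram(pixels) + text_part
```

Scaling the text part with `text_weight` is one simple way to keep either modality from dominating distances in the concatenated space.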
“…Denoyer [8,9] propose a statistical graph representation for the classification of structured documents. Their system classifies multimedia documents, where text and images are mixed together in HTML pages.…”
Section: Related Work (mentioning)
“…where $\beta_l$ corresponds to the model of the news category $w_l$ from the set of $L$ categories (Eq. 9). In this setting, we define a collection…”
Section: Web News Categorization (mentioning)
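When each category $w_l$ has its own generative model $\beta_l$, a common way to assign a document $d$ is the Bayes decision rule below. This is a generic illustration, not necessarily the exact rule of the citing paper; the class prior $P(w_l)$ and the symbol $\hat{w}$ are assumptions introduced here.

```latex
\hat{w} \;=\; \arg\max_{l \in \{1,\dots,L\}} \; P(w_l)\, P(d \mid \beta_l)
```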
“…Moreover, its learning complexity is linear with respect to the size of the documents. This type of model has been used both for the categorization and the clustering of XML documents and the authors have proposed extensions that take into account different information content (text, pictures,...) for the multimedia filtering task (Denoyer et al, 2003). A discriminative algorithm has also been developed for the categorization task.…”
Section: Stochastic Generative Model (mentioning)