Language Modeling for Information Retrieval (2003)
DOI: 10.1007/978-94-017-0171-6_7

Using Compression-Based Language Models for Text Categorization

Cited by 92 publications
(63 citation statements)
References 14 publications
“…Hence, it was considered the ideal testing ground for early authorship attribution studies as well as the first fully-automated approaches (Holmes & Forsyth, 1995; Tweedie et al., 1996). It is also used in some modern studies (Teahan & Harper, 2003; Marton et al., 2005). Although appealing, this case has a number of important weaknesses.…”
Section: Discussion (citation type: mentioning; confidence: 99%)
“…It has to be underlined that the prediction by partial matching (PPM) algorithm (Teahan & Harper, 2003) that is used by RAR to compress text files works practically the same as the method of Peng et al. (2004). However, there is a significant difference with the previously described probabilistic method.…”
Section: Compression Models (citation type: mentioning; confidence: 99%)
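The excerpt above refers to classifying text with an off-the-shelf compressor (RAR). The general idea can be sketched as follows: assign a document to the class whose training text lets the compressor encode the document most cheaply, measured as C(train + doc) minus C(train). This is a minimal sketch under stated assumptions: Python's standard `lzma` module stands in for RAR, and LZMA is an LZ77-family compressor, not PPM, so its byte counts will differ from a PPM-based tool. The names `compressed_size`, `extra_bytes`, and `classify` are hypothetical.

```python
import lzma


def compressed_size(data: bytes) -> int:
    # Assumption: LZMA (default preset) stands in for RAR; it is NOT a PPM coder.
    return len(lzma.compress(data))


def extra_bytes(train: str, doc: str) -> int:
    """Approximate cost of coding `doc` after the compressor has seen `train`."""
    base = train.encode("utf-8")
    return compressed_size(base + doc.encode("utf-8")) - compressed_size(base)


def classify(doc: str, training_texts: dict) -> str:
    # Pick the (hypothetical) label whose training text makes `doc` cheapest to append.
    return min(training_texts, key=lambda label: extra_bytes(training_texts[label], doc))
```

Usage would look like `classify(doc, {"label_a": text_a, "label_b": text_b})`. Recompressing every training text for every test document is slow; a dedicated PPM coder can instead keep its trained model in memory.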
“…Also, the method is very easy to understand and implement, unlike the PPM approach [8], with almost similar detection rates.…”
Section: Introduction (citation type: mentioning; confidence: 99%)
“…Character level language models have been found to be effective in text classification [9] and author attribution [10] tasks. The present paper deals with a relatively small corpus of spoken message transcripts.…”
Section: Introduction (citation type: mentioning; confidence: 99%)
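The excerpts above relate PPM to character-level language models. Used that way, classification needs no external compressor: train one model per class and choose the class that gives the test document the shortest code length, that is, the lowest cross-entropy, bits = -Σ log2 p(x_i | context). The sketch below is a simplified, hypothetical PPM-style byte model (escape method C, no exclusions, static after training, uniform order -1 fallback over 256 byte values). It is not Teahan and Harper's implementation, and `PPMCModel`, `order`, and `classify` are illustrative names.

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict


class PPMCModel:
    """Hypothetical simplified PPM-style byte model.

    Assumptions: escape method C, no exclusions, the model is static after
    training, and order -1 is uniform over `alphabet_size` byte values.
    """

    def __init__(self, order: int = 3, alphabet_size: int = 256) -> None:
        self.order = order
        self.alphabet_size = alphabet_size
        # counts[context][next_byte]: how often next_byte followed context
        self.counts: dict[bytes, Counter] = defaultdict(Counter)

    def train(self, text: str) -> None:
        data = text.encode("utf-8")
        for i, sym in enumerate(data):
            # Record sym under every preceding context of length 0..order.
            for k in range(min(self.order, i) + 1):
                self.counts[data[i - k:i]][sym] += 1

    def prob(self, context: bytes, sym: int) -> float:
        p = 1.0
        for k in range(len(context), -1, -1):  # longest context first
            table = self.counts.get(context[len(context) - k:])
            if not table:
                continue  # unseen context: skip it without paying an escape
            total = sum(table.values())
            distinct = len(table)
            if sym in table:
                return p * table[sym] / (total + distinct)
            p *= distinct / (total + distinct)  # method C escape probability
        return p / self.alphabet_size  # order -1: uniform over all byte values

    def code_length_bits(self, text: str) -> float:
        data = text.encode("utf-8")
        bits = 0.0
        for i, sym in enumerate(data):
            context = data[max(0, i - self.order):i]
            bits -= math.log2(self.prob(context, sym))
        return bits


def classify(doc: str, models: dict[str, PPMCModel]) -> str:
    # Shortest code length = lowest cross-entropy under that class's model.
    return min(models, key=lambda label: models[label].code_length_bits(doc))
```

Typical use is to call `train` on each class's text and then `classify(doc, {"a": model_a, "b": model_b})`. Without exclusions the per-context probabilities can sum to less than one, which wastes some code space but still yields a valid code length for comparison.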