Proceedings IEEE International Forum on Research and Technology Advances in Digital Libraries -ADL'98-
DOI: 10.1109/adl.1998.670374

Applying data mining techniques for descriptive phrase extraction in digital document collections

Cited by 51 publications
(33 citation statements)
References 5 publications
“…In phrase based method document is analyzed on phrase basis as phrases are less ambiguous and more discriminative than individual terms [2]. The likely reasons for the daunting performance include: 1) Phrases have inferior statistical properties to terms, 2) They have low frequency of occurrence, and 3) Large numbers of redundant and noisy phrases are present among them.…”
Section: Phrase Based Methods (mentioning)
confidence: 99%
“…Our research mainly focuses on the statistical approach, which does not need any grammatical knowledge and has easy adaptability to other languages. Statistical phrase-finding approaches have been used for expanding vector dimensions in clustering multiple documents [21,22], or finding more descriptive or important/meaningful phrases [1,2]. This paper compares previous statistical approaches and attempts to find meaningful phrases in a document.…”
Section: Introduction (mentioning)
confidence: 99%
“…Ahonen et al [1], Zamir and Etzioni [23], and Chan [2] introduced phrase-finding algorithms. Ahonen's algorithm depends on conditional probability and needs a fixed maximum phrase length.…”
Section: Introduction (mentioning)
confidence: 99%
“…A sequence $S = \langle s_1, s_2, \ldots, s_n \rangle$ ($s_i \in T$) is an ordered list of terms. A sequence $\alpha = \langle a_1, a_2, \ldots, a_n \rangle$ is a sub-sequence of another sequence $\beta = \langle b_1, b_2$ …”
Section: Basic Definition (mentioning)
confidence: 99%
“…The version of this data collection we chose is Reuters Corpus Volume 1 (RCV1) 2, which includes 806,791 news stories. The English language stories are produced by Reuters journalists for the period between 20 August 1996 and 19 August 1997.…”
Section: Real World Datasets (mentioning)
confidence: 99%