2008 International Conference on Advanced Language Processing and Web Information Technology (2008)
DOI: 10.1109/alpit.2008.76

Near-Duplicates Detection for Vietnamese Documents in Large Database

Abstract: Near-duplicate documents exacerbate the problem of information overload. Research on detecting near-duplicates has attracted considerable attention from both industry and academia. In this paper, we address this problem for Vietnamese documents, which, to the best of our knowledge, has not been done before. Most current algorithms have been designed for English and are not directly applicable to Vietnamese, a monosyllabic language. We propose to combine Charikar's algorithm [2] with a "weightin…
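Charikar's algorithm [2] is commonly realized as "simhash": each document is reduced to a short bit fingerprint such that similar documents receive fingerprints with a small Hamming distance. The following is a minimal sketch under stated assumptions, not the paper's method. Because the abstract's weighting scheme is truncated, the sketch uses plain term frequency as a stand-in weight. The features (space-separated Vietnamese syllable bigrams), the BLAKE2b feature hash, the 64-bit fingerprint size, and the names `syllable_bigrams`, `simhash`, and `is_near_duplicate` are all hypothetical choices made for illustration.

```python
import hashlib
import re
import unicodedata
from collections import Counter

FINGERPRINT_BITS = 64  # assumption: 64-bit fingerprints


def syllable_bigrams(text):
    """Hypothetical feature extractor: Vietnamese is written as space-separated
    syllables, so adjacent syllable pairs serve as features here."""
    # NFC-normalize so decomposed diacritics (combining marks) are matched by \w.
    text = unicodedata.normalize("NFC", text).lower()
    syllables = re.findall(r"\w+", text)
    return [" ".join(syllables[i:i + 2]) for i in range(len(syllables) - 1)]


def feature_hash(feature):
    """Map a feature to a 64-bit integer (BLAKE2b is an arbitrary choice)."""
    digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(text):
    """Charikar-style fingerprint. Term frequency stands in for the paper's
    (truncated) weighting scheme."""
    weights = Counter(syllable_bigrams(text))
    totals = [0] * FINGERPRINT_BITS
    for feature, weight in weights.items():
        h = feature_hash(feature)
        for bit in range(FINGERPRINT_BITS):
            # Each feature votes +weight for bits set in its hash, -weight otherwise.
            if (h >> bit) & 1:
                totals[bit] += weight
            else:
                totals[bit] -= weight
    fingerprint = 0
    for bit, total in enumerate(totals):
        if total > 0:  # a fingerprint bit is 1 where the weighted vote is positive
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def is_near_duplicate(doc_a, doc_b, max_distance=3):
    """Treat two documents as near-duplicates if their fingerprints differ
    in at most max_distance bits (threshold is an assumption)."""
    return hamming_distance(simhash(doc_a), simhash(doc_b)) <= max_distance
```

For example, `is_near_duplicate("Hà Nội là thủ đô của Việt Nam", "Hà Nội là thủ đô của nước Việt Nam")` compares the two fingerprints bit by bit. A threshold of a few bits is a common choice for 64-bit fingerprints, but here it is an assumption to tune, not a value taken from the paper. In a large database, fingerprints are typically indexed so that candidate pairs can be found without comparing every pair of documents.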

Cited by 0 publications
References: 12 publications