Proceedings of the 2007 ACM Symposium on Document Engineering, 2007
DOI: 10.1145/1284420.1284441

XML version detection

Abstract: The problem of version detection is critical in many important application scenarios, including software clone identification, Web page ranking, plagiarism detection, and peer-to-peer searching. A natural and commonly used approach to version detection relies on analyzing the similarity between files. Most of the techniques proposed so far rely on the use of hard thresholds for similarity measures. However, defining a threshold value is problematic for several reasons: in particular (i) the threshold value is …
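
The hard-threshold approach that the abstract describes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the similarity measure (Jaccard overlap of root-to-node tag paths), the function names `element_paths`, `jaccard_similarity`, and `is_version`, and the default threshold of 0.5.

```python
import xml.etree.ElementTree as ET


def element_paths(xml_text: str) -> set:
    """Return the set of root-to-node tag paths found in an XML document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def jaccard_similarity(xml_a: str, xml_b: str) -> float:
    """Illustrative similarity: Jaccard overlap of the two tag-path sets."""
    paths_a, paths_b = element_paths(xml_a), element_paths(xml_b)
    union = paths_a | paths_b
    return len(paths_a & paths_b) / len(union) if union else 1.0


def is_version(xml_a: str, xml_b: str, threshold: float = 0.5) -> bool:
    """Hard-threshold decision: call the documents versions of each other
    when their similarity reaches the (hypothetical) threshold."""
    return jaccard_similarity(xml_a, xml_b) >= threshold


if __name__ == "__main__":
    doc_a = "<a><b/><c/></a>"
    doc_b = "<a><b/><d/></a>"
    print(jaccard_similarity(doc_a, doc_b))
    # The same pair flips from "version" to "not a version" as the threshold moves.
    print(is_version(doc_a, doc_b, threshold=0.5))
    print(is_version(doc_a, doc_b, threshold=0.6))
```

The two calls to `is_version` at the end show why a fixed cut-off is fragile: the decision for the same pair of documents depends entirely on where the threshold is placed.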

Citations: cited by 6 publications (4 citation statements)
References: 27 publications
“…A systematic approach to applying machine learning classifiers follows four phases: 1) Training; 2) Testing; 3) Validation; and 4) Classification [9,5]…”
Section: Methods (mentioning)
confidence: 99%
“…Closer to our work but applied on XML documents, [9] attempts to identify versions of documents using naive Bayes classifiers. The authors use a similarity measure dedicated to XML documents as input for the classifiers, and apply the approach on a set of automatically generated documents in a closed domain.…”
Section: Related Work (mentioning)
confidence: 99%
“…operation : (type, position, v, v', fingerprint(position))    (8)
We stored the delta itself within an XML file, using the root node <delta>. Edit operations are mapped to the XML domain as follows:…”
Section: Delta Format (mentioning)
confidence: 99%
“…As the context of edit operations is quite small, this technique is not promising for our scenario. Another approach tailored for XML documents was presented in [8]. As our fingerprint technique maps to linear data, thus dissolving the tree structure, this approach is not applicable, either.…”
Section: Related Work (mentioning)
confidence: 99%