2015
DOI: 10.1109/tpami.2014.2375175
|View full text |Cite
|
Sign up to set email alerts
|

Normalized Compression Distance of Multisets with Applications

Abstract: Pairwise normalized compression distance (NCD) is a parameter-free, feature-free, alignment-free, similarity metric based on compression. We propose an NCD of multisets that is also metric. Previously, attempts to obtain such an NCD failed. For classification purposes it is superior to the pairwise NCD in accuracy and implementation complexity. We cover the entire trajectory from theoretical underpinning to feasible practice. It is applied to biological (stem cell, organelle transport) and OCR classification q… Show more

Help me understand this report
View preprint versions

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

0
53
0
1

Year Published

2015
2015
2021
2021

Publication Types

Select...
4
4

Relationship

1
7

Authors

Journals

citations
Cited by 39 publications
(54 citation statements)
references
References 37 publications
0
53
0
1
Order By: Relevance
“…24 CNNs have been applied in many machine vision tasks, including biological image analysis. In other work, we have found approaches based on CNNs to obtain results comparable to the NCD, 8 and have also found the ability to improve the performance of the NCD by combining it with the subsequent neural network or other supervised machine learning algorithms. One key difference is that because the NCD is a metric distance, it enables direct comparisons of visual similarity among different image training sets.…”
Section: Discussionmentioning
confidence: 90%
See 2 more Smart Citations
“…24 CNNs have been applied in many machine vision tasks, including biological image analysis. In other work, we have found approaches based on CNNs to obtain results comparable to the NCD, 8 and have also found the ability to improve the performance of the NCD by combining it with the subsequent neural network or other supervised machine learning algorithms. One key difference is that because the NCD is a metric distance, it enables direct comparisons of visual similarity among different image training sets.…”
Section: Discussionmentioning
confidence: 90%
“…The NCD uses file compression algorithms as a basis for approximating the relative Kolmogorov complexity. 8,9 Given a set of images X, the NCD of that is defined as…”
Section: Normalized Compression Distancementioning
confidence: 99%
See 1 more Smart Citation
“…Among them two widely known measures are Normalized Information Distance (NID) [15] and the Compression Dissimilarity Measure (CDM) [12]. Whereas CDM works only with two objects, there is a generalisation of NID for multiple data objects [5]. However, here we are not interested in similarity, but in correlation.…”
Section: Related Workmentioning
confidence: 99%
“…The popularity of the Internet and ease of access has motivated millions of users to input billions of words to create trillions of Web pages of, on average, low quality contents. Normalized Google Distance (NGD) is a semantic similarity measure derived from the number of hits returned by the Google search engine for a given set of keywords [6].…”
Section: Introductionmentioning
confidence: 99%