2006
DOI: 10.1109/tpami.2006.77

Metric learning for text documents

Abstract: Many algorithms in machine learning rely on being given a good distance metric over the input space. Rather than using a default metric such as the Euclidean metric, it is desirable to obtain a metric based on the provided data. We consider the problem of learning a Riemannian metric associated with a given differentiable manifold and a set of points. Our approach to the problem involves choosing a metric from a parametric family that is based on maximizing the inverse volume of a given data set of points. Fro…

citations
Cited by 112 publications
(69 citation statements)
references
References 3 publications
“…When the training set is unlabeled, manifold learning has to be based on purely geometrical considerations. G. Lebanon [19] has recently suggested in the context of document retrieval an approach that seeks to maximize the inverse volume element associated with a metric around the given training set of points [22]:…”
Section: Objective Functions: Classification Performance and Inverse (mentioning)
confidence: 99%
“…By definition, the push-forward F * λ of a vector v ∈ T m M under an automorphism F λ with parameter λ is given by [19]:…”
Section: Objective Functions: Classification Performance and Inverse (mentioning)
confidence: 99%
“…A standard method for differentiating document classes is to form a probability distribution over a dictionary and use methods of information geometry to determine a similarity between data sets [2]. To the best of our knowledge, most metrics in document classification which are based on word probabilities [3] do not restrict the probability manifold to be a lower dimensional parametric manifold. As a result, a geodesic may go through many probability models that are not admissible in the context of document classification (e.g., a text with only the words 'the' and 'of').…”
Section: Introduction (mentioning)
confidence: 99%
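
The last statement above describes forming a probability distribution over a dictionary and comparing documents with information geometry. As a rough illustration only, the sketch below computes the geodesic distance under the Fisher information metric on the multinomial simplex, d(p, q) = 2 arccos(Σ √(p_i q_i)), between two smoothed word distributions. This is the standard fixed Fisher metric, not the learned metric of the paper or the method of any citing work; the function names, the whitespace tokenizer, and the additive smoothing constant are assumptions made for this example.

```python
import math
from collections import Counter


def word_distribution(text, vocabulary, smoothing=1.0):
    """Map a text to a smoothed probability vector over `vocabulary`.

    Hypothetical helper: whitespace tokenization and additive smoothing
    are illustrative choices, not taken from the cited papers.
    """
    counts = Counter(text.lower().split())
    weights = [counts[word] + smoothing for word in vocabulary]
    total = sum(weights)
    return [w / total for w in weights]


def fisher_geodesic_distance(p, q):
    """Geodesic distance on the simplex under the Fisher information metric."""
    # Bhattacharyya coefficient: inner product of the square-root vectors.
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to [0, 1] so rounding error cannot push acos out of its domain.
    coefficient = min(1.0, max(0.0, coefficient))
    return 2.0 * math.acos(coefficient)


if __name__ == "__main__":
    vocabulary = ["metric", "manifold", "the", "of"]
    doc_a = word_distribution("the metric of the manifold", vocabulary)
    doc_b = word_distribution("the of the of the", vocabulary)
    print(fisher_geodesic_distance(doc_a, doc_b))
```

The distance is 0 for identical distributions and reaches its maximum, π, for distributions with disjoint support. The geodesic itself passes through arbitrary points of the simplex, which is the concern the quoted statement raises about unrestricted probability manifolds.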