2016
DOI: 10.1016/j.patrec.2015.11.004

Toward a generic representation of random variables for machine learning

Abstract: This paper presents a pre-processing step and a distance which improve the performance of machine learning algorithms working on independent and identically distributed stochastic processes. We introduce a novel non-parametric approach to represent random variables which splits apart dependency and distribution without losing any information. We also propound an associated metric leveraging this representation and its statistical estimate. Besides experiments on synthetic datasets, the benefits of our contribution …
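The abstract is terse about how dependency and distribution are separated, so a minimal sketch of one common way to realize such a split, with a toy distance on top of it, is given below. The function names, the rank/sorted-value decomposition, and the theta-weighted mixing are illustrative assumptions, not the paper's exact estimators.

```python
# Minimal sketch: represent a sample by (i) its normalized ranks, which carry
# the dependence structure, and (ii) its sorted values, which carry the
# marginal distribution. Hypothetical names; not the paper's exact estimators.
import numpy as np

def split_representation(x):
    """Split a 1-D sample into (dependence part, distribution part)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Normalized ranks in (0, 1]: the dependence / empirical-copula part.
    ranks = (np.argsort(np.argsort(x)) + 1.0) / n
    # Sorted observations: an empirical quantile function / marginal part.
    quantiles = np.sort(x)
    return ranks, quantiles

def sketch_distance(x, y, theta=0.5):
    """Hypothetical distance mixing dependence and marginal discrepancies."""
    rx, qx = split_representation(x)
    ry, qy = split_representation(y)
    d_dependence = np.mean((rx - ry) ** 2)  # compares joint rank behaviour
    d_margins = np.mean((qx - qy) ** 2)     # compares marginal distributions
    return np.sqrt(theta * d_dependence + (1.0 - theta) * d_margins)
```

The rank part is invariant under strictly increasing transformations of the sample, while the sorted values retain the marginal law; together the two parts reconstruct the original sample exactly, which is the sense in which no information is lost.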

Cited by 12 publications (11 citation statements) · References 29 publications · Citing publications published between 2016 and 2022.
“…In the clustering literature, [8] make an effort to overcome this risk by designing a distance measure that incorporates both information from the margins and the dependence structure of the assets.…”
mentioning
confidence: 99%
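To make the risk in the quoted statement concrete, the toy example below uses synthetic data (an illustrative assumption, not data from the cited works): two co-monotone samples have identical rank-based dependence, so a dependence-only distance such as sqrt(2(1 − rho_Spearman)) vanishes, even though their marginal tail behaviour differs sharply; a margin-aware distance is designed to detect exactly this.

```python
# Two "assets" with identical dependence structure but different marginals.
import numpy as np

rng = np.random.default_rng(0)
T = 100_000

# Both samples are sorted, hence co-monotone: identical rank structure.
asset_a = np.sort(rng.standard_normal(T))        # Gaussian marginal
asset_b = np.sort(rng.standard_t(2.5, T))        # heavy-tailed Student-t marginal

rank = lambda x: np.argsort(np.argsort(x))
rho = np.corrcoef(rank(asset_a), rank(asset_b))[0, 1]
print(f"Spearman rho = {rho:.3f}")                               # 1.000
print(f"dependence-only distance = {np.sqrt(2 * (1 - rho)):.3f}")  # 0.000

# Yet the marginal risk profiles are very different:
print(f"99.9% quantile: a = {np.quantile(asset_a, 0.999):.2f}, "
      f"b = {np.quantile(asset_b, 0.999):.2f}")
```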
“…The asymmetry of KL divergence has restricted its use in practical applications. Researchers seek other divergences in different contexts (e.g., [18], [19], [20]). Pardo surveys a wide range of divergences in his book [2].…”
Section: Related Work
mentioning
confidence: 99%
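As a small numeric companion to the point about asymmetry, the snippet below (with distributions p and q chosen arbitrarily) evaluates KL(p‖q) and KL(q‖p) and contrasts them with the Jensen–Shannon divergence, one standard symmetric alternative. It illustrates the asymmetry only; it is not an implementation from any of the cited works.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

def js(p, q):
    """Jensen-Shannon divergence: symmetric and bounded, built from KL."""
    m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.5, 0.4, 0.1]
q = [0.1, 0.3, 0.6]

print(f"KL(p||q) = {kl(p, q):.3f}")   # ~0.741
print(f"KL(q||p) = {kl(q, p):.3f}")   # ~0.828 (differs: KL is asymmetric)
print(f"JS(p, q) = {js(p, q):.3f}")   # equals JS(q, p) by construction
```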
“…Authors leverage the empirical copula transform for several purposes: [6] benefit from its invariance to strictly increasing transformations of the X_i variables (Fig. 1) to improve feature selection, [7] to obtain a dependence coefficient invariant with respect to marginal distribution transformations, and [5] to study dependence and margins separately for clustering.…”
Section: The Copula Transform
mentioning
confidence: 99%
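The empirical copula transform referred to above amounts to a normalized rank transform. The short check below (names and data are illustrative) verifies the invariance property exploited by [6]: applying a strictly increasing map such as exp to the observations leaves the transformed values unchanged, because only the ranks of the observations matter.

```python
import numpy as np

def empirical_copula_transform(x):
    """Map a 1-D sample to normalized ranks in (0, 1]."""
    x = np.asarray(x, dtype=float)
    return (np.argsort(np.argsort(x)) + 1.0) / len(x)

rng = np.random.default_rng(42)
x = rng.standard_normal(1_000)

u = empirical_copula_transform(x)
u_transformed = empirical_copula_transform(np.exp(x))  # strictly increasing map

assert np.allclose(u, u_transformed)  # identical: the transform sees only ranks
```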
“…For example, in the specific case of N time series whose observed values are drawn from T independent and identically distributed random variables, one should take into account all the available information in these N time series, i.e. the dependence between them and the N marginal distributions, in order to design a proper distance for clustering [5]. Many of the time series datasets found in the literature consist of N real-valued variables observed T times, while in this work we focus on N × d × T time series datasets, i.e.…”
Section: Introduction
mentioning
confidence: 99%