Institute of Mathematical Statistics Collections 2008
DOI: 10.1214/193940307000000482

Projection pursuit for discrete data

Abstract: This paper develops projection pursuit for discrete data using the discrete Radon transform. Discrete projection pursuit is presented as an exploratory method for finding informative low dimensional views of data such as binary vectors, rankings, phylogenetic trees or graphs. We show that for most data sets, most projections are close to uniform. Thus, informative summaries are ones deviating from uniformity. Syllabic data from several of Plato's great works is used to illustrate the methods. Along with some b…

Cited by 7 publications (7 citation statements)
References 52 publications

“…Although much effort in the literature on PP has been put into theoretical aspects and computational approaches for low‐dimensional data, various open problems and challenging issues remain in high dimensions. These include (a) conjectures in Bickel et al. (2018) and cases not yet covered for asymptotic studies, especially, observed data of correlated and/or non‐Gaussian variables X1, …, Xp, with applications to financial time series satisfying stylized facts (De Luca & Loperfido, 2004; De Luca et al., 2006); (b) handling practical data of discrete variables in discrete exploratory PP (Klinke, 1995), and analysis of discrete data arising from syllable patterns (Diaconis & Salzman, 2008), symbolic data or data with special structure in numerous disciplines; (c) nonlinear projections (Blanchard et al., 2006; Guo et al., 2020). There is more to be explored along the lines, driven by the need of new adaptations and methodological results, as well as real applications.…”
Section: Discussion; citation type: mentioning
confidence: 99%
“…Several possibilities for choosing this distance function exist, such as the footrule distance, the Spearman distance, and the Kendall distance [31]. In this article, we choose to use the footrule distance, defined as [the sum of absolute differences between the ranks assigned to each item], the equivalent of an [ℓ1] measure between rankings. The choice of this distance is motivated by its greater computational efficiency compared to its competitors, such as the Kendall distance, which is more computationally intensive.…”
Section: Lower‐dimensional Bayesian Mallows Model; citation type: mentioning
confidence: 99%
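
The three distances named in the statement above can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the citing article: the function names are hypothetical, rankings are assumed to be equal-length lists giving the rank of each item, the Spearman distance is taken here as the sum of squared rank differences (some authors use its square root), and the Kendall distance is computed with the naive pairwise scan.

```python
from itertools import combinations


def footrule_distance(r, s):
    """Sum of absolute rank differences (an l1-type distance)."""
    return sum(abs(a - b) for a, b in zip(r, s))


def spearman_distance(r, s):
    """Sum of squared rank differences (assumed convention, no square root)."""
    return sum((a - b) ** 2 for a, b in zip(r, s))


def kendall_distance(r, s):
    """Number of item pairs that the two rankings order differently."""
    return sum(
        1
        for i, j in combinations(range(len(r)), 2)
        if (r[i] - r[j]) * (s[i] - s[j]) < 0
    )


if __name__ == "__main__":
    r = [1, 2, 3, 4]  # rank given to items 0..3 by the first ranking
    s = [2, 1, 4, 3]  # rank given to the same items by the second ranking
    print(footrule_distance(r, s), spearman_distance(r, s), kendall_distance(r, s))
```

The footrule and Spearman sums take one pass over the items, whereas this naive Kendall computation visits every pair of items, which is consistent with the efficiency argument made in the quoted passage.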
“…In statistics the most commonplace use of Correspondence Analysis is in ordination or seriation, that is, the search for a hidden gradient in contingency tables. As an example we take data analyzed by Cox and Brandwood [4] and Diaconis [6], who wanted to seriate Plato's works using the proportion of sentence endings in a given book with a given stress pattern. The seven books studied here are Republic, Laws, Critias, Philebus, [Politicus,] Sophist, Timaeus.…”
Section: Correspondence Analysis; citation type: mentioning
confidence: 99%