An orthogonal Haar scattering transform is a deep network, computed with a hierarchy of additions, subtractions and absolute values over pairs of coefficients. It provides a simple mathematical model for unsupervised deep network learning. It implements non-linear contractions, which are optimized for classification with an unsupervised pair matching algorithm of polynomial complexity. A structured Haar scattering over graph data computes permutation-invariant representations of groups of connected points in the graph. If the graph connectivity is unknown, unsupervised Haar pair learning can provide a consistent estimation of connected dyadic groups of points. Classification results are given on image databases, defined on regular grids or graphs, with a connectivity which may be known or unknown.

Keywords: deep learning, neural network, scattering transform, Haar wavelet, classification, images, graphs

2000 Math Subject Classification: 68Q32, 68T45, 68Q25, 68T05

arXiv:1509.09187v1 [cs.LG] 30 Sep 2015

on general graphs. In social, sensor or transportation networks, high-dimensional data vectors are supported on a graph [28]. In most cases, propagation phenomena require defining translation-invariant representations for classification. We show that an appropriate configuration of an orthogonal Haar scattering defines such a translation-invariant representation on a graph. It is computed with a product of Haar wavelet transforms on the graph, and is thus closely related to non-orthogonal translation-invariant scattering transforms [18]. The connectivity of graph data is often unknown. In social or financial networks, we typically have information on individual agents, without knowing the interactions, and hence the connectivity, between agents. Building invariant representations on such graphs requires estimating the graph connectivity. Such information can be inferred from unlabeled data by analyzing the joint variability of signals defined on the unknown graph.
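The hierarchy of additions, subtractions and absolute values over pairs of coefficients can be sketched as a single network layer. The following is a minimal illustration, not the authors' implementation: the function name, the pairing-as-index-list convention, and the interleaved output ordering are choices made here for clarity.

```python
import numpy as np

def haar_scattering_layer(x, pairs):
    """One orthogonal Haar scattering layer (illustrative sketch).

    x     : 1-D array of 2n coefficients
    pairs : list of n index pairs (i, j) partitioning range(2n)

    For each pair, the layer outputs the sum x[i] + x[j] and the
    rectified difference |x[i] - x[j]|, so one layer contracts the
    signal while preserving an orthogonal pair of channels.
    """
    out = np.empty_like(x, dtype=float)
    for k, (i, j) in enumerate(pairs):
        out[2 * k] = x[i] + x[j]           # addition channel
        out[2 * k + 1] = abs(x[i] - x[j])  # subtraction + absolute value
    return out

# Toy example: 8 coefficients with a neighbour pairing, as on a regular grid.
x = np.array([1.0, 3.0, 2.0, 2.0, 5.0, 1.0, 0.0, 4.0])
pairs = [(0, 1), (2, 3), (4, 5), (6, 7)]
y = haar_scattering_layer(x, pairs)
# y = [4., 2., 4., 0., 6., 4., 4., 4.]
```

A deep network is obtained by cascading such layers, each with its own pairing of the previous layer's outputs.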
This paper studies unsupervised learning strategies which optimize deep network configurations without class label information. Most deep neural networks fight the curse of dimensionality by reducing the variance of the input data with contractive non-linearities [25, 2]. The danger of such contractions is to nearly collapse together vectors which belong to different classes. Learning must preserve discriminability despite the variance reduction resulting from these contractions. Hierarchical unsupervised architectures have been shown to provide efficient learning strategies [1]. We show that unsupervised learning can optimize an average discriminability by computing sparse features. Sparse unsupervised learning, which is usually NP-hard, reduces to a pair matching problem for Haar scattering, and can thus be computed with a polynomial-complexity algorithm. For Haar scattering on graphs, it recovers a hierarchical connectivity of groups of vertices. Under appropriate assumptions, we prove that pairing problems avoid the curse of dimensionality. It can recover an ...
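To make the pair matching step concrete, one can phrase it as follows: from unlabeled samples, pair the coordinates whose difference channel is small on average, since such pairs yield sparse (near-zero) outputs. The exact problem is a minimum-weight perfect matching, solvable in polynomial time; the greedy version below is only an illustrative sketch under that framing, with the cost choice (average l1 size of the difference channel) and all names being assumptions of this example.

```python
import numpy as np

def greedy_pairing(X):
    """Greedy sketch of unsupervised pair matching for sparse features.

    X : (n_samples, 2n) array of unlabeled signals.

    Pairs coordinates (i, j) with small average |x_i - x_j|, a proxy
    for output sparsity. A minimum-weight perfect matching (e.g. via
    the blossom algorithm) solves this exactly in polynomial time;
    this greedy loop only illustrates the objective.
    """
    n = X.shape[1]
    # cost[i, j]: mean l1 size of the difference channel for pair (i, j)
    cost = np.mean(np.abs(X[:, :, None] - X[:, None, :]), axis=0)
    np.fill_diagonal(cost, np.inf)
    unmatched = set(range(n))
    pairs = []
    while unmatched:
        sub = sorted(unmatched)
        # pick the cheapest remaining pair among unmatched coordinates
        k = np.argmin(cost[np.ix_(sub, sub)])
        a, b = divmod(int(k), len(sub))
        i, j = sub[a], sub[b]
        pairs.append((i, j))
        unmatched -= {i, j}
    return pairs

# Toy example: coordinates 0 & 2 co-vary across samples, as do 1 & 3.
X = np.array([[1.0, 5.0, 1.1, 5.2],
              [3.0, 0.0, 2.9, 0.1],
              [2.0, 4.0, 2.0, 3.8]])
result = greedy_pairing(X)  # recovers the pairing (0, 2), (1, 3)
```

On graph data, cascading this pairing across layers is what recovers a hierarchy of dyadic groups of connected vertices.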