Abstract-We develop a method for aggregating large Markov chains into smaller representative Markov chains, where Markov chains are viewed as weighted directed graphs, and similar nodes (and edges) are aggregated using a deterministic annealing approach. The notions of representativeness of the aggregated graphs and similarity between nodes in graphs are based on a newly proposed metric that quantifies connectivity in the underlying graph. Namely, we develop notions of distance between subchains in Markov chains, and provide easily verifiable conditions that determine whether a given Markov chain is nearly decomposable, that is, conditions under which the deterministic annealing approach can be used to identify subchains with high probability. We show that the aggregated Markov chain preserves certain dynamics of the original chain. In particular, we provide explicit bounds on the 1-norm of the error between the aggregated stationary distribution of the original Markov chain and the stationary distribution of the aggregated Markov chain, which extends longstanding foundational results (Simon and Ando, 1961 [5]).

In many cases the models comprise large graphs and networks, such as connection structures in social networks [6], metabolic networks [7], and brain activity maps represented by Markov chains [8]. For tractable analysis and design, succinct models are necessary. Further, for large graph models, the identification of underlying coarse connectivity structure is frequently of primary interest. In this context, clustering and aggregation methods play an important role in terms of both tractability, for example in aggregation of large Markov chain models, and identification of underlying network structures.

Recently, clustering-based algorithms have been proposed for the purpose of determining reduced-dimension graph-based models [9].
As such, an optimal aggregation of nodes in the graph is sought, where the optimality is evaluated based on a distance measure quantifying similarity in connectivity. These algorithms, which have their foundation in the deterministic annealing method proposed by Rose for the vector quantization problem [10], are directly applicable to the Markov chain reduction, or aggregation, problem. In this paper, we present analytical results of graph aggregation methods specific to Markov chains; these directly generalize to weighted directed graphs.

Aggregation of Markov chains, or more generally, stochastic systems, has been studied for over 50 years. One of the