An $(\epsilon, \phi)$-expander decomposition of a graph $G = (V, E)$ is a clustering of the vertices $V = V_1 \cup \cdots \cup V_x$ such that (1) each cluster $V_i$ induces a subgraph with conductance at least $\phi$, and (2) the number of inter-cluster edges is at most $\epsilon|E|$. In this paper, we give an improved distributed expander decomposition, and obtain a nearly optimal distributed triangle enumeration algorithm in the CONGEST model.

Specifically, we construct an $(\epsilon, \phi)$-expander decomposition with $\phi = (\epsilon/\log n)^{2^{O(k)}}$ in $O(n^{2/k} \cdot \mathrm{poly}(1/\phi, \log n))$ rounds for any $\epsilon \in (0, 1)$ and positive integer $k$. For example, a $(1/n^{o(1)}, 1/n^{o(1)})$-expander decomposition only requires $O(n^{o(1)})$ rounds to compute, which is optimal up to subpolynomial factors, and a $(0.01, 1/\mathrm{poly}\log n)$-expander decomposition can be computed in $O(n^{\gamma})$ rounds, for any arbitrarily small constant $\gamma > 0$. Previously, the algorithm by Chang, Pettie, and Zhang could construct a $(1/6, 1/\mathrm{poly}\log n)$-expander decomposition using $\tilde{O}(n^{1-\delta})$ rounds for any $\delta > 0$, with the caveat that the algorithm is allowed to throw away a set of edges into an extra part that forms a subgraph with arboricity at most $n^{\delta}$. Our algorithm does not have this caveat.

By slightly modifying the distributed algorithm for routing on expanders by Ghaffari, Kuhn and Su [PODC'17], we obtain a triangle enumeration algorithm using $\tilde{O}(n^{1/3})$ rounds. This matches the lower bound of $\tilde{\Omega}(n^{1/3})$ by Izumi and Le Gall [PODC'17] and Pandurangan, Robinson and Scquizzato [SPAA'18], which holds even in the CONGESTED-CLIQUE model. To the best of our knowledge, this provides the first non-trivial example of a distributed problem that has essentially the same complexity (up to a polylogarithmic factor) in both CONGEST and CONGESTED-CLIQUE.

The key technique in our proof is the first distributed approximation algorithm for finding a low conductance cut that is as balanced as possible. Previous distributed sparse cut algorithms do not have this nearly most balanced guarantee.
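To make the two conditions of the definition concrete, the following is a minimal, centralized sketch that checks whether a given clustering is an $(\epsilon, \phi)$-expander decomposition of a tiny graph. It is not the distributed algorithm of this paper. The function names `conductance` and `is_expander_decomposition` are hypothetical, and the sketch assumes the standard conductance $\min_S |E(S, V \setminus S)| / \min(\mathrm{vol}(S), \mathrm{vol}(V \setminus S))$ measured on the plain induced subgraph, with a single-vertex cluster treated as trivially expanding; the paper's exact convention may differ. The brute-force search over cuts is exponential and is only meant for illustration.

```python
from itertools import combinations


def conductance(vertices, edges):
    """Brute-force conductance of the graph (vertices, edges).

    Assumed convention: min over nonempty proper subsets S of
    |E(S, V \\ S)| / min(vol(S), vol(V \\ S)), with degrees taken
    inside this graph. Exponential time; tiny graphs only.
    """
    vertices = list(vertices)
    if len(vertices) <= 1:
        return float("inf")  # no nontrivial cut exists
    deg = {v: 0 for v in vertices}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    total_vol = sum(deg.values())
    best = float("inf")
    for size in range(1, len(vertices)):
        for subset in combinations(vertices, size):
            s = set(subset)
            cut = sum(1 for u, v in edges if (u in s) != (v in s))
            vol_s = sum(deg[v] for v in s)
            denom = min(vol_s, total_vol - vol_s)
            if denom == 0:
                return 0.0  # one side carries no edges: graph is disconnected
            best = min(best, cut / denom)
    return best


def is_expander_decomposition(vertices, edges, clusters, eps, phi):
    """Check conditions (1) and (2) of the definition for a clustering."""
    # The clusters must partition the vertex set.
    seen = [v for c in clusters for v in c]
    if len(seen) != len(set(seen)) or set(seen) != set(vertices):
        return False
    cluster_of = {v: i for i, c in enumerate(clusters) for v in c}
    # Condition (2): at most eps * |E| inter-cluster edges.
    crossing = sum(1 for u, v in edges if cluster_of[u] != cluster_of[v])
    if crossing > eps * len(edges):
        return False
    # Condition (1): every induced subgraph has conductance at least phi.
    for i, c in enumerate(clusters):
        inside = [(u, v) for u, v in edges
                  if cluster_of[u] == i and cluster_of[v] == i]
        if conductance(c, inside) < phi:
            return False
    return True


# Illustrative input (an assumption, not from the paper):
# two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
CLUSTERS = [{0, 1, 2}, {3, 4, 5}]
print(is_expander_decomposition(range(6), EDGES, CLUSTERS, eps=0.15, phi=1.0))
```

In this example one of the seven edges crosses between clusters, so the clustering satisfies condition (2) only when $\epsilon \ge 1/7$, which shows how the two parameters trade off: keeping the whole graph as a single cluster removes all inter-cluster edges but lowers the achievable $\phi$ to $1/7$.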
Kuhn and Molla [25] previously claimed that their approximate sparse cut algorithm also has the nearly most balanced guarantee, but this claim turns out to be incorrect [7, Footnote 3].

In this paper, we consider the task of finding an expander decomposition of a distributed network in the CONGEST model of distributed computing. Roughly speaking, an expander decomposition of a graph $G = (V, E)$ is a clustering of the vertices $V = V_1 \cup \cdots \cup V_x$ such that (1) each component $V_i$ induces a high conductance subgraph, and (2) the number of inter-component edges is small. This natural bicriteria optimization problem of finding a good expander decomposition was introduced by Kannan, Vempala, and Vetta [22], and was further studied in many subsequent works [42,32,34,3,44,31,37]. Expander decompositions have a wide range of applications, including solving linear systems [43], unique games [2,44,36], minimum cut [23], and dynamic algorithms [30].

Recently, Chang, Pettie, and Zhang [7] applied this technique to the field of distributed computing, and t...