We consider a class of popular distributed non-convex optimization problems, in which agents connected by a network $\mathcal{G}$ collectively optimize a sum of smooth (possibly non-convex) local objective functions. We address the following question: if the agents can only access the gradients of local functions, what are the fastest rates that any distributed algorithm can achieve, and how can those rates be achieved? First, we show that there exist difficult problem instances such that it takes a class of distributed first-order methods at least $\mathcal{O}\big(1/\sqrt{\xi(\mathcal{G})} \times L/\epsilon\big)$ communication rounds to achieve a certain $\epsilon$-solution, where $\xi(\mathcal{G})$ denotes the spectral gap of the graph Laplacian matrix and $L$ is some Lipschitz constant. Second, we propose (near) optimal methods whose rates match the developed lower rate bound (up to a polylog factor). The key in the algorithm design is to properly embed classical polynomial filtering techniques into modern first-order algorithms. To the best of our knowledge, this is the first time that lower rate bounds and optimal methods have been developed for distributed non-convex optimization problems.

A common way to reformulate problem (1) in the distributed setting is given below. Introduce $M$ local variables $x_1, \cdots, x_M \in \mathbb{R}^S$ and their concatenation $x := [x_1; \cdots; x_M] \in \mathbb{R}^{SM \times 1}$, and suppose the graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$ is connected. Then the following formulation is equivalent to the global consensus problem:
$$\min_{x \in \mathbb{R}^{SM}} \; f(x) := \sum_{i=1}^{M} f_i(x_i), \quad \text{s.t.} \;\; x_i = x_j, \;\; \forall\, (i,j) \in \mathcal{E}. \tag{2}$$
The main benefit of the above formulation is that the objective function is now separable, and the linear constraint encodes the network connectivity pattern; a small numerical sketch of this construction is given at the end of this section.

1.2 Distributed non-convex optimization

Distributed non-convex optimization has gained considerable attention recently. For example, it finds applications in training neural networks [1], clustering [2], and dictionary learning [3], just to name a few. Problems (1) and (2) have been studied extensively in the literature when the $f_i$'s are all convex; see for example [4-6]. Primal methods such as the distributed subgradient (DSG) method [4] and the EXTRA method [6], as well as primal-dual methods such as the distributed augmented Lagrangian method [7] and the Alternating Direction Method of Multipliers (ADMM) [8, 9], have been proposed. In contrast, only recently have works begun to address the more challenging setting in which the $f_i$'s are not assumed to be convex; see [1, 3, 10-23]. The convergence behavior of the distributed consensus problem (1) has been studied in [3, 10, 11]. Reference [12] develops a non-convex ADMM-based method for solving the distributed consensus problem (1); however, the network considered therein is a star network, in which the local nodes are all connected to a central controller. References [14, 15] propose a primal-dual-based method for unconstrained problems over a connected network and derive a global convergence rate for this setting. In [13, 17, 18], the authors utilize certain gradient tracking ideas to solve a constrained nonsmooth distributed problem over possibly time-varying networks.
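To make the reformulation (2) and the quantity $\xi(\mathcal{G})$ in the lower bound concrete, below is a minimal NumPy sketch. It builds the linear consensus constraint from the edge-node incidence matrix of $\mathcal{G}$ (one standard choice; the paper's exact constraint matrix may differ) and computes the spectral gap, here assumed to be the ratio of the smallest nonzero to the largest eigenvalue of the graph Laplacian. The helper names and the 4-node path graph are illustrative, not taken from the paper.

```python
# A minimal sketch (hypothetical helper names; NumPy only) of the consensus
# reformulation (2) and a spectral-gap computation. Assumption: xi(G) is
# taken to be the ratio of the smallest nonzero to the largest eigenvalue of
# the graph Laplacian L = B^T B, where B is the edge-node incidence matrix.
import numpy as np

def incidence_matrix(num_nodes, edges):
    """Edge-node incidence matrix B: row e carries +1/-1 at the endpoints of edge e."""
    B = np.zeros((len(edges), num_nodes))
    for e, (i, j) in enumerate(edges):
        B[e, i], B[e, j] = 1.0, -1.0
    return B

def spectral_gap(B):
    """xi(G) = (smallest nonzero eigenvalue of L) / (largest eigenvalue of L)."""
    eigvals = np.linalg.eigvalsh(B.T @ B)
    nonzero = eigvals[eigvals > 1e-10]
    return nonzero[0] / eigvals[-1]

# A path graph on M = 4 nodes; each local variable x_i lives in R^S with S = 2.
M, S = 4, 2
edges = [(0, 1), (1, 2), (2, 3)]
B = incidence_matrix(M, edges)

# Constraint matrix A = B (Kronecker) I_S, so that A x = 0 iff x_1 = ... = x_M:
A = np.kron(B, np.eye(S))
x_consensus = np.tile([1.0, -2.0], M)        # identical local copies
assert np.allclose(A @ x_consensus, 0.0)     # consensus points are feasible

print("xi(G) for the 4-node path graph:", spectral_gap(B))
```

Writing the edgewise constraints $x_i = x_j$ as $Ax = 0$ with $A$ of this Kronecker form is what makes the constraint set linear while keeping the objective separable across agents.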
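The polynomial filtering idea from the abstract can also be illustrated in isolation. The sketch below applies a scaled Chebyshev polynomial $p_K(t) = T_K(t/\rho)/T_K(1/\rho)$ of a symmetric, doubly stochastic mixing matrix $W$: since $p_K(1) = 1$, the consensus component is preserved, while disagreement components (eigenvalues in $[-\rho, \rho]$) are damped at an accelerated rate. This is a generic Chebyshev-filtered gossip sketch under those stated assumptions, not the paper's exact method; the mixing matrix and the 5-node ring are hypothetical choices.

```python
# A minimal sketch of Chebyshev polynomial filtering for consensus -- a generic
# instance of the "polynomial filtering" idea, not the paper's exact algorithm.
# Assumptions: W is symmetric and doubly stochastic, its eigenvalue 1 has the
# consensus vector as eigenvector, and all other eigenvalues lie in [-rho, rho].
import numpy as np

def chebyshev_filter(W, x, rho, K):
    """Apply p_K(W) x, where p_K(t) = T_K(t/rho) / T_K(1/rho), via the
    three-term Chebyshev recursion T_{k+1}(t) = 2 t T_k(t) - T_{k-1}(t)."""
    y_prev, y = x, (W @ x) / rho     # T_0(W/rho) x and T_1(W/rho) x
    t_prev, t = 1.0, 1.0 / rho       # scalar values T_k(1/rho) for normalization
    for _ in range(K - 1):
        y_prev, y = y, (2.0 / rho) * (W @ y) - y_prev
        t_prev, t = t, (2.0 / rho) * t - t_prev
    return y / t                     # p_K(1) = 1: consensus component preserved

# Hypothetical example: lazy uniform mixing on a 5-node ring graph.
M = 5
W = np.zeros((M, M))
for i in range(M):
    W[i, (i - 1) % M] = W[i, (i + 1) % M] = 0.25
    W[i, i] = 0.5

eigs = np.sort(np.linalg.eigvalsh(W))
rho = max(abs(eigs[0]), abs(eigs[-2]))       # second-largest eigenvalue modulus

rng = np.random.default_rng(0)
x = rng.standard_normal(M)                   # one scalar held by each agent
avg = np.full(M, x.mean())
K = 10
plain = np.linalg.matrix_power(W, K) @ x     # K rounds of plain gossip
filtered = chebyshev_filter(W, x, rho, K)    # K rounds of filtered gossip
print("plain gossip disagreement:      ", np.linalg.norm(plain - avg))
print("Chebyshev-filtered disagreement:", np.linalg.norm(filtered - avg))
```

Both variants use exactly $K$ multiplications by $W$, i.e., $K$ communication rounds, yet the filtered disagreement on the ring is several orders of magnitude smaller. Roughly, Chebyshev filtering improves the dependence on the spectral quantity from $\xi(\mathcal{G})$ to $\sqrt{\xi(\mathcal{G})}$, which is the type of acceleration needed to match the $1/\sqrt{\xi(\mathcal{G})}$ factor in the lower bound.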