Given an undirected graph $\mathcal{G}=(\mathcal{N},\mathcal{E})$ of agents $\mathcal{N}=\{1,\ldots,N\}$ connected with edges in $\mathcal{E}$, we study how to compute an optimal decision on which there is consensus among the agents and that minimizes the sum of agent-specific private convex composite functions $\{\Phi_i\}_{i\in\mathcal{N}}$ while respecting privacy requirements, where $\Phi_i\triangleq\xi_i+f_i$ belongs to agent-$i$. Assuming only agents connected by an edge can communicate, we propose a distributed proximal gradient method, DPGA, for consensus optimization over both unweighted and weighted static (undirected) communication networks. In each iteration, every agent-$i$ computes the prox map of $\xi_i$ and the gradient of $f_i$, followed by local communication with neighboring agents. We also study its stochastic gradient variant, SDPGA, in which each agent-$i$ can only access noisy estimates of $\nabla f_i$. This computational model abstracts a number of applications in distributed sensing, machine learning and statistical inference. We show ergodic convergence in both suboptimality error and consensus violation for DPGA and SDPGA with rates $\mathcal{O}(1/t)$ and $\mathcal{O}(1/\sqrt{t})$, respectively.

This computational setting, i.e., decentralized consensus optimization, appears as a generic model for various applications in signal processing, e.g., [2]-[6], machine learning, e.g., [7]-[9], and statistical inference, e.g., [10], [11]. Clearly, (3) can also be solved in a "centralized" fashion by communicating all the private functions $\Phi_i$ to a central node and solving the overall problem at this node. However, such an approach can be very expensive in terms of both communication and computation.

A number of existing distributed algorithms compute a solution $\bar{\mathbf{x}}=[\bar{x}_i]_{i\in\mathcal{N}}$ such that its consensus violation satisfies $\max\{\|\bar{x}_i-\bar{x}_j\|_2:\,(i,j)\in\mathcal{E}\}\leq\epsilon$ within $\mathcal{O}(1/\epsilon)$ iterations, and its suboptimality is bounded from above as $\sum_{i\in\mathcal{N}}\Phi_i(\bar{x}_i)-F^*\leq\epsilon$ within $\mathcal{O}(1/\epsilon^2)$ iterations; however, since the step size is constant, neither the suboptimality nor the consensus error is guaranteed to decrease further. Although these algorithms are for more general problems and assume mere convexity of each $\Phi_i$, this generality comes at the cost of $\mathcal{O}(1/\epsilon^2)$ complexity bounds, and they also tend to be very slow in practice. At the other extreme, under much stronger conditions, namely that each $\Phi_i$ is smooth and has bounded gradients, Jakovetic et al. [19] developed a fast distributed gradient method, D-NC, with an $\mathcal{O}(\log(1/\epsilon)/\sqrt{\epsilon})$ convergence rate in terms of communication rounds. For the quadratic loss, which is one of the most commonly used loss functions, the bounded gradient assumption does not hold. In terms of distributed applicability, D-NC requires all the nodes in $\mathcal{N}$ to agree on a doubly stochastic weight matrix $W\in\mathbb{R}^{|\mathcal{N}|\times|\mathcal{N}|}$; it also assumes that the second largest eigenvalue of $W$ is known globally by all the nodes, which is not attainable for very large-scale, fully distributed networks. D-NC is a two-loop algorithm: in each outer iteration $k$, every node computes its gradient once, followed by $\mathcal{O}(\log(k))$ communication rounds. In the rest, we briefly discuss those algorithms that balance the trade-off between the iterati...
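To make the per-iteration pattern described in the abstract concrete (a local prox-gradient step on $\Phi_i=\xi_i+f_i$ followed by communication with neighboring agents), the following Python sketch shows one round of a generic decentralized proximal gradient scheme. It is not the exact DPGA/SDPGA recursion analyzed in this paper: the soft-thresholding prox standing in for $\xi_i$, the uniform neighbor averaging, the step size, and the Gaussian noise used to mimic noisy gradient estimates are all illustrative assumptions.

```python
# Minimal sketch of a decentralized prox-gradient round (illustrative, not the paper's DPGA/SDPGA).
import numpy as np

def prox_l1(v, t):
    """Soft-thresholding: prox map of t*||.||_1, standing in for the prox of xi_i (an assumption)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def decentralized_prox_grad_round(x, grad_f, neighbors, step, noise_std=0.0, rng=None):
    """One round: (i) local (possibly noisy) gradient + prox step, (ii) averaging with neighbors.

    x:         dict agent -> local iterate (np.ndarray)
    grad_f:    dict agent -> callable returning the gradient of f_i at a point
    neighbors: dict agent -> list of neighbors in the undirected graph
    noise_std: > 0 mimics access to noisy estimates of grad f_i (the SDPGA-like setting)
    """
    rng = rng if rng is not None else np.random.default_rng()
    half = {}
    for i, xi in x.items():
        g = grad_f[i](xi)
        if noise_std > 0.0:
            g = g + noise_std * rng.standard_normal(g.shape)  # noisy gradient estimate
        half[i] = prox_l1(xi - step * g, step)                # prox-gradient step on Phi_i
    # local communication: average with neighbors (uniform weights, an assumption)
    return {i: np.mean([half[j] for j in [i] + list(neighbors[i])], axis=0) for i in x}

# Toy usage on a 3-node path graph, minimizing sum_i (0.5*||x - a_i||^2 + ||x||_1).
a = {0: np.array([1.0]), 1: np.array([2.0]), 2: np.array([3.0])}
x = {i: np.zeros(1) for i in a}
grad_f = {i: (lambda z, ai=a[i]: z - ai) for i in a}
nbrs = {0: [1], 1: [0, 2], 2: [1]}
for _ in range(200):
    x = decentralized_prox_grad_round(x, grad_f, nbrs, step=0.1)
```

Setting noise_std > 0 in this sketch only mimics the stochastic-gradient setting; no claim is made that this simplified averaging scheme attains the $\mathcal{O}(1/t)$ and $\mathcal{O}(1/\sqrt{t})$ ergodic rates established for DPGA and SDPGA.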