This paper considers optimization problems over networks where agents have individual objectives to meet, or individual parameter vectors to estimate, subject to subspace constraints that require the objectives across the network to lie in low-dimensional subspaces. This constrained formulation includes consensus optimization as a special case and allows for more general models of task relatedness, such as smoothness. While such formulations can be solved via projected gradient descent, the resulting algorithm is not distributed. Starting from the centralized solution, we propose an iterative and distributed implementation of the projection step, which runs in parallel with the stochastic gradient descent update. We establish in this Part I of the work that, for small step-sizes $\mu$, the proposed distributed adaptive strategy leads to small estimation errors on the order of $\mu$. We examine the steady-state performance in the accompanying Part II [2]. The results will reveal explicitly the influence of the gradient noise, the data characteristics, and the subspace constraints on the network performance. They will also show that, in the small step-size regime, the iterates generated by the distributed algorithm achieve the centralized steady-state performance.

I. INTRODUCTION

Consider the standard consensus optimization problem

$$w^o = \arg\min_{w \in \mathbb{C}^L} \sum_{k=1}^{N} J_k(w),$$

where $J_k(\cdot)$ is the cost function at agent $k$, $N$ is the number of agents in the network, and $w \in \mathbb{C}^L$ is the global parameter vector, which all agents need to agree upon; see Fig. 1 (middle). Each agent seeks to estimate $w^o$ through local computations and communications among neighboring agents, without needing to know any of the costs besides its own. Among the many useful strategies that have been proposed in the literature [3]-[10], diffusion strategies [3]-[5] are particularly attractive since they are scalable, robust, and enable continuous learning and adaptation in response to drifts in the location of the minimizer.
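To make the centralized baseline concrete, the following is a minimal sketch of projected stochastic gradient descent for a network objective constrained to a low-dimensional subspace. All specifics here (the quadratic costs, the basis `U`, the step-size `mu`, the noise model) are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

L, N = 4, 5          # per-agent vector size and number of agents
M = N * L            # dimension of the stacked network vector
d = 2                # subspace dimension (d << M)

# Orthonormal basis U for the constraint subspace (e.g., consensus corresponds
# to a particular low-dimensional choice of U); P projects onto span(U).
A = rng.standard_normal((M, d))
U, _ = np.linalg.qr(A)
P = U @ U.T

# Illustrative quadratic costs J_k(w_k) = 0.5 * ||w_k - t_k||^2, stacked over agents
targets = rng.standard_normal(M)

def grad(w):
    return w - targets   # stacked gradient of the quadratic costs

mu = 0.05
w = np.zeros(M)
for _ in range(500):
    noise = 0.01 * rng.standard_normal(M)    # crude stand-in for gradient noise
    w = P @ (w - mu * (grad(w) + noise))     # gradient step followed by projection

# The iterate remains in the subspace and hovers near the projected minimizer,
# with a steady-state error that shrinks with the step-size mu.
assert np.allclose(P @ w, w, atol=1e-8)
```

Note that each step requires applying the full $M \times M$ projection $P$, which needs global information; this is precisely why the paper replaces the exact projection with an iterative, distributed implementation.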
However, many network applications require more complex models and more flexible algorithms than consensus implementations, since their agents may need to estimate and track multiple distinct, though related, objectives. For instance, in distributed power system state estimation, the local state vectors to be estimated at neighboring control centers may overlap partially since the areas in a power system are interconnected [11], [12]. Likewise, in monitoring applications, agents need to track the movement of multiple correlated targets and to exploit the correlation profile in the data for enhanced accuracy [13], [14]. Problems of this kind, where nodes need to infer multiple, though related, parameter vectors, are referred to as multitask problems. Existing strategies to address multitask problems generally exploit prior knowledge about how the tasks across the network relate to each other [11]-[29]. For example, one way to model relationships among tasks is to formulate convex optimization problems with appropriate co-regularizers between neighboring agents [13], [16]-[19]. Graph spectral regularization can also be used in order t...
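As a toy illustration of the co-regularization idea mentioned above, one can augment the sum of local costs with a quadratic penalty that couples the parameter vectors of neighboring agents. The function below is a hypothetical sketch; the quadratic local costs, the edge list, and the weight `eta` are all assumptions made for illustration:

```python
import numpy as np

def multitask_cost(W, targets, edges, eta=0.5):
    """W: (N, L) stacked agent vectors; edges: list of neighbor pairs (k, l).

    Returns the sum of local quadratic costs plus a co-regularizer that
    penalizes disagreement between neighboring agents' task vectors.
    """
    local = 0.5 * np.sum((W - targets) ** 2)                  # sum of local costs
    coupling = sum(np.sum((W[k] - W[l]) ** 2) for k, l in edges)
    return local + eta * coupling                             # promotes task similarity

# Three agents with similar (but distinct) targets on a line graph
W = np.zeros((3, 2))
targets = np.array([[1.0, 0.0], [0.9, 0.1], [1.1, -0.1]])
edges = [(0, 1), (1, 2)]
print(multitask_cost(W, targets, edges))
```

Larger `eta` pulls the minimizers of neighboring agents closer together, recovering consensus-like behavior in the limit, while `eta = 0` decouples the tasks entirely.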