Differentiable programming is an emerging programming paradigm that composes parameterized algorithmic components and trains them using automatic differentiation. The concept originates in deep learning but is not limited to training neural networks. We present the theory and practice of programming tensor network algorithms in a fully differentiable way. By formulating a tensor network algorithm as a computation graph, one can compute higher-order derivatives of the program accurately and efficiently using automatic differentiation. We present essential techniques for differentiating through tensor network contraction algorithms, including numerically stable differentiation of tensor decompositions and efficient backpropagation through fixed-point iterations. As a demonstration, we compute the specific heat of the Ising model directly by taking the second-order derivative of the free energy obtained in a tensor renormalization group calculation. Next, we perform gradient-based variational optimization of infinite projected entangled pair states for the quantum antiferromagnetic Heisenberg model and obtain state-of-the-art variational energy and magnetization with moderate effort. Differentiable programming removes the laborious human effort of deriving and implementing analytical gradients for tensor network programs, which opens the door to further innovations in tensor network algorithms and applications.

* wanglei@iphy.ac.cn
† txiang@iphy.ac.cn

This fact has limited the broad adoption of gradient-based optimization of tensor network states for more complex systems. Alternative approaches have their own drawbacks: computing the gradient with numerical derivatives has limited accuracy and efficiency, and therefore applies only to cases with few variational parameters [36,37], while deriving the gradient manually using the chain rule is manageable only for purposely designed, simple tensor network structures [38].

Differentiable programming provides an elegant and efficient solution to these problems by composing the whole tensor network program in a fully differentiable manner. In this paper, we present essential automatic differentiation techniques that compute (higher-order) derivatives of tensor network programs efficiently and to numerical precision. This progress opens the door to gradient-based (and even Hessian-based) optimization of tensor network states in a general setting. Moreover, computing (higher-order) gradients of the output of a tensor network algorithm offers a straightforward approach to physical quantities such as the specific heat and magnetic susceptibilities. The differentiable programming approach is agnostic to the detailed lattice geometry, Hamiltonian, and tensor network contraction scheme; it is therefore general enough to support a wide class of tensor network applications.

We will focus on applications involving two-dimensional infinite tensor networks, where differentiable programming techniques offer significant benefits over conventional approaches. We show that after solving m...
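To make the idea of extracting thermodynamic quantities from higher-order derivatives concrete, the following is a minimal sketch, not the authors' implementation: instead of a full tensor renormalization group calculation with SVD truncation, it contracts a small periodic 1D Ising chain exactly with a transfer matrix, then obtains the energy and specific heat by differentiating ln Z with automatic differentiation. The use of JAX and the chain length N = 16 are illustrative assumptions.

```python
# Sketch: specific heat from the second derivative of the free energy via
# automatic differentiation (toy 1D Ising model, exact transfer-matrix
# contraction in place of a TRG coarse-graining).
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)

N = 16  # number of sites in the periodic chain (illustrative choice)

def log_partition(beta):
    """ln Z of the 1D Ising model from contracting N transfer matrices."""
    s = jnp.array([1.0, -1.0])
    T = jnp.exp(beta * jnp.outer(s, s))   # transfer matrix exp(beta * s * s')
    M = T
    lognorm = 0.0
    for _ in range(N - 1):
        M = M @ T
        # Rescale at every step and track the log of the scale factor,
        # as is standard in tensor network contractions.
        scale = jnp.max(jnp.abs(M))
        M = M / scale
        lognorm = lognorm + jnp.log(scale)
    return jnp.log(jnp.trace(M)) + lognorm

# E = -d lnZ / d beta,   C = beta^2 * d^2 lnZ / d beta^2
energy = jax.grad(lambda b: -log_partition(b))

def specific_heat(beta):
    return beta**2 * jax.grad(jax.grad(log_partition))(beta)

beta = 1.0
print("energy per site   :", energy(beta) / N)
print("specific heat/site:", specific_heat(beta) / N)
```

In the paper's actual setting the free energy comes from a truncated tensor network contraction, so the backward pass must also traverse tensor decompositions and fixed-point iterations; the stable treatment of those steps is what the techniques summarized above address.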