We consider a generalization of the recently proposed gradient coding framework where a large dataset is divided across n workers and each worker transmits to a master node one or more linear combinations of the gradients over the data subsets assigned to it. Unlike the conventional framework which requires the master node to recover the sum of the gradients over all the data subsets in the presence of s straggler workers, we relax the goal of the master node to computing the sum of at least some α fraction of the gradients. The broad goal of our work is to study the optimal computation and communication load per worker for this approximate gradient coding framework. We begin by deriving a lower bound on the computation load of any feasible scheme and also propose a strategy which achieves this lower bound, albeit at the cost of high communication load and a number of data partitions which can be polynomial in the number of workers n. We then restrict attention to schemes which utilize a number of data partitions equal to n and propose schemes based on cyclic assignment which have a lower communication load. When each worker transmits a single linear combination, we also prove lower bounds on the computation load of any scheme using n data partitions.