The convergence speed of stochastic gradient descent (SGD) can be improved by actively selecting mini-batches. We explore sampling schemes where similar data points are less likely to be selected in the same mini-batch. In particular, we prove that such repulsive sampling schemes lower the variance of the gradient estimator. This generalizes recent work on using Determinantal Point Processes (DPPs) for mini-batch diversification to the broader class of repulsive point processes. We first show that the phenomenon of variance reduction by diversified sampling extends, in particular, to non-stationary point processes. We then show that other point processes may be computationally much more efficient than DPPs. In particular, we propose and investigate Poisson Disk sampling, frequently encountered in the computer graphics community, for this task. We show empirically that our approach improves over standard SGD both in terms of convergence speed and final model performance.

Diversified Mini-Batch Sampling. Prior research [35, 8, 34, 32] has shown that sampling diversified mini-batches can reduce the variance of stochastic gradients. It is also key to overcoming the saturation of convergence speed in the distributed setting [32]. Diversifying the data is also computationally efficient for large-scale learning problems [35, 8, 34]. Zhang et al. [34] recently proposed using DPPs for diversified mini-batch sampling and drew the connection to stratified sampling [35] and clustering-based preprocessing for SGD [8]. A disadvantage of the DPP approach is its computational overhead. Besides presenting a more general theory, we provide more efficient point processes in this work.

Active Bias. Different types of active bias in subsampling the data can improve convergence and lead to better final performance in model training [1, 9, 4, 5, 31]. As summarized in [4], self-paced learning biases towards easy examples in the early learning phase. Active learning, on the other hand, puts more emphasis on uncertain cases, and hard example mining focuses on difficult-to-classify examples.
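To make the repulsive-sampling idea concrete, the sketch below draws one mini-batch by simple "dart throwing" Poisson Disk sampling: candidate indices are drawn uniformly and accepted only if they keep a minimum distance to every point already in the batch. This is a minimal illustrative sketch, not the exact procedure used in our experiments; the function name, the Euclidean distance measure, and the fallback to uniform sampling are assumptions made for the example, and in practice one would draw candidates from a small random subset of the data to keep the per-batch cost low.

```python
import numpy as np


def poisson_disk_minibatch(X, batch_size, radius, max_trials=10_000, seed=None):
    """Draw a diversified mini-batch by "dart throwing" Poisson Disk sampling.

    A randomly drawn candidate is accepted only if its feature vector is at
    least `radius` away (Euclidean distance) from every point already in the
    batch, so similar data points rarely end up in the same mini-batch.
    Illustrative sketch; parameter names and fallback behaviour are assumptions.
    """
    rng = np.random.default_rng(seed)
    selected = []
    trials = 0
    while len(selected) < batch_size and trials < max_trials:
        trials += 1
        i = int(rng.integers(len(X)))
        if all(np.linalg.norm(X[i] - X[j]) >= radius for j in selected):
            selected.append(i)
    # If the radius is too large to fill the batch within max_trials, fall
    # back to uniform sampling for the remaining slots (duplicates possible).
    while len(selected) < batch_size:
        selected.append(int(rng.integers(len(X))))
    return np.asarray(selected)


if __name__ == "__main__":
    # Synthetic example: 1000 points with 10 features each.
    X = np.random.default_rng(0).normal(size=(1000, 10))
    batch_indices = poisson_disk_minibatch(X, batch_size=32, radius=2.0)
    print(batch_indices)
```

The returned indices define the mini-batch used for one SGD step. Unlike exact k-DPP sampling, which requires an eigendecomposition of the kernel matrix over the candidate set, the acceptance test above costs at most one distance computation per already-selected point for each candidate.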