Support vector machines (SVMs) are a supervised classifier successfully applied in a plethora of real-life applications. However, they suffer from the important shortcomings of their high time and memory training complexities, which depend on the training set size. This issue is especially challenging nowadays, since the amount of data generated every second becomes tremendously large in many domains. This review provides an extensive survey on existing methods for selecting SVM training data from large datasets. We divide the state-of-the-art techniques into several categories. They help understand the underlying ideas behind these algorithms, which may be useful in designing new methods to deal with this important problem. The review is complemented with the discussion on the future research pathways which can make SVMs easier to exploit in practice.