Learning Ising or Potts models from data has become an important topic in statistical physics and computational biology, with applications to the prediction of structural contacts in proteins and to other areas of biological data analysis. The corresponding inference problems are challenging because the normalization constant (partition function) of the Ising/Potts distribution cannot be computed efficiently on large instances. Different ways of addressing this issue have hence given rise to a substantial methodological literature. In this paper we investigate how these methods can be used on much larger datasets than studied previously. We focus on a central aspect: in practice these inference problems are almost always severely under-sampled, and the operational result is almost always a small set of leading (largest) predictions. We therefore explore an approach in which the data are pre-filtered on the basis of empirical correlations, which can be computed directly even for very large problems; inference is then run only on the much smaller instance in a subsequent step of the analysis. We show that in several relevant model classes such a combined approach gives results of almost the same quality as the computationally much more demanding inference on the whole dataset. We also show that results on whole-genome epistatic couplings obtained in a recent computation-intensive study can be retrieved by the new approach. The method of this paper hence opens up the possibility of learning parameters describing pair-wise dependencies in whole genomes in a computationally feasible and expedient manner.

[...]ily related to spatial contacts [13,18,33].

The actual use of DCA comprises two parts: (1) running the inference procedure of choice on an MSA which consists of N samples, each of dimension L; and (2) keeping only a subset of the largest predictions for further assessment and use (a minimal code sketch of this two-step pipeline is given below). We will refer to the columns of the MSA as loci, the variables in each column as alleles, and the rows as samples. The MSA hence consists of N samples, each sample being a list of L alleles. Alternatively, we will refer to L as the data dimension and N as the sample size. The total number of parameters in the inferred Ising or Potts models is proportional to L² and will here be denoted P. The number of retained predictions will be denoted K.

Exact frequentist or Bayesian point-estimate methods, i.e. maximum likelihood (ML) or maximum a posteriori (MAP), are not computationally feasible for the data dimensions of current practical interest, and many approximate inference methods have therefore been developed [26]. Additionally, statistical identifiability demands that K cannot be larger than N: one cannot learn more features from the data than there are examples. This has indeed mostly been the case in the examples above. However, in the intermediate step P parameters are inferred, and in many (if not all) cases of interest P ∼ L² has been much larger than N. On top of being approximate, the inference must therefore also be regularized.

The setting whe...
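The following is a minimal sketch of the two-step pipeline described above: score all loci pairs by a cheap empirical-correlation statistic computed directly from the data, keep the loci appearing among the top-ranked pairs, and run approximate inference only on that reduced instance. The mutual-information score, the function names, and the naive mean-field inversion for the binary (Ising) case are illustrative assumptions for this sketch, not the exact procedure of the paper.

```python
import numpy as np

def pair_scores(msa, q, pseudocount=0.5):
    """Mutual information between all pairs of loci in an N x L integer
    MSA with alleles coded as 0..q-1; returns an L x L symmetric matrix."""
    L = msa.shape[1]
    scores = np.zeros((L, L))
    for i in range(L):
        for j in range(i + 1, L):
            joint = np.full((q, q), pseudocount)   # pseudocount regularizes
            np.add.at(joint, (msa[:, i], msa[:, j]), 1.0)
            joint /= joint.sum()
            fi, fj = joint.sum(axis=1), joint.sum(axis=0)
            scores[i, j] = scores[j, i] = np.sum(
                joint * np.log(joint / np.outer(fi, fj)))
    return scores

def top_k_pairs(scores, K):
    """Return the K loci pairs (i, j), i < j, with the largest scores."""
    iu = np.triu_indices(scores.shape[0], k=1)
    order = np.argsort(scores[iu])[::-1][:K]
    return list(zip(iu[0][order], iu[1][order]))

def mean_field_couplings(msa01, reg=0.1):
    """Naive mean-field inference for the binary (Ising) case: couplings
    are approximated by minus the inverse of the regularized
    connected-correlation matrix, J ~ -(C + reg * I)^(-1)."""
    C = np.cov(msa01, rowvar=False) + reg * np.eye(msa01.shape[1])
    return -np.linalg.inv(C)

# Toy usage: score pairs on the full data, then infer couplings only on
# the loci that appear among the top-K ranked pairs.
msa = np.random.randint(0, 2, size=(500, 200))     # N = 500, L = 200
pairs = top_k_pairs(pair_scores(msa, q=2), K=50)
kept = sorted({i for pair in pairs for i in pair})
J_reduced = mean_field_couplings(msa[:, kept].astype(float))
```

The point of this construction is that the scoring pass touches each of the ∼L² pairs only through cheap empirical counts, while the expensive model inference, here the simplest mean-field variant standing in for the DCA method of choice, runs only on the much smaller set of retained loci.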