Compressive Sensing (CS) allows a signal to be sparsely measured first and accurately recovered later in software [1]. In scanning transmission electron microscopy (STEM), it is possible to compress an image spatially by reducing the number of measured pixels, which decreases electron dose and increases sensing speed [2,3,4]. The two requirements for CS to work are: (1) sparsity of basis coefficients and (2) incoherence of the sensing system and the representation system. However, when pixels are missing from the image, it is difficult to have an incoherent sensing matrix. Nevertheless, dictionary learning techniques such as Beta-Process Factor Analysis (BPFA) [5] are able to simultaneously discover a basis and the sparse coefficients in the case of missing pixels.On top of CS, we would like to apply active learning [6,7] to further reduce the proportion of pixels being measured, while maintaining image reconstruction quality. Suppose we initially sample 10% of random pixels. We wish to select the next 1% of pixels that are most useful in recovering the image. Now, we have 11% of pixels, and we want to decide the next 1% of "most informative" pixels. Active learning methods are online and sequential in nature. Our goal is to adaptively discover the best sensing mask during acquisition using feedback about the structures in the image. In the end, we hope to recover a high quality reconstruction with a dose reduction relative to the non-adaptive (random) sensing scheme. In doing this, we try three metrics applied to the partial reconstructions for selecting the new set of pixels: (1) variance, (2) Kullback-Leibler (K-L) divergence using a Radial Basis Function (RBF) kernel, and (3) entropy.Figs. 1 and 2 display the comparison of Peak Signal-to-Noise (PSNR) using these three different active learning methods at different percentages of sampled pixels. At 20% level, all the three active learning methods underperform the original CS without active learning. However, they all beat the original CS as more of the "most informative" pixels are sampled. One can also argue that CS equipped with active learning requires less sampled pixels to achieve the same value of PSNR than CS with pixels randomly sampled, since all the three PSNR curves with active learning grow at a faster pace than that without active learning when the pixel fraction is larger than 20%. For this particular STEM image, by observing the reconstructed images and the sensing masks, we find that while the method based on RBF kernel acquires samples more uniformly, the one on entropy samples more areas of significant change, thus less uniformly. The K-L divergence method performs the best in terms of reconstruction error (PSNR) for this example [8].