Virtualization technology is a key component of data center management, allowing multiple users and applications to share a single physical machine. Modern virtual machine monitors support both software and hardware-assisted paging for memory virtualization; however, neither paging mode is always preferable. Previous studies have shown that dynamic selection, which selects paging modes at runtime according to relevant performance metrics, can be effective in tailoring memory virtualization to program workload. However, these approaches require low-level manual analysis or depend on prior knowledge of workload characteristics and phasing. We map the problem of dynamic paging mode selection to the contextual bandit, a model for sequential decision making in environments with limited feedback. Using random profiling, which executes a workload while regularly selecting paging modes at random, we construct a paging mode selection policy that dynamically optimizes workload performance given page fault and translation lookaside buffer (TLB) miss counts. Our approach yields an effective policy, DSP-OFFSET, for the dynamic paging mode selection problem. When trained and evaluated on subsets of the SPEC CPU2006 benchmark suite, DSP-OFFSET achieves speedups of up to 44% compared to static paging mode selections, matching the performance of the state-of-the-art ASP-SVM model. In addition, DSP-OFFSET requires at most a tenth of the profiling time of ASP-SVM (2.5 hours compared to over 24 hours) to achieve equivalent performance.
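
To make the contextual bandit framing concrete, the following is a minimal sketch of learning a two-action paging policy from randomly profiled data. It is an illustration under stated assumptions, not the DSP-OFFSET implementation: the reward function `toy_reward`, the count range of 0 to 1000, the scalar `feature`, and the threshold classifier are all hypothetical, and the data are synthetic. The learning step uses the binary offset trick, one known reduction from two-action bandit learning to weighted classification; whether DSP-OFFSET applies it in this exact form is an assumption.

```python
import random

MODES = ("software", "hardware")  # the two paging modes, i.e. the bandit's actions


def toy_reward(context, mode):
    """Hypothetical reward in [0, 1]. Synthetic, NOT measured data."""
    page_faults, tlb_misses = context
    # Toy assumption: software paging is penalized by page faults,
    # hardware-assisted paging by TLB misses.
    cost = page_faults if mode == "software" else tlb_misses
    return 1.0 - cost / 1000.0


def random_profile(n, rng):
    """Random profiling: pick a mode uniformly at random and log its reward."""
    log = []
    for _ in range(n):
        context = (rng.randint(0, 1000), rng.randint(0, 1000))
        action = rng.randrange(2)  # each mode chosen with probability 0.5
        log.append((context, action, toy_reward(context, MODES[action])))
    return log


def offset_examples(log, offset=0.5, prob=0.5):
    """Binary offset trick: turn bandit feedback into weighted labels."""
    examples = []
    for context, action, reward in log:
        # A better-than-offset reward votes for the logged action,
        # a worse-than-offset reward votes for the other action.
        label = action if reward >= offset else 1 - action
        weight = abs(reward - offset) / prob
        examples.append((context, label, weight))
    return examples


def feature(context):
    """Hypothetical scalar feature: page faults minus TLB misses."""
    page_faults, tlb_misses = context
    return page_faults - tlb_misses


def fit_threshold(examples):
    """Weighted stump: predict 'hardware' when feature(context) > threshold."""
    points = sorted((feature(c), w if y == 1 else -w) for c, y, w in examples)
    total = sum(s for _, s in points)
    # A threshold below every point means "always hardware".
    best_t, best_gain = points[0][0] - 1, total
    below = 0.0
    for i, (d, s) in enumerate(points):
        below += s
        if i + 1 < len(points) and points[i + 1][0] == d:
            continue  # evaluate a threshold only after all tied points
        gain = total - 2.0 * below  # signed weight classified correctly
        if gain > best_gain:
            best_t, best_gain = d, gain
    return best_t


def make_policy(threshold):
    def policy(context):
        return MODES[1] if feature(context) > threshold else MODES[0]
    return policy


rng = random.Random(0)
log = random_profile(2000, rng)
threshold = fit_threshold(offset_examples(log))
policy = make_policy(threshold)
print(policy((900, 100)), policy((100, 900)))
```

Because each mode is logged with probability 0.5, the expected signed weight of the generated labels at a given context equals the difference in expected reward between the two modes, so a classifier that minimizes weighted error favors the better mode. In a real setting the reward would come from measured workload performance rather than a formula.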