Recent innovations in ROI camera systems have opened up the avenue for exploring energy optimization techniques like adaptive subsampling. Generally speaking, image frame capture and read-out demand high power consumption. ROI camera systems make it possible to exploit the inverse relation between energy consumption and spatiotemporal pixel readout to optimize the power efficiency of the image sensor. To this end, we develop a reinforcement learning (RL) based adaptive subsampling framework which predicts ROI trajectories and reconfigures the image sensor on-the-fly for improved power efficiency of the image sensing pipeline. In our proposed framework, a pre-trained convolutional neural network (CNN) extracts rich visual features from incoming frames and a long short-term memory (LSTM) network predicts the region of interest (ROI) and subsampling pattern for the consecutive image frame. Based on the application and the difficulty level of object motion trajectory, the user can utilize either the predicted ROI or coarse subsampling pattern to switch off the pixels for sequential frame capture, thus saving energy. We have validated our proposed method by adapting existing trackers for the adaptive subsampling framework and evaluating them as competing baselines. As a proof-of-concept, our method outperforms the baselines and achieves an average AUC score of 0.5090 on three benchmarking datasets. We also characterize the energy-accuracy tradeoff of our method vs. the baselines and show that our approach is best suited for applications that demand both high visual tracking precision and low power consumption. On the TB100 dataset, our method achieves the highest AUC score of 0.5113 out of all the competing algorithms and requires a medium-level power consumption of approximately 4 W as per a generic energy model and an energy consumption of 1.9 mJ as per a mobile system energy model. Although other baselines are shown to have better performance in terms of power consumption, they are ill-suited for applications that require considerable tracking precision, making our method the ideal candidate in terms of power-accuracy tradeoff.