This work studies the question of Representation Learning in RL: how can we learn a compact low-dimensional representation such that, on top of it, we can perform RL procedures such as exploration and exploitation in a sample-efficient manner. We focus on low-rank Markov Decision Processes (MDPs), where the transition dynamics correspond to a low-rank transition matrix. Unlike prior works that assume the representation is known (e.g., linear MDPs), here we need to learn the representation for the low-rank MDP. We study both the online RL and offline RL settings. For the online setting, operating with the same computational oracles used in FLAMBE (Agarwal et al., 2020b), the state-of-the-art algorithm for learning representations in low-rank MDPs, we propose REP-UCB (Upper Confidence Bound driven REPresentation learning for RL), which significantly improves upon FLAMBE's sample complexity of $\widetilde{O}\big(A^{9} d^{7} / (\epsilon^{10} (1-\gamma)^{22})\big)$, where $\epsilon$ is the target accuracy, $d$ is the rank of the transition matrix (i.e., the dimension of the ground-truth representation), $A$ is the number of actions, and $\gamma$ is the discount factor. Notably, REP-UCB is simpler than FLAMBE: it directly balances the interplay between representation learning, exploration, and exploitation, whereas FLAMBE is an explore-then-commit style approach that has to perform reward-free exploration step-by-step forward in time. For the offline RL setting, we develop an algorithm that leverages pessimism to learn under a partial coverage condition: our algorithm is able to compete against any policy as long as that policy is covered by the offline data distribution.
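
As an illustrative aside (not part of the original abstract), the low-rank structure referred to above is usually formalized as a factorization of the transition kernel through a $d$-dimensional feature map; the symbols $\phi^\star$ and $\mu^\star$ below denote the unknown ground-truth representation and its companion map, a standard convention in this line of work:
\[
  P(s' \mid s, a) \;=\; \big\langle \phi^\star(s,a),\, \mu^\star(s') \big\rangle,
  \qquad \phi^\star(s,a),\ \mu^\star(s') \in \mathbb{R}^{d},
\]
so that learning the representation amounts to estimating $\phi^\star$ from data (e.g., from a candidate feature class) rather than assuming it is given, as in linear MDPs.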