Sparse matrix-vector multiplication (SpMV) is an essential linear algebra operation that dominates the computing cost of many scientific applications. Because they provide massive parallelism and high memory bandwidth, GPUs are commonly used to accelerate SpMV kernels. Prior studies have mainly focused on reducing the latency of SpMV kernels on GPUs by tackling the irregular nature of sparse matrices. However, limited attempts have been made to improve the energy efficiency (MFLOPS/Watt) of SpMV kernels, which has kept GPUs out of low-power scientific applications. Furthermore, prior work has concentrated on optimizing the sparse matrix storage format and has largely ignored the impact of tuning compilation parameters (e.g., \texttt{maxrregcount} and thread block size). Lastly, little attention has been paid to building a comprehensive training dataset of SpMV kernel runs and to tuning the hyperparameters of machine learning-based storage format predictors. To address these limitations, we present a novel learning-based framework, dubbed Auto-SpMV, that enables energy-efficient and low-latency SpMV kernels on GPUs. To achieve the best runtime performance, Auto-SpMV offers two optimization modes: \textit{compile-time} and \textit{run-time}. In the \textit{compile-time} mode, Auto-SpMV tunes the compilation parameters according to the optimization objective: lower latency or lower energy consumption. In the \textit{run-time} mode, Auto-SpMV selects the best sparse format for the input matrix using an optimized machine learning model. To achieve the best classification results, 1) we collect the largest dataset to date by running diverse sparse matrices under more than 15K configurations, and 2) we boost the classification models by automatically tuning their learning hyperparameters.
Experimental results reveal that in the \textit{compile-time} mode, Auto-SpMV improves latency, energy consumption, average power, and energy efficiency by up to 51.9\%, 52\%, 33.2\%, and 53\%, respectively, over the default setting. In the \textit{run-time} mode, Auto-SpMV improves average power and energy efficiency by up to 34.6\% and 99.7\%, respectively, over the default setting. Finally, our experiments show that Auto-SpMV generalizes to unseen matrices and hardware devices.