Automatic generation control (AGC) is an essential function for ensuring the stability of power systems, and its secure operation is therefore of utmost importance to power system operators. In this paper, we investigate the vulnerability of AGC to false data injection attacks that could remain undetected both by traditional detection methods based on the area control error (ACE) and by the recently proposed unknown input observer (UIO). We formulate the problem of computing undetectable attacks as a multi-objective partially observable Markov decision process. We propose a flexible reward function that makes it possible to explore the trade-off between attack impact and detectability, and we use the proximal policy optimization (PPO) algorithm to learn efficient attack policies. Through extensive simulations of a 3-area power system, we show that the proposed attacks can drive the frequency beyond critical limits while remaining undetectable by state-of-the-art algorithms employed for fault and attack detection in AGC. Our results also show that detectors trained using supervised and unsupervised machine learning can both significantly outperform existing detectors.