Modern embedded systems execute applications that interact with the operating system and hardware differently depending on the type of workload. These cross-layer interactions result in wide variations in the chip-wide thermal profile. In this paper, a reinforcement learning-based run-time manager is proposed that guarantees application-specific performance requirements and controls POSIX thread allocation and voltage/frequency scaling for energy-efficient thermal management. The manager controls three thermal aspects: peak temperature, average temperature and thermal cycling. Contrary to existing learning-based run-time approaches that optimize energy and temperature individually, the proposed run-time manager is the first approach to combine the two objectives, simultaneously addressing all three thermal aspects. However, determining thread allocation and core frequencies to optimize energy and temperature is an NP-hard problem. As a result, the learning table grows exponentially (a significant memory overhead), with a corresponding increase in the exploration time needed to learn the most appropriate thread allocation and core frequency for a particular application workload. To confine the learning space and to minimize the learning cost, the proposed run-time manager is implemented as a two-stage hierarchy: a heuristic-based thread allocation at a longer time interval to improve thermal cycling, followed by a learning-based hardware frequency selection at a much finer interval to improve average temperature, peak temperature and energy consumption. This enables finer control over temperature in an energy-efficient manner, while simultaneously addressing scalability, which is a crucial aspect for multi-/many-core embedded systems. The proposed hierarchical run-time manager is implemented for Linux running on NVIDIA's Tegra SoC, featuring four ARM Cortex-A15 cores.
Experiments conducted with a range of embedded and CPU-intensive applications demonstrate that the proposed run-time manager not only reduces energy consumption by an average of 15% with respect to Linux, but also improves all the thermal aspects: average temperature by 14 °C, peak temperature by 16 °C and thermal cycling by 54%.
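The two-stage hierarchy described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the coolest-core allocation heuristic, the frequency levels, the temperature thresholds, the reward shape and all function names are assumptions chosen only to show how a coarse-grained heuristic stage and a fine-grained tabular Q-learning stage might fit together.

```python
import random

# Hypothetical values, for illustration only (not taken from the paper).
FREQS_MHZ = [600, 1000, 1400, 1800]   # assumed selectable core frequencies
TEMP_BINS_C = [50, 60, 70, 80]        # assumed thresholds for the state space


def temp_state(temp_c):
    """Discretise a temperature reading into a state index (0..len(bins))."""
    return sum(temp_c >= t for t in TEMP_BINS_C)


def allocate_threads(thread_loads, core_temps):
    """Stage 1, run at the longer interval.

    Assumed heuristic: the heaviest threads are placed on the coolest cores.
    Returns a mapping thread index -> core index.
    """
    cores = sorted(range(len(core_temps)), key=lambda c: core_temps[c])
    threads = sorted(range(len(thread_loads)), key=lambda t: -thread_loads[t])
    return {t: cores[i % len(cores)] for i, t in enumerate(threads)}


def reward(power_w, temp_c, perf_met):
    """Assumed reward: penalise power and temperature above 70 degrees C,
    with a large penalty when the performance requirement is missed."""
    if not perf_met:
        return -10.0
    return -(power_w + 0.1 * max(0.0, temp_c - 70.0))


class FrequencyLearner:
    """Stage 2, run at the finer interval: tabular Q-learning over
    (temperature state, frequency index) pairs."""

    def __init__(self, n_states, n_actions, alpha=0.5, gamma=0.9,
                 epsilon=0.1, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def choose(self, state):
        """Epsilon-greedy choice of a frequency index."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        row = self.q[state]
        return row.index(max(row))

    def update(self, state, action, r, next_state):
        """Standard Q-learning update of one table entry."""
        best_next = max(self.q[next_state])
        td_error = r + self.gamma * best_next - self.q[state][action]
        self.q[state][action] += self.alpha * td_error
```

In this sketch the learning table has only `(len(TEMP_BINS_C) + 1) * len(FREQS_MHZ)` entries, because the thread-to-core mapping is decided outside the learner; folding the mapping into the state or action space would multiply the table size by the number of possible allocations.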