The thermal profile of a multicore system varies both within an application's execution (intra-application) and when the system switches from one application to another (inter-application). In this paper, we propose an adaptive thermal management approach that improves the lifetime reliability of multicore systems by considering both inter- and intra-application thermal variations. Fundamental to this approach is a reinforcement learning algorithm, which learns the relationship between the mapping of threads to cores, the frequency of a core, and its temperature (sampled from on-board thermal sensors). Control actions are applied by overriding the operating system's mapping decisions using affinity masks and by dynamically changing CPU frequency using in-kernel governors. Lifetime improvement is achieved by controlling not only the peak and average temperatures but also thermal cycling, an emerging wear-out concern in modern systems. The proposed approach is validated experimentally on an Intel quad-core platform executing a diverse set of multimedia benchmarks. Results demonstrate that the proposed approach reduces average temperature, peak temperature, and thermal cycling, improving the mean-time-to-failure (MTTF) by an average of 2x for intra-application and 3x for inter-application scenarios compared to existing thermal management techniques. Furthermore, dynamic and static energy consumption are also reduced by an average of 10% and 11%, respectively.
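To make the learning component concrete, the sketch below shows a generic tabular Q-learning loop over (mapping, frequency) actions. The abstract does not specify the paper's actual state, action, or reward definitions, so the discretization into four candidate mappings and three frequency levels, and the reward that penalizes both high temperature and temperature swings (a proxy for thermal cycling), are illustrative assumptions only.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed learning hyperparameters

# Actions: (thread-to-core mapping id, frequency level) pairs.
# Four mappings x three frequency levels is an assumption for illustration.
ACTIONS = [(m, f) for m in range(4) for f in range(3)]
Q = defaultdict(float)  # Q[(state, action)] -> learned long-run value

def choose_action(state):
    # Epsilon-greedy: mostly exploit the best known mapping/frequency,
    # occasionally explore an alternative.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def reward(temp_prev, temp_now):
    # Illustrative reward: penalize high absolute temperature and large
    # temperature swings between samples (thermal cycling).
    return -(temp_now + abs(temp_now - temp_prev))

def update(state, action, r, next_state):
    # Standard one-step Q-learning update.
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])
```

Here a state could be as simple as a binned temperature reading per core; richer state encodings (e.g., including the active application) would distinguish the inter-application case.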
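The two actuation mechanisms named in the abstract, affinity masks and in-kernel frequency governors, are standard Linux interfaces. The following minimal sketch shows how they can be driven from user space; it assumes a Linux system with the cpufreq `userspace` governor available, root privileges for the sysfs writes, and core/frequency values chosen purely for illustration.

```python
import os

def pin_thread(pid: int, cores: set) -> None:
    # Restrict the given task to the listed cores, overriding the
    # scheduler's default placement (the affinity-mask mechanism).
    os.sched_setaffinity(pid, cores)

def set_frequency(core: int, khz: int) -> None:
    # Hand control to the in-kernel "userspace" governor, then request
    # a specific operating point via the standard cpufreq sysfs files.
    base = f"/sys/devices/system/cpu/cpu{core}/cpufreq"
    with open(f"{base}/scaling_governor", "w") as f:
        f.write("userspace")
    with open(f"{base}/scaling_setspeed", "w") as f:
        f.write(str(khz))

def read_temperature(zone: int = 0) -> float:
    # Sample an on-board thermal sensor through the thermal sysfs class
    # (values are reported in millidegrees Celsius).
    with open(f"/sys/class/thermal/thermal_zone{zone}/temp") as f:
        return int(f.read()) / 1000.0

if __name__ == "__main__":
    pin_thread(os.getpid(), {0, 1})   # map this process to cores 0-1
    set_frequency(0, 1_600_000)       # run core 0 at 1.6 GHz (example value)
    print(f"core temperature: {read_temperature():.1f} C")
```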