One of the biggest challenges in modern societies is the improvement of healthy aging and the support to older persons in their daily activities. In particular, given its social and economic impact, the automatic detection of falls has attracted considerable attention in the computer vision and pattern recognition communities. Although the approaches based on wearable sensors have provided high detection rates, some of the potential users are reluctant to wear them and thus their use is not yet normalized. As a consequence, alternative approaches such as vision-based methods have emerged. We firmly believe that the irruption of the Smart Environments and the Internet of Things paradigms, together with the increasing number of cameras in our daily environment, forms an optimal context for vision-based systems. Consequently, here we propose a vision-based solution using Convolutional Neural Networks to decide if a sequence of frames contains a person falling. To model the video motion and make the system scenario independent, we use optical flow images as input to the networks followed by a novel three-step training phase. Furthermore, our method is evaluated in three public datasets achieving the state-of-the-art results in all three of them.