Recently, the scientific community has placed great emphasis on the recognition of human activity, especially in the area of health and care for the elderly. There are already practical applications of activity recognition and unusual conditions that use body sensors such as wrist-worn devices or neck pendants. These relatively simple devices may be prone to errors, might be uncomfortable to wear, might be forgotten or not worn, and are unable to detect more subtle conditions such as incorrect postures. Therefore, other proposed methods are based on the use of images and videos to carry out human activity recognition, even in open spaces and with multiple people. However, the resulting increase in the size and complexity involved when using image data requires the use of the most recent advanced machine learning and deep learning techniques. This paper presents an approach based on deep learning with attention to the recognition of activities from multiple frames. Feature extraction is performed by estimating the pose of the human skeleton, and classification is performed using a neural network based on Bidirectional Encoder Representation of Transformers (BERT). This algorithm was trained with the UP-Fall public dataset, generating more balanced artificial data with a Generative Adversarial Neural network (GAN), and evaluated with real data, outperforming the results of other activity recognition methods using the same dataset.