Visual biosignals can be used to analyze human behavioral activities and serve as a primary resource for Facial Expression Recognition (FER). FER computational systems face significant challenges, arising from both spatial and temporal effects. Spatial challenges include deformations or occlusions of facial geometry, while temporal challenges involve discontinuities in motion observation due to high variability in poses and dynamic conditions such as rotation and translation. To enhance the analytical precision and validation reliability of FER systems, several datasets have been proposed. However, most of these datasets focus primarily on spatial characteristics, rely on static images, or consist of short videos captured in highly controlled environments. These constraints significantly reduce the applicability of such systems in real-world scenarios. This paper proposes the Facial Biosignals Time–Series Dataset (FBioT), a novel dataset providing temporal descriptors and features extracted from common videos recorded in uncontrolled environments. To automate dataset construction, we propose Visual–Temporal Facial Expression Recognition (VT-FER), a method that stabilizes temporal effects using normalized measurements based on the principles of the Facial Action Coding System (FACS) and generates signature patterns of expression movements for correlation with real-world temporal events. To demonstrate feasibility, we applied the method to create a pilot version of the FBioT dataset. This pilot resulted in approximately 10,000 s of public videos captured under real-world facial motion conditions, from which we extracted 22 direct and virtual metrics representing facial muscle deformations. During this process, we preliminarily labeled and qualified 3046 temporal events representing two emotion classes. As a proof of concept, these emotion classes were used as input for training neural networks, with results summarized in this paper and available in an open-source online repository.