Datasets are key to developing new machine learning-based applications, but they are very costly to prepare, which hinders research and development in the field. We propose an end-to-end, edge-to-cloud system architecture optimized for collecting sport activity recognition datasets and for deploying the resulting applications. Tests under real-world conditions in four different sports showed that the system can effectively collect data usable for machine learning, with energy consumption compatible with the typical session duration of most of the sports considered. The proposed architecture relies on a key feature of Measurify, an Internet of Things framework for measurement data management, namely its .csv dataset management, and supports a workflow designed for efficient labeling of signal time series. The architecture is not tied to any specific sport, and a new dataset generation application can be set up in a few days, even by novice developers. To concretely support the R&D community, we release our work as open source.