Chronic stress leads to poor well-being and degrades quality of life and health. Society could benefit significantly from a system that automatically detects stress in daily life from physiological signals collected by unobtrusive wearable devices. However, such systems are not sufficiently accurate in unrestricted daily life compared to systems tested in controlled real-life and laboratory conditions. We developed a stress level detection system that preprocesses noisy physiological signals, extracts features, and applies machine learning classification techniques. To test it, we collected data with smartwatches in a laboratory experiment and in daily life using ecological momentary assessment. We investigated the effect of different labeling techniques and of different training and test environments. The laboratory environment was more controlled, which allowed us to validate the perceived stress reported in self-reports. When machine learning models were trained on laboratory data rather than on daily life data, the accuracy of the system when tested in daily life improved significantly. The subjectivity effect introduced by self-reports in daily life could be eliminated. Our system achieved higher stress level detection accuracy than most previous daily life studies.