Olfactometry is globally acknowledged as a technique to determine odor concentrations, which are used to characterize odors for regulatory purposes, e.g., to protect the general public against harmful effects of air pollution. Although the procedure for determining odor concentrations is standardized in some countries, continued research is required to understand the uncertainties of odor monitoring and prediction. In this respect, the present paper aims to answer questions of central importance in olfactometry. To do so, a wealth of measurement data originating from six large-scale olfactometric stack emission proficiency tests conducted from 2015 to 2017 was retrospectively analyzed. The tests were hosted at a unique emission simulation apparatus, a replica of an industrial chimney 23 m in height, so that for the first time, conventional proficiency testing (no sampling) and real measurements (no reference concentrations) were combined. Surprisingly, highly variable recovery rates were observed, regardless of which of the very different odorants was analyzed. Expanded measurement uncertainties ranging from roughly 30–300% to 20–520% around a single olfactometric measurement value were calculated, which far exceed the 95% confidence interval (45–220%) given by EN 13725, the standard widely used for the assessment and control of odor emissions. Also, no evidence was found that mixtures of odorants can be determined more precisely than single-component odorants. This is an important argument in the intensely debated question of whether n-butanol, the current reference substance in olfactometry, should be replaced by multi-component odorants. However, based on our data, resorting to an alternative reference substance will not solve the inherent problem of high uncertainty levels in dynamic olfactometry.
Finally, robust statistics allowed reliable odor thresholds to be calculated, which are an important prerequisite for converting mass concentrations to odor concentrations and vice versa.
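The conversion and the interval widths mentioned above can be illustrated with a minimal Python sketch. It assumes that an odor concentration is the mass concentration divided by the odor threshold, and that the percentage ranges are read as multiplicative bounds relative to a single measured value. The function names are hypothetical, and the n-butanol value of 123 µg/m³ per odor unit is the EN 13725 reference odor mass, used here only as an example input and not as a threshold derived in this work.

```python
def odor_concentration(mass_conc_ug_m3: float, odor_threshold_ug_m3: float) -> float:
    """Convert a mass concentration (ug/m3) to an odor concentration (ou/m3).

    Assumption: the odor threshold is the mass concentration that
    corresponds to exactly one odor unit per cubic metre.
    """
    if odor_threshold_ug_m3 <= 0:
        raise ValueError("odor threshold must be positive")
    return mass_conc_ug_m3 / odor_threshold_ug_m3


def mass_concentration(odor_conc_ou_m3: float, odor_threshold_ug_m3: float) -> float:
    """Inverse conversion: odor concentration (ou/m3) to mass concentration (ug/m3)."""
    if odor_threshold_ug_m3 <= 0:
        raise ValueError("odor threshold must be positive")
    return odor_conc_ou_m3 * odor_threshold_ug_m3


def uncertainty_interval(value: float, lower_pct: float, upper_pct: float) -> tuple[float, float]:
    """Absolute bounds for an interval given as percentages of the measured value.

    Assumption: "45-220%" means the interval runs from 0.45 times to
    2.20 times the single measurement value.
    """
    return value * lower_pct / 100.0, value * upper_pct / 100.0


if __name__ == "__main__":
    # Illustrative inputs only, not data from the proficiency tests.
    threshold = 123.0  # ug/m3 per ou/m3, n-butanol reference odor mass
    c_od = odor_concentration(1230.0, threshold)
    print(f"odor concentration: {c_od:.1f} ou/m3")

    measured = 1000.0  # ou/m3, hypothetical single measurement
    for label, lo, hi in [("EN 13725", 45, 220),
                          ("narrowest range in this study", 30, 300),
                          ("widest range in this study", 20, 520)]:
        low, high = uncertainty_interval(measured, lo, hi)
        print(f"{label}: {low:.0f} to {high:.0f} ou/m3")
```

Under these assumptions, a single reading of 1000 ou/m³ would be compatible with values from 200 to 5200 ou/m³ at the widest range reported here, compared with 450 to 2200 ou/m³ for the EN 13725 interval.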