Ultra-wideband (UWB) localization has emerged as a crucial enabler in GPS-denied environments for diverse industries and domains, including logistics, healthcare, and broader societal applications. Despite notable progress, UWB algorithms in the scientific literature are often evaluated in isolation, under very specific conditions, which makes them difficult to compare. This paper introduces a novel benchmark platform designed to assess the performance of 11 prominent UWB accuracy-enhancing algorithms, both independently and in combination. A key feature of the platform is its use of multiple diverse evaluation metrics, including mean average error, latency, and spatial error. We show the importance of adopting alternative metrics such as spatial error, which, depending on the use case, is often more relevant than the commonly used mean average error. Furthermore, we show that "more is better" does not hold when combining accuracy-improving algorithms for UWB systems: such combinations can yield diminishing returns and can even degrade overall performance. Additionally, we caution against relying blindly on accuracy results reported in the scientific literature when designing UWB systems for use cases whose requirements or environments differ from those evaluated. Finally, we provide algorithmic recommendations for distinct surroundings, example applications, and usage contexts, to help drive efficient design in future UWB research and adoption.