“…Other approaches to feature extraction from images, such as semantic segmentation, could also have been employed to provide model inputs for pollution estimation, as in a North American study (Qi and Hankey, 2021). We used objects in our second approach because the training data required, namely bounding-box annotations of objects, were less resource-intensive to generate within our bespoke dataset (Nathvani et al., 2022) than pixel-level annotation, which may be explored in future work. The object counts were obtained from an object detection CNN, described in detail in previous work (Nathvani et al., 2022), trained on object categories relevant to the local environmental context: person, market vendor (a person carrying a container on their head, a common sight in African markets), car, taxi, pick-up truck, bus, lorry, van, tro-tro (minibuses used for public transportation), motorcycle, bicycle, market stall, loudspeaker, umbrella (commonly used to protect market and roadside vendors from the sun and rain), cookstove, cooking pot/bowl (which frequently contains wares for sale in the marketplace), food, trash, (piece of) debris, and animal.…”
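
As an illustration of how per-image object counts might be assembled into a model input, the following is a minimal sketch and not the authors' implementation. The category names are taken from the list above; the detection format (a list of `(label, score)` pairs), the function name `count_objects`, and the 0.5 confidence threshold are assumptions made for this example only.

```python
from collections import Counter

# Category names as listed in the text; the ordering here is arbitrary.
CATEGORIES = [
    "person", "market vendor", "car", "taxi", "pick-up truck", "bus", "lorry",
    "van", "tro-tro", "motorcycle", "bicycle", "market stall", "loudspeaker",
    "umbrella", "cookstove", "cooking pot/bowl", "food", "trash", "debris",
    "animal",
]


def count_objects(detections, min_score=0.5):
    """Turn one image's detections into a fixed-length count vector.

    `detections` is assumed to be a list of (label, score) pairs produced by
    some object detector; `min_score` is a hypothetical confidence cut-off.
    """
    counts = Counter(
        label
        for label, score in detections
        if score >= min_score and label in CATEGORIES
    )
    # Counter returns 0 for categories that were never detected.
    return [counts[category] for category in CATEGORIES]


# Hypothetical detections for a single street-level image.
example = [
    ("car", 0.91),
    ("car", 0.40),  # falls below the threshold and is ignored
    ("person", 0.88),
    ("person", 0.75),
    ("umbrella", 0.66),
]
features = count_objects(example)
```

Each image then contributes one vector of 20 counts, in the order of `CATEGORIES`, which could be passed to a downstream regression model alongside other predictors.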