In this study, we investigated the quality of pedestrian volume data, which are important for safety analysis and urban planning. We applied statistical, machine learning, and deep learning methods first to detect anomalies in pedestrian activity data and then to impute missing values. We analyzed daily time series of pedestrian activity at traffic signals in the state of Utah from 2018 to 2022. Our approach used vector autoregression (VAR), a multivariate time series method, incorporating epidemiological-environmental (EpiEnv) variables: average temperature, precipitation, air quality index, and COVID-19 pandemic data. We also examined how built environment variables, combined with EpiEnv variables, influenced fluctuations in pedestrian volume data. Our findings suggested that density-based spatial clustering of applications with noise (DBSCAN) performed best in anomaly detection, and that random forest, long short-term memory (LSTM), and gated recurrent unit (GRU) techniques excelled at imputing various categories of missing-value patterns in pedestrian volume time series. The VAR results also indicated that the EpiEnv variables significantly affected anomaly detection and imputation across all traffic signals. These findings could help urban and transportation planners identify the EpiEnv variables that most affect pedestrian activity, which in turn could aid in developing suitable strategies to promote walking as a mode of transportation.
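To make the detect-then-impute workflow concrete, the following is a minimal sketch of the first step only, assuming a univariate application of DBSCAN (density-based spatial clustering of applications with noise) to daily counts at a single signal. The function names, the `eps` and `min_samples` values, and the sample counts are all hypothetical and are not taken from the study, whose actual features and settings are not specified here. Points labeled as noise are masked as missing so that a separate imputation model could fill them later.

```python
from typing import List, Optional

NOISE = -1  # label used for points that belong to no cluster


def dbscan_1d(values: List[float], eps: float, min_samples: int) -> List[int]:
    """Label each value with a cluster id, or NOISE for outliers (1-D DBSCAN)."""
    n = len(values)
    # A point's neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if abs(values[i] - values[j]) <= eps]
        for i in range(n)
    ]
    labels: List[Optional[int]] = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = NOISE  # may be reclaimed below as a border point
            continue
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point of the current cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep expanding
        cluster += 1
    return labels


def mask_anomalies(values: List[float], eps: float, min_samples: int):
    """Replace values flagged as noise with None, marking them for imputation."""
    labels = dbscan_1d(values, eps, min_samples)
    return [None if lab == NOISE else v for v, lab in zip(values, labels)]


# Made-up daily pedestrian counts for one signal (illustrative only).
daily_counts = [120, 131, 118, 125, 940, 122, 129, 0, 127, 124]
cleaned = mask_anomalies(daily_counts, eps=15, min_samples=3)
print(cleaned)
```

This sketch clusters on count values alone and ignores temporal order, seasonality, and the EpiEnv covariates, so it illustrates the general idea rather than the configuration evaluated in the study.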