Pollutant gases, such as CO, NO2, O3, and SO2 affect human health, and low-cost sensors are an important complement to regulatory-grade instruments in pollutant monitoring. Previous studies focused on one or several species, while comprehensive assessments of multiple sensors remain limited. We conducted a 12-month field evaluation of four Alphasense sensors in Beijing and used single linear regression (SLR), multiple linear regression (MLR), random forest regressor (RFR), and neural network (long short-term memory (LSTM)) methods to calibrate and validate the measurements with nearby reference measurements from national monitoring stations. For performances, CO > O3 > NO2 > SO2 for the coefficient of determination (R2) and root mean square error (RMSE). The MLR did not increase the R2 after considering the temperature and relative humidity influences compared with the SLR (with R2 remaining at approximately 0.6 for O3 and 0.4 for NO2). However, the RFR and LSTM models significantly increased the O3, NO2, and SO2 performances, with the R2 increasing from 0.3–0.5 to >0.7 for O3 and NO2, and the RMSE decreasing from 20.4 to 13.2 ppb for NO2. For the SLR, there were relatively larger biases, while the LSTMs maintained a close mean relative bias of approximately zero (e.g., <5% for O3 and NO2), indicating that these sensors combined with the LSTMs are suitable for hot spot detection. We highlight that the performance of LSTM is better than that of random forest and linear methods. This study assessed four electrochemical air quality sensors and different calibration models, and the methodology and results can benefit assessments of other low-cost sensors.