Smart agricultural greenhouses provide well-controlled conditions for crop cultivation, but they require accurate prediction of environmental factors to ensure ideal crop growth and efficient management. Because existing predictors are limited in handling massive, nonlinear, and dynamic temporal data, this study proposes a bidirectional self-attentive encoder–decoder framework (BEDA) for long-term prediction of multiple highly nonlinear, noisy environmental factors in a smart greenhouse. First, the original data are denoised with a wavelet threshold filter and preprocessing operations. Second, the bidirectional long short-term memory network is selected as the fundamental unit for extracting time-series features. Third, a multi-head self-attention mechanism is incorporated into the encoder–decoder framework to improve prediction performance. Experiments were conducted in a practical greenhouse to predict indoor environmental factors (temperature, humidity, and CO2 concentration) from noisy IoT-based sensors. The proposed BEDA method was the best model on all datasets: the root mean square error (RMSE) of the predictions was reduced to 2.726, 3.621, and 49.817, with an R of 0.749, 0.848, and 0.8711, for temperature, humidity, and CO2 concentration, respectively. The experimental results show that the proposed method offers favorable prediction accuracy, robustness, and generalization, making it suitable for more precise greenhouse management.
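As a rough illustration of the denoising step and the two reported metrics, the following is a minimal sketch and not the authors' implementation. It assumes a one-level Haar wavelet with soft thresholding (the abstract does not name the wavelet family, decomposition depth, or threshold rule) and assumes that R denotes the Pearson correlation coefficient. All function names are hypothetical.

```python
import math

def haar_forward(x):
    """One-level Haar transform: returns (approximation, detail) coefficients."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Inverse of haar_forward: rebuilds the signal from both coefficient sets."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out

def soft_threshold(c, t):
    """Shrink a coefficient toward zero by t; magnitudes below t become zero."""
    return math.copysign(max(abs(c) - t, 0.0), c)

def wavelet_denoise(x, threshold):
    """Assumed scheme: threshold only the detail coefficients, then invert."""
    if len(x) % 2 != 0:
        raise ValueError("this sketch needs an even-length signal")
    approx, detail = haar_forward(x)
    detail = [soft_threshold(d, threshold) for d in detail]
    return haar_inverse(approx, detail)

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient (assumed meaning of R)."""
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(y_true, y_pred))
    var_t = sum((a - mt) ** 2 for a in y_true)
    var_p = sum((b - mp) ** 2 for b in y_pred)
    return cov / math.sqrt(var_t * var_p)
```

With a threshold of zero, `wavelet_denoise` returns the input unchanged up to floating-point error; with a large threshold, each adjacent pair of samples collapses to its mean. In practice the threshold would be chosen from an estimate of the sensor noise level, which this sketch leaves to the caller.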