In recent years, crop health monitoring has been greatly aided by high-throughput techniques that combine remotely captured imagery with deep learning. Most existing methods rely mainly on the visible spectrum to analyze abiotic stress, such as water deficiency in crops. In this study, we carry out experiments on maize crops grown in a controlled environment under different water treatments. We use a multispectral camera mounted on an Unmanned Aerial Vehicle to collect data from the tillering stage to the heading stage of the crop. We describe a pre-processing pipeline, followed by extraction of the Region of Interest from the orthomosaic. We propose a model based on a Convolutional Neural Network augmented with a deformable convolutional layer to learn and extract rich spatial and spectral features. These features are then fed to a weighted attention-based Bi-Directional Long Short-Term Memory network, which models the sequential dependencies between temporal features. Finally, the water stress category is predicted from the aggregated spatial-spectral-temporal characteristics. Using multispectral, multi-temporal imagery significantly improved accuracy compared with mono-temporal classification. By incorporating a deformable convolutional layer and a Bi-Directional Long Short-Term Memory network with weighted attention, our proposed model achieved a best accuracy of 91.30%, with a precision of 0.8888 and a recall of 0.8857. The results indicate that multispectral, multi-temporal imagery is a valuable tool for extracting and aggregating discriminative spatial-spectral-temporal characteristics for water stress classification.
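The attention-weighted aggregation of temporal features mentioned above can be illustrated with a minimal sketch. It is not the model from this study: the function name `attention_pool`, the scoring vector `w`, the tanh scoring rule, and the toy feature values are all hypothetical, and the recurrent network itself is omitted so that only the weighting-and-summing step is shown.

```python
import math

def attention_pool(features, w):
    """Aggregate per-date feature vectors into one vector via softmax attention.

    features: list of T feature vectors (one per acquisition date), each of length D.
    w: hypothetical scoring vector of length D (it would be learned in a real model).
    Returns (context, alphas): the weighted sum and the per-date attention weights.
    """
    # Score each time step: tanh of the dot product with the scoring vector.
    scores = [math.tanh(sum(wi * hi for wi, hi in zip(w, h))) for h in features]
    # Softmax over time steps (subtract the max for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Attention-weighted sum of the feature vectors.
    dim = len(features[0])
    context = [sum(a * h[d] for a, h in zip(alphas, features)) for d in range(dim)]
    return context, alphas

# Toy example: three acquisition dates, two features each (made-up values).
features = [[0.1, 0.2], [0.4, 0.1], [0.9, 0.7]]
w = [1.0, 1.0]
context, alphas = attention_pool(features, w)
```

In this sketch the weights in `alphas` sum to one, so dates whose features score higher contribute more to the aggregated vector that a classifier would then consume.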