Maintaining water quality in aquatic habitats is critical to the health of aquatic species, particularly fish. This study presents a new approach to water quality classification that combines IoT-driven data acquisition with careful data labelling based on the Aqua-Enviro Index (AEI), which takes fish habitats into account. Existing methods fail to capture complex temporal dynamics and rely heavily on large amounts of labelled data, which exposes fundamental limitations. In response, we propose the deep-learning-based Convolutional Gated Recurrent Unit Tempo Fusion Network (CGTFN), a considerable advance in water quality evaluation. The model addresses these limitations by combining Convolutional Neural Networks (CNNs) for spatial pattern recognition with Gated Recurrent Units (GRUs) for modelling temporal dependencies. The Tempo Fusion mechanism integrates spatial, temporal, and contextual information, enabling more refined classification by capturing subtle interdependencies among environmental variables. CGTFN achieves 99.71% and 99.81% accuracy on the public-env and real-time-env datasets, respectively, exceeding the 98.2% achieved by established models. These findings highlight CGTFN's potential for water quality evaluation, bridging the gap between technology and environmental management, with implications ranging from aquaculture to resource sustainability.