This study presents a novel audio compression technique tailored for environmental monitoring within multi-modal data processing pipelines. Given the crucial role that audio data play in environmental evaluations, particularly in contexts with extreme resource limitations, our strategy substantially decreases bit rates to enable efficient data transfer and storage. This is accomplished without undermining the accuracy required for trustworthy air pollution analysis, while simultaneously minimizing processing costs. More specifically, our approach combines a deep-learning-based model, optimized for edge devices, with a conventional audio coding scheme. Once transmitted to the cloud, the compressed data undergo a decoding process that leverages vast cloud computing resources for accurate reconstruction and classification. The experimental results indicate that our approach incurs only a minor decrease in accuracy, even at notably low bit rates, and demonstrates strong robustness in identifying data from classes not included in our training dataset.
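To make the edge/cloud split concrete, the following is a minimal sketch of the pipeline structure described above, not the paper's actual implementation: a hypothetical learned projection stands in for the edge-side deep-learning encoder, a simple 8-bit uniform quantizer stands in for the conventional coding stage, and a placeholder linear head stands in for the cloud-side classifier. All names, sizes, and parameters here are illustrative assumptions.

```python
import numpy as np

N_LATENT = 64  # hypothetical latent dimension of the learned encoder


def edge_encode(audio: np.ndarray) -> bytes:
    """Edge side: a lightweight learned encoder maps an audio frame to a
    compact latent, which a conventional coding stage (placeholder: 8-bit
    uniform quantization) packs into a low-bit-rate payload."""
    rng = np.random.default_rng(0)
    # Hypothetical fixed projection standing in for the trained DL encoder.
    projection = rng.standard_normal((N_LATENT, audio.shape[0]))
    latent = projection @ audio
    # Placeholder conventional coding stage: scale + 8-bit quantization.
    scale = float(np.abs(latent).max()) or 1.0
    quantized = np.round(latent / scale * 127).astype(np.int8)
    return np.float32(scale).tobytes() + quantized.tobytes()


def cloud_decode_and_classify(payload: bytes) -> int:
    """Cloud side: reconstruct the latent and classify it with a
    placeholder linear head (10 hypothetical classes)."""
    scale = np.frombuffer(payload[:4], dtype=np.float32)[0]
    latent = np.frombuffer(payload[4:], dtype=np.int8).astype(np.float32) / 127 * scale
    rng = np.random.default_rng(1)
    class_weights = rng.standard_normal((10, N_LATENT))
    return int(np.argmax(class_weights @ latent))


# Usage: encode one ~1 s frame of 16 kHz audio on the edge, classify in the cloud.
frame = np.random.default_rng(2).standard_normal(16000).astype(np.float32)
predicted_label = cloud_decode_and_classify(edge_encode(frame))
```

The point of the sketch is only the division of labour: heavy lifting (reconstruction and classification) happens after transmission, while the device performs a cheap encode-and-quantize step that keeps the transmitted payload small.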