In recent years, the application of deep learning to environmental sound classification (ESC) has received considerable attention owing to its powerful ability to recognize the context of urban sounds. In general, deep learning models with high accuracy require substantial computing and memory resources. Consequently, to apply complex deep learning models to real-world ESC, model inference has typically been performed on cloud servers with powerful computing resources. However, performing inference on a cloud server introduces heavy network traffic and security issues. In addition, a deep learning model trained on a single public ESC dataset may not be sufficient for classifying the various classes of urban noise and emergency-related sounds. To address these problems, we propose an on-device, real-time urban sound monitoring system that can classify various urban sounds at a low system construction cost. The proposed system consists of an edge artificial intelligence (AI) node and a FIWARE-based server. To enable real-time inference on a resource-constrained edge AI node, we developed a lightweight convolutional neural network (CNN) by adjusting the input and model configurations to achieve high accuracy with a small number of parameters. The model achieved 94.9% classification accuracy using only 331 K parameters on an integrated dataset comprising 17 classes of urban noise and emergency-related sounds. Furthermore, a prototype of the proposed system was developed and evaluated to verify its feasibility. The prototype was built at a cost of less than USD 50 and could complete the entire monitoring process every 3 s. We also visualized the sound monitoring data using Grafana on the FIWARE-based server.
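The parameter budget reported above (331 K) can be sanity-checked with simple layer arithmetic. The sketch below is illustrative only: the layer shapes are assumptions for demonstration, not the authors' actual CNN architecture.

```python
# Sketch: counting parameters of a small CNN to verify it fits a tight
# budget. Layer shapes here are hypothetical, not the paper's model.

def conv2d_params(in_ch, out_ch, k, bias=True):
    """Parameter count of a k x k 2-D convolution layer."""
    return out_ch * (in_ch * k * k + (1 if bias else 0))

def dense_params(in_features, out_features, bias=True):
    """Parameter count of a fully connected layer."""
    return out_features * (in_features + (1 if bias else 0))

# Example stack: three conv blocks and a classifier head for 17 classes
# (urban noise + emergency-related sounds, as in the integrated dataset).
layers = [
    conv2d_params(1, 32, 3),    # conv1: single-channel spectrogram-like input
    conv2d_params(32, 64, 3),   # conv2
    conv2d_params(64, 128, 3),  # conv3
    dense_params(128, 17),      # globally pooled features -> 17 classes
]
total = sum(layers)
print(total)  # → 94865, well under a 331 K budget
```

Counting parameters this way before training makes it easy to tune channel widths and kernel sizes until the model fits the memory constraints of a resource-limited edge AI node.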