New information and communication technologies have contributed to the development of the smart city concept. On a physical level, this paradigm is characterized by deploying a substantial number of different devices that can sense their surroundings and generate a large amount of data. The most typical case is image and video acquisition sensors. Recently, these types of sensors are found in abundance in urban spaces and are responsible for producing a large volume of multimedia data. The advanced computer vision methods for this type of multimedia information means that many aspects can be dynamically monitored, which can help implement value-added applications in the city. However, obtaining more elaborate semantic information from these data poses significant challenges related to a large amount of data generated and the processing capabilities required. This paper aims to address these issues by using a combination of cloud computing technologies and mobile computing techniques to design a three-layer distributed architecture for intensive urban computing. The approach consists of distributing the processing tasks among a city's multimedia acquisition devices, a middle computing layer, known as a cloudlet, and a cloud-computing infrastructure. As a result, each part of the architecture can now focus on a small number of tasks for which they are specially designed, and data transmission communication needs are significantly reduced. To this end, the cloud server can hold and centralize the multimedia analysis of the processed results from the lower layers. Finally, a case study on smart lighting is described to illustrate the benefits of using the proposed model in smart city environments. INDEX TERMS Mobile cloud computing, data processing, distributed architectures, smart city, urban computing.