Recent years have seen the increasing need of location awareness by mobile applications. This paper presents a room-level indoor localization approach based on the measured room's echos in response to a two-millisecond single-tone inaudible chirp emitted by a smartphone's loudspeaker. Different from other acoustics-based room recognition systems that record full-spectrum audio for up to ten seconds, our approach records audio in a narrow inaudible band for 0.1 seconds only to preserve the user's privacy. However, the short-time and narrowband audio signal carries limited information about the room's characteristics, presenting challenges to accurate room recognition. This paper applies deep learning to effectively capture the subtle fingerprints in the rooms' acoustic responses. Our extensive experiments show that a two-layer convolutional neural network fed with the spectrogram of the inaudible echos achieve the best performance, compared with alternative designs using other raw data formats and deep models. Based on this result, we design a RoomRecognize cloud service and its mobile client library that enable the mobile application developers to readily implement the room recognition functionality without resorting to any existing infrastructures and add-on hardware. Extensive evaluation shows that RoomRecognize achieves 99.7%, 97.7%, 99%, and 89% accuracy in differentiating 22 and 50 residential/office rooms, 19 spots in a quiet museum, and 15 spots in a crowded museum, respectively. Compared with the state-of-the-art approaches based on support vector machine, RoomRecognize significantly improves the Pareto frontier of recognition accuracy versus robustness against interfering sounds (e.g., ambient music).