Robust and accurate three-dimensional localization is essential for personal navigation, emergency rescue, and worker monitoring in indoor environments. For localization technology to be usable across such applications, it must reduce its dependence on infrastructure and bound the maximum error. This study aims to accurately estimate the locations of multiple smartphone users in a building with a cloud platform-based localization system. The proposed technology is modularized in a hierarchical structure that estimates the floor first and then the location. The system comprises four localization modules: coarse level detection, fine level detection (FLD), fine location tracking (FLT), and level change detection (LCD). The modules are activated according to the user’s current status. Position estimation is divided into three phases, and the module suited to each phase runs so that the user’s location is estimated with progressively finer precision. Once the FLD module determines the user’s floor, the FLT module tracks the user’s two-dimensional position by comparing the sequence of received signal strength indicator (RSSI) vectors with a radio map. The LCD module detects when the user changes floors and switches the user’s phase accordingly. To verify the proposed technology, experiments were conducted in a six-story building, and an average positioning error of less than 2 m was obtained.
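To illustrate the idea of comparing RSSI vectors with a radio map, the following is a minimal sketch of nearest-neighbor fingerprint matching. It is not the FLT module described in this study: the radio map contents, the three access points, the Euclidean distance metric, and the names `RADIO_MAP`, `rssi_distance`, `match_position`, and `track` are all assumptions made for illustration, and each vector is matched independently rather than as a sequence.

```python
import math

# Hypothetical radio map (assumed values): reference position (x, y) in metres
# mapped to the mean RSSI in dBm observed from three access points.
RADIO_MAP = {
    (0.0, 0.0): [-40.0, -70.0, -80.0],
    (5.0, 0.0): [-55.0, -60.0, -75.0],
    (10.0, 0.0): [-70.0, -45.0, -65.0],
}


def rssi_distance(a, b):
    """Euclidean distance between two RSSI vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_position(rssi, radio_map):
    """Return the reference position whose stored fingerprint is closest to rssi."""
    return min(radio_map, key=lambda pos: rssi_distance(rssi, radio_map[pos]))


def track(rssi_sequence, radio_map):
    """Match each observed RSSI vector independently (a simplification)."""
    return [match_position(v, radio_map) for v in rssi_sequence]


# Assumed observations from a user walking along the x axis.
observed = [
    [-42.0, -69.0, -81.0],
    [-56.0, -61.0, -74.0],
    [-68.0, -47.0, -66.0],
]
path = track(observed, RADIO_MAP)
print(path)
```

A tracker that uses the sequence as a whole would additionally constrain consecutive estimates, for example by penalizing physically implausible jumps between reference positions.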