Distance sensors are essential for mobile robots to perceive their surroundings. LiDARs and depth cameras are widely used, yet each has limitations: LiDARs are relatively costly, depth cameras are largely limited to indoor use, and both perform poorly at directly detecting transparent objects. In contrast, an ultrasonic phased array, which integrates multiple ultrasonic sensors, not only enables 3D ranging and imaging but also offers strong environmental adaptability, low cost, and the ability to detect transparent objects. To explore the application of in-air ultrasonic phased arrays to mobile robots, we simulate a 40 kHz 5×5 non-uniform sparse ultrasonic phased array. The simulator emulates phased-array transmission and reception and applies beamforming and matched filtering to obtain depth information in three-dimensional space. We then propose a multi-view indoor 3D reconstruction method that fuses the ultrasonic phased array with a monocular camera, with two scanning strategies developed to handle different scenarios. Finally, the method is validated in several Gazebo scenarios and compared against LiDAR and depth-camera baselines. The experimental results show that the method performs strongly in terms of accuracy, consistency, and completeness.
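As an illustration of the matched-filtering step named above, the following minimal sketch estimates the range of a single reflector from one simulated receive channel by correlating the received signal with the transmitted 40 kHz pulse. It is not the simulator described in this work: the sample rate, pulse shape, noise level, speed of sound, and all function names are assumptions chosen only for illustration, and beamforming across the array elements is omitted.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the paper,
# except the 40 kHz carrier frequency.
SPEED_OF_SOUND = 343.0       # m/s, assumed for air at room temperature
CARRIER_HZ = 40_000.0        # carrier frequency of the array
SAMPLE_RATE_HZ = 400_000.0   # assumed receiver sample rate


def make_pulse(n_cycles=8):
    """Return a Hann-windowed tone burst of n_cycles carrier periods."""
    n = int(round(n_cycles * SAMPLE_RATE_HZ / CARRIER_HZ))
    t = np.arange(n) / SAMPLE_RATE_HZ
    return np.hanning(n) * np.sin(2.0 * np.pi * CARRIER_HZ * t)


def simulate_echo(pulse, target_range_m, n_samples=8192, noise_std=0.02, seed=0):
    """Simulate one receive channel: an attenuated, delayed pulse in white noise."""
    rng = np.random.default_rng(seed)
    # Round-trip time of flight converted to a whole number of samples.
    delay = int(round(2.0 * target_range_m / SPEED_OF_SOUND * SAMPLE_RATE_HZ))
    if delay + len(pulse) > n_samples:
        raise ValueError("target echo falls outside the receive buffer")
    rx = rng.normal(0.0, noise_std, n_samples)
    rx[delay:delay + len(pulse)] += 0.5 * pulse
    return rx


def estimate_range(rx, pulse):
    """Matched filter: correlate rx with the pulse and convert the peak lag to range."""
    corr = np.correlate(rx, pulse, mode="valid")
    delay = int(np.argmax(corr))  # lag (in samples) of the strongest match
    return delay / SAMPLE_RATE_HZ * SPEED_OF_SOUND / 2.0


if __name__ == "__main__":
    pulse = make_pulse()
    rx = simulate_echo(pulse, target_range_m=1.0)
    print(f"estimated range: {estimate_range(rx, pulse):.4f} m")
```

Under these assumptions the range resolution is bounded by the sample period (about 0.43 mm per sample for a round trip), and the correlation peak stays identifiable even when the echo is weak relative to the noise, which is the main reason for using a matched filter.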