Fig. 1: Overview: Accurate distance measurement to the closest obstacle and associated gradient cannot be expected from a noisy and sparse voxel grid representation built from the triangulated point cloud of a monocular visual-inertial SLAM (VINS) system. This in turn, poses a critical challenge for trajectory planning. As a potential solution, we propose UrbanFly: a pair of trajectory optimizers for safe navigation amongst high-rises in urban environments. Unlike active SLAM or feature-driven planning methods, our objective is not to tightly couple perception and planning. Instead, we intend to accommodate the capabilities and limitations of SLAM within trajectory optimization. To this end, we use the VINS back-end to construct a geometric representation of obstacles as cuboids and estimate the associated uncertainty in orientation and size. Our trajectory optimizers based on Cross-Entropy Method (CEM) and sequential convex programming achieve real-time performance by leveraging fast estimates of collision probability made possible by the geometrical representation of the obstacles. The figure also demonstrate a case where the baseline [1] that plans to compute trajectory fails whereas our method can successfully plan. Further details related to the figure is described in section III-A and VI-C