There have been increasing demands for developing micro aerial vehicles with vision-based autonomy for search and rescue missions in complex environments. In particular, the monocular visual-inertial system (VINS), which consists of only an inertial measurement unit (IMU) and a camera, forms a lightweight sensor suite due to its low weight and small footprint. In this paper, we address two challenges for rapid deployment of monocular VINS: 1) the initialization problem and 2) the calibration problem. We propose a methodology that initializes velocity, gravity, visual scale, and the camera-IMU extrinsic calibration on the fly. Our approach operates in natural environments and does not use any artificial markers. It also requires no prior knowledge about the mechanical configuration of the system. It is a significant step toward plug-and-play and highly customizable visual navigation for mobile robots. We show through online experiments that our method leads to accurate calibration of the camera-IMU transformation, with errors of less than 0.02 m in translation and 1° in rotation. We compare our method with a state-of-the-art marker-based offline calibration method and show superior results. We also demonstrate the performance of the proposed approach in large-scale indoor and outdoor experiments.

Note to Practitioners—This paper presents a methodology for online state estimation in natural environments using only a camera and a low-cost micro-electro-mechanical systems (MEMS) IMU. It focuses on addressing the problems of online estimator initialization, sensor extrinsic calibration, and nonlinear optimization with online refinement of calibration parameters. This paper is particularly useful for applications that have stringent size, weight, and power constraints. It aims for rapid deployment of robot platforms with robust state estimation capabilities with almost no setup, calibration, or initialization overhead. The proposed method can be used in platforms including handheld devices, aerial robots, and other small-scale mobile platforms, with applications in monitoring, inspection, and search and rescue.

This paper has supplementary downloadable multimedia material available at http://ieeexplore.ieee.org, provided by the authors. The supplementary material contains the following. Three experiments are presented in the video to demonstrate the performance of our self-calibrating monocular visual-inertial state estimation method. The first experiment details the camera-IMU extrinsic calibration process in a small indoor environment. The second experiment evaluates the performance of the overall system in a large-scale indoor environment, with highlights of the online calibration process. The third experiment presents the state estimation results in a large-scale outdoor environment using different camera configurations. This material is 52.6 MB in size.
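For readers checking the reported accuracy figures (less than 0.02 m in translation and 1° in rotation), calibration error between an estimated and a reference camera-IMU extrinsic transform is commonly measured as the Euclidean distance between translation vectors and the geodesic angle between rotation matrices. The sketch below is illustrative only, assuming NumPy; the function names and example values are ours, not the authors' code:

```python
import numpy as np

def extrinsic_errors(R_est, t_est, R_ref, t_ref):
    """Translation error (m) and rotation error (deg) between two
    camera-IMU extrinsics, each a 3x3 rotation and a 3-vector."""
    t_err = np.linalg.norm(t_est - t_ref)
    # Relative rotation; its angle is the geodesic distance on SO(3).
    R_rel = R_est.T @ R_ref
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    r_err_deg = np.degrees(np.arccos(cos_angle))
    return t_err, r_err_deg

# Hypothetical example: reference is identity; estimate is off by
# 1 cm in x and 0.5 degrees about z.
angle = np.radians(0.5)
R_est = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle),  np.cos(angle), 0.0],
                  [0.0,            0.0,           1.0]])
t_err, r_err = extrinsic_errors(R_est, np.array([0.01, 0.0, 0.0]),
                                np.eye(3), np.zeros(3))
print(t_err, r_err)  # ~0.01 m, ~0.5 deg
```

Both errors fall well inside the tolerances reported in the paper, so such an estimate would count as accurate by the authors' criteria.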