This paper studies the task of vision-based MAV-catching-MAV, where a catcher MAV (micro aerial vehicle) autonomously detects, localizes, and pursues a target MAV. Since it is challenging to develop detectors that can reliably detect unseen MAVs in complex environments, the main novelty of this paper is a real-to-sim-to-real approach that addresses this challenge. In this approach, images of real-world environments are first collected. These images are then used to construct a high-fidelity simulation environment, in which a deep-learning detector is trained. The merit of this approach is that it allows efficient automatic collection of large-scale, high-quality labeled datasets. More importantly, since the simulation environment is constructed from real-world images, this approach effectively bridges the sim-to-real gap, enabling efficient deployment in real environments. Another contribution of this paper is the successful implementation of a fully autonomous vision-based MAV-catching-MAV system that incorporates the proposed estimation and pursuit control algorithms. Whereas previous works mainly focused on individual aspects of such a system, we develop a completely autonomous system that integrates detection, estimation, and control algorithms on real-world robotic platforms.