Objective: Measuring the respiratory signal from video based on body motion has been proposed and has recently matured into products for contactless health monitoring. The core algorithm for this application measures the tiny chest/abdominal motions induced by respiration, i.e. it captures the sub-pixel displacement caused by subtle motion between subsequent video frames, and its fundamental challenge is motion sensitivity. Although prior art reported validation with real human subjects, there is no thorough or rigorous benchmark that quantifies the sensitivities and boundary conditions of the core motion-based respiratory algorithms.
Approach: A set-up with a fully controllable physical phantom was designed to investigate the core algorithms, together with a mathematical model incorporating two motion estimation strategies and three spatial representations, leading to six algorithmic combinations for respiratory signal extraction. Their merits and limitations are discussed and clarified through the phantom benchmark.
Main results: With the phantom motion intensity varied between 0.5 mm and 8 mm, the recommended approach achieves an average precision, recall, coverage and MAE of 88.1%, 91.8%, 95.5% and 2.1 bpm in the daylight condition, and 81.7%, 90.0%, 93.9% and 4.4 bpm in the night condition.
Significance: The insights gained in this paper are intended to improve the understanding and application of camera-based respiration measurement in health monitoring. The limitations of this study stem from the physical phantom used, which does not account for human factors such as body shape, sleeping posture and respiratory diseases, and from the investigated scenario, which focuses on sleep monitoring and does not include a sitting or standing patient as encountered in clinical wards and triage.
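For illustration only, the sketch below shows one possible algorithmic combination in the spirit described above, not the paper's exact method: OpenCV's Farneback dense optical flow is assumed as the motion estimator, a spatial mean of vertical displacement over a chest/abdominal region of interest as the spatial representation, and an FFT peak within a plausible breathing band as the rate estimate. The function names, ROI handling and band limits are illustrative assumptions.

```python
# Minimal sketch of motion-based respiratory signal extraction from video.
# Assumptions (not the paper's exact method): Farneback dense optical flow as
# the motion estimator, a mean over a chest ROI as the spatial representation,
# and an FFT peak in a typical breathing band for the rate.
import cv2
import numpy as np

def respiratory_signal(video_path, roi, fps=30.0):
    """Return a 1-D respiratory motion trace from the vertical flow in `roi`."""
    x, y, w, h = roi                      # chest/abdominal region of interest
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        raise IOError("cannot read video")
    prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)[y:y+h, x:x+w]
    trace = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)[y:y+h, x:x+w]
        # Dense optical flow between subsequent frames (sub-pixel displacements).
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        # Spatial representation: average vertical displacement over the ROI.
        trace.append(float(np.mean(flow[..., 1])))
        prev = gray
    cap.release()
    return np.array(trace)

def respiratory_rate_bpm(trace, fps=30.0):
    """Estimate the breathing rate (bpm) as the dominant FFT peak in 0.1-0.7 Hz."""
    trace = trace - np.mean(trace)
    freqs = np.fft.rfftfreq(len(trace), d=1.0 / fps)
    spectrum = np.abs(np.fft.rfft(trace))
    band = (freqs >= 0.1) & (freqs <= 0.7)   # roughly 6-42 breaths per minute
    return 60.0 * freqs[band][np.argmax(spectrum[band])]
```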