“…Consequently, it is not clear -despite excellent performance on standard benchmarks -how methods [59,25,37,26] generalize to in-the-wild images. To add variation, some methods resort to generating synthetic images [46,64,23] but it is complex to approximate fully realistic images with sufficient variance. Similar to model-based methods, learning approaches have benefited from the advent of robust 2D pose methods -by matching 2D detections to a 3D pose database [8,66], by regressing pose from 2D joint distance matrices [35], by exploiting pose and geometric priors for lifting [69,1,51,19,32,70,47]; or simply by training a feed forward network to directly predict 3D pose from 2D joints [30].…”