“…with $\Omega \subset \mathbb{R}^2$ the mask of the object to reconstruct, $I_i : \Omega \to \mathbb{R}$ the $i$-th input graylevel image, $\rho$ the reflectance (albedo) map, $n$ the normal map (which encodes the 3D geometry), and $s_i \in \mathbb{R}^3$ a vector representing the incident lighting in the $i$-th image (both its intensity and its direction). Most recent works on photometric stereo have focused on relaxing the assumptions of Lambertian reflectance (i.e., handling surfaces which exhibit a specular behavior) [4,21,11,29] and of calibrated directional lighting (i.e., handling unknown or non-uniform lighting) [5,10,13,22]; see for instance [23] for some discussion, and [3] for a state-of-the-art joint solution to both issues using deep neural networks. However, all of these recent works assume that the object to reconstruct has been segmented a priori: the whole pipeline relies on knowledge of the domain $\Omega$.…”
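The image formation model these symbols belong to is not reproduced in this excerpt. Assuming it is the classical Lambertian model with calibrated directional lighting, $I_i(x) = \rho(x)\, s_i^\top n(x)$ for $x \in \Omega$ (no shadows, no ambient term), the following is a minimal sketch of how $\rho$ and $n$ can be recovered pixel-wise by least squares once at least three non-coplanar $s_i$ are known. It is an illustration of the textbook technique, not the method of this paper. The function names `render`, `solve_pixel` and `photometric_stereo` are hypothetical, and the loop over the mask makes explicit that the per-pixel solve is only carried out on a known domain $\Omega$.

```python
import math

# Hypothetical helpers sketching classical calibrated Lambertian photometric
# stereo, ASSUMING I_i(x) = rho(x) * <s_i, n(x)> (no shadows, no ambient term).


def render(rho, n, lights):
    """Synthesize one graylevel value per light vector s_i for a single pixel."""
    return [rho * sum(a * b for a, b in zip(n, s)) for s in lights]


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve_pixel(intensities, lights):
    """Least-squares estimate of m = rho * n from S m = I, then split m into rho and n."""
    # Normal equations (S^T S) m = S^T I, where row i of S is s_i.
    ata = [[sum(s[r] * s[c] for s in lights) for c in range(3)] for r in range(3)]
    atb = [sum(s[r] * v for s, v in zip(lights, intensities)) for r in range(3)]
    d = det3(ata)
    if abs(d) < 1e-12:
        raise ValueError("need at least 3 non-coplanar light vectors")
    m = []
    for k in range(3):
        # Cramer's rule: replace column k of S^T S with the right-hand side.
        mk = [[atb[r] if c == k else ata[r][c] for c in range(3)] for r in range(3)]
        m.append(det3(mk) / d)
    rho = math.sqrt(sum(v * v for v in m))  # albedo is the norm of m
    if rho < 1e-12:
        return 0.0, (0.0, 0.0, 0.0)  # normal is undefined where the albedo vanishes
    return rho, tuple(v / rho for v in m)  # unit normal is the direction of m


def photometric_stereo(images, lights, mask):
    """images[i][y][x] holds I_i; mask[y][x] is True inside Omega.

    Returns albedo and normal dictionaries keyed by (y, x), for pixels in Omega only.
    """
    albedo, normals = {}, {}
    for y, row in enumerate(mask):
        for x, inside in enumerate(row):
            if not inside:
                continue  # pixels outside Omega are never processed
            values = [img[y][x] for img in images]
            albedo[(y, x)], normals[(y, x)] = solve_pixel(values, lights)
    return albedo, normals
```

With noise-free synthetic data generated by `render`, `solve_pixel` returns the albedo and unit normal used to generate it; with real images, shadows and specularities violate the assumed model, which is precisely what the works cited above try to handle.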