This paper concerns the problem of recovering an unknown but structured signal $x \in \mathbb{R}^n$ from $m$ quadratic measurements of the form $y_r = \langle a_r, x \rangle^2$ for $r = 1, 2, \ldots, m$. We focus on the under-determined setting where the number of measurements is significantly smaller than the dimension of the signal ($m \ll n$). We formulate the recovery problem as a nonconvex optimization problem in which prior structural information about the signal is enforced through constraints on the optimization variables. We prove that projected gradient descent, when initialized in a neighborhood of the desired signal, converges to the unknown signal at a linear rate. These results hold for any constraint set (convex or nonconvex), providing convergence guarantees to the global optimum even when the objective function and constraint set are nonconvex. Furthermore, these results hold with a number of measurements that is only a constant factor away from the minimal number of measurements required to uniquely identify the unknown signal. Our results provide the first provably tractable algorithm for this data-poor regime, breaking local sample complexity barriers that have emerged in recent literature. In a companion paper we demonstrate favorable properties of the optimization problem that may enable similar results to hold more globally (over the entire ambient space). Collectively, these two papers utilize and develop powerful tools for the uniform convergence of empirical processes that may have broader implications for the rigorous understanding of constrained nonconvex optimization heuristics. The mathematical results in this paper also pave the way for a new generation of data-driven phaseless imaging systems that can utilize prior information to significantly reduce acquisition time and enhance image reconstruction, enabling nano-scale imaging at unprecedented speeds and resolutions.

Definition 3.2 (Gaussian width) The Gaussian width of a set $C \subset \mathbb{R}^p$ is defined as
$$\omega(C) := \mathbb{E}\Big[\sup_{z \in C}\, \langle g, z \rangle\Big],$$
where the expectation is taken over $g \sim \mathcal{N}(0, I_p)$. Throughout, we use $B^n$ and $S^{n-1}$ to denote the unit ball and unit sphere of $\mathbb{R}^n$.

We now have all the definitions in place to quantify the capability of the function $R$ in capturing the properties of the unknown parameter $x$. This naturally leads us to the definition of the minimum required number of samples.

Definition 3.3 (minimal number of samples) Let $C_R(x)$ be a cone of descent of $R$ at $x$. We define the minimal sample function as
$$M(R, x) = \omega^2\big(C_R(x) \cap B^n\big).$$
We shall often use the shorthand $m_0 = M(R, x)$, with the dependence on $R$ and $x$ implied.
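To make Definitions 3.2 and 3.3 concrete, the following sketch estimates a Gaussian width by Monte Carlo. For illustration it uses a simple example set, the $k$-sparse vectors in the unit ball of $\mathbb{R}^n$; this is an assumption of the sketch, not the descent cone $C_R(x)$ of the paper, though it exhibits the same small-width behavior one expects for a sparsity-promoting regularizer at a $k$-sparse signal. For this set the supremum $\sup_{z \in C} \langle g, z \rangle$ has a closed form: the Euclidean norm of the $k$ largest-magnitude entries of $g$.

```python
import numpy as np

def gaussian_width_sparse(n, k, trials=2000, seed=0):
    """Monte Carlo estimate of the Gaussian width of the (example) set of
    k-sparse vectors in the unit ball of R^n. For this set the supremum
    sup_z <g, z> equals the norm of the k largest-magnitude entries of g."""
    rng = np.random.default_rng(seed)
    sups = np.empty(trials)
    for t in range(trials):
        g = rng.standard_normal(n)
        topk = np.sort(np.abs(g))[-k:]   # k largest |g_i|
        sups[t] = np.linalg.norm(topk)   # sup over the set for this draw of g
    return sups.mean()

n, k = 1000, 10
w = gaussian_width_sparse(n, k)
print(f"omega ~ {w:.2f}, omega^2 ~ {w**2:.1f}; compare 2k*log(n/k) = {2 * k * np.log(n / k):.1f}")
```

The printed estimate of $\omega^2$ is of the same order as the familiar $2k \log(n/k)$ scaling, illustrating how $m_0$ can be dramatically smaller than the ambient dimension $n$ for structured signals.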
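The projected gradient scheme discussed in the opening paragraph can likewise be sketched in a few lines. This is a minimal illustration, assuming a quartic least-squares loss $L(z) = \frac{1}{4m}\sum_r (\langle a_r, z \rangle^2 - y_r)^2$ in the style of Wirtinger-flow formulations; the paper's exact objective, step size rule, and initialization scheme are not reproduced here. Hard thresholding onto $k$-sparse vectors stands in for the projection onto a generic constraint set.

```python
import numpy as np

def project_sparse(z, k):
    """Stand-in projection onto the constraint set: keep the k
    largest-magnitude entries (hard thresholding onto k-sparse vectors)."""
    out = np.zeros_like(z)
    idx = np.argsort(np.abs(z))[-k:]
    out[idx] = z[idx]
    return out

def pgd_quadratic(A, y, z0, k, step=None, iters=500):
    """Projected gradient descent on the illustrative quartic loss
        L(z) = (1/4m) * sum_r (<a_r, z>^2 - y_r)^2.
    z0 is assumed to lie in a neighborhood of the true signal."""
    m, n = A.shape
    step = step if step is not None else 0.1 / np.mean(y)  # heuristic step size
    z = z0.copy()
    for _ in range(iters):
        Az = A @ z                                # inner products <a_r, z>
        grad = A.T @ ((Az**2 - y) * Az) / m       # gradient of L at z
        z = project_sparse(z - step * grad, k)    # gradient step, then projection
    return z

# Toy run: k-sparse signal, Gaussian measurements, warm start near x.
rng = np.random.default_rng(0)
n, m, k = 200, 120, 5
x = project_sparse(rng.standard_normal(n), k)
A = rng.standard_normal((m, n))
y = (A @ x)**2
z0 = x + 0.1 * rng.standard_normal(n)             # initialization near x
z = pgd_quadratic(A, y, z0, k)
err = min(np.linalg.norm(z - x), np.linalg.norm(z + x))  # global sign ambiguity
print(f"relative error: {err / np.linalg.norm(x):.2e}")
```

Note that quadratic measurements determine $x$ only up to a global sign, so the error is measured against $\pm x$; the warm start $z_0$ mimics the "initialized in a neighborhood of the desired signal" assumption under which the linear convergence guarantee is stated.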