In the negative perceptron problem we are given $n$ data points $(x_i, y_i)$, where $x_i$ is a $d$-dimensional vector and $y_i \in \{+1, -1\}$ is a binary label. The data are not linearly separable and hence we content ourselves with finding a linear classifier with the largest possible negative margin. In other words, we want to find a unit-norm vector $\theta$ that maximizes $\min_{i \le n} y_i \langle \theta, x_i \rangle$. This is a non-convex optimization problem (it is equivalent to finding a maximum-norm vector in a polytope), and we study its typical properties under two random models for the data. We consider the proportional asymptotics in which $n, d \to \infty$ with $n/d \to \delta$, and prove upper and lower bounds on the maximum margin $\kappa_{\mathrm{s}}(\delta)$ or, equivalently, on its inverse function $\delta_{\mathrm{s}}(\kappa)$. In other words, $\delta_{\mathrm{s}}(\kappa)$ is the overparametrization threshold: for $n/d \le \delta_{\mathrm{s}}(\kappa) - \varepsilon$ a classifier achieving vanishing training error exists with high probability, while for $n/d \ge \delta_{\mathrm{s}}(\kappa) + \varepsilon$ it does not. Our bounds on $\delta_{\mathrm{s}}(\kappa)$ match to the leading order as $\kappa \to -\infty$. We then analyze a linear programming algorithm to find a solution, and characterize the corresponding threshold $\delta_{\mathrm{lin}}(\kappa)$. We observe a gap between the interpolation threshold $\delta_{\mathrm{s}}(\kappa)$ and the linear programming threshold $\delta_{\mathrm{lin}}(\kappa)$, raising the question of the behavior of other algorithms.
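As an illustration, the sketch below sets up one plausible linear-programming heuristic for this problem; it is not necessarily the algorithm analyzed here. It fixes a target margin $\kappa < 0$, maximizes a random linear functional of $\theta$ over the polytope $\{\theta : y_i \langle x_i, \theta \rangle \ge \kappa \text{ for all } i,\ \|\theta\|_\infty \le 1\}$ (the box constraint keeps the program bounded, and pushing $\|\theta\|$ large improves the normalized margin), and then projects the solution back to the unit sphere. The data model, scaling, and choice of objective direction are assumptions made for the illustration.

```python
# Minimal sketch of an LP heuristic for the negative perceptron (assumed setup,
# not necessarily the linear programming algorithm analyzed in the paper).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
d, delta, kappa = 200, 3.0, -0.5               # dimension, ratio n/d, target margin (assumed values)
n = int(delta * d)

X = rng.standard_normal((n, d)) / np.sqrt(d)   # isotropic Gaussian features, scaled so margins are O(1)
y = rng.choice([-1.0, 1.0], size=n)            # random labels: data are not linearly separable

v = rng.standard_normal(d)                     # random objective direction (pushes ||theta|| large)
c = -v                                         # linprog minimizes, so negate to maximize <v, theta>
A_ub = -(y[:, None] * X)                       # -y_i <x_i, theta> <= -kappa  <=>  y_i <x_i, theta> >= kappa
b_ub = -kappa * np.ones(n)
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(-1.0, 1.0)] * d,        # ell_infinity box so the LP is bounded
              method="highs")

theta = res.x / np.linalg.norm(res.x)          # project the LP solution back to the unit sphere
print("achieved margin:", np.min(y * (X @ theta)))
```

Since $\kappa < 0$, dividing by $\|\theta\|_2 > 1$ can only improve the margin, so maximizing a linear functional over the polytope serves as a tractable surrogate for the non-convex maximum-norm problem; the achieved margin printed at the end can then be compared against the target $\kappa$.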