The inevitable accumulation of errors in near-future quantum devices represents a key obstacle in delivering practical quantum advantage. This has motivated the development of various quantum error-mitigation protocols, each representing a method to extract useful computational output by combining measurement data from multiple samplings of the available imperfect quantum device. What are the ultimate performance limits universally imposed on such protocols? Here, we derive a fundamental bound on the sampling overhead that applies to a general class of error-mitigation protocols, assuming only the laws of quantum mechanics. We use it to show that (1) the sampling overhead required to mitigate local depolarizing noise in layered circuits, such as those used for variational quantum algorithms, must scale exponentially with circuit depth, and (2) the probabilistic error cancellation method is optimal among all strategies for mitigating a certain class of noise. We discuss how our unified framework and general bounds can be employed to benchmark and compare various present methods of error mitigation and to identify situations where present error-mitigation methods have the greatest potential for improvement.