[1] Many methods have been developed over the years to perform the frequency analysis (FA) of extreme environmental variables. Although numerous comparisons between these methods have been carried out, no general comparison framework has been agreed upon so far. The objective of this paper is to build the foundation of a data-based comparison framework, which aims at complementing more standard comparison schemes based on Monte Carlo simulations or statistical testing. This framework is based on the following general principles: (i) emphasis is put on the predictive ability of competing FA implementations, rather than solely on their descriptive ability as measured by some goodness-of-fit criterion; (ii) predictive ability is quantified by means of reliability indices, which describe the consistency between FA predictions and validation data (i.e., data not used for calibration); (iii) stability is also quantified, i.e., the ability of an FA implementation to yield similar estimates when the calibration data change; and (iv) the necessity to subject uncertainty estimates to the same scrutiny as point estimates is recognized, and a practical approach based on the use of the predictive distribution is proposed for this purpose. This framework is then applied to a case study involving 364 gauging stations in France, where 10 FA implementations are compared. These implementations correspond to the local, regional, and local-regional estimation of Gumbel and generalized extreme value distributions. Results show that reliability and stability indices are able to reveal marked differences between FA implementations. Moreover, the case study confirms that using the predictive distribution to indirectly scrutinize uncertainty estimates is a viable approach, with distinct FA implementations showing marked differences in the reliability of their uncertainty estimates. The proposed comparison framework therefore constitutes a valuable tool to compare the predictive reliability of competing FA implementations, along with the reliability of their uncertainty estimates.
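As a rough illustration of principles (i) to (iii), the following Python sketch runs a split-sample check for a single station. It is a minimal sketch under stated assumptions, not the implementation used in this paper: the Gumbel distribution is fitted by the method of moments, the reliability index is an illustrative measure of how close the probability integral transform (PIT) values of the validation data are to a uniform distribution, and stability is approximated by the relative difference between two quantile estimates. All function names and the synthetic record are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(sample):
    """Method-of-moments Gumbel fit (an assumption); returns (loc, scale)."""
    scale = statistics.stdev(sample) * math.sqrt(6) / math.pi
    loc = statistics.mean(sample) - EULER_GAMMA * scale
    return loc, scale


def gumbel_cdf(x, loc, scale):
    """Gumbel cumulative distribution function."""
    return math.exp(-math.exp(-(x - loc) / scale))


def gumbel_quantile(p, loc, scale):
    """Gumbel quantile for non-exceedance probability p in (0, 1)."""
    return loc - scale * math.log(-math.log(p))


def pit_values(validation, loc, scale):
    """Sorted PIT values of validation data under the fitted distribution."""
    return sorted(gumbel_cdf(x, loc, scale) for x in validation)


def reliability_index(pit):
    """Illustrative index in [0, 1]; 1 means the sorted PIT values match
    the uniform plotting positions (i + 1) / (n + 1) exactly."""
    n = len(pit)
    mean_dev = sum(abs(p - (i + 1) / (n + 1)) for i, p in enumerate(pit)) / n
    return 1.0 - 2.0 * mean_dev


def relative_difference(est_a, est_b):
    """Illustrative stability measure: 0 means identical estimates."""
    return abs(est_a - est_b) / (0.5 * (abs(est_a) + abs(est_b)))


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical record: 60 annual maxima drawn from Gumbel(loc=100, scale=30)
    record = [gumbel_quantile(rng.uniform(1e-9, 1 - 1e-9), 100.0, 30.0)
              for _ in range(60)]
    calib, valid = record[:30], record[30:]

    # Reliability: calibrate on one half, check predictions on the other half
    loc, scale = fit_gumbel_moments(calib)
    print("reliability:", reliability_index(pit_values(valid, loc, scale)))

    # Stability: compare 100-year quantiles estimated from each half
    q_a = gumbel_quantile(0.99, *fit_gumbel_moments(calib))
    q_b = gumbel_quantile(0.99, *fit_gumbel_moments(valid))
    print("stability (relative difference):", relative_difference(q_a, q_b))
```

In this sketch only point predictions from the fitted distribution are checked. Scrutinizing uncertainty estimates, as in principle (iv), would amount to replacing `gumbel_cdf` with the cumulative distribution function of the predictive distribution when computing the PIT values.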