The selection of paleointensity data is a challenging but essential step for establishing data reliability. There is, however, no consensus on how best to quantify paleointensity data or on which data selection processes are most effective. To address these issues, we begin to lay the foundations for a more unified and theoretically justified approach to the selection of paleointensity data. We present a new compilation of standard definitions for paleointensity statistics to help remove ambiguities in their calculation. We also compile the largest data set to date of raw paleointensity data from historical locations and laboratory control experiments, with which we test the effectiveness of commonly used sets of selection criteria. Although most currently used criteria are capable of increasing the proportion of accurate results accepted, criteria that are better at excluding inaccurate results tend to perform poorly at including accurate results, and vice versa. In the extreme case, one widely used set of criteria, the default in the ThellierTool software (v4.22), excludes so many accurate results that it is often statistically indistinguishable from randomly selecting data. We demonstrate that, when modified according to recent single-domain paleointensity predictions, criteria sets that are no better than a random selector can produce statistically significant increases in the acceptance of accurate results and thus become effective selection criteria. The use of such theoretically derived modifications places the selection of paleointensity data on a more justifiable theoretical foundation, and we encourage the use of the modified criteria over their original forms.
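
The comparison with a random selector can be illustrated with a simple hypergeometric test. The following is a minimal sketch under stated assumptions, not necessarily the statistical procedure used in this study: it assumes each result has already been classed as accurate or inaccurate, and it asks how likely a random draw of the same size would be to accept at least as many accurate results as the criteria set did. The function name and all counts are hypothetical and chosen only for illustration.

```python
from math import comb


def random_selector_p_value(total, accurate, accepted, accepted_accurate):
    """One-sided hypergeometric tail probability.

    Returns P(X >= accepted_accurate), where X is the number of accurate
    results obtained when `accepted` results are drawn at random, without
    replacement, from `total` results of which `accurate` are accurate.
    A large value means the criteria set did no better than random selection.
    """
    if not (0 <= accurate <= total and 0 <= accepted <= total):
        raise ValueError("accurate and accepted must lie between 0 and total")
    upper = min(accurate, accepted)
    if not (0 <= accepted_accurate <= upper):
        raise ValueError("accepted_accurate is inconsistent with the other counts")

    # Number of ways to draw `accepted` results from the full data set.
    denominator = comb(total, accepted)
    # Sum the ways to draw k accurate and (accepted - k) inaccurate results
    # for every k at least as large as the observed count.
    tail = sum(
        comb(accurate, k) * comb(total - accurate, accepted - k)
        for k in range(accepted_accurate, upper + 1)
    )
    return tail / denominator


if __name__ == "__main__":
    # Hypothetical counts: 200 results, 80 of them accurate; a criteria set
    # accepts 50 results, 20 of which are accurate (the same proportion as
    # in the full data set).
    p = random_selector_p_value(total=200, accurate=80, accepted=50, accepted_accurate=20)
    print(f"p-value against a random selector: {p:.3f}")
```

In this sketch, a p-value above a chosen significance level (0.05 is a common but arbitrary choice) would indicate that the criteria set cannot be distinguished from random selection for that data set.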