NIST SP800-22 is one of the most widely used statistical testing tools for pseudorandom number generators (PRNGs). This tool consists of 15 tests (onelevel tests) and two additional tests (two-level tests). Each one-level test provides one or more p-values. The two-level tests measure the uniformity of the obtained p-values for a fixed one-level test. One of the two-level tests categorizes the p-values into ten intervals of equal length, and apply a chi-squared goodness-of-fit test. This two-level test is often more powerful than one-level tests, but sometimes it rejects even good PRNGs when the sample size at the second level is too large, since it detects approximation errors in the computation of p-values.In this paper, we propose a practical upper limit of the sample size in this two-level test, for each of six tests appeared in SP800-22. These upper limits are derived by the chi-squared discrepancy between the distribution of the approximated p-values and the uniform distribution U (0, 1). We also computed a "risky" sample size at the second level for each one-level test. Our experiments show that the twolevel test with the proposed upper limit gives appropriate results, while using the risky size often rejects even good PRNGs.We also propose another improvement: to use the exact probability for the ten categories in the computation of goodness-of-fit at the two-level test. This allows us to increase the sample size at the second level, and would make the test more sensitive than the NIST's recommending usage.