Calibration and reproducibility of quantitative 18F-FDG PET measures are essential for adopting integral 18F-FDG PET/CT biomarkers and response measures in multicenter clinical trials. We implemented a multicenter qualification process using National Institute of Standards and Technology-traceable reference sources for scanners and dose calibrators, and similar patient and imaging protocols. We then assessed SUV in patient test-retest studies. Methods: Five 18F-FDG PET/CT scanners from 4 institutions (2 in a National Cancer Institute-designated Comprehensive Cancer Center, 3 in a community-based network) were qualified for study use. Patients were scanned twice within 15 d, on the same scanner (n = 10); different scanners of the same model within an institution (n = 2); or different model scanners at different institutions (n = 11). SUVmax was recorded for lesions, and SUVmean for normal liver uptake. Linear mixed models with random intercept were fitted to evaluate test-retest differences in multiple lesions per patient and to estimate the concordance correlation coefficient. Bland-Altman plots and repeatability coefficients were also produced. Results: In total, 162 lesions (82 bone, 80 soft tissue) were assessed in patients with breast cancer (n = 17) or other cancers (n = 6). Repeat scans within the same institution, using the same scanner or 2 scanners of the same model, had an average difference in SUVmax of 8% (95% confidence interval, 6%-10%). For test-retest on different scanners at different sites, the average difference in lesion SUVmax was 18% (95% confidence interval, 13%-24%). Normal liver uptake (SUVmean) showed an average difference of 5% (95% confidence interval, 3%-10%) for the same scanner model or institution and 6% (95% confidence interval, 3%-11%) for different scanners from different institutions. Protocol adherence was good; the median difference in injection-to-acquisition time was 2 min (range, 0-11 min).
Test-retest SUVmax variability was not explained by available information on protocol deviations or patient or lesion characteristics. Conclusion: 18F-FDG PET/CT scanner qualification and calibration can yield highly reproducible test-retest tumor SUV measurements. Our data support use of different qualified scanners of the same model for serial studies. Test-retest differences from different scanner models were greater; more resolution-dependent harmonization of scanner protocols and reconstruction algorithms may be capable of reducing these differences to values closer to same-scanner results.

Downloaded from jnm.snmjournals.org on August 3, 2020. For personal use only.

FIGURE 4. Bland-Altman plot of liver SUVmean (n = 23). Light-green circles = same scanner; dark-green circles = different scanners from same site; gray circles = different scanner models from different sites; dashed lines = average difference and 95% limits of agreement.
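The Bland-Altman quantities named above (average difference and 95% limits of agreement) and a repeatability coefficient can be illustrated with a minimal Python sketch. It assumes one pair of measurements per patient and works on absolute SUV differences; it does not reproduce the linear mixed models used in the study for multiple lesions per patient. The function names and the SUV values are hypothetical and chosen only for illustration; they are not study data.

```python
import math
import statistics


def bland_altman(test, retest):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of the paired differences (retest - test); the 95%
    limits of agreement are bias +/- 1.96 * SD of the differences.
    """
    diffs = [b - a for a, b in zip(test, retest)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def repeatability_coefficient(test, retest):
    """Return RC = 1.96 * sqrt(2) * within-subject SD.

    With two measurements per subject, the within-subject variance is
    estimated as mean(d^2) / 2, where d is the paired difference.
    """
    within_var = statistics.mean([(b - a) ** 2 for a, b in zip(test, retest)]) / 2
    return 1.96 * math.sqrt(2 * within_var)


# Hypothetical liver SUVmean values for five patients (NOT study data).
test_suv = [1.9, 2.1, 2.4, 2.0, 2.2]
retest_suv = [2.0, 2.0, 2.5, 2.1, 2.1]

bias, lower, upper = bland_altman(test_suv, retest_suv)
rc = repeatability_coefficient(test_suv, retest_suv)
print(f"bias={bias:.3f}, limits of agreement=({lower:.3f}, {upper:.3f}), RC={rc:.3f}")
```

In a Bland-Altman plot, each point is the paired difference plotted against the pair mean, with the three values returned by `bland_altman` drawn as horizontal dashed lines. To work with percentage differences, as reported in the abstract, one option is to divide each difference by the pair mean before computing the statistics.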