The numeric rating scale (NRS) is used to assess postoperative pain intensity in adults. This scale has shown acceptable psychometric features, although its scale properties need further examination. We aimed to evaluate the scale properties of the NRS using an item response theory (IRT) approach. Data from an international postoperative pain registry (QUIPS) were analyzed retrospectively. Overall, 346,892 adult patients (age groups: 18-20 years: 1.6%, 21-30 years: 6.7%, 31-40 years: 8.3%, 41-50 years: 13.2%, 51-60 years: 17.1%, 61-70 years: 17.3%, 71-80 years: 16.4%, 81-90 years: 3.9%, >90 years: 0.2%) were included. Among the patients, 55.7% were female and 38% had preoperative pain. Three pain items (movement pain, worst pain, least pain) were analyzed using 4 different IRT models: the partial credit model (PCM), generalized partial credit model (GPCM), rating scale model (RSM), and graded response model (GRM). Fit indices were compared to identify the best-fitting model (lower fit indices indicate better model fit). Subgroup analyses were performed by sex and age group. After the highest and second-highest response categories were collapsed, the GRM outperformed the other models (lowest Bayesian information criterion) in all subgroups. Category boundary curves showed overlapping categories for worst pain and least pain, particularly at higher pain ratings. Response category widths differed depending on pain intensity. Similar results were obtained for female and male patients and across age groups. Response categories on the NRS are ordered but have different widths. The interval scale properties of the NRS should therefore be questioned. IRT methods may be helpful in dealing with the lack of linearity in pain intensity ratings obtained with the NRS.
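
To illustrate what "ordered categories with different widths" means under a GRM, the following minimal sketch computes category probabilities and category widths for a single item. It is not the analysis code used in this study. The discrimination `a`, the threshold values, and the function names are hypothetical and chosen only for illustration, and the sketch assumes an 11-point NRS (0 to 10) whose two highest categories have been collapsed, leaving 10 categories and 9 thresholds.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    theta: latent pain intensity; a: item discrimination;
    thresholds: ordered category boundaries b_1 < ... < b_{K-1}.
    """
    # Cumulative probabilities P(X >= k | theta) for k = 1..K-1
    cum = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    # Pad with P(X >= 0) = 1 and P(X >= K) = 0, then take adjacent differences
    bounds = [1.0] + cum + [0.0]
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]

def category_widths(thresholds):
    """Distance on the latent scale between adjacent category boundaries."""
    return [hi - lo for lo, hi in zip(thresholds, thresholds[1:])]

# Hypothetical parameters (NOT estimates from the registry data):
# 9 ordered thresholds for 10 categories after collapsing the top two.
a = 1.5
thresholds = [-2.0, -1.0, -0.5, 0.0, 0.4, 0.9, 1.3, 1.6, 1.8]

probs = grm_category_probs(0.0, a, thresholds)
widths = category_widths(thresholds)

for k, p in enumerate(probs):
    print(f"category {k}: P = {p:.3f}")
print("widths:", [round(w, 2) for w in widths])
```

On an interval scale the widths would be roughly equal. In this hypothetical parameter set they shrink toward the upper end of the scale, so a one-point change in rating corresponds to different amounts of change in latent pain intensity depending on where on the scale it occurs.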