This study presents the development and empirical validation of score levels and descriptors designed specifically for score reporting, so that test takers receive more than a number on a score scale. For a test intended primarily for 11- to 15-year-old students learning English as a second/foreign language, the study examined three questions: how many band levels could be meaningfully distinguished, how reliably students could be classified into those band levels, and how to develop overall performance descriptors that give score users meaningful information. The analyses used performance data from 2,931 students who took the test. The band level solution was chosen by balancing the reliability of classification decisions against the goal of having each level represent a meaningful difference in performance. To construct meaningful descriptors for the band levels, multiple sources of information were examined, including the scoring rubrics, the characteristics of test items, typical student performance profiles, and the performance of norm groups on the test. The paper discusses the importance of establishing the psychometric quality of band levels and an empirical basis for performance descriptors, as well as the implications for similar efforts.