This study examined the validity and reliability of a newly developed multiple-choice evaluation system for measuring students’ higher-order thinking skills (HOTS). The instrument consisted of 45 multiple-choice items and was developed based on the cognitive domain of Bloom’s Taxonomy. A quantitative method was used, comprising three phases: content validity assessed by inter-rater agreement, construct validity assessed by principal component analysis (PCA), and reliability estimated by Cronbach’s alpha. The inter-rater agreement indicated that the instrument was categorized as valid. The PCA indicated that the items in the evaluation instrument measured a single dimension (unidimensionality), which supports its use as an evaluation instrument. Reliability was high, with a Cronbach’s alpha of 0.94. The study thus produced a valid and reliable HOTS multiple-choice evaluation instrument that is ready to be tested on a small sample to examine its empirical quality.
Keywords: validity, reliability, multiple-choice, evaluation system
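
For readers unfamiliar with the reliability statistic reported above, the following is a minimal sketch of how Cronbach’s alpha can be computed from a respondents-by-items score table. It is an illustration only and not the authors’ analysis procedure: the function name `cronbach_alpha` and the small `demo` response matrix are hypothetical, and the data are made up rather than taken from the study.

```python
from statistics import variance  # sample variance (n - 1 denominator)


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (respondents) by columns (items).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    n_items = len(scores[0])
    # Variance of each item (column) across respondents.
    item_variances = [variance([row[j] for row in scores]) for j in range(n_items)]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 respondents, 4 items scored 1 (correct) or 0 (incorrect).
demo = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(demo)
print(round(alpha, 2))
```

For this illustrative matrix the item variances sum to 1.0 and the total-score variance is 2.5, so alpha works out to 0.8. The same function would apply unchanged to a 45-item response table.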