This study applies machine learning (ML) algorithms to investigate the associations among the length of test items written in Chinese (measured by word count), item difficulty, and students' item perceptions (IPs) in science term examinations. For Research Question 1 (RQ1), items administered to grade 7 students (aged 12-13) in a Taiwanese secondary school from 2014 to 2019 were analyzed. For RQ2, the study included 4,916 students from the same population. For RQ3, perceptions were gathered from 48 students of the same school in 2020. The results showed that, first, the average word count of the 611 items was 88.81, with an average stem word count of 41.16, an average options word count of 47.66, and a stem-to-options word count ratio (S-O ratio) of 1.27. Second, the ML M5P model tree algorithm confirmed the predictive power of item length, indicating that length is a key factor in determining item difficulty. The algorithm partitioned the items into three length categories (<57.5 words, 57.5-91.5 words, and >91.5 words) and generated three corresponding linear prediction models of item difficulty (LM1, LM2, and LM3). These models showed that item difficulty increases as item length increases. Third, in the prediction analysis involving students' IPs, the J48 algorithm produced the better prediction results, and its model could be converted into understandable rules. IP was the root node of the decision tree, indicating the importance of this variable. Accordingly, students were more likely to answer an item correctly when 1) the item was perceived as easy or normal, 2) the students had high or ordinary learning achievement in science, and 3) the item contained fewer than 71 words. These results can serve as a reference for educators, examiners, and researchers in practical science term examination design, and can guide future research on applying machine learning to analyze item difficulty in science assessments.
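As an illustration of how such tree-based analyses could be reproduced with the Weka toolkit, which provides the M5P and J48 implementations named above, a minimal sketch follows. The file names (items.arff, responses.arff), the attribute layout, and the use of 10-fold cross-validation are assumptions for illustration only, not details reported by the study.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.classifiers.trees.M5P;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ItemDifficultySketch {
    public static void main(String[] args) throws Exception {
        // Assumed item-level dataset: word counts, stem/options counts, S-O ratio,
        // with a numeric difficulty index as the last (class) attribute.
        Instances items = DataSource.read("items.arff");
        items.setClassIndex(items.numAttributes() - 1);

        // M5P builds a model tree: it splits items into ranges of the predictor
        // attributes and fits a linear model (LM1, LM2, ...) within each leaf.
        M5P m5p = new M5P();
        m5p.buildClassifier(items);
        System.out.println(m5p);

        Evaluation m5pEval = new Evaluation(items);
        m5pEval.crossValidateModel(m5p, items, 10, new Random(1));
        System.out.println(m5pEval.toSummaryString());

        // Assumed student-item dataset: IP, learning achievement, and item word
        // count as predictors, with a nominal correct/incorrect class attribute.
        Instances responses = DataSource.read("responses.arff");
        responses.setClassIndex(responses.numAttributes() - 1);

        // J48 builds a C4.5 decision tree whose root node is the most informative
        // attribute and whose branches can be read as if-then rules.
        J48 j48 = new J48();
        j48.buildClassifier(responses);
        System.out.println(j48);

        Evaluation j48Eval = new Evaluation(responses);
        j48Eval.crossValidateModel(j48, responses, 10, new Random(1));
        System.out.println(j48Eval.toSummaryString());
    }
}
```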