Surveys are the typical instrument for student evaluation of teaching (SET). Survey research consistently documents the negative impact of careless responding on research validity, including degraded data quality and invalid inferences. Yet the SET literature seldom addresses whether careless responses are present or how to deal with them. To improve evaluation practices and validity, the current study proposed a three-step procedure: screen SET data to quantify careless responses, justify and carry out the removal of careless responses, and assess whether removal improved the internal structure of the SET data. For these purposes, a convenience sample was drawn from a Chinese university, and a web-based survey was administered using a revised version of the Students’ Evaluation of Educational Quality. One hundred ninety-nine students evaluated 11 courses, yielding 295 responses. Longstring and Rasch outlier analyses identified 49% of the responses as careless, comprising both nonrandom and random careless responding. These careless responses substantively affected the evaluation results and were deleted. Subsequent analysis demonstrated that their removal improved data validity, as indicated by reliability, separation, and inter-rater agreement from the multi-facet Rasch model, and by G- and D-coefficients, signal-to-noise ratios, and error variance from generalizability theory. Removing careless responses improved data validity in terms of true-score variance and discriminating power. Based on these results, data screening should be a prerequisite to validating SET data; data removal is warranted only if it produces a noticeable change in the estimated teaching abilities. Suggestions and implications are discussed, including developing sound evaluation practices and the formative use of SET.
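
To make the screening step concrete, the sketch below illustrates one common way to compute a longstring index (the longest run of identical consecutive answers in a response), which is often used to flag nonrandom careless responding; the item scores, cutoff, and function name are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def longstring(responses):
    """Return the length of the longest run of identical consecutive
    answers in a single survey response (a sequence of item scores)."""
    longest, current = 1, 1
    for prev, curr in zip(responses, responses[1:]):
        current = current + 1 if curr == prev else 1
        longest = max(longest, current)
    return longest

# Illustrative use: flag responses whose longest identical run covers
# most of the items (this cutoff is an assumption, not the study's rule).
data = np.array([
    [5, 5, 5, 5, 5, 5, 5, 5],   # straightlining: likely nonrandom careless
    [4, 3, 5, 4, 2, 4, 3, 5],   # plausible attentive response
])
cutoff = 6
flags = [longstring(row) >= cutoff for row in data]
print(flags)  # [True, False]
```

A longstring screen of this kind only targets nonrandom (straightlining) carelessness; random careless responding would instead be flagged by person-fit or outlier statistics such as those from the Rasch analysis described above.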