Wheel damage in high-speed trains has received considerable attention because it compromises safety and reliability and incurs substantial maintenance costs. However, the mixed operation mode of electric multiple units (EMUs) in China makes it difficult to separate the influence of individual railway lines on wheel damage. This paper develops a data-driven approach to quantify the influence of different railway lines on wheel damage and applies it to damage risk prediction. First, under several fundamental assumptions, linear discriminant analysis and a proposed data padding strategy are used to quantify the influence on wheel damage per unit mileage run on each railway section. The results show that this influence varies significantly across railway sections. Then, a route-based predictor is defined to represent the cumulative influence of an EMU’s routes on its wheel damage. Further, a wheel damage prediction model is developed based on a support vector machine, using the route-based predictor and three other predictors derived from the operational data. Validation and test results demonstrate that wheelsets predicted to have higher damage probabilities exhibit higher actual damage rates. The proposed methodology has the potential to be applied to the risk-based operation and maintenance of high-speed trains.
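As a rough illustration of the route-based predictor, the sketch below assumes that the cumulative influence can be written as a mileage-weighted sum of per-section influence coefficients. That form is an assumption made here for illustration and is not stated in the abstract. The function name, the section labels, and all numeric values are hypothetical.

```python
from typing import Dict


def route_based_predictor(
    influence_per_km: Dict[str, float],
    mileage_km: Dict[str, float],
) -> float:
    """Return a cumulative route score for one EMU.

    Assumed form (not taken from the paper): the sum over railway sections
    of (influence per unit mileage) * (mileage run on that section).
    """
    total = 0.0
    for section, km in mileage_km.items():
        if section not in influence_per_km:
            # Fail loudly rather than silently treating an unknown section as harmless.
            raise KeyError(f"no influence coefficient for section {section!r}")
        total += influence_per_km[section] * km
    return total


# Hypothetical per-section coefficients (e.g. as might be estimated by a
# discriminant analysis) and hypothetical mileage records for one EMU.
influence_per_km = {"A-B": 2.0, "B-C": 0.5, "C-D": 1.0}
emu_mileage_km = {"A-B": 100.0, "B-C": 400.0}

score = route_based_predictor(influence_per_km, emu_mileage_km)
print(score)
```

Under this assumed form, a score of this kind could be supplied as one input feature, alongside the other operational predictors, to a classifier such as a support vector machine.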