Data on infrastructure project costs are often unstructured and lack consistency. To enable costs to be compared within and between organizations, large amounts of data must be classified to a common standard, typically a manual process. This is time‐consuming, error‐prone, inconsistent, and subjective, as it is based on human judgment. This paper describes a novel approach for automating the process by harnessing natural language processing identifying the relevant keywords in the text descriptions and implementing machine learning classifiers to emulate the expert's knowledge. The task was to identify “extra over” cost items, conversion factors, and to recognize the correct work breakdown structure (WBS) category. The results show that 94% of the “extra over” cases were correctly classified, and 90% of cases that needed conversion, correctly predicting an associated conversion factor with 87% accuracy. Finally, the WBS categories were identified with 72% accuracy. The approach has the potential to provide a step change in the speed and accuracy of structuring and classifying infrastructure cost data for benchmarking.