Effective cost estimation for tendering plays a critical role in the building construction process, enabling efficient investment management and ensuring successful execution of the construction phase. The traditional cost estimation procedure relies on manual information processing to extract and match technical data from the textual descriptions of construction resources. This activity requires deep practitioner experience and considerable manual effort, and it often results in errors and, in the worst case, legal disputes.
In response to the increasing demand for structured information and automated processes, this study addresses the need for Public Administrations to achieve better control over the data contained in the public tendering documents provided to practitioners. To this end, a framework is proposed to automatically retrieve information from these documents, serving as a support tool to map items within the documents, highlight missing data, and flag critical semantic ambiguities.
The framework is designed to automatically identify similarities between work items and their corresponding elementary resource items in Price List tendering documents. The methodology relies on an NLP information retrieval technique, cosine similarity over TF-IDF representations, to support and facilitate practitioners' activities. Finally, the framework was tested on four case studies extracted from the price list documents of the Italian Lombardy Region, showing that the resulting support tool can automate the analysis process and efficiently reveal inconsistencies. The model extracted and correctly matched the elementary resource to the corresponding work query in 75% of the cases where the elementary resource was present in the list. Additionally, the model proved to be a valuable tool in helping practitioners identify missing resources.
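The matching step can be illustrated with a minimal Python sketch of TF-IDF cosine similarity. It is not the implementation used in the study: the tokenizer, the smoothed IDF variant, the `threshold` value, the function names, and the sample item descriptions are all assumptions introduced here for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase alphanumeric runs (e.g. "C25/30" -> "c25", "30").
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    # Build one sparse TF-IDF vector (term -> weight) per document.
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption; the study does not state which variant it uses).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_resources(work_item, resources, threshold=0.1):
    # Rank elementary resources by similarity to a work item description.
    # An empty result hints that the resource may be missing from the list.
    vectors = tfidf_vectors([work_item] + resources)
    query, candidates = vectors[0], vectors[1:]
    scored = [(cosine(query, vec), res) for vec, res in zip(candidates, resources)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(res, score) for score, res in scored if score >= threshold]


# Hypothetical descriptions, not taken from the Lombardy price list.
resources = [
    "Concrete C25/30 ready mixed",
    "Steel reinforcing bars B450C",
    "Ceramic floor tiles 30x30",
]
ranked = rank_resources(
    "Supply and laying of reinforced concrete C25/30 for foundations", resources
)
unmatched = rank_resources("Timber roof truss", resources)
```

In this sketch, a work item whose best score stays below the threshold returns an empty list, which is one simple way to surface a potentially missing resource for a practitioner to review. The threshold would need to be tuned on real price list data.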