In the European Seveso legislation on the control of major-accident hazards (Directive 2012/18/EU), the Safety Management System (SMS) is an essential obligation for site operators, and the authorities are required to verify its adequacy through periodic inspections at Seveso sites. One of the pillars of the SMS is the collection and analysis of documents on accidents, near misses, and possible anomalies, in order to identify weaknesses and drive continuous improvement. In Italy, for the past few years, the documents gathered by inspectors from all Italian Seveso sites have been archived and used for research purposes. The archive currently contains some 4000 reports, collected over 5 years by some 100 inspectors throughout Italy. This paper discusses in detail the challenges faced in extracting the knowledge hidden in these documents and making it usable through the design of a robust model. To this end, machine learning techniques were used to preprocess the reports and extract the concepts and their relations, which were then organized into an entity-relation model. The effectiveness and potential of this methodology are demonstrated by investigating a few hot topics using the information contained in the repository.