<p>The severe damage caused by liquefaction in past earthquakes has driven research into the factors that govern subsoil susceptibility and triggering conditions. Stakeholder concern, in turn, calls for risk assessment methods that can be applied at large scale. A crucial step in liquefaction risk assessment is subsoil characterization: the stratigraphic classification into homogeneous soil layers and the identification of susceptible volumes, with the aim of constructing 2D and 3D geo-mechanical models. In current practice, the CPT-based soil behavior type (SBT) and the soil behavior type index (Ic) are widely used to identify soil layer boundaries (Robertson, 2016). The interpretation of a subsoil profile is sometimes neither immediate nor unique, because clear boundary changes are lacking. Such cases call for sound, widely applicable tools that identify subsoil strata unambiguously. Statistical procedures developed over the years provide a less subjective interpretation of the subsoil and, combined with artificial intelligence, can improve current methodology and yield an objective and extensive site characterization. This work presents a data-driven analysis for subsoil stratigraphic recognition that combines geostatistical tools and AI genetic algorithms. The procedure is calibrated and validated on the case study of Terre del Reno (Italy), which was severely struck by liquefaction during the 2012 Mw 6.1 earthquake and is characterized by complex geo-stratigraphic conditions. The selected area, homogeneously covered by about 1700 geognostic surveys, is investigated within the "PERL" research project, carried out by the Emilia-Romagna Region (RER), CNR-IGAG and UniCas-DiCeM, which aims to provide a reliable procedure for liquefaction risk assessment and seismic microzonation. 
From the RER geodatabase, 102 pairs of complementary CPTs and boreholes were extracted to calibrate the method. A pair is defined as a CPT and a borehole located less than 30 m apart, a distance at which the two surveys are treated as spatially correlated for this purpose. Starting from the information available from the boreholes, a geologic-sedimentologic study was carried out to define the main stratigraphic units. In parallel, the CPT profiles were processed with a statistical method based on the spatial variability of the measured parameters, which identifies statistically homogeneous layers and assigns to each of them the corresponding stratigraphic unit reported in the complementary borehole. An artificial intelligence algorithm was then calibrated on the merged outcomes of the CPT-borehole pairs. Finally, the procedure was applied to the remaining CPTs, combining geological and geotechnical knowledge of the subsoil efficiently and automatically to enable a large-scale reconstruction of the subsoil stratigraphy.</p>
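<p>The pairing criterion described above (a CPT and a borehole less than 30 m apart) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name <code>pair_surveys</code>, the tuple layout, the use of projected coordinates in metres, and the choice of the nearest borehole when several fall within the threshold are all assumptions made for illustration.</p>

```python
import math
from typing import List, Tuple

# Hypothetical record layout: (survey id, easting in m, northing in m).
# Projected (metric) coordinates are assumed, so plain Euclidean distance applies.
Survey = Tuple[str, float, float]


def pair_surveys(
    cpts: List[Survey],
    boreholes: List[Survey],
    max_dist_m: float = 30.0,
) -> List[Tuple[str, str, float]]:
    """Pair each CPT with a borehole located strictly less than max_dist_m away.

    Assumption: when several boreholes qualify, the nearest one is kept.
    The source only states the 30 m threshold, not a tie-breaking rule.
    """
    pairs = []
    for cpt_id, cx, cy in cpts:
        best = None  # (borehole id, distance) of the closest qualifying borehole
        for bh_id, bx, by in boreholes:
            d = math.hypot(cx - bx, cy - by)
            if d < max_dist_m and (best is None or d < best[1]):
                best = (bh_id, d)
        if best is not None:
            pairs.append((cpt_id, best[0], best[1]))
    return pairs


# Illustrative coordinates only, not data from the RER geodatabase.
example_cpts = [("C1", 0.0, 0.0), ("C2", 100.0, 100.0)]
example_boreholes = [("B1", 10.0, 0.0), ("B2", 0.0, 20.0), ("B3", 100.0, 130.0)]
example_pairs = pair_surveys(example_cpts, example_boreholes)
```

<p>In this toy example, C1 is paired with B1 at 10 m, while C2 remains unpaired because its closest borehole lies exactly 30 m away and the threshold is strict. For a database of about 1700 surveys the brute-force double loop is adequate; a spatial index would only matter for much larger datasets.</p>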