In this work, we present a methodology for generating reaction networks from spectroscopic data using data-driven methods, and we validate it by applying it to the hydrothermal liquefaction (HTL) of Monterey pine biomass and its constituents, viz., cellulose and lignin. The work is a step toward automated inference of the chemistry of the HTL process, reducing the need for human expertise. Bayesian hierarchical clustering of spectra and self-modeling multivariate spectral curve resolution are used to generate groups of chemically similar species, and the reaction networks among these groups are developed using Bayesian networks. Measurements from Fourier transform infrared spectroscopy and proton nuclear magnetic resonance spectroscopy are used as input data. The data-driven reaction network includes pathways representing decomposition of the biomass components, hydrolysis of large molecules, and reformation of the produced molecules, and it is consistent with the literature. Furthermore, comparing the networks generated for the biomass and for its model components (levoglucosan, representing cellulose, and 2-phenoxy-ethyl benzene, representing lignin) reveals how the biomass HTL reaction network relates to the reaction networks of the components. The data-driven approach provides a diagnostic tool for identifying the most probable reaction chemistry of complex biomass feedstocks and can be used for process understanding, design, and control.