We present a data-driven approach
for identifying the reaction network
of the dominant chemistry in complex mixtures, demonstrated on model compounds
representative of cellulose and lignin chemistry that are processed
by hydrous pyrolysis. We consider two methods for identifying
pseudocomponents: self-modeling multivariate curve resolution,
which is a non-negative matrix factorization method, and Bayesian
hierarchical clustering. The pseudocomponents are identified from
spectroscopic data from two sources: Fourier transform infrared (FTIR) spectroscopy
and 1H NMR spectroscopy. The data from the two sources
are fused using a simple data combination method. Once pseudocomponents
have been identified, Bayesian networks are used to infer directed
pathways between them, yielding a hypothesized
reaction network or mechanism. We validate the methods by
showing consistency of the derived reaction networks with the known
chemistry of cellulose, lignin, and their derivatives and demonstrate
the importance of data fusion in developing credible reaction networks.
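
As an illustration of the pseudocomponent step summarized above, the following minimal sketch fuses two spectral blocks and factorizes the result with a non-negative matrix factorization. It is not the implementation used in this work: the fusion rule (scaling each block to unit Frobenius norm and concatenating along the variable axis), the multiplicative-update solver, the helper names `fuse_blocks` and `nmf`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def fuse_blocks(blocks):
    """Assumed fusion rule: scale each block to unit Frobenius norm,
    then concatenate along the variable (column) axis."""
    return np.hstack([b / np.linalg.norm(b) for b in blocks])

def nmf(X, n_components, n_iter=500, seed=0, eps=1e-12):
    """Factorize X ~ C @ S with C, S >= 0 using multiplicative updates.
    C holds pseudocomponent concentrations per sample, S their spectra."""
    rng = np.random.default_rng(seed)
    C = rng.random((X.shape[0], n_components))
    S = rng.random((n_components, X.shape[1]))
    for _ in range(n_iter):
        S *= (C.T @ X) / (C.T @ C @ S + eps)
        C *= (X @ S.T) / (C @ S @ S.T + eps)
    return C, S

# Synthetic stand-ins for the two data sources (12 samples, 3 pseudocomponents).
rng = np.random.default_rng(1)
true_C = rng.random((12, 3))
ftir = true_C @ rng.random((3, 40))   # hypothetical FTIR block, 40 channels
nmr = true_C @ rng.random((3, 25))    # hypothetical 1H NMR block, 25 channels

X = fuse_blocks([ftir, nmr])
C, S = nmf(X, n_components=3)
rel_err = np.linalg.norm(X - C @ S) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.3f}")
```

The rows of `C` give one concentration profile per sample for each pseudocomponent; profiles of this kind are what a subsequent Bayesian network structure search would take as input when proposing directed pathways.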