It has been demonstrated that a pair of spectra exhibiting
a coefficient of determination (R²) as low as 0.976
can, in one example, originate from the same chemical species, while
a different pair of spectra exhibiting an R² up to 0.9997
can originate from different chemical species. The R²
between spectra overlays depends on the signal-to-noise
ratio, while the residual between any two spectra should look like
noise only when the two spectra originate from the same chemical species.
Numerical characteristics of the residual between two high-resolution
spectra are invaluable for the definitive elimination of many plausible
matches of reference spectra to the sample spectra of analytes eluted
from two-dimensional gas chromatography. Additionally, numerical characteristics
beyond R² facilitate a logical ranking
of all plausible matches, making positive identification of a single-component
analyte possible provided a reference spectrum exists for all plausible
matches. Specifically, the experimental background noise is shown
to follow a Gaussian distribution at all wavelengths, and a method
is described to normalize the data such that the numerically adjusted
noise distributions are independent of wavelength. The differences
between matching spectra are further shown to exhibit numerical characteristics
consistent with the background noise’s Gaussian distribution,
common to all wavelengths. Seven criteria are described for judging
the similarity between spectra: the R² between
the two spectra, the R² of a Q–Q plot with one axis being ideal Gaussian
quantiles and the other axis being the quantiles of the numerically
adjusted residual, the maximum count of consecutive (by
wavelength) identical signs in the residual, and the first four moments of the
residuals. One exemplar application of the methodology is a definitive
match of n-undecane, n-dodecane,
and n-tridecane sample spectra to their corresponding
reference spectra; these are among the most challenging species
within the volatility range of jet fuel to differentiate by spectral
methods. While this example is a significant stress test of the approach,
the general utility of the methodology lies in the subtle math and
transparent criteria that unambiguously identify mismatches because
the distributions of residuals between mismatching spectra are very
clearly not Gaussian and have a high consecutive sign count, even
in cases where the R² between the compared
spectra is ambiguous.
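
As an illustration only, the seven criteria listed above could be computed along the following lines. This is a minimal sketch and not the implementation used in this work. Several details are assumptions: R² is taken as the squared Pearson correlation, the residual is normalized by dividing by a per-wavelength noise standard deviation (the hypothetical `noise_sigma` argument), the Q–Q plotting positions are (i - 0.5)/n, and the four moments are reported as mean, variance, skewness, and kurtosis. The function names are likewise hypothetical.

```python
import numpy as np
from statistics import NormalDist


def r_squared(x, y):
    """Squared Pearson correlation (assumed definition of R^2)."""
    return float(np.corrcoef(x, y)[0, 1] ** 2)


def longest_sign_run(residual):
    """Longest run of consecutive (by wavelength) residuals with the same sign.

    Assumes a non-empty array; exact zeros are treated as their own sign.
    """
    signs = np.sign(residual)
    best = run = 1
    for prev, cur in zip(signs[:-1], signs[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def similarity_criteria(sample, reference, noise_sigma):
    """Return the seven similarity criteria for two spectra on a shared wavelength grid.

    noise_sigma is an assumed per-wavelength noise standard deviation used to
    make the residual distribution independent of wavelength.
    """
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)
    z = (sample - reference) / np.asarray(noise_sigma, dtype=float)

    # Q-Q comparison: sorted adjusted residuals against ideal Gaussian quantiles.
    n = z.size
    probs = (np.arange(1, n + 1) - 0.5) / n
    ideal = np.array([NormalDist().inv_cdf(p) for p in probs])

    mean = z.mean()
    std = z.std()  # assumes the residual is not constant (std > 0)
    centered = z - mean
    return {
        "spectrum_r2": r_squared(sample, reference),
        "qq_r2": r_squared(ideal, np.sort(z)),
        "max_sign_run": longest_sign_run(z),
        "mean": float(mean),
        "variance": float(std ** 2),
        "skewness": float(np.mean(centered ** 3) / std ** 3),
        "kurtosis": float(np.mean(centered ** 4) / std ** 4),
    }
```

Under these assumptions, a matching pair should give a Q–Q R² near 1, a short maximum sign run, and moments close to those of a standard Gaussian (mean 0, variance 1, skewness 0, kurtosis 3). A mismatch leaves structure in the residual, which typically shows up as a long sign run and a poor Q–Q fit even when the R² between the spectra themselves is high.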