This work studies five Indian coals and their solvent-extracted clean coal products using Py-GCMS analysis and correlates the characterization data using theoretical principal component analysis. The pyrolysis products of the original coals and the super clean coals were classified as mono-, di-, and tri-aromatics; other prominent products included cycloalkanes, n-alkanes, and alkenes ranging from C10 to C29. Principal component analysis, a dimensionality reduction technique, reduced the number of input variables in the characterization dataset. Its scores and loading plots gave inferences on the relative composition of constituent compounds and functional groups, along with structural insights, that were consistent with the experimental observations. ATR-FTIR studies confirmed the reduced ash content of the super clean coals and the presence of aromatics. Together, the Py-GCMS data and the ATR-FTIR spectra led to the conclusion that the super clean coals obtained from coking and non-coking coals behaved similarly, with higher aromatic concentrations than the raw coals. Neyveli lignite super clean coal showed some structural similarity to the original coals, whereas the other super clean coals were structurally similar to one another but not to their original coal samples, confirming the selective action of the e,N solvent in solubilizing the polycondensed aromatic structures in the coal samples.
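As an illustration of how scores and loadings arise from a characterization matrix, the following is a minimal sketch of principal component analysis via singular value decomposition. It is not the procedure used in this work. The matrix layout (rows as coal samples, columns as relative abundances of compound classes), the function name `pca_scores_loadings`, and the choice of mean-centering without variance scaling are all assumptions, and the numbers are random placeholders rather than measured data.

```python
import numpy as np


def pca_scores_loadings(X, n_components=2):
    """Hypothetical helper: PCA of a samples-by-variables matrix via SVD."""
    X = np.asarray(X, dtype=float)
    # Mean-center each variable (column); no variance scaling is applied here.
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U * diag(S) * Vt
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Scores: coordinates of each sample on the principal components.
    scores = U[:, :n_components] * S[:n_components]
    # Loadings: contribution of each original variable to each component.
    loadings = Vt[:n_components].T
    # Fraction of total variance carried by each retained component.
    explained = (S ** 2 / np.sum(S ** 2))[:n_components]
    return scores, loadings, explained


# Placeholder data only: 10 hypothetical samples (e.g. 5 raw coals and
# 5 super clean coals) by 6 compound classes. These are NOT measured values.
classes = ["mono-aromatics", "di-aromatics", "tri-aromatics",
           "cycloalkanes", "n-alkanes", "alkenes"]
rng = np.random.default_rng(0)
X = rng.random((10, len(classes)))
X = 100.0 * X / X.sum(axis=1, keepdims=True)  # rows as relative area percent

scores, loadings, explained = pca_scores_loadings(X, n_components=2)
print("explained variance ratio:", explained)
for name, row in zip(classes, loadings):
    print(name, row)
```

In a scores plot, samples lying close together have similar overall composition, while the loadings indicate which compound classes drive the separation along each component.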