We extensively mapped the energy landscapes and conformations
of 22 proteinogenic α-amino acids (including three His protonation
states) in trans configuration and the corresponding
484 (22²) dipeptides. To mimic the environment in a protein
chain, the N- and C-termini of the
studied systems were capped with acetyl and N-methylamide
groups, respectively. We systematically varied the main chain dihedral
angles (ϕ, ψ) in 40° steps and all side chain angles
in 90° or 120° steps. We optimized the molecular geometries
with the GFN2-xTB semiempirical quantum mechanical (SQM) method and performed
single-point density functional theory calculations at the BP86-D3/DGauss-DZVP//COSMO-RS
level in water, 1-octanol, N,N-dimethylformamide,
and n-hexane. For each restrained (nonequilibrium)
structure, we also calculated energy gradients (in water) and natural
atomic charges. The exhaustive and unprecedented QM-based sampling
enabled us to construct Ramachandran plots of quantum mechanical (QM(BP86-D3)//COSMO-RS)
energies calculated on SQM structures for all 506 studied systems (484 dipeptides
and 22 amino acids). We showed how the character of
an amino acid side chain influences the conformational space of single
amino acids and dipeptides. With clustering techniques, we were able
to identify unique minima of amino acids and dipeptides (i.e., minima
on the GFN2-xTB potential energy surfaces) and analyze the distribution
of their BP86-D3//COSMO-RS conformational energies in all four solvents.
We also derived an empirical formula for the number of unique minima
based on the overall number of rotatable bonds within each peptide.
The final peptide conformer data set (PeptideCs) comprises over 400
million structures, all of them annotated with QM(BP86-D3)//COSMO-RS
energies. Thanks to its completeness and unbiased nature, the PeptideCs
can serve, inter alia, as a data set for the validation
of new methods for predicting the energy landscapes of protein structures.
This data set may also prove to be useful in the development and reparameterization
of biomolecular force fields. The data set is deposited at Figshare
and can be accessed using a simple web interface.
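
The size of the systematic scan described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the workflow used to build the data set: the step sizes (40° for ϕ and ψ, 90° or 120° for side chain angles) and the system counts (22 amino acids, 22² dipeptides) come from the text, whereas the grid origin at -180°, the assignment of a step size to a given side chain angle, and all function names are hypothetical choices made here for illustration.

```python
from itertools import product
from math import prod

BACKBONE_STEP = 40            # degrees, step for phi and psi (from the text)
SIDE_CHAIN_STEPS = (90, 120)  # degrees, allowed side chain steps (from the text)


def dihedral_grid(step_deg, start_deg=-180):
    """Return the angles (degrees) covering one full turn at a fixed step.

    The start value of -180 is an assumption; the text gives only the step.
    """
    if step_deg <= 0 or 360 % step_deg != 0:
        raise ValueError("step must be a positive divisor of 360")
    return [start_deg + i * step_deg for i in range(360 // step_deg)]


def scan_axes(n_residues, side_chain_steps=()):
    """Build one grid per scanned dihedral: (phi, psi) per residue, then side chains.

    side_chain_steps lists the step chosen for each side chain angle; which
    angle gets 90 and which gets 120 is left to the caller (hypothetical input).
    """
    for step in side_chain_steps:
        if step not in SIDE_CHAIN_STEPS:
            raise ValueError("side chain step must be 90 or 120 degrees")
    backbone = [dihedral_grid(BACKBONE_STEP) for _ in range(2 * n_residues)]
    side_chains = [dihedral_grid(step) for step in side_chain_steps]
    return backbone + side_chains


def count_scan_points(n_residues, side_chain_steps=()):
    """Number of restrained starting geometries on the full grid."""
    return prod(len(axis) for axis in scan_axes(n_residues, side_chain_steps))


def iter_scan_points(n_residues, side_chain_steps=()):
    """Iterate lazily over every combination of grid angles."""
    return product(*scan_axes(n_residues, side_chain_steps))


# System counts stated in the text.
N_AMINO_ACIDS = 22
N_DIPEPTIDES = N_AMINO_ACIDS ** 2
N_SYSTEMS = N_AMINO_ACIDS + N_DIPEPTIDES
```

Under these assumptions, a single capped residue without scanned side chain angles gives `count_scan_points(1)` = 81 grid points (9 × 9), and each additional side chain angle multiplies the count by 4 (90° step) or 3 (120° step). These are counts of grid starting points only; they say nothing about how many structures survive optimization and clustering.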