G protein-coupled receptors (GPCRs) conserve common structural folds and activation mechanisms, yet their ligand spectra and functions are highly diversified. This work investigated how the functional variations in olfactory GPCRs (ORs), the largest GPCR family, are encoded in the primary sequence. With the aid of site-directed mutagenesis and molecular simulations, we built machine learning models to predict OR-ligand pairs as well as the basal activity of ORs. In vitro functional assays confirmed 20 new OR-odorant pairs, including 9 orphan ORs. Residues around the odorant-binding pocket dictate the odorant selectivity/specificity of the ORs. Residues that encode the varied basal activities of the ORs were found mostly to surround the conserved motifs as well as the binding pocket. The machine learning approach, which is readily applicable to mammalian OR families, will accelerate OR-odorant mapping and the decoding of combinatorial OR codes for odors.
Introduction
Protein functions are encoded within either diversified or conserved parts of their sequences. G protein-coupled receptors (GPCRs) are the most remarkable examples of this phenomenon. GPCRs are the largest membrane protein family and the targets of about 40% of marketed drugs 1. The human genome contains over 800 genes coding for GPCRs, which exert differentiated and specific functions in the complex cellular signaling network. Half of these genes encode olfactory receptors (ORs), which endow us with fascinating capacities of odor discrimination 2,3. Mammalian GPCRs conserve a typical structural architecture of seven transmembrane helices (7TM) that house an orthosteric ligand-binding pocket 4. Their intrinsic signaling mechanism, which involves large-scale conformational changes to accommodate their cognate G proteins, is encoded in conserved motifs throughout the 7TM, which form a network of inter-TM contacts converging on their cytoplasmic part 5. In particular, the "D(E)RY", "CWLP" and "NPxxY" motifs in TM3, TM6 and TM7, respectively, are the most conserved hubs of the allosteric communication between the orthosteric pocket and the cytoplasmic side of class A GPCRs 4. The rest of the GPCR sequences, especially the ligand-binding pocket, have diversified extensively, resulting in large variations in the receptors' function. Amongst the hundreds of variable residues in the GPCR sequences, some are very likely specialized for the receptors' ligand specificity/selectivity, while others encode more information on the receptors' intrinsic activity.
This study focused on the divergence of ligand-dependent and ligand-independent (or basal) activities of olfactory class A GPCRs. We sought to identify amino acid positions that are critical for the receptors' functional heterogeneity. ORs discriminate a vast spectrum of volatile molecules (odorants) and code for the innumerable odors perceived in the brain.
The many-to-many relationships between ORs and odorants are key to understanding odor perception 6. ORs are also expressed ectopically, some of which have ...