Cryo-electron microscopy (cryo-EM) has become a widely used technique for determining the structures of macromolecular complexes. Many algorithms have been developed for modeling molecular structures from density maps of different resolutions. These algorithms can be categorized into rigid fitting, flexible fitting, and de novo modeling methods. Machine learning (ML) techniques have also been increasingly applied, following the rapid progress of the ML field. Here, we review these categories of macromolecular structure modeling methods and discuss their advances over time.
Molecules 2020, 25, 82 2 of 13
methods, in this order. Then, we discuss methods that use machine learning approaches, which have emerged in recent years in the cryo-EM structure modeling field.
Figure 1. The number of rigid fitting, flexible fitting, and de novo modeling software packages published per year. The statistics are based on publication. The plot shows 28 rigid fitting methods [9-36], 33 flexible fitting methods [37-69], and 8 de novo modeling methods [70-77].
Rigid Fitting Methods
In rigid-body fitting, high-resolution atomic models derived from X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, or protein structure prediction are fitted into a cryo-EM map. One of the earliest rigid fitting methods is EMfit, developed by Rossmann et al. in 2001 [12]. In EMfit, a high-resolution structure is first placed manually at a specific position in the EM map. A 3-D rotational search is then applied to find the best orientation. Finally, EMfit refines this initial fit through local rotational and translational steps.
In general, rigid-body fitting methods search for the best placement of an atomic model in a density map. Search algorithms that have been used for rigid fitting include fast Fourier transform (FFT)-based search [14,32,35], grid-threading Monte Carlo (GTMC) [16], spherical harmonic-based search [20], and geometric hashing [27].
The FFT-based scheme accelerates the 3-D translational search by evaluating the fit for all translations of the model at once [14]. HermiteFit speeds up the rotation step of the FFT-based search by representing densities as three-dimensional orthogonal Hermite functions and performing the rotation in Hermite space [32]. Fast polar Fourier search is a variation of the FFT approach that is based on non-uniform SO(3) Fourier transforms [35]. Its principal advantage is the ability to search efficiently and uniformly over a set of samples of the conformational space.
GTMC combines grid search and Monte Carlo sampling [16]. It divides the search space into grid points and uses Monte Carlo sampling to find local maxima near each grid point, from which the global maximum is identified. ADP_EM is a spherical harmonic (SH)-based method that applies exhaustive translational scanning and accelerates the rotational search by representing densities as SH functions [20]. Geometric hashing identifies a set of possible transformations, which are stored in a fast-searchable hash map [27]. This set of transformations is later searched to find the best fit.
While the methods above e...