The coarse-graining of data from molecular simulations yields conformational space networks that may be used to predict a system's long-timescale behavior, to discover structural pathways connecting free energy basins, or simply to represent accessible phase space regions of interest and their connectivities in a two-dimensional plot. In this contribution, we present a tree-based algorithm to partition conformations of biomolecules into sets of similar microstates, i.e., to coarse-grain trajectory data into mesostates. Because it uses an architecture similar to that of established tree-based algorithms, the proposed scheme runs in time that scales nearly linearly with data set size. We derive expressions needed for the fast evaluation of mesostate properties and distances when employing typical choices for measures of similarity between microstates. Using both a pedagogical example and a real-world application, we show that the algorithm is robust with respect to tree height, which, together with the mesostate threshold size, is the main adjustable parameter. We also demonstrate that the derived mesostate networks can preserve information about the free energy basins and barriers that characterize the system.
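The idea behind fast evaluation of mesostate properties can be illustrated with the clustering-feature bookkeeping used by BIRCH-style tree algorithms. The Python sketch below is not the authors' implementation. It assumes plain Euclidean distances on fixed-length feature vectors (periodic coordinates such as dihedral angles would need different expressions), and the names `CF`, `Node`, `build_tree`, and `dist` are hypothetical. Each mesostate stores only its size, linear sum, and sum of squared norms, so its centroid and radius are available in O(d) time without revisiting its members. The tree is built top-down with one distance threshold per level: the number of levels stands in for tree height, and the last threshold stands in for the mesostate threshold size.

```python
import math


class CF:
    """Hypothetical clustering feature of one mesostate: member count n,
    per-dimension linear sum ls, and total sum of squared norms ss."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim
        self.ss = 0.0

    def add(self, x):
        # O(d) update; member coordinates need not be stored.
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
        self.ss += sum(v * v for v in x)

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius(self):
        # RMS distance of members to the centroid: sqrt(ss/n - |ls/n|^2).
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(v * v for v in c), 0.0))


def dist(a, b):
    # Plain Euclidean distance (assumption; periodic coordinates need care).
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


class Node:
    def __init__(self, dim):
        self.cf = CF(dim)
        self.children = []
        self.index = None  # mesostate index, set only for leaf-level nodes


def build_tree(points, thresholds):
    """Top-down leader clustering with one threshold per level.
    thresholds[0] is the coarsest; thresholds[-1] is the mesostate threshold."""
    dim = len(points[0])
    root = Node(dim)
    leaves, labels = [], []
    last = len(thresholds) - 1
    for x in points:
        node = root
        for level, t in enumerate(thresholds):
            # Only the children of the current node are scanned at each level.
            best, best_d = None, math.inf
            for child in node.children:
                d = dist(child.cf.centroid(), x)
                if d < best_d:
                    best, best_d = child, d
            if best is None or best_d > t:
                # No existing cluster is close enough: open a new one.
                best = Node(dim)
                node.children.append(best)
                if level == last:
                    best.index = len(leaves)
                    leaves.append(best)
            best.cf.add(x)
            node = best
        labels.append(node.index)
    return leaves, labels


pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.2)]
leaves, labels = build_tree(pts, thresholds=[3.0, 0.5])
print(labels)          # [0, 0, 1, 1, 0]
print(leaves[0].cf.n)  # 3
```

Because each point scans only the children of a single node per level, the per-point cost stays small as long as branching remains modest. A simple leader scheme like this one lets centroids drift as members are added, so a member can end up farther than the threshold from its cluster's final centroid.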