Analyzing the physical and chemical properties of single DNA-based molecular machines such as polymerases and helicases requires tracking their stepping motion on the length scale of base pairs. Although high-resolution instruments capable of reaching that limit have been developed, individual steps are often hidden by experimental noise, which complicates data processing. Here, we present an effective two-step algorithm, the energy-based step-finder (EBS), which detects steps in a high-bandwidth signal by minimizing an energy-based model. First, an efficient convex denoising scheme is applied, which allows the signal to be compressed to tuples of amplitudes and plateau lengths. Second, a combinatorial clustering algorithm formulated on a graph is used to assign steps to the tuple data while accounting for prior information.

The performance of the algorithm was tested on Poissonian stepping data simulated from published kinetic data for RNA Polymerase II (Pol II). Comparison with existing step-finding methods shows that EBS is faster while providing competitive step detection results, especially in challenging situations.

Moreover, we demonstrate that EBS can detect backtracked intervals in experimental Pol II data as well as the stepping behavior of the Phi29 DNA packaging motor.

‡ The authors contributed equally to this article.
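To make the compression step concrete, the following minimal Python sketch generates a noise-free Poissonian staircase and compresses it into tuples of amplitude and plateau length. It is an illustration under stated assumptions, not the EBS implementation: the function names, the parameters (`rate`, `dt`, `step_size`, `tol`), and the use of NumPy are hypothetical choices, and the input is assumed to be exactly piecewise constant already. The convex denoising and the graph-based clustering are not shown.

```python
import numpy as np


def simulate_poisson_steps(n_samples, rate, dt, step_size=1.0, seed=0):
    """Noise-free staircase whose steps arrive as a Poisson process.

    Hypothetical helper: `rate` is the stepping rate (1/s), `dt` the
    sampling interval (s), `step_size` the size of a single step.
    """
    rng = np.random.default_rng(seed)
    # The number of steps in each sampling interval is Poisson distributed.
    counts = rng.poisson(rate * dt, size=n_samples)
    return step_size * np.cumsum(counts)


def compress_to_tuples(signal, tol=1e-9):
    """Run-length encode a piecewise-constant signal.

    Returns a list of (amplitude, plateau_length) tuples, with the
    plateau length counted in samples.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size == 0:
        return []
    # A change larger than `tol` between neighbours starts a new plateau.
    change = np.flatnonzero(np.abs(np.diff(signal)) > tol) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [signal.size]))
    return [(float(signal[s]), int(e - s)) for s, e in zip(starts, ends)]


def expand_tuples(tuples):
    """Inverse of compress_to_tuples: rebuild the sampled signal."""
    if not tuples:
        return np.array([], dtype=float)
    return np.concatenate([np.full(n, a) for a, n in tuples])


if __name__ == "__main__":
    trace = simulate_poisson_steps(n_samples=1000, rate=10.0, dt=1e-3)
    tuples = compress_to_tuples(trace)
    # The tuple list is typically much shorter than the sampled trace.
    print(len(trace), "samples ->", len(tuples), "plateaus")
```

Because the tuple representation is lossless for a piecewise-constant signal, `expand_tuples(compress_to_tuples(x))` reproduces `x`, so any subsequent step assignment can operate on the tuples alone.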
Introduction

Single-molecule measurements of molecular motors make it possible to study the motion of individual enzymes. The systems studied range from enzymes that take comparatively large steps, e.g. motor proteins like Myosin V [1] and Kinesin [2], to DNA-based molecular machines that take steps on the scale of single nucleotides [3][4][5][6]. Experimental techniques for studying these systems range from single-molecule fluorescence localization [7] to optical and magnetic tweezers [8]. Most of these measurements represent the underlying dynamics as a one-dimensional time series of positional changes. The enzymatic reactions that fuel this motion appear as stochastic events resulting in step-like movements [9] obscured by noise. State-of-the-art optical tweezers experiments can now resolve the movement of enzymes down to single base pairs [3,10]. For example, studies on the ϕ29 bacteriophage ring ATPase [11][12][13] used information from step detection to propose a complete model of the mechanochemical cycle. However, analysis schemes often rely on low-pass smoothed data.

Indeed, the problem of finding steps is not limited to studies of enzyme movement but appears in a wide range of biomolecular experiments, from fluorescence resonance energy transfer trajectories [14] to steps in membrane tether formation [15] and the opening of ion channels [16], to name just a few.

Consequently, a wealth of signal processing techniques is available for recovering piecewise constant signals from noisy data. Due to the stochastic nature of enzymatic stepping, the number of steps is often not known a priori. Therefore, different step finding algorithms have been develop...