The high resolution of synchrotron cryo-nano tomography is easily undermined by setup instabilities and sample stage deficiencies such as runout or backlash. High-contrast nano-beads are often added to the solution to provide a set of landmarks for manual alignment, at the cost of limiting sample visibility, especially for bio-specimens. However, the spatial distribution of these reference points within the sample is difficult to control, so many datasets lack a sufficient number of such features for tracking. Fast automatic methods based on tomographic consistency are thus desirable, especially for biological samples, where regular, high-contrast features can be scarce. Existing off-the-shelf implementations of this class of algorithms are slow when applied to real-world high-resolution datasets. In this paper, we present a fast implementation of a consistency-based alignment algorithm tailored specifically to multi-GPU systems. Our implementation is released as open source.