For the past few decades, demand for better PPA (performance, power, and area) in computing infrastructure has driven an exponential increase in chip density. In recent years, printability and process-window challenges at advanced manufacturing nodes have continuously motivated innovation in reticle enhancement techniques (RET), notably the adoption of inverse lithography technology (ILT) and curvilinear masks. We have observed three challenges: 1) ILT provides unmatched quality of results but incurs additional computation time; 2) curvilinear masks offer evident benefits, but the associated data volume is very large; and 3) mask consistency remains critical for design manufacturability. To use advanced RET to its full potential, it is crucial to identify the repeating structures in a design layout and reuse the correction results, which yields three benefits at once: shorter mask preparation runtime, smaller mask data volume, and improved mask consistency. Conventional layout repetition analysis is based on the native design hierarchy. However, in many cases the input layout for mask synthesis flows is either completely stripped of hierarchy or contains a sub-optimal hierarchy. Some layout hierarchy can be detected and reconstructed manually, for example by combining a user-generated pattern library of highly repeating structures with pattern matching technology. However, preparing such libraries is a formidable effort, and this approach overlooks a significant number of repetitions in a design. In this paper, we investigate the automatic detection of repeating geometric structures and the formation of a hierarchy that is optimized for mask synthesis. The detection supports any process layer and both Manhattan and all-angle designs. The engine detects repeating regions of arbitrary shape.
The detected repeating structures can also be reused within a chip or across chips to accelerate correction and further improve mask consistency. The distributed hierarchy extraction scales well to hundreds of processors, which makes it efficient for full-chip layouts. For highly repetitive layouts, we have observed mask synthesis runtime reductions of more than an order of magnitude when this hierarchy extraction is performed.
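
To make the idea of repetition detection in a flat layout concrete, the following is a minimal sketch. It is not the engine described in this paper. It assumes a deliberately simplified setting: each polygon is a list of integer vertices with a consistent winding direction, repetition is detected per polygon rather than per region, and only translation is considered. The function names `canonical` and `find_repeats` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict

def canonical(polygon):
    """Return a translation-invariant key for a polygon and its placement offset.

    Assumption: vertices are distinct integer (x, y) tuples listed with a
    consistent winding direction.
    """
    min_x = min(x for x, _ in polygon)
    min_y = min(y for _, y in polygon)
    # Move the lower-left corner of the bounding box to the origin.
    shifted = [(x - min_x, y - min_y) for x, y in polygon]
    # Start the vertex list at the smallest vertex so that the key does not
    # depend on which vertex the polygon happened to be recorded from.
    start = shifted.index(min(shifted))
    key = tuple(shifted[start:] + shifted[:start])
    return key, (min_x, min_y)

def find_repeats(polygons):
    """Group polygons by canonical key; keep only shapes placed more than once."""
    groups = defaultdict(list)
    for poly in polygons:
        key, offset = canonical(poly)
        groups[key].append(offset)
    return {key: offsets for key, offsets in groups.items() if len(offsets) > 1}

# Hypothetical flat layout: three placements of one rectangle and one square.
layout = [
    [(0, 0), (2, 0), (2, 1), (0, 1)],
    [(10, 5), (12, 5), (12, 6), (10, 6)],
    [(12, 6), (10, 6), (10, 5), (12, 5)],  # same rectangle, different start vertex
    [(0, 0), (3, 0), (3, 3), (0, 3)],
]
repeats = find_repeats(layout)
```

In this sketch each key in `repeats` plays the role of a cell definition and its offsets play the role of placements, so a correction computed once for the key could in principle be reused at every offset. A practical flow would also have to account for the surrounding optical context of each placement, which this example ignores.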