Pre-silicon performance evaluation is a crucial component of computer systems research and development. While simulation has long been the de facto standard in this context, it can be prohibitively time-consuming for long-running, realistic workloads. To expedite this process, researchers have traditionally turned to sampling techniques. However, these techniques typically rely on fixed-length analysis intervals, which are often out of sync with the periodicity of program execution. Additionally, because an application's phase behavior is strongly correlated with the code it executes, the application can exhibit a hierarchy of phase behaviors that are observable at different interval lengths, rendering conventional sampling techniques inadequate. To address these limitations, we propose Viper, a novel sampled simulation methodology that applies to both single-threaded and multi-threaded workloads by leveraging the hierarchical structure of program execution. Viper takes into account both application periodicity and inter-thread synchronization to achieve better sampling accuracy and smaller regions, which enables faster register-transfer level (RTL) simulations. We evaluate Viper with the multi-threaded SPEC CPU2017 benchmarks and demonstrate a significant simulation speedup (up to 2,710×, 358× on average for the train input set) while maintaining an average sampling error of just 1.32%.