Monolithic 3D IC (M3D) is a promising solution for improving the performance and energy efficiency of modern processors. However, designers face challenges in design tools and methodologies, especially for power and thermal verification. We develop a new physical design flow that optimally places and routes cache modules in one tier and logic gates in the other. Our tool also builds high-quality clock and power delivery networks targeting logic-on-memory M3D designs. Lastly, we develop a sign-off analysis tool flow to evaluate power, performance, and area (PPA), as well as thermal and voltage-drop quality, for given M3D designs. Using our complete RTL-to-GDS tool flow, we design commercial-quality 2D and M3D implementations of Arm Cortex-A7 and Cortex-A53 processors in a commercial 28 nm technology. Experimental results show that our 3D processors offer performance gains of 20% (A7) and 21% (A53) compared with their 2D commercial counterparts. The voltage-drop degradation of our 3D Cortex-A7 and Cortex-A53 processors is less than 3% of the supply voltage, while the temperature increase is 10.71°C and 13.04°C, respectively.