We introduce circuit implementations of one-and two-stage carrier phase recovery (CPR) for 256QAM coherent optical receivers. We describe in detail the optimizations of algorithms, such as modified Viterbi-Viterbi (mVV), blind phase search (BPS), and principal component-based phase estimation (PCPE), that are required to develop energy-efficient CPR circuits and show how design parameter settings and limited fixedpoint resolution affect the SNR penalty. 30-GBaud CPR circuit netlists synthesized in a 22-nm CMOS process technology allow us to study trade-offs between energy per bit and SNR penalty. We show that it is possible to reach an energy dissipation of around 1 pJ/bit at an SNR penalty of 0.6 dB for two-stage PCPE+BPS and mVV+BPS implementations, and that PCPE+BPS is the preferred choice thanks to its smaller area.