This work presents a scalable Column-Row-Parallel ASIC architecture for 3D wearable / portable medical ultrasound. It leverages programmable electronic addressing to achieve linear scaling for both hardware interconnection and software data acquisition. A 16x16 transceiver ASIC is fabricated and flip-chip bonded to a 16x16 capacitive micromachined ultrasonic transducer (CMUT) to demonstrate the compact, low-power front-end assembly. A 3D plane-wave coherent compounding algorithm is designed for fast volume rate (62.5 volume/s), high quality 3D ultrasonic imaging. An interleaved checker board pattern with I&Q excitations is also proposed for ultrasonic harmonic imaging, reducing transmitted second harmonic distortion by over 20dB, applicable to nonlinear transducers and circuits with arbitrary pulse shapes.Each transceiver circuit is element-matched to its CMUT element. The high voltage transmitter employs a 3-level pulse-shaping technique with charge recycling to enhance the power efficiency, requiring minimum off-chip components. Compared to traditional 2-level pulsers, 50% more acoustic power delivery is obtained with the same total power dissipation. The receiver is implemented with a transimpedance amplifier topology and achieves a lowest noise efficiency factor in the literature (2.1 compared to a previously reported lowest of 3.6, in unit of mP a · mW/Hz). A source follower stage is specially designed to combine the analog outputs of receivers in parallel, improving output SNR as parallelization increases and offering flexibility for imaging algorithm design. Lastly, fault-tolerance is incorporated into the transceiver to deal with faulty elements within the 2D MEMS transducer array, increasing yield for the system assembly.