While micro-CT systems are instrumental in preclinical research, clinical micro-CT imaging has long been desired, with cochlear implantation as a primary application. The structural details of the cochlear implant and the temporal bone require a significantly higher image resolution than the roughly 0.2 mm provided by current medical CT scanners. In this paper, we propose a clinical micro-CT (CMCT) system design that integrates conventional spiral cone-beam CT, contemporary interior tomography, deep learning techniques, and the technologies of a micro-focus X-ray source, a photon-counting detector (PCD), and robotic arms for ultrahigh-resolution localized tomography of a freely selected volume of interest (VOI) at a minimized radiation dose level. The whole system consists of a standard CT scanner for a clinical CT exam and VOI specification, and a robotic micro-CT scanner for a local scan of high spatial and spectral resolution at minimized radiation dose. The prior information from the global scan is also fully utilized for background compensation of the local scan data, enabling accurate and stable VOI reconstruction. Our results and analysis show that the proposed hybrid reconstruction algorithm delivers accurate high-resolution local reconstruction and is insensitive to misalignment of the isocenter position and initial view angle, and to scale mismatch in the data/image registration. These findings demonstrate the feasibility of our system design. We envision that deep learning techniques can be leveraged for optimized imaging performance.
By synergistically combining high-resolution imaging, high dose efficiency, and low system cost, our proposed CMCT system holds great promise for temporal bone imaging as well as various other clinical applications.

INDEX TERMS Clinical micro-CT, Deep learning, High-resolution imaging, Interior tomography, Photon-counting detector, Robotic arms, Temporal bone imaging, X-ray computed tomography

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.