Cone beam computed tomography (CBCT) is a standard solution for in-room image guidance in radiation therapy. It is used to evaluate and compensate for anatomopathological changes between the dose delivery plan and the fraction delivery day. CBCT is fast and versatile, but it suffers from drawbacks such as low contrast, and it requires proper calibration to derive density values. Although these limitations are even more prominent with customized in-room CBCT systems, strategies based on deep learning have shown potential in improving image quality. This article therefore presents a method for shading correction in CBCT volumes with a narrow field of view (FOV) acquired with an ad hoc in-room system. The method is based on a convolutional neural network and a novel two-step supervised training that follows the transfer learning paradigm.

Methods: We designed a U-Net convolutional neural network, trained on axial slices of corresponding CT/CBCT pairs. To improve the generalization capability of the network, we exploited two-stage learning using two distinct data sets. First, the network weights were trained using synthetic CBCT scans generated from a public data set; then, only the deepest layers of the network were trained again with real-world clinical data to fine-tune the weights. Synthetic data were generated according to real data acquisition parameters. The network takes a single grayscale volume as input and outputs the same volume with corrected shading and improved Hounsfield unit (HU) values.

Results: Evaluation was carried out with leave-one-out cross-validation, computed on 18 unique CT/CBCT pairs from six different patients in a real-world data set. Comparing original CBCT to CT and improved CBCT to CT, we obtained an average improvement of 6 dB in peak signal-to-noise ratio (PSNR) and +2% in structural similarity index measure (SSIM). The median (interquartile range, IQR) HU difference between CBCT and CT improved from 161.37 (162.54) HU to 49.41 (66.70) HU.
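As an illustration of how the PSNR and median (IQR) HU difference figures above could be computed, the following is a minimal sketch and not the authors' evaluation code. It assumes that the HU difference is the voxel-wise absolute difference between registered CBCT and CT values, and that the PSNR dynamic range is supplied by the caller. The function names `psnr` and `median_iqr_abs_diff` and the parameter `data_range` are hypothetical.

```python
import math
import statistics


def psnr(reference, test, data_range):
    """Peak signal-to-noise ratio in dB between two equally sized voxel lists.

    data_range is the assumed dynamic range of the reference image
    (an assumption of this sketch, not a value taken from the article).
    """
    if len(reference) != len(test):
        raise ValueError("reference and test must have the same length")
    # Mean squared error over all voxels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


def median_iqr_abs_diff(reference, test):
    """Median and interquartile range of the absolute voxel-wise difference."""
    if len(reference) != len(test):
        raise ValueError("reference and test must have the same length")
    diffs = [abs(r - t) for r, t in zip(reference, test)]
    # Quartiles with linear interpolation between the sample min and max.
    q1, q2, q3 = statistics.quantiles(diffs, n=4, method="inclusive")
    return q2, q3 - q1


if __name__ == "__main__":
    # Toy flattened voxel values in HU (illustrative numbers only).
    ct = [0, 100, 200, 300, 400]
    cbct = [0, 110, 220, 330, 440]
    print(psnr(ct, cbct, data_range=1000))
    print(median_iqr_abs_diff(ct, cbct))
```

In practice the voxel lists would be restricted to the region shared by both volumes, since the narrow-FOV CBCT covers less anatomy than the CT.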
The region of interest (ROI)-based HU difference was narrowed by 75% in the spongy bone (femoral head), 89% in the bladder, 85% in fat, and 83% in muscle. The improvement in contrast-to-noise ratio for these ROIs was about 67%.

GLOSSARY: CBCT, cone beam computed tomography; CNR, contrast-to-noise ratio; CT, computed tomography; CTV, clinical target volume; DIR, deformable image registration; D_r, data set containing only real images; D_s, data set containing synthetic CBCT images and real CT; FOV, field of view; FT_x, model trained with transfer learning on x blocks, where x can be 1, 2, or 3; HU, Hounsfield unit; IQR, interquartile range; LOO-CV, leave-one-out cross-validation; MAE, mean absolute error; noFT, U-Net model trained without transfer learning; pCT, planning CT; PSNR, peak signal-to-noise ratio; ROI, region of interest; SSIM, structural similarity index measure

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is prope...