Background: The extent of lung involvement in Coronavirus Disease 2019 (COVID-19) pneumonia, quantified on computed tomography (CT), is an established biomarker for prognosis and guides clinical decision-making. The clinical standard is semi-quantitative scoring of lung involvement by an experienced reader. We aim to compare the performance of automated deep-learning- and threshold-based methods with manual semi-quantitative lung scoring. Further, we aim to determine an optimal attenuation threshold for quantifying involved lung in COVID-19 pneumonia on chest CT, using a multi-center dataset.
Methods: In total, 250 patients were included: 50 consecutive patients with RT-PCR-confirmed COVID-19 from our local institutional database and another 200 patients from four international datasets (n=50 each). Lung involvement was scored semi-quantitatively by three experienced radiologists according to the established chest CT score (CCS), which ranges from 0 to 25. Inter-rater reliability was assessed with the intraclass correlation coefficient (ICC). Deep-learning-based segmentation of ground-glass opacities and consolidation was obtained with the CT Pulmo Auto Results prototype plugin on IntelliSpace Discovery (Philips Healthcare, The Netherlands). Threshold-based segmentation of involved lung was implemented using an open-source tool for whole-lung segmentation in the presence of severe pathologies (R231CovidWeb, Hofmanninger et al., 2020), followed by quantitative assessment of lung attenuation. The best threshold was determined by training and testing a linear regression between deep-learning- and threshold-based results in a five-fold cross-validation.
Results: Median CCS among the 250 evaluated patients was 10 [6-15]. Inter-rater reliability of the CCS was excellent [ICC 0.97 (0.97-0.98)]. The best attenuation threshold for identification of involved lung was −522 HU. While the relationship between deep-learning- and threshold-based quantification was linear and strong (r² deep-learning vs. threshold = 0.84), both automated quantification methods translated to the semi-quantitative CCS in a nonlinear fashion, with an increasing slope towards higher CCS (r² deep-learning vs. CCS = 0.80, r² threshold vs. CCS = 0.63).
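The threshold-based quantification and threshold selection described in the Methods could be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function names (`involved_fraction`, `cross_validated_r2`, `select_threshold`) are hypothetical, the lung mask is assumed to come from a separate whole-lung segmentation step, and it is assumed that lung voxels with attenuation above the threshold count as involved. Scoring each candidate threshold by its mean out-of-fold r² is likewise an assumed selection criterion.

```python
import numpy as np


def involved_fraction(hu, lung_mask, threshold_hu):
    """Fraction of lung voxels whose attenuation exceeds threshold_hu.

    Assumption: voxels denser than the threshold count as involved lung.
    """
    lung_values = np.asarray(hu)[np.asarray(lung_mask, dtype=bool)]
    if lung_values.size == 0:
        raise ValueError("lung mask selects no voxels")
    return float(np.mean(lung_values > threshold_hu))


def cross_validated_r2(x, y, n_folds=5, seed=0):
    """Mean out-of-fold r^2 of the linear regression y ~ slope * x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.random.default_rng(seed).permutation(len(x))
    scores = []
    for test_idx in np.array_split(order, n_folds):
        train_idx = np.setdiff1d(order, test_idx)
        # Fit on the training folds only, then evaluate on the held-out fold.
        slope, intercept = np.polyfit(x[train_idx], y[train_idx], 1)
        predicted = slope * x[test_idx] + intercept
        ss_res = np.sum((y[test_idx] - predicted) ** 2)
        ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)
    return float(np.mean(scores))


def select_threshold(lung_values_per_patient, reference_fraction, candidates_hu):
    """Return (best_threshold, scores) over the candidate thresholds.

    lung_values_per_patient: one array of lung-voxel HU values per patient.
    reference_fraction: per-patient involved fraction from the reference
    method (here standing in for the deep-learning segmentation).
    """
    scores = {}
    for threshold in candidates_hu:
        fractions = [
            float(np.mean(np.asarray(values) > threshold))
            for values in lung_values_per_patient
        ]
        scores[threshold] = cross_validated_r2(fractions, reference_fraction)
    best = max(scores, key=scores.get)
    return best, scores
```

In practice the candidate list would be a fine grid of HU values, and the per-patient reference fractions would be taken from the deep-learning segmentation rather than simulated.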