Introduction: To achieve the highest diagnostic accuracy of AI services in radiology, they must be tested and validated on datasets that reflect the class balance of various abnormalities. Creating such datasets requires information on how frequently abnormal findings occur in routine clinical practice.
Objective: To establish the frequency of chest X-ray abnormalities using big data from the healthcare system of a Russian metropolis.
Materials and methods: We conducted an observational, multicenter, retrospective sample study. A total of 562,077 chest X-ray reports dated from February 18, 2021 to June 11, 2021 were retrieved from the Unified Radiological Information Service of the Unified Medical Information Analysis System of the city of Moscow, then analyzed and automatically labeled using the Medlabel tool. The results were processed in Microsoft Excel and with Python 3.9. Group differences were assessed using the chi-square test.
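The chi-square comparison of groups can be illustrated with a minimal sketch of Pearson's chi-square test for a 2x2 contingency table, for example reports with versus without a given finding in two report groups. The abstract does not say which variant of the test was used, so this sketch assumes the uncorrected Pearson statistic with one degree of freedom. The function name `chi_square_2x2` and the counts in the usage lines are hypothetical and are not the study's data.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Assumed table layout:
                    finding present   finding absent
        group 1            a                b
        group 2            c                d

    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    if 0 in row_totals + col_totals:
        raise ValueError("every row and column total must be non-zero")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of group and finding.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, chi2 = Z**2, so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts for illustration only (not taken from the study):
# group 1: 800 reports with the finding, 200 without
# group 2: 430 reports with the finding, 570 without
chi2, p = chi_square_2x2(800, 200, 430, 570)
print(f"chi-square = {chi2:.2f}, p = {p:.3g}")
```

With these illustrative counts the statistic is about 289.09 and the p-value is far below 0.001. SciPy's `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so its statistic would be slightly smaller for the same table.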
Results: Among all analyzed reports, cardiomegaly was the most frequent abnormal finding (12.23 %), while no other abnormality exceeded 3.0 %. Of all abnormal chest X-rays, 79.60 % showed a single abnormality, and cardiomegaly accounted for 80.78 % of these cases. Among reports with two or more abnormal findings, cardiomegaly was detected in only 43.36 % of cases, whereas opacities (64.98 %) and infiltration/consolidation (64.50 %) prevailed.
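The results rely on three different denominators: all reports, reports with a single abnormality, and reports with two or more abnormalities. A short sketch can make this stratification concrete. It assumes each labeled report is represented as a set of abnormality labels, with an empty set for a normal study; the label strings, function names, and toy reports are hypothetical and do not reproduce the Medlabel output or the study's data. Because one report can carry several labels, per-label percentages within the multiple-abnormality group can sum to more than 100 %, as the opacity and infiltration/consolidation figures above do.

```python
from collections import Counter


def share(count: int, total: int) -> float:
    """Percentage of `count` in `total`, rounded to 2 decimals (0.0 if total is 0)."""
    return round(100 * count / total, 2) if total else 0.0


def stratified_frequencies(reports: list[set[str]]) -> dict:
    """Label frequencies overall and stratified by the number of findings per report."""
    overall, single, multi = Counter(), Counter(), Counter()
    n_single = n_multi = 0
    for labels in reports:
        overall.update(labels)  # denominator later: all reports
        if len(labels) == 1:
            n_single += 1
            single.update(labels)  # denominator: single-abnormality reports
        elif len(labels) >= 2:
            n_multi += 1
            multi.update(labels)  # denominator: multiple-abnormality reports
    n_abnormal = n_single + n_multi
    return {
        "abnormal_share": share(n_abnormal, len(reports)),
        "single_share_of_abnormal": share(n_single, n_abnormal),
        "overall": {k: share(v, len(reports)) for k, v in sorted(overall.items())},
        "single": {k: share(v, n_single) for k, v in sorted(single.items())},
        "multi": {k: share(v, n_multi) for k, v in sorted(multi.items())},
    }


# Hypothetical toy data: 5 normal reports and 5 abnormal ones.
reports = [set() for _ in range(5)] + [
    {"cardiomegaly"},
    {"cardiomegaly"},
    {"opacity"},
    {"opacity", "infiltration"},
    {"cardiomegaly", "opacity", "infiltration"},
]
result = stratified_frequencies(reports)
print(result["single"])  # {'cardiomegaly': 66.67, 'opacity': 33.33}
print(result["multi"])   # {'cardiomegaly': 50.0, 'infiltration': 100.0, 'opacity': 100.0}
```

In this toy example cardiomegaly dominates the single-abnormality group but not the multiple-abnormality group, which mirrors the kind of shift the results describe.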
Conclusions: The proportion of abnormal chest X-rays was 16.7 %. Cardiomegaly was the most frequent abnormality, followed by focal pulmonary opacity and infiltration/consolidation. The frequency of certain abnormalities differed significantly between studies with a single abnormal finding and those with two or more, which should be taken into account when training and testing AI services.