Background
Recent studies have reported promising outcomes of non-operative treatment for uncomplicated appendicitis; however, preoperative prediction of complicated appendicitis remains challenging. We developed prediction models that incorporate fat stranding (FS), a computed tomography finding commonly observed in perforated appendicitis.
Material and methods
We reviewed the data of 402 consecutive patients with confirmed acute appendicitis from our prospective registry. Multivariate logistic regression was used to select clinical and radiographic predictors of complicated acute appendicitis, yielding model 1 (backward elimination) and model 2 (stepwise selection). We compared the c statistics of our models with those of the scoring systems developed by Bröker et al. (in J Surg Res 176(1):79–83. https://doi.org/10.1016/j.jss.2011.09.049, 2012), Imaoka et al. (in World J Emerg Surg 11(1):1–5, 2016), Khan et al. (in Cureus. https://doi.org/10.7759/cureus.4765, 2019), Kim et al. (in Ann Coloproctol 31(5):192, 2015), Kang et al. (in Medicine 98(23): e15768, 2019), Atema et al. (in Br J Surg 102(8):979–990. https://doi.org/10.1002/bjs.9835, 2015), Avanesov et al. (in Eur Radiol 28(9):3601–3610, 2018), and Kim et al. (in Abdom Radiol 46:1–12, 2020). Finally, we evaluated our models using the integrated discrimination improvement (IDI) test.
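For readers unfamiliar with the two comparison metrics named above, the following is a minimal sketch of how a c statistic and an IDI point estimate are commonly computed from predicted probabilities. It is not the authors' analysis code: the function names and the toy data are hypothetical, and the sketch does not cover the significance test behind the reported P values.

```python
from itertools import product
from statistics import mean


def c_statistic(y, p):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted probability; ties count as half a concordant pair."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    concordant = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e, n in product(events, nonevents)
    )
    return concordant / (len(events) * len(nonevents))


def idi(y, p_old, p_new):
    """IDI point estimate: the difference in discrimination slopes
    (mean probability in events minus mean probability in non-events)
    between the new and the reference model."""
    def discrimination_slope(p):
        in_events = mean(pi for yi, pi in zip(y, p) if yi == 1)
        in_nonevents = mean(pi for yi, pi in zip(y, p) if yi == 0)
        return in_events - in_nonevents

    return discrimination_slope(p_new) - discrimination_slope(p_old)


# Hypothetical toy data (1 = complicated appendicitis); not study data.
y = [1, 1, 0, 0, 0]
p_old = [0.6, 0.4, 0.5, 0.3, 0.2]  # reference model probabilities
p_new = [0.8, 0.6, 0.4, 0.2, 0.1]  # new model probabilities

print(round(c_statistic(y, p_old), 3))
print(round(c_statistic(y, p_new), 3))
print(round(idi(y, p_old, p_new), 3))
```

A positive IDI indicates that the new model separates events from non-events more widely, on average, than the reference model.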
Results
Among the enrolled patients, 64 (15.9%) had complicated acute appendicitis. We developed two new 10-point scoring models that include the following variables: C-reactive protein, neutrophil-to-lymphocyte ratio, and the computed tomography features of FS, ascites, and appendicolith. A cutoff score of ≥ 6 yielded a sensitivity of 82.8% and a specificity of 82.8% for model 1, and a sensitivity of 81.3% and a specificity of 82.3% for model 2, with c statistics of 0.878 (model 1) and 0.879 (model 2). Compared with the model developed by Bröker et al., which included C-reactive protein and abdominal pain duration (c statistic: 0.778), the models developed by Atema et al. (c statistic: 0.826, IDI: 5.92%, P = 0.0248), H.Y. Kim et al. (c statistic: 0.838, IDI: 13.82%, P = 0.0248), and our two models (IDI: 18.29%, P < 0.0001) demonstrated significantly higher diagnostic accuracy.
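As an illustration of how sensitivity and specificity follow from a score cutoff such as ≥ 6, the sketch below dichotomizes a list of scores and tallies the resulting confusion matrix. The function name and the scores are hypothetical and are not drawn from the study; the point weights of the actual models are not given here and are not assumed.

```python
def sensitivity_specificity(scores, labels, cutoff=6):
    """Classify a patient as 'complicated' when score >= cutoff and
    return (sensitivity, specificity) against the true labels."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores on a 10-point scale (1 = complicated appendicitis).
scores = [8, 6, 5, 7, 3, 6, 2, 1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(scores, labels, cutoff=6)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising the cutoff trades sensitivity for specificity, which is why a single operating point is reported alongside the threshold-independent c statistic.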
Conclusion
Our models and the scoring systems developed by Atema et al. and Kim et al. were validated as having high diagnostic accuracy; moreover, our models required the fewest variables.