Background & Aims
Liver biopsy is the reference standard for staging and grading nonalcoholic fatty liver disease (NAFLD), but histologic scoring systems are semiquantitative with marked interobserver and intraobserver variation. We used machine learning to develop fully automated software for quantification of steatosis, inflammation, ballooning, and fibrosis in biopsy specimens from patients with NAFLD and validated the technology in a separate group of patients.
Methods
We collected data from 246 consecutive patients with biopsy-proven NAFLD who were followed up in London from January 2010 through December 2016. Biopsy specimens from the first 100 patients were used to derive the algorithm, and specimens from the following 146 were used to validate it. Biopsy specimens were scored independently by pathologists using the Nonalcoholic Steatohepatitis Clinical Research Network criteria and digitized. Areas of steatosis, inflammation, ballooning, and fibrosis were annotated on biopsy specimens by 2 hepatobiliary histopathologists to facilitate machine learning. Images of biopsies from the derivation and validation sets were then analyzed by the algorithm to compute percentages of fat, inflammation, ballooning, and fibrosis, as well as the collagen proportionate area, and the results were compared with findings from pathologists’ manual annotations and conventional scoring systems.
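The abstract does not say how the percentages are calculated. As an illustration only, the sketch below assumes the algorithm yields a per-pixel label map and reports each feature as a share of all tissue pixels, which is how the collagen proportionate area is conventionally defined (collagen area divided by total tissue area). The label codes and the function name `feature_percentages` are hypothetical and do not come from the study software.

```python
from collections import Counter

# Hypothetical label codes for a per-pixel classification map (assumption).
BACKGROUND, TISSUE, FAT, INFLAMMATION, BALLOONING, COLLAGEN = range(6)

FEATURE_NAMES = {
    FAT: "fat",
    INFLAMMATION: "inflammation",
    BALLOONING: "ballooning",
    COLLAGEN: "collagen",
}


def feature_percentages(label_map):
    """Return each feature's area as a percentage of total tissue area.

    label_map is a list of rows, each a list of integer label codes.
    Every non-background pixel counts as tissue.
    """
    counts = Counter(label for row in label_map for label in row)
    tissue_pixels = sum(n for label, n in counts.items() if label != BACKGROUND)
    if tissue_pixels == 0:
        raise ValueError("label map contains no tissue pixels")
    return {
        name: 100.0 * counts.get(label, 0) / tissue_pixels
        for label, name in FEATURE_NAMES.items()
    }


# Toy 3 x 4 image: 2 background pixels, 10 tissue pixels
# (2 of them fat, 1 of them collagen).
toy_map = [
    [BACKGROUND, TISSUE, TISSUE, FAT],
    [TISSUE, TISSUE, COLLAGEN, FAT],
    [BACKGROUND, TISSUE, TISSUE, TISSUE],
]
percentages = feature_percentages(toy_map)
```

In this toy map, 10 of 12 pixels are tissue, so the 2 fat pixels give a fat percentage of 20% and the single collagen pixel gives a collagen proportionate area of 10%.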
Results
In the derivation group, results from manual annotation and the software had an intraclass correlation coefficient (ICC) of 0.97 for steatosis (95% CI, 0.95–0.99; P < .001); ICC of 0.96 for inflammation (95% CI, 0.9–0.98; P < .001); ICC of 0.94 for ballooning (95% CI, 0.87–0.98; P < .001); and ICC of 0.92 for fibrosis (95% CI, 0.88–0.96; P = .001). Percentages of fat, inflammation, ballooning, and the collagen proportionate area from the derivation group were confirmed in the validation cohort. The software identified histologic features of NAFLD with levels of interobserver and intraobserver agreement ranging from 0.95 to 0.99, higher than those of semiquantitative scoring systems, which ranged from 0.58 to 0.88. In a subgroup of paired liver biopsy specimens, quantitative analysis was more sensitive in detecting differences than the Nonalcoholic Steatohepatitis Clinical Research Network scoring system.
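The abstract does not state which form of the ICC was used. As a minimal sketch, the code below assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1), computed from the usual ANOVA mean squares. The function name `icc_2_1` is illustrative, and the ratings are a small textbook-style example (6 subjects, 4 raters), not data from this study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings is a list of rows, one per subject, each holding one score
    per rater (same number of raters in every row).
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 subjects and 2 raters, rectangular data")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns), and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative ratings: 6 subjects scored by 4 raters (not study data).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_example = icc_2_1(example)

# Two raters in perfect agreement.
icc_perfect = icc_2_1([[1, 1], [2, 2], [3, 3]])
```

Because this form penalizes systematic offsets between raters as well as random disagreement, the example ratings yield a low coefficient even though the raters rank the subjects similarly. A consistency-type ICC would be higher on the same data, which is why the choice of form matters when comparing reported values.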
Conclusions
We used machine learning to develop software to rapidly and objectively analyze liver biopsy specimens for histologic features of NAFLD. The results from the software correlate with those from histopathologists, with high levels of interobserver and intraobserver agreement. Findings were validated in a separate group of patients. This tool might be used for objective assessment of response to therapy for NAFLD in practice and clinical trials.