A murine model of myelofibrosis in the tibia was used in a co-clinical trial to evaluate segmentation methods for applying image-based biomarkers to assess disease status. The dataset (32 mice with 157 3D MRI scans, including 49 test–retest pairs scanned on consecutive days) was split into approximately 70% training, 10% validation, and 20% test subsets. Two expert annotators (EA1 and EA2) performed manual segmentations of the mouse tibia (EA1: all data; EA2: test and validation). Attention U-net (A-U-net) model performance was assessed for accuracy with respect to the EA1 reference using the average Jaccard index (AJI), volume intersection ratio (AVI), volume error (AVE), and Hausdorff distance (AHD) for four training scenarios: full training, two half-splits, and single-mouse subsets. The repeatability of computer versus expert segmentations for tibia volume of test–retest pairs was assessed by the within-subject coefficient of variation (%wCV). A-U-net models trained on the full and half-split training sets achieved similar average accuracy (with respect to EA1 annotations) on the test set: AJI = 83–84%, AVI = 89–90%, AVE = 2–3%, and AHD = 0.5–0.7 mm, exceeding EA2 accuracy: AJI = 81%, AVI = 83%, AVE = 14%, and AHD = 0.3 mm. The A-U-net model repeatability (%wCV [95% CI]: 3 [2, 5]%) was notably better than that of the expert annotators (EA1: 5 [4, 9]%; EA2: 8 [6, 13]%). The developed deep learning model effectively automates murine bone marrow segmentation with accuracy comparable to human annotators and substantially improved repeatability.
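The abstract above reports repeatability as %wCV over test–retest pairs but does not state the formula used. As a minimal sketch, assuming one common definition (the root mean square of the per-pair coefficient of variation, with the two-sample variance equal to half the squared difference), it could be computed as follows. The function name `within_subject_cv` and the example volumes are hypothetical and not taken from the study.

```python
import math


def within_subject_cv(pairs):
    """Return %wCV for a list of (test, retest) measurements.

    Assumed definition (not confirmed by the source):
    wCV = sqrt(mean_i(variance_i / mean_i**2)), expressed in percent.
    """
    if not pairs:
        raise ValueError("need at least one test-retest pair")
    ratios = []
    for first, second in pairs:
        mean = (first + second) / 2.0
        # Sample variance of two values reduces to (difference squared) / 2.
        variance = (first - second) ** 2 / 2.0
        ratios.append(variance / mean ** 2)
    return 100.0 * math.sqrt(sum(ratios) / len(ratios))


# Hypothetical tibia volumes (arbitrary units) for three test-retest pairs.
example_pairs = [(9.0, 11.0), (10.0, 10.0), (20.0, 20.0)]
wcv_percent = within_subject_cv(example_pairs)
```

Identical test and retest volumes contribute zero to the sum, so a lower %wCV indicates more repeatable volume measurements.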
We are developing deep learning models for the segmentation of mouse tibia in MRI scans by utilizing three U-Net architectures (Attention, Inception, and basic U-Net) on a data set of 32 mice with 158 MRI scans. The data set was split into training (23 mice, 108 scans), validation (3 mice, 17 scans), and test (6 mice, 33 scans) sets. Two expert annotators (EA1 and EA2) provided manual 3D segmentations of the tibia on the MRI scans. EA1 provided outlines on all MRI scans, which were used as the reference for the training, validation, and testing of the U-Net models. EA2 provided outlines on the validation and test sets, which were used to assess inter-observer reference variability. Model performance was evaluated using the average Jaccard index (%AJI), average volume intersection ratio (%AVI), average volume error (%AVE), and average Hausdorff distance (AHD, mm). For the test set, the %AJI with reference to EA1 was 83.45 ± 5.11 for the Attention U-Net, 83.05 ± 6.21 for the Inception U-Net, and 83.38 ± 5.36 for the basic U-Net. The %AJI was 80.70 ± 2.91 for EA1 versus EA2 and 79.70 ± 6.28 for the Attention U-Net versus EA2. The variability between the U-Net models and the EA1 and EA2 references was similar to the variability between EA1 and EA2. All three U-Net architectures achieved similar performance, with the Attention U-Net performing marginally better.
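The four metrics named above can be illustrated with a short sketch. Only the Jaccard index has a single standard definition; the formulas used here for the volume intersection ratio (overlap divided by reference volume) and the volume error (signed volume difference divided by reference volume) are assumptions, since the abstract does not define them. The Hausdorff distance is computed by brute force over all voxel centers, which is practical only for small masks and may differ from a surface-based variant. Masks are represented as sets of integer voxel coordinates, and all function names and the `spacing` parameter are hypothetical.

```python
import math


def jaccard_index(seg, ref):
    """Intersection over union of two voxel sets."""
    union = seg | ref
    if not union:
        raise ValueError("both masks are empty")
    return len(seg & ref) / len(union)


def volume_intersection_ratio(seg, ref):
    """Assumed definition: overlap volume divided by reference volume."""
    return len(seg & ref) / len(ref)


def volume_error(seg, ref):
    """Assumed definition: signed volume difference relative to reference."""
    return (len(seg) - len(ref)) / len(ref)


def hausdorff_distance(seg, ref, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance over voxel centers, in spacing units."""

    def distance(p, q):
        # Euclidean distance after scaling each axis by the voxel spacing.
        return math.sqrt(sum(((a - b) * s) ** 2 for a, b, s in zip(p, q, spacing)))

    def directed(source, target):
        # Largest distance from any source voxel to its nearest target voxel.
        return max(min(distance(p, q) for q in target) for p in source)

    return max(directed(seg, ref), directed(ref, seg))


# Hypothetical one-dimensional "masks" embedded in 3D, shifted by one voxel.
seg_mask = {(0, 0, 0), (0, 0, 1), (0, 0, 2)}
ref_mask = {(0, 0, 1), (0, 0, 2), (0, 0, 3)}

jaccard = jaccard_index(seg_mask, ref_mask)
overlap_ratio = volume_intersection_ratio(seg_mask, ref_mask)
relative_error = volume_error(seg_mask, ref_mask)
hausdorff = hausdorff_distance(seg_mask, ref_mask, spacing=(1.0, 1.0, 0.5))
```

Averaging each metric over all scans in the test set would give the AJI, AVI, AVE, and AHD values, under the assumed definitions.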