HaN‐Seg: The head and neck organ‐at‐risk CT and MR segmentation dataset

Podobnik, Gašper; Strojan, Primož; Peterlin, Primož; Ibragimov, Bulat; Vrtovec, Tomaž

doi:10.1002/mp.16197

Cited by 22 publications (17 citation statements)

References 45 publications


“…15,16 The baseline auto-segmentation experiments and results, performed and obtained for the images used in this study, 53 indicate that there is still room for improvement that can be leveraged by applying custom solutions, for example, tailored CT and MR modality feature fusion module techniques. 54 Our study is not without limitations. First, although observers were asked to mimic clinical practice, contouring was performed retrospectively and the observers were aware that their results would not be used for RT planning. Second, there were only two contour sets available for each CT and MR image, whereas more contours, preferably from multiple institutions, would normally be required for a more reliable variability analysis. Finally, the variability analysis was performed by comparing the obtained contours with each other, whereas a consensus in the form of ground truth contours would represent a better comparison reference.…”

Section: Implications for Auto-segmentation (mentioning)

confidence: 84%

“…[13][14][15][16] On the other hand, automated contouring (i.e., automated segmentation, auto-segmentation) performed by computer-assisted algorithms 17 has witnessed a revival with the introduction and integration of artificial intelligence approaches, such as deep learning, [18][19][20][21][22][23][24][25][26] which has outperformed the previously established atlas-based auto-segmentation. 27 As a result, computational challenges were organized to evaluate the quality of auto-segmentation results, 28 and several datasets were made publicly available for benchmarking different auto-segmentation methodologies 20,[28][29][30][31] and evaluating their clinical acceptability. 32 However, even with sophisticated auto-segmentation approaches, manual contouring is still the method of choice for evaluating and benchmarking the performance of the developed algorithms.…”

Section: Introduction (mentioning)

confidence: 99%

“…The MR modality has long been recognized as valuable for contouring OARs in the HaN region [45][46][47][48]; however, to the best of our knowledge, the accuracy and consistency of the resulting OAR contours have not yet been objectively evaluated. In this study, we therefore analyze the interobserver and intermodality variability of manual contouring of up to 31 OARs in the HaN region, performed by observers with different levels of experience from CT and MR images of the same patients. Besides providing valuable insights into the levels of both interobserver and intermodality variability from the perspective of manual OAR contouring, the obtained results can also be viewed as a baseline for an objective evaluation of methods for auto-segmentation of OARs in the HaN region, 31 which have been rapidly evolving during the past decade due to the integration of artificial intelligence, and have received a considerable boost in performance due to the advances in deep learning. [18][19][20][21][22][23][24][25][26] FIGURE 1 Example of a CT (top row) and MR (bottom row) image of the same patient from the devised cohort, displayed in mid-axial (left), mid-coronal (middle), and mid-sagittal (right) cross-sections.…”

Section: Introduction (mentioning)

confidence: 99%

“…In this study, we therefore analyze the interobserver and intermodality variability of manual contouring of up to 31 OARs in the HaN region, performed by observers with different levels of experience from CT and MR images of the same patients. Besides providing valuable insights into the levels of both interobserver and intermodality variability from the perspective of manual OAR contouring, the obtained results can also be viewed as a baseline for an objective evaluation of methods for auto‐segmentation of OARs in the HaN region, 31 which have been rapidly evolving during the past decade due to the integration of artificial intelligence, and have received a considerable boost in performance due to the advances in deep learning 18–26 …”

Section: Introduction (mentioning)

confidence: 99%

vOARiability: Interobserver and intermodality variability analysis in OAR contouring from head and neck CT and MR images

Podobnik, Ibragimov, Peterlin et al. 2024

Medical Physics

Self Cite

Background: Accurate and consistent contouring of organs‐at‐risk (OARs) from medical images is a key step of radiotherapy (RT) cancer treatment planning. Most contouring approaches rely on computed tomography (CT) images, but the integration of the complementary magnetic resonance (MR) modality is highly recommended, especially from the perspective of OAR contouring, synthetic CT and MR image generation for MR‐only RT, and MR‐guided RT. Although MR has been recognized as valuable for contouring OARs in the head and neck (HaN) region, the accuracy and consistency of the resulting contours have not yet been objectively evaluated.

Purpose: To analyze the interobserver and intermodality variability in contouring OARs in the HaN region, performed by observers with different levels of experience from CT and MR images of the same patients.

Methods: In the final cohort of 27 CT and MR images of the same patients, contours of up to 31 OARs were obtained by a radiation oncology resident (junior observer, JO) and a board‐certified radiation oncologist (senior observer, SO). The resulting contours were then evaluated in terms of interobserver variability, characterized as the agreement between observers (JO and SO) when contouring OARs in a selected modality (CT or MR), and intermodality variability, characterized as the agreement between modalities (CT and MR) when OARs were contoured by a selected observer (JO or SO), both measured by the Dice coefficient (DC) and the 95‐percentile Hausdorff distance (HD95).

Results: Across all OARs, the mean (± standard deviation) interobserver variability was 69.0 ± 20.2% in terms of DC and 5.1 ± 4.1 mm in terms of HD95, while the mean intermodality variability was 61.6 ± 19.0% and 6.1 ± 4.3 mm, respectively. Statistically significant differences were found only for specific OARs. The MR‐to‐CT image registration resulted in a mean target registration error of 1.7 ± 0.5 mm, which was considered valid for the analysis of intermodality variability.

Conclusions: The contouring variability was, in general, similar for both image modalities, and experience did not considerably affect the contouring performance. However, the results indicate that an OAR that is difficult to contour remains difficult regardless of whether it is contoured in the CT or MR image, and that observer experience may be an important factor for OARs that are deemed difficult to contour. Several of the differences in the resulting variability can also be attributed to adherence to guidelines, especially for OARs with poor visibility or without distinctive boundaries in either CT or MR images. Although considerable contouring differences were observed for specific OARs, it can be concluded that almost all OARs can be contoured with a similar degree of variability in either the CT or MR modality, which works in favor of MR images from the perspective of MR‐only and MR‐guided RT.
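The agreement measures named in this abstract, the Dice coefficient (DC) and the 95-percentile Hausdorff distance (HD95), together with the target registration error (TRE), can be illustrated with a minimal pure-Python sketch. It assumes each contour has been rasterized to a set of voxel indices, approximates the contour surface by its boundary voxels, and uses brute-force nearest-neighbour search, which is fine for illustration but far too slow for full CT or MR volumes. HD95 is taken here as the larger of the two directed 95th percentiles; implementations differ on this point (some pool both directions into one percentile), and the abstract does not state which variant was used. TRE is assumed to be the mean distance over corresponding landmark pairs. All function names are hypothetical.

```python
import math

# A binary mask is the set of (i, j, k) indices of its foreground voxels.
Voxel = tuple[int, int, int]
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def dice(a: set[Voxel], b: set[Voxel]) -> float:
    """Dice coefficient DC = 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # assumed convention: two empty contours agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def surface(mask: set[Voxel]) -> set[Voxel]:
    """Boundary voxels: foreground voxels with a 6-connected neighbour outside the mask."""
    return {v for v in mask
            if any((v[0] + dx, v[1] + dy, v[2] + dz) not in mask for dx, dy, dz in NEIGHBOURS)}


def percentile(values: list[float], q: float) -> float:
    """q-th percentile with linear interpolation between ranks (NumPy's default method)."""
    s = sorted(values)
    k = (len(s) - 1) * q / 100
    lo, hi = math.floor(k), math.ceil(k)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def hd95(a: set[Voxel], b: set[Voxel], spacing=(1.0, 1.0, 1.0)) -> float:
    """HD95 in mm: the larger of the two directed 95th percentiles of
    surface-to-surface distances (one of several HD95 variants in use)."""
    sa, sb = surface(a), surface(b)
    if not sa or not sb:
        raise ValueError("HD95 is undefined for an empty contour")

    def dist(p: Voxel, q: Voxel) -> float:
        # Euclidean distance in physical units, scaling each axis by its voxel spacing
        return math.sqrt(sum(((pi - qi) * s) ** 2 for pi, qi, s in zip(p, q, spacing)))

    d_ab = [min(dist(p, q) for q in sb) for p in sa]  # brute force: O(|sa| * |sb|)
    d_ba = [min(dist(q, p) for p in sa) for q in sb]
    return max(percentile(d_ab, 95), percentile(d_ba, 95))


def target_registration_error(fixed_pts, moved_pts) -> float:
    """Mean Euclidean distance between corresponding landmarks after registration."""
    if len(fixed_pts) != len(moved_pts) or not fixed_pts:
        raise ValueError("need the same, non-zero number of landmarks in both images")
    return sum(math.dist(p, q) for p, q in zip(fixed_pts, moved_pts)) / len(fixed_pts)


# Usage: two 4x4x4-voxel cubes, the second shifted by one voxel along the first axis.
cube_a = {(x, y, z) for x in range(0, 4) for y in range(4) for z in range(4)}
cube_b = {(x, y, z) for x in range(1, 5) for y in range(4) for z in range(4)}
print(dice(cube_a, cube_b))  # 0.75
print(hd95(cube_a, cube_b))  # 1.0
```

In practice, a distance transform (or a KD-tree) replaces the brute-force search, and the voxel spacing of each image must be passed in, since HD95 is reported in millimetres.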

“…Furthermore, we compared the prediction capability of our best experimental scenario to benchmark segmentation tools. Finally, our model was evaluated using cohorts from the HaN-Seg challenge 2023 (Podobnik et al 2023) to quantify how our model generalizes to patients acquired using different protocols and acquisition parameters.…”

Section: Introduction (mentioning)

confidence: 99%

Essential parameters needed for a U-Net-based segmentation of individual bones on planning CT images in the head and neck region using limited datasets for radiotherapy application

Yawson, Walter, Wolf et al. 2024

Phys. Med. Biol.

Objective: The field of radiotherapy is marked by a lack of datasets, even with the availability of public datasets. Our study uses a very limited dataset to provide insights into the essential parameters needed to automatically and accurately segment individual bones on planning CT images of head and neck cancer patients. Approach: The study was conducted using 30 planning CT images of real patients acquired from 5 different cohorts. Fifteen cases from 4 cohorts were randomly selected as training and validation data, while the remaining cases were used as test data. Four experimental sets were formulated to explore parameters such as background patch reduction, class-dependent augmentation, and the incorporation of a weight map into the loss function. Main Results: Our best experimental scenario resulted in a mean Dice score of 0.93 ± 0.06 for other bones (skull, mandible, scapulae, clavicles, humeri, and hyoid), 0.93 ± 0.02 for ribs, and 0.88 ± 0.03 for vertebrae on 7 test cases from the same cohorts as the training data. We compared our proposed approach to a retrained nnU-Net and obtained comparable results for vertebral bones, while outperforming it in the correct identification of the left and right instances of ribs, scapulae, humeri, and clavicles. Furthermore, we evaluated the generalization capability of our proposed model on a new cohort, where the mean Dice score was 0.96 ± 0.10 for other bones, 0.95 ± 0.07 for ribs, and 0.81 ± 0.19 for vertebrae on 8 test cases. Significance: With these insights, we take on the challenge of integrating an automatic and accurate bone segmentation tool into the clinical routine of radiotherapy despite limited training datasets.

Transfer learning for auto‐segmentation of 17 organs‐at‐risk in the head and neck: Bridging the gap between institutional and public datasets

Clark, Hardcastle, Johnston et al. 2024

Medical Physics

Background: Segmentation of organs‐at‐risk (OARs) in the head and neck (HN) on computed tomography (CT) images is a time‐consuming component of the radiation therapy pipeline that suffers from inter‐observer variability. Deep learning (DL) has shown state‐of‐the‐art results in CT auto‐segmentation, with larger and more diverse datasets showing better segmentation performance. Institutional CT auto‐segmentation datasets have historically been small (n < 50) due to the time required for manual curation of images and anatomical labels. Recently, large public CT auto‐segmentation datasets (n > 1000 aggregated) have become available through online repositories such as The Cancer Imaging Archive. Transfer learning is a technique applied when training samples are scarce but a large dataset from a closely related domain is available.

Purpose: The purpose of this study was to investigate whether a large public dataset could be used in place of an institutional dataset (n > 500), or to augment performance via transfer learning, when building HN OAR auto‐segmentation models for institutional use.

Methods: Auto‐segmentation models were trained on a large public dataset (public models) and a smaller institutional dataset (institutional models). The public models were fine‐tuned on the institutional dataset using transfer learning (transfer models). We assessed both public model generalizability and transfer model performance by comparison with institutional models. Additionally, the effect of institutional dataset size on both transfer and institutional models was investigated. All DL models used a high‐resolution, two‐stage architecture based on the popular 3D U‐Net. Model performance was evaluated using five geometric measures: the Dice similarity coefficient (DSC), surface DSC, 95th percentile Hausdorff distance, mean surface distance (MSD), and added path length.

Results: For a small subset of OARs (left/right optic nerve, spinal cord, left submandibular), the public models performed significantly better (p < 0.05) than, or showed no significant difference to, the institutional models under most of the metrics examined. For the remaining OARs, the public models were inferior to the institutional models, although performance differences were small (DSC ≤ 0.03, MSD < 0.5 mm) for seven OARs (brainstem, left/right lens, left/right parotid, mandible, right submandibular). The transfer models performed significantly better than the institutional models for seven OARs (brainstem, right lens, left/right optic nerve, left/right parotid, spinal cord) with a small margin of improvement (DSC ≤ 0.02, MSD < 0.4 mm). When the number of institutional training samples was limited, public and transfer models outperformed the institutional models for most OARs (brainstem, left/right lens, left/right optic nerve, left/right parotid, spinal cord, and left/right submandibular).

Conclusion: Training auto‐segmentation models with public data alone was suitable for a small number of OARs. Using only public data incurred a small performance deficit for most other OARs when compared with institutional data alone, but may be preferable over time‐consuming curation of a large institutional dataset. When a large institutional dataset was available, transfer learning with models pretrained on a large public dataset provided a modest performance improvement for several OARs. When the number of institutional samples was limited, using the public dataset alone, or as a pretrained model, was beneficial for most OARs.
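Two of the surface-based measures listed in this abstract, mean surface distance (MSD) and surface DSC, can likewise be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: masks are sets of voxel indices, surfaces are approximated by boundary voxels (so this surface DSC counts voxels instead of weighting by surface area, as the original surface-DSC definition does), MSD is taken as the mean of the two directed average distances (definitions vary), and the tolerance is a free parameter, since the abstract does not give the tolerances used. All names are hypothetical.

```python
import math

# A binary mask is the set of (i, j, k) indices of its foreground voxels.
Voxel = tuple[int, int, int]
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def surface(mask: set[Voxel]) -> set[Voxel]:
    """Boundary voxels: foreground voxels with a 6-connected neighbour outside the mask."""
    return {v for v in mask
            if any((v[0] + dx, v[1] + dy, v[2] + dz) not in mask for dx, dy, dz in NEIGHBOURS)}


def directed_distances(src: set[Voxel], dst: set[Voxel], spacing) -> list[float]:
    """For every voxel in src, the distance in mm to the nearest voxel in dst (brute force)."""
    if not src or not dst:
        raise ValueError("surface distances are undefined for an empty contour")

    def dist(p: Voxel, q: Voxel) -> float:
        # Euclidean distance in physical units, scaling each axis by its voxel spacing
        return math.sqrt(sum(((pi - qi) * s) ** 2 for pi, qi, s in zip(p, q, spacing)))

    return [min(dist(p, q) for q in dst) for p in src]


def mean_surface_distance(pred, ref, spacing=(1.0, 1.0, 1.0)) -> float:
    """Symmetric MSD: average of the two directed mean surface distances (one common variant)."""
    sp, sr = surface(pred), surface(ref)
    d_pr = directed_distances(sp, sr, spacing)
    d_rp = directed_distances(sr, sp, spacing)
    return 0.5 * (sum(d_pr) / len(d_pr) + sum(d_rp) / len(d_rp))


def surface_dice(pred, ref, tolerance_mm: float, spacing=(1.0, 1.0, 1.0)) -> float:
    """Share of boundary voxels of both masks lying within tolerance_mm of the other surface."""
    sp, sr = surface(pred), surface(ref)
    d_pr = directed_distances(sp, sr, spacing)
    d_rp = directed_distances(sr, sp, spacing)
    within = sum(d <= tolerance_mm for d in d_pr) + sum(d <= tolerance_mm for d in d_rp)
    return within / (len(d_pr) + len(d_rp))


# Usage: two 4x4x4-voxel cubes, the second shifted by one voxel along the first axis.
cube_a = {(x, y, z) for x in range(0, 4) for y in range(4) for z in range(4)}
cube_b = {(x, y, z) for x in range(1, 5) for y in range(4) for z in range(4)}
print(mean_surface_distance(cube_a, cube_b))  # ≈ 0.357 (= 5/14)
print(surface_dice(cube_a, cube_b, tolerance_mm=1.0))  # 1.0
```

Unlike volumetric DSC, surface DSC rewards contours whose boundaries fall within a clinically tolerable distance of each other, which is why it is often reported alongside DSC and MSD.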
