Understanding Test-Time Augmentation

Kimura, Masanari

doi:10.1007/978-3-030-92185-9_46

Cited by 21 publications

(10 citation statements)

References 33 publications

“…Furthermore, a systematic approach to analyzing the outliers for correctable errors may improve measurement accuracy. We only explored simple averaging and outlier exclusion prior to averaging; other approaches to averaging could including test‐time augmentation, Bayesian inference and other more complex approaches 18,19. Finally, there may be ways of improving data acquisition.…”

Section: Discussion (mentioning)

confidence: 99%

“…We only explored simple averaging and outlier exclusion prior to averaging; other approaches to averaging could including test-time augmentation, Bayesian inference and other more complex approaches. 18,19 Finally, there may be ways of improving data acquisition. Another limitation is the absence of a true ground truth for the volume measurements, which is difficult to obtain since ADPKD kidneys are rarely removed even at the time of transplantation.…”

Section: Limitations (mentioning)

confidence: 99%

Effect of Averaging Measurements From Multiple MRI Pulse Sequences on Kidney Volume Reproducibility in Autosomal Dominant Polycystic Kidney Disease

Dev, Zhu, Sharbatdaran, et al. 2023

Magnetic Resonance Imaging

Background: Total kidney volume (TKV) is an important biomarker for assessing kidney function, especially for autosomal dominant polycystic kidney disease (ADPKD). However, TKV measurements from a single MRI pulse sequence have limited reproducibility, ± ~5%, similar to ADPKD annual kidney growth rates.

Purpose: To improve TKV measurement reproducibility on MRI by extending artificial intelligence algorithms to automatically segment kidneys on T1‐weighted, T2‐weighted, and steady state free precession (SSFP) sequences in axial and coronal planes and averaging measurements.

Study Type: Retrospective training, prospective testing.

Subjects: Three hundred ninety‐seven patients (356 with ADPKD, 41 without), 75% for training and 25% for validation, 40 ADPKD patients for testing and 17 ADPKD patients for assessing reproducibility.

Field Strength/Sequence: T2‐weighted single‐shot fast spin echo (T2), SSFP, and T1‐weighted 3D spoiled gradient echo (T1) at 1.5 and 3T.

Assessment: 2D U‐net segmentation algorithm was trained on images from all sequences. Five observers independently measured each kidney volume manually on axial T2 and using model‐assisted segmentations on all sequences and image plane orientations for two MRI exams in two sessions separated by 1–3 weeks to assess reproducibility. Manual and model‐assisted segmentation times were recorded.

Statistical Tests: Bland–Altman, Shapiro–Wilk (normality assessment), Pearson's chi‐squared (categorical variables); Dice similarity coefficient, interclass correlation coefficient, and concordance correlation coefficient for analyzing TKV reproducibility. P‐value < 0.05 was considered statistically significant.

Results: In 17 ADPKD subjects, model‐assisted segmentations of axial T2 images were significantly faster than manual segmentations (2:49 minutes vs. 11:34 minutes), with no significant absolute percent difference in TKV (5.9% vs. 5.3%, P = 0.88) between scans 1 and 2. Absolute percent differences between the two scans for model‐assisted segmentations on other sequences were 5.5% (axial T1), 4.5% (axial SSFP), 4.1% (coronal SSFP), and 3.2% (coronal T2). Averaging measurements from all five model‐assisted segmentations significantly reduced absolute percent difference to 2.5%, further improving to 2.1% after excluding an outlier.

Data Conclusion: Measuring TKV on multiple MRI pulse sequences in coronal and axial planes is practical with deep learning model‐assisted segmentations and can improve TKV measurement reproducibility more than 2‐fold in ADPKD.

Evidence Level: 2

Technical Efficacy: Stage 1

“…Patient-based accuracy, a metric aggregating predictions from multiple images of the same patient to make the final diagnosis, peaked at 60.1%, consistently outperforming image-based accuracy which peaked at 51.0%. This aggregation in patient-based accuracy can be seen as a process similar to test time augmentation [22,23], where predictions from augmented images are combined for the final decision, thereby improving overall accuracy.…”

Section: Discussion (mentioning)

confidence: 99%
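
The combination step described in the statement above can be illustrated with a minimal sketch, assuming simple averaging of class probabilities over a few flipped copies of one image. The names `toy_predict`, `tta_predict`, `hflip`, and `vflip` are hypothetical, and the toy classifier is a stand-in for a trained model, not the method used in any of the cited papers.

```python
import math

def identity(image):
    # Leave the image unchanged (the unaugmented view).
    return image

def hflip(image):
    # Horizontal flip: reverse every row.
    return [row[::-1] for row in image]

def vflip(image):
    # Vertical flip: reverse the order of the rows.
    return image[::-1]

def toy_predict(image):
    # Hypothetical stand-in for a trained classifier: returns two class
    # probabilities based on how much brighter the left half is than the right.
    half = len(image[0]) // 2
    left = [v for row in image for v in row[:half]]
    right = [v for row in image for v in row[half:]]
    score = sum(left) / len(left) - sum(right) / len(right)
    p = 1.0 / (1.0 + math.exp(-score))
    return [p, 1.0 - p]

def tta_predict(predict_fn, image, augmentations):
    # Predict on every augmented copy, then average the probabilities per class.
    preds = [predict_fn(aug(image)) for aug in augmentations]
    n = len(preds)
    return [sum(p[k] for p in preds) / n for k in range(len(preds[0]))]

# A 2x2 image whose left column is bright (illustrative data only).
image = [[1.0, 0.0], [1.0, 0.0]]
avg = tta_predict(toy_predict, image, [identity, hflip, vflip])
print(avg)
```

Because the toy classifier is sensitive to horizontal flips, the averaged probability for the first class sits between the values predicted for the original and the flipped view. Averaging is only one possible aggregation rule; voting or weighted schemes are alternatives.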

Ensemble of Self-supervised Learning Methods for Robust Skin Disease Image Diagnosis Leveraging Unlabeled Data

Kojima, Tadokoro, Kinoshita, et al. 2023

Preprint

Deep learning technologies have led to remarkable improvements in medical image analysis, but the collection and annotation of medical data remain challenging. This study leverages self-supervised learning and unlabeled images from the National Skin Disease Database of Japan (NSDD) to enhance skin disease classification. By generating pre-trained models using three self-supervised learning methods, and comparing them to a baseline pre-trained model on ImageNet, we found that pre-training with unlabeled images increased transfer learning performance by 1-2% on labeled images in the NSDD. We also demonstrated the effectiveness of the pre-trained models across four public dermatological image datasets. We further implemented a hybrid pre-training approach that combines self-supervised and supervised learning, and observed accuracy improvements in certain datasets. Since the optimal strategy is task and dataset-specific, we proposed an ensemble of transfer learned models from diverse pre-training methods for robust classification. The ensemble approach consistently enhanced accuracy across various datasets, with an improvement by up to 8.3% from the baseline, confirming the effectiveness of self-supervised pre-training with unlabeled dermatological images even when the images are mainly from one ethnic group.

“…The imbalance between malignant and benign classes poses a significant challenge, potentially diminishing the effectiveness of classification models. We resort to the test-time augmentation (TTA) [37] technique to mitigate the imbalance between classes and achieve a more effective balance. This approach is applied to training, validation, and test sets, focusing on the class of malignant lesions.…”

Section: Experiments and Evaluation Metrics (mentioning)

confidence: 99%

Aspects of Lighting and Color in Classifying Malignant Skin Cancer with Deep Learning

Santos, Aires, Veras 2024

Applied Sciences

Malignant skin cancers are common in emerging countries, with excessive sun exposure and genetic predispositions being the main causes. Variations in lighting and color, resulting from the diversity of devices and lighting conditions during image capture, pose a challenge for automated diagnosis through digital images. Deep learning techniques emerge as promising solutions to improve the accuracy of identifying malignant skin lesions. This work aims to investigate the impact of lighting and color correction methods on automated skin cancer diagnosis using deep learning architectures, focusing on the relevance of these characteristics for accuracy in identifying malignant skin cancer. The developed methodology includes steps for hair removal, lighting, and color correction, defining the region of interest, and classification using deep neural network architectures. We employed deep learning techniques such as LCDPNet, LLNeRF, and DSN for lighting and color correction, which still need to be tested in this context. The results emphasize the importance of image preprocessing, especially in lighting and color adjustments, where the best results show an accuracy increase of between 3% and 4%. We observed that different deep neural network architectures react variably to lighting and color corrections. Some architectures are more sensitive to variations in these characteristics, while others are more robust. Advanced lighting and color correction can thus significantly improve the accuracy of malignant skin cancer diagnosis.
