2022
DOI: 10.21105/joss.04101
TorchMetrics - Measuring Reproducibility in PyTorch

Abstract: A main problem with reproducing machine learning publications is the variance of metric implementations across papers. A lack of standardization leads to different behavior in mechanisms such as checkpointing, learning-rate scheduling, or early stopping, which influences the reported results. For example, a complex metric such as Fréchet inception distance (FID) for synthetic image quality evaluation (Heusel et al., 2017) will differ based on the specific interpolation method used.
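The variance the abstract describes can be seen even in a metric as simple as the Dice coefficient. Below is a minimal sketch (not from the paper; both function names and the edge-case conventions are illustrative) of two plausible implementations that agree on ordinary inputs but diverge on the empty-mask edge case:

```python
# Two plausible Dice-coefficient variants that agree on typical masks
# but disagree on the empty-mask edge case -- the kind of
# implementation variance that makes results hard to reproduce.

def dice_smoothed(pred, target, eps=1e-6):
    # Variant A: additive smoothing; two empty masks score ~1.0.
    inter = sum(p * t for p, t in zip(pred, target))
    return (2 * inter + eps) / (sum(pred) + sum(target) + eps)

def dice_defined_zero(pred, target):
    # Variant B: two empty masks are defined to score 0.0.
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 2 * inter / denom if denom else 0.0

empty = [0, 0, 0, 0]
print(dice_smoothed(empty, empty))      # 1.0
print(dice_defined_zero(empty, empty))  # 0.0
```

On non-degenerate masks the two variants differ only by the smoothing term, yet a paper reporting scores on a dataset with many empty ground-truth masks would report very different numbers depending on which convention its metric library chose.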

Cited by 77 publications (29 citation statements)
References 1 publication
“…The SSIM and MS-SSIM are in the range [0, 1] and a higher value indicates a better synthetization. Both metrics are implemented using the TorchMetrics 0.10.3 library [6].…”
Section: Experiments and Results (confidence: 99%)
“…For quantitative evaluation, we compared our metric library with other widely used frameworks for machine learning and image analysis. As can be seen in Table 1, MISeval currently provides 28 metrics, which is the highest number of segmentation metrics among the analyzed frameworks: scikit-learn [6] with 18, EvaluateSegmentation from VISCERAL [7] with 13, PyMIA [8] with 12, Tensorflow [9] with 16, and TorchMetrics [10] with 12.…”
Section: Results (confidence: 99%)
“…DICE is a similarity coefficient that measures the overlap over the union of two binary masks, and Hausdorff distance measures the average distance error between ground truth and predicted masks. To compute the metrics we used the Evaluate Segmentation tool [33] and the TorchMetrics package [34]. Figure 2 shows a 2D sagittal slice of the T1w and T2w images and their corresponding low- and high-resolution brain masks. We can see that the low-resolution mask has very coarse boundaries and mistakes, such as a missed part of the cerebellum indicated by the red arrow.…”
Section: Deep-learning Model Accuracy (confidence: 99%)
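The Hausdorff distance mentioned in the excerpt above is another metric with several coexisting conventions: the classic formulation takes the worst-case nearest-neighbour distance, while averaged variants (as the excerpt's "average distance error" suggests) are more robust to outliers. A minimal sketch of the classic symmetric formulation, with masks given as foreground pixel coordinates (toy data, not from any cited paper):

```python
import math

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between two 2-D point sets.

    Classic (max-based) formulation; metric libraries also offer
    averaged variants that trade outlier sensitivity for robustness.
    """
    def directed(xs, ys):
        # Worst-case nearest-neighbour distance from xs into ys.
        return max(min(math.dist(x, y) for y in ys) for x in xs)
    return max(directed(a, b), directed(b, a))

# Two toy "masks" given as (row, col) foreground coordinates: the
# prediction misplaces one pixel two rows away from the ground truth.
gt = [(0, 0), (0, 1), (1, 0)]
pred = [(0, 0), (0, 1), (3, 0)]
print(hausdorff_distance(gt, pred))  # 2.0
```

Because a single outlier pixel dominates the max-based score, papers often report a percentile (e.g. HD95) or averaged variant instead, which is one more implementation choice that must be stated for results to be reproducible.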