“…• It could reveal unknown failure modes of the artificial intelligence system, such as a tendency to produce higher error rates in certain populations, diseases, or settings, or in the presence of specific input data characteristics. 9,11,18
• Before deployment, it can be used to derive a measurable adverse event rate, which can inform how closely safety monitoring and post-deployment auditing should be performed. It can also provide a baseline measurement against which ongoing performance can be benchmarked.…”
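
As a rough illustration of the two points quoted above, the following sketch tallies an error rate per subgroup and attaches a Wilson score interval to each, so that a pre-deployment rate could serve as a baseline for later comparison. It is not taken from the quoted source: the function names, the `(group, is_error)` record format, and the choice of the Wilson interval are all assumptions made for this example.

```python
from collections import defaultdict
from math import sqrt


def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def error_rates_by_group(records):
    """Return {group: (error_rate, (ci_low, ci_high))}.

    `records` is a hypothetical iterable of (group, is_error) pairs, where
    `group` labels a population, disease, or setting and `is_error` is True
    when the system's output was judged wrong for that case.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [errors, total]
    for group, is_error in records:
        counts[group][0] += int(is_error)
        counts[group][1] += 1
    return {
        group: (errors / total, wilson_interval(errors, total))
        for group, (errors, total) in counts.items()
    }


# Toy audit data, invented purely for illustration.
audit_records = [
    ("site_A", False), ("site_A", False), ("site_A", True), ("site_A", False),
    ("site_B", True), ("site_B", True), ("site_B", False), ("site_B", True),
]
baseline = error_rates_by_group(audit_records)
```

With samples this small the intervals are wide, which is itself informative: a subgroup whose interval is wide or sits well above the others is a natural candidate for closer post-deployment monitoring.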