Increased adoption of artificial intelligence (AI) systems in scientific workflows will lead to growing technical debt as the distance widens between the data scientists and engineers who develop AI system components and the scientists, researchers and other users who rely on them. This could quickly become problematic, particularly where guidance or regulations change and once-acceptable best practice becomes outdated, or where data sources are later discredited as biased or inaccurate. This paper presents a novel method for deriving a quantifiable metric that ranks the overall transparency of the process pipelines used to generate AI systems, so that users, auditors and other stakeholders can gain confidence that they will be able to validate and trust the data sources and contributors in the AI systems they rely on. The methodology for calculating the metric, and the types of criteria that could be used to judge the visibility of contributions to systems, are evaluated using models published at ModelHub and PyTorch Hub, popular archives for sharing science resources. The approach is found to be helpful in prompting consideration of the contributions made in generating AI systems, and in encouraging effective documentation and improved transparency of machine learning assets shared within scientific communities.
KEYWORDS: accountability, data ecosystems, data provenance, ML model evaluation, model zoo, transparency
INTRODUCTION

Scientists and researchers across all fields are making use of artificial intelligence (AI) systems and machine learning (ML) models in their experimentation. Increasingly, these new research assets are created by domain experts curating and aggregating data from multiple and diverse sources. Their outputs can be new data assets or heavily data-influenced products, including ML models, which go on to be used in research within the originating organization, or in the wider community when they are published or distributed and shared via gateways with collaborating or even unknown third-party research groups and organizations. This is already a real situation: in the UK financial services sector, for example, 24% of ML use cases are developed and implemented by third-party providers, with many of those developing internally also reporting adaptation or further development of off-the-shelf ML models or libraries. 1

Providing support to offer transparency and traceability of assets through the production pipeline is an important contributor to delivering accountability, which is necessary to achieve and retain confidence and trust, such that scientists and practitioners using AI systems, whether developed in-house or sourced externally from their community, are able to demonstrate the provenance and authenticity of the data and knowledge they use to make decisions. 2,3

In considering the development workflows for ML models, the contributing assets can be identified and itemized, and will typically include data sources and labeled datasets used for model training and validation, alon...