Increased adoption of artificial intelligence (AI) systems in scientific workflows will result in growing technical debt as the distance increases between the data scientists and engineers who develop AI system components and the scientists, researchers and other users who rely on them. This could quickly become problematic, particularly where guidance or regulations change and once-acceptable best practice becomes outdated, or where data sources are later discredited as biased or inaccurate. This paper presents a novel method for deriving a quantifiable metric capable of ranking the overall transparency of the process pipelines used to generate AI systems, such that users, auditors and other stakeholders can gain confidence that they will be able to validate and trust the data sources and contributors in the AI systems they rely on. The methodology for calculating the metric, and the type of criteria that could be used to judge the visibility of contributions to systems, are evaluated through models published at ModelHub and PyTorch Hub, popular archives for sharing science resources. The approach is found to be helpful in driving consideration of the contributions made to generating AI systems, and of approaches toward effective documentation and improved transparency in machine learning assets shared within scientific communities.

Keywords: accountability, data ecosystems, data provenance, ML model evaluation, model zoo, transparency

INTRODUCTION

Scientists and researchers across all fields are making use of artificial intelligence (AI) systems and machine learning (ML) models in their experimentation. Increasingly, these new research assets are created by domain experts curating and aggregating data from multiple and diverse sources. Their outputs can be new data assets or heavily data-influenced products, including ML models, which go on to be used in research within the originating organization, or in the wider community when they are published or shared via gateways with collaborating or even unknown third-party research groups and organizations. This is already a real situation: in the UK financial services sector, for example, 24% of ML use cases are developed and implemented by third-party providers, with many of those developing internally also reporting adaptation or further development of off-the-shelf ML models or libraries [1]. Providing transparency and traceability of assets through the production pipeline is an important contributor to delivering accountability, which is necessary to achieve and retain confidence and trust, such that scientists and practitioners using AI systems, whether developed in-house or sourced externally from their community, are able to demonstrate the provenance and authenticity of the data and knowledge they use to make decisions [2,3]. In considering the development workflows for ML models, the contributing assets can be identified and itemized, and will typically include data sources and labeled datasets used for model training and validation, alon...
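As a rough illustration of what such a metric might look like (not the paper's actual formulation), the sketch below scores a pipeline as the weighted mean visibility of its contributions. All names, weights and scores here are hypothetical assumptions for demonstration only.

```python
# Minimal sketch (hypothetical names and weights): a weighted visibility score
# over the contributions in an ML production pipeline, in the spirit of a
# pipeline transparency metric.

from dataclasses import dataclass


@dataclass
class Contribution:
    name: str          # e.g. "training dataset", "labeling process"
    weight: float      # relative importance of this contribution
    visibility: float  # 0.0 (opaque) .. 1.0 (fully documented/traceable)


def transparency_score(contributions: list[Contribution]) -> float:
    """Return a 0..1 score: the weighted mean visibility of all contributions."""
    total_weight = sum(c.weight for c in contributions)
    if total_weight == 0:
        return 0.0
    return sum(c.weight * c.visibility for c in contributions) / total_weight


pipeline = [
    Contribution("training dataset provenance", weight=3.0, visibility=0.8),
    Contribution("labeling methodology", weight=2.0, visibility=0.5),
    Contribution("model architecture and hyperparameters", weight=1.0, visibility=1.0),
    Contribution("validation data sources", weight=2.0, visibility=0.2),
]
print(f"pipeline transparency: {transparency_score(pipeline):.2f}")  # 0.60
```

A ranking over published models then falls out of comparing these scores, with the weights encoding which contributions a community considers most important to document.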
Birth registration is a critical element of newborn care. Increasing the coverage of birth registration is an essential part of the strategy to improve newborn survival globally, and is central to achieving greater health, social, and economic equity as defined under the United Nations Sustainable Development Goals. Parts of Eastern and Southern Africa have some of the lowest birth registration rates in the world. Mobile technologies have been used successfully with mothers and health workers in Africa to increase coverage of essential newborn care, including birth registration. However, mounting concerns about data ownership and data protection in the digital age are driving the search for scalable, user-centered, privacy-protecting identity solutions. There is increasing interest in understanding whether a self-sovereign identity (SSI) approach can help lower the barriers to birth registration by empowering families with a smartphone-based process while providing high levels of data privacy and security in populations where birth registration rates are low. The process of birth registration and the barriers experienced by stakeholders are highly contextual. There is currently a gap in the literature with regard to modeling birth registration using SSI technology. This paper describes the development of a smartphone-based prototype system that allows interaction between families and health workers to carry out the initial steps of birth registration and linkage of mother-baby pairs in an urban Kenyan setting using verifiable credentials, decentralized identifiers, and the emerging standards for their implementation in identity systems. The goal of the project was to develop a high-fidelity prototype that could be used to obtain end-user feedback related to the feasibility and acceptability of an SSI approach in a particular Kenyan healthcare context. This paper will focus on how this technology was adapted for the specific context and implications for future research.
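To make the building blocks concrete, the sketch below shows the general shape of a W3C-style verifiable credential that a health worker might issue to attest a birth notification, linking a mother-baby pair via decentralized identifiers (DIDs). The credential subject fields and DIDs are illustrative assumptions, not the paper's prototype schema.

```python
# Illustrative sketch only: a W3C Verifiable Credentials-shaped document for a
# birth notification. Subject fields and identifiers are hypothetical.

import json
from datetime import datetime, timezone

birth_notification_vc = {
    "@context": ["https://www.w3.org/2018/credentials/v1"],
    "type": ["VerifiableCredential", "BirthNotificationCredential"],
    "issuer": "did:example:health-worker-123",       # health worker's DID
    "issuanceDate": datetime.now(timezone.utc).isoformat(),
    "credentialSubject": {
        "id": "did:example:baby-456",                # newborn's DID
        "motherId": "did:example:mother-789",        # links the mother-baby pair
        "birthDate": "2024-01-15",
        "facility": "Example Health Centre, Nairobi",
    },
    # In a real system this proof is a cryptographic signature over the
    # credential, created with the issuer's private key and verifiable by
    # resolving the issuer's DID.
    "proof": {"type": "Ed25519Signature2020", "...": "..."},
}
print(json.dumps(birth_notification_vc, indent=2))
```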
Artificial Intelligence (AI) systems are being deployed around the globe in critical fields such as healthcare and education. In some cases, expert practitioners in these domains are being tasked with introducing or using such systems, but have little or no insight into what data these complex systems are based on, or how they are put together. In this paper, we consider an AI system from the domain practitioner's perspective and identify key roles that are involved in system deployment. We consider the differing requirements and responsibilities of each role, and identify tensions between transparency and confidentiality that need to be addressed so that domain practitioners are able to intelligently assess whether a particular AI system is appropriate for use in their domain.
Adopting shared data resources requires scientists to place trust in the originators of the data. When shared data is later used in the development of artificial intelligence (AI) systems or machine learning (ML) models, the trust lineage extends to the users of the system, typically practitioners in fields such as healthcare and finance. Practitioners rely on AI developers to have used relevant, trustworthy data, but may have limited insight and recourse. This article introduces a software architecture and implementation of a system based on design patterns from the field of self-sovereign identity. Scientists can issue signed credentials attesting to qualities of their data resources. Data contributions to ML models are recorded in a bill of materials (BOM), which is stored with the model as a verifiable credential. The BOM provides a traceable record of the supply chain for an AI system, which facilitates ongoing scrutiny of the qualities of the contributing components. The verified BOM, and its linkage to certified data qualities, is used in the AI scrutineer, a web-based tool designed to offer practitioners insight into ML model constituents and highlight any problems with adopted datasets, should they be found to have biased data or be otherwise discredited.
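The sketch below illustrates the idea under stated assumptions: a BOM records the credentialed datasets that contributed to a model, and a scrutineer-style check later flags any entry whose credential has been discredited. The structure, identifiers and helper names are hypothetical, not the article's implementation.

```python
# Minimal sketch (hypothetical structure): a bill of materials (BOM) for an ML
# model, listing the signed dataset credentials that contributed to it, plus a
# simple audit that re-checks each entry against a discredit list.

import hashlib


def digest(payload: bytes) -> str:
    """Content hash used to tie a BOM entry to exact dataset bytes."""
    return hashlib.sha256(payload).hexdigest()


model_bom = {
    "model": "clinical-risk-model-v2",
    "components": [
        {"dataset": "hospital-admissions-2019", "credential_id": "urn:cred:001",
         "sha256": digest(b"...dataset bytes...")},
        {"dataset": "census-demographics", "credential_id": "urn:cred:002",
         "sha256": digest(b"...dataset bytes...")},
    ],
}

# e.g. a credential later revoked because the dataset was found to be biased
discredited = {"urn:cred:002"}


def audit(bom: dict, discredited_ids: set[str]) -> list[str]:
    """Return the datasets in the BOM whose credentials have been discredited."""
    return [c["dataset"] for c in bom["components"]
            if c["credential_id"] in discredited_ids]


print(audit(model_bom, discredited))  # ['census-demographics']
```

Because the BOM travels with the model as a verifiable credential, this kind of check remains possible long after the model has left the originating organization.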
Machine learning systems rely on data for training, input and ongoing feedback and validation. Data in the field can come from varied sources, often anonymous or unknown to the ultimate users of the data. Whenever data is sourced and used, its consumers need assurance that the data accuracy is as described, that the data has been obtained legitimately, and they need to understand the terms under which the data is made available so that they can honour them. Similarly, suppliers of data require assurances that their data is being used legitimately by authorised parties, in accordance with their terms, and that usage is appropriately recompensed. Furthermore, both parties may want to agree on a specific set of quality of service (QoS) metrics, which can be used to negotiate service quality based on cost, and then receive affirmation that data is being supplied within those agreed QoS levels. Here we present a conceptual architecture which enables data sharing agreements to be encoded and computationally enforced, remuneration to be made when required, and a trusted audit trail to be produced for later analysis or reproduction of the environment. Our architecture uses blockchain-based distributed ledger technology, which can facilitate transactions in situations where parties do not have an established trust relationship or centralised command and control structures. We explore techniques to promote faith in the accuracy of the supplied data, and to let data users determine trade-offs between data quality and cost. Our system is exemplified through consideration of a case study using multiple data sources from different parties to monitor traffic levels in urban locations.
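A conceptual sketch of these mechanics follows, under loudly stated assumptions: the agreement encoding, party names, QoS fields and the hash-chained audit trail below are all hypothetical stand-ins for the architecture's ledger-backed components, not the actual system.

```python
# Conceptual sketch (all names hypothetical): encoding a data sharing agreement
# with agreed QoS levels, checking deliveries against it, and appending the
# results to a hash-chained audit trail in the spirit of a distributed ledger.

import hashlib
import json
import time

agreement = {
    "supplier": "city-sensors-ltd",
    "consumer": "traffic-analytics-co",
    "qos": {"max_latency_ms": 500, "min_accuracy": 0.95},
    "price_per_record": 0.01,
}

audit_trail = []  # each entry chains to the previous entry via its hash


def record(event: dict) -> None:
    """Append an event to the tamper-evident audit trail."""
    prev_hash = audit_trail[-1]["hash"] if audit_trail else "0" * 64
    body = {"event": event, "prev": prev_hash, "ts": time.time()}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    audit_trail.append(body)


def check_delivery(latency_ms: float, accuracy: float) -> bool:
    """Check a delivery against the agreed QoS levels and log the outcome."""
    ok = (latency_ms <= agreement["qos"]["max_latency_ms"]
          and accuracy >= agreement["qos"]["min_accuracy"])
    record({"latency_ms": latency_ms, "accuracy": accuracy, "within_qos": ok})
    return ok


check_delivery(latency_ms=320, accuracy=0.97)  # within agreed QoS
check_delivery(latency_ms=800, accuracy=0.96)  # breaches the latency bound
```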