Significant effort is being put into developing industrial applications for artificial intelligence (AI), especially those using machine learning (ML) techniques. Despite the intensive support for building ML applications, there are still challenges when it comes to evaluating, assuring, and improving the quality or dependability. The difficulty stems from the unique nature of ML, namely, system behavior is derived from training data not from logical design by human engineers. This leads to black-box and intrinsically imperfect implementations that invalidate many principles and techniques in traditional software engineering. In light of this situation, the Japanese industry has jointly worked on a set of guidelines for the quality assurance of AI systems (in the Consortium of Quality Assurance for AI-based Products and Services) from the viewpoint of traditional quality-assurance engineers and test engineers. We report on the second version of these guidelines, which cover a list of quality evaluation aspects, catalogue of current state-of-the-art techniques, and domain-specific discussions in five representative domains. The guidelines provide significant insights for engineers in terms of methodologies and designs for tests driven by application-specific requirements.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.