Many techniques have been developed for sensor and information fusion, machine and deep learning, and data and machine analytics. Many groups are currently exploring methods for human-machine teaming, including saliency and heat maps, explainable and interpretable artificial intelligence, and user-defined interfaces. However, standard metrics are still needed for the test and evaluation of systems that use artificial intelligence (AI), such as deep learning (DL), in support of the AI principles. In this paper, we explore the opportunities and challenges that emerge in designing, testing, and evaluating such future systems. The paper highlights the MAST (multi-attribute scorecard table), and more specifically the MAST criterion "analysis of alternatives," which is addressed by measuring the risk associated with an evidential DL-based decision. Risk combines the probability of a decision with the severity of the choice; because that probability is itself uncertain, a bound on the decision choice is also needed, which the paper postulates as a risk bound. A notional analysis of a cyber networked system is presented to guide an interactive test-and-evaluation process that supports certifying AI systems with respect to decision risk for a human-machine system, drawing on analysis from both the DL method and the user.
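To make the probability-and-severity notion of risk concrete, the following is a minimal sketch under stated assumptions, not the paper's method. It assumes (a) risk is the product of an outcome's probability and its severity, and (b) the evidential DL model follows the common Dirichlet/subjective-logic formulation, where non-negative evidence e_k for each of K classes gives belief b_k = e_k / S and uncertainty u = K / S, with S = Σ(e_k + 1). The hypothetical helper `evidential_risk_bound` then brackets the risk of a choice between severity · b_k and severity · (b_k + u), with the projected probability (e_k + 1) / S in between.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RiskBound:
    lower: float     # severity * belief (evidence-only lower bound)
    expected: float  # severity * projected probability
    upper: float     # severity * plausibility (belief + uncertainty)


def evidential_risk_bound(evidence: Sequence[float], choice: int, severity: float) -> RiskBound:
    """Hypothetical helper: bound the risk of `choice` from per-class evidence.

    Assumes a Dirichlet/subjective-logic evidential model with alpha_k = e_k + 1
    and risk = probability * severity. This is an illustrative assumption,
    not the paper's published formula.
    """
    k = len(evidence)
    if k == 0:
        raise ValueError("need evidence for at least one class")
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    if not 0 <= choice < k:
        raise IndexError("choice out of range")
    if severity < 0:
        raise ValueError("severity must be non-negative")

    strength = sum(evidence) + k          # S = sum of Dirichlet parameters
    belief = evidence[choice] / strength  # b_k = e_k / S
    uncertainty = k / strength            # u = K / S (vacuity)
    projected = belief + uncertainty / k  # p_k = b_k + u/K = (e_k + 1) / S

    return RiskBound(
        lower=severity * belief,
        expected=severity * projected,
        upper=severity * (belief + uncertainty),
    )


if __name__ == "__main__":
    # Hypothetical detector output: evidence for ["attack", "benign"].
    bound = evidential_risk_bound([8.0, 2.0], choice=0, severity=10.0)
    print(bound)
```

For example, a hypothetical network intrusion detector that reports evidence 8 for "attack" and 2 for "benign", with an assumed attack severity of 10, yields a risk bound of roughly [6.67, 8.33] around an expected risk of 7.5. More evidence shrinks u and tightens the bound, which is the behavior a risk bound is meant to expose to the human in the loop.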