The possibility of creating thinking machines raises a host of ethical issues. These questions relate both to ensuring that such machines do not harm humans and other morally relevant beings, and to the moral status of the machines themselves. The first section discusses issues that may arise in the near future of AI. The second section outlines challenges for ensuring that AI operates safely as it approaches humans in its intelligence. The third section outlines how we might assess whether, and in what circumstances, AIs themselves have moral status. In the fourth section, we consider how AIs might differ from humans in certain basic respects relevant to our ethical assessment of them. The final section addresses the issues of creating AIs more intelligent than human, and ensuring that they use their advanced intelligence for good rather than ill.
By far the greatest danger of Artificial Intelligence (AI) is that people conclude too early that they understand it. Of course, this problem is not limited to the field of AI. Jacques Monod wrote: ‘A curious aspect of the theory of evolution is that everybody thinks he understands it’ (Monod, 1974). The problem seems to be unusually acute in Artificial Intelligence. The field of AI has a reputation for making huge promises and then failing to deliver on them. Most observers conclude that AI is hard, as indeed it is. But the embarrassment does not stem from the difficulty. It is difficult to build a star from hydrogen, but the field of stellar astronomy does not have a terrible reputation for promising to build stars and then failing. The critical inference is not that AI is hard, but that, for some reason, it is very easy for people to think they know far more about AI than they actually do. It may be tempting to ignore Artificial Intelligence because, of all the global risks discussed in this book, AI is probably hardest to discuss. We cannot consult actuarial statistics to assign small annual probabilities of catastrophe, as with asteroid strikes. We cannot use calculations from a precise, precisely confirmed model to rule out events or place infinitesimal upper bounds on their probability, as with proposed physics disasters. But this makes AI catastrophes more worrisome, not less. The effect of many cognitive biases has been found to increase with time pressure, cognitive busyness, or sparse information. Which is to say that the more difficult the analytic challenge, the more important it is to avoid or reduce bias. Therefore I strongly recommend reading my other chapter (Chapter 5) in this book before continuing with this chapter. When something is universal enough in our everyday lives, we take it for granted to the point of forgetting it exists. Imagine a complex biological adaptation with ten necessary parts. If each of the ten genes is independently at 50% frequency in the gene pool – each gene possessed by only half the organisms in that species – then, on average, only 1 in 1024 organisms will possess the full, functioning adaptation.
This chapter surveys eight research areas organized around one question: As learning systems become increasingly intelligent and autonomous, what design principles can best ensure that their behavior is aligned with the interests of the operators? The chapter focuses on two major technical obstacles to AI alignment: the challenge of specifying the right kind of objective functions and the challenge of designing AI systems that avoid unintended consequences and undesirable behavior even in cases where the objective function does not line up perfectly with the intentions of the designers. The questions surveyed include the following: How can we train reinforcement learners to take actions that are more amenable to meaningful assessment by intelligent overseers? What kinds of objective functions incentivize a system to “not have an overly large impact” or “not have many side effects”? The chapter discusses these questions, related work, and potential directions for future research, with the goal of highlighting relevant research topics in machine learning that appear tractable today.
All else being equal, not many people would prefer to destroy the world. Even faceless corporations, meddling governments, reckless scientists, and other agents of doom, require a world in which to achieve their goals of profit, order, tenure, or other villainies. If our extinction proceeds slowly enough to allow a moment of horrified realization, the doers of the deed will likely be quite taken aback on realizing that they have actually destroyed the world. Therefore I suggest that if the Earth is destroyed, it will probably be by mistake. The systematic experimental study of reproducible errors of human reasoning, and what these errors reveal about underlying mental processes, is known as the heuristics and biases programme in cognitive psychology. This programme has made discoveries highly relevant to assessors of global catastrophic risks. Suppose you are worried about the risk of Substance P, an explosive of planet-wrecking potency which will detonate if exposed to a strong radio signal. Luckily there is a famous expert who discovered Substance P, spent the last thirty years working with it, and knows it better than anyone else in the world. You call up the expert and ask how strong the radio signal has to be. The expert replies that the critical threshold is probably around 4000 terawatts. ‘Probably?’ you query. ‘Can you give me a 98% confidence interval?’ ‘Sure’, replies the expert. ‘I’m 99%confident that the critical threshold is above 500 terawatts, and 99%confident that the threshold is below 80,000 terawatts.’ ‘What about 10 terawatts?’ you ask. ‘Impossible’, replies the expert. The above methodology for expert elicitation looks perfectly reasonable, the sort of thing any competent practitioner might do when faced with such a problem. Indeed, this methodology was used in the Reactor Safety Study (Rasmussen, 1975), now widely regarded as the first major attempt at probabilistic risk assessment. But the student of heuristics and biases will recognize at least two major mistakes in the method – not logical flaws, but conditions extremely susceptible to human error. I shall return to this example in the discussion of anchoring and adjustments biases (Section 5.7).
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.