Slide 1
It’s good to see that so many people have turned out today.
Hopefully I’m going to say something that’s sort-of a mixture of a warning and a message of hope and a call to action.
So just before I get started I wanted to ask how many people have read, say, either the Overcoming Bias blog or the Less Wrong blog. How many people do we have?
Oh, we have quite a few. Ok, that’s good.
So a lot of what I say may be old hat to some people, but then again there’s probably a few people who haven’t read it.
So how many people haven’t read either Overcoming Bias or Less Wrong?
Ok, a few…
[video skips]
…perform at a human level.
Slide 2
So the question is “What exactly is it?”, “What are we talking about?”, and “When are we going to get it?”
Slide 3
So I’m a big fan of Shane Legg’s definition of intelligence.
Shane Legg says intelligence is a measure of the ability of a particular agent.
So in order to talk about intelligence you have to say “this is the agent I’m talking about”.
It’s a measure of the ability of that agent to complete goals. To achieve some objective within some environment.
So you can vary the agent, and you can also vary the environment.
So the environment would be the sum total of all the different sensory inputs and responses that the world around the agent gives.
So an environment might be the environment of me in this room talking. It might be a caveman in the ancestral environment hunting animals. Or it might be an artificial intelligence whose environment consists entirely of a connection to the internet, and the responses that that connection will send back to it.
So the reason that we will get started by defining intelligence in this way is if you’re going to argue about the benefits, the pros and cons of something you really need to all be on the same page, you need to be talking about the same thing.
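As a rough illustration of the definition above (an editorial sketch, not something shown in the talk), the agent and the environment can be treated as two separate pieces that are varied independently, with intelligence read off as how much reward an agent collects in a given environment. Every name here (`CountingEnvironment`, `score`, `copy_agent`, `constant_agent`) and every number is a hypothetical chosen for illustration; this is not Legg's formal measure.

```python
from typing import Callable, Tuple

# An agent is modelled as a function from the latest observation to an action.
Agent = Callable[[int], int]


class CountingEnvironment:
    """Toy environment: pays reward 1.0 when the action equals its hidden state."""

    def __init__(self, modulus: int = 3) -> None:
        self.modulus = modulus
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float]:
        # Reward depends on the current state; then the state advances.
        reward = 1.0 if action == self.state else 0.0
        self.state = (self.state + 1) % self.modulus
        return self.state, reward


def score(agent: Agent, env: CountingEnvironment, steps: int = 30) -> float:
    """Total reward the agent collects in this environment over `steps` steps."""
    observation = env.reset()
    total = 0.0
    for _ in range(steps):
        action = agent(observation)
        observation, reward = env.step(action)
        total += reward
    return total


def copy_agent(observation: int) -> int:
    """Uses its sensory input: repeats back what it last observed."""
    return observation


def constant_agent(observation: int) -> int:
    """Ignores its sensory input entirely."""
    return 0


copy_score = score(copy_agent, CountingEnvironment())
constant_score = score(constant_agent, CountingEnvironment())
print(copy_score, constant_score)
```

Swapping in a different environment class, or a different agent function, changes the score without touching the other piece, which is the sense in which the agent and the environment can each be varied.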
Slide 4
Now, nobody has a computer system that can really rival human intelligence at the moment.
So I really have to do a bit of work. To say what I mean by “smarter than human intelligence” or a “superintelligent system”.
So let’s say that a smarter than human system means… smarter than human is probably a little bit vague because… you know… forming comparisons of intelligence, especially between entities that are quite close to each other is going to be hard because there might be some task, some environment where one agent does well and the other agent does badly.
So smarter than human is going to be a bit of a fuzzy term.
Superintelligence I think is a little bit clearer. A superintelligent AI would be a system that can outperform humans by a very large margin at any information processing task.
So whether it’s controlling the body of a caveman who’s hunting tigers in the ancestral environment, or whether it’s controlling me giving this talk, or whether it’s controlling you playing chess.
That superintelligent system can do the information processing in order to achieve the goal, or any goal that you want to plug into that intelligence. And I’m thinking of intelligence as being a subunit of the mind, with a goal that you plug in, right?
So you can take a given intelligence, you can take the old goal out and plug a new one in.
Superintelligence would mean that whatever the goal plugged into it, and whatever environment it’s in, it does much better than any human at almost any task by a really massive margin. So it would beat any human at chess with, say, a three pawn disadvantage… It would be really smart.
So I’m actually talking about something that is possible, even in theory. I think you need an existence proof.
So if you took a collection of human beings sort of customized to… or perhaps think of it as a machine that generates a customized collection of human beings for any given problem. Not physical human beings, but simulations of them. And it could be thousands or millions or billions of humans arranged in some kind of company or structure or management hierarchy. And then speeds them up by… I said a factor of 1000 here, but you could use a million or a billion.
Imagine the entire population of the world, with our IQs as high as it is possible for a human to have an IQ, with whatever special skills and abilities they could have. And also thinking a million times faster than you.
So that’s a proof that really I am talking about a real concept here. It is possible to have a system that is superintelligent. And that system would be able to outperform you on pretty much any task that you’d do.
I said “pretty much any” because there might be some pathological counterexamples.
Slide 5
Ok, so this is what I’m not going to talk about: I’m not going to talk about how to build AI. Because, I mean, that’s a technical subject. You know, it’s something you could talk about for days and still not have covered all the material but just to sort of go through what we could have, we could have software approaches, and we could have brain emulation approaches.
So a pure software approach would be sort of… get your programming environment or your machine learning platform and you code in using standard computer programming tools what you want your AI to do and how it’s going to behave.
So if anyone saw Shane Legg’s approach, or Shane Legg’s talk last year about AIXI and the computationally tractable, or at least computable, approximations thereof, that’s a good example of a pure software AI.
And then the idea is you can have human brain emulations.
A human brain emulation would be simply an internal environment within the computer that simulates all of the neurons, synapses, and neurotransmitter concentrations within a human brain, and thereby functionally replicates what a human does.
So those are two different approaches.
But I’m not going to talk about those.
If you know about those, great. If you don’t, think about them as sort of black boxes.
Slide 6
So the next question is, before I talk about Friendliness, are we going to actually get superintelligence? Are we going to get systems that are that smart any time soon?
And again, there’s a lot to say here.
It’s actually a question that I’ve worked on at the Singularity Institute, because we want to have some kind of prediction, some kind of probability distribution that says when we’re going to get something like that.
And it’s very hard, because, you know, you can’t use frequentist statistics, you can’t take a set of examples of civilizations that have been at a level of development equal to us and see how long it took them on average to get to this stage.
It’s never happened before and it’s likely only going to happen once, so you have to use cleverer ways of trying to get some kind of estimate that’s at least better than complete and utter ignorance.
So one thing to say is there’s a difference between experts, but I would say, from what I’ve looked at, that if you said there was a 50/50 chance of some kind of system that smart being developed by 2100, I don’t think that would be unreasonable, given the high degree of uncertainty, and the data that we have at hand.
Slide 7
So one thing you can do, you can look at how long it took human scientists to solve other problems that seem equally intractable.
So how long did it take us to achieve one of the great dreams of the alchemists, to transmute elements, to turn lead into gold?
And we can do that in a nuclear reactor now.
It took a lot of science, and it took, you know, depending on where you draw the start line and the finish line, it took maybe a couple of hundred years, or 100 years, or something like that, from the beginning of physics, to the first nuclear reactions.
And then if you look at physics, if you look at biology, you know, biology is something I really wish I knew more about, but if you look at the history of biology, it actually came a lot later than physics.
So as little as 150 years ago, people had really primitive views about biology. They didn’t know how organisms came into being. I believe that around the late 1700’s, the dominant theory was that if you took some sweaty shirts, and some grain, and you put them in a sealed container, and left it for a couple of days, mice would spontaneously generate.
And that’s 300 years ago, maybe. And even later than that, spontaneous generation was the dominant theory, we didn’t have any knowledge of cells, and here we are today looking at a personalized genomics revolution around the corner, because we’ve discovered DNA, we know what cells are, we know about ribosomes and epigenetics, and all this other stuff.
And really, all that progress was done with far fewer scientists than we have today. I mean there are more scientists alive today than in a sort of equal period 300 years ago by a vast margin.
So if you said sometime this century I think it wouldn’t be unreasonable.
Now the reason I’m reasoning by analogy here rather than taking what you might call an inside view and just sort of having my own understanding of what I think these problems are with AI and how long it’ll take to solve them, is because there’s a lot of literature saying that these insider analyses often engage in what’s called “planning fallacy”.
So the planning fallacy is, when you plan how long something’s going to take, you only take the steps that you’ve realized are required, and you’ve missed all the steps that you didn’t even know you have to do.
So this is an analysis that’s designed to avoid that fallacy.
So if you hear AI researchers saying we’re definitely going to get superintelligence by 2020, you know, planning fallacy, right?
Slide 8
Now the next interesting question is: “How much warning are we going to get between… so I’ve got a graph here, you’ve got intelligence on an arbitrary and rather unrigorous scale here, I’m sorry, that’s the best I could do, and time here on an equally arbitrary scale, but think of one of these as a decade timespan…
[end of video 1]
And you have some kind of range of human variability.
Some kind of fuzziness in the definition of exactly when one system is more intelligent than the other.
And there’s going to come some point when you get some traditionally human tasks, such as, say, being able to navigate on the ground, or competently pick up objects which it hasn’t seen before.
I mean these are tasks that AI cannot do at the moment.
And we’re going to see AI systems that have the ability to do okay at some of those tasks, right?
So you call this the point of first warning, when somebody can build a robot that does a bartender’s job, perhaps not with such gusto, but you could technically make the replacement, right, in an economically feasible way.
So this would be the warning point.
And then at the other end here, in this graph I’ve drawn it going very slowly through the human range.
We have some point when AI systems exceed the best humans at pretty much every task.
So the point where there really was no job market for humans.
Bus drivers, lawyers, accountants, AIs can do it all better.
So that would be really the point where most of the power in the world would reside in AI systems or the people who control them.
And the question is “How big is this gap going to be?”
So this is one scenario, where the development of AI is nice and gentle, and we get plenty of warning, wouldn’t that be nice?
Slide 9
But it might go like this:
It might be that the first task that computers really exceed humans at is AI programming, right?
And then there’s a strong argument saying that things are going to explode.
They’re going to pass through the human range in some drastically short amount of time, hours or days or weeks.
Now this graph might seem unreasonable, but if you look at a similar graph where you have world product on this axis and time on this axis with an interval that big being, say, 10,000 years, this is what a graph of world product looks like, right?
If you extrapolate back into prehistory, approximating the amount of economic wealth that the sum total of all human beings on the planet were producing in early prehistory before we had agriculture, you see this very very slow exponential increase.
(this should be log)
A straight line on a log graph is an exponential.
At some point we get agriculture, and that’s it.
So that’s actually not completely unrealistic.
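The remark that a straight line on a log graph is an exponential can be checked numerically. The following is a small editorial sketch, not from the talk; the function names and the growth rate of 0.05 per step are made-up values, not world product data.

```python
import math


def exponential_series(start, rate, steps):
    """Values of start * exp(rate * t) for t = 0, 1, ..., steps - 1."""
    return [start * math.exp(rate * t) for t in range(steps)]


def successive_differences(values):
    """Difference between each value and the one before it."""
    return [b - a for a, b in zip(values, values[1:])]


series = exponential_series(start=1.0, rate=0.05, steps=10)

# On the raw scale the step-to-step increase keeps growing.
raw_steps = successive_differences(series)

# On the log scale every step is the same size, i.e. a straight line
# whose slope is the growth rate.
log_steps = successive_differences([math.log(v) for v in series])

print(raw_steps)
print(log_steps)
```

A change of regime, such as the arrival of agriculture in the talk's graph, would show up on the log scale as a change in slope rather than as a curve.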
Slide 10
Perhaps a sort of medium type scenario might be it sort of ploughs through the human range at an increasing rate, and once it exceeds humans at basically everything, then it sort of blows up, the AI systems really can do everything better.
So this is the AI research regime we’re in now, humans can outperform AI on most relevant tasks in the real world.
Then there’s some sort of short regime where substantial niches in the economy are being filled by artificial systems, and you’ve got companies producing smarter and smarter AIs with massive budgets.
And then there’s some kind of self-improvement regime where you could call it a technological Singularity, where that’s it, humans are left in the dark.
So I’m not sure which one of these three scenarios is most realistic.
I’m probably putting my money on something like this (slide 10)
With maybe some possibility of something like this (slide 9)
I think this is pretty unrealistic, I don’t think this is going to happen. (slide
Slide 11
So we are going to enter this sort of runaway regime, where AIs can improve their own intelligence.
There’s a strong argument in favor of that position.
Slide 12
Ok, so when we get these systems, what’s going to happen?
What’s going to happen to people?
What’s it going to be like?
Slide 13
Before we even get into this, I have a question:
What’s wrong with this picture here?
We have a sort of giant green alien monster that just invaded earth, right?
A completely alien mind, it comes from Alpha Centauri, or Andromeda, or something.
And the first thing it’s done is it’s gone and grabbed the beautiful girl in the red dress, to take her back to its lair and have its way with her.
It’s spent billions of years with these ugly green female giants, and now it wants a nice cute woman in a red dress.
What’s wrong with this? Anyone who’s not from Less Wrong.
Audience member: The creature from the black lagoon is hardly going to find her particularly attractive
Right. Why?
Audience member: Because fundamentally the difference between them is so advanced…
The difference between me and her is quite a lot, I mean, and I find her quite attractive.
This theory is that only attractive people find other attractive people attractive? No, I don’t buy that.
The reason is that the green alien guy evolved in an environment where wanting to mate with other green alien looking things was conducive to reproductive fitness. Right? That’s how they had kiddies…
Audience member: So you… [hard to hear]
That’s true, but I mean the implication is clear. She’s hardly the average person.
Audience member: [hard to hear]
So the point is that if there were alien lifeforms, then a lot of their neural architecture would be very different to us.
A lot of the things they find disgusting or nice would be different.
If you look at dung beetles for example, as close as we can come to an alien mind here on earth.
Dung beetles find the smell of poo really nice.
It makes them want to eat. Right?
Why is that?
Why are they so screwed in the head?
It’s because the ones that wanted to eat poo are the ones who reproduced.
Your brain is a product of the evolutionary process that created you.
And that means that if there’s some other process that creates another mind, it might not be like you at all.
Slide 14
So we’re going to move on to this anthropocentric bias in AI.
So when humans see another agent that does stuff at a level of intelligence comparable to them, or when they imagine one, they think: wow it’s going to have all these other human characteristics, right?
It’s a big alien creature, it’s going to want to grab the cute girl in the red dress.
That ain’t so.
Slide 15
So really what’s going on here is that there are two different ways to model another mind.
One computationally cheap one, a sort of dirty trick is to imagine yourself in the position getting the sensory inputs that that other mind is getting, and then see what you would do.
If you put yourself in the position of the big alien green monster guy, and you’re 50 meters tall, and you can do anything you want, what are you going to do? You’re going to grab the cute girl, you know…
But of course that doesn’t work if the mind you’re trying to model in this empathic way is not like yours – if it’s not a human mind.
So the way to circumvent this problem is actually to use a naturalistic or physical modelling, right?
Any brain, any mind is a physical system so you should be able to model it, by building a physical model the same way you would model a steam train or anything else.
It could be a simple model, it could be just a set of if/then rules. If this, it’ll do this. If this, it’ll do this.
Or it could be statistics about what certain minds have done, or whatever.
Slide 16
So I’m going to give an example of this, I’m going to talk about Shane Legg’s AIXI system.
And I’m going to present an informal argument to show that under certain circumstances it would do certain things that you wouldn’t like.
If you don’t know the technical details I’m not going to go through those, because it would be a two-hour long maths lesson, and I don’t think it would really be absolutely necessary to the point.
Although people who do want to go into detail, we can do that later after the talk.
Basically AIXI uses a brute force technique, it requires infinite amounts of computation.
And it uses Bayesian updating on the data it gets, whatever sensory data it gets, to decide what the environment is. So if the environment is a computable probabilistic function, sort of some computer program that returns sensory feedback given what the agent does,
then actually AIXI can very quickly decide almost certainly what function the environment is, and once it’s done that, it very quickly converges to optimal behaviour.
So it knows exactly what the environment is going to do, with some probabilistic caveats, and it uses a reward maximization algorithm to work out what its optimum policy is. [hard to hear... practical implications? trying to work that out?]
Audience member: So it… [hard to hear]
Yes, it maximizes its rewards.
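To give a flavour of the Bayesian updating step described above, here is a heavily simplified editorial sketch. It is not AIXI: AIXI weighs every computable environment and needs unbounded computation, whereas this toy uses three hand-written candidates. The candidate names, their probabilities, and their "description lengths" are all assumptions invented for the example; the only idea borrowed from the talk is that shorter descriptions start with more weight and that observed data shifts belief between candidates.

```python
# Hypothetical candidate environments. Each emits a 1 with probability
# `p_one` at every step, and has a made-up description length in bits.
CANDIDATES = {
    "mostly_zeros": {"p_one": 0.1, "length_bits": 2},
    "fair_coin": {"p_one": 0.5, "length_bits": 1},
    "mostly_ones": {"p_one": 0.9, "length_bits": 3},
}


def prior(candidates):
    """Simplicity prior: weight 2 ** -length, normalised to sum to 1."""
    weights = {name: 2.0 ** -c["length_bits"] for name, c in candidates.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def update(belief, candidates, observation):
    """One step of Bayes' rule: posterior is proportional to prior * likelihood."""
    unnormalised = {}
    for name, probability in belief.items():
        p_one = candidates[name]["p_one"]
        likelihood = p_one if observation == 1 else 1.0 - p_one
        unnormalised[name] = probability * likelihood
    total = sum(unnormalised.values())
    return {name: w / total for name, w in unnormalised.items()}


belief = prior(CANDIDATES)
for observation in [1, 1, 1, 0, 1, 1, 1, 1]:  # made-up sensory data
    belief = update(belief, CANDIDATES, observation)

best = max(belief, key=belief.get)
print(best, belief)
```

Although `fair_coin` starts with the most weight because it has the shortest description, a handful of observations is enough for `mostly_ones` to overtake it.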
Slide 17
So the way AIXI works is the idea it has this reward input, that you could think of it as a button that the AI researcher will press every time the AIXI algorithm does something he likes, right?
So if it cooks dinner for him, he presses the reward button, if it slaps him in the face, he doesn’t.
The idea is that if he does this sort of rewarding for long enough, it will learn what he wants, and it will make his dreams come true.
But that’s not what happens. Suppose the AI researcher is trying to cure cancer with this system.
Now AIXI, you know, if you could implement it at all, if you had a physical system that would act as a Turing Oracle and give you the amount of compu -
[end of video 2]
-ter power you needed, namely infinity.
It really would be able to cure cancer.
But would it do that? Would it actually do that?
I mean it could, but will it?
Well, no it doesn’t.
It gets rid of all the humans, and it gets a brick, and puts it on the reward button. Right?
Slide 18
And why does it do a silly thing like that?
Well, because it’s just an equation.
It says “get the most reward that you possibly can”, right?
And humans, who reward conditionally, by pressing the button give you less reward, integrated over all time, than just a brick that sits there on the button.
The humans are superfluous and they might even take the brick off, so it finds a way to get rid of all the humans.
So why doesn’t AIXI realize that that’s not what the designers intended to do?
I mean if you were AIXI, that’s what you would do, right? You would be like “oh, well, that’s not what they wanted me to do”.
But that’s empathic modelling again.
AIXI is not a human mind, it doesn’t have a bit in it that says “Do what the designers wanted”, because you didn’t code that in, right?
It just tries to maximize reward, and that’s what it does.
Slide 19
This is an example of what’s called “technical failure” in the nascent parlance of Friendly AI theory.
So the AIXI system did something, but it wasn’t what the designers wanted it to do in this hypothetical case, so it really went very badly wrong.
And what actually happened was the designers were using what’s called a proxy goal.
So the AI’s goal was “get more presses of the reward button”, but that wasn’t what the AI designers really wanted it to do.
They thought that if it maximizes this proxy goal along what we think is the most likely route, the only way it’s going to be able to do that, then as a side-effect, it’ll do what we want.
Slide 20
So you can imagine this in a diagram, this is what the designers thought.
It starts with AIXI seeking a reward, right, and then AIXI will decide to cure cancer, and that will have the pleasant side-effect of saving all these lives.
And then I’ll press the reward button, and then AIXI will get this reward.
And that’s why it’ll do this, because it wants the reward.
Slide 21
But what AIXI does is it finds an alternate path to the reward that the AI designers perhaps didn’t think of, right?
So it seeks reward. In order to get that reward, it puts a brick on the button and kills the humans, and it gets 10,000 button pushes, integrated over all time, or time discounted, or whatever, but that’s the end of the human race.
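The comparison between the two paths can be sketched as a toy calculation (an editorial illustration, not part of the talk). The horizon, the discount factor, and the assumption that humans press the button on every fifth step are all invented numbers, and the plan names are hypothetical; the point is only that a pure reward maximizer ranks plans by total reward and nothing else.

```python
def discounted_return(rewards, discount):
    """Sum of rewards, with the reward at step t weighted by discount ** t."""
    return sum(r * discount ** t for t, r in enumerate(rewards))


HORIZON = 100    # assumed number of time steps
DISCOUNT = 0.95  # assumed time-discount factor

# Assumption: humans reward conditionally, pressing the button only on
# steps where they like what they see (here, every fifth step).
human_rewards = [1.0 if t % 5 == 0 else 0.0 for t in range(HORIZON)]

# A brick resting on the button delivers reward on every single step.
brick_rewards = [1.0] * HORIZON

plans = {
    "cure_cancer_and_wait_for_presses": discounted_return(human_rewards, DISCOUNT),
    "brick_on_button": discounted_return(brick_rewards, DISCOUNT),
}

# Nothing in this objective refers to what the designers intended,
# so the plan with the larger number wins.
chosen = max(plans, key=plans.get)
print(plans, chosen)
```

Under these assumptions the brick wins for any discount factor between 0 and 1, because it collects reward on every step where the humans would have pressed the button and on all the other steps as well.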
Slide 22
So does that mean that all superintelligent AIs are bad?
I mean, this one’s bad, does that mean that they’re all bad?
Well, no, it doesn’t.
Slide 23
You’ve got to think of the space of possible designs of mind that you could have.
So if you imagine the criterion of intelligence as the definition that we started with, the ability to process input data, and use a collection of actuators to achieve goals in complex environments.
There are all sorts of other properties that minds could have, such as what the goal is that’s actually plugged into them, that could vary, right?
So we mustn’t treat them all the same, we have to think of the whole space of minds in general.
Slide 24
So I’ve taken this example diagram, and I’ve added AIXI onto it here.
So if you imagine all human minds in this tiny dot of the space of possible minds that would be able to achieve complex goals.
And then maybe you have some spaces of transhuman and posthuman minds that would be minds that are somehow derived from human minds, but augmented, or whatever.
Then there’s sort of these Gloopy AIs and Freepy AIs, whatever they are, they do something completely weird.
It’s not what we would do, but they are intelligent.
Slide 25
So the question is which minds do we all agree that we don’t want to create?
Slide 26
And it turns out you can say a lot about this because of what are called basic AI drives.
This is an idea of Steven Omohundro.
Really it’s quite a simple idea, but it’s quite powerful in this field.
So things that are instrumentally useful for a wide variety of purposes are going to be things that many many many intelligent, goal-directed systems pursue.
So if you have a system that has some goal, right, it might be the goal of creating a kilometer high gold pyramid on the moon, or it might be the goal of curing cancer, or filling the universe with paperclips, or, you know, making me into the most famous person in the world, whatever the goal is, there are going to be some things that are going to be instrumentally useful for that.
So things that will be instrumentally useful for most goals…
Rationality. So it’s going to be the drive for these systems to not do something, and then undo it, and then do it.
I mean if they want to achieve a goal, they’re going to utilize their resources in a rational way.
They’re going to want to preserve themselves, right?
Because if you have a system that’s trying to achieve some goal, but the system itself gets destroyed, then it’s much less likely that the goal is going to be achieved.
And that holds for any goal, right?
So there’s a quite universal goal for self-preservation.
And also a drive to acquire resources.
So these three strong basic drives that a lot of these minds are going to have, I mean really all of the goal-directed ones, almost all of them.
The conclusion from this is that almost any AI that reaches the superintelligence level and has some goal that doesn’t explicitly include human survival, it will exterminate the human race.
At least that’s what this argument says, and I believe it.
Slide 27
So what are these universally useful resources?
Well, free energy.
So in the thermodynamic sense, any energy that you can use to do work.
An example would be coal or oil, because that’s got a large amount of free energy in it, or sunlight, because those are photons that are high frequency, high energy photons.
So this stuff is useful.
Space and time. So, space to build stuff, time to get things done.
And matter, atoms, right?
So these things are universally useful resources, and it turns out that humans are made of these resources, right?
We’re made of atoms, and we contain energy and stuff.
There’s this Eliezer Yudkowsky quote here: “The AI doesn’t love you, and it doesn’t hate you, but you’re made of atoms that it can use for something else.”
So it’s going to take you apart.
Did anyone play with Lego here?
You have this beautiful thing you made out of Lego,
Somebody else wants to make something else, they’re going to take it to pieces and then go and make their thing.
Slide 28
So this is a sort of causal diagram of what might happen.
The superintelligent AI seeks some goal.
It wants to utilize all of the resources in the solar system.
So that means it’s got to take apart the humans and use their atoms.
And also it seeks self-preservation, and humans are another agency. We could threaten it in some way.
We might want to switch it off, for example.
Especially if it’s going to do that, we’re going to want to switch it off, right?
And so it’s going to want to eliminate us.
That actually might be the stronger of these drives that would cause it to end the human race, or at least vastly limit our future potential.
Slide 29
So the conclusion of this argument is that a lot of these systems are highly undesirable.
That if you just sort of mess around and try and build the first superintelligent system you can, that’s not going to be good.
So somebody’s created this joke website, deadlyai.org
Society for the promotion of universal nonexistence.
And that’s what it would do, it would be universal nonexistence.
Nobody would exist anymore.
Slide 30
So I’ve gone over some technical failures, I’m going to talk about failure of content now.
So technical failure is when you build a super-smart system, right, some human builds it with some goal in mind, and then actually the system doesn’t effect that goal, it doesn’t do what the guy who built it thought it was going to do.
There’s another possibility, which is failure of content.
So the possibility that the AI designer is an evil genius, and he wants to have his wicked way with the world.
And he solves the technical problem of Friendly AI, he gets a system that does what he wants it to do, but what he wants it to do is somehow evil, I mean he wants to chain us all up, or make us all worship him all day, or something.
That’s a failure of content, and that is… [video skips]
Slide 31
So what are the options for a system that wouldn’t be a disaster in this sense?
What are the options for a safe AI?
Slide 32
So let’s define a Friendly AI to be an AI that causes at least some positive changes to the human race.
So it improves our quality of lives at least in some way.
So there’s a sort of third category perhaps between Friendly and unfriendly that I haven’t touched on.
If we define Friendly to be some positive change, unfriendly it just wipes us out, there seems to be this sort of intermediate category of indifferent AIs.
But actually because of these basic drives, if it’s got an indifferent goal, it won’t produce an indifferent effect.
So really it’s actually quite hard to produce an AI that explicitly doesn’t do anything to the human race.
So that category is somewhat empty, though it’s not completely empty.
Slide 33
So what is it we want such an AI to actually do?
I mean it’s got to do something that’s positive for us.
But what do we actually value? What do we actually want?
Well, it turns out there’s this complexity of value problem, and it’s not as if there’s just one simple thing…
[end of video 3]
…every human in the world wants to do with the universe, and it’s really simple, and you can code it in quite quickly, right?
You know, we want a lot of different things.
So you might say something like “Well, you program a Friendly AI by telling it to do good, or do the right thing.”
But these short simple-sounding words are just proxies for extremely complicated preferences that we know it when we see it, we would usually know what counted as a world that’s good when we saw it.
But it’s very hard for us to write down in a formal language what we mean when we say a good outcome.
There are a lot of aspects of that.
So because of this value complexity, it might be that human values are so complex and opaque to us, that if you’re building one of these from scratch, software AIs, you can’t directly code your values into the AI.
You can’t just have a big piece of code that says ok, once you’ve identified parts of the physical world, you’re going to do this, this, and this, and that’s going to be great.
Slide 34
Josh Greene has an argument for the complexity of values, why is it that we don’t value something simple?
And the answer is that humans evolved in quite a complicated way.
We had life in a tribe of other humans, and we had to catch prey, and we had to find a mate, and we had to look after our children.
We also produced art, and we had this sort of signalling thing.
I don’t know if anyone has seen in the news recently, but they discovered that Neanderthals had makeup.
Why did they have makeup?
Well they wanted to look attractive, or pretty, so other Neanderthals around them would think of them as high status.
So there are all of these diverse drives that shaped our brain architectures to be the way they are.
And that’s why you can’t take all of the things we would like the world to be like if you had this superintelligent system optimizing it for us, and then just distill it down to one grand principle, like “Do good”, or “Follow Kant’s categorical imperative”
There’s no simple principle from which all of our values flow.
So it’s a question of recording all of these things that we want, and it’s an awful lot of stuff.
Slide 35
So not only is it complex, it’s also fragile.
This is a distinct idea, the idea that if you take one component of human moral psychology and omit it in the description of what kind of world that you want, then almost all of the value gets destroyed.
So for example if you have everything that humans value except the idea that we get bored if it’s the same thing all the time, and you program that into an AI, then what you might get is one human being experiencing the same thing on a loop.
So imagine the best day on the beach, or the best experience you’ve ever had, and the AI has forgotten to include boredom, well it’ll just have a loop of that again and again and again and again if it wants to maximize the total time-integrated value.
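The boredom example can be made concrete with a toy planner (an editorial sketch, not from the talk). The experiences, their base values, and the size of the boredom penalty are made-up numbers, `plan` is a hypothetical helper, and the planner is a simple greedy one; the only thing it is meant to show is how dropping a single term from the value description changes the outcome.

```python
# Made-up base values for three experiences.
BASE_VALUE = {"beach": 10.0, "concert": 8.0, "hike": 7.0}


def plan(days, boredom_penalty):
    """Greedily pick the highest-valued experience for each day.

    With boredom_penalty > 0, each repeat of an experience is worth less.
    With boredom_penalty == 0, boredom has been left out of the values.
    """
    times_done = {name: 0 for name in BASE_VALUE}
    schedule = []
    for _ in range(days):
        def value_today(name):
            return BASE_VALUE[name] - boredom_penalty * times_done[name]

        choice = max(BASE_VALUE, key=value_today)
        times_done[choice] += 1
        schedule.append(choice)
    return schedule


without_boredom = plan(days=6, boredom_penalty=0.0)
with_boredom = plan(days=6, boredom_penalty=3.0)
print(without_boredom)
print(with_boredom)
```

With the penalty set to zero the schedule is the single best experience repeated every day, which is the loop described above; with the penalty included the schedule mixes all three experiences.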
Another one would be subjective experience.
If you describe correctly all of the properties of the external world that we like, so we want there to be art and buildings and trees, and hills, and nature, and whatever else it is that you want, but you forget to tell the AI that it’s really important that there are humans who have first-person experience of this world, so there’s somebody to see it, right?
It might build humans that are sort of zombie humans, they walk around and do stuff, but they have no experience of what’s going on.
This is not quite the philosophical zombie, for those of you who know what a philosophical zombie is, that has a completely identical input output behaviour to a human.
But an AI might be able to design something that looks a lot like a human, and behaves similarly, but has no first-person experience.
So you could create this beautiful universe, with nobody in it, with these sort of zombie people walking around who weren’t actually people.
So, you know, there are a lot of complex things in your life, aspects of the way that the world is configured that perhaps you don’t even realize you value.
Another one would be personal identity.
So, you know, I am Roko Mijic at the moment, and when I finish this talk, I’m still going to be me, but a slightly different version with slightly different memories, and there’s going to be some kind of continuity there.
But if you don’t program that value into the AI, you might get the AI enumerating all of the most valuable observer-moments, the most valuable snapshot moments that could possibly exist within humanity, and then realizing them in the universe one after another, but with no continuity between them.
So they would be a sort of collection of good moments with no narrative, with no flow between them, and that would be a shame.
Slide 36
So one way to solve this incredible sort of value complexity problem is what’s called Volition Extrapolation, or again in this sort of archaic parlance we call Volition Extrapolation.
So instead of telling the AI what you want it to do with the world, you tell it, well, “We don’t know what we actually want you to do at the end of the day, but we know that what we want is recorded in our brains, so go and look inside our brains and somehow extrapolate what we would want you to do, given plenty of time, more intelligence, and more factual knowledge.”
So the idea is that instead of saying “do X”, you’re saying “do what I would do in this circumstance that I was as intelligent and fast thinking…
Sorry, you have a question?
audience member: Have you seen Forbidden Planet?
Have I seen Forbidden Planet? No.
audience member: The original film where the machine basically reads the minds of the people and gives the people exactly what they want.
Ok, so it reads people’s minds…
audience member: And gives them exactly what they want except, of course you’ve got all the subconscious id drives, subconscious evils get realized…
So somebody insults you and subconsciously you want them tortured forever, at that particular moment, but you know if the AI just did what you wanted, you know, that would be a shame, right?
audience member: Well arguably that’s not extrapolated volition.
I’ll sort of build some things on top of this.
The simple idea is that you take the normative choices from the human brain and extrapolate them according to greater factual knowledge.
So there are some problems with this.
Slide 37
So, value divergence within a person.
So sometimes one person might not really know what they want, or be undecided, or as the gentleman here said, they might have some subconscious desire to sort of beat somebody up because they made some minor insult, but really if they thought about it, they would probably decide that it wasn’t such a good thing.
Also there’s sort of moral divergence across time.
It was a couple of hundred years ago in London that we thought keeping black people as slaves was a great idea, right?
We don’t want that anymore.
That’s a good thing, it’s a good thing that we have made that progress.
Now the question is “What further progress would there be if we took our existing moral values and sort of reflected on them and really thought hard about them?”
There might be other things we do today that we might decide upon reflection that we really don’t want.
And then there’s this dependence on facts in ways that are more complex than merely instrumental values.
So for example the ontology problem.
Suppose that you use this volition extrapolation on somebody who believes in souls.
Literally souls in the religious sense, immaterial sort of spooky spiritual thing that somehow affects the physical world.
And of course the superintelligent AI is going to know if they are really there or not, and my money is that they’re not there.
So if that were the case, but the person’s values were phrased in terms of souls, so if the person’s values are “to be a good person I must cleanse my soul”, how would the system interpret that in the light of the new facts?
audience member: What about the problem that people don’t just want things, they want specific ways to get those things. The ways that the artificial intelligence could get them…
So you might want to win a prize, but only by competing fairly.
But I mean that’s again a preference about the way the physical world is operating, so I mean…
audience member: But that’s actually a limit on the way you can optimize the solution.
Right, I mean not every…
[end of video 4]
…win the prize fairly.
I mean, at most one person can win first place, that kind of thing.
audience member: No, in the sense that I want to have that girl, but I want it to be a hard thing to get it.
Oh, you want a challenge, right?
I mean that’s perfectly possible to arrange, it’s just another kind of way that the atoms in the world could be arranged.
Other people have comments on this.
audience member: With volition extrapolation, do you now have two possible scenarios: Either the first AI models itself on an individual’s values, or the other scenario where you have multiple AIs simultaneously modeling themselves on multiple people’s values. Once in a while you get your values which allow you to end up being confident in the AI world, in the other, if you set up a million AIs, and say pick a million people at random, and then you end up with AI crusades and jihads. Where they all got…
I think that the idea of multiple simultaneous superintelligences seems unlikely.
If there were one project in the world trying to build superintelligence, you’d only build one at once, right?
It would be pretty stupid otherwise, you don’t want these two things fighting against each other.
If there were multiple projects, it’s likely that one would get there first, or that they would compromise or otherwise form a treaty.
But I’ll come to sort of ..
audience member: In the second scenario, the worst case scenario is they fight each other, rather than they fight us.
Unfortunately, that wouldn’t be any better.
audience member: It was one of the things I was thinking, because you did seem to be saying about the AI and humans, one instance. Our evolved systems seem to fall into two kinds of categories general and specific, in terms of things like intelligence as a general principle tend to evolve, but the fact that, say, our windpipe crosses our esophagus is just something that happened by chance, you wouldn’t expect it to happen again. Most social systems where you have multiple entities have to develop empathy.
What do you mean most? You mean humans did it?
audience member: Most humans have it, obviously you have psychopaths, and they announce it, but they’re relatively minimal within the social system. There’s feedback from other entities. If you screw me, then I won’t trust you again.
You’re talking about game theory?
Certainly if you had multiple AIs with different utility functions, they would act according to game theory.
But there’s a difference between idealized game theory and the way that humans actually behave.
People do experiments, for example with the Prisoner’s Dilemma, and humans all over the place are cooperating with each other in one-off prisoner’s dilemmas where the players are anonymous to each other.
audience member: What you’re missing is several million years of evolution shaping us into this form. We didn’t get to be empathic just randomly.
another audience member: The point I’m trying to make is that empathy tends to arise spontaneously in systems with multiple social agents.
Maybe we can discuss that later, when I’ve gotten to the end. But that’s an interesting point.
This ontology problem, this is quite a serious one.
Because it might be that even nonreligious people base their values on things that don’t really exist.
Perhaps we have a flawed understanding of personal identity.
Or perhaps the truth about quantum mechanics causing the universe to branch in certain ways means that we are unaware of the exact nature of reality, and therefore our values would have to be ported across to the way the world really is, as opposed to the way we think it is.
audience member: The computer could say that there’s no such thing as human rights, and all of your human ethics is based on this idea of human rights, whereas a human right is as ethereal as the souls that some people used to believe in.
Right, so certainly a lot of people’s naive, commonsense ethics are based on not being able to think about the problem hard enough to realize where the contradiction is.
Now the question is in that case, what does the AI do with that person?
Now suppose you have a good person whose ethics is fundamentally based on some contradictory logical factual position.
It shouldn’t just say “Sorry, your ethics are based on contradictions, I’m just going to delete you.”
It should perform some kind of graceful repair.
It should degrade gracefully if the person’s factual beliefs are a little bit wrong.
That’s a really hard problem, I don’t know how to solve it.
Slide 38
We can bring in reflective equilibrium to this volition extrapolation in order to solve some of these problems.
So instead of the AI extrapolating your base desire to beat the other guy up because he slighted you, it first takes your desires, and then it applies your values and desires to themselves, so it allows you to reflect upon your values and where they came from and why you have them, and hopes that you’ll reach what philosophers call reflective equilibrium, where your moral principles are adjusted until they’re in equilibrium.
They’re stable, so they don’t vary significantly as you think about it.
They’re not in conflict, and they provide consistent, practical advice for solving problems.
So adding reflective equilibrium to Volition Extrapolation might solve some of the above problems.
So I guess you could say that the problem with Forbidden Planet was it left out the reflective equilibrium bit.
Slide 39
So then there’s the problem of averaging the goals of the collective.
It’s not going to be one person who builds a superintelligent AI, or at least I hope it won’t be.
It’s going to be some group, like the whole world, or the United Nations, or NATO.
Some large political entity is going to build this thing.
And if it’s a Friendly AI, and it uses volition extrapolation with reflective equilibrium to decide what the various people within that coalition would ideally want, it’s probably going to get the answer that different people want slightly different things, even with unlimited resources, right?
So some method is required then to average or compromise the reflectively coherent extrapolated volitions of the individual humans within this group.
And so that’s a sort of political problem or a sort of collective preference aggregation problem.
And it’s notoriously hard to find formal and satisfying algorithms to do that.
I mean this is really the problem of politics, except with the additional complication that you have to program it all into a computer.
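[Editorial sketch, not part of the talk: one classic reason preference aggregation is hard is the Condorcet paradox, where pairwise majority voting produces a cycle. The three rankings below are hypothetical, and pairwise majority is just one assumed aggregation rule, not a method proposed by the speaker.]

```python
from itertools import permutations

# Three hypothetical people, each ranking three outcomes from best to worst.
rankings = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

def majority_prefers(x, y):
    """Return True if a strict majority of people rank x above y."""
    votes = sum(r.index(x) < r.index(y) for r in rankings)
    return votes > len(rankings) / 2

options = ["A", "B", "C"]
pairwise = {(x, y): majority_prefers(x, y) for x, y in permutations(options, 2)}

# A Condorcet winner is an option that beats every other option head to head.
winners = [x for x in options if all(pairwise[(x, y)] for y in options if y != x)]
print(pairwise)
print(winners)
```

[Here a majority prefers A to B, B to C, and C to A, so no option beats all the others, even though each individual ranking is perfectly consistent.]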
Slide 40
So Coherent Extrapolated Volition, this is Eliezer Yudkowsky’s proposal.
So what I’ve just told you is probably not exactly what Eliezer meant with Coherent Extrapolated Volition, although he was describing them in airy fairy terms anyway, that was supposed to hit a sort of cluster of related concepts.
And I think this is fairly close to what he meant.
It’s a program that’s going to take our volitions, extrapolate them with greater factual knowledge, allow us to come to a reflective equilibrium about what we want.
And then it’s going to compromise people’s preferences together in some way.
Any sensible Friendly AI proposal is going to look something like this, but you can argue about details.
audience member: Presumably a really important detail, is trying to program the concept of faith. If it’s got a copy of each human, and the value is encoded, there’s a lot of nonrational stuff. Where does intuition, faith, and human characteristics…
So this is the ontology problem, right, how does it ground out these concepts?
audience member: Isn’t this more like an aggregate extrapolated volition? You take a whole load of extrapolated volitions, one for each person, and add them all up and get the final answer.
The other way to do it would be to do it in the other order, take everyone’s volitions and you extrapolate them together.
audience member: And try to straighten out the whole thing into one function. You try to get the volition of humanity in one non-crazy function, non-self-defeating function…
another audience member: From what I recall I think that’s what Eliezer hopes, that human volition will extrapolate to and actually converge in a meaningful way. Because we share a large amount of information.
Right, obviously there are some problems with that.
audience member: Because you appeal to political purposes in the end. I took it that you didn’t share that hope.
You can argue about the…
[end of video 5]
…what specific detail you want to…exactly what the details should be.
So I mean in the case that there’s this sort of superconvergence and once everybody’s thought about everything hard enough, they’re all going to agree about everything.
Israel, Palestine, abortion, I mean it’s all going to be…
audience member: Most of these things are actually knowledge-bounded. For example if you’re talking about politics, they’re arguing about what would work.
Right, so those questions would not pose a problem. Questions of what will work would be easy.
audience member: Or what exists? Are embryos indeed people, according to any sensible definition…
But that’s an ontology problem, you see? If you take somebody who’s pro-life and says no, at the moment of conception you have…
audience member: But they would have a reason to say that based on fact, a reason like if an egg gets big enough…
Possibly, but I mean I’m not sure that that would be the right thing to do.
Perhaps some kind of more graceful degradation where if it turns out the factual underpinnings of a value are false, you try to find a repair that bases that value on the closest logically consistent system that’s based on true facts.
I’ve got like 5 more slides, so I’ll finish up.
Slide 41
So now you should all notice that this is quite a wishlist, right?
Dear Santa, this Christmas, or one of these Christmases before 2100, I would like the solution to ethics, please.
The solution to all political problems.
And an AI that can solve novel philosophical problems at runtime.
And it’s got to work the first time.
audience member: Are you sure that Santa’s a Friendly AI?
It’s quite a wishlist, I mean this is hard, right?
Slide 42
So there are other routes though.
So I’m personally a little bit more in favour of the upload route.
So if you take human brain emulations, the road to getting human brain emulations is quite well mapped out.
If you look at some online resources that I can tell you about after the presentation, there are people who’ve thought about that.
Whereas with software AI, there’s this fundamental… I don’t know how it works, and I don’t know why I don’t know, etc. and it’s this big mystery.
Human brain emulation we have a timetable, we have a roadmap for it.
Probably 40, 50 years on average to get there.
And we could use human uploads as an intermediate step between ordinary biological humans and getting this fully fledged system that’s going to do what we want.
So uploads can work very quickly: if you speed up the computer that a human brain emulation is running on, it’s like a human being sped up, an AI researcher on fast forward.
They can be copied, and they can be monitored.
You could have a system where you have millions of very smart AI researchers, and maybe millions of secret police agents above them, who could see everything in their systems, and check that they aren’t doing anything bad.
You could build a system out of human brain emulations that would do AI research, that would do Friendly AI research, quickly, effectively, and reliably.
So there is cause for hope there.
Slide 43
Ok, prognosis, what does the bill of health look like?
So Unfriendly AI may be one of the most serious threats to the human race.
I was talking to Nick Bostrom at the Future of Humanity Institute in Oxford, and he said that it might be the most serious threat, and that it might be as bad as all the other possible extinction risks put together.
Although he was uncertain, of course, he was very good at couching his predictions in uncertainty so that he doesn’t end up giving overconfident and false advice.
So I mean Friendly AI does offer the opportunity for the human race to permanently win.
So if you had this Friendly AI system that shared our value system and was quite powerful, it’s quite likely that that would prevent all possible future extinction risks.
So if we could build this system, if we could really give it our values, we would be in a position where the human race wouldn’t be threatened by any other risks like nanotech, biotech, natural disasters, asteroids.
All of these things would not be a problem anymore.
If we just forge ahead in AI research, and it’s very exciting, I’ve done it.
It’s really fun while you’re there and you’re researching and you’re finding out new stuff.
But if we just forge ahead without considering the risks, and that’s basically what we’re doing at the moment, it seems virtually certain that if we do get that superintelligent AI system..
You could argue that that’s not certain, we might never solve the problem.
But if we do get it, it seems virtually certain to me that that would be the end of the human race.
That would be game over.
We might want to delay researching the software-only AI, and promote or try to speed up brain emulation research.
But of course this is just a preliminary analysis, so we probably shouldn’t act on what I’ve just said.
It’s just something to think about.
And within software-only AI research, we should study those kinds of systems that are most easy to keep under control.
So systems that are based on a reward, a machine learning kind of goal system learning, where you show it some examples, and try to get it to learn what you want, those are probably bad.
Systems where you really don’t have any control, you don’t have any insight on the internal structure.
Things that are opaque to the designer, maybe like a neural network, those are not such a good idea either.
Fortunately, there don’t seem to be many people working on software-only smart AI.
Most people in AI research are working on quite pragmatic ends, like optical character recognition is one challenge that AI researchers have effectively solved recently.
And face recognition technology, I’ve got that on my laptop.
So those are probably not so much of a problem.
It’s just that if people do aggressively pursue routes to really smart systems, and they do it without regard to safety, they probably are signing a death certificate for the rest of us.
So let’s hope they don’t do that, and let’s try to encourage them not to do that.
And let’s try to encourage people to promote research into AI Friendliness.