Stuart Russell: Long-Term Future of Artificial Intelligence #9

Transcript

00:00:00 The following is a conversation with Stuart Russell. He’s a professor of computer science at

00:00:04 UC Berkeley and a coauthor of a book that introduced me and millions of other people

00:00:10 to the amazing world of AI, called Artificial Intelligence: A Modern Approach. So it was an

00:00:16 honor for me to have this conversation as part of the MIT course on artificial general intelligence

00:00:23 and the artificial intelligence podcast. If you enjoy it, please subscribe on YouTube,

00:00:28 iTunes or your podcast provider of choice, or simply connect with me on Twitter at Lex Fridman

00:00:34 spelled F R I D. And now here’s my conversation with Stuart Russell.

00:00:41 So you’ve mentioned that in 1975, in high school, you created one of your first AI programs,

00:00:47 one that played chess. Were you ever able to build a program that beat you at chess or another board

00:00:57 game? So my program never beat me at chess. I actually wrote the program at Imperial College.

00:01:06 So I used to take the bus every Wednesday with a box of cards this big and shove them into the

00:01:14 card reader. And they gave us eight seconds of CPU time. It took about five seconds to read the cards

00:01:21 in and compile the code. So we had three seconds of CPU time, which was enough to make one move,

00:01:28 you know, with a not very deep search. And then we would print that move out and then

00:01:32 we’d have to go to the back of the queue and wait to feed the cards in again.

00:01:35 How deep was the search? Are we talking about one move, two moves, three moves?

00:01:39 No, I think we got to depth eight with alpha beta. And we had some tricks of our

00:01:48 own about move ordering and some pruning of the tree. But you were still able to beat that program?

00:01:55 Yeah, yeah. I was a reasonable chess player in my youth. I did an Othello program and a

00:02:01 backgammon program. So when I got to Berkeley, I worked a lot on what we call meta reasoning,

00:02:08 which really means reasoning about reasoning. And in the case of a game playing program,

00:02:14 you need to reason about what parts of the search tree you’re actually going to explore because the

00:02:19 search tree is enormous, bigger than the number of atoms in the universe. And the way programs

00:02:27 succeed and the way humans succeed is by only looking at a small fraction of the search tree.

00:02:33 And if you look at the right fraction, you play really well. If you look at the wrong fraction,

00:02:37 if you waste your time thinking about things that are never going to happen,

00:02:41 moves that no one’s ever going to make, then you’re going to lose because you won’t be able

00:02:46 to figure out the right decision. So that question of how machines can manage their own computation,

00:02:53 how they decide what to think about, is the meta reasoning question. And we developed some methods

00:03:00 for doing that. And very simply, the machine should think about whatever thoughts are going

00:03:07 to improve its decision quality. We were able to show that both for Othello, which is a standard

00:03:13 two player game, and for Backgammon, which includes dice rolls, so it’s a two player game

00:03:19 with uncertainty. For both of those cases, we could come up with algorithms that were actually

00:03:25 much more efficient than the standard alpha beta search, which chess programs at the time were

00:03:31 using. And that those programs could beat me. And I think you can see the same basic ideas in Alpha

00:03:42 Go and Alpha Zero today. The way they explore the tree is using a form of meta reasoning to select

00:03:51 what to think about based on how useful it is to think about it. Are there any insights you can

00:03:57 describe, without Greek symbols, about how we select which paths to go down? There’s really

00:04:04 two kinds of learning going on. So as you say, Alpha Go learns to evaluate board positions. So

00:04:11 it can look at a go board. And it actually has probably a superhuman ability to instantly tell

00:04:19 how promising that situation is. To me, the amazing thing about Alpha Go is not that it can

00:04:28 beat the world champion with its hands tied behind its back, but the fact that if you stop it from

00:04:36 searching altogether, so you say, okay, you’re not allowed to do any thinking ahead. You can just

00:04:42 consider each of your legal moves and then look at the resulting situation and evaluate it. So

00:04:48 what we call a depth one search. So just the immediate outcome of your moves and decide if

00:04:53 that’s good or bad. That version of Alpha Go can still play at a professional level.
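
To make the depth one search idea concrete, here is a minimal sketch of such a policy: evaluate the position reached by each legal move with a learned value function and pick the best, with no look ahead. The helpers legal_moves, apply_move, and value_net are hypothetical placeholders for illustration, not AlphaGo's actual interfaces.

```python
# Minimal sketch of a depth-one policy: no search, just a learned evaluation.
# legal_moves, apply_move, and value_net are hypothetical placeholders.

def depth_one_policy(board, legal_moves, apply_move, value_net):
    """Pick the move whose immediate successor position evaluates best."""
    best_move, best_value = None, float("-inf")
    for move in legal_moves(board):
        successor = apply_move(board, move)  # the position after this move
        value = value_net(successor)         # learned estimate of how promising it is
        if value > best_value:
            best_move, best_value = move, value
    return best_move
```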

00:05:02 And human professionals are sitting there for five, 10 minutes deciding what to do and Alpha Go

00:05:06 in less than a second can instantly intuit what is the right move to make based on its ability to

00:05:14 evaluate positions. And that is remarkable because we don’t have that level of intuition about Go.

00:05:23 We actually have to think about the situation. So anyway, that capability that Alpha Go has is one

00:05:31 big part of why it beats humans. The other big part is that it’s able to look ahead 40, 50, 60 moves

00:05:41 into the future. And if it was considering all possibilities, 40 or 50 or 60 moves into the

00:05:49 future, that would be 10 to the 200 possibilities. So way more than atoms in the universe and so on.

00:06:01 So it’s very, very selective about what it looks at. So let me try to give you an intuition about

00:06:08 how you decide what to think about. It’s a combination of two things. One is how promising

00:06:14 it is. So if you’re already convinced that a move is terrible, there’s no point spending a lot more

00:06:22 time convincing yourself that it’s terrible because it’s probably not going to change your mind. So

00:06:28 the real reason you think is because there’s some possibility of changing your mind about what to do.

00:06:34 And it’s that changing your mind that would result then in a better final action in the real world.

00:06:40 So that’s the purpose of thinking is to improve the final action in the real world. So if you

00:06:47 think about a move that is guaranteed to be terrible, you can convince yourself it’s terrible,

00:06:53 you’re still not going to change your mind. But on the other hand, suppose you had a choice between

00:06:59 two moves. One of them you’ve already figured out is guaranteed to be a draw, let’s say. And then

00:07:05 the other one looks a little bit worse. It looks fairly likely that if you make that move, you’re

00:07:10 going to lose. But there’s still some uncertainty about the value of that move. There’s still some

00:07:16 possibility that it will turn out to be a win. Then it’s worth thinking about that. So even though

00:07:22 it’s less promising on average than the other move, which is guaranteed to be a draw, it’s worth thinking

00:07:27 about. There’s still some

00:07:32 purpose in thinking about it because there’s a chance that you will change your mind and discover

00:07:36 that in fact it’s a better move. So it’s a combination of how good the move appears to be

00:07:42 and how much uncertainty there is about its value. The more uncertainty, the more it’s worth thinking

00:07:48 about because there’s a higher upside if you want to think of it that way.
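
As a toy illustration of that selection rule, promise plus uncertainty, the sketch below scores candidate moves by an optimistic bound, estimate plus an uncertainty bonus, and only keeps thinking about moves whose bound exceeds the value of the current best choice. This captures only the flavor of the idea, with made-up numbers; it is not the specific metareasoning algorithms being described, though the optimistic-bound rule is similar in spirit to the bandit-style selection used in Monte Carlo tree search.

```python
# Toy version of "think about whatever might change your mind".
# Each candidate has an estimated value and an uncertainty (all numbers made up).
candidates = {
    "guaranteed_draw": {"estimate": 0.50, "uncertainty": 0.00},  # known draw, no doubt
    "risky_move":      {"estimate": 0.35, "uncertainty": 0.25},  # looks worse, very uncertain
    "terrible_move":   {"estimate": 0.05, "uncertainty": 0.02},  # clearly bad, little doubt
}

current_best = max(c["estimate"] for c in candidates.values())

def worth_thinking_about(cand, best):
    # Optimistic bound: could more thought plausibly reveal this move to be
    # better than the current best? If not, the computation is wasted.
    return cand["estimate"] + 2 * cand["uncertainty"] > best

for name, cand in candidates.items():
    print(name, worth_thinking_about(cand, current_best))
# Only risky_move comes out True: its average looks worse than the draw,
# but there is enough uncertainty that thinking could change the decision.
```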

00:07:52 And of course in the beginning, especially in the AlphaGo Zero formulation, everything is shrouded

00:07:59 in uncertainty. So you’re really swimming in a sea of uncertainty. So it benefits you to,

00:08:07 I mean, actually following the same process as you described, but because you’re so uncertain

00:08:11 about everything, you basically have to try a lot of different directions.

00:08:15 Yeah. So the early parts of the search tree are fairly bushy, in that it will look at a lot

00:08:22 of different possibilities, but fairly quickly, the degree of certainty about some of the moves,

00:08:27 I mean, if a move is really terrible, you’ll pretty quickly find out, right? You lose half

00:08:32 your pieces or half your territory and then you’ll say, okay, this is not worth thinking

00:08:37 about anymore. And then further down, the tree becomes very long and narrow and you’re following

00:08:45 various lines of play, 10, 20, 30, 40, 50 moves into the future. And that again is something that

00:08:55 human beings have a very hard time doing mainly because they just lack the short term memory.

00:09:02 You just can’t remember a sequence of moves that’s 50 moves long. And you can’t imagine

00:09:08 the board correctly for that many moves into the future.

00:09:13 Of course, the top players, I’m much more familiar with chess, but the top players probably have,

00:09:19 they have echoes of the same kind of intuition instinct that in a moment’s time AlphaGo applies

00:09:26 when they see a board. I mean, they’ve seen those patterns, human beings have seen those patterns

00:09:31 before at the top, at the grandmaster level. It seems that there is some similarities or maybe

00:09:41 it’s our imagination creates a vision of those similarities, but it feels like this kind of

00:09:47 pattern recognition that the AlphaGo approaches are using is similar to what human beings at the

00:09:53 top level are using.

00:09:55 I think there’s, there’s some truth to that, but not entirely. Yeah. I mean, I think the,

00:10:03 the extent to which a human grandmaster can reliably instantly recognize the right move

00:10:10 and instantly recognize the value of the position. I think that’s a little bit overrated.

00:10:15 But if you sacrifice a queen, for example, I mean, there’s these beautiful games of

00:10:20 chess with Bobby Fischer or somebody, where he’s seemingly making a bad move. And I’m not sure

00:10:28 there’s a perfect degree of calculation involved where they’ve calculated all the possible things

00:10:34 that happen, but there’s an instinct there, right? That somehow adds up to

00:10:40 Yeah. So I think what happens is you get a sense that there’s some possibility in the

00:10:46 position, even if you make a weird looking move, that it opens up some lines of

00:10:56 calculation that otherwise would be definitely bad. And it’s that intuition that there’s

00:11:05 something here in this position that might yield a win.

00:11:10 And then you follow that, right? And in some sense, when a chess player is

00:11:16 following a line, in his or her mind they’re mentally simulating what the other person

00:11:23 is going to do, what the opponent is going to do. And they can do that as long as the moves are kind

00:11:29 of forced, right? As long as there’s, you know, what we call a forcing variation,

00:11:34 where the opponent doesn’t really have much choice how to respond. And then you follow that

00:11:39 and see if you can force them into a situation where you win.

00:11:43 You know, we see plenty of mistakes even, even in grandmaster games where they just miss some

00:11:51 simple three, four, five move combination that, you know, wasn’t particularly apparent in,

00:11:58 in the position, but was still there. That’s the thing that makes us human.

00:12:02 Yeah. So you mentioned that in Othello and those games, after some meta reasoning

00:12:09 improvements and research, it was able to beat you. How did that make you feel?

00:12:14 Part of the meta reasoning capability that it had was based on learning and, and you could

00:12:23 sit down the next day and you could just feel that it had got a lot smarter, you know, and all of a

00:12:30 sudden you really felt like you were sort of pressed against the wall because it was much more

00:12:37 aggressive and was totally unforgiving of any minor mistake that you might make. And

00:12:43 actually it seemed to understand the game better than I did. And Garry Kasparov has this quote where

00:12:52 during his match against Deep Blue, he said, he suddenly felt that there was a new kind of

00:12:56 intelligence across the board. Do you think that’s a scary or an exciting

00:13:03 possibility for, for Kasparov and for yourself in, in the context of chess, purely sort of

00:13:10 in this, like that feeling, whatever that is? I think it’s definitely an exciting feeling.

00:13:17 You know, this is what made me work on AI in the first place was as soon as I really understood

00:13:23 what a computer was, I wanted to make it smart. You know, I started out with the first program

00:13:29 I wrote was for the Sinclair programmable calculator. And I think you could write a

00:13:35 21 step algorithm. That was the biggest program you could write, something like that. And do

00:13:42 little arithmetic calculations. So I think I implemented Newton’s method for square

00:13:48 roots and a few other things like that. But then, you know, I thought, okay, if I just had more

00:13:54 space, I could make this thing intelligent. And so I started thinking about AI and,

00:14:04 and I think the thing that’s scary is not the chess program

00:14:11 because, you know, chess programs, they’re not in the taking over the world business.

00:14:19 But if you extrapolate, you know, there are things about chess that don’t resemble

00:14:29 the real world, right? We know the rules of chess.

00:14:35 The chess board is completely visible to the program, where of course the real world is not;

00:14:40 most of the real world is not visible from wherever you’re sitting, so to speak.

00:14:47 And to overcome those kinds of problems, you need qualitatively different algorithms. Another thing

00:14:56 about the real world is that, you know, we, we regularly plan ahead on the timescales involving

00:15:05 billions or trillions of steps. Now we don’t plan those in detail, but you know, when you

00:15:12 choose to do a PhD at Berkeley, that’s a five year commitment and that amounts to about a trillion

00:15:19 motor control steps that you will eventually be committed to. Including going up the stairs,

00:15:26 opening doors, drinking water. Yeah. I mean, every finger movement while you’re typing,

00:15:32 every character of every paper and the thesis and everything. So you’re not committing in

00:15:36 advance to the specific motor control steps, but you’re still reasoning on a timescale that

00:15:41 will eventually reduce to trillions of motor control actions. And so for all of these reasons,

00:15:52 you know, AlphaGo and Deep Blue and so on don’t represent any kind of threat to humanity,

00:15:58 but they are a step towards it, right? And progress in AI occurs by essentially removing

00:16:07 one by one these assumptions that make problems easy. Like the assumption of complete observability

00:16:14 of the situation, right? We remove that assumption, you need a much more complicated

00:16:20 kind of computing design. It needs something that actually keeps track of all the

00:16:25 things you can’t see and tries to estimate what’s going on. And there’s inevitable uncertainty

00:16:31 in that. So it becomes a much more complicated problem. But, you know, we are removing those

00:16:36 assumptions. We are starting to have algorithms that can cope with much longer timescales,

00:16:42 that can cope with uncertainty, that can cope with partial observability.

00:16:47 And so each of those steps sort of magnifies by a thousand the range of things that we can

00:16:54 do with AI systems. So the way I started in AI, I wanted to be a psychiatrist for a long time. I

00:16:58 wanted to understand the mind in high school, and of course program and so on. And I showed up

00:17:04 at the University of Illinois, to an AI lab, and they said, okay, I don’t have time for you,

00:17:10 but here’s a book, AI: A Modern Approach. I think it was the first edition at the time.

00:17:16 Here, go, go, go learn this. And I remember the lay of the land was, well, it’s incredible that

00:17:22 we solved chess, but we’ll never solve go. I mean, it was pretty certain that go

00:17:27 in the way we thought about systems that reason wasn’t possible to solve. And now we’ve solved

00:17:33 this. So it’s a very… Well, I think I would have said that it’s unlikely we could take

00:17:39 the kind of algorithm that was used for chess and just get it to scale up and work well for go.

00:17:46 And at the time what we thought was that in order to solve go, we would have to do something similar

00:17:56 to the way humans manage the complexity of go, which is to break it down into kind of sub games.

00:18:02 So when a human thinks about a go board, they think about different parts of the board as sort

00:18:08 of weakly connected to each other. And they think about, okay, within this part of the board, here’s

00:18:13 how things could go; in that part of the board, here’s how things could go. And then you try to sort of

00:18:18 couple those two analyses together and deal with the interactions and maybe revise your views of

00:18:24 how things are going to go in each part. And then you’ve got maybe five, six, seven, ten parts of

00:18:28 the board. And that actually resembles the real world much more than chess does because in the

00:18:38 real world, we have work, we have home life, we have sport, different kinds of activities,

00:18:46 shopping, these all are connected to each other, but they’re weakly connected. So when I’m typing

00:18:54 a paper, I don’t simultaneously have to decide which order I’m going to get the milk and the

00:19:01 butter, that doesn’t affect the typing. But I do need to realize, okay, I better finish this

00:19:08 before the shops close because I don’t have anything, I don’t have any food at home. So

00:19:12 there’s some weak connection, but not in the way that chess works where everything is tied into a

00:19:19 single stream of thought. So the thought was that to solve go, we’d have to make progress on stuff

00:19:26 that would be useful for the real world. And in a way, AlphaGo is a little bit disappointing,

00:19:29 right? Because the program design for AlphaGo is actually not that different from Deep Blue

00:19:39 or even from Arthur Samuel’s checker playing program from the 1950s. And in fact, the two

00:19:48 things that make AlphaGo work are, one, this amazing ability to evaluate the positions,

00:19:53 and the other is the meta reasoning capability, which allows it to

00:19:57 explore some paths in the tree very deeply and to abandon other paths very quickly.

00:20:04 So this word meta reasoning, while technically correct, inspires perhaps the wrong degree of

00:20:14 power that AlphaGo has, for example, the word reasoning is a powerful word. So let me ask you,

00:20:19 sort of, you were part of the symbolic AI world for a while, like AI was; there are a lot of

00:20:27 excellent, interesting ideas there that unfortunately met a winter. And so do you think it reemerges?

00:20:38 So I would say, yeah, it’s not quite as simple as that. So the AI winter,

00:20:44 the first winter that was actually named as such, was the one in the late 80s.

00:20:51 And that came about because in the mid 80s, there was really a concerted attempt to push AI

00:21:01 out into the real world using what was called expert system technology. And for the most part,

00:21:09 that technology was just not ready for primetime. They were trying, in many cases, to do a form of

00:21:17 uncertain reasoning, judgment, combinations of evidence, diagnosis, those kinds of things,

00:21:24 which was simply invalid. And when you try to apply invalid reasoning methods to real problems,

00:21:31 you can fudge it for small versions of the problem. But when it starts to get larger,

00:21:36 the thing just falls apart. So many companies found that the stuff just didn’t work, and they

00:21:44 were spending tons of money on consultants to try to make it work. And there were other

00:21:50 practical reasons, like they were asking the companies to buy incredibly expensive

00:21:56 Lisp machine workstations, which were literally between $50,000 and $100,000 in 1980s money,

00:22:06 which would be like between $150,000 and $300,000 per workstation in current prices.

00:22:13 And then the bottom line, they weren’t seeing a profit from it.

00:22:17 Yeah, in many cases. I think there were some successes, there’s no doubt about that. But

00:22:23 people, I would say, overinvested. Every major company was starting an AI department, just like

00:22:30 now. And I worry a bit that we might see similar disappointments, not because the current technology

00:22:40 is invalid, but it’s limited in its scope. And it’s almost the dual of the scope problems that

00:22:51 expert systems had. So what have you learned from that hype cycle? And what can we do to

00:22:56 prevent another winter, for example? Yeah, so when I’m giving talks these days,

00:23:02 that’s one of the warnings that I give. So this is a two part warning slide. One is that rather

00:23:11 than data being the new oil, data is the new snake oil. That’s a good line. And then the other

00:23:18 is that we might see a kind of very visible failure in some of the major application areas. And I think

00:23:30 self driving cars would be the flagship. And I think when you look at the history,

00:23:40 so the first self driving car was on the freeway, driving itself, changing lanes, overtaking in 1987.

00:23:52 And so it’s more than 30 years. And that kind of looks like where we are today, right? You know,

00:23:59 prototypes on the freeway, changing lanes and overtaking. Now, I think progress

00:24:05 has been made, particularly on the perception side. So we worked a lot on autonomous vehicles

00:24:12 in the early mid 90s at Berkeley. And we had our own big demonstrations. We put congressmen into

00:24:21 self driving cars and had them zooming along the freeway. And the problem was clearly perception.

00:24:30 At the time, the problem was perception? Yeah. So in simulation, with perfect perception,

00:24:36 you could actually show that you can drive safely for a long time, even if the other cars are

00:24:40 misbehaving and so on. But simultaneously, we worked on machine vision for detecting cars and

00:24:56 tracking pedestrians and so on. And we couldn’t get

00:25:03 the reliability of detection and tracking

00:25:03 up to a high enough level, particularly in bad weather conditions, nighttime,

00:25:10 rainfall. Good enough for demos, but perhaps not good enough to cover the general operation.

00:25:15 Yeah. So the thing about driving is, you know, suppose you’re a taxi driver, you know,

00:25:19 and you drive every day, eight hours a day for 10 years, right? That’s 100 million seconds of

00:25:25 driving, you know, and any one of those seconds, you can make a fatal mistake. So you’re talking

00:25:30 about eight nines of reliability, right? Now, if your vision system only detects 98.3% of the

00:25:40 vehicles, right, then that’s sort of, you know, one and a bit nines of reliability. So you have

00:25:47 another seven orders of magnitude to go. And this is what people don’t understand. They think,

00:25:54 oh, because I had a successful demo, I’m pretty much done. But you’re not even within seven orders

00:26:01 of magnitude of being done. And that’s the difficulty. And it’s not the, can I follow a

00:26:09 white line? That’s not the problem, right? We follow a white line all the way across the country.

00:26:16 But it’s the weird stuff that happens. It’s all the edge cases, yeah.
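
For reference, the reliability arithmetic a moment ago works out roughly as follows (a back-of-the-envelope sketch using the figures from the conversation):

```latex
% Back-of-the-envelope version of the reliability argument above.
\[
8 \times 3600 \times 365 \times 10 \;\approx\; 1.1 \times 10^{8}\ \text{seconds of driving,}
\]
% so staying fatal-mistake-free needs a per-second failure rate around 10^{-8},
% i.e. roughly "eight nines" of reliability. A detector that misses 1.7% of
% vehicles fails at 1.7 x 10^{-2}, about 1.8 nines, leaving a gap of six to
% seven orders of magnitude, which is the point being made above.
```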

00:26:22 The edge cases, other drivers doing weird things. You know, so if you talk to Google, right, so

00:26:29 they had actually a very classical architecture where, you know, you had machine vision which

00:26:35 would detect all the other cars and pedestrians and the white lines and the road signs. And then

00:26:41 basically that was fed into a logical database. And then you had a classical 1970s rule based

00:26:48 expert system telling you, okay, if you’re in the middle lane and there’s a bicyclist in the right

00:26:55 lane who is signaling this, then you do that, right? And what they found was that every day

00:27:02 they’d go out and there’d be another situation that the rules didn’t cover. You know, so they’d

00:27:06 come to a traffic circle and there’s a little girl riding her bicycle the wrong way around

00:27:10 the traffic circle. Okay, what do you do? We don’t have a rule. Oh my God. Okay, stop.

00:27:14 And then, you know, they come back and add more rules and they just found that this was not really

00:27:20 converging. And if you think about it, right, how do you deal with an unexpected situation,

00:27:28 meaning one that you’ve never previously encountered and the sort of reasoning required

00:27:35 to figure out the solution for that situation has never been done. It doesn’t match any previous

00:27:41 situation in terms of the kind of reasoning you have to do. Well, you know, in chess programs,

00:27:46 this happens all the time, right? You’re constantly coming up with situations you haven’t

00:27:51 seen before and you have to reason about them and you have to think about, okay, here are the

00:27:56 possible things I could do. Here are the outcomes. Here’s how desirable the outcomes are and then

00:28:01 pick the right one. You know, in the 90s, we were saying, okay, this is how you’re going to have to

00:28:05 do automated vehicles. They’re going to have to have a look ahead capability, but the look ahead

00:28:10 for driving is more difficult than it is for chess because there’s humans and they’re less

00:28:18 predictable than chess pieces. Well, then you have an opponent in chess who’s also somewhat

00:28:23 unpredictable. But for example, in chess, you always know the opponent’s intention. They’re

00:28:29 trying to beat you, right? Whereas in driving, you don’t know is this guy trying to turn left

00:28:36 or has he just forgotten to turn off his turn signal or is he drunk or is he changing the

00:28:42 channel on his radio or whatever it might be. You’ve got to try and figure out the mental state,

00:28:47 the intent of the other drivers to forecast the possible evolutions of their trajectories.

00:28:54 And then you’ve got to figure out, okay, which is the trajectory for me that’s going to be safest.

00:29:00 And those all interact with each other because the other drivers are going to react to your

00:29:04 trajectory and so on. So, you know, they’ve got the classic merging onto the freeway problem where

00:29:10 you’re kind of racing a vehicle that’s already on the freeway and you’re going to pull ahead of

00:29:15 them or you’re going to let them go first and pull in behind and you get this sort of uncertainty

00:29:19 about who’s going first. So all those kinds of things mean that you need a decision making

00:29:29 architecture that’s very different from either a rule based system or it seems to me kind of an

00:29:37 end to end neural network system. So just as AlphaGo is pretty good when it doesn’t do any

00:29:43 look ahead, but it’s way, way, way, way better when it does, I think the same is going to be

00:29:49 true for driving. You can have a driving system that’s pretty good when it doesn’t do any look

00:29:55 ahead, but that’s not good enough. And we’ve already seen multiple deaths caused by poorly

00:30:03 designed machine learning algorithms that don’t really understand what they’re doing.

00:30:09 Yeah. On several levels, I think on the perception side, there’s mistakes being made by those

00:30:16 algorithms where the perception is very shallow. On the planning side, the look ahead, like you

00:30:21 said, and the thing that we come up against that’s really interesting when you try to deploy systems

00:30:31 in the real world is you can’t think of an artificial intelligence system as a thing that

00:30:36 responds to the world always. You have to realize that it’s an agent that others will respond to as

00:30:41 well. So in order to drive successfully, you can’t just try to do obstacle avoidance.

00:30:47 Right. You can’t pretend that you’re invisible, right? You’re the invisible car.

00:30:51 Right. It doesn’t work that way.

00:30:53 I mean, but you have to assert yourself, others have to be scared of you. We’re all in

00:30:58 this tension, this game. So, we’ve done a lot of work with pedestrians:

00:31:04 if you approach pedestrians as purely an obstacle avoidance problem, so you’re not doing look ahead as in

00:31:10 modeling their intent, they’re going to take advantage of you. They’re

00:31:15 not going to respect you at all. There has to be a tension, a fear, some amount of uncertainty.

00:31:21 That’s how we have created.

00:31:24 Or at least just a kind of a resoluteness. You have to display a certain amount of

00:31:29 resoluteness. You can’t be too tentative. And yeah, so the solutions then become

00:31:39 pretty complicated, right? You get into game theoretic analyses. And so at Berkeley now,

00:31:46 we’re working a lot on this kind of interaction between machines and humans.

00:31:51 And that’s exciting.

00:31:53 And so my colleague, Anca Dragan, actually showed that if you formulate the problem game theoretically

00:32:04 and just let the system figure out the solution, it does interesting, unexpected things. Like

00:32:10 sometimes at a stop sign, if no one is going first, the car will actually back up a little,

00:32:18 right? And just to indicate to the other cars that they should go. And that’s something it

00:32:23 invented entirely by itself. We didn’t say this is the language of communication at stop signs.

00:32:29 It figured it out.
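
As a toy illustration of why the game-theoretic formulation helps (this is only a sketch of the flavor, with invented payoffs, not the actual Berkeley formulation), a two-car stop-sign encounter can be written as a small game: both going is catastrophic, both waiting is a deadlock, and the good outcomes are the asymmetric ones. A signaling action like backing up is valuable precisely because it shifts the other driver's expectation toward one of the good asymmetric outcomes.

```python
# Toy two-car stop-sign game (illustrative payoffs only, not a real model).
# Keys: (our action, other car's action); values: (our payoff, their payoff).
payoffs = {
    ("go",   "go"):   (-100, -100),  # collision: terrible for both
    ("go",   "wait"): (2, 1),        # we proceed, they follow: fine
    ("wait", "go"):   (1, 2),        # they proceed, we follow: fine
    ("wait", "wait"): (-1, -1),      # deadlock: nobody moves
}

def best_response(other_action):
    """Our best action given a belief about what the other car will do."""
    return max(("go", "wait"), key=lambda a: payoffs[(a, other_action)][0])

print(best_response("wait"))  # -> go
print(best_response("go"))    # -> wait
# Both "everyone goes" and "everyone waits" are bad, so coordination hinges on
# expectations. An action that only communicates intent, like backing up a
# little, is worth taking because it changes the other driver's best response.
```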

00:32:30 That’s really interesting. So let me just step back for a second. Just this beautiful

00:32:38 philosophical notion. So Pamela McCorduck in 1979 wrote, AI began with the ancient wish to

00:32:47 forge the gods. So when you think about the history of our civilization, do you think

00:32:53 that there is an inherent desire to create, let’s not say gods, but to create superintelligence?

00:33:01 Is it inherent to us? Is it in our genes? That the natural arc of human civilization is to create

00:33:11 things that are of greater and greater power and perhaps echoes of ourselves. So to create the gods

00:33:19 as Pamela said. Maybe. I mean, we’re all individuals, but certainly we see over and over

00:33:32 again in history, individuals who thought about this possibility. Hopefully I’m not being too

00:33:40 philosophical here, but if you look at the arc of this, where this is going and we’ll talk about AI

00:33:47 safety, we’ll talk about greater and greater intelligence. Do you see that there? When you

00:33:54 created the Othello program and you felt this excitement, what was that excitement? Was it

00:33:59 excitement of a tinkerer who created something cool like a clock? Or was there a magic or was

00:34:07 it more like a child being born? Yeah. So I mean, I certainly understand that viewpoint. And if you

00:34:14 look at the Lighthill report, which was, so in the 70s, there was a lot of controversy in the UK

00:34:23 about AI and whether it was for real and how much money the government should invest. And

00:34:32 there was a long story, but the government commissioned a report by Lighthill, who was a

00:34:39 physicist, and he wrote a very damning report about AI, which I think was the point. And he

00:34:48 said that these are frustrated men who, unable to have children, would like to create

00:34:59 a life as a kind of replacement, which I think is really pretty unfair. But there is a kind of magic,

00:35:17 I would say, when you build something and what you’re building in is really just, you’re building

00:35:28 in some understanding of the principles of learning and decision making. And to see those

00:35:35 principles actually then turn into intelligent behavior in specific situations, it’s an

00:35:45 incredible thing. And that is naturally going to make you think, okay, where does this end?

00:36:00 And so there’s magical optimistic views of where it ends, whatever your view of optimism is,

00:36:08 whatever your view of utopia is, it’s probably different for everybody. But you’ve often talked

00:36:13 about concerns you have of how things may go wrong. So I’ve talked to Max Tegmark. There’s a

00:36:26 lot of interesting ways to think about AI safety. You’re one of the seminal people thinking about

00:36:33 this problem amongst sort of being in the weeds of actually solving specific AI problems. You’re

00:36:39 also thinking about the big picture of where are we going? So can you talk about several elements

00:36:44 of it? Let’s just talk about maybe the control problem. So this idea of losing the ability to control

00:36:52 the behavior of our AI systems. So how do you see that? How do you see that coming about?

00:37:00 What do you think we can do to manage it?

00:37:04 Well, so it doesn’t take a genius to realize that if you make something that’s smarter than you,

00:37:09 you might have a problem. Alan Turing wrote about this and gave lectures about this in 1951.

00:37:22 He did a lecture on the radio and he basically says, once the machine thinking method starts,

00:37:31 very quickly they’ll outstrip humanity. And if we’re lucky, we might be able to turn off the power

00:37:42 at strategic moments, but even so, our species would be humbled. Actually, he was wrong about

00:37:49 that. If it’s a sufficiently intelligent machine, it’s not going to let you switch it off. It’s

00:37:55 actually in competition with you. So what do you think is most likely going to happen?

00:37:59 What do you think he meant, just for a quick tangent, that if we shut off this super intelligent

00:38:06 machine, our species will be humbled? I think he means that we would realize that

00:38:16 we are inferior, right? That we only survive by the skin of our teeth because we happen to get

00:38:22 to the off switch just in time. And if we hadn’t, then we would have lost control over the earth.

00:38:32 Are you more worried when you think about this stuff about super intelligent AI,

00:38:36 or are you more worried about super powerful AI that’s not aligned with our values? So the

00:38:43 paperclip scenarios kind of… So the main problem I’m working on is the control problem, the problem

00:38:54 of machines pursuing objectives that are, as you say, not aligned with human objectives. And

00:39:02 this has been the way we’ve thought about AI since the beginning.

00:39:07 You build a machine for optimizing, and then you put in some objective, and it optimizes, right?

00:39:14 And we can think of this as the King Midas problem, right? Because if the King Midas put

00:39:23 in this objective, everything I touch should turn to gold. And the gods, that’s like the machine,

00:39:30 they said, okay, done. You now have this power. And of course, his food,

00:39:35 his drink, and his family all turned to gold. And then he dies of misery and starvation. And

00:39:47 it’s a warning, it’s a failure mode that pretty much every culture in history has had some story

00:39:54 along the same lines. There’s the genie that gives you three wishes, and the third wish is always,

00:39:59 you know, please undo the first two wishes because I messed up. And when Arthur Samuel wrote his

00:40:09 checker playing program, which learned to play checkers considerably better than

00:40:13 Arthur Samuel could play, and actually reached a pretty decent standard.

00:40:20 Norbert Wiener, who was one of the major mathematicians of the 20th century,

00:40:24 he’s sort of the father of modern automation control systems. He saw this and he basically

00:40:31 extrapolated, as Turing did, and said, okay, this is how we could lose control.

00:40:39 And specifically, that we have to be certain that the purpose we put into the machine is the

00:40:49 purpose which we really desire. And the problem is, we can’t do that.

00:40:57 You mean it’s very difficult to encode,

00:41:00 to put our values on paper, it’s really difficult, or are you just saying it’s impossible?

00:41:10 So theoretically, it’s possible, but in practice, it’s extremely unlikely that we could

00:41:17 specify correctly in advance, the full range of concerns of humanity.

00:41:24 You talked about cultural transmission of values,

00:41:27 I think that’s how human to human transmission of values happens, right?

00:41:31 Well, we learn, yeah, I mean, as we grow up, we learn about the values that matter,

00:41:37 how things should go, what is reasonable to pursue and what isn’t reasonable to pursue.

00:41:43 You think machines can learn in the same kind of way?

00:41:46 Yeah, so I think that what we need to do is to get away from this idea that

00:41:52 you build an optimising machine, and then you put the objective into it.

00:41:56 Because if it’s possible that you might put in a wrong objective, and we already know this is

00:42:03 possible because it’s happened lots of times, right? That means that the machine should never

00:42:09 take an objective that’s given as gospel truth. Because once it takes the objective as gospel

00:42:18 truth, then it believes that whatever actions it’s taking in pursuit of that objective are

00:42:26 the correct things to do. So you could be jumping up and down and saying, no, no, no,

00:42:30 no, you’re going to destroy the world, but the machine knows what the true objective is and is

00:42:35 pursuing it, and tough luck to you. And this is not restricted to AI, right? This is, I think,

00:42:42 many of the 20th century technologies, right? So in statistics, you minimise a loss function,

00:42:48 the loss function is exogenously specified. In control theory, you minimise a cost function.

00:42:53 In operations research, you maximise a reward function, and so on. So in all these disciplines,

00:42:59 this is how we conceive of the problem. And it’s the wrong problem because we cannot specify

00:43:07 with certainty the correct objective, right? We need uncertainty, we need the machine to be

00:43:13 uncertain about what it is that it’s supposed to be maximising.

00:43:18 Favourite idea of yours, I’ve heard you say somewhere, well, I shouldn’t pick favourites,

00:43:23 but it just sounds beautiful, we need to teach machines humility. It’s a beautiful way to put it,

00:43:31 I love it.

00:43:32 That they’re humble, they know that they don’t know what it is they’re supposed to be doing,

00:43:39 and that those objectives, I mean, they exist, they’re within us, but we may not be able to

00:43:47 explicate them, we may not even know how we want our future to go.

00:43:56 Exactly.

00:43:58 And the machine, a machine that’s uncertain is going to be deferential to us. So if we say,

00:44:06 don’t do that, well, now the machines learn something a bit more about our true objectives,

00:44:11 because something that it thought was reasonable in pursuit of our objective,

00:44:16 turns out not to be, so now it’s learned something. So it’s going to defer because

00:44:20 it wants to be doing what we really want. And that point, I think, is absolutely central

00:44:30 to solving the control problem. And it’s a different kind of AI when you take away this

00:44:37 idea that the objective is known, then in fact, a lot of the theoretical frameworks that we’re so

00:44:44 familiar with, you know, Markov decision processes, goal based planning, you know,

00:44:53 standard games research, all of these techniques actually become inapplicable.

00:44:59 And you get a more complicated problem because now the interaction with the human becomes part

00:45:11 of the problem. Because the human by making choices is giving you more information about

00:45:21 the true objective and that information helps you achieve the objective better.

00:45:26 And so that really means that you’re mostly dealing with game theoretic problems where

00:45:31 you’ve got the machine and the human and they’re coupled together,

00:45:35 rather than a machine going off by itself with a fixed objective.
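
As a toy numerical sketch of why uncertainty about the objective makes deference rational (in the spirit of the off-switch style of analysis in this line of work, not the exact formulation discussed here, and with made-up numbers): a machine unsure whether an action helps or harms the human does better, in expectation, by asking and letting the human veto than by acting on its own estimate.

```python
# Toy illustration: the machine is uncertain about the human's utility for an action.
# Candidate true utilities and the machine's belief over them (made-up numbers).
outcomes = [(+10.0, 0.6), (-20.0, 0.4)]  # (utility to the human, probability)

act_now = sum(u * p for u, p in outcomes)            # commit without asking
defer   = sum(max(u, 0.0) * p for u, p in outcomes)  # ask; human vetoes if u < 0

print(act_now)  # -2.0: acting on the machine's own estimate loses in expectation
print(defer)    # +6.0: deferring to a human who knows the true utility wins
# Deferring is (weakly) better whenever the machine is uncertain and the human
# can correct it; a machine certain of its objective sees no value in being
# switched off or overridden.
```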

00:45:39 Which is fascinating, on the machine and the human level, that when you don’t have an

00:45:46 objective, it means you’re together coming up with an objective. I mean, there’s a lot of philosophy

00:45:53 that, you know, you could argue that life doesn’t really have meaning. We together agree on what

00:45:58 gives it meaning and we kind of culturally create things that give meaning to why the heck we are on this earth

00:46:05 anyway. We together as a society create that meaning and you have to learn that objective.

00:46:11 And one of the biggest, I thought that’s where you were going to go for a second,

00:46:15 one of the biggest troubles we run into outside of statistics and machine learning and AI

00:46:21 and just human civilization is when you look at, I came from, I was born in the Soviet Union

00:46:28 and the history of the 20th century, we ran into the most trouble, us humans, when there was a

00:46:36 certainty about the objective and you do whatever it takes to achieve that objective, whether you’re

00:46:41 talking about Germany or communist Russia. You get into trouble with humans.

00:46:47 I would say with, you know, corporations, in fact, some people argue that, you know,

00:46:52 we don’t have to look forward to a time when AI systems take over the world. They already have

00:46:57 and they call corporations, right? That corporations happen to be using people as

00:47:03 components right now, but they are effectively algorithmic machines and they’re optimizing

00:47:10 an objective, which is quarterly profit that isn’t aligned with overall wellbeing of the human race.

00:47:17 And they are destroying the world. They are primarily responsible for our inability to tackle

00:47:23 climate change. So I think that’s one way of thinking about what’s going on with corporations,

00:47:30 but I think the point you’re making is valid that there are many systems in the real world where

00:47:39 we’ve sort of prematurely fixed on the objective and then decoupled the machine from those it’s

00:47:48 supposed to be serving. And I think you see this with government, right? Government is supposed to

00:47:54 be a machine that serves people, but instead it tends to be taken over by people who have their

00:48:02 own objective and use government to optimize that objective regardless of what people want.

00:48:09 Do you find appealing the idea of almost arguing machines where you have multiple AI systems with

00:48:16 a clear fixed objective. We have in government, the red team and the blue team, they’re very fixed on

00:48:22 their objectives and they argue and they kind of may disagree, but it kind of seems to make it

00:48:29 work somewhat, that duality of it. Okay, let’s go a hundred years back, when there was still a lot

00:48:39 going on, or to the founding of this country; there were disagreements, and that disagreement is where,

00:48:46 so it was a balance between certainty and forced humility because the power was distributed.

00:48:53 Yeah. I think that the nature of debate and disagreement argument takes as a premise,

00:49:04 the idea that you could be wrong, which means that you’re not necessarily absolutely convinced

00:49:12 that your objective is the correct one. If you were absolutely convinced, there’d be no point

00:49:19 in having any discussion or argument because you would never change your mind and there wouldn’t

00:49:24 be any sort of synthesis or anything like that. I think you can think of argumentation as an

00:49:32 implementation of a form of uncertain reasoning. I’ve been reading recently about utilitarianism

00:49:44 and the history of efforts to define in a sort of clear mathematical way,

00:49:53 if you like a formula for moral or political decision making. It’s really interesting that

00:50:00 the parallels between the philosophical discussions going back 200 years and what you see now in

00:50:07 discussions about existential risk because it’s almost exactly the same. Someone would say,

00:50:14 okay, well here’s a formula for how we should make decisions. Utilitarianism is roughly each

00:50:20 person has a utility function and then we make decisions to maximize the sum of everybody’s

00:50:27 utility. Then people point out, well, in that case, the best policy is one that leads to

00:50:36 the enormously vast population, all of whom are living a life that’s barely worth living.
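
Written out, the formula being discussed is the classical total-utilitarian objective, sketched below in rough notation; the worry is that the sum can be driven up by adding more people with barely positive utility rather than by making existing lives better.

```latex
% Total utilitarianism, in rough notation: pick the policy (or world) that
% maximizes summed utility over the resulting population.
\[
\pi^{*} \;=\; \arg\max_{\pi} \sum_{i=1}^{N(\pi)} U_i(\pi)
\]
% Because the population size N(\pi) itself depends on the choice, a very large
% population with each U_i just above zero can outscore a smaller, far happier one.
```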

00:50:44 This is called the repugnant conclusion. Another version is that we should maximize

00:50:50 pleasure and that’s what we mean by utility. Then you’ll get people effectively saying, well,

00:50:57 in that case, we might as well just have everyone hooked up to a heroin drip. They didn’t use those

00:51:03 words, but that debate was happening in the 19th century as it is now about AI that if we get the

00:51:11 formula wrong, we’re going to have AI systems working towards an outcome that in retrospect

00:51:20 would be exactly wrong. Do you think there’s, as beautifully put, so the echoes are there,

00:51:26 but do you think, I mean, if you look at Sam Harris, our imagination worries about the AI

00:51:32 version of that because of the speed at which the things going wrong in the utilitarian context

00:51:44 could happen. Is that a worry for you? Yeah. I think that in most cases, not in all, but if we

00:51:53 have a wrong political idea, we see it starting to go wrong and we’re not completely stupid and so

00:52:00 we say, okay, maybe that was a mistake. Let’s try something different. Also, we’re very slow and

00:52:09 inefficient about implementing these things and so on. So you have to worry when you have

00:52:14 corporations or political systems that are extremely efficient. But when we look at AI systems

00:52:22 or even just computers in general, they have this different characteristic from ordinary

00:52:29 human activity in the past. So let’s say you were a surgeon, you had some idea about how to do some

00:52:36 operation. Well, and let’s say you were wrong, that way of doing the operation would mostly

00:52:42 kill the patient. Well, you’d find out pretty quickly, like after three, maybe three or four

00:52:49 tries. But that isn’t true for pharmaceutical companies because they don’t do three or four

00:53:00 operations. They manufacture three or four billion pills and they sell them and then they find out

00:53:05 maybe six months or a year later that, oh, people are dying of heart attacks or getting cancer from

00:53:11 this drug. And so that’s why we have the FDA, right? Because of the scalability of pharmaceutical

00:53:18 production. And there have been some unbelievably bad episodes in the history of pharmaceuticals

00:53:29 and adulteration of products and so on that have killed tens of thousands or paralyzed hundreds

00:53:36 of thousands of people. Now with computers, we have that same scalability problem that you can

00:53:43 sit there and type for I equals one to five billion do, right? And all of a sudden you’re

00:53:49 having an impact on a global scale. And yet we have no FDA, right? There’s absolutely no controls

00:53:56 at all over what a bunch of undergraduates with too much caffeine can do to the world.

00:54:03 And we look at what happened with Facebook, well, social media in general and click through

00:54:10 optimization. So you have a simple feedback algorithm that’s trying to just optimize click

00:54:18 through, right? That sounds reasonable, right? Because you don’t want to be feeding people ads

00:54:24 that they don’t care about or not interested in. And you might even think of that process as

00:54:33 simply adjusting the feeding of ads or news articles or whatever it might be

00:54:41 to match people’s preferences, right? Which sounds like a good idea.

00:54:47 But in fact, that isn’t how the algorithm works, right? You make more money,

00:54:54 the algorithm makes more money if it can better predict what people are going to click on,

00:55:01 because then it can feed them exactly that, right? So the way to maximize click through

00:55:07 is actually to modify the people to make them more predictable. And one way to do that is to

00:55:16 feed them information, which will change their behavior and preferences towards extremes that

00:55:23 make them predictable. Whatever is the nearest extreme or the nearest predictable point,

00:55:29 that’s where you’re going to end up. And the machines will force you there.

00:55:35 And I think there’s a reasonable argument to say that this, among other things,

00:55:40 is contributing to the destruction of democracy in the world.

00:55:47 And where was the oversight of this process? Where were the people saying, okay,

00:55:52 you would like to apply this algorithm to 5 billion people on the face of the earth.

00:55:58 Can you show me that it’s safe? Can you show me that it won’t have various kinds of negative

00:56:03 effects? No, there was no one asking that question. There was no one placed between

00:56:11 the undergrads with too much caffeine and the human race. They just did it.

00:56:16 But, somewhat outside the scope of my knowledge, economists would argue that the, what is it,

00:56:22 the invisible hand, so the capitalist system, it was the oversight. So if you’re going to corrupt

00:56:29 society with whatever decision you make as a company, then that’s going to be reflected in

00:56:33 people not using your product. That’s one model of oversight.

00:56:38 We shall see, but in the meantime, you might even have broken the political system

00:56:48 that enables capitalism to function. Well, you’ve changed it.

00:56:53 We shall see.

00:56:54 Change is often painful. So my question is absolutely, it’s fascinating. You’re absolutely

00:57:01 right that there was zero oversight on algorithms that can have a profound civilization changing

00:57:09 effect. So do you think it’s possible? I mean, have you seen government? Do you

00:57:15 think it’s possible to create regulatory bodies and oversight over AI algorithms, which are inherently

00:57:24 such a cutting edge set of ideas and technologies?

00:57:28 Yeah, but I think it takes time to figure out what kind of oversight, what kinds of controls.

00:57:35 I mean, it took time to design the FDA regime, you know, and some people still don’t like it and

00:57:40 they want to fix it. And I think there are clear ways that it could be improved.

00:57:46 But the whole notion that you have stage one, stage two, stage three, and here are the criteria

00:57:51 for what you have to do to pass a stage one trial, right? We haven’t even thought about what those

00:57:58 would be for algorithms. So, I mean, I think there are things we could do right now with regard to

00:58:07 bias, for example, we have a pretty good technical handle on how to detect algorithms that are

00:58:15 propagating bias that exists in data sets, how to de-bias those algorithms, and even what it’s going

00:58:22 to cost you to do that. So I think we could start having some standards on that. I think there are

00:58:30 things to do with impersonation and falsification that we could work on.

00:58:37 Fakes, yeah.

00:58:38 A very simple point. So impersonation is a machine acting as if it was a person.

00:58:46 I can’t see a real justification for why we shouldn’t insist that machines self identify

00:58:53 as machines. Where is the social benefit in fooling people into thinking that this is really

00:59:02 a person when it isn’t? I don’t mind if it uses a human like voice, that’s easy to understand,

00:59:09 that’s fine, but it should just say, I’m a machine in some form.

00:59:14 And how many people are speaking to that? I would think that’s a relatively obvious fact.

00:59:20 Yeah, I mean, there is actually a law in California that bans impersonation, but only in certain

00:59:27 restricted circumstances. So for the purpose of engaging in a fraudulent transaction and for the

00:59:36 purpose of modifying someone’s voting behavior. So those are the circumstances where machines have

00:59:44 to self identify. But I think arguably, it should be in all circumstances. And

00:59:51 then when you talk about deep fakes, we’re just at the beginning, but already it’s possible to

00:59:58 make a movie of anybody saying anything in ways that are pretty hard to detect.

01:00:05 Including yourself because you’re on camera now and your voice is coming through with high

01:00:09 resolution.

01:00:09 Yeah, so you could take what I’m saying and replace it with pretty much anything else you

01:00:13 wanted me to be saying. And

01:00:21 even it would change my lips and facial expressions to fit. And there’s actually not much

01:00:30 in the way of real legal protection against that. I think in the commercial area, you could say,

01:00:38 yeah, you’re using my brand and so on. There are rules about that. But in the political sphere,

01:00:45 I think at the moment, anything goes. That could be really, really damaging.

01:00:53 And let me just try to make, not an argument, but try to look back at history and say something dark,

01:01:04 which in essence is: while regulation, oversight, seems to be exactly the right thing to

01:01:10 do here. It seems that human beings, what they naturally do is they wait for something to go

01:01:15 wrong. If you’re talking about nuclear weapons, you can’t talk about nuclear weapons being dangerous

01:01:21 until somebody, like the United States, actually drops the bomb, or Chernobyl melts down. Do you think

01:01:28 we will have to wait for things going wrong in a way that’s obviously damaging to society,

01:01:36 not an existential risk, but obviously damaging? Or do you have faith that…

01:01:43 I hope not, but I think we do have to look at history.

01:01:49 And so the two examples you gave, nuclear weapons and nuclear power are very, very interesting

01:01:57 because nuclear weapons, we knew in the early years of the 20th century that atoms contained

01:02:07 a huge amount of energy. We had E equals MC squared. We knew the mass differences between

01:02:12 the different atoms and their components. And we knew that

01:02:17 you might be able to make an incredibly powerful explosive. So H.G. Wells wrote a science fiction book,

01:02:23 I think in 1912. Frederick Soddy, who was the guy who discovered isotopes, the Nobel prize winner,

01:02:31 he gave a speech in 1915 saying that one pound of this new explosive would be the equivalent

01:02:40 of 150 tons of dynamite, which turns out to be about right. And this was in World War I,

01:02:48 so he was imagining how much worse the world war would be if we were using that kind of explosive.

01:02:56 But the physics establishment simply refused to believe that these things could be made.

01:03:04 Including the people who were making it.

01:03:05 Well, so they were doing the nuclear physics. I mean, eventually they were the ones who made it.

01:03:11 You talk about Fermi or whoever.

01:03:13 Well, so up to then, the development was mostly theoretical. So it was people using sort of

01:03:22 primitive kinds of particle acceleration and doing experiments at the level of single particles

01:03:29 or collections of particles. They weren’t yet thinking about how to actually make a bomb or

01:03:37 anything like that. But they knew the energy was there and they figured if they understood it

01:03:40 better, it might be possible. But the physics establishment, their view, and I think because

01:03:47 they did not want it to be true, their view was that it could not be true. That this could not

01:03:54 provide a way to make a super weapon. And there was this famous speech given by Rutherford,

01:04:03 who was the sort of leader of nuclear physics. And it was on September 11th, 1933. And he said,

01:04:11 anyone who talks about the possibility of obtaining energy from transformation of atoms

01:04:17 is talking complete moonshine. And the next morning, Leo Szilard read about that speech

01:04:26 and then invented the nuclear chain reaction. And so as soon as he invented, as soon as he had that

01:04:32 idea that you could make a chain reaction with neutrons, because neutrons were not repelled by

01:04:38 the nucleus, so they could enter the nucleus and then continue the reaction. As soon as he had that

01:04:44 idea, he instantly realized that the world was in deep doo doo. Because this is 1933, right? Hitler

01:04:54 had recently come to power in Germany. Szilard was in London and eventually became a refugee

01:05:04 and came to the US. And in the process of having the idea about the chain reaction,

01:05:11 he figured out basically how to make a bomb and also how to make a reactor. And he patented the

01:05:18 reactor in 1934. But because of the situation, the great power conflict situation that he could see

01:05:27 happening, he kept that a secret. And so between then and the beginning of World War II, people

01:05:39 were working, including the Germans, on how to actually create neutron sources, what specific

01:05:50 fission reactions would produce neutrons of the right energy to continue the reaction.

01:05:57 And that was demonstrated in Germany, I think in 1938, if I remember correctly.

01:06:01 The first nuclear weapon patent was filed in 1939, by the French. So this was actually going on well before

01:06:16 World War II really got going. And then the British probably had the most advanced capability

01:06:22 in this area. But for safety reasons, among others, and just resources, they moved the program

01:06:30 from Britain to the US and then that became the Manhattan Project. So the reason why we couldn’t

01:06:40 have any kind of oversight of nuclear weapons and nuclear technology

01:06:46 was because we were basically already in an arms race and a war.

01:06:50 But you mentioned what happened back then, in the 20s and 30s. So what are the echoes? The way you’ve described

01:07:00 this story, I mean, there are clearly echoes. Why do you think most AI researchers,

01:07:06 folks who are really close to the metal, really are not concerned about AI safety? They don’t

01:07:11 think about it, or maybe they don’t want to think about it. Why do you think that is?

01:07:18 What are the echoes of the nuclear situation in the current AI situation? And what can we do

01:07:27 about it? I think there is a kind of motivated cognition, which is a term in psychology that means

01:07:35 that you believe what you would like to be true, rather than what is true. And it’s unsettling

01:07:46 to think that what you’re working on might be the end of the human race, obviously. So you would

01:07:52 rather instantly deny it and come up with some reason why it couldn’t be true. And I have

01:08:00 collected a long list of reasons that extremely intelligent, competent AI scientists have come up

01:08:08 with for why we shouldn’t worry about this. For example, calculators are superhuman at arithmetic

01:08:16 and they haven’t taken over the world. So there’s nothing to worry about. Well, okay, my five year

01:08:22 old, you know, could have figured out why that was an unreasonable and really quite weak argument.

01:08:29 Another one was, while it’s theoretically possible that you could have superhuman AI destroy the

01:08:40 world, it’s also theoretically possible that a black hole could materialize right next to the

01:08:45 earth and destroy humanity. I mean, yes, it’s theoretically possible, quantum theoretically,

01:08:50 extremely unlikely that it would just materialize right there. But that’s a completely bogus analogy,

01:08:58 because, you know, if the whole physics community on earth was working to materialize a black hole

01:09:04 in near earth orbit, right? Wouldn’t you ask them, is that a good idea? Is that going to be safe?

01:09:10 You know, what if you succeed? Right. And that’s the thing, right? The AI community has sort of

01:09:16 refused to ask itself, what if you succeed? And initially I think that was because it was too hard,

01:09:24 but, you know, Alan Turing asked himself that, and he said, we’d be toast, right? If we were lucky,

01:09:32 we might be able to switch off the power, but probably we’d be toast. But there’s also an aspect

01:09:37 that because we’re not exactly sure what the future holds, it’s not clear exactly,

01:09:45 technically, what to worry about, sort of how things go wrong. And so there is something,

01:09:53 it feels like, maybe you can correct me if I’m wrong, but there’s something paralyzing about

01:09:58 worrying about something that logically is inevitable, but that you have to think about

01:10:05 even though you don’t really know what it will look like.

01:10:10 Yeah, I think that’s a reasonable point and, you know, certainly in terms of

01:10:18 existential risks, it’s different from, you know, an asteroid colliding with the earth, right? Which,

01:10:24 again, is quite possible, you know, it’s happened in the past, it’ll probably happen again,

01:10:29 we don’t know right now, but if we did detect an asteroid that was going to hit the earth

01:10:34 in 75 years time, we’d certainly be doing something about it.

01:10:39 Well, it’s clear: there’s a big rock, and

01:10:42 we’ll probably have a meeting and see what we do about the big rock. With AI…

01:10:46 Right, with AI, I mean, there are very few people who think it’s not going to happen within the

01:10:50 next 75 years. I know Rod Brooks doesn’t think it’s going to happen, maybe Andrew Ng doesn’t

01:10:56 think it’s going to happen, but, you know, a lot of the people who work day to day, you know, as you say,

01:11:02 at the rock face, they think it’s going to happen. I think the median estimate from AI researchers is

01:11:10 somewhere in 40 to 50 years from now, or maybe, you know, I think in Asia, they think it’s going

01:11:16 to be even faster than that. I’m a little bit more conservative, I think it’d probably take

01:11:24 longer than that, but I think, you know, as happened with nuclear weapons, it can happen

01:11:30 overnight that you have these breakthroughs and we need more than one breakthrough, but,

01:11:34 you know, it’s on the order of half a dozen, I mean, this is a very rough scale, but sort of

01:11:40 half a dozen breakthroughs of that nature would have to happen for us to reach superhuman AI.

01:11:49 But the, you know, the AI research community is vast now, there are massive investments from governments,

01:11:57 from corporations, tons of really, really smart people, you know, you just have to look at the

01:12:03 rate of progress in different areas of AI to see that things are moving pretty fast. So to say,

01:12:09 oh, it’s just going to be thousands of years, I don’t see any basis for that. You know, I see,

01:12:15 you know, for example, the Stanford 100 year AI project, right, which is supposed to be sort of,

01:12:26 you know, the serious establishment view, their most recent report actually said it’s probably

01:12:32 not even possible. Oh, wow.

01:12:35 Right. Which if you want a perfect example of people in denial, that’s it. Because, you know,

01:12:42 for the whole history of AI, we’ve been saying to philosophers who said it wasn’t possible,

01:12:49 well, you have no idea what you’re talking about. Of course it’s possible, right? Give me an argument

01:12:53 for why it couldn’t happen. And there isn’t one, right? And now, because people are worried that

01:13:00 maybe AI might get a bad name, or I just don’t want to think about this, they’re saying, okay,

01:13:06 well, of course, it’s not really possible. You know, imagine if, you know, the leaders of the

01:13:12 cancer biology community got up and said, well, you know, of course, curing cancer,

01:13:17 it’s not really possible. There’d be complete outrage and dismay. And, you know, I find this

01:13:28 really a strange phenomenon. So, okay, so if you accept that it’s possible,

01:13:35 and if you accept that it’s probably going to happen, then the point that you’re making,

01:13:42 you know, how does it go wrong, is a valid question. Without an answer to that question,

01:13:50 you’re stuck with what I call the gorilla problem, which is, you know, the problem that

01:13:54 the gorillas face, right? They made something more intelligent than them, namely us, a few million

01:14:00 years ago, and now they’re in deep doo doo. So there’s really nothing they can do. They’ve lost

01:14:07 control. They failed to solve the control problem of controlling humans, and so they’ve

01:14:13 lost. So we don’t want to be in that situation. And if the gorilla problem is the only formulation

01:14:20 you have, there’s not a lot you can do, right? Other than to say, okay, we should try to stop,

01:14:26 you know, we should just not make the humans, or in this case, not make the AI. And I think

01:14:31 that’s really hard to do. I’m not actually proposing that that’s a feasible course of

01:14:40 action. I also think that, you know, if properly controlled, AI could be incredibly beneficial.

01:14:48 But it seems to me that there’s a consensus that one of the major failure modes is this

01:14:56 loss of control, that we create AI systems that are pursuing incorrect objectives. And because

01:15:05 the AI system believes it knows what the objective is, it has no incentive to listen to us anymore,

01:15:12 so to speak, right? It’s just carrying out the strategy that it has computed as being the optimal

01:15:21 solution. And, you know, it may be that in the process, it needs to acquire more resources to

01:15:30 increase the possibility of success or prevent various failure modes by defending itself against

01:15:36 interference. And so that collection of problems, I think, is something we can address.
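
To make that failure mode concrete, here is a minimal sketch in Python, not something quoted in the conversation, loosely in the spirit of the off-switch game analysis from Russell and his colleagues: a robot that treats a point estimate of the objective as the truth gains nothing by letting a human intervene, whereas a robot that is explicitly uncertain about the objective expects to do better by deferring. The Gaussian belief and the specific numbers are purely illustrative assumptions.

    import random

    def expected_values(samples):
        """Toy comparison for the off-switch idea: a robot certain of its
        objective versus one explicitly uncertain about it.

        samples: draws from the robot's belief over the true utility U of its
        proposed action; the human is assumed to know U exactly.
        """
        n = len(samples)
        mean_u = sum(samples) / n

        # Certain robot: treats its point estimate as the truth, acts whenever
        # that estimate is positive, and gains nothing by letting the human
        # intervene -- so it has no incentive to keep the off switch available.
        value_if_certain = max(mean_u, 0.0)

        # Uncertain robot: defers to the human, who switches it off exactly
        # when the true U is negative, so the robot expects E[max(U, 0)],
        # which is always >= max(E[U], 0).
        value_if_deferring = sum(max(u, 0.0) for u in samples) / n

        return value_if_certain, value_if_deferring

    if __name__ == "__main__":
        random.seed(0)
        # Illustrative belief: mean slightly positive, but a real chance U < 0.
        belief = [random.gauss(0.2, 1.0) for _ in range(100_000)]
        certain, deferring = expected_values(belief)
        print(f"acting on a fixed objective: {certain:.3f}")
        print(f"deferring to the human:      {deferring:.3f}")

The comparison prints a higher expected value for the deferring robot, which is the point being made above: certainty about the objective removes any incentive to listen, while uncertainty makes human oversight worth preserving from the robot’s own point of view.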

01:15:45 The other problems are, roughly speaking, you know, misuse, right? So even if we solve the control problem,

01:15:55 we make perfectly safe, controllable AI systems. Well, why, you know, why is Dr. Evil going to

01:16:01 use those, right? He wants to just take over the world and he’ll make unsafe AI systems that then

01:16:06 get out of control. So that’s one problem, which is sort of a, you know, partly a policing problem,

01:16:12 partly a sort of a cultural problem for the profession of how we teach people what kinds

01:16:21 of AI systems are safe. You talk about autonomous weapon systems and how pretty much everybody

01:16:26 agrees that there are too many ways that that can go horribly wrong. This great Slaughterbots movie

01:16:32 kind of illustrates that beautifully. I want to talk about that. That’s another topic

01:16:36 I’m hoping to talk about. I just want to mention what I see as the

01:16:41 third major failure mode, which is overuse: not so much misuse, but overuse of AI, where we become

01:16:49 overly dependent. So I call this the WALL-E problem. So if you’ve seen WALL-E, the movie,

01:16:54 all right, all the humans are on the spaceship and the machines look after everything for them,

01:17:00 and they just watch TV and drink Big Gulps. And they’re all sort of obese and stupid and they’ve

01:17:07 sort of totally lost any notion of human autonomy. And, you know, so in effect, right, this would

01:17:17 happen like the slow boiling frog, right? We would gradually turn over more and more of the

01:17:24 management of our civilization to machines, as we are already doing. And, you know, if this

01:17:29 process continues, you know, we sort of gradually switch from being the masters

01:17:37 of technology to just being the guests. Right. So we become guests on a cruise ship, you know,

01:17:44 which is fine for a week, but not for the rest of eternity. You know, and it’s almost

01:17:51 irreversible, right? Once you lose the incentive to, for example, you know, learn to be

01:17:58 an engineer or a doctor or a sanitation operative or any other of the infinitely many ways that we

01:18:08 maintain and propagate our civilization. You know, if you don’t have the incentive to do any

01:18:14 of that, you won’t. And then it’s really hard to recover. And of course, AI is just one of the

01:18:20 technologies that could result in that third failure mode; there are probably others. Doesn’t

01:18:24 technology in general detach us from that? It does a bit. But the difference is that in terms of

01:18:31 the knowledge to run our civilization, you know, up to now, we’ve had no alternative but

01:18:38 to put it into people’s heads. Right. And now, with Google, I mean, with software in

01:18:43 general, with computers in general, there’s an alternative. But, you know, the knowledge of how, you know, how a

01:18:51 sanitation system works, an AI has to understand that; it’s no good just putting it

01:18:56 into Google. So, I mean, we’ve always put knowledge on paper, but paper doesn’t run our

01:19:02 civilization; it only works when the knowledge goes from the paper into people’s heads again. Right. So we’ve

01:19:07 always propagated civilization through human minds. And we’ve spent about a trillion person

01:19:13 years doing that. Literally, right, you can work it out; it’s about right. There are just

01:19:19 over 100 billion people who’ve ever lived. And each of them has spent about 10 years learning

01:19:25 stuff to keep their civilization going. And so that’s a trillion person years we put into this effort.
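
A quick back-of-the-envelope check of that figure, using the rough, order-of-magnitude numbers just quoted:

    # Rough inputs as quoted above: about 100 billion people have ever lived,
    # each spending roughly a decade learning how to keep civilization going.
    people_ever_lived = 100e9
    learning_years_each = 10
    person_years = people_ever_lived * learning_years_each
    print(f"{person_years:.0e} person-years")  # 1e+12, i.e. about a trillion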

01:19:30 A beautiful way to describe all of civilization. And now we’re, you know, we’re in danger of

01:19:36 throwing that away. So this is a problem that AI can’t solve. It’s not a technical problem. It’s

01:19:40 you know, if we do our job right, the AI systems will say, you know, the human race doesn’t in the

01:19:48 long run want to be passengers in a cruise ship. The human race wants autonomy. This is part of

01:19:54 human preferences. So we, the AI systems, are not going to do this stuff for you. You’ve got to do

01:20:01 it for yourself. Right. I’m not going to carry you to the top of Everest in an autonomous

01:20:06 helicopter. You have to climb it if you want to get the benefit and so on. But I’m afraid that

01:20:14 because we are short sighted and lazy, we’re going to override the AI systems. And there’s an

01:20:22 amazing short story that I recommend to everyone that I talked to about this called The Machine

01:20:28 Stops, written in 1909 by E.M. Forster, who, you know, wrote novels about the British Empire and

01:20:37 sort of things that became costume dramas on the BBC. But he wrote this one science fiction story,

01:20:42 which is an amazing vision of the future. It has basically iPads, it has video conferencing,

01:20:51 it has MOOCs, it has computer induced obesity. I mean, literally, what people spend their

01:21:00 time doing is giving online courses or listening to online courses and talking about ideas,

01:21:05 but they never get out there in the real world. They don’t really have a lot of face to face

01:21:11 contact. Everything is done online, you know, so all the things we’re worrying about now

01:21:17 were described in the story. And then the human race becomes more and more dependent on

01:21:22 the machine, loses knowledge of how things really run and then becomes vulnerable to collapse. And

01:21:31 so it’s a pretty unbelievably amazing story for someone writing in 1909 to imagine all

01:21:38 this. So there are very few people who represent artificial intelligence more than you, Stuart

01:21:45 Russell. If you say so. That’s very kind. So it’s all my fault. Right. You’re often brought

01:21:57 up as the person, well, Stuart Russell, like the AI person is worried about this. That’s why you

01:22:03 should be worried about it. Do you feel the burden of that? I don’t know if you feel that at all,

01:22:10 but when I talk to people, you know, people outside of computer science,

01:22:15 when they think about this, it’s: Stuart Russell is worried about AI safety, so you should be worried

01:22:21 too. Do you feel the burden of that? I mean, in a practical sense, yeah, because I get, you know,

01:22:29 a dozen, sometimes 25 invitations a day to talk about it, to give interviews, to write press

01:22:38 articles and so on. So in that very practical sense, I’m seeing that people are concerned and

01:22:46 really interested in this. Are you worried that you could be wrong, as all good scientists are?

01:22:52 Of course. I worry about that all the time. I mean, that’s always been the way that

01:22:57 I’ve worked, you know: it’s like I have an argument in my head with myself, right? So I have

01:23:03 some idea and then I think, okay, how could that be wrong? Or did someone else already have

01:23:10 that idea? So I’ll go and, you know, search in as much literature as I can to see whether someone

01:23:16 else already thought of that, or even refuted it. So, you know, right now I’m reading a

01:23:23 lot of philosophy because, you know, in the form of the debates over utilitarianism and

01:23:32 other kinds of moral formulas, shall we say, people have already thought through

01:23:42 some of these issues. But, you know, one of the things I’m not seeing in a lot of

01:23:47 these debates is this specific idea about the importance of uncertainty in the objective,

01:23:56 that this is the way we should think about machines that are beneficial to humans. So this

01:24:01 idea of provably beneficial machines based on explicit uncertainty in the objective,

01:24:10 you know, it seems to be, you know, my gut feeling is this is the core of it. It’s going to have to

01:24:17 be elaborated in a lot of different directions, and there are a lot of them. Yeah. But it has to

01:24:23 be, I mean, it has to be right. We can’t afford, you know, hand-wavy beneficial, because

01:24:30 whenever we do hand-wavy stuff, there are loopholes. And the thing about

01:24:34 super intelligent machines is they find the loopholes, you know, just like, you know, tax

01:24:40 evaders. If you don’t write your tax law properly, people will find the loopholes and end up paying

01:24:46 no tax. And so you should think of it this way: getting those definitions right,

01:24:56 you know, is really a long process. You know, you can define mathematical frameworks

01:25:04 and within that framework, you can prove mathematical theorems that, yes, this

01:25:08 theoretical entity will be provably beneficial to that theoretical entity,

01:25:13 but that framework may not match the real world in some crucial way. So it’s a long process,

01:25:20 thinking through it, iterating and so on. Last question. Yep. You have 10 seconds to answer it.

01:25:27 What is your favorite sci-fi movie about AI? I would say Interstellar has my favorite robots.

01:25:34 Oh, beats space. Yeah. Yeah. Yeah. So TARS, one of the robots in Interstellar, is

01:25:42 the way robots should behave. And I would say Ex Machina is, in some ways, the one,

01:25:51 the one that makes you think, in a nervous kind of way, about where we’re going.

01:25:58 Well, Stuart, thank you so much for talking today. Pleasure.