Michael Littman: Reinforcement Learning and the Future of AI #144

Transcript

00:00:00 The following is a conversation with Michael Littman, a computer science professor at Brown

00:00:04 University doing research on and teaching machine learning, reinforcement learning,

00:00:10 and artificial intelligence. He enjoys being silly and lighthearted in conversation,

00:00:16 so this was definitely a fun one. Quick mention of each sponsor,

00:00:20 followed by some thoughts related to the episode. Thank you to SimpliSafe, a home security company

00:00:26 I use to monitor and protect my apartment, ExpressVPN, the VPN I’ve used for many years

00:00:32 to protect my privacy on the internet, MasterClass, online courses that I enjoy from

00:00:38 some of the most amazing humans in history, and BetterHelp, online therapy with a licensed

00:00:43 professional. Please check out these sponsors in the description to get a discount and to support

00:00:49 this podcast. As a side note, let me say that I may experiment with doing some solo episodes

00:00:55 in the coming month or two. The three ideas I have floating in my head currently are to use, one,

00:01:02 a particular moment in history, two, a particular movie, or three, a book to drive a conversation

00:01:10 about a set of related concepts. For example, I could use 2001: A Space Odyssey, or Ex Machina

00:01:17 to talk about AGI for one, two, three hours. Or I could do an episode on the, yes, rise and fall of

00:01:26 Hitler and Stalin, each in a separate episode, using relevant books and historical moments

00:01:32 for reference. I find the format of a solo episode very uncomfortable and challenging,

00:01:38 but that just tells me that it’s something I definitely need to do and learn from the experience.

00:01:44 Of course, I hope you come along for the ride. Also, since we have all this momentum built up

00:01:49 on announcements, I’m giving a few lectures on machine learning at MIT this January.

00:01:54 In general, if you have ideas for the episodes, for the lectures, or for just short videos on

00:02:01 YouTube, let me know in the comments that I still definitely read, despite my better judgment,

00:02:10 and the wise sage advice of the great Joe Rogan. If you enjoy this thing, subscribe on YouTube,

00:02:17 review it with five stars on Apple Podcasts, follow on Spotify, support on Patreon, or connect

00:02:22 with me on Twitter at Lex Fridman. And now, here's my conversation with Michael Littman.

00:02:29 I saw a video of you talking to Charles Isbell about Westworld, the TV series. You guys were

00:02:35 doing the kind of thing where you’re watching new things together, but let’s rewind back.

00:02:41 Is there a sci-fi movie or book or show that was profound, that had an impact on you philosophically,

00:02:50 or just specifically something you enjoyed nerding out about?

00:02:55 Yeah, interesting. I think a lot of us have been inspired by robots in movies. One that I really

00:03:00 like is, there’s a movie called Robot and Frank, which I think is really interesting because it’s

00:03:05 very near term future, where robots are being deployed as helpers in people’s homes. And we

00:03:15 don’t know how to make robots like that at this point, but it seemed very plausible. It seemed

00:03:19 very realistic or imaginable. And I thought that was really cool because they’re awkward,

00:03:25 they do funny things that raise some interesting issues, but it seemed like something that would

00:03:29 ultimately be helpful and good if we could do it right.

00:03:31 Yeah, he was an older cranky gentleman, right?

00:03:33 He was an older cranky jewel thief, yeah.

00:03:36 It’s kind of funny little thing, which is, you know, he’s a jewel thief and so he pulls the

00:03:42 robot into his life, which is like, which is something you could imagine taking a home robotics

00:03:49 thing and pulling into whatever quirky thing that’s involved in your existence.

00:03:54 It’s meaningful to you. Exactly so. Yeah. And I think from that perspective, I mean,

00:04:00 not all of us are jewel thieves. And so when we bring our robots into our lives, it explains a

00:04:05 lot about this apartment, actually. But no, the idea that people should have the ability to make

00:04:12 this technology their own, that it becomes part of their lives. And I think it’s hard for us

00:04:18 as technologists to make that kind of technology. It’s easier to mold people into what we need them

00:04:22 to be. And just that opposite vision, I think, is really inspiring. And then there's an

00:04:28 anthropomorphization where we project certain things on them, because I think the robot was

00:04:32 kind of dumb. But I have a bunch of Roombas I play with and you immediately project stuff onto

00:04:38 them. Much greater level of intelligence. We’ll probably do that with each other too. Much greater

00:04:43 degree of compassion. That’s right. One of the things we’re learning from AI is where we are

00:04:47 smart and where we are not smart. Yeah. You also enjoy, as people can see, and I enjoyed

00:04:55 myself watching you sing and even dance a little bit, a little bit, a little bit of dancing.

00:05:02 A little bit of dancing. That’s not quite my thing. As a method of education or just in life,

00:05:08 you know, in general. So easy question. What’s the definitive, objectively speaking,

00:05:15 top three songs of all time? Maybe something that, you know, to walk that back a little bit,

00:05:22 maybe something that others might be surprised by the three songs that you kind of enjoy.

00:05:28 That is a great question that I cannot answer. But instead, let me tell you a story.

00:05:32 So pick a question you do want to answer. That’s right. I’ve been watching the

00:05:36 presidential debates and vice presidential debates. And it turns out, yeah, it’s really,

00:05:39 you can just answer any question you want. So it’s a related question. Well said.

00:05:47 I really like pop music. I’ve enjoyed pop music ever since I was very young. So 60s music,

00:05:51 70s music, 80s music. This is all awesome. And then I had kids and I think I stopped listening

00:05:56 to music and I was starting to realize that my musical taste had sort of frozen out.

00:06:01 And so I decided in 2011, I think, to start listening to the top 10 billboard songs each week.

00:06:08 So I’d be on the on the treadmill and I would listen to that week’s top 10 songs

00:06:11 so I could find out what was popular now. And what I discovered is that I have no musical

00:06:17 taste whatsoever. I like what I’m familiar with. And so the first time I’d hear a song

00:06:22 which is the first week it was on the charts, I'd be like... and then the second week,

00:06:26 I was into it a little bit. And the third week, I was loving it. And by the fourth week, it's like,

00:06:30 just part of me. And so I'm afraid that I can't tell you my favorite song of all time,

00:06:36 because it’s whatever I heard most recently. Yeah, that’s interesting. People have told me that

00:06:44 there’s an art to listening to music as well. And you can start to, if you listen to a song,

00:06:48 just carefully, like explicitly, just force yourself to really listen. You start to,

00:06:54 I did this when I was part of jazz band and fusion band in college. You start to hear the layers

00:07:01 of the instruments. You start to hear the individual instruments and you start to,

00:07:04 you can listen to classical music or to orchestra this way. You can listen to jazz this way.

00:07:08 I mean, it's funny to imagine you now, walking that forward, listening to pop hits now as, like,

00:07:16 a scholar, listening to like Cardi B or something like that, or Justin Timberlake. Is he? No,

00:07:22 not Timberlake, Bieber. They’ve both been in the top 10 since I’ve been listening.

00:07:26 They’re still up there. Oh my God, I’m so cool.

00:07:29 If you haven’t heard Justin Timberlake’s top 10 in the last few years, there was one

00:07:33 song that he did where the music video was set at essentially NeurIPS.

00:07:38 Oh, wow. Oh, the one with the robotics. Yeah, yeah, yeah, yeah, yeah.

00:07:42 Yeah, yeah. It’s like at an academic conference and he’s doing a demo.

00:07:45 He was presenting, right?

00:07:46 It was sort of a cross between the Apple, like Steve Jobs kind of talk and NeurIPS.

00:07:51 Yeah.

00:07:53 So, you know, it’s always fun when AI shows up in pop culture.

00:07:56 I wonder if he consulted somebody for that. That’s really interesting. So maybe on that topic,

00:08:01 I’ve seen your celebrity multiple dimensions, but one of them is you’ve done cameos in different

00:08:08 places. I’ve seen you in a TurboTax commercial as like, I guess, the brilliant Einstein character.

00:08:16 And the point is that TurboTax doesn’t need somebody like you. It doesn’t need a brilliant

00:08:23 person.

00:08:24 Very few things need someone like me. But yes, they were specifically emphasizing the

00:08:28 idea that you don’t need to be like a computer expert to be able to use their software.

00:08:32 How did you end up in that world?

00:08:33 I think it’s an interesting story. So I was teaching my class. It was an intro computer

00:08:38 science class for non concentrators, non majors. And sometimes when people would visit campus,

00:08:45 they would check in to say, hey, we want to see what a class is like. Can we sit on your class?

00:08:48 So a person came to my class who was the daughter of the brother of the husband of the best friend

00:09:02 of my wife. Anyway, basically a family friend came to campus to check out Brown and asked to

00:09:11 come to my class and came with her dad. Her dad is, who I’ve known from various

00:09:16 kinds of family events and so forth, but he also does advertising. And he said that he was

00:09:21 recruiting scientists for this ad, this TurboTax set of ads. And he said, we wrote the ad with the

00:09:31 idea that we get like the most brilliant researchers, but they all said no. So can you

00:09:36 help us find like B level scientists? And I’m like, sure, that’s who I hang out with.

00:09:44 So that should be fine. So I put together a list and I did what some people call the Dick Cheney.

00:09:49 So I included myself on the list of possible candidates, with a little blurb about each one

00:09:55 and why I thought that would make sense for them to do it. And they reached out to a handful of

00:09:59 them, but then they ultimately, they YouTube stalked me a little bit and they thought,

00:10:03 oh, I think he could do this. And they said, okay, we’re going to offer you the commercial.

00:10:07 I’m like, what? So it was such an interesting experience because they have another world, the

00:10:14 people who do like nationwide kind of ad campaigns and television shows and movies and so forth.

00:10:21 It’s quite a remarkable system that they have going because they have a set. Yeah. So I went to,

00:10:28 it was just somebody’s house that they rented in New Jersey. But in the commercial, it’s just me

00:10:35 and this other woman. In reality, there were 50 people in that room and another, I don’t know,

00:10:41 half a dozen kind of spread out around the house in various ways. There were people whose job it

00:10:46 was to control the sun. They were in the backyard on ladders, putting filters up to try to make sure

00:10:53 that the sun didn’t glare off the window in a way that would wreck the shot. So there was like

00:10:57 six people out there doing that. There was three people out there giving snacks, the craft table.

00:11:02 There was another three people giving healthy snacks because that was a separate craft table.

00:11:05 There was one person whose job it was to keep me from getting lost. And I think the reason for all

00:11:12 this is because so many people are in one place at one time. They have to be time efficient. They

00:11:16 have to get it done. The morning they were going to do my commercial. In the afternoon, they were

00:11:20 going to do a commercial of a mathematics professor from Princeton. They had to get it done. No wasted

00:11:27 time or energy. And so there’s just a fleet of people all working as an organism. And it was

00:11:32 fascinating. I was just the whole time just looking around like, this is so neat. Like one person

00:11:36 whose job it was to take the camera off of the cameraman so that someone else whose job it was

00:11:43 to remove the film canister. Because every couple of takes, they had to replace the film because film

00:11:48 gets used up. It was just, I don’t know. I was geeking out the whole time. It was so fun.

00:11:53 How many takes did it take? It looked the opposite, like there weren't more than two people there. It was very

00:11:57 relaxed. Right. Yeah. The person who I was in the scene with is a professional. She’s an improv

00:12:06 comedian from New York City. And when I got there, they had given me a script as such as it was. And

00:12:11 then I got there and they said, we’re going to do this as improv. I’m like, I don’t know how to

00:12:15 improv. I don’t know what you’re telling me to do here. Don’t worry. She knows. I’m like, okay.

00:12:21 I’ll go see how this goes. I guess I got pulled into the story because like, where the heck did

00:12:26 you come from? I guess in the scene. Like, how did you show up in this random person’s house?

00:12:32 Yeah. Well, I mean, the reality of it is I stood outside in the blazing sun. There was someone

00:12:36 whose job it was to keep an umbrella over me because I started to sweat. And so I would wreck

00:12:41 the shot because my face was all shiny with sweat. So there was one person who would dab me off,

00:12:45 had an umbrella. But yeah, like the reality of it, like, why is this strange stalkery person hanging

00:12:51 around outside somebody's house? We're not sure what you were looking in the windows for.

00:12:54 But you make, like you said, YouTube,

00:13:00 you make videos yourself, you make awesome parody, sort of parody songs that kind of focus on a

00:13:07 particular aspect of computer science. Those seem really interesting,

00:13:13 and really natural. How much production value goes into that?

00:13:18 Do you also have a team of 50 people? The videos, almost all the videos,

00:13:22 except for the ones that people would have actually seen, are just me. I write the lyrics,

00:13:26 I sing the song. I generally find, like, a backing track online because, like you, I

00:13:34 can't really play an instrument. And then, in some cases, I'll do visuals using just like

00:13:39 PowerPoint. Lots and lots of PowerPoint to make it sort of like an animation.

00:13:44 The most produced one is the one that people might have seen, which is the overfitting video

00:13:49 that I did with Charles Isbell. And that was produced by the Georgia Tech and Udacity people

00:13:55 because we were doing a class together. It was kind of, I usually do parody songs kind of to

00:13:59 cap off a class at the end of a class. So that one you’re wearing, so it was just a

00:14:04 Thriller. You're wearing the Michael Jackson, the red leather jacket. The interesting thing

00:14:09 with podcasting, which you're also into, and that I really enjoy, is that there's not a team of people.

00:14:21 It's kind of more... because, you know, there's something that happens when there are more people

00:14:29 involved than just one person, just in the way you start acting, I don't know. There's a censorship.

00:14:36 You're not given... especially for slow thinkers like me. And I think most of us are,

00:14:42 if we're trying to actually think, a little bit slow and careful. And large teams kind of get

00:14:50 in the way of that. And I don’t know what to do with that. Like that’s the, to me, like if,

00:14:56 yeah, it’s very popular to criticize quote unquote mainstream media.

00:15:01 But there is a legitimacy to criticizing them all the same. I love listening to NPR, for example,

00:15:06 but it's clear that there's a team behind it. There's a commercial,

00:15:11 there’s constant commercial breaks. There’s this kind of like rush of like,

00:15:16 okay, I have to interrupt you now because we have to go to commercial. Just this whole,

00:15:20 it creates, it destroys the possibility of nuanced conversation. Yeah, exactly. Evian,

00:15:29 which, Charles Isbell, who I talked to yesterday, told me is naive backwards, and

00:15:36 the fact that his mind thinks this way is quite brilliant. Anyway, there’s a freedom to this

00:15:42 podcast. He’s Dr. Awkward, which by the way, is a palindrome. That’s a palindrome that I happen to

00:15:46 know from other parts of my life. And I just, well, you know, use it against Charles. Dr. Awkward.

00:15:54 So what was the most challenging parody song to make? Was it the Thriller one?

00:16:00 No, that one was really fun. I wrote the lyrics really quickly and then I gave it over to the

00:16:06 production team. They recruited an a cappella group to sing. That went really smoothly. It's great

00:16:11 having a team because then you can just focus on the part that you really love, which in my case

00:16:15 is writing the lyrics. For me, the most challenging one, not challenging in a bad way, but challenging

00:16:21 in a really fun way, was one of the parody songs I did about the halting problem in

00:16:27 computer science. The fact that you can't create a program that can tell, for any other arbitrary

00:16:34 program, whether it's actually going to get stuck in an infinite loop or whether it's going to eventually

00:16:38 stop. And so I did it to an '80s song because I hadn't started my new thing of learning current

00:16:46 songs. And it was Billy Joel’s The Piano Man. Nice. Which is a great song. Sing me a song.

00:16:56 You’re the piano man. Yeah. So the lyrics are great because first of all, it rhymes. Not all

00:17:04 songs rhyme. I’ve done Rolling Stones songs which turn out to have no rhyme scheme whatsoever. They’re

00:17:09 just sort of yelling and having a good time, which makes it not fun from a parody perspective because

00:17:14 like you can say anything. But the lines rhymed and there were a lot of internal rhymes as well.
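
For reference, the result he set to music is the classic halting problem argument. Below is a minimal sketch in Python of the standard contradiction, with illustrative function names that are not from the episode: assume a perfect halts() oracle exists, build a program that does the opposite of whatever the oracle predicts about it, and feed that program to itself.

```python
def halts(program, arg):
    """Hypothetical oracle: returns True iff program(arg) eventually stops.
    No general implementation can exist; stubbed here only so the sketch runs."""
    raise NotImplementedError("Turing showed this cannot be implemented in general")

def troublemaker(program):
    # Do the opposite of whatever the oracle predicts about running
    # `program` on its own source.
    if halts(program, program):
        while True:      # loop forever if the oracle says "it halts"
            pass
    return "done"        # halt if the oracle says "it loops"

# Feeding troublemaker to itself yields a contradiction either way:
# if halts(troublemaker, troublemaker) is True, it loops forever;
# if it is False, it halts. Hence no such halts() can exist.
```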

00:17:18 And so figuring out how to sing with internal rhymes, a proof of the halting problem was really

00:17:24 challenging. And I really enjoyed that process. What about, last question on this topic, what

00:17:30 about the dancing in the Thriller video? How many takes did that take? So I wasn't planning to dance.

00:17:36 They had me in the studio and they gave me the jacket and it’s like, well, you can’t,

00:17:40 if you have the jacket and the glove, like there’s not much you can do. Yeah. So I think I just

00:17:46 danced around and then they said, why don’t you dance a little bit? There was a scene with me

00:17:49 and Charles dancing together. They did not use it in the video, but we recorded it. Yeah. Yeah. No,

00:17:55 it was pretty funny. And Charles, who has this beautiful, wonderful voice doesn’t really sing.

00:18:02 He’s not really a singer. And so that was why I designed the song with him doing a spoken section

00:18:07 and me doing the singing. It’s very like Barry White. Yeah. Smooth baritone. Yeah. Yeah. It’s

00:18:12 great. That was awesome. So one of the other things Charles said is that, you know, everyone

00:18:19 knows you as like a super nice guy, super passionate about teaching and so on. What he said,

00:18:27 don’t know if it’s true, that despite the fact that you’re, you are. Okay. I will admit this

00:18:34 finally for the first time. That was, that was me. It's the Johnny Cash song. Killed a man in Reno just

00:18:39 to watch him die. That you actually do have some strong opinions on some topics. So if this in fact

00:18:46 is true, what strong opinions would you say you have? Is there ideas you think maybe in artificial

00:18:55 intelligence and machine learning, maybe in life that you believe is true that others might,

00:19:02 you know, some number of people might disagree with you on? So I try very hard to see things

00:19:08 from multiple perspectives. There’s this great Calvin and Hobbes cartoon where, do you know?

00:19:15 Yeah. Okay. So Calvin’s dad is always kind of a bit of a foil and he talked Calvin into,

00:19:21 Calvin had done something wrong. The dad talks him into like seeing it from another perspective

00:19:25 and Calvin, like this breaks Calvin because he’s like, oh my gosh, now I can see the opposite sides

00:19:30 of things. And so the, it’s, it becomes like a Cubist cartoon where there is no front and back.

00:19:35 Everything’s just exposed and it really freaks him out. And finally he settles back down. It’s

00:19:39 like, oh good. No, I can make that go away. But like, I’m that, I’m that I live in that world where

00:19:44 I’m trying to see everything from every perspective all the time. So there are some things that I’ve

00:19:48 formed opinions about that it would be harder, I think, to disabuse me of. One is the super

00:19:56 intelligence argument and the existential threat of AI is one where I feel pretty confident in my

00:20:02 feeling about that one. Like I’m willing to hear other arguments, but like, I am not particularly

00:20:07 moved by the idea that if we’re not careful, we will accidentally create a super intelligence

00:20:13 that will destroy human life. Let’s talk about that. Let’s get you in trouble and record your

00:20:17 video. It’s like Bill Gates, I think he said like some quote about the internet that that’s just

00:20:24 going to be a small thing. It’s not going to really go anywhere. And then I think Steve

00:20:29 Ballmer said, I don't know why I'm sticking on Microsoft, something like smartphones

00:20:36 are useless. There’s no reason why Microsoft should get into smartphones, that kind of.

00:20:40 So let’s get, let’s talk about AGI. As AGI is destroying the world, we’ll look back at this

00:20:45 video and see. No, I think it’s really interesting to actually talk about because nobody really

00:20:49 knows the future. So you have to use your best intuition. It’s very difficult to predict it,

00:20:54 but you have spoken about AGI and the existential risks around it and sort of basing your intuition

00:21:01 that we’re quite far away from that being a serious concern relative to the other concerns

00:21:08 we have. Can you maybe unpack that a little bit? Yeah, sure, sure, sure. So as I understand it,

00:21:15 that for example, I read Bostrom’s book and a bunch of other reading material about this sort

00:21:22 of general way of thinking about the world. And I think the story goes something like this, that we

00:21:27 will at some point create computers that are smart enough that they can help design the next version

00:21:35 of themselves, which itself will be smarter than the previous version of themselves and eventually

00:21:42 bootstrapped up to being smarter than us. At which point we are essentially at the mercy of this sort

00:21:49 of more powerful intellect, which in principle we don’t have any control over what its goals are.

00:21:56 And so if its goals are at all out of sync with our goals, for example, the continued existence

00:22:04 of humanity, we won’t be able to stop it. It’ll be way more powerful than us and we will be toast.

00:22:12 So there’s some, I don’t know, very smart people who have signed on to that story. And it’s a

00:22:18 compelling story. Now I can really get myself in trouble. I once wrote an op ed about this,

00:22:25 specifically responding to some quotes from Elon Musk, who has been on this very podcast

00:22:30 more than once. AI summoning the demon. But then he came to Providence, Rhode Island,

00:22:38 which is where I live, and said to the governors of all the states, you know, you’re worried about

00:22:45 entirely the wrong thing. You need to be worried about AI. You need to be very, very worried about

00:22:49 AI. And journalists kind of reacted to that and they wanted to get people’s take. And I was like,

00:22:56 OK, my belief is that one of the things that makes Elon Musk so successful and so remarkable

00:23:03 as an individual is that he believes in the power of ideas. He believes that, you know,

00:23:08 if you have a really good idea for getting into space, you can get into space.

00:23:12 If you have a really good idea for a company or for how to change the way that people drive,

00:23:18 you just have to do it and it can happen. It’s really natural to apply that same idea to AI.

00:23:23 You see these systems that are doing some pretty remarkable computational tricks, demonstrations,

00:23:30 and then to take that idea and just push it all the way to the limit and think, OK, where does

00:23:35 this go? Where is this going to take us next? And if you’re a deep believer in the power of ideas,

00:23:40 then it’s really natural to believe that those ideas could be taken to the extreme and kill us.

00:23:47 So I think, you know, his strength is also his undoing, because that doesn’t mean it’s true.

00:23:52 Like, it doesn’t mean that that has to happen, but it’s natural for him to think that.

00:23:56 So another way to phrase the way he thinks, and I find it very difficult to argue with that line

00:24:04 of thinking. So Sam Harris is another person, from a neuroscience perspective, who thinks like that,

00:24:09 saying, well, is there something fundamental in the physics of the universe that prevents this

00:24:18 from eventually happening? And Nick Bostrom thinks in the same way, that kind of zooming out, yeah,

00:24:24 OK, we humans now are existing in this like time scale of minutes and days. And so our intuition

00:24:32 is in this time scale of minutes, hours and days. But if you look at the span of human history,

00:24:39 is there any reason you can’t see this in 100 years? And like, is there something fundamental

00:24:47 about the laws of physics that prevents this? And if there isn't, then it eventually will happen

00:24:52 or we will destroy ourselves in some other way. And it's very difficult, I find,

00:24:57 to actually argue against that. Yeah, me too.

00:25:03 And not sound like... not sound like you're just rolling your eyes, like, ah, that's science

00:25:11 fiction, we don't have to think about it. Or even worse than that, which is like, I don't have

00:25:16 kids, but, like, I've got to pick up my kids now, like, this... OK, I see, there are more pressing, short... Yeah,

00:25:20 there are more pressing short term things, like the next national crisis. We have much,

00:25:25 much shorter term things, like now, especially this year, there's COVID. So, like, any kind of discussion

00:25:30 like that is, like, there are these, you know, these pressing things today. And then so the Sam

00:25:37 Harris argument, well, like any day the exponential singularity can occur, is very difficult to

00:25:45 argue against. I mean, I don’t know. But part of his story is also he’s not going to put a date on

00:25:50 it. It could be in a thousand years, it could be in a hundred years, it could be in two years. It’s

00:25:53 just that as long as we keep making this kind of progress, it ultimately has to become a concern.

00:25:59 I kind of am on board with that. But the piece that I feel like is missing from

00:26:03 that way of extrapolating from the moment that we're in, is that I believe that in the

00:26:09 process of actually developing technology that can really get around in the world and really process

00:26:14 and do things in the world in a sophisticated way, we’re going to learn a lot about what that means,

00:26:20 which that we don’t know now because we don’t know how to do this right now.

00:26:24 If you believe that you can just turn on a deep learning network and eventually give it enough

00:26:28 compute and eventually get there. Well, sure, that seems really scary because we won't be

00:26:32 in the loop at all. We won't be helping to design or target these kinds of systems.

00:26:38 But I don't see that. That feels like it is against the laws of physics,

00:26:43 because these systems need help. Right. They need to surpass the difficulty,

00:26:49 the wall of complexity that happens in arranging something in the form that will happen.

00:26:55 Yeah, like I believe in evolution, like I believe that there's an argument. Right. So

00:27:00 there's another argument, just to look at it from a different perspective, that people say,

00:27:04 why they don't believe in evolution: how could evolution... it's sort of like a random set of

00:27:10 parts assembling themselves into a 747, and that could just never happen. So it's like,

00:27:15 OK, that's maybe hard to argue against. But clearly, 747s do get assembled. They get assembled

00:27:20 by us. Basically, the idea being that there’s a process by which we will get to the point of

00:27:26 making technology that has that kind of awareness. And in that process, we’re going to learn a lot

00:27:31 about that process and we’ll have more ability to control it or to shape it or to build it in our

00:27:37 own image. It’s not something that is going to spring into existence like that 747. And we’re

00:27:43 just going to have to contend with it completely unprepared. That’s very possible that in the

00:27:49 context of the long arc of human history, it will, in fact, spring into existence.

00:27:55 But that springing might take... like, if you look at nuclear weapons, even 20 years is a springing

00:28:02 in the context of human history. And it's very possible, just like with nuclear weapons,

00:28:07 that we could have, I don't know what percentage you want to put on it, but the possibility that we could

00:28:13 have knocked ourselves out. Yeah. The possibility of human beings destroying themselves in the 20th

00:28:17 century with nuclear weapons. I don't know. If you really think through it, you could

00:28:23 really put it close to, like, I don’t know, 30, 40 percent, given like the certain moments of

00:28:28 crisis that happen. So, like, I think one, like, fear in the shadows that’s not being acknowledged

00:28:38 is it’s not so much the A.I. will run away is is that as it’s running away,

00:28:44 we won’t have enough time to think through how to stop it. Right. Fast takeoff or FOOM. Yeah.

00:28:52 I mean, my much bigger concern, I wonder what you think about it, which is

00:28:55 we won’t know it’s happening. So I kind of think that there’s an A.G.I. situation already happening

00:29:05 with social media that our minds, our collective intelligence of human civilization is already

00:29:11 being controlled by an algorithm. And, like, we're already super... like, the level of our collective

00:29:19 intelligence, thanks to Wikipedia, people should donate to Wikipedia to feed the A.G.I.

00:29:23 Man, if we had a super intelligence that was in line with Wikipedia's values,

00:29:31 that it’s a lot better than a lot of other things I could imagine. I trust Wikipedia more than I

00:29:36 trust Facebook or YouTube as far as trying to do the right thing from a rational perspective.

00:29:41 Yeah. Now, that’s not where you were going. I understand that. But it does strike me that

00:29:45 there’s sort of smarter and less smart ways of exposing ourselves to each other on the Internet.

00:29:51 Yeah. The interesting thing is that Wikipedia and social media have very different forces.

00:29:55 You’re right. I mean, Wikipedia, if A.G.I. was Wikipedia, it’d be just like this cranky, overly

00:30:02 competent editor of articles. You know, there’s something to that. But the social

00:30:08 media aspect is not. So the vision of A.G.I. is as a separate system that’s super intelligent.

00:30:17 That’s super intelligent. That’s one key little thing. I mean, there’s the paperclip argument

00:30:20 that’s super dumb, but super powerful systems. But with social media, you have a relatively like

00:30:27 algorithms we may talk about today, very simple algorithms that when something Charles talks a

00:30:35 lot about, which is interactive A.I., when they start like having at scale, like tiny little

00:30:40 interactions with human beings, they can start controlling these human beings. So a single

00:30:45 algorithm can control the minds of human beings slowly, in ways we might not realize. It could

00:30:51 start wars. It could start. It could change the way we think about things. It feels like

00:30:57 in the long arc of history, if I were to sort of zoom out from all the outrage and all the tension

00:31:03 on social media, that it’s progressing us towards better and better things. It feels like chaos and

00:31:11 toxic and all that kind of stuff. It’s chaos and toxic. Yeah. But it feels like actually

00:31:17 the chaos and toxic is similar to the kind of debates we had from the founding of this country.

00:31:22 You know, there was a civil war that happened over that period. And ultimately it was all about

00:31:28 this tension of like something doesn’t feel right about our implementation of the core values we

00:31:33 hold as human beings. And they’re constantly struggling with this. And that results in people

00:31:38 calling each other, just being shady to each other on Twitter. But ultimately the algorithm is

00:31:47 managing all that. And it feels like there’s a possible future in which that algorithm

00:31:53 controls us into the direction of self destruction and whatever that looks like.

00:31:59 Yeah. So, all right. I do believe in the power of social media to screw us up royally. I do believe

00:32:05 in the power of social media to benefit us too. I do think that we’re in a, yeah, it’s sort of

00:32:12 almost got dropped on top of us. And now we’re trying to, as a culture, figure out how to cope

00:32:16 with it. There’s a sense in which, I don’t know, there’s some arguments that say that, for example,

00:32:23 I guess college age students now, late college age students now, people who were in middle school

00:32:27 when social media started to really take off, may be really damaged. Like this may have really hurt

00:32:34 their development in a way that we don’t have all the implications of quite yet. That’s the generation

00:32:40 who, and I hate to make it somebody else’s responsibility, but like they’re the ones who

00:32:46 can fix it. They’re the ones who can figure out how do we keep the good of this kind of technology

00:32:53 without letting it eat us alive. And if they’re successful, we move on to the next phase, the next

00:33:01 level of the game. If they’re not successful, then yeah, then we’re going to wreck each other. We’re

00:33:06 going to destroy society. So you’re going to, in your old age, sit on a porch and watch the world

00:33:11 burn because of the TikTok generation that… I believe, well, so this is my kids' age,

00:33:17 right? And that’s certainly my daughter’s age. And she’s very tapped in to social stuff, but she’s

00:33:21 also, she’s trying to find that balance, right? Of participating in it and in getting the positives

00:33:26 of it, but without letting it eat her alive. And I think sometimes she ventures, I hope she doesn’t

00:33:33 watch this. Sometimes I think she ventures a little too far and is consumed by it. And other

00:33:39 times she gets a little distance. And if there’s enough people like her out there, they’re going to

00:33:46 navigate these choppy waters. That's an interesting skill actually to develop. I talked to my dad

00:33:52 about it. I've now, somehow, with this podcast in particular but for other reasons too, received a

00:34:01 little bit of attention. And with that, apparently in this world, even though I don’t shut up about

00:34:07 love and I’m just all about kindness, I have now a little mini army of trolls. It’s kind of hilarious

00:34:15 actually, but it also doesn’t feel good, but it’s a skill to learn to not look at that, like to

00:34:23 moderate actually how much you look at that. The discussion I have with my dad, it’s similar to,

00:34:28 it doesn’t have to be about trolls. It could be about checking email, which is like, if you’re

00:34:33 anticipating, you know... my dad runs a large institute at Drexel University, and there

00:34:39 could be stressful, like, emails you're waiting on, like there's drama of some kind. And so like,

00:34:45 there’s a temptation to check the email. If you send an email and you kind of,

00:34:49 and that pulls you in into, it doesn’t feel good. And it’s a skill that he actually complains that

00:34:56 he hasn’t learned. I mean, he grew up without it. So he hasn’t learned the skill of how to

00:35:01 shut off the internet and walk away. And I think young people, while they’re also being

00:35:05 quote unquote damaged by like, you know, being bullied online, all of those stories, which are

00:35:12 very like horrific, you basically can’t escape your bullies these days when you’re growing up.

00:35:17 But at the same time, they’re also learning that skill of how to be able to shut off the,

00:35:23 like disconnect with it, be able to laugh at it, not take it too seriously. It’s fascinating. Like

00:35:29 we’re all trying to figure this out. Just like you said, it’s been dropped on us and we’re trying to

00:35:32 figure it out. Yeah. I think that’s really interesting. And I guess I’ve become a believer

00:35:37 in the human design, which I feel like I don’t completely understand. Like how do you make

00:35:42 something as robust as us? Like we’re so flawed in so many ways. And yet, and yet, you know,

00:35:48 we dominate the planet and we do seem to manage to get ourselves out of scrapes eventually,

00:35:57 not necessarily the most elegant possible way, but somehow we get, we get to the next step.

00:36:02 And I don’t know how I’d make a machine do that. Generally speaking, like if I train one of my

00:36:09 reinforcement learning agents to play a video game and it works really hard on that first stage

00:36:13 over and over and over again, and it makes it through, it succeeds on that first level.

00:36:17 And then the new level comes and it’s just like, okay, I’m back to the drawing board. And somehow

00:36:21 humanity, we keep leveling up and then somehow managing to put together the skills necessary to

00:36:26 achieve success, some semblance of success in that next level too. And, you know,

00:36:33 I hope we can keep doing that.

00:36:36 You mentioned reinforcement learning. So you’ve had a couple of years in the field. No, quite,

00:36:42 you know, quite a few, quite a long career in artificial intelligence broadly, but reinforcement

00:36:50 learning specifically, can you maybe give a hint about your sense of the history of the field?

00:36:58 And in some ways it’s changed with the advent of deep learning, but as a long roots, like how is it

00:37:05 weaved in and out of your own life? How have you seen the community change or maybe the ideas that

00:37:09 it’s playing with change? I’ve had the privilege, the pleasure of being, of having almost a front

00:37:16 row seat to a lot of this stuff. And it’s been really, really fun and interesting. So when I was

00:37:21 in college in the eighties, early eighties, the neural net thing was starting to happen.

00:37:29 And I was taking a lot of psychology classes and a lot of computer science classes as a college

00:37:34 student. And I thought, you know, something that can play tic tac toe and just like learn to get

00:37:38 better at it. That ought to be a really easy thing. So I spent almost, almost all of my, what would

00:37:43 have been vacations during college, like hacking on my home computer, trying to teach it how to

00:37:48 play tic tac toe. And the programming language? Basic. Oh yeah. That's, that's my first

00:37:53 language. That’s my native language. Is that when you first fell in love with computer science,

00:37:57 just like programming Basic on that? Uh, what was the computer? Do you remember? I had,

00:38:02 I had a TRS-80 Model I, before they were called Model Is, cause there was nothing else. Uh,

00:38:08 I got my computer in 1979. So I was, I would have been bar mitzvahed,

00:38:18 but instead of having a big party that my parents threw on my behalf, they just got me a computer.

00:38:23 Cause that's what I really, really, really wanted. I saw them in the mall at

00:38:26 Radio Shack. And I thought, what, how are they doing that? I would try to stump them. I would

00:38:32 give them math problems, like one plus, and then in parentheses, two plus one. And it would always get

00:38:37 it right. I’m like, how do you know so much? Like I’ve had to go to algebra class for the last few

00:38:42 years to learn this stuff and you just seem to know. So I was smitten and, uh,

00:38:48 got a computer and I think ages 13 to 15. I have no memory of those years. I think I just was in

00:38:55 my room with the computer, listening to Billy Joel, communing, possibly listening to the radio,

00:38:59 listening to Billy Joel. That was the one album I had, uh, on vinyl at that time. And, um, and then

00:39:06 I got it on cassette tape and that was really helpful because then I could play it. I didn’t

00:39:09 have to go down to my parents' wifi, or hi-fi, sorry. Uh, and at age 15, I remember kind of

00:39:16 walking out and like, okay, I’m ready to talk to people again. Like I’ve learned what I need to

00:39:20 learn here. And, um, so yeah, so, so that was, that was my home computer. And so I went to college

00:39:26 and I was like, oh, I'm totally going to study computer science. And the college I chose

00:39:30 specifically had a computer science major. The one that I really wanted, the college I really wanted

00:39:34 to go to, didn't, so bye bye to them. So I went to Yale. Uh, Princeton would have been way more

00:39:41 convenient, and it was just a beautiful campus, and it was close enough to home. And I was really

00:39:45 excited about Princeton. And I visited, and I said, so, computer science major? They're like, well, we have

00:39:50 computer engineering. I'm like, Oh, I don't like that word engineering. I like computer science.

00:39:55 I really... I want to do, like... you're saying hardware and software? They're like, yeah.

00:39:59 I’m like, I just want to do software. I couldn’t care less about hardware. And you grew up in

00:40:02 Philadelphia. I grew up outside Philly. Yeah. Yeah. Uh, so the, you know, local schools were

00:40:07 like Penn and Drexel and, uh, Temple. Like everyone in my family went to Temple at least at

00:40:12 one point in their lives, except for me. So yeah, Philly, Philly family. Yale had a computer science

00:40:18 department. And that's when, it's kind of interesting, you said eighties and neural

00:40:22 networks. That's when neural networks were a hot new thing, or a hot thing, period. Uh, so was

00:40:27 it in college when you first learned about neural networks, or when you learned... like, how did...

00:40:31 It was in a psychology class, not in a CS class. Yeah. Was it psychology or cognitive science, or, like,

00:40:36 do you remember, like, what context it was? Yeah. Yeah. Yeah. So, so I was, I've always been a

00:40:42 bit of a cognitive psychology groupie. So like I’m, I studied computer science, but I like,

00:40:47 I like to hang around where the cognitive scientists are. Cause I don’t know brains, man.

00:40:52 They’re like, they’re wacky. Cool. And they have a bigger picture view of things. They’re a little

00:40:57 less engineering. I would say they’re more, they’re more interested in the nature of cognition and

00:41:03 intelligence and perception and how like the vision system work. Like they’re asking always

00:41:07 bigger questions. Now with the deep learning community there, I think more, there’s a lot of

00:41:12 intersections, but I do find that the neuroscience folks, actually the cognitive psychology, cognitive

00:41:21 science folks, are starting to learn how to program, how to use artificial neural networks.

00:41:27 And they are actually approaching problems in like totally new, interesting ways. It’s fun to

00:41:31 watch that grad students from those departments, like approach a problem of machine learning.

00:41:37 Right. They come in with a different perspective. Yeah. They don’t care about like your

00:41:40 ImageNet dataset or whatever. They want, like, to understand the basic

00:41:47 mechanisms at the neuronal level and the functional level of intelligence. It's kind of,

00:41:53 it’s kind of cool to see them work, but yeah. Okay. So you always love, you’re always a groupie

00:41:58 of cognitive psychology. Yeah. Yeah. And so, so it was in a class by Richard Gerrig. He was kind of

00:42:04 like my favorite psych professor in college. And I took like three different classes with him

00:42:11 and yeah. So they were talking specifically the class, I think was kind of a,

00:42:17 there was a big paper that was written by Steven Pinker and Prince. I don’t, I’m blanking on

00:42:22 Prince’s first name, but Prince and Pinker and Prince, they wrote kind of a, they were at that

00:42:28 time kind of like, ah, I’m blanking on the names of the current people. The cognitive scientists

00:42:36 who are complaining a lot about deep networks. Oh, Gary, Gary Marcus, Marcus and who else? I mean,

00:42:44 there’s a few, but Gary, Gary’s the most feisty. Sure. Gary’s very feisty. And with this, with his

00:42:49 coauthor, they, they, you know, they’re kind of doing these kinds of take downs where they say,

00:42:52 okay, well, yeah, it does all these amazing, amazing things, but here’s a shortcoming. Here’s

00:42:56 a shortcoming. Here’s a shortcoming. And so the Pinker Prince paper is kind of like the,

00:43:01 that generation’s version of Marcus and Davis, right? Where they’re, they’re trained as cognitive

00:43:07 scientists, but they’re looking skeptically at the results in the, in the artificial intelligence,

00:43:12 neural net kind of world and saying, yeah, it can do this and this and this, but lo,

00:43:16 it can’t do that. And it can’t do that. And it can’t do that maybe in principle or maybe just

00:43:20 in practice at this point. But, but the fact of the matter is you’re, you’ve narrowed your focus

00:43:26 too far to be impressed. You know, you’re impressed with the things within that circle,

00:43:30 but you need to broaden that circle a little bit. You need to look at a wider set of problems.

00:43:34 And so, so we had, so I was in this seminar in college that was basically a close reading of

00:43:40 the Pinker Prince paper, which was like really thick. There was a lot going on in there. And,

00:43:47 and it, you know, and it talked about the reinforcement learning idea a little bit.

00:43:51 I’m like, oh, that sounds really cool because behavior is what is really interesting to me

00:43:55 about psychology anyway. So making programs that, I mean, programs are things that behave.

00:44:00 People are things that behave. Like I want to make learners that learn to behave.

00:44:05 And which way was reinforcement learning presented? Is this talking about human and

00:44:09 animal behavior or are we talking about actual mathematical construct?

00:44:12 Ah, that’s right. So that’s a good question. Right. So this is, I think it wasn’t actually

00:44:17 talked about as behavior in the paper that I was reading. I think that it just talked about

00:44:22 learning. And to me, learning is about learning to behave, but really neural nets at that point

00:44:27 were about learning like supervised learning. So learning to produce outputs from inputs.

00:44:31 So I kind of tried to invent reinforcement learning. When I graduated, I joined a research

00:44:36 group at Bellcore, which had spun out of Bell Labs recently at that time because of the divestiture

00:44:42 of the long distance and local phone service in the 1980s, 1984. And I was in a group with

00:44:50 Dave Ackley, who was the first author of the Boltzmann machine paper. So the very first neural

00:44:56 net paper that could handle XOR, right? So XOR sort of killed neural nets. The very first,

00:45:10 the XOR, with the first winter. Yeah. Um, the perceptrons paper. And Hinton, along with his

00:45:14 student, Dave Ackley, and I think there were other authors as well, showed that no, no, no,

00:45:14 with Boltzmann machines, we can actually learn nonlinear concepts. And so everything’s back on

00:45:19 the table again. And that kind of started that second wave of neural networks. So Dave Ackley

00:45:24 was, he became my mentor at Bellcore, and we talked a lot about learning and life and

00:45:30 computation and how all these things fit together. Now Dave and I have a podcast together. So,

00:45:35 so I get to kind of enjoy his perspective once again, even all these years

00:45:42 later. And so I said, I was really interested in learning, but in the context of

00:45:48 behavior. And he's like, oh, well, that's reinforcement learning. And he gave me

00:45:52 Rich Sutton’s 1984 TD paper. So I read that paper. I honestly didn’t get all of it,

00:45:58 but I got the idea. I got that they were using, that he was using ideas that I was familiar with

00:46:04 in the context of neural nets and, like, sort of backprop. But with this idea of making

00:46:09 predictions over time, I'm like, this is so interesting, but I don't really get all the

00:46:13 details, I said to Dave. And Dave said, oh, well, why don't we have him come and give a talk?

00:46:18 And I was like, wait, what, you can do that? Like, these are real people. I thought they

00:46:23 were just words. I thought it was just like ideas that somehow magically seeped into paper. He’s

00:46:28 like, no, I, I, I know Rich like, we’ll just have him come down and he’ll give a talk. And so I was,

00:46:35 you know, my mind was blown. And so Rich came and he gave a talk at Bellcore and he talked about

00:46:41 what he was super excited about, which was, they had just figured out at the time Q learning. So Watkins

00:46:48 had visited Rich Sutton's lab at UMass, or Andy Barto's lab that Rich was a part of.

00:46:55 And, um, he was really excited about this because it resolved a whole bunch of problems that he

00:47:00 didn’t know how to resolve in the, in the earlier paper. And so, um,

00:47:05 For people who don’t know TD, temporal difference, these are all just algorithms

00:47:09 for reinforcement learning.

00:47:10 Right. And TD, temporal difference in particular is about making predictions over time. And you can

00:47:15 try to use it for making decisions, right? Cause if you can predict how good a future action or an

00:47:19 action's outcomes will be in the future, you can choose one that has better outcomes. But the thing

00:47:24 that's really cool about Q learning is that it was off policy, which meant that you could actually be

00:47:29 learning about the environment and what the value of different actions would be while actually

00:47:33 figuring out how to behave optimally. So that was a revelation.
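
As a reference for the discussion that follows, here is a minimal sketch of the tabular Q-learning update being described, the single equation and roughly one line of code alluded to below. The function names, hyperparameters, and epsilon-greedy behavior policy are illustrative assumptions, not anything specified in the episode.

```python
import random
from collections import defaultdict

def q_learning_step(Q, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.99):
    """One tabular Q-learning update, the core equation:
    Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """Off-policy in action: behave with exploration (epsilon-greedy)
    while the update above learns the value of the greedy policy."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

# Q-values default to 0.0 for unseen (state, action) pairs.
Q = defaultdict(float)
```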

00:47:38 Yeah. And the proof of that is kind of interesting. I mean, that's really surprising

00:47:41 to me when I first read that paper,

00:47:51 and then in Richard,

00:47:55 Rich Sutton's book on the matter. It's kind of beautiful that a single equation can

00:48:01 capture all of it, one line of code, and, like, you can learn anything. Yeah. Like, given enough time.

00:48:06 So equation and code, you're right. Like, you can arguably, at least

00:48:13 if you, like, squint your eyes, say

00:48:17 that this is all of intelligence, and you can implement

00:48:21 it in a single one.

00:48:22 I think I started with Lisp, which is a shout out to Lisp

00:48:26 with like a single line of code, key piece of code,

00:48:29 maybe a couple that you could do that.

00:48:32 It’s kind of magical.

00:48:33 It’s feels too good to be true.

00:48:37 Well, and it sort of is.

00:48:38 Yeah, kind of.

00:48:40 It seems to require an awful lot

00:48:41 of extra stuff supporting it.

00:48:43 But nonetheless, the idea is really good.

00:48:46 And as far as we know, it is a very reasonable way

00:48:50 of trying to create adaptive behavior,

00:48:52 behavior that gets better at something over time.

00:48:56 Did you find the idea of optimal at all compelling

00:49:00 that you could prove that it’s optimal?

00:49:02 So like one part of computer science

00:49:04 that it makes people feel warm and fuzzy inside

00:49:08 is when you can prove something like

00:49:10 that a sorting algorithm worst case runs

00:49:13 in N log N, and it makes everybody feel so good.

00:49:16 Even though in reality, it doesn’t really matter

00:49:18 what the worst case is, what matters is like,

00:49:20 does this thing actually work in practice

00:49:22 on this particular actual set of data that I enjoy?

00:49:26 Did you?

00:49:26 So here’s a place where I have maybe a strong opinion,

00:49:29 which is like, you’re right, of course, but no, no.

00:49:34 Like, so what makes worst case so great, right?

00:49:37 What makes a worst case analysis so great

00:49:39 is that you get modularity.

00:49:41 You can take that thing and plug it into another thing

00:49:44 and still have some understanding of what’s gonna happen

00:49:47 when you click them together, right?

00:49:49 If it just works well in practice, in other words,

00:49:51 with respect to some distribution that you care about,

00:49:54 when you go plug it into another thing,

00:49:56 that distribution can shift, it can change,

00:49:58 and your thing may not work well anymore.

00:50:00 And you want it to, and you wish it does,

00:50:02 and you hope that it will, but it might not,

00:50:04 and then, ah.

00:50:06 So you’re saying you don’t like machine learning.

00:50:13 But we have some positive theoretical results

00:50:15 for these things.

00:50:17 You can come back at me with,

00:50:20 yeah, but they’re really weak,

00:50:21 and yeah, they’re really weak.

00:50:22 And you can even say that sorting algorithms,

00:50:25 like if you do the optimal sorting algorithm,

00:50:27 it’s not really the one that you want,

00:50:30 and that might be true as well.

00:50:31 But it is, the modularity is a really powerful statement.

00:50:34 I really like that.

00:50:35 If you’re an engineer, you can then assemble

00:50:36 different things, you can count on them to be,

00:50:39 I mean, it’s interesting.

00:50:42 It’s a balance, like with everything else in life,

00:50:45 you don’t want to get too obsessed.

00:50:47 I mean, this is what computer scientists do,

00:50:48 which they tend to get obsessed,

00:50:51 and they overoptimize things,

00:50:53 or they start by optimizing, and then they overoptimize.

00:50:56 So it’s easy to get really granular about this thing,

00:51:00 but like the step from an n squared to an n log n

00:51:06 sorting algorithm is a big leap for most real world systems.

00:51:10 No matter what the actual behavior of the system is,

00:51:13 that’s a big leap.

00:51:14 And the same can probably be said

00:51:17 for other kind of first leaps

00:51:20 that you would take on a particular problem.

00:51:22 Like it’s picking the low hanging fruit,

00:51:25 or whatever the equivalent of doing the,

00:51:29 not the dumbest thing, but the next to the dumbest thing.

00:51:32 Picking the most delicious reachable fruit.

00:51:34 Yeah, most delicious reachable fruit.

00:51:36 I don’t know why that’s not a saying.

00:51:38 Yeah.

00:51:39 Okay, so then this is the 80s,

00:51:44 and this kind of idea starts to percolate of learning.

00:51:47 At that point, I got to meet Rich Sutton,

00:51:50 so everything was sort of downhill from there,

00:51:52 and that was really the pinnacle of everything.

00:51:55 But then I felt like I was kind of on the inside.

00:51:58 So then as interesting results were happening,

00:52:00 I could like check in with Rich or with Jerry Tesauro,

00:52:03 who had a huge impact on kind of early thinking

00:52:06 in temporal difference learning and reinforcement learning

00:52:10 and showed that you could do,

00:52:11 you could solve problems

00:52:12 that we didn’t know how to solve any other way.

00:52:16 And so that was really cool.

00:52:17 So as good things were happening,

00:52:18 I would hear about it from either the people

00:52:20 who were doing it,

00:52:21 or the people who were talking to the people

00:52:23 who were doing it.

00:52:23 And so I was able to track things pretty well

00:52:25 through the 90s.

00:52:28 So wasn’t most of the excitement

00:52:32 on reinforcement learning in the 90s era

00:52:34 with, what is it, TD Gammon?

00:52:37 Like what’s the role of these kind of little

00:52:40 like fun game playing things and breakthroughs

00:52:43 in exciting the community?

00:52:46 Was that, like what were your,

00:52:48 because you’ve also built,

00:52:50 or were part of building, a crossword puzzle

00:52:56 solving program called Proverb.

00:53:00 So you were interested in this as a problem,

00:53:05 like informing, using games to understand

00:53:09 how to build intelligent systems.

00:53:12 So like, what did you think about TD Gammon?

00:53:14 Like what did you think about that whole thing in the 90s?

00:53:16 Yeah, I mean, I found the TD Gammon result

00:53:19 really just remarkable.

00:53:20 So I had known about some of Jerry’s stuff

00:53:22 before he did TD Gammon. He did a system,

00:53:24 just more vanilla, well, not entirely vanilla,

00:53:27 but a more classical, backprop-y kind of network

00:53:31 for playing backgammon,

00:53:32 where he was training it on expert moves.

00:53:35 So it was kind of supervised,

00:53:37 but the way that it worked was not to mimic the actions,

00:53:41 but to learn internally an evaluation function.

00:53:44 So to learn, well, if the expert chose this over this,

00:53:47 that must mean that the expert values this more than this.

00:53:50 And so let me adjust my weights to make it

00:53:52 so that the network evaluates this

00:53:54 as being better than this.

00:53:56 So it could learn from human preferences,

00:53:59 it could learn its own preferences.
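
As a minimal sketch of the kind of comparison training described here, assuming a toy linear evaluation function in plain numpy (the features, data, and constants are invented for illustration, not Tesauro's actual system): given pairs of positions where the expert preferred one over the other, nudge the evaluation so the preferred position scores higher.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each position is encoded as a feature vector, and
# value(x) = w . x is a stand-in for the evaluation network.
n_features = 50
w = np.zeros(n_features)

def value(x, w):
    return x @ w

def train_on_preferences(pairs, w, lr=0.1, epochs=20):
    """pairs: list of (x_preferred, x_rejected) feature vectors.
    Logistic pairwise loss: the expert's chosen position should be
    assigned a higher value than the rejected alternative."""
    for _ in range(epochs):
        for x_good, x_bad in pairs:
            margin = value(x_good, w) - value(x_bad, w)
            p = 1.0 / (1.0 + np.exp(-margin))     # P(model agrees with expert)
            grad = -(1.0 - p) * (x_good - x_bad)  # gradient of -log p wrt w
            w = w - lr * grad
    return w

# Toy "expert": prefers whichever position has the larger first feature.
positions = rng.normal(size=(200, n_features))
pairs = []
for i in range(0, 200, 2):
    a, b = positions[i], positions[i + 1]
    pairs.append((a, b) if a[0] > b[0] else (b, a))

w = train_on_preferences(pairs, w)
print("learned weight on the decisive feature:", round(w[0], 2))
```

The point of the pairwise loss is exactly the one being described: the model never tries to copy the expert's move directly, it only learns an internal evaluation consistent with the expert's choices.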

00:54:02 And then when he took the step from that

00:54:04 to actually doing it

00:54:06 as a full on reinforcement learning problem,

00:54:08 where you didn’t need a trainer,

00:54:10 you could just let it play, that was remarkable, right?
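
And as a rough sketch of the step to full reinforcement learning, here is a tabular TD(0) self-play loop on an invented take-1-or-2 game standing in for backgammon (nothing here is the actual TD-Gammon code; the game, the epsilon-greedy policy, and the constants are illustrative assumptions):

```python
import random

def td_selfplay_nim(n_stones=21, episodes=20000, alpha=0.1, eps=0.1):
    """Tabular TD(0) self-play on a toy take-1-or-2 game (last stone wins).
    V[s] estimates the win probability for the player about to move with
    s stones left. Both 'players' share the same value table."""
    V = [0.5] * (n_stones + 1)
    V[0] = 0.0  # no stones left: the player to move has already lost
    for _ in range(episodes):
        s = n_stones
        while s > 0:
            moves = [m for m in (1, 2) if m <= s]
            if random.random() < eps:
                m = random.choice(moves)
            else:
                # greedy: leave the opponent in the worst position for them
                m = min(moves, key=lambda mm: V[s - mm])
            s_next = s - m
            if s_next == 0:
                target = 1.0               # this move wins the game
            else:
                target = 1.0 - V[s_next]   # opponent's value, flipped
            V[s] += alpha * (target - V[s])
            s = s_next
    return V

V = td_selfplay_nim()
# In this game, positions that are multiples of 3 are losing for the
# player to move; the learned values should roughly reflect that.
for s in range(1, 10):
    print(s, round(V[s], 2))
```

No trainer appears anywhere: the only signal is who took the last stone, and the value table is shared by both sides, which is the essence of learning by self play.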

00:54:13 And so I think as humans often do,

00:54:17 as we’ve done in the recent past as well,

00:54:20 people extrapolate.

00:54:22 It’s like, oh, well, if you can do that,

00:54:23 which is obviously very hard,

00:54:24 then obviously you could do all these other problems

00:54:27 that we wanna solve that we know are also really hard.

00:54:31 And it turned out very few of them ended up being practical,

00:54:35 partly because I think neural nets,

00:54:38 certainly at the time,

00:54:39 were struggling to be consistent and reliable.

00:54:42 And so training them in a reinforcement learning setting

00:54:45 was a bit of a mess.

00:54:46 I had, I don’t know, generation after generation

00:54:50 of like master’s students

00:54:51 who wanted to do value function approximation,

00:54:55 basically reinforcement learning with neural nets.

00:54:59 And over and over and over again, we were failing.

00:55:03 We couldn’t get the good results that Jerry Tesaro got.

00:55:06 I now believe that Jerry is a neural net whisperer.

00:55:09 He has a particular ability to get neural networks

00:55:14 to do things that other people would find impossible.

00:55:18 And it’s not the technology,

00:55:19 it’s the technology and Jerry together.

00:55:22 Which I think speaks to the role of the human expert

00:55:27 in the process of machine learning.

00:55:28 Right, it’s so easy.

00:55:30 We’re so drawn to the idea that it’s the technology

00:55:32 that is where the power is coming from

00:55:36 that I think we lose sight of the fact

00:55:38 that sometimes you need a really good,

00:55:39 just like, I mean, no one would think,

00:55:40 hey, here’s this great piece of software.

00:55:42 Here’s like, I don’t know, GNU Emacs or whatever.

00:55:44 And doesn’t that prove that computers are super powerful

00:55:48 and basically gonna take over the world?

00:55:49 It’s like, no, Stallman is a hell of a hacker, right?

00:55:52 So he was able to make the code do these amazing things.

00:55:55 He couldn’t have done it without the computer,

00:55:57 but the computer couldn’t have done it without him.

00:55:59 And so I think people discount the role of people

00:56:02 like Jerry who have just a particular set of skills.

00:56:07 On that topic, by the way, as a small side note,

00:56:10 I tweeted Emacs is greater than Vim yesterday

00:56:14 and deleted the tweet 10 minutes later

00:56:18 when I realized it started a war.

00:56:21 I was like, oh, I was just kidding.

00:56:24 I was just being, and I’m gonna walk back and forth.

00:56:29 So people still feel passionately

00:56:30 about that particular piece of good stuff.

00:56:32 Yeah, I don’t get that

00:56:33 because Emacs is clearly so much better, I don’t understand.

00:56:37 But why do I say that?

00:56:38 Because I spent a block of time in the 80s

00:56:43 making my fingers know the Emacs keys

00:56:46 and now that’s part of the thought process for me.

00:56:49 Like I need to express, and if you take that,

00:56:51 if you take my Emacs key bindings away, I become…

00:56:57 I can’t express myself.

00:56:58 I’m the same way with the,

00:56:59 I don’t know if you know what it is,

00:57:01 but it’s a Kinesis keyboard, which is this butt shaped keyboard.

00:57:05 Yes, I’ve seen them.

00:57:06 They’re very, I don’t know, sexy, elegant?

00:57:10 They’re just beautiful.

00:57:11 Yeah, they’re gorgeous, way too expensive.

00:57:14 But the problem with them, similar with Emacs,

00:57:19 is once you learn to use it.

00:57:23 It’s harder to use other things.

00:57:24 It’s hard to use other things.

00:57:26 There’s this absurd thing where I have like small, elegant,

00:57:29 lightweight, beautiful little laptops

00:57:31 and I’m sitting there in a coffee shop

00:57:33 with a giant Kinesis keyboard and a sexy little laptop.

00:57:36 It’s absurd, but I used to feel bad about it,

00:57:40 but at the same time, you just kind of have to,

00:57:42 sometimes it’s back to the Billy Joel thing.

00:57:44 You just have to throw that Billy Joel record

00:57:47 and throw Taylor Swift and Justin Bieber to the wind.

00:57:51 So…

00:57:52 See, but I like them now because again,

00:57:54 I have no musical taste.

00:57:55 Like now that I’ve heard Justin Bieber enough,

00:57:57 I’m like, I really like his songs.

00:57:59 And Taylor Swift, not only do I like her songs,

00:58:02 but my daughter’s convinced that she’s a genius.

00:58:04 And so now I basically have signed onto that.

00:58:07 So…

00:58:08 So yeah, that speaks to the,

00:58:10 back to the robustness of the human brain.

00:58:11 That speaks to the neuroplasticity

00:58:13 that you can just like a mouse teach yourself to,

00:58:17 or probably a dog teach yourself to enjoy Taylor Swift.

00:58:21 I’ll try it out.

00:58:22 I don’t know.

00:58:23 I try, you know what?

00:58:25 It has to do with just like acclimation, right?

00:58:28 Just like you said, a couple of weeks.

00:58:29 Yeah.

00:58:30 That’s an interesting experiment.

00:58:31 I’ll actually try that.

00:58:32 Like I’ll listen to it.

00:58:33 That wasn’t the intent of the experiment?

00:58:33 Just like social media,

00:58:34 it wasn’t intended as an experiment

00:58:36 to see what we can take as a society,

00:58:38 but it turned out that way.

00:58:39 I don’t think I’ll be the same person

00:58:40 on the other side of the week listening to Taylor Swift,

00:58:43 but let’s try.

00:58:44 No, it’s more compartmentalized.

00:58:45 Don’t be so worried.

00:58:46 Like it’s, like I get that you can be worried,

00:58:48 but don’t be so worried

00:58:49 because we compartmentalize really well.

00:58:51 And so it won’t bleed into other parts of your life.

00:58:53 You won’t start, I don’t know,

00:58:56 wearing red lipstick or whatever.

00:58:57 Like it’s fine.

00:58:58 It’s fine.

00:58:59 It changed fashion and everything.

00:58:59 It’s fine.

00:59:00 But you know what?

00:59:01 The thing you have to watch out for

00:59:02 is you’ll walk into a coffee shop

00:59:03 once we can do that again.

00:59:05 And recognize the song?

00:59:06 And you’ll be, no,

00:59:07 you won’t know that you’re singing along

00:59:09 until everybody in the coffee shop is looking at you.

00:59:11 And then you’re like, that wasn’t me.

00:59:16 Yeah, that’s the, you know,

00:59:17 people are afraid of AGI.

00:59:18 I’m afraid of the Taylor Swift.

00:59:21 The Taylor Swift takeover.

00:59:22 Yeah, and I mean, people should know that TD Gammon was,

00:59:26 I guess, would you call it,

00:59:28 do you like the terminology of self play by any chance?

00:59:31 So like systems that learn by playing themselves.

00:59:35 Just, I don’t know if it’s the best word, but.

00:59:38 So what’s the problem with that term?

00:59:41 I don’t know.

00:59:42 So it’s like the big bang,

00:59:43 like it’s like talking to a serious physicist.

00:59:46 Do you like the term big bang?

00:59:47 And when it was early,

00:59:49 I feel like it’s the early days of self play.

00:59:51 I don’t know, maybe it was used previously,

00:59:53 but I think it’s been used by only a small group of people.

00:59:57 And so like, I think we’re still deciding

00:59:59 is this ridiculously silly name a good name

01:00:02 for potentially one of the most important concepts

01:00:05 in artificial intelligence?

01:00:07 Okay, it depends how broadly you apply the term.

01:00:09 So I used the term in my 1996 PhD dissertation.

01:00:12 Wow, the actual term, self play.

01:00:14 Yeah, because Tesauro’s paper was something like

01:00:18 training up an expert backgammon player through self play.

01:00:21 So I think it was in the title of his paper.

01:00:24 If not in the title, it was definitely a term that he used.

01:00:27 There’s another term that we got from that work is rollout.

01:00:29 So I don’t know if you, do you ever hear the term rollout?

01:00:32 That’s a backgammon term that has now applied

01:00:35 generally in computers, well, at least in AI

01:00:38 because of TD Gammon.

01:00:40 That’s fascinating.

01:00:41 So how is self play being used now?

01:00:43 And like, why is it,

01:00:44 does it feel like a more general powerful concept

01:00:46 is sort of the idea of,

01:00:47 well, the machine’s just gonna teach itself to be smart.

01:00:50 Yeah, so that’s where maybe you can correct me,

01:00:53 but that’s where the continuation of the spirit

01:00:56 and actually like literally the exact algorithms

01:01:00 of TD Gammon are applied by DeepMind and OpenAI

01:01:03 to learn games that are a little bit more complex

01:01:07 that when I was learning artificial intelligence,

01:01:09 Go was presented to me

01:01:10 with Artificial Intelligence: A Modern Approach.

01:01:13 I don’t know if they explicitly pointed to Go

01:01:16 in those books as like unsolvable kind of thing,

01:01:20 like implying that these approaches hit their limit

01:01:24 in this, with these particular kind of games.

01:01:26 So something, I don’t remember if the book said it or not,

01:01:29 but something in my head,

01:01:31 or if it was the professors who instilled in me the idea

01:01:34 like this is the limits of artificial intelligence

01:01:37 of the field.

01:01:38 Like it instilled in me the idea

01:01:40 that if we can create a system that can solve the game of Go

01:01:44 we’ve achieved AGI.

01:01:46 That was kind of, it wasn’t explicitly said like this,

01:01:49 but that was the feeling.

01:01:51 And so I was one of the people to whom it seemed magical

01:01:54 when a learning system was able to beat

01:01:59 a human world champion at the game of Go

01:02:02 and even more so from that, that was AlphaGo,

01:02:06 even more so with AlphaGo Zero

01:02:08 then kind of renamed and advanced into AlphaZero

01:02:11 beating a world champion or world class player

01:02:16 without any supervised learning on expert games.

01:02:21 doing it only through playing itself.

01:02:24 So that is, I don’t know what to make of it.

01:02:29 I think it would be interesting to hear

01:02:31 what your opinions are on just how exciting,

01:02:35 surprising, profound, interesting, or boring

01:02:40 the breakthrough performance of AlphaZero was.

01:02:45 Okay, so AlphaGo knocked my socks off.

01:02:48 That was so remarkable.

01:02:50 Which aspect of it?

01:02:52 That they got it to work,

01:02:55 that they actually were able to leverage

01:02:57 a whole bunch of different ideas,

01:02:58 integrate them into one giant system.

01:03:01 Just the software engineering aspect of it is mind blowing.

01:03:04 I don’t, I’ve never been a part of a program

01:03:06 as complicated as the program that they built for that.

01:03:09 And just the, like Jerry Tesaro is a neural net whisperer,

01:03:14 like David Silver is a kind of neural net whisperer too.

01:03:17 He was able to coax these networks

01:03:19 and these new way out there architectures

01:03:22 to do these, solve these problems that,

01:03:25 as you said, when we were learning AI,

01:03:31 no one had an idea how to make it work.

01:03:32 It was remarkable that these techniques

01:03:35 that were so good at playing chess

01:03:40 and that could beat the world champion in chess

01:03:42 couldn’t beat your typical Go playing teenager in Go.

01:03:46 So the fact that in a very short number of years,

01:03:49 we kind of ramped up to trouncing people in Go

01:03:54 just blew me away.

01:03:55 So you’re kind of focusing on the engineering aspect,

01:03:58 which is also very surprising.

01:04:00 I mean, there’s something different

01:04:02 about large, well funded companies.

01:04:05 I mean, there’s a compute aspect to it too.

01:04:07 Like that, of course, I mean, that’s similar

01:04:11 to Deep Blue, right, with IBM.

01:04:14 Like there’s something important to be learned

01:04:16 and remembered about a large company

01:04:19 taking the ideas that are already out there

01:04:22 and investing a few million dollars into it or more.

01:04:26 And so you’re kind of saying the engineering

01:04:29 is kind of fascinating, both on the,

01:04:32 with AlphaGo is probably just gathering all the data,

01:04:35 right, of the expert games, like organizing everything,

01:04:38 actually doing distributed supervised learning.

01:04:42 And to me, see the engineering I kind of took for granted,

01:04:49 to me philosophically being able to persist

01:04:55 in the face of like long odds,

01:04:57 because it feels like for me,

01:05:00 I would be one of the skeptical people in the room

01:05:02 thinking that you can learn your way to beat Go.

01:05:05 Like it sounded like, especially with David Silver,

01:05:08 it sounded like David was not confident at all.

01:05:11 So like it was, like not,

01:05:15 it’s funny how confidence works.

01:05:18 It’s like, you’re not like cocky about it, like, but.

01:05:24 Right, because if you’re cocky about it,

01:05:26 you kind of stop and stall and don’t get anywhere.

01:05:28 But there’s like a hope that’s unbreakable.

01:05:31 Maybe that’s better than confidence.

01:05:33 It’s a kind of wishful hope and a little dream.

01:05:36 And you almost don’t want to do anything else.

01:05:38 You kind of keep doing it.

01:05:40 That’s, that seems to be the story and.

01:05:43 But with enough skepticism that you’re looking

01:05:45 for where the problems are and fighting through them.

01:05:48 Cause you know, there’s gotta be a way out of this thing.

01:05:51 And for him, it was probably,

01:05:52 there’s a bunch of little factors that come into play.

01:05:55 It’s funny how these stories just all come together.

01:05:57 Like everything he did in his life came into play,

01:06:00 which is like a love for video games

01:06:02 and also a connection to,

01:06:05 so the nineties had to happen with TD Gammon and so on.

01:06:09 In some ways it’s surprising,

01:06:10 maybe you can provide some intuition to it

01:06:13 that not much more than TD Gammon was done

01:06:16 for quite a long time on the reinforcement learning front.

01:06:19 Is that weird to you?

01:06:21 I mean, like I said, the students who I worked with,

01:06:24 we tried to get, basically apply that architecture

01:06:27 to other problems and we consistently failed.

01:06:30 There were a couple of really nice demonstrations

01:06:33 that ended up being in the literature.

01:06:35 There was a paper about controlling elevators, right?

01:06:38 Where it’s like, okay, can we modify the heuristic

01:06:42 that elevators use for deciding,

01:06:43 like a bank of elevators for deciding which floors

01:06:46 we should be stopping on to maximize throughput essentially.

01:06:50 And you can set that up as a reinforcement learning problem

01:06:52 and you can have a neural net represent the value function

01:06:55 so that it’s taking where all the elevators,

01:06:57 where the button pushes, you know, this high dimensional,

01:07:00 well, at the time high dimensional input,

01:07:03 you know, a couple of dozen dimensions

01:07:05 and turn that into a prediction as to,

01:07:07 oh, is it gonna be better if I stop at this floor or not?

01:07:10 And ultimately it appeared as though

01:07:13 for the standard simulation distribution

01:07:16 for people trying to leave the building

01:07:18 at the end of the day,

01:07:19 that the neural net learned a better strategy

01:07:21 than the standard one that’s implemented

01:07:22 in elevator controllers.
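
As a rough, invented sketch of the kind of setup being described (not the original elevator-control system): the positions of the cars and the lit buttons get flattened into a feature vector, and a small network scores whether stopping at the approaching floor looks better than continuing. The dimensions, architecture, and numbers are all made up, and training (TD updates against a throughput-style reward) is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

N_FLOORS, N_CARS = 10, 4

def encode_state(car_floors, hall_calls, car_buttons):
    """Flatten the bank of elevators into one feature vector: where each
    car is, which hall buttons are lit, which car buttons are pressed."""
    car_onehots = np.zeros((N_CARS, N_FLOORS))
    car_onehots[np.arange(N_CARS), car_floors] = 1.0
    return np.concatenate([car_onehots.ravel(),
                           hall_calls.astype(float),
                           car_buttons.ravel().astype(float)])

# A tiny two-layer network as the value-function approximator.
# The weights are random here; in the real setting they would be
# trained with temporal-difference updates against the reward.
DIM = N_CARS * N_FLOORS + N_FLOORS + N_CARS * N_FLOORS
W1 = rng.normal(scale=0.1, size=(DIM, 32))
W2 = rng.normal(scale=0.1, size=(32, 2))  # values for [continue, stop]

def action_values(x):
    h = np.tanh(x @ W1)
    return h @ W2

# Example decision for one car approaching a floor:
car_floors = np.array([0, 3, 7, 9])
hall_calls = np.zeros(N_FLOORS); hall_calls[5] = 1
car_buttons = np.zeros((N_CARS, N_FLOORS)); car_buttons[1, 8] = 1

q = action_values(encode_state(car_floors, hall_calls, car_buttons))
action = int(np.argmax(q))  # 0 = keep going, 1 = stop at this floor
print("Q(continue), Q(stop):", q, "-> action", action)
```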

01:07:24 So that was nice.

01:07:26 There was some work that Satinder Singh et al

01:07:28 did on handoffs with cell phones,

01:07:34 you know, deciding when should you hand off

01:07:36 from this cell tower to this cell tower.

01:07:38 Oh, okay, communication networks, yeah.

01:07:39 Yeah, and so a couple of things

01:07:42 seemed like they were really promising.

01:07:44 None of them made it into production that I’m aware of.

01:07:46 And neural nets as a whole started

01:07:48 to kind of implode around then.

01:07:50 And so there just wasn’t a lot of air in the room

01:07:53 for people to try to figure out,

01:07:55 okay, how do we get this to work in the RL setting?

01:07:58 And then they found their way back in 10 plus years.

01:08:03 So you said AlphaGo was impressive,

01:08:05 like it’s a big spectacle.

01:08:06 Is there, is that?

01:08:07 Right, so then AlphaZero.

01:08:09 So I think I may have a slightly different opinion

01:08:11 on this than some people.

01:08:12 So I talked to Satinder Singh in particular about this.

01:08:15 So Satinder was, like Rich Sutton,

01:08:18 a student of Andy Barto.

01:08:19 So they came out of the same lab,

01:08:21 very influential machine learning,

01:08:23 reinforcement learning researcher.

01:08:26 Now at DeepMind, as is Rich.

01:08:29 Though different sites, the two of them.

01:08:31 He’s in Alberta.

01:08:33 Rich is in Alberta and Satinder would be in England,

01:08:36 but I think he’s in England from Michigan at the moment.

01:08:39 But the, but he was, yes,

01:08:41 he was much more impressed with AlphaGo Zero,

01:08:46 which didn’t get a kind of bootstrap

01:08:50 in the beginning with human trained games.

01:08:51 It just was purely self play.

01:08:53 Though the first one, AlphaGo,

01:08:55 also involved a tremendous amount of self play, right?

01:08:58 They started off, they kickstarted the action network

01:09:01 that was making decisions,

01:09:02 but then they trained it for a really long time

01:09:04 using more traditional temporal difference methods.

01:09:08 So as a result, I didn’t,

01:09:09 it didn’t seem that different to me.

01:09:11 Like, it seems like, yeah, why wouldn’t that work?

01:09:15 Like once you, once it works, it works.

01:09:17 So, but he found the removal

01:09:21 of that extra information to be breathtaking.

01:09:23 Like that’s a game changer.

01:09:25 To me, the first thing was more of a game changer.

01:09:27 But the open question, I mean,

01:09:29 I guess that’s the assumption is the expert games

01:09:32 might contain within them a humongous amount of information.

01:09:39 But we know that it went beyond that, right?

01:09:41 We know that it somehow got away from that information

01:09:43 because it was learning strategies.

01:09:45 I don’t think AlphaGo is just better

01:09:48 at implementing human strategies.

01:09:50 I think it actually developed its own strategies

01:09:52 that were more effective.

01:09:54 And so from that perspective, okay, well,

01:09:56 so it made at least one quantum leap

01:10:00 in terms of strategic knowledge.

01:10:02 Okay, so now maybe it makes three, like, okay.

01:10:05 But that first one is the doozy, right?

01:10:07 Getting it to work reliably and for the networks

01:10:11 to hold onto the value well enough.

01:10:13 Like that was a big step.

01:10:16 Well, maybe you could speak to this

01:10:17 on the reinforcement learning front.

01:10:19 So starting from scratch and learning to do something,

01:10:25 like the first like random behavior

01:10:29 to like crappy behavior to like somewhat okay behavior.

01:10:34 It’s not obvious to me that that’s not like impossible

01:10:39 to take those steps.

01:10:41 Like if you just think about the intuition,

01:10:43 like how the heck does random behavior

01:10:46 become somewhat basic intelligent behavior?

01:10:51 Not human level, not superhuman level, but just basic.

01:10:55 But you’re saying to you kind of the intuition is like,

01:10:58 if you can go from human to superhuman level intelligence

01:11:01 on this particular task of game playing,

01:11:04 then so you’re good at taking leaps.

01:11:07 So you can take many of them.

01:11:08 That the system, I believe that the system

01:11:10 can take that kind of leap.

01:11:12 Yeah, and also I think that beginner knowledge in Go,

01:11:17 like you can start to get a feel really quickly

01:11:19 for the idea that being in certain parts of the board

01:11:25 seems to be more associated with winning, right?

01:11:28 Cause it’s not stumbling upon the concept of winning.

01:11:32 It’s told that it wins or that it loses.

01:11:34 Well, it’s self play.

01:11:35 So it both wins and loses.

01:11:36 It’s told which side won.

01:11:39 And the information is kind of there

01:11:41 to start percolating around to make a difference as to,

01:11:46 well, these things have a better chance of helping you win.

01:11:48 And these things have a worse chance of helping you win.

01:11:50 And so it can get to basic play, I think pretty quickly.

01:11:54 Then once it has basic play,

01:11:55 well now it’s kind of forced to do some search

01:11:58 to actually experiment with, okay,

01:12:00 well what gets me that next increment of improvement?
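
One way to picture the win/loss signal percolating, using the same invented take-1-or-2 toy game from the earlier sketch: play games completely at random, tag every position with whether the player who faced it eventually won, and the "good parts of the board" emerge from nothing but win statistics.

```python
import random
from collections import defaultdict

def monte_carlo_selfplay(n_stones=21, games=50000):
    """Purely random self-play on the take-1-or-2 toy game. Every position
    a player faced is tagged with whether that player eventually won, so
    'good' positions emerge just from win statistics, with no lookahead."""
    wins = defaultdict(int)
    visits = defaultdict(int)
    for _ in range(games):
        s, player = n_stones, 0
        history = []           # (position, player who faced it)
        while s > 0:
            history.append((s, player))
            s -= random.choice([m for m in (1, 2) if m <= s])
            player = 1 - player
        winner = 1 - player    # the player who just took the last stone
        for pos, p in history:
            visits[pos] += 1
            wins[pos] += int(p == winner)
    return {pos: wins[pos] / visits[pos] for pos in visits}

win_rate = monte_carlo_selfplay()
# Multiples of 3 tend to stand out as the weakest positions for the
# player to move, even though nobody is searching or planning yet.
for s in sorted(win_rate)[:9]:
    print(s, round(win_rate[s], 2))
```

From there, as described, greedier play and search are what buy the next increment of improvement.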

01:12:04 How far do you think, okay, this is where you kind of

01:12:07 bring up the Elon Musk and the Sam Harris, right?

01:12:10 How far is your intuition about these kinds

01:12:13 of self play mechanisms being able to take us?

01:12:16 Cause it feels, one of the ominous but stated calmly things

01:12:23 that when I talked to David Silver, he said,

01:12:25 is that they have not yet discovered a ceiling

01:12:29 for AlphaZero, for example, in the game of Go or chess.

01:12:32 Like it keeps, no matter how much they compute,

01:12:35 they throw at it, it keeps improving.

01:12:37 So it’s possible, it’s very possible that if you throw,

01:12:43 you know, some like 10 X compute that it will improve

01:12:46 by five X or something like that.

01:12:48 And when stated calmly, it’s so like, oh yeah, I guess so.

01:12:54 But like, and then you think like,

01:12:56 well, can we potentially have like continuations

01:13:00 of Moore’s law in totally different way,

01:13:02 like broadly defined Moore’s law,

01:13:04 not the exponential improvement, like,

01:13:08 are we going to have an AlphaZero that swallows the world?

01:13:13 But notice it’s not getting better at other things.

01:13:15 It’s getting better at Go.

01:13:16 And I think that’s a big leap to say,

01:13:19 okay, well, therefore it’s better at other things.

01:13:22 Well, I mean, the question is how much of the game of life

01:13:26 can be turned into.

01:13:27 Right, so that I think is a really good question.

01:13:30 And I think that we don’t, I don’t think we as a,

01:13:32 I don’t know, community really know the answer to this,

01:13:34 but so, okay, so I went to a talk

01:13:39 by some experts on computer chess.

01:13:43 So in particular, computer chess is really interesting

01:13:45 because for, of course, for a thousand years,

01:13:49 humans were the best chess playing things on the planet.

01:13:52 And then computers like edged ahead of the best person.

01:13:56 And they’ve been ahead ever since.

01:13:57 It’s not like people have overtaken computers.

01:14:01 But computers and people together

01:14:05 have overtaken computers.

01:14:07 So at least last time I checked,

01:14:09 I don’t know what the very latest is,

01:14:10 but last time I checked that there were teams of people

01:14:14 who could work with computer programs

01:14:16 to defeat the best computer programs.

01:14:17 In the game of Go?

01:14:18 In the game of chess.

01:14:19 In the game of chess.

01:14:20 Right, and so using the information about how,

01:14:25 these things called Elo scores,

01:14:27 this sort of notion of how strong a player you are.

01:14:30 There’s kind of a range of possible scores.

01:14:32 And you increment in score,

01:14:35 basically if you can beat another player

01:14:37 of that lower score 62% of the time or something like that.

01:14:41 Like there’s some threshold

01:14:42 of if you can somewhat consistently beat someone,

01:14:46 then you are of a higher score than that person.
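
For reference, a quick sketch of the standard Elo expected-score formula this thresholding idea comes from; the 62% figure is the one quoted in the conversation, and the arithmetic below just shows the rating gap it corresponds to under the usual formula (draws ignored):

```python
import math

def elo_expected_score(r_a, r_b):
    """Probability that player A beats player B under the standard
    Elo model: 1 / (1 + 10^((R_B - R_A) / 400))."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def rating_gap_for_win_rate(p):
    """Invert the formula: how many Elo points above an opponent you
    need to be to beat them with probability p."""
    return 400.0 * math.log10(p / (1.0 - p))

print(round(rating_gap_for_win_rate(0.62)))       # roughly 85 points
print(round(elo_expected_score(1585, 1500), 2))   # back to about 0.62
```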

01:14:48 And there’s a question as to how many times

01:14:50 can you do that in chess, right?

01:14:52 And so we know that there’s a range of human ability levels

01:14:55 that cap out with the best playing humans.

01:14:57 And the computers went a step beyond that.

01:15:00 And computers and people together have not gone,

01:15:03 I think a full step beyond that.

01:15:05 It feels like the estimates that they have

01:15:07 are that it’s starting to asymptote.

01:15:09 That we’ve reached kind of the maximum,

01:15:11 the best possible chess playing.

01:15:13 And so that means that there’s kind of

01:15:15 a finite strategic depth, right?

01:15:18 At some point you just can’t get any better at this game.

01:15:21 Yeah, I mean, I don’t, so I’ll actually check that.

01:15:25 I think it’s interesting because if you have somebody

01:15:29 like Magnus Carlsen, who’s using these chess programs

01:15:34 to train his mind, like to learn about chess.

01:15:37 To become a better chess player, yeah.

01:15:38 And so like, that’s a very interesting thing

01:15:41 because we’re not static creatures.

01:15:43 We’re learning together.

01:15:45 I mean, just like we’re talking about social networks,

01:15:47 those algorithms are teaching us

01:15:49 just like we’re teaching those algorithms.

01:15:51 So that’s a fascinating thing.

01:15:52 But I think the best chess playing programs

01:15:57 are now better than the pairs.

01:15:58 Like they have competition between pairs,

01:16:00 but it’s still, even if they weren’t,

01:16:03 it’s an interesting question, where’s the ceiling?

01:16:06 So the David, the ominous David Silver kind of statement

01:16:09 is like, we have not found the ceiling.

01:16:12 Right, so the question is, okay,

01:16:14 so I don’t know his analysis on that.

01:16:16 My, from talking to Go experts,

01:16:20 the depth, the strategic depth of Go

01:16:22 seems to be substantially greater than that of chess.

01:16:25 That there’s more kind of steps of improvement

01:16:27 that you can make, getting better and better

01:16:29 and better and better.

01:16:30 But there’s no reason to think that it’s infinite.

01:16:32 Infinite, yeah.

01:16:33 And so it could be that what David is seeing

01:16:37 is a kind of asymptoting that you can keep getting better,

01:16:39 but with diminishing returns.

01:16:41 And at some point you hit optimal play.

01:16:43 Like in theory, all these finite games, they’re finite.

01:16:47 They have an optimal strategy.

01:16:49 There’s a strategy that is the minimax optimal strategy.

01:16:51 And so at that point, you can’t get any better.

01:16:54 You can’t beat that strategy.

01:16:56 Now that strategy may be,

01:16:58 from an information processing perspective, intractable.

01:17:02 Right, you need, all the situations

01:17:06 are sufficiently different that you can’t compress it at all.

01:17:08 It’s this giant mess of hardcoded rules.

01:17:12 And we can never achieve that.

01:17:14 But that still puts a cap on how many levels of improvement

01:17:17 that we can actually make.
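
A bare-bones illustration of the minimax point: for a small finite game you can compute the optimal value exactly by exhaustive recursion, and no strategy can do better against it; the catch raised here is that for a game like Go the table of situations is far too large to enumerate or compress. The toy game below is the same invented take-1-or-2 game from the earlier sketches.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def minimax_value(stones):
    """Exact game-theoretic value of the take-1-or-2 toy game for the
    player to move: +1 if they can force a win, -1 if they cannot."""
    if stones == 0:
        return -1  # the previous player took the last stone and won
    outcomes = []
    for move in (1, 2):
        if move <= stones:
            # my result is the negation of the opponent's value afterwards
            outcomes.append(-minimax_value(stones - move))
    return max(outcomes)

def optimal_move(stones):
    """Pick any move achieving the minimax value (may not be unique)."""
    return max((m for m in (1, 2) if m <= stones),
               key=lambda m: -minimax_value(stones - m))

print([minimax_value(s) for s in range(1, 10)])
# Multiples of 3 are exactly the forced losses for the player to move,
# matching what the learned values in the earlier sketches approximate.
```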

01:17:19 But the thing about self play, if you put it,

01:17:23 although I don’t like doing that,

01:17:24 in the broader category of self supervised learning,

01:17:28 is that it doesn’t require too much or any human input.

01:17:31 Human labeling, yeah.

01:17:32 Yeah, human label or just human effort.

01:17:34 The human involvement passed a certain point.

01:17:37 And the same thing you could argue is true

01:17:41 for the recent breakthroughs in natural language processing

01:17:44 with language models.

01:17:45 Oh, this is how you get to GPT3.

01:17:47 Yeah, see how I did that?

01:17:49 That was a good transition.

01:17:51 Yeah, I practiced that for days leading up to this now.

01:17:56 But like that’s one of the questions is,

01:17:59 can we find ways to formulate problems in this world

01:18:03 that are important to us humans,

01:18:05 like more important than the game of chess,

01:18:08 that to which self supervised kinds of approaches

01:18:12 could be applied?

01:18:13 Whether it’s self play, for example,

01:18:15 for like maybe you could think of like autonomous vehicles

01:18:19 in simulation, that kind of stuff,

01:18:22 or just robotics applications and simulation,

01:18:25 or in the self supervised learning,

01:18:29 where unannotated data,

01:18:33 or data that’s generated by humans naturally

01:18:37 without extra costs, like Wikipedia,

01:18:41 or like all of the internet can be used

01:18:44 to learn something about,

01:18:46 to create intelligent systems that do something

01:18:49 really powerful, that pass the Turing test,

01:18:52 or that do some kind of superhuman level performance.

01:18:56 So what’s your intuition,

01:18:58 like trying to stitch all of it together

01:19:01 about our discussion of AGI,

01:19:05 the limits of self play,

01:19:07 and your thoughts about maybe the limits of neural networks

01:19:10 in the context of language models.

01:19:13 Is there some intuition in there

01:19:14 that might be useful to think about?

01:19:17 Yeah, yeah, yeah.

01:19:17 So first of all, the whole Transformer network

01:19:22 family of things is really cool.

01:19:26 It’s really, really cool.

01:19:28 I mean, if you’ve ever,

01:19:30 back in the day you played with,

01:19:31 I don’t know, Markov models for generating text,

01:19:34 and you’ve seen the kind of texts that they spit out,

01:19:35 and you compare it to what’s happening now,

01:19:37 it’s amazing, it’s so amazing.
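
For contrast, this is roughly the kind of Markov model for generating text being alluded to, a word-level bigram chain; the corpus below is a made-up placeholder.

```python
import random
from collections import defaultdict

def build_bigram_model(text):
    """Word-level Markov chain: for each word, store which words were
    observed to follow it (duplicates encode the frequencies)."""
    words = text.split()
    follows = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        follows[prev].append(nxt)
    return follows

def generate(model, start, length=20):
    """Sample a continuation by repeatedly drawing a random observed
    successor of the current word."""
    out = [start]
    for _ in range(length):
        choices = model.get(out[-1])
        if not choices:
            break
        out.append(random.choice(choices))
    return " ".join(out)

# Placeholder corpus; in practice this would be a large body of text.
corpus = ("the robot learns to play the game and the robot plays "
          "the game to learn and the game teaches the robot to play")
model = build_bigram_model(corpus)
print(generate(model, "the"))
```

Even trained on a huge corpus, a chain like this only captures very local statistics of language, which is the gap being pointed at between those older models and transformer-based ones.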

01:19:41 Now, it doesn’t take very long interacting

01:19:43 with one of these systems before you find the holes, right?

01:19:47 It’s not smart in any kind of general way.

01:19:53 It’s really good at a bunch of things.

01:19:55 And it does seem to understand

01:19:56 a lot of the statistics of language extremely well.

01:19:59 And that turns out to be very powerful.

01:20:01 You can answer many questions with that.

01:20:04 But it doesn’t make it a good conversationalist, right?

01:20:06 And it doesn’t make it a good storyteller.

01:20:08 It just makes it good at imitating

01:20:10 of things that it’s seen in the past.

01:20:12 The exact same thing could be said

01:20:14 by people who are voting for Donald Trump

01:20:16 about Joe Biden supporters,

01:20:18 and people voting for Joe Biden

01:20:19 about Donald Trump supporters is, you know.

01:20:22 That they’re not intelligent, they’re just following the.

01:20:25 Yeah, they’re following things they’ve seen in the past.

01:20:27 And it doesn’t take long to find the flaws

01:20:31 in their natural language generation abilities.

01:20:36 Yes, yes.

01:20:37 So we’re being very.

01:20:38 That’s interesting.

01:20:39 Critical of AI systems.

01:20:41 Right, so I’ve had a similar thought,

01:20:43 which was that the stories that GPT3 spits out

01:20:48 are amazing and very humanlike.

01:20:52 And it doesn’t mean that computers are smarter

01:20:55 than we realize necessarily.

01:20:57 It partly means that people are dumber than we realize.

01:21:00 Or that much of what we do day to day is not that deep.

01:21:04 Like we’re just kind of going with the flow.

01:21:07 We’re saying whatever feels like the natural thing

01:21:09 to say next.

01:21:10 Not a lot of it is creative or meaningful or intentional.

01:21:17 But enough is that we actually get by, right?

01:21:20 We do come up with new ideas sometimes,

01:21:22 and we do manage to talk each other into things sometimes.

01:21:24 And we do vote for reasonable people sometimes.

01:21:29 But it’s really hard to see in the statistics

01:21:32 because so much of what we’re saying is kind of rote.

01:21:35 And so our metrics that we use to measure

01:21:38 how these systems are doing don’t reveal that

01:21:41 because it’s in the interstices, which are very hard to detect.

01:21:47 But is your, do you have an intuition

01:21:49 that with these language models, if they grow in size,

01:21:53 it’s already surprising when you go from GPT2 to GPT3

01:21:57 that there is a noticeable improvement.

01:21:59 So the question now goes back to the ominous David Silver

01:22:02 and the ceiling.

01:22:03 Right, so maybe there’s just no ceiling.

01:22:04 We just need more compute.

01:22:06 Now, I mean, okay, so now I’m speculating.

01:22:10 Yes.

01:22:11 As opposed to before when I was completely on firm ground.

01:22:13 All right, I don’t believe that you can get something

01:22:17 that really can do language and use language as a thing

01:22:21 that doesn’t interact with people.

01:22:24 Like I think that it’s not enough

01:22:25 to just take everything that we’ve said written down

01:22:28 and just say, that’s enough.

01:22:29 You can just learn from that and you can be intelligent.

01:22:32 I think you really need to be pushed back at.

01:22:35 I think that conversations,

01:22:36 even people who are pretty smart,

01:22:38 maybe the smartest thing that we know,

01:22:40 maybe not the smartest thing we can imagine,

01:22:43 but we get so much benefit

01:22:44 out of talking to each other and interacting.

01:22:48 That’s presumably why you have conversations live with guests

01:22:51 is that there’s something in that interaction

01:22:53 that would not be exposed by,

01:22:55 oh, I’ll just write you a story

01:22:57 and then you can read it later.

01:22:58 And I think because these systems

01:23:00 are just learning from our stories,

01:23:01 they’re not learning from being pushed back at by us,

01:23:05 that they’re fundamentally limited

01:23:06 in what they can actually become on this route.

01:23:08 They have to get shot down.

01:23:12 Like we have to have an argument,

01:23:14 they have to have an argument with us

01:23:15 and lose a couple of times

01:23:17 before they start to realize, oh, okay, wait,

01:23:20 there’s some nuance here that actually matters.

01:23:23 Yeah, that’s actually subtle sounding,

01:23:25 but quite profound that the interaction with humans

01:23:30 is essential and the limitation within that

01:23:34 is profound as well because the timescale,

01:23:37 like the bandwidth at which you can really interact

01:23:40 with humans is very low.

01:23:43 So it’s costly.

01:23:44 So you can’t, one of the underlying things about self play is that

01:23:47 it has to do a very large number of interactions.

01:23:53 And so you can’t really deploy reinforcement learning systems

01:23:56 into the real world to interact.

01:23:58 Like you couldn’t deploy a language model

01:24:01 into the real world to interact with humans

01:24:04 because it was just not getting enough data

01:24:06 relative to the cost it takes to interact.

01:24:09 Like the time of humans is expensive,

01:24:12 which is really interesting.

01:24:13 That takes us back to reinforcement learning

01:24:16 and trying to figure out if there’s ways

01:24:18 to make algorithms that are more efficient at learning,

01:24:22 keep the spirit in reinforcement learning

01:24:24 and become more efficient.

01:24:26 In some sense, that seems to be the goal.

01:24:28 I’d love to hear what your thoughts are.

01:24:31 I don’t know if you got a chance to see

01:24:33 the blog post called Bitter Lesson.

01:24:35 Oh yes.

01:24:37 By Rich Sutton that makes an argument,

01:24:39 hopefully I can summarize it.

01:24:41 Perhaps you can.

01:24:43 Yeah, but do you want?

01:24:44 Okay.

01:24:45 So I mean, I could try and you can correct me,

01:24:47 which is he makes an argument that it seems

01:24:50 if we look at the long arc of the history

01:24:52 of the artificial intelligence field,

01:24:55 which he calls 70 years, that the algorithms

01:24:58 from which we’ve seen the biggest improvements in practice

01:25:02 are the very simple, like dumb algorithms

01:25:05 that are able to leverage computation.

01:25:08 And you just wait for the computation to improve.

01:25:11 Like all of the academics and so on have fun

01:25:13 by finding little tricks

01:25:15 and congratulating themselves on those tricks.

01:25:17 And sometimes those tricks can be like big,

01:25:20 that feel in the moment like big spikes and breakthroughs,

01:25:22 but in reality over the decades,

01:25:25 it’s still the same dumb algorithm

01:25:27 that just waits for the compute to get faster and faster.

01:25:31 Do you find that to be an interesting argument

01:25:36 against the entirety of the field of machine learning

01:25:39 as an academic discipline?

01:25:41 That we’re really just a subfield of computer architecture.

01:25:44 We’re just kind of waiting around

01:25:45 for them to do their next thing.

01:25:46 And we really don’t want to do the hardware work ourselves.

01:25:48 So like.

01:25:48 That’s right.

01:25:49 I really don’t want to think about it.

01:25:50 We’re procrastinating.

01:25:51 Yes, that’s right, just waiting for them to do their jobs

01:25:53 so that we can pretend to have done ours.

01:25:55 So yeah, I mean, the argument reminds me a lot of,

01:26:00 I think it was a Fred Jelinek quote,

01:26:02 early computational linguist who said,

01:26:04 we’re building these computational linguistic systems

01:26:07 and every time we fire a linguist performance goes up

01:26:11 by 10%, something like that.

01:26:13 And so the idea of us building the knowledge in,

01:26:16 in that case was much less,

01:26:19 he was finding it to be much less successful

01:26:20 than get rid of the people who know about language as a,

01:26:25 from a kind of scholastic academic kind of perspective

01:26:29 and replace them with more compute.

01:26:32 And so I think this is kind of a modern version

01:26:34 of that story, which is, okay,

01:26:35 we want to do better on machine vision.

01:26:38 You could build in all these,

01:26:41 motivated, part based models

01:26:45 that just feel like obviously the right thing

01:26:47 that you have to have,

01:26:48 or we can throw a lot of data at it

01:26:49 and guess what we’re doing better with a lot of data.

01:26:52 So I hadn’t thought about it until this moment in this way,

01:26:57 but what I believe, well, I’ve thought about what I believe.

01:27:00 What I believe is that, you know, compositionality

01:27:05 and what’s the right way to say it,

01:27:08 the complexity grows rapidly

01:27:12 as you consider more and more possibilities,

01:27:14 like explosively.

01:27:16 And so far Moore’s law has also been growing explosively

01:27:20 exponentially.

01:27:21 And so it really does seem like, well,

01:27:23 we don’t have to think really hard about the algorithm

01:27:27 design or the way that we build the systems,

01:27:29 because the best benefit we could get is exponential.

01:27:32 And the best benefit that we can get from waiting

01:27:34 is exponential.

01:27:35 So we can just wait.

01:27:38 It’s got, that’s gotta end, right?

01:27:39 And there are hints now

01:27:41 that Moore’s law is starting to feel some friction,

01:27:44 starting to, the world is pushing back a little bit.

01:27:48 One thing that I don’t know, do lots of people know this?

01:27:50 I didn’t know this, I was trying to write an essay

01:27:54 and yeah, Moore’s law has been amazing

01:27:56 and it’s enabled all sorts of things,

01:27:58 but there’s also a kind of counter Moore’s law,

01:28:01 which is that the development cost

01:28:03 for each successive generation of chips also is doubling.

01:28:07 So it’s costing twice as much money.

01:28:09 So the amount of improvement per development dollar or whatever

01:28:12 is actually sort of constant.

01:28:14 And at some point we run out of money.

01:28:17 So, or we have to come up with an entirely different way

01:28:19 of doing the development process.

01:28:22 So like, I guess I always a bit skeptical of the look,

01:28:25 it’s an exponential curve, therefore it has no end.

01:28:28 Soon the number of people going to NeurIPS

01:28:30 will be greater than the population of the earth.

01:28:32 That means we’re gonna discover life on other planets.

01:28:35 No, it doesn’t.

01:28:36 It means that we’re in a sigmoid curve on the front half,

01:28:40 which looks a lot like an exponential.

01:28:42 The second half is gonna look a lot like diminishing returns.
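
The sigmoid point can be made precise with the logistic curve, as a quick sketch of the math being gestured at (here L is the eventual cap, k the growth rate, and t_0 the midpoint):

```latex
f(t) = \frac{L}{1 + e^{-k(t - t_0)}},\qquad
\underbrace{f(t) \approx L\, e^{\,k(t - t_0)}}_{t \ll t_0:\ \text{looks exponential}},\qquad
\underbrace{f(t) \approx L\bigl(1 - e^{-k(t - t_0)}\bigr)}_{t \gg t_0:\ \text{diminishing returns toward } L}
```

On the front half the curve is numerically almost indistinguishable from an exponential, which is exactly why extrapolating from that half alone is risky.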

01:28:46 Yeah, I mean, but the interesting thing about Moore’s law,

01:28:48 if you actually like look at the technologies involved,

01:28:52 it’s hundreds, if not thousands of S curves

01:28:55 stacked on top of each other.

01:28:56 It’s not actually an exponential curve,

01:28:58 it’s constant breakthroughs.

01:29:01 And then what becomes useful to think about,

01:29:04 which is exactly what you’re saying,

01:29:05 the cost of development, like the size of teams,

01:29:08 the amount of resources that are invested

01:29:10 in continuing to find new S curves, new breakthroughs.

01:29:14 And yeah, it’s an interesting idea.

01:29:19 If we live in the moment, if we sit here today,

01:29:22 it seems to be the reasonable thing

01:29:25 to say that exponentials end.

01:29:29 And yet in the software realm,

01:29:31 they just keep appearing to be happening.

01:29:34 And it’s so, I mean, it’s so hard to disagree

01:29:39 with Elon Musk on this.

01:29:41 Because it like, I’ve, you know,

01:29:45 I used to be one of those folks,

01:29:47 I’m still one of those folks that studied

01:29:49 autonomous vehicles, that’s what I worked on.

01:29:52 And it’s like, you look at what Elon Musk is saying

01:29:56 about autonomous vehicles, well, obviously,

01:29:58 in a couple of years, or in a year, or next month,

01:30:01 we’ll have fully autonomous vehicles.

01:30:03 Like there’s no reason why we can’t.

01:30:04 Driving is pretty simple, like it’s just a learning problem

01:30:07 and you just need to convert all the driving

01:30:11 that we’re doing into data and just have a neural network

01:30:13 that trains on that data.

01:30:14 And like, we use only our eyes, so you can use cameras

01:30:18 and you can train on it.

01:30:20 And it’s like, yeah, that should work.

01:30:26 And then you put that hat on, like the philosophical hat,

01:30:29 but then you put the pragmatic hat on and it’s like,

01:30:31 this is what the flaws of computer vision are.

01:30:33 Like, this is what it means to train at scale.

01:30:35 And then you put the human factors, the psychology hat on,

01:30:40 which is like, driving actually involves a lot of

01:30:43 the cognitive science or cognition,

01:30:44 whatever the heck you call it, it’s really hard,

01:30:48 it’s much harder to drive than we realize,

01:30:50 there’s a much larger number of edge cases.

01:30:53 So building up an intuition around this is,

01:30:57 around exponentials is really difficult.

01:30:59 And on top of that, the pandemic is making us think

01:31:03 about exponentials, making us realize that like,

01:31:06 we don’t understand anything about it,

01:31:08 we’re not able to intuit exponentials,

01:31:11 we’re either ultra terrified, some part of the population

01:31:15 and some part is like the opposite of whatever

01:31:20 the word is, carefree, and we’re not managing it very well.

01:31:24 Blasé, well, wow, is that French?

01:31:28 I assume so, it’s got an accent.

01:31:29 So it’s fascinating to think what the limits

01:31:35 of this exponential growth of technology,

01:31:41 not just Moore’s law, it’s technology,

01:31:44 how that rubs up against the bitter lesson

01:31:49 and GPT three and self play mechanisms.

01:31:53 Like it’s not obvious, I used to be much more skeptical

01:31:56 about neural networks.

01:31:58 Now I at least give a sliver of possibility

01:32:00 that we’ll be very much surprised

01:32:04 and also caught in a way that like,

01:32:10 we are not prepared for.

01:32:14 Like in applications of social networks, for example,

01:32:19 cause it feels like really good transformer models

01:32:23 that are able to do some kind of like very good

01:32:28 natural language generation of the same kind of models

01:32:31 that can be used to learn human behavior

01:32:33 and then manipulate that human behavior

01:32:35 to gain advertisers’ dollars and all those kinds of things

01:32:38 through the capitalist system.

01:32:41 And they arguably already are manipulating human behavior.

01:32:46 But not for self preservation, which I think is a big,

01:32:51 that would be a big step.

01:32:52 Like if they were trying to manipulate us

01:32:54 to convince us not to shut them off,

01:32:57 I would be very freaked out.

01:32:58 But I don’t see a path to that from where we are now.

01:33:01 They don’t have any of those abilities.

01:33:05 That’s not what they’re trying to do.

01:33:07 They’re trying to keep people on the site.

01:33:10 But see the thing is, this is the thing about life on earth

01:33:13 is they might be borrowing our consciousness

01:33:16 and sentience like, so like in a sense they do

01:33:20 because the creators of the algorithms have,

01:33:23 like they’re not, if you look at our body,

01:33:26 we’re not a single organism.

01:33:28 We’re a huge number of organisms

01:33:30 with like tiny little motivations

01:33:31 that are built on top of each other.

01:33:33 In the same sense, the AI algorithms that are,

01:33:36 they’re not like.

01:33:37 It’s a system that includes companies and corporations,

01:33:40 because corporations are funny organisms

01:33:42 in and of themselves that really do seem

01:33:44 to have self preservation built in.

01:33:45 And I think that’s at the design level.

01:33:48 I think they’re designed to have self preservation

01:33:50 to be a focus.

01:33:52 So you’re right.

01:33:53 In that broader system that we’re also a part of

01:33:58 and can have some influence on,

01:34:02 it is much more complicated, much more powerful.

01:34:04 Yeah, I agree with that.

01:34:06 So people really love it when I ask,

01:34:09 what three books, technical, philosophical, fiction

01:34:13 had a big impact on your life?

01:34:14 Maybe you can recommend.

01:34:16 We went with movies, we went with Billy Joel

01:34:21 and I forgot what music you recommended, but.

01:34:24 I didn’t, I just said I have no taste in music.

01:34:26 I just like pop music.

01:34:27 That was actually really skillful

01:34:30 the way you avoided that question.

01:34:30 Thank you, thanks.

01:34:31 I’m gonna try to do the same with the books.

01:34:33 So do you have a skillful way to avoid answering

01:34:37 the question about three books you would recommend?

01:34:39 I’d like to tell you a story.

01:34:42 So my first job out of college was at Bellcore.

01:34:45 I mentioned that before, where I worked with Dave Ackley.

01:34:48 The head of the group was a guy named Tom Landauer.

01:34:50 And I don’t know how well known he is now,

01:34:53 but arguably he’s the inventor

01:34:56 and the first proselytizer of word embeddings.

01:34:59 So they developed a system shortly before I got to the group

01:35:04 that was called latent semantic analysis

01:35:07 that would take words of English

01:35:09 and embed them in multi hundred dimensional space

01:35:12 and then use that as a way of assessing

01:35:15 similarity and basically doing reinforcement learning,

01:35:17 I’m sorry, not reinforcement, information retrieval,

01:35:20 sort of pre Google information retrieval.

01:35:23 And he was trained as an anthropologist,

01:35:28 but then became a cognitive scientist.

01:35:29 So I was in the cognitive science research group.

01:35:32 Like I said, I’m a cognitive science groupie.

01:35:34 At the time I thought I’d become a cognitive scientist,

01:35:37 but then I realized in that group,

01:35:38 no, I’m a computer scientist,

01:35:40 but I’m a computer scientist who really loves

01:35:41 to hang out with cognitive scientists.

01:35:43 And he said, he studied language acquisition in particular.

01:35:48 He said, you know, humans have about this number of words

01:35:51 of vocabulary and most of that is learned from reading.

01:35:55 And I said, that can’t be true

01:35:57 because I have a really big vocabulary and I don’t read.

01:36:00 He’s like, you must.

01:36:01 I’m like, I don’t think I do.

01:36:03 I mean like stop signs, I definitely read stop signs,

01:36:05 but like reading books is not a thing that I do a lot of.

01:36:08 Do you really though?

01:36:09 It might be just visual, maybe the red color.

01:36:12 Do I read stop signs?

01:36:14 No, it’s just pattern recognition at this point.

01:36:15 I don’t sound it out.

01:36:19 So now I do.

01:36:21 I wonder what that, oh yeah, stop the guns.

01:36:25 So.

01:36:26 That’s fascinating.

01:36:27 So you don’t.

01:36:28 So I don’t read very, I mean, obviously I read

01:36:29 and I’ve read plenty of books,

01:36:31 but like some people like Charles,

01:36:34 my friend Charles and others,

01:36:35 like a lot of people in my field, a lot of academics,

01:36:38 like reading was really a central topic to them

01:36:42 in development and I’m not that guy.

01:36:45 In fact, I used to joke that when I got into college,

01:36:49 that it was on kind of a help out the illiterate

01:36:53 kind of program because I got to,

01:36:55 like in my house, I wasn’t a particularly bad

01:36:57 or good reader, but when I got to college,

01:36:58 I was surrounded by these people that were just voracious

01:37:01 in their reading appetite.

01:37:03 And they would like, have you read this?

01:37:04 Have you read this?

01:37:05 Have you read this?

01:37:06 And I’m like, no, I’m clearly not qualified

01:37:09 to be at this school.

01:37:10 Like there’s no way I should be here.

01:37:11 Now I’ve discovered books on tape, like audio books.

01:37:14 And so I’m much better.

01:37:17 I’m more caught up.

01:37:18 I read a lot of books.

01:37:20 The small tangent on that,

01:37:22 it is a fascinating open question to me

01:37:24 on the topic of driving.

01:37:27 Whether, you know, supervised learning people,

01:37:30 machine learning people think you have to like drive

01:37:33 to learn how to drive.

01:37:35 To me, it’s very possible that just by us humans,

01:37:40 by first of all, walking,

01:37:41 but also by watching other people drive,

01:37:44 not even being inside cars as a passenger,

01:37:46 but let’s say being inside the car as a passenger,

01:37:49 but even just like being a pedestrian and crossing the road,

01:37:53 you learn so much about driving from that.

01:37:56 It’s very possible that you can,

01:37:58 without ever being inside of a car,

01:38:01 be okay at driving once you get in it.

01:38:04 Or like watching a movie, for example.

01:38:06 I don’t know, something like that.

01:38:08 Have you taught anyone to drive?

01:38:11 No, except myself.

01:38:13 I have two children.

01:38:15 And I learned a lot about car driving

01:38:18 because my wife doesn’t want to be the one in the car

01:38:21 while they’re learning.

01:38:21 So that’s my job.

01:38:22 So I sit in the passenger seat and it’s really scary.

01:38:27 You know, I have wishes to live

01:38:30 and they’re figuring things out.

01:38:32 Now, they start off very much better

01:38:37 than I imagine like a neural network would, right?

01:38:39 They get that they’re seeing the world.

01:38:41 They get that there’s a road that they’re trying to be on.

01:38:44 They get that there’s a relationship

01:38:45 between the angle of the steering,

01:38:47 but it takes a while to not be very jerky.

01:38:51 And so that happens pretty quickly.

01:38:52 Like the ability to stay in lane at speed,

01:38:55 that happens relatively fast.

01:38:56 It’s not zero shot learning, but it’s pretty fast.

01:39:00 The thing that’s remarkably hard,

01:39:01 and this is I think partly why self driving cars

01:39:03 are really hard,

01:39:04 is the degree to which driving

01:39:06 is a social interaction activity.

01:39:09 And that blew me away.

01:39:10 I was completely unaware of it

01:39:11 until I watched my son learning to drive.

01:39:14 And I was realizing that he was sending signals

01:39:17 to all the cars around him.

01:39:19 And in his case,

01:39:20 he’s always had social communication challenges.

01:39:25 He was sending very mixed confusing signals

01:39:28 to the other cars.

01:39:29 And that was causing the other cars

01:39:30 to drive weirdly and erratically.

01:39:32 And there was no question in my mind

01:39:34 that he would have an accident

01:39:36 because they didn’t know how to read him.

01:39:39 There’s things you do with the speed that you drive,

01:39:42 the positioning of your car,

01:39:43 that you’re constantly like in the head

01:39:46 of the other drivers.

01:39:47 And seeing him not knowing how to do that

01:39:50 and having to be taught explicitly,

01:39:52 okay, you have to be thinking

01:39:53 about what the other driver is thinking,

01:39:55 was a revelation to me.

01:39:57 I was stunned.

01:39:58 So creating kind of theories of mind of the other.

01:40:02 Theories of mind of the other cars.

01:40:04 Yeah, yeah.

01:40:05 Which I just hadn’t heard discussed

01:40:07 in the self driving car talks that I’ve been to.

01:40:09 Since then, there’s some people who do consider

01:40:13 those kinds of issues,

01:40:14 but it’s way more subtle than I think

01:40:16 there’s a little bit of work involved with that

01:40:19 when you realize like when you especially focus

01:40:21 not on other cars, but on pedestrians, for example,

01:40:24 it’s literally staring you in the face.

01:40:27 So then when you’re just like,

01:40:28 how do I interact with pedestrians?

01:40:32 Pedestrians, you’re practically talking

01:40:33 to an octopus at that point.

01:40:34 They’ve got all these weird degrees of freedom.

01:40:36 You don’t know what they’re gonna do.

01:40:37 They can turn around any second.

01:40:38 But the point is, we humans know what they’re gonna do.

01:40:42 Like we have a good theory of mind.

01:40:43 We have a good mental model of what they’re doing.

01:40:46 And we have a good model of the model they have of us,

01:40:50 and the model of the model of the model.

01:40:52 Like we’re able to kind of reason about this kind of,

01:40:55 the social like game of it all.
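
The recursive “model of a model” idea here is close to what game theorists call level-k reasoning. Purely as a minimal sketch of that flavor of nested theory of mind, with invented actions and payoff numbers and nothing taken from an actual self driving stack, it might look like this:

```python
# Level-k reasoning sketch: a level-k driver best-responds to what a
# level-(k-1) driver would do. All payoffs here are made up for illustration.

ACTIONS = ["yield", "go"]

def payoff(my_action, other_action):
    # Toy payoffs at an intersection: a collision is worst,
    # proceeding while the other yields is best, mutual hesitation is mildly bad.
    if my_action == "go" and other_action == "go":
        return -10   # both go: collision risk
    if my_action == "go":
        return 2     # clean pass
    if other_action == "go":
        return 1     # safe yield
    return -1        # both hesitate

def best_response(predicted_other):
    # Pick the action that maximizes payoff against the predicted other action.
    return max(ACTIONS, key=lambda a: payoff(a, predicted_other))

def level_k_action(k, base_action="go"):
    # Level 0 acts without modeling the other driver at all;
    # level k models a level-(k-1) other driver and best-responds to it.
    if k == 0:
        return base_action
    return best_response(level_k_action(k - 1, base_action))

if __name__ == "__main__":
    for k in range(4):
        print(f"level-{k} driver chooses: {level_k_action(k)}")
```

Running it shows the chosen action flipping as each level best-responds to the one below, which is roughly the “I know that you know that I know” kind of planning that comes up a bit later in the conversation.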

01:40:59 The hope is that it’s quite simple actually,

01:41:03 that it could be learned.

01:41:04 That’s why I just talked to Waymo.

01:41:06 I don’t know if you know that company.

01:41:07 It’s Google’s self driving car company.

01:41:09 I talked to their CTO about this podcast,

01:41:12 and I rode in their car,

01:41:15 and it’s quite aggressive and it’s quite fast

01:41:17 and it’s good and it feels great.

01:41:20 It also, just like Tesla,

01:41:21 Waymo made me change my mind about like,

01:41:24 maybe driving is easier than I thought.

01:41:27 Maybe I’m just being speciesist, human centric, maybe.

01:41:33 It’s a speciesist argument.

01:41:35 Yeah, so I don’t know.

01:41:36 But it’s fascinating to think about like the same

01:41:41 as with reading, which I think you just said.

01:41:43 You avoided the question,

01:41:45 though I still hope you answered it somewhat.

01:41:47 You avoided it brilliantly.

01:41:48 It is. There’s blind spots

01:41:52 that artificial intelligence researchers have

01:41:55 about what it actually takes to learn to solve a problem.

01:41:58 That’s fascinating.

01:41:59 Have you had Anca Dragan on?

01:42:00 Yeah.

01:42:01 Okay.

01:42:02 She’s one of my favorites.

01:42:03 So much energy.

01:42:04 She’s right.

01:42:05 Oh, yeah.

01:42:05 She’s amazing.

01:42:06 Fantastic.

01:42:07 And in particular, she thinks a lot about this kind of,

01:42:10 I know that you know that I know kind of planning.

01:42:12 And the last time I spoke with her,

01:42:14 she was very articulate about the ways

01:42:17 in which self driving cars are not solved.

01:42:20 Like what’s still really, really hard.

01:42:22 But even her intuition is limited.

01:42:23 Like we’re all like new to this.

01:42:26 So in some sense, the Elon Musk approach

01:42:27 of being ultra confident and just like plowing.

01:42:30 Put it out there.

01:42:31 Putting it out there.

01:42:32 Like some people say it’s reckless and dangerous and so on.

01:42:35 But like, partly it’s like, it seems to be one

01:42:39 of the only ways to make progress

01:42:40 in artificial intelligence.

01:42:41 So it’s, you know, these are difficult things.

01:42:45 You know, democracy is messy.

01:42:49 Implementation of artificial intelligence systems

01:42:51 in the real world is messy.

01:42:53 So many years ago, before self driving cars

01:42:56 were an actual thing you could have a discussion about,

01:42:58 somebody asked me, like, what if we could use

01:43:01 that robotic technology and use it to drive cars around?

01:43:04 Like, isn’t that, aren’t people gonna be killed?

01:43:06 And then it’s not, you know, blah, blah, blah.

01:43:08 I’m like, that’s not what’s gonna happen.

01:43:09 I said with confidence, incorrectly, obviously.

01:43:13 What I think is gonna happen is we’re gonna have a lot more,

01:43:15 like a very gradual kind of rollout

01:43:17 where people have these cars in like closed communities,

01:43:22 right, where it’s somewhat realistic,

01:43:24 but it’s still in a box, right?

01:43:26 So that we can really get a sense of,

01:43:28 what are the weird things that can happen?

01:43:30 How do we have to change the way we behave

01:43:34 around these vehicles?

01:43:35 Like, it obviously requires a kind of coevolution,

01:43:39 that you can’t just plop them in and see what happens.

01:43:42 But of course, we’re basically plopping them in

01:43:44 and seeing what happens.

01:43:45 So I was wrong, but I do think that would have been

01:43:46 a better plan.

01:43:47 But your intuition, that’s funny,

01:43:50 just zooming out and looking at the forces of capitalism.

01:43:54 It seems that capitalism

01:43:57 rewards and punishes risk takers, like,

01:44:00 try it out.

01:44:03 The academic approach is, let’s try a small thing

01:44:11 and try to slowly understand the fundamentals

01:44:13 of the problem.

01:44:14 Let’s start with one, then do two, and see.

01:44:18 And then do three. You know, the capitalist,

01:44:21 startup, entrepreneurial dream is, let’s build a thousand

01:44:26 and let’s…

01:44:27 Right, and 500 of them fail, but whatever,

01:44:28 the other 500, we learned from them.

01:44:30 But if you’re good enough, I mean, one thing is,

01:44:33 your intuition would say that’s gonna be

01:44:35 hugely destructive to everything.

01:44:37 But actually, with the forces of capitalism,

01:44:42 it’s easy to be critical,

01:44:44 but if you actually look at the data, at the way

01:44:47 our world has progressed in terms of quality of life,

01:44:50 it seems like the competent, good people rise to the top.

01:44:54 This is coming from me from the Soviet Union and so on.

01:44:58 It’s like, it’s interesting that somebody like Elon Musk

01:45:03 is the way you push progress in artificial intelligence.

01:45:08 Like it’s forcing Waymo to step their stuff up

01:45:11 and Waymo is forcing Elon Musk to step up.

01:45:17 It’s fascinating, because I have this tension in my heart

01:45:21 of just being upset by the lack of progress

01:45:26 in autonomous vehicles within academia.

01:45:29 There was huge progress in the early days

01:45:33 of the DARPA challenges,

01:45:35 and then it just kind of stopped. Like at MIT,

01:45:39 but it’s true everywhere else, with the exception

01:45:43 of a few sponsors here and there,

01:45:46 it’s not seen as a sexy problem.

01:45:50 Like the moment artificial intelligence starts approaching

01:45:53 the problems of the real world,

01:45:56 like academics kind of like, all right, let the…

01:46:00 They get really hard in a different way.

01:46:01 In a different way, that’s right.

01:46:03 I think, yeah, right, some of us are not excited

01:46:05 about that other way.

01:46:07 But I still think there’s fundamental problems

01:46:09 to be solved in those difficult things.

01:46:12 It’s still publishable, I think.

01:46:14 Like, we just need to… It’s the same criticism

01:46:17 you could have of all these conferences, NeurIPS, CVPR,

01:46:20 where application papers are often as powerful

01:46:24 and as important as a theory paper.

01:46:27 Yet theory just seems much more respectable and so on.

01:46:31 I mean, the machine learning community is changing

01:46:32 that a little bit.

01:46:33 I mean, at least in statements,

01:46:35 but it’s still not seen as the sexiest of pursuits,

01:46:40 which is like, how do I actually make this thing

01:46:42 work in practice as opposed to on this toy data set?

01:46:47 All that to say, are you still avoiding

01:46:49 the three books question?

01:46:50 Is there something on audio book that you can recommend?

01:46:54 Oh, yeah, I mean, yeah, I’ve read a lot of really fun stuff.

01:46:58 In terms of books that I find myself thinking back on

01:47:02 that I read a while ago,

01:47:03 like that stood the test of time to some degree.

01:47:06 I find myself thinking of Program or Be Programmed a lot

01:47:09 by Douglas Rushkoff, which was,

01:47:13 it basically put out the premise

01:47:15 that we all need to become programmers

01:47:19 in one form or another.

01:47:21 And it was an analogy to once upon a time

01:47:24 we all had to become readers.

01:47:26 We had to become literate.

01:47:27 And there was a time before that

01:47:28 when not everybody was literate,

01:47:30 but once literacy was possible,

01:47:31 the people who were literate had more of a say in society

01:47:36 than the people who weren’t.

01:47:37 And so we made a big effort to get everybody up to speed.

01:47:39 And now it’s not 100% universal, but it’s quite widespread.

01:47:44 Like the assumption is generally that people can read.

01:47:48 The analogy that he makes is that programming

01:47:50 is a similar kind of thing,

01:47:51 that we need to have a say in, right?

01:47:57 So being literate, being a reader, means

01:47:59 you can receive all this information,

01:48:01 but you don’t get to put it out there.

01:48:04 And programming is the way that we get to put it out there.

01:48:06 And that was the argument that he made.

01:48:07 I think he specifically has now backed away from this idea.

01:48:11 He doesn’t think it’s happening quite this way.

01:48:14 And that might be true that it didn’t,

01:48:17 society didn’t sort of play forward quite that way.

01:48:20 I still believe in the premise.

01:48:22 I still believe that at some point,

01:48:24 the relationship that we have to these machines

01:48:26 and these networks has to be one where each individual

01:48:29 has the wherewithal to make the machines help them

01:48:34 do the things that that person wants done.

01:48:37 And as software people, we know how to do that.

01:48:40 And when we have a problem, we’re like, okay,

01:48:41 I’ll just, I’ll hack up a Perl script or something

01:48:43 and make it so.
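
Just to make that concrete, here is a minimal sketch of the kind of throwaway script he means, written in Python rather than Perl for readability; the folder name and file pattern are invented purely for the example:

```python
# Hypothetical example of the "hack up a quick script" idea:
# rename a folder full of photos so they sort chronologically.
from datetime import datetime
from pathlib import Path

photo_dir = Path("vacation_photos")  # assumed folder name for the example

for photo in sorted(photo_dir.glob("*.jpg")):
    # Build a sortable prefix from the file's modification time.
    stamp = datetime.fromtimestamp(photo.stat().st_mtime).strftime("%Y%m%d_%H%M%S")
    new_path = photo.with_name(f"{stamp}_{photo.name}")
    print(f"{photo.name} -> {new_path.name}")
    photo.rename(new_path)
```

Nothing deep, but the point he’s making is that being able to dash off something like this changes your relationship to the machine.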

01:48:44 If we lived in a world where everybody could do that,

01:48:47 that would be a better world.

01:48:49 And computers would, I think, have less sway over us.

01:48:53 And other people’s software would have less sway over us

01:48:56 as a group.

01:48:57 In some sense, software engineering, programming is power.

01:49:00 Programming is power, right?

01:49:03 Yeah, it’s like magic.

01:49:04 It’s like magic spells.

01:49:05 And it’s not out of reach of everyone.

01:49:09 But at the moment, it’s just a sliver of the population

01:49:11 who can commune with machines in this way.

01:49:15 So I don’t know, so that book had a big impact on me.

01:49:18 Currently, I’m reading The Alignment Problem,

01:49:20 actually by Brian Christian.

01:49:22 So I don’t know if you’ve seen this out there yet.

01:49:23 Is this similar to Stuart Russell’s work

01:49:25 with the control problem?

01:49:27 It’s in that same general neighborhood.

01:49:28 I mean, they have different emphases

01:49:31 that they’re concentrating on.

01:49:32 I think Stuart’s book did a remarkably good job,

01:49:36 like just a celebratory good job

01:49:38 at describing AI technology and sort of how it works.

01:49:43 I thought that was great.

01:49:44 It was really cool to see that in a book.

01:49:46 I think he has some experience writing some books.

01:49:49 You know, that’s probably a possible thing.

01:49:52 He’s maybe thought a thing or two

01:49:53 about how to explain AI to people.

01:49:56 Yeah, that’s a really good point.

01:49:57 This book so far has been remarkably good

01:50:00 at telling the story of sort of the history,

01:50:04 the recent history of some of the things

01:50:07 that have happened.

01:50:08 I’m in the first third.

01:50:09 He said this book is in three thirds.

01:50:10 The first third is essentially AI fairness

01:50:14 and implications of AI on society

01:50:16 that we’re seeing right now.

01:50:18 And that’s been great.

01:50:19 I mean, he’s telling the stories really well.

01:50:21 He went out and talked to the frontline people

01:50:23 whose names were associated with some of these ideas

01:50:26 and it’s been terrific.

01:50:28 He says the second third of the book

01:50:29 is on reinforcement learning.

01:50:30 So maybe that’ll be fun.

01:50:33 And then the third third

01:50:36 is on the superintelligence alignment problem.

01:50:39 And I suspect that that part will be less fun

01:50:43 for me to read.

01:50:44 Yeah.

01:50:46 Yeah, it’s an interesting problem to talk about.

01:50:48 I find it to be the most interesting,

01:50:50 just like thinking about whether we live

01:50:52 in a simulation or not,

01:50:54 as a thought experiment to think about our own existence.

01:50:58 So in the same way,

01:50:59 talking about the alignment problem with AGI

01:51:02 is a good way to think, similar

01:51:04 to, like, the trolley problem with autonomous vehicles.

01:51:06 It’s a useless thing for engineering,

01:51:08 but it’s a nice little thought experiment

01:51:10 for actually thinking about what

01:51:13 our own human ethical systems, our moral systems, are.

01:51:17 By thinking about how we engineer these things,

01:51:23 you start to understand yourself.

01:51:25 So sci fi can be good at that too.

01:51:27 So one sci fi book to recommend

01:51:29 is Exhalation by Ted Chiang,

01:51:31 a bunch of short stories.

01:51:33 This Ted Chiang is the guy who wrote the short story

01:51:35 that became the movie Arrival.

01:51:38 And all of his stories,

01:51:41 he was a computer scientist,

01:51:43 actually he studied at Brown,

01:51:44 and they all have this sort of really insightful bit

01:51:49 of science or computer science that drives them.

01:51:52 And so it’s just a romp, right?

01:51:54 He creates these artificial worlds

01:51:57 by extrapolating on these ideas

01:51:59 that we know about,

01:52:01 but hadn’t really thought through

01:52:02 to this kind of conclusion.

01:52:04 And so his stuff is, it’s really fun to read,

01:52:06 it’s mind warping.

01:52:08 So I’m not sure if you’re familiar,

01:52:10 I seem to mention this every other word,

01:52:13 that I’m from the Soviet Union and I’m Russian.

01:52:17 Way too much to see us.

01:52:18 My roots are Russian too,

01:52:20 but a couple generations back.

01:52:22 Well, it’s probably in there somewhere.

01:52:24 So maybe we can pull at that thread a little bit

01:52:28 of the existential dread that we all feel.

01:52:31 You mentioned that you,

01:52:32 I think somewhere in the conversation you mentioned

01:52:34 that you don’t really pretty much like dying.

01:52:38 I forget in which context,

01:52:39 it might’ve been a reinforcement learning perspective.

01:52:41 I don’t know.

01:52:42 No, you know what it was?

01:52:43 It was in teaching my kids to drive.

01:52:47 That’s how you face your mortality, yes.

01:52:49 From a human being’s perspective

01:52:52 or from a reinforcement learning researcher’s perspective,

01:52:55 let me ask you the most absurd question.

01:52:57 What do you think is the meaning of this whole thing?

01:53:01 The meaning of life on this spinning rock.

01:53:06 I mean, I think reinforcement learning researchers

01:53:08 maybe think about this from a science perspective

01:53:11 more often than a lot of other people, right?

01:53:13 As a supervised learning person,

01:53:14 you’re probably not thinking about the sweep of a lifetime,

01:53:18 but reinforcement learning agents

01:53:20 are having little lifetimes, little weird little lifetimes.

01:53:22 And it’s hard not to project yourself

01:53:25 into their world sometimes.

01:53:27 But as far as the meaning of life,

01:53:30 so when I turned 42, you may know from

01:53:34 a book I read,

01:53:35 The Hitchhiker’s Guide to the Galaxy,

01:53:38 that that is the meaning of life.

01:53:40 So when I turned 42, I had a meaning of life party

01:53:43 where I invited people over

01:53:45 and everyone shared their meaning of life.

01:53:48 We had slides made up.

01:53:50 And so we all sat down and did a slide presentation

01:53:54 to each other about the meaning of life.

01:53:56 And mine was balance.

01:54:00 I think that life is balance.

01:54:02 And so the activity at the party,

01:54:06 for a 42 year old, maybe this is a little bit nonstandard,

01:54:09 but I found all the little toys and devices that I had

01:54:12 where you had to balance on them.

01:54:13 You had to like stand on it and balance,

01:54:15 or a pogo stick I brought,

01:54:17 a RipStik, which is like a weird two wheeled skateboard.

01:54:23 I got a unicycle, but I didn’t know how to do it.

01:54:26 I now can do it.

01:54:28 I would love watching you try.

01:54:29 Yeah, I’ll send you a video.

01:54:31 I’m not great, but I managed.

01:54:35 And so balance, yeah.

01:54:37 So my wife has a really good one that she sticks to

01:54:42 and is probably pretty accurate.

01:54:43 And it has to do with healthy relationships

01:54:47 with people that you love and working hard for good causes.

01:54:51 But to me, yeah, balance, balance in a word.

01:54:53 That works for me.

01:54:56 Not too much of anything,

01:54:57 because too much of anything is iffy.

01:55:00 That feels like a Rolling Stones song.

01:55:03 I feel like there must be.

01:55:03 You can’t always get what you want,

01:55:05 but if you try sometimes, you can strike a balance.

01:55:09 Yeah, I think that’s how it goes, Michael.

01:55:12 I’ll write you a parody.

01:55:14 It’s a huge honor to talk to you.

01:55:16 This is really fun.

01:55:17 Oh, no, the honor’s mine.

01:55:17 I’ve been a big fan of yours,

01:55:18 so can’t wait to see what you do next

01:55:24 in the world of education, in the world of parody,

01:55:27 in the world of reinforcement learning.

01:55:28 Thanks for talking to me.

01:55:29 My pleasure.

01:55:30 Thank you for listening to this conversation

01:55:32 with Michael Littman, and thank you to our sponsors,

01:55:35 SimpliSafe, a home security company I use

01:55:37 to monitor and protect my apartment, ExpressVPN,

01:55:41 the VPN I’ve used for many years

01:55:43 to protect my privacy on the internet,

01:55:45 Masterclass, online courses that I enjoy

01:55:48 from some of the most amazing humans in history,

01:55:51 and BetterHelp, online therapy with a licensed professional.

01:55:55 Please check out these sponsors in the description

01:55:58 to get a discount and to support this podcast.

01:56:00 If you enjoy this thing, subscribe on YouTube,

01:56:03 review it with five stars on Apple Podcast,

01:56:05 follow on Spotify, support it on Patreon,

01:56:08 or connect with me on Twitter at Lex Friedman.

01:56:12 And now, let me leave you with some words

01:56:14 from Groucho Marx.

01:56:16 If you’re not having fun, you’re doing something wrong.

01:56:20 Thank you for listening, and hope to see you next time.