Transcript
00:00:00 The following is a conversation with Michael Littman, a computer science professor at Brown
00:00:04 University doing research on and teaching machine learning, reinforcement learning,
00:00:10 and artificial intelligence. He enjoys being silly and lighthearted in conversation,
00:00:16 so this was definitely a fun one. Quick mention of each sponsor,
00:00:26 followed by some thoughts related to the episode. Thank you to SimpliSafe, a home security company
00:00:26 I use to monitor and protect my apartment, ExpressVPN, the VPN I’ve used for many years
00:00:32 to protect my privacy on the internet, MasterClass, online courses that I enjoy from
00:00:38 some of the most amazing humans in history, and BetterHelp, online therapy with a licensed
00:00:43 professional. Please check out these sponsors in the description to get a discount and to support
00:00:49 this podcast. As a side note, let me say that I may experiment with doing some solo episodes
00:00:55 in the coming month or two. The three ideas I have floating in my head currently are to use, one,
00:01:02 a particular moment in history, two, a particular movie, or three, a book to drive a conversation
00:01:10 about a set of related concepts. For example, I could use 2001: A Space Odyssey or Ex Machina
00:01:17 to talk about AGI for one, two, three hours. Or I could do an episode on the, yes, rise and fall of
00:01:26 Hitler and Stalin, each in a separate episode, using relevant books and historical moments
00:01:32 for reference. I find the format of a solo episode very uncomfortable and challenging,
00:01:38 but that just tells me that it’s something I definitely need to do and learn from the experience.
00:01:44 Of course, I hope you come along for the ride. Also, since we have all this momentum built up
00:01:49 on announcements, I’m giving a few lectures on machine learning at MIT this January.
00:01:54 In general, if you have ideas for the episodes, for the lectures, or for just short videos on
00:02:01 YouTube, let me know in the comments that I still definitely read, despite my better judgment,
00:02:10 and the wise sage advice of the great Joe Rogan. If you enjoy this thing, subscribe on YouTube,
00:02:17 review it with five stars on Apple Podcasts, follow on Spotify, support on Patreon, or connect
00:02:22 with me on Twitter at Lex Fridman. And now, here’s my conversation with Michael Littman.
00:02:29 I saw a video of you talking to Charles Isbell about Westworld, the TV series. You guys were
00:02:35 doing the kind of thing where you’re watching new things together, but let’s rewind back.
00:02:41 Is there a sci-fi movie, book, or show that was profound, that had an impact on you philosophically,
00:02:50 or just specifically something you enjoyed nerding out about?
00:02:55 Yeah, interesting. I think a lot of us have been inspired by robots in movies. One that I really
00:03:00 like is, there’s a movie called Robot and Frank, which I think is really interesting because it’s
00:03:05 very near term future, where robots are being deployed as helpers in people’s homes. And we
00:03:15 don’t know how to make robots like that at this point, but it seemed very plausible. It seemed
00:03:19 very realistic or imaginable. And I thought that was really cool because they’re awkward,
00:03:25 they do funny things that raise some interesting issues, but it seemed like something that would
00:03:29 ultimately be helpful and good if we could do it right.
00:03:31 Yeah, he was an older cranky gentleman, right?
00:03:33 He was an older cranky jewel thief, yeah.
00:03:36 It’s kind of funny little thing, which is, you know, he’s a jewel thief and so he pulls the
00:03:42 robot into his life, which is like, which is something you could imagine taking a home robotics
00:03:49 thing and pulling into whatever quirky thing that’s involved in your existence.
00:03:54 It’s meaningful to you. Exactly so. Yeah. And I think from that perspective, I mean,
00:04:00 not all of us are jewel thieves. And so when we bring our robots into our lives, it explains a
00:04:05 lot about this apartment, actually. But no, the idea that people should have the ability to make
00:04:12 this technology their own, that it becomes part of their lives. And I think it’s hard for us
00:04:18 as technologists to make that kind of technology. It’s easier to mold people into what we need them
00:04:22 to be. And just that opposite vision, I think, is really inspiring. And then there’s the
00:04:28 anthropomorphization where we project certain things on them, because I think the robot was
00:04:32 kind of dumb. But I have a bunch of Roombas I play with and you immediately project stuff onto
00:04:38 them. Much greater level of intelligence. We’ll probably do that with each other too. Much greater
00:04:43 degree of compassion. That’s right. One of the things we’re learning from AI is where we are
00:04:47 smart and where we are not smart. Yeah. You also enjoy, as people can see, and I enjoyed
00:04:55 myself watching you sing and even dance a little bit, a little bit, a little bit of dancing.
00:05:02 A little bit of dancing. That’s not quite my thing. As a method of education or just in life,
00:05:08 you know, in general. So easy question. What’s the definitive, objectively speaking,
00:05:15 top three songs of all time? Maybe something that, you know, to walk that back a little bit,
00:05:22 maybe something that others might be surprised by the three songs that you kind of enjoy.
00:05:28 That is a great question that I cannot answer. But instead, let me tell you a story.
00:05:32 So pick a question you do want to answer. That’s right. I’ve been watching the
00:05:36 presidential debates and vice presidential debates. And it turns out, yeah, it’s really,
00:05:39 you can just answer any question you want. So it’s a related question. Well said.
00:05:47 I really like pop music. I’ve enjoyed pop music ever since I was very young. So 60s music,
00:05:51 70s music, 80s music. This is all awesome. And then I had kids and I think I stopped listening
00:05:56 to music and I was starting to realize that my musical taste had sort of frozen out.
00:06:01 And so I decided in 2011, I think, to start listening to the top 10 billboard songs each week.
00:06:08 So I’d be on the treadmill and I would listen to that week’s top 10 songs
00:06:11 so I could find out what was popular now. And what I discovered is that I have no musical
00:06:17 taste whatsoever. I like what I’m familiar with. And so the first time I’d hear a song
00:06:22 was the first week it was on the charts, I’d be like... and then the second week,
00:06:26 I was into it a little bit. And the third week, I was loving it. And by the fourth week is like,
00:06:30 just part of me. And so I’m afraid that I can’t tell you my favorite song of all time,
00:06:36 because it’s whatever I heard most recently. Yeah, that’s interesting. People have told me that
00:06:44 there’s an art to listening to music as well. And you can start to, if you listen to a song,
00:06:48 just carefully, like explicitly, just force yourself to really listen. You start to,
00:06:54 I did this when I was part of jazz band and fusion band in college. You start to hear the layers
00:07:01 of the instruments. You start to hear the individual instruments and you start to,
00:07:04 you can listen to classical music or to orchestra this way. You can listen to jazz this way.
00:07:08 I mean, it’s funny to imagine you now, walking that forward, listening to pop hits now as, like,
00:07:16 a scholar, listening to, like, Cardi B or something like that, or Justin Timberlake. Is he? No,
00:07:22 not Timberlake, Bieber. They’ve both been in the top 10 since I’ve been listening.
00:07:26 They’re still up there. Oh my God, I’m so cool.
00:07:29 If you haven’t heard Justin Timberlake’s top 10 in the last few years, there was one
00:07:33 song that he did where the music video was set at essentially NeurIPS.
00:07:38 Oh, wow. Oh, the one with the robotics. Yeah, yeah, yeah, yeah, yeah.
00:07:42 Yeah, yeah. It’s like at an academic conference and he’s doing a demo.
00:07:45 He was presenting, right?
00:07:46 It was sort of a cross between the Apple, like Steve Jobs kind of talk and NeurIPS.
00:07:51 Yeah.
00:07:53 So, you know, it’s always fun when AI shows up in pop culture.
00:07:56 I wonder if he consulted somebody for that. That’s really interesting. So maybe on that topic,
00:08:01 I’ve seen your celebrity in multiple dimensions, but one of them is you’ve done cameos in different
00:08:08 places. I’ve seen you in a TurboTax commercial as like, I guess, the brilliant Einstein character.
00:08:16 And the point is that TurboTax doesn’t need somebody like you. It doesn’t need a brilliant
00:08:23 person.
00:08:24 Very few things need someone like me. But yes, they were specifically emphasizing the
00:08:28 idea that you don’t need to be like a computer expert to be able to use their software.
00:08:32 How did you end up in that world?
00:08:33 I think it’s an interesting story. So I was teaching my class. It was an intro computer
00:08:38 science class for non concentrators, non majors. And sometimes when people would visit campus,
00:08:45 they would check in to say, hey, we want to see what a class is like. Can we sit in on your class?
00:08:48 So a person came to my class who was the daughter of the brother of the husband of the best friend
00:09:02 of my wife. Anyway, basically a family friend came to campus to check out Brown and asked to
00:09:11 come to my class and came with her dad. Her dad, who I’ve known from various
00:09:16 kinds of family events and so forth, also does advertising. And he said that he was
00:09:21 recruiting scientists for this ad, this TurboTax set of ads. And he said, we wrote the ad with the
00:09:31 idea that we get like the most brilliant researchers, but they all said no. So can you
00:09:36 help us find like B level scientists? And I’m like, sure, that’s who I hang out with.
00:09:44 So that should be fine. So I put together a list and I did what some people call the Dick Cheney.
00:09:49 So I included myself on the list of possible candidates, with a little blurb about each one
00:09:55 and why I thought that would make sense for them to do it. And they reached out to a handful of
00:09:59 them, but then they ultimately, they YouTube stalked me a little bit and they thought,
00:10:03 oh, I think he could do this. And they said, okay, we’re going to offer you the commercial.
00:10:07 I’m like, what? So it was such an interesting experience because they have another world, the
00:10:14 people who do like nationwide kind of ad campaigns and television shows and movies and so forth.
00:10:21 It’s quite a remarkable system that they have going because they have a set. Yeah. So I went to,
00:10:28 it was just somebody’s house that they rented in New Jersey. But in the commercial, it’s just me
00:10:35 and this other woman. In reality, there were 50 people in that room and another, I don’t know,
00:10:41 half a dozen kind of spread out around the house in various ways. There were people whose job it
00:10:46 was to control the sun. They were in the backyard on ladders, putting filters up to try to make sure
00:10:53 that the sun didn’t glare off the window in a way that would wreck the shot. So there was like
00:10:57 six people out there doing that. There was three people out there giving snacks, the craft table.
00:11:02 There was another three people giving healthy snacks because that was a separate craft table.
00:11:05 There was one person whose job it was to keep me from getting lost. And I think the reason for all
00:11:12 this is because so many people are in one place at one time. They have to be time efficient. They
00:11:16 have to get it done. The morning they were going to do my commercial. In the afternoon, they were
00:11:20 going to do a commercial of a mathematics professor from Princeton. They had to get it done. No wasted
00:11:27 time or energy. And so there’s just a fleet of people all working as an organism. And it was
00:11:32 fascinating. I was just the whole time just looking around like, this is so neat. Like one person
00:11:36 whose job it was to take the camera off of the cameraman so that someone else, whose job it was
00:11:43 to remove the film canister, could do that. Because every couple of takes, they had to replace the film because film
00:11:48 gets used up. It was just, I don’t know. I was geeking out the whole time. It was so fun.
00:11:53 How many takes did it take? It looked the opposite, like there were no more than two people there. It was very
00:11:57 relaxed. Right. Yeah. The person who I was in the scene with is a professional. She’s an improv
00:12:06 comedian from New York City. And when I got there, they had given me a script, such as it was. And
00:12:11 then I got there and they said, we’re going to do this as improv. I’m like, I don’t know how to
00:12:15 improv. I don’t know what you’re telling me to do here. Don’t worry. She knows. I’m like, okay.
00:12:21 I’ll go see how this goes. I guess I got pulled into the story because like, where the heck did
00:12:26 you come from? I guess in the scene. Like, how did you show up in this random person’s house?
00:12:32 Yeah. Well, I mean, the reality of it is I stood outside in the blazing sun. There was someone
00:12:36 whose job it was to keep an umbrella over me because I started to sweat. And so I would wreck
00:12:41 the shot because my face was all shiny with sweat. So there was one person who would dab me off,
00:12:45 had an umbrella. But yeah, like the reality of it, like, why is this strange stalkery person hanging
00:12:51 around outside somebody’s house? We’re not sure when you have to look in,
00:12:54 what the ways for the book, but are you... So you make, like you said, YouTube
00:13:00 videos yourself. You make awesome parody, sort of parody songs that kind of focus on a
00:13:07 particular aspect of computer science. Those seem really interesting to you,
00:13:13 really natural to you. How much production value goes into that?
00:13:18 Do you also have a team of 50 people? The videos, almost all the videos,
00:13:22 except for the ones that people would have actually seen, are just me. I write the lyrics,
00:13:26 I sing the song. I generally find, like, a backing track online because I, like you,
00:13:34 can’t really play an instrument. And then I do, in some cases I’ll do visuals using just, like,
00:13:39 PowerPoint. Lots and lots of PowerPoint to make it sort of like an animation.
00:13:44 The most produced one is the one that people might have seen, which is the overfitting video
00:13:49 that I did with Charles Isbell. And that was produced by the Georgia Tech and Udacity people
00:13:55 because we were doing a class together. It was kind of, I usually do parody songs kind of to
00:13:59 cap off a class at the end of a class. So that one you’re wearing, so it was
00:14:04 Thriller. You’re wearing the Michael Jackson red leather jacket. The interesting thing
00:14:09 with podcasting that you’re also into is that I really enjoy is that there’s not a team of people.
00:14:21 It’s kind of more, because you know, there’s something that happens when there’s more people
00:14:29 involved than just one person that just the way you start acting, I don’t know. There’s a censorship.
00:14:36 You’re not given, especially for like slow thinkers like me, you’re not. And I think most of us are,
00:14:42 if we’re trying to actually think we’re a little bit slow and careful, it kind of large teams get
00:14:50 in the way of that. And I don’t know what to do with that. Like that’s the, to me, like if,
00:14:56 yeah, it’s very popular to criticize quote unquote mainstream media.
00:15:01 But there is legitimacy to criticizing them the same. I love listening to NPR, for example,
00:15:06 but every, it’s clear that there’s a team behind it. There’s a commercial,
00:15:11 there’s constant commercial breaks. There’s this kind of like rush of like,
00:15:16 okay, I have to interrupt you now because we have to go to commercial. Just this whole,
00:15:20 it creates, it destroys the possibility of nuanced conversation. Yeah, exactly. Evian,
00:15:29 which Charles Isbell, who I talked to yesterday, told me is naive backwards, which —
00:15:36 the fact that his mind thinks this way is quite brilliant. Anyway, there’s a freedom to this
00:15:42 podcast. He’s Dr. Awkward, which by the way, is a palindrome. That’s a palindrome that I happen to
00:15:46 know from other parts of my life. And I just, well, you know, use it against Charles. Dr. Awkward.
00:15:54 So what was the most challenging parody song to make? Was it the Thriller one?
00:16:00 No, that one was really fun. I wrote the lyrics really quickly and then I gave it over to the
00:16:06 production team. They recruited an a cappella group to sing. That went really smoothly. It’s great
00:16:11 having a team because then you can just focus on the part that you really love, which in my case
00:16:15 is writing the lyrics. For me, the most challenging one, not challenging in a bad way, but challenging
00:16:21 in a really fun way, was I did one of the parody songs I did is about the halting problem in
00:16:27 computer science. The fact that you can’t create a program that can tell for any other arbitrary
00:16:34 program, whether it’s actually going to get stuck in an infinite loop or whether it’s going to eventually
00:16:38 stop. And so I did it to an 80’s song because I hadn’t started my new thing of learning current
00:16:46 songs. And it was Billy Joel’s The Piano Man. Nice. Which is a great song. Sing me a song.
00:16:56 You’re the piano man. Yeah. So the lyrics are great because first of all, it rhymes. Not all
00:17:04 songs rhyme. I’ve done Rolling Stones songs which turn out to have no rhyme scheme whatsoever. They’re
00:17:09 just sort of yelling and having a good time, which makes it not fun from a parody perspective because
00:17:14 like you can say anything. But the lines rhymed and there was a lot of internal rhymes as well.
00:17:18 And so figuring out how to sing with internal rhymes, a proof of the halting problem was really
00:17:24 challenging. And I really enjoyed that process. What about, last question on this topic, what
00:17:30 about the dancing in the Thriller video? How many takes that take? So I wasn’t planning to dance.
00:17:36 They had me in the studio and they gave me the jacket and it’s like, well, you can’t,
00:17:40 if you have the jacket and the glove, like there’s not much you can do. Yeah. So I think I just
00:17:46 danced around and then they said, why don’t you dance a little bit? There was a scene with me
00:17:49 and Charles dancing together. They did not use it in the video, but we recorded it. Yeah. Yeah. No,
00:17:55 it was pretty funny. And Charles, who has this beautiful, wonderful voice doesn’t really sing.
00:18:02 He’s not really a singer. And so that was why I designed the song with him doing a spoken section
00:18:07 and me doing the singing. It’s very like Barry White. Yeah. Smooth baritone. Yeah. Yeah. It’s
00:18:12 great. That was awesome. So one of the other things Charles said is that, you know, everyone
00:18:19 knows you as like a super nice guy, super passionate about teaching and so on. What he said,
00:18:27 don’t know if it’s true, that despite the fact that you’re, you are. Okay. I will admit this
00:18:34 finally for the first time. That was, that was me. It’s the Johnny Cash song. Killed a man in Reno just
00:18:39 to watch him die. That you actually do have some strong opinions on some topics. So if this in fact
00:18:46 is true, what strong opinions would you say you have? Is there ideas you think maybe in artificial
00:18:55 intelligence and machine learning, maybe in life that you believe is true that others might,
00:19:02 you know, some number of people might disagree with you on? So I try very hard to see things
00:19:08 from multiple perspectives. There’s this great Calvin and Hobbes cartoon where, do you know?
00:19:15 Yeah. Okay. So Calvin’s dad is always kind of a bit of a foil and he talked Calvin into,
00:19:21 Calvin had done something wrong. The dad talks him into like seeing it from another perspective
00:19:25 and Calvin, like this breaks Calvin because he’s like, oh my gosh, now I can see the opposite sides
00:19:30 of things. And so the, it’s, it becomes like a Cubist cartoon where there is no front and back.
00:19:35 Everything’s just exposed and it really freaks him out. And finally he settles back down. It’s
00:19:39 like, oh good. No, I can make that go away. But like, I’m that, I’m that I live in that world where
00:19:44 I’m trying to see everything from every perspective all the time. So there are some things that I’ve
00:19:48 formed opinions about that it would be harder, I think, to disabuse me of. One is the super
00:19:56 intelligence argument and the existential threat of AI is one where I feel pretty confident in my
00:20:02 feeling about that one. Like I’m willing to hear other arguments, but like, I am not particularly
00:20:07 moved by the idea that if we’re not careful, we will accidentally create a super intelligence
00:20:13 that will destroy human life. Let’s talk about that. Let’s get you in trouble and record your
00:20:17 video. It’s like Bill Gates, I think he said like some quote about the internet that that’s just
00:20:24 going to be a small thing. It’s not going to really go anywhere. And then I think Steve
00:20:29 Ballmer said, I don’t know why I’m sticking on Microsoft, he said something like smartphones
00:20:36 are useless, there’s no reason why Microsoft should get into smartphones, that kind of thing.
00:20:40 So let’s get, let’s talk about AGI. As AGI is destroying the world, we’ll look back at this
00:20:45 video and see. No, I think it’s really interesting to actually talk about because nobody really
00:20:49 knows the future. So you have to use your best intuition. It’s very difficult to predict it,
00:20:54 but you have spoken about AGI and the existential risks around it and sort of basing your intuition
00:21:01 that we’re quite far away from that being a serious concern relative to the other concerns
00:21:08 we have. Can you maybe unpack that a little bit? Yeah, sure, sure, sure. So as I understand it,
00:21:15 that for example, I read Bostrom’s book and a bunch of other reading material about this sort
00:21:22 of general way of thinking about the world. And I think the story goes something like this, that we
00:21:27 will at some point create computers that are smart enough that they can help design the next version
00:21:35 of themselves, which itself will be smarter than the previous version of themselves and eventually
00:21:42 bootstrapped up to being smarter than us. At which point we are essentially at the mercy of this sort
00:21:49 of more powerful intellect, which in principle we don’t have any control over what its goals are.
00:21:56 And so if its goals are at all out of sync with our goals, for example, the continued existence
00:22:04 of humanity, we won’t be able to stop it. It’ll be way more powerful than us and we will be toast.
00:22:12 So there’s some, I don’t know, very smart people who have signed on to that story. And it’s a
00:22:18 compelling story. Now I can really get myself in trouble. I once wrote an op ed about this,
00:22:25 specifically responding to some quotes from Elon Musk, who has been on this very podcast
00:22:30 more than once. AI summoning the demon. But then he came to Providence, Rhode Island,
00:22:38 which is where I live, and said to the governors of all the states, you know, you’re worried about
00:22:45 entirely the wrong thing. You need to be worried about AI. You need to be very, very worried about
00:22:49 AI. And journalists kind of reacted to that and they wanted to get people’s take. And I was like,
00:22:56 OK, my belief is that one of the things that makes Elon Musk so successful and so remarkable
00:23:03 as an individual is that he believes in the power of ideas. He believes that, you
00:23:08 know, if you have a really good idea for getting into space, you can get into space.
00:23:12 If you have a really good idea for a company or for how to change the way that people drive,
00:23:18 you just have to do it and it can happen. It’s really natural to apply that same idea to AI.
00:23:23 You see these systems that are doing some pretty remarkable computational tricks, demonstrations,
00:23:30 and then to take that idea and just push it all the way to the limit and think, OK, where does
00:23:35 this go? Where is this going to take us next? And if you’re a deep believer in the power of ideas,
00:23:40 then it’s really natural to believe that those ideas could be taken to the extreme and kill us.
00:23:47 So I think, you know, his strength is also his undoing, because that doesn’t mean it’s true.
00:23:52 Like, it doesn’t mean that that has to happen, but it’s natural for him to think that.
00:23:56 So another way to phrase the way he thinks, and I find it very difficult to argue with that line
00:24:04 of thinking. So Sam Harris is another person, from a neuroscience perspective, who thinks like that,
00:24:09 saying, well, is there something fundamental in the physics of the universe that prevents this
00:24:18 from eventually happening? And Nick Bostrom thinks in the same way, that kind of zooming out, yeah,
00:24:24 OK, we humans now are existing in this like time scale of minutes and days. And so our intuition
00:24:32 is in this time scale of minutes, hours and days. But if you look at the span of human history,
00:24:39 is there any reason you can’t see this in 100 years? And like, is there something fundamental
00:24:47 about the laws of physics that prevent this? And if it doesn’t, then it eventually will happen
00:24:52 or we will destroy ourselves in some other way. And it’s very difficult, I find,
00:24:57 to actually argue against that. Yeah, me too.
00:25:03 And not sound like you’re just, like, rolling your eyes, like, ah, science
00:25:11 fiction, we don’t have to think about it. But even worse than that, which is like, I don’t have
00:25:16 kids, but like I got to pick up my kids now like this. OK, I see there’s more pressing short. Yeah,
00:25:20 there’s more pressing short term things that like stop over the next national crisis. We have much,
00:25:25 much shorter things, like now, especially this year, there’s COVID. So like any kind of discussion
00:25:30 like that is like there’s this, you know, this pressing things today is. And then so the Sam
00:25:37 Harris argument, well, like any day the exponential singularity can occur, is very difficult to
00:25:45 argue against. I mean, I don’t know. But part of his story is also he’s not going to put a date on
00:25:50 it. It could be in a thousand years, it could be in a hundred years, it could be in two years. It’s
00:25:53 just that as long as we keep making this kind of progress, it ultimately has to become a concern.
00:25:59 I kind of am on board with that. But the piece that I feel like is missing from
00:26:03 that way of extrapolating from the moment that we’re in is that I believe that in the
00:26:09 process of actually developing technology that can really get around in the world and really process
00:26:14 and do things in the world in a sophisticated way, we’re going to learn a lot about what that means,
00:26:20 which that we don’t know now because we don’t know how to do this right now.
00:26:24 If you believe that you can just turn on a deep learning network and eventually give it enough
00:26:28 compute and eventually get there. Well, sure, that seems really scary because we won’t be
00:26:32 in the loop at all. We won’t be helping to design or target these kinds of systems.
00:26:38 But I don’t see that. That feels like it is against the laws of physics,
00:26:43 because these systems need help, right? They need to surpass the difficulty,
00:26:49 the wall of complexity that happens in arranging something in the form that will happen.
00:26:55 Yeah, like I believe in evolution, like I believe that there’s an argument, right? So
00:27:00 there’s another argument, just to look at it from a different perspective, that people say,
00:27:04 why they don’t believe in evolution: how could evolution, it’s sort of like a random set of
00:27:10 parts assembling themselves into a 747, and that could just never happen. So it’s like,
00:27:15 OK, that’s maybe hard to argue against. But clearly, 747s do get assembled. They get assembled
00:27:20 by us. Basically, the idea being that there’s a process by which we will get to the point of
00:27:26 making technology that has that kind of awareness. And in that process, we’re going to learn a lot
00:27:31 about that process and we’ll have more ability to control it or to shape it or to build it in our
00:27:37 own image. It’s not something that is going to spring into existence like that 747. And we’re
00:27:43 just going to have to contend with it completely unprepared. That’s very possible that in the
00:27:49 context of the long arc of human history, it will, in fact, spring into existence.
00:27:55 But that springing might take like if you look at nuclear weapons, like even 20 years is a springing
00:28:02 in the context of human history. And it’s very possible, just like with nuclear weapons,
00:28:07 that we could have I don’t know what percentage you want to put at it, but the possibility could
00:28:13 have knocked ourselves out. Yeah. The possibility of human beings destroying themselves in the 20th
00:28:17 century with nuclear weapons. I don’t know. You can if you really think through it, you could
00:28:23 really put it close to, like, I don’t know, 30, 40 percent, given like the certain moments of
00:28:28 crisis that happen. So, like, I think one, like, fear in the shadows that’s not being acknowledged
00:28:38 is, it’s not so much that the A.I. will run away, it’s that as it’s running away,
00:28:44 we won’t have enough time to think through how to stop it. Right. Fast takeoff or FOOM. Yeah.
00:28:52 I mean, my much bigger concern, I wonder what you think about it, which is
00:28:55 we won’t know it’s happening. So I kind of think that there’s an A.G.I. situation already happening
00:29:05 with social media that our minds, our collective intelligence of human civilization is already
00:29:11 being controlled by an algorithm. And like, we’re already super, like, the level of our collective
00:29:19 intelligence, thanks to Wikipedia, people should donate to Wikipedia to feed the A.G.I.
00:29:23 Man, if we had a super intelligence that was in line with Wikipedia’s values,
00:29:31 that’s a lot better than a lot of other things I could imagine. I trust Wikipedia more than I
00:29:36 trust Facebook or YouTube as far as trying to do the right thing from a rational perspective.
00:29:41 Yeah. Now, that’s not where you were going. I understand that. But it does strike me that
00:29:45 there’s sort of smarter and less smart ways of exposing ourselves to each other on the Internet.
00:29:51 Yeah. The interesting thing is that Wikipedia and social media have very different forces.
00:29:55 You’re right. I mean, Wikipedia, if A.G.I. was Wikipedia, it’d be just like this cranky, overly
00:30:02 competent editor of articles. You know, there’s something to that. But the social
00:30:08 media aspect is not. So the vision of A.G.I. is as a separate system that’s super intelligent.
00:30:17 That’s super intelligent. That’s one key little thing. I mean, there’s the paperclip argument
00:30:20 that’s super dumb, but super powerful systems. But with social media, you have relatively, like,
00:30:27 algorithms we may talk about today, very simple algorithms that when something Charles talks a
00:30:35 lot about, which is interactive A.I., when they start like having at scale, like tiny little
00:30:40 interactions with human beings, they can start controlling these human beings. So a single
00:30:45 algorithm can control the minds of human beings slowly to what we might not realize. It could
00:30:51 start wars. It could start. It could change the way we think about things. It feels like
00:30:57 in the long arc of history, if I were to sort of zoom out from all the outrage and all the tension
00:31:03 on social media, that it’s progressing us towards better and better things. It feels like chaos and
00:31:11 toxic and all that kind of stuff. It’s chaos and toxic. Yeah. But it feels like actually
00:31:17 the chaos and toxic is similar to the kind of debates we had from the founding of this country.
00:31:22 You know, there was a civil war that happened over that period. And ultimately it was all about
00:31:28 this tension of like something doesn’t feel right about our implementation of the core values we
00:31:33 hold as human beings. And they’re constantly struggling with this. And that results in people
00:31:38 calling each other, just being shady to each other on Twitter. But ultimately the algorithm is
00:31:47 managing all that. And it feels like there’s a possible future in which that algorithm
00:31:53 controls us into the direction of self destruction and whatever that looks like.
00:31:59 Yeah. So, all right. I do believe in the power of social media to screw us up royally. I do believe
00:32:05 in the power of social media to benefit us too. I do think that we’re in a, yeah, it’s sort of
00:32:12 almost got dropped on top of us. And now we’re trying to, as a culture, figure out how to cope
00:32:16 with it. There’s a sense in which, I don’t know, there’s some arguments that say that, for example,
00:32:23 I guess college age students now, late college age students now, people who were in middle school
00:32:27 when social media started to really take off, may be really damaged. Like this may have really hurt
00:32:34 their development in a way that we don’t have all the implications of quite yet. That’s the generation
00:32:40 who, and I hate to make it somebody else’s responsibility, but like they’re the ones who
00:32:46 can fix it. They’re the ones who can figure out how do we keep the good of this kind of technology
00:32:53 without letting it eat us alive. And if they’re successful, we move on to the next phase, the next
00:33:01 level of the game. If they’re not successful, then yeah, then we’re going to wreck each other. We’re
00:33:06 going to destroy society. So you’re going to, in your old age, sit on a porch and watch the world
00:33:11 burn because of the TikTok generation that… I believe, well, so this is my kid’s age,
00:33:17 right? And that’s certainly my daughter’s age. And she’s very tapped in to social stuff, but she’s
00:33:21 also, she’s trying to find that balance, right? Of participating in it and in getting the positives
00:33:26 of it, but without letting it eat her alive. And I think sometimes she ventures, I hope she doesn’t
00:33:33 watch this. Sometimes I think she ventures a little too far and is consumed by it. And other
00:33:39 times she gets a little distance. And if there’s enough people like her out there, they’re going to
00:33:46 navigate these choppy waters. That’s an interesting skill, actually, to develop. I talked to my dad
00:33:52 about it. I’ve now, somehow this podcast in particular, but other reasons has received a
00:34:01 little bit of attention. And with that, apparently in this world, even though I don’t shut up about
00:34:07 love and I’m just all about kindness, I have now a little mini army of trolls. It’s kind of hilarious
00:34:15 actually, but it also doesn’t feel good, but it’s a skill to learn to not look at that, like to
00:34:23 moderate actually how much you look at that. The discussion I have with my dad, it’s similar to,
00:34:28 it doesn’t have to be about trolls. It could be about checking email, which is like, if you’re
00:34:33 anticipating, you know, there’s a, my dad runs a large Institute at Drexel University and there
00:34:39 could be stressful, like, emails you’re waiting for, like there’s drama of some kind. And so like,
00:34:45 there’s a temptation to check the email. If you send an email and you kind of,
00:34:49 and that pulls you in into, it doesn’t feel good. And it’s a skill that he actually complains that
00:34:56 he hasn’t learned. I mean, he grew up without it. So he hasn’t learned the skill of how to
00:35:01 shut off the internet and walk away. And I think young people, while they’re also being
00:35:05 quote unquote damaged by like, you know, being bullied online, all of those stories, which are
00:35:12 very like horrific, you basically can’t escape your bullies these days when you’re growing up.
00:35:17 But at the same time, they’re also learning that skill of how to be able to shut off the,
00:35:23 like disconnect with it, be able to laugh at it, not take it too seriously. It’s fascinating. Like
00:35:29 we’re all trying to figure this out. Just like you said, it’s been dropped on us and we’re trying to
00:35:32 figure it out. Yeah. I think that’s really interesting. And I guess I’ve become a believer
00:35:37 in the human design, which I feel like I don’t completely understand. Like how do you make
00:35:42 something as robust as us? Like we’re so flawed in so many ways. And yet, and yet, you know,
00:35:48 we dominate the planet and we do seem to manage to get ourselves out of scrapes eventually,
00:35:57 not necessarily the most elegant possible way, but somehow we get, we get to the next step.
00:36:02 And I don’t know how I’d make a machine do that. Generally speaking, like if I train one of my
00:36:09 reinforcement learning agents to play a video game and it works really hard on that first stage
00:36:13 over and over and over again, and it makes it through, it succeeds on that first level.
00:36:17 And then the new level comes and it’s just like, okay, I’m back to the drawing board. And somehow
00:36:21 humanity, we keep leveling up and then somehow managing to put together the skills necessary to
00:36:26 achieve success, some semblance of success in that next level too. And, you know,
00:36:33 I hope we can keep doing that.
00:36:36 You mentioned reinforcement learning. So you’ve had a couple of years in the field. No, quite,
00:36:42 you know, quite a few, quite a long career in artificial intelligence broadly, but reinforcement
00:36:50 learning specifically, can you maybe give a hint about your sense of the history of the field?
00:36:58 And in some ways it’s changed with the advent of deep learning, but as a long roots, like how is it
00:37:05 weaved in and out of your own life? How have you seen the community change or maybe the ideas that
00:37:09 it’s playing with change? I’ve had the privilege, the pleasure of being, of having almost a front
00:37:16 row seat to a lot of this stuff. And it’s been really, really fun and interesting. So when I was
00:37:21 in college in the eighties, early eighties, the neural net thing was starting to happen.
00:37:29 And I was taking a lot of psychology classes and a lot of computer science classes as a college
00:37:34 student. And I thought, you know, something that can play tic tac toe and just like learn to get
00:37:38 better at it. That ought to be a really easy thing. So I spent almost, almost all of my, what would
00:37:43 have been vacations during college, like hacking on my home computer, trying to teach it how to
00:37:48 play tic tac toe. And the programming language? Basic. Oh yeah. That’s, that’s my first
00:37:53 language. That’s my native language. Is that when you first fell in love with computer science,
00:37:57 just like programming Basic on that? Uh, what was, what was the computer? Do you remember? I had,
00:38:02 I had a TRS-80 Model One before they were called Model Ones. Cause there was nothing else. Uh,
00:38:08 I got my computer in 1979, uh, instead. So I was, I was, I would have been bar mitzvahed,
00:38:18 but instead of having a big party that my parents threw on my behalf, they just got me a computer.
00:38:23 Cause that’s what I really, really, really wanted. I saw them in the mall at
00:38:26 Radio Shack. And I thought, what, how are they doing that? I would try to stump them. I would
00:38:32 give them math problems like one plus, and then in parentheses, two plus one. And it would always get
00:38:37 it right. I’m like, how do you know so much? Like I’ve had to go to algebra class for the last few
00:38:42 years to learn this stuff and you just seem to know. So I was, I was, I was smitten and, uh,
00:38:48 got a computer and I think ages 13 to 15. I have no memory of those years. I think I just was in
00:38:55 my room with the computer, listening to Billy Joel, communing, possibly listening to the radio,
00:38:59 listening to Billy Joel. That was the one album I had, uh, on vinyl at that time. And, um, and then
00:39:06 I got it on cassette tape and that was really helpful because then I could play it. I didn’t
00:39:09 have to go down to my parents’ wifi, or hi-fi, sorry. Uh, and at age 15, I remember kind of
00:39:16 walking out and like, okay, I’m ready to talk to people again. Like I’ve learned what I need to
00:39:20 learn here. And, um, so yeah, so, so that was, that was my home computer. And so I went to college
00:39:26 and I was like, oh, I’m totally going to study computer science. And I opted the college I chose
00:39:30 specifically had a computer science major. The one that I really wanted, the college I really wanted
00:39:34 to go to, didn’t, so bye-bye to them. So I went to Yale. Uh, Princeton would have been way more
00:39:41 convenient and it was just beautiful campus and it was close enough to home. And I was really
00:39:45 excited about Princeton. And I visited, I said, so, computer science major? They’re like, well, we have
00:39:50 computer engineering. I’m like, oh, I don’t like that word, engineering. I like computer science.
00:39:55 I really, I want to do like, you’re saying hardware and software. They’re like, yeah.
00:39:59 I’m like, I just want to do software. I couldn’t care less about hardware. And you grew up in
00:40:02 Philadelphia. I grew up outside Philly. Yeah. Yeah. Uh, so the, you know, local schools were
00:40:07 like Penn and Drexel and, uh, Temple. Like everyone in my family went to Temple at least at
00:40:12 one point in their lives, except for me. So yeah, Philly, Philly family, Yale had a computer science
00:40:18 department. And that’s when you, it’s kind of interesting. You said eighties and neural
00:40:22 networks. That’s when neural networks were a hot new thing, or a hot thing, period. Uh, so was
00:40:27 that in college when you first learned about neural networks, or when you learned, like, how did...
00:40:31 It was in a psychology class, not in CS. Yeah. Was it psychology or cognitive science or like,
00:40:36 do you remember like what context it was? Yeah. Yeah. Yeah. So, so I was a, I’ve always been a
00:40:42 bit of a cognitive psychology groupie. So like I’m, I studied computer science, but I like,
00:40:47 I like to hang around where the cognitive scientists are. Cause I don’t know brains, man.
00:40:52 They’re like, they’re wacky. Cool. And they have a bigger picture view of things. They’re a little
00:40:57 less engineering. I would say they’re more, they’re more interested in the nature of cognition and
00:41:03 intelligence and perception and how like the vision system work. Like they’re asking always
00:41:07 bigger questions. Now with the deep learning community there, I think more, there’s a lot of
00:41:12 intersections, but I do find that the neuroscience folks actually in cognitive psychology, cognitive
00:41:21 science folks are starting to learn how to program, how to use neural, artificial neural networks.
00:41:27 And they are actually approaching problems in like totally new, interesting ways. It’s fun to
00:41:31 watch that grad students from those departments, like approach a problem of machine learning.
00:41:37 Right. They come in with a different perspective. Yeah. They don’t care about, like, your
00:41:40 ImageNet data set or whatever. They want, like, to understand the, like, the basic
00:41:47 mechanisms at the neuronal level and the functional level of intelligence. It’s kind of
00:41:53 cool to see them work, but yeah. Okay. So you always love, you’re always a groupie
00:41:58 of cognitive psychology. Yeah. Yeah. And so, so it was in a class by Richard Gerrig. He was kind of
00:42:04 like my favorite psych professor in college. And I took like three different classes with him
00:42:11 and yeah. So they were talking specifically the class, I think was kind of a,
00:42:17 there was a big paper that was written by Steven Pinker and Prince. I don’t, I’m blanking on
00:42:22 Prince’s first name, but Prince and Pinker and Prince, they wrote kind of a, they were at that
00:42:28 time kind of like, ah, I’m blanking on the names of the current people. The cognitive scientists
00:42:36 who are complaining a lot about deep networks. Oh, Gary, Gary Marcus, Marcus and who else? I mean,
00:42:44 there’s a few, but Gary, Gary’s the most feisty. Sure. Gary’s very feisty. And with this, with his
00:42:49 coauthor, they, they, you know, they’re kind of doing these kinds of take downs where they say,
00:42:52 okay, well, yeah, it does all these amazing, amazing things, but here’s a shortcoming. Here’s
00:42:56 a shortcoming. Here’s a shortcoming. And so the Pinker Prince paper is kind of like the,
00:43:01 that generation’s version of Marcus and Davis, right? Where they’re, they’re trained as cognitive
00:43:07 scientists, but they’re looking skeptically at the results in the, in the artificial intelligence,
00:43:12 neural net kind of world and saying, yeah, it can do this and this and this, but lo,
00:43:16 it can’t do that. And it can’t do that. And it can’t do that maybe in principle or maybe just
00:43:20 in practice at this point. But, but the fact of the matter is you’re, you’ve narrowed your focus
00:43:26 too far to be impressed. You know, you’re impressed with the things within that circle,
00:43:30 but you need to broaden that circle a little bit. You need to look at a wider set of problems.
00:43:34 And so, so we had, so I was in this seminar in college that was basically a close reading of
00:43:40 the Pinker Prince paper, which was like really thick. There was a lot going on in there. And,
00:43:47 and it, you know, and it talked about the reinforcement learning idea a little bit.
00:43:51 I’m like, oh, that sounds really cool because behavior is what is really interesting to me
00:43:55 about psychology anyway. So making programs that, I mean, programs are things that behave.
00:44:00 People are things that behave. Like I want to make learning that learns to behave.
00:44:05 And which way was reinforcement learning presented? Is this talking about human and
00:44:09 animal behavior or are we talking about an actual mathematical construct?
00:44:12 Ah, that’s right. So that’s a good question. Right. So this is, I think it wasn’t actually
00:44:17 talked about as behavior in the paper that I was reading. I think that it just talked about
00:44:22 learning. And to me, learning is about learning to behave, but really neural nets at that point
00:44:27 were about learning like supervised learning. So learning to produce outputs from inputs.
00:44:31 So I kind of tried to invent reinforcement learning. When I graduated, I joined a research
00:44:36 group at Bellcore, which had spun out of Bell Labs recently at that time because of the divestiture
00:44:42 of the long distance and local phone service in the 1980s, 1984. And I was in a group with
00:44:50 Dave Ackley, who was the first author of the Boltzmann machine paper. So the very first neural
00:44:56 net paper that could handle XOR, right? So XOR sort of killed neural nets. The very first,
00:45:02 the zero with the first winter. Yeah. Um, the Perceptrons paper. And Hinton, along with his
00:45:10 student, Dave Ackley, and I think there were other authors as well, showed that no, no, no,
00:45:14 with Boltzmann machines, we can actually learn nonlinear concepts. And so everything’s back on
00:45:19 the table again. And that kind of started that second wave of neural networks. So Dave Ackley
00:45:24 was, he became my mentor at, at Bellcore and we talked a lot about learning and life and
00:45:30 computation and how all these things fit together. Now Dave and I have a podcast together. So,
00:45:35 so I get to kind of enjoy that sort of his, his perspective once again, even, even all these years
00:45:42 later. And so I said, so I said, I was really interested in learning, but in the concept of
00:45:48 behavior and he’s like, oh, well that’s reinforcement learning here. And he gave me
00:45:52 Rich Sutton’s 1984 TD paper. So I read that paper. I honestly didn’t get all of it,
00:45:58 but I got the idea. I got that they were using, that he was using ideas that I was familiar with
00:46:04 in the context of neural nets and, and like sort of back prop. But with this idea of making
00:46:09 predictions over time, I’m like, this is so interesting, but I don’t really get all the
00:46:13 details I said to Dave. And Dave said, oh, well, why don’t we have him come and give a talk?
00:46:18 And I was like, wait, what, you can do that? Like, these are real people. I thought they
00:46:23 were just words. I thought it was just like ideas that somehow magically seeped into paper. He’s
00:46:28 like, no, I, I, I know Rich like, we’ll just have him come down and he’ll give a talk. And so I was,
00:46:35 you know, my mind was blown. And so Rich came and he gave a talk at Bellcore and he talked about
00:46:41 what he was super excited about, which was they had just figured out at the time Q learning. So Watkins
00:46:48 had visited Rich Sutton’s lab at UMass, or Andy Barto’s lab that Rich was a part of.
00:46:55 And, um, he was really excited about this because it resolved a whole bunch of problems that he
00:47:00 didn’t know how to resolve in the, in the earlier paper. And so, um,
00:47:05 For people who don’t know TD, temporal difference, these are all just algorithms
00:47:09 for reinforcement learning.
00:47:10 Right. And TD, temporal difference in particular is about making predictions over time. And you can
00:47:15 try to use it for making decisions, right? Cause if you can predict how good a future action or an
00:47:19 action’s outcomes will be in the future, you can choose one that has better outcomes. But the thing
00:47:24 that’s really cool about Q learning is it was off policy, which meant that you could actually be
00:47:29 learning about the environment and what the value of different actions would be while actually
00:47:33 figuring out how to behave optimally. So that was a revelation.
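
For reference, the Q-learning update being described is usually written, in standard textbook notation, as

$$Q(s,a) \leftarrow Q(s,a) + \alpha\bigl[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\bigr],$$

where the max over next actions is what makes it off-policy: the estimate tracks the greedy policy even while the agent behaves exploratorily.
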
00:47:38 Yeah. And the proof of that is kind of interesting. I mean, that’s really surprising
00:47:51 to me when I first read that, and then in Richard,
00:47:55 Rich Sutton’s book on the matter. It’s kind of beautiful that a single equation can
00:48:01 capture all of it, one line of code, and like, you can learn anything. Yeah. Given enough time.
00:48:06 So equation and code, you’re right. Like, the code, you can arguably, at least
00:48:13 if you like squint your eyes, say,
00:48:17 this is all of intelligence, that you can implement
00:48:21 it in a single one.
00:48:22 I think I started with Lisp, which is a shout out to Lisp
00:48:26 with like a single line of code, key piece of code,
00:48:29 maybe a couple that you could do that.
00:48:32 It’s kind of magical.
00:48:33 It feels too good to be true.
00:48:37 Well, and it sort of is.
00:48:38 Yeah, kind of.
00:48:40 It seems to require an awful lot
00:48:41 of extra stuff supporting it.
00:48:43 But nonetheless, the idea is really good.
00:48:46 And as far as we know, it is a very reasonable way
00:48:50 of trying to create adaptive behavior,
00:48:52 behavior that gets better at something over time.
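
To make the "one line of code" idea concrete, here is a minimal tabular sketch, in Python rather than the Lisp mentioned above; the environment interface (env.reset, env.step, env.actions) is a placeholder assumption, not any particular library, and the single update line is the equation given earlier.

    import random
    from collections import defaultdict

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        # Q maps (state, action) pairs to estimated return; unseen entries start at zero.
        Q = defaultdict(float)
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # Behave epsilon-greedily: explore sometimes, otherwise act greedily.
                if random.random() < epsilon:
                    action = random.choice(env.actions(state))
                else:
                    action = max(env.actions(state), key=lambda a: Q[(state, a)])
                next_state, reward, done = env.step(action)
                best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions(next_state))
                # The "one line": the off-policy temporal-difference update itself.
                Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
                state = next_state
        return Q

Everything around that one update line is the "awful lot of extra stuff supporting it" that the conversation goes on to mention: exploration, bookkeeping, and an environment to interact with.
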
00:48:56 Did you find the idea of optimal at all compelling
00:49:00 that you could prove that it’s optimal?
00:49:02 So like one part of computer science
00:49:04 that it makes people feel warm and fuzzy inside
00:49:08 is when you can prove something like
00:49:10 that a sorting algorithm worst case runs
00:49:13 in N log N, and it makes everybody feel so good.
00:49:16 Even though in reality, it doesn’t really matter
00:49:18 what the worst case is, what matters is like,
00:49:20 does this thing actually work in practice
00:49:22 on this particular actual set of data that I enjoy?
00:49:26 Did you?
00:49:26 So here’s a place where I have maybe a strong opinion,
00:49:29 which is like, you’re right, of course, but no, no.
00:49:34 Like, so what makes worst case so great, right?
00:49:37 If you have a worst case analysis so great
00:49:39 is that you get modularity.
00:49:41 You can take that thing and plug it into another thing
00:49:44 and still have some understanding of what’s gonna happen
00:49:47 when you click them together, right?
00:49:49 If it just works well in practice, in other words,
00:49:51 with respect to some distribution that you care about,
00:49:54 when you go plug it into another thing,
00:49:56 that distribution can shift, it can change,
00:49:58 and your thing may not work well anymore.
00:50:00 And you want it to, and you wish it does,
00:50:02 and you hope that it will, but it might not,
00:50:04 and then, ah.
00:50:06 So you’re saying you don’t like machine learning.
00:50:13 But we have some positive theoretical results
00:50:15 for these things.
00:50:17 You can come back at me with,
00:50:20 yeah, but they’re really weak,
00:50:21 and yeah, they’re really weak.
00:50:22 And you can even say that sorting algorithms,
00:50:25 like if you do the optimal sorting algorithm,
00:50:27 it’s not really the one that you want,
00:50:30 and that might be true as well.
00:50:31 But it is, the modularity is a really powerful statement.
00:50:34 I really like that.
00:50:35 If you’re an engineer, you can then assemble
00:50:36 different things, you can count on them to be,
00:50:39 I mean, it’s interesting.
00:50:42 It’s a balance, like with everything else in life,
00:50:45 you don’t want to get too obsessed.
00:50:47 I mean, this is what computer scientists do,
00:50:48 which they tend to get obsessed,
00:50:51 and they overoptimize things,
00:50:53 or they start by optimizing, and then they overoptimize.
00:50:56 So it’s easy to get really granular about this thing,
00:51:00 but like the step from an n squared to an n log n
00:51:06 sorting algorithm is a big leap for most real world systems.
00:51:10 No matter what the actual behavior of the system is,
00:51:13 that’s a big leap.
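
To put rough numbers on that leap (an illustrative calculation, for a million items):

$$n = 10^6:\quad n^2 = 10^{12}\ \text{operations}, \qquad n\log_2 n \approx 10^6 \times 20 = 2\times 10^{7},$$

a gap of roughly a factor of fifty thousand, regardless of constant factors.
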
00:51:14 And the same can probably be said
00:51:17 for other kind of first leaps
00:51:20 that you would take on a particular problem.
00:51:22 Like it’s picking the low hanging fruit,
00:51:25 or whatever the equivalent of doing the,
00:51:29 not the dumbest thing, but the next to the dumbest thing.
00:51:32 Picking the most delicious reachable fruit.
00:51:34 Yeah, most delicious reachable fruit.
00:51:36 I don’t know why that’s not a saying.
00:51:38 Yeah.
00:51:39 Okay, so then this is the 80s,
00:51:44 and this kind of idea starts to percolate of learning.
00:51:47 At that point, I got to meet Rich Sutton,
00:51:50 so everything was sort of downhill from there,
00:51:52 and that was really the pinnacle of everything.
00:51:55 But then I felt like I was kind of on the inside.
00:51:58 So then as interesting results were happening,
00:52:00 I could like check in with Rich or with Jerry Tesauro,
00:52:03 who had a huge impact on kind of early thinking
00:52:06 in temporal difference learning and reinforcement learning
00:52:10 and showed that you could do,
00:52:11 you could solve problems
00:52:12 that we didn’t know how to solve any other way.
00:52:16 And so that was really cool.
00:52:17 So as good things were happening,
00:52:18 I would hear about it from either the people
00:52:20 who were doing it,
00:52:21 or the people who were talking to the people
00:52:23 who were doing it.
00:52:23 And so I was able to track things pretty well
00:52:25 through the 90s.
00:51:28 So wasn’t most of the excitement
00:51:32 in reinforcement learning in the 90s era
00:51:34 with, what is it, TD Gammon?
00:52:37 Like what’s the role of these kind of little
00:52:40 like fun game playing things and breakthroughs
00:52:43 about exciting the community?
00:52:46 Was that, like what were your,
00:52:48 because you’ve also built,
00:52:50 or been part of building, a crossword puzzle
00:52:56 solving program called Proverb.
00:53:00 So you were interested in this as a problem,
00:53:05 like in forming, using games to understand
00:53:09 how to build intelligence systems.
00:53:12 So like, what did you think about TD Gammon?
00:53:14 Like what did you think about that whole thing in the 90s?
00:53:16 Yeah, I mean, I found the TD Gammon result
00:53:19 really just remarkable.
00:53:20 So I had known about some of Jerry’s stuff
00:53:22 before he did TD Gammon and he did a system,
00:53:24 just more vanilla, well, not entirely vanilla,
00:53:27 but a more classical back proppy kind of network
00:53:31 for playing backgammon,
00:53:32 where he was training it on expert moves.
00:53:35 So it was kind of supervised,
00:53:37 but the way that it worked was not to mimic the actions,
00:53:41 but to learn internally an evaluation function.
00:53:44 So to learn, well, if the expert chose this over this,
00:53:47 that must mean that the expert values this more than this.
00:53:50 And so let me adjust my weights to make it
00:53:52 so that the network evaluates this
00:53:54 as being better than this.
00:53:56 So it could learn from human preferences,
00:53:59 it could learn its own preferences.
00:54:02 And then when he took the step from that
00:54:04 to actually doing it
00:54:06 as a full on reinforcement learning problem,
00:54:08 where you didn’t need a trainer,
00:54:10 you could just let it play, that was remarkable, right?
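Here is a tiny sketch of that comparison-training idea (plain numpy, with a linear evaluation function and invented features; Tesauro's actual network and data were of course different): given a position the expert chose and one they passed over, nudge the evaluation so the chosen position scores higher.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(8)                       # weights of a linear evaluation function

def value(features):
    return w @ features               # evaluation of a position

def preference_update(chosen, other, lr=0.1):
    # Logistic preference loss: raise value(chosen) relative to value(other).
    global w
    p = 1.0 / (1.0 + np.exp(value(other) - value(chosen)))  # P(chosen preferred)
    w += lr * (1.0 - p) * (chosen - other)                   # gradient of log-likelihood

# Fake "expert" data: the expert always prefers positions with a larger first feature.
for _ in range(2000):
    a, b = rng.normal(size=8), rng.normal(size=8)
    chosen, other = (a, b) if a[0] > b[0] else (b, a)
    preference_update(chosen, other)

print(np.round(w, 2))   # the weight on feature 0 dominates, as it should
```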
00:54:13 And so I think as humans often do,
00:54:17 as we’ve done in the recent past as well,
00:54:20 people extrapolate.
00:54:22 It’s like, oh, well, if you can do that,
00:54:23 which is obviously very hard,
00:54:24 then obviously you could do all these other problems
00:54:27 that we wanna solve that we know are also really hard.
00:54:31 And it turned out very few of them ended up being practical,
00:54:35 partly because I think neural nets,
00:54:38 certainly at the time,
00:54:39 were struggling to be consistent and reliable.
00:54:42 And so training them in a reinforcement learning setting
00:54:45 was a bit of a mess.
00:54:46 I had, I don’t know, generation after generation
00:54:50 of like master students
00:54:51 who wanted to do value function approximation,
00:54:55 basically reinforcement learning with neural nets.
00:54:59 And over and over and over again, we were failing.
00:55:03 We couldn’t get the good results that Jerry Tesauro got.
00:55:06 I now believe that Jerry is a neural net whisperer.
00:55:09 He has a particular ability to get neural networks
00:55:14 to do things that other people would find impossible.
00:55:18 And it’s not the technology,
00:55:19 it’s the technology and Jerry together.
00:55:22 Which I think speaks to the role of the human expert
00:55:27 in the process of machine learning.
00:55:28 Right, it’s so easy.
00:55:30 We’re so drawn to the idea that it’s the technology
00:55:32 that is where the power is coming from
00:55:36 that I think we lose sight of the fact
00:55:38 that sometimes you need a really good,
00:55:39 just like, I mean, no one would think,
00:55:40 hey, here’s this great piece of software.
00:55:42 Here’s like, I don’t know, GNU Emacs or whatever.
00:55:44 And doesn’t that prove that computers are super powerful
00:55:48 and basically gonna take over the world?
00:55:49 It’s like, no, Stallman is a hell of a hacker, right?
00:55:52 So he was able to make the code do these amazing things.
00:55:55 He couldn’t have done it without the computer,
00:55:57 but the computer couldn’t have done it without him.
00:55:59 And so I think people discount the role of people
00:56:02 like Jerry who have just a particular set of skills.
00:56:07 On that topic, by the way, as a small side note,
00:56:10 I tweeted Emacs is greater than Vim yesterday
00:56:14 and deleted the tweet 10 minutes later
00:56:18 when I realized it started a war.
00:56:21 I was like, oh, I was just kidding.
00:56:24 I was just being, and I’m gonna walk it back.
00:56:29 So people still feel passionately
00:56:30 about that particular piece of good stuff.
00:56:32 Yeah, I don’t get that
00:56:33 because Emacs is clearly so much better, I don’t understand.
00:56:37 But why do I say that?
00:56:38 Because I spent a block of time in the 80s
00:56:43 making my fingers know the Emacs keys
00:56:46 and now that’s part of the thought process for me.
00:56:49 Like I need to express, and if you take that,
00:56:51 if you take my Emacs key bindings away, I become…
00:56:57 I can’t express myself.
00:56:58 I’m the same way with the,
00:56:59 I don’t know if you know what it is,
00:57:01 but it’s a Kinesis keyboard, which is this butt shaped keyboard.
00:57:05 Yes, I’ve seen them.
00:57:06 They’re very, I don’t know, sexy, elegant?
00:57:10 They’re just beautiful.
00:57:11 Yeah, they’re gorgeous, way too expensive.
00:57:14 But the problem with them, similar with Emacs,
00:57:19 is once you learn to use it.
00:57:23 It’s harder to use other things.
00:57:24 It’s hard to use other things.
00:57:26 There’s this absurd thing where I have like small, elegant,
00:57:29 lightweight, beautiful little laptops
00:57:31 and I’m sitting there in a coffee shop
00:57:33 with a giant Kinesis keyboard and a sexy little laptop.
00:57:36 It’s absurd, but I used to feel bad about it,
00:57:40 but at the same time, you just kind of have to,
00:57:42 sometimes it’s back to the Billy Joel thing.
00:57:44 You just have to throw that Billy Joel record
00:57:47 and throw Taylor Swift and Justin Bieber to the wind.
00:57:51 So…
00:57:52 See, but I like them now because again,
00:57:54 I have no musical taste.
00:57:55 Like now that I’ve heard Justin Bieber enough,
00:57:57 I’m like, I really like his songs.
00:57:59 And Taylor Swift, not only do I like her songs,
00:58:02 but my daughter’s convinced that she’s a genius.
00:58:04 And so now I basically have signed onto that.
00:58:07 So…
00:58:08 So yeah, that speaks to the,
00:58:10 back to the robustness of the human brain.
00:58:11 That speaks to the neuroplasticity
00:58:13 that you can just like a mouse teach yourself to,
00:58:17 or probably a dog teach yourself to enjoy Taylor Swift.
00:58:21 I’ll try it out.
00:58:22 I don’t know.
00:58:23 I try, you know what?
00:58:25 It has to do with just like acclimation, right?
00:58:28 Just like you said, a couple of weeks.
00:58:29 Yeah.
00:58:30 That’s an interesting experiment.
00:58:31 I’ll actually try that.
00:58:32 Like I’ll listen to it.
00:58:33 That wasn’t the intent of the experiment?
00:58:33 Just like social media,
00:58:34 it wasn’t intended as an experiment
00:58:36 to see what we can take as a society,
00:58:38 but it turned out that way.
00:58:39 I don’t think I’ll be the same person
00:58:40 on the other side of the week listening to Taylor Swift,
00:58:43 but let’s try.
00:58:44 No, it’s more compartmentalized.
00:58:45 Don’t be so worried.
00:58:46 Like it’s, like I get that you can be worried,
00:58:48 but don’t be so worried
00:58:49 because we compartmentalize really well.
00:58:51 And so it won’t bleed into other parts of your life.
00:58:53 You won’t start, I don’t know,
00:58:56 wearing red lipstick or whatever.
00:58:57 Like it’s fine.
00:58:58 It’s fine.
00:58:59 It changed fashion and everything.
00:58:59 It’s fine.
00:59:00 But you know what?
00:59:01 The thing you have to watch out for
00:59:02 is you’ll walk into a coffee shop
00:59:03 once we can do that again.
00:59:05 And recognize the song?
00:59:06 And you’ll be, no,
00:59:07 you won’t know that you’re singing along
00:59:09 until everybody in the coffee shop is looking at you.
00:59:11 And then you’re like, that wasn’t me.
00:59:16 Yeah, that’s the, you know,
00:59:17 people are afraid of AGI.
00:59:18 I’m afraid of the Taylor Swift.
00:59:21 The Taylor Swift takeover.
00:59:22 Yeah, and I mean, people should know that TD Gammon was,
00:59:26 I guess, would you call it,
00:59:28 do you like the terminology of self play by any chance?
00:59:31 So like systems that learn by playing themselves.
00:59:35 Just, I don’t know if it’s the best word, but.
00:59:38 So what’s the problem with that term?
00:59:41 I don’t know.
00:59:42 So it’s like the big bang,
00:59:43 like it’s like talking to a serious physicist.
00:59:46 Do you like the term big bang?
00:59:47 And when it was early,
00:59:49 I feel like it’s the early days of self play.
00:59:51 I don’t know, maybe it was used previously,
00:59:53 but I think it’s been used by only a small group of people.
00:59:57 And so like, I think we’re still deciding
00:59:59 is this ridiculously silly name a good name
01:00:02 for potentially one of the most important concepts
01:00:05 in artificial intelligence?
01:00:07 Okay, it depends how broadly you apply the term.
01:00:09 So I used the term in my 1996 PhD dissertation.
01:00:12 Wow, the actual terms of self play.
01:00:14 Yeah, because Tesauro’s paper was something like
01:00:18 training up an expert backgammon player through self play.
01:00:21 So I think it was in the title of his paper.
01:00:24 If not in the title, it was definitely a term that he used.
01:00:27 There’s another term that we got from that work is rollout.
01:00:29 So I don’t know if you, do you ever hear the term rollout?
01:00:32 That’s a backgammon term that is now applied
01:00:35 generally in computers, well, at least in AI,
01:00:38 because of TD Gammon.
01:00:40 That’s fascinating.
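For readers who haven't met the term, a rollout in this sense can be sketched in a few lines (a game-agnostic skeleton; `legal_moves`, `apply`, `outcome`, and `is_over` are hypothetical hooks you would supply for a real game): estimate how good a position is by playing it out to the end many times under a cheap policy and averaging the results.

```python
import random

def rollout_value(state, legal_moves, apply, outcome, is_over, n=100):
    """Monte Carlo rollout: average the outcome of n random play-outs from `state`."""
    total = 0.0
    for _ in range(n):
        s = state
        while not is_over(s):
            s = apply(s, random.choice(legal_moves(s)))  # apply returns a new state
        total += outcome(s)   # e.g. +1 for a win, -1 for a loss from our side's view
    return total / n
```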
01:00:41 So how is self play being used now?
01:00:43 And like, why is it,
01:00:44 does it feel like a more general powerful concept
01:00:46 is sort of the idea of,
01:00:47 well, the machine’s just gonna teach itself to be smart.
01:00:50 Yeah, so that’s where maybe you can correct me,
01:00:53 but that’s where the continuation of the spirit
01:00:56 and actually like literally the exact algorithms
01:01:00 of TD gammon are applied by DeepMind and OpenAI
01:01:03 to learn games that are a little bit more complex
01:01:07 that when I was learning artificial intelligence,
01:01:09 Go was presented to me
01:01:10 in Artificial Intelligence: A Modern Approach.
01:01:13 I don’t know if they explicitly pointed to Go
01:01:16 in those books as like unsolvable kind of thing,
01:01:20 like implying that these approaches hit their limit
01:01:24 in this, with these particular kind of games.
01:01:26 So something, I don’t remember if the book said it or not,
01:01:29 but something in my head,
01:01:31 or if it was the professors instilled in me the idea
01:01:34 like this is the limits of artificial intelligence
01:01:37 of the field.
01:01:38 Like it instilled in me the idea
01:01:40 that if we can create a system that can solve the game of Go
01:01:44 we’ve achieved AGI.
01:01:46 That was kind of, I didn’t explicitly like say this,
01:01:49 but that was the feeling.
01:01:51 And so, I was one of the people for whom it seemed magical
01:01:54 when a learning system was able to beat
01:01:59 a human world champion at the game of Go
01:02:02 and even more so from that, that was AlphaGo,
01:02:06 even more so with AlphaGo Zero
01:02:08 then kind of renamed and advanced into AlphaZero,
01:02:11 beating a world champion or world class player
01:02:16 without any supervised learning on expert games.
01:02:21 learning only by playing itself.
01:02:24 So that is, I don’t know what to make of it.
01:02:29 I think it would be interesting to hear
01:02:31 what your opinions are on just how exciting,
01:02:35 surprising, profound, interesting, or boring
01:02:40 the breakthrough performance of AlphaZero was.
01:02:45 Okay, so AlphaGo knocked my socks off.
01:02:48 That was so remarkable.
01:02:50 Which aspect of it?
01:02:52 That they got it to work,
01:02:55 that they actually were able to leverage
01:02:57 a whole bunch of different ideas,
01:02:58 integrate them into one giant system.
01:03:01 Just the software engineering aspect of it is mind blowing.
01:03:04 I don’t, I’ve never been a part of a program
01:03:06 as complicated as the program that they built for that.
01:03:09 And just the, like Jerry Tesauro is a neural net whisperer,
01:03:14 like David Silver is a kind of neural net whisperer too.
01:03:17 He was able to coax these networks
01:03:19 and these new way out there architectures
01:03:22 to do these, solve these problems that,
01:03:25 as you said, when we were learning AI,
01:03:31 no one had an idea how to make it work.
01:03:32 It was remarkable that these techniques
01:03:35 that were so good at playing chess
01:03:40 and that could beat the world champion in chess
01:03:42 couldn’t beat your typical Go playing teenager in Go.
01:03:46 So the fact that in a very short number of years,
01:03:49 we kind of ramped up to trouncing people in Go
01:03:54 just blew me away.
01:03:55 So you’re kind of focusing on the engineering aspect,
01:03:58 which is also very surprising.
01:04:00 I mean, there’s something different
01:04:02 about large, well funded companies.
01:04:05 I mean, there’s a compute aspect to it too.
01:04:07 Like that, of course, I mean, that’s similar
01:04:11 to Deep Blue, right, with IBM.
01:04:14 Like there’s something important to be learned
01:04:16 and remembered about a large company
01:04:19 taking the ideas that are already out there
01:04:22 and investing a few million dollars into it or more.
01:04:26 And so you’re kind of saying the engineering
01:04:29 is kind of fascinating, both on the,
01:04:32 with AlphaGo is probably just gathering all the data,
01:04:35 right, of the expert games, like organizing everything,
01:04:38 actually doing distributed supervised learning.
01:04:42 And to me, see the engineering I kind of took for granted,
01:04:49 to me philosophically being able to persist
01:04:55 in the face of like long odds,
01:04:57 because it feels like for me,
01:05:00 I would be one of the skeptical people in the room
01:05:02 thinking that you can learn your way to beat Go.
01:05:05 Like it sounded like, especially with David Silver,
01:05:08 it sounded like David was not confident at all.
01:05:11 So like it was, like not,
01:05:15 it’s funny how confidence works.
01:05:18 It’s like, you’re not like cocky about it, like, but.
01:05:24 Right, because if you’re cocky about it,
01:05:26 you kind of stop and stall and don’t get anywhere.
01:05:28 But there’s like a hope that’s unbreakable.
01:05:31 Maybe that’s better than confidence.
01:05:33 It’s a kind of wishful hope and a little dream.
01:05:36 And you almost don’t want to do anything else.
01:05:38 You kind of keep doing it.
01:05:40 That’s, that seems to be the story and.
01:05:43 But with enough skepticism that you’re looking
01:05:45 for where the problems are and fighting through them.
01:05:48 Cause you know, there’s gotta be a way out of this thing.
01:05:51 And for him, it was probably,
01:05:52 there’s a bunch of little factors that come into play.
01:05:55 It’s funny how these stories just all come together.
01:05:57 Like everything he did in his life came into play,
01:06:00 which is like a love for video games
01:06:02 and also a connection to,
01:06:05 so the nineties had to happen with TD Gammon and so on.
01:06:09 In some ways it’s surprising,
01:06:10 maybe you can provide some intuition to it
01:06:13 that not much more than TD Gammon was done
01:06:16 for quite a long time on the reinforcement learning front.
01:06:19 Is that weird to you?
01:06:21 I mean, like I said, the students who I worked with,
01:06:24 we tried to get, basically apply that architecture
01:06:27 to other problems and we consistently failed.
01:06:30 There were a couple of really nice demonstrations
01:06:33 that ended up being in the literature.
01:06:35 There was a paper about controlling elevators, right?
01:06:38 Where it’s like, okay, can we modify the heuristic
01:06:42 that elevators use for deciding,
01:06:43 like a bank of elevators for deciding which floors
01:06:46 we should be stopping on to maximize throughput essentially.
01:06:50 And you can set that up as a reinforcement learning problem
01:06:52 and you can have a neural net represent the value function
01:06:55 so that it’s taking where all the elevators,
01:06:57 where the button pushes, you know, this high dimensional,
01:07:00 well, at the time high dimensional input,
01:07:03 you know, a couple of dozen dimensions
01:07:05 and turn that into a prediction as to,
01:07:07 oh, is it gonna be better if I stop at this floor or not?
01:07:10 And ultimately it appeared as though
01:07:13 for the standard simulation distribution
01:07:16 for people trying to leave the building
01:07:18 at the end of the day,
01:07:19 that the neural net learned a better strategy
01:07:21 than the standard one that’s implemented
01:07:22 in elevator controllers.
01:07:24 So that was nice.
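A toy sketch of that kind of setup (the feature count, rewards, and random stand-in "simulator" below are invented for illustration, and a linear value function stands in for the neural net in the work Littman describes): a couple of dozen features summarizing the bank of elevators, with TD(0) updates toward a throughput-style reward.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 24           # e.g. elevator positions, pending hall calls (made-up encoding)
w = np.zeros(N_FEATURES)  # linear value function: V(s) = w . features(s)

def td0_update(features, reward, next_features, alpha=0.01, gamma=0.99):
    # One TD(0) step: move V(s) toward reward + gamma * V(s').
    global w
    td_error = reward + gamma * (w @ next_features) - (w @ features)
    w += alpha * td_error * features

# Stand-in for the building simulator: random feature vectors and a penalty
# per waiting passenger, just to show the shape of the learning loop.
s = rng.random(N_FEATURES)
for step in range(10_000):
    s_next = rng.random(N_FEATURES)
    reward = -float(rng.integers(0, 5))   # e.g. minus the number of people still waiting
    td0_update(s, reward, s_next)
    s = s_next
```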
01:07:26 There was some work that Satinder Singh et al
01:07:28 did on handoffs with cell phones,
01:07:34 you know, deciding when should you hand off
01:07:36 from this cell tower to this cell tower.
01:07:38 Oh, okay, communication networks, yeah.
01:07:39 Yeah, and so a couple of things
01:07:42 seemed like they were really promising.
01:07:44 None of them made it into production that I’m aware of.
01:07:46 And neural nets as a whole started
01:07:48 to kind of implode around then.
01:07:50 And so there just wasn’t a lot of air in the room
01:07:53 for people to try to figure out,
01:07:55 okay, how do we get this to work in the RL setting?
01:07:58 And then they found their way back in 10 plus years.
01:08:03 So you said AlphaGo was impressive,
01:08:05 like it’s a big spectacle.
01:08:06 Is there, is that?
01:08:07 Right, so then AlphaZero.
01:08:09 So I think I may have a slightly different opinion
01:08:11 on this than some people.
01:08:12 So I talked to Satinder Singh in particular about this.
01:08:15 So Satinder was, like Rich Sutton,
01:08:18 a student of Andy Barto.
01:08:19 So they came out of the same lab,
01:08:21 very influential machine learning,
01:08:23 reinforcement learning researcher.
01:08:26 Now at DeepMind, as is Rich.
01:08:29 Though different sites, the two of them.
01:08:31 He’s in Alberta.
01:08:33 Rich is in Alberta and Satinder would be in England,
01:08:36 but I think he’s in England from Michigan at the moment.
01:08:39 But the, but he was, yes,
01:08:41 he was much more impressed with AlphaGo Zero,
01:08:46 which didn’t get a kind of bootstrap
01:08:50 in the beginning with human trained games.
01:08:51 It just was purely self play.
01:08:53 Though the first one, AlphaGo,
01:08:55 also involved a tremendous amount of self play, right?
01:08:58 They started off, they kickstarted the action network
01:09:01 that was making decisions,
01:09:02 but then they trained it for a really long time
01:09:04 using more traditional temporal difference methods.
01:09:08 So as a result, I didn’t,
01:09:09 it didn’t seem that different to me.
01:09:11 Like, it seems like, yeah, why wouldn’t that work?
01:09:15 Like once you, once it works, it works.
01:09:17 So what, but he found that removal
01:09:21 of that extra information to be breathtaking.
01:09:23 Like that’s a game changer.
01:09:25 To me, the first thing was more of a game changer.
01:09:27 But the open question, I mean,
01:09:29 I guess that’s the assumption is the expert games
01:09:32 might contain within them a humongous amount of information.
01:09:39 But we know that it went beyond that, right?
01:09:41 We know that it somehow got away from that information
01:09:43 because it was learning strategies.
01:09:45 I don’t think AlphaGo is just better
01:09:48 at implementing human strategies.
01:09:50 I think it actually developed its own strategies
01:09:52 that were more effective.
01:09:54 And so from that perspective, okay, well,
01:09:56 so it made at least one quantum leap
01:10:00 in terms of strategic knowledge.
01:10:02 Okay, so now maybe it makes three, like, okay.
01:10:05 But that first one is the doozy, right?
01:10:07 Getting it to work reliably and for the networks
01:10:11 to hold onto the value well enough.
01:10:13 Like that was a big step.
01:10:16 Well, maybe you could speak to this
01:10:17 on the reinforcement learning front.
01:10:19 So starting from scratch and learning to do something,
01:10:25 like the first like random behavior
01:10:29 to like crappy behavior to like somewhat okay behavior.
01:10:34 It’s not obvious to me that that’s not like impossible
01:10:39 to take those steps.
01:10:41 Like if you just think about the intuition,
01:10:43 like how the heck does random behavior
01:10:46 become somewhat basic intelligent behavior?
01:10:51 Not human level, not superhuman level, but just basic.
01:10:55 But you’re saying to you kind of the intuition is like,
01:10:58 if you can go from human to superhuman level intelligence
01:11:01 on this particular task of game playing,
01:11:04 then so you’re good at taking leaps.
01:11:07 So you can take many of them.
01:11:08 That the system, I believe that the system
01:11:10 can take that kind of leap.
01:11:12 Yeah, and also I think that beginner knowledge in go,
01:11:17 like you can start to get a feel really quickly
01:11:19 for the idea that being in certain parts of the board
01:11:25 seems to be more associated with winning, right?
01:11:28 Cause it’s not stumbling upon the concept of winning.
01:11:32 It’s told that it wins or that it loses.
01:11:34 Well, it’s self play.
01:11:35 So it both wins and loses.
01:11:36 It’s told which side won.
01:11:39 And the information is kind of there
01:11:41 to start percolating around to make a difference as to,
01:11:46 well, these things have a better chance of helping you win.
01:11:48 And these things have a worse chance of helping you win.
01:11:50 And so it can get to basic play, I think pretty quickly.
01:11:54 Then once it has basic play,
01:11:55 well now it’s kind of forced to do some search
01:11:58 to actually experiment with, okay,
01:12:00 well what gets me that next increment of improvement?
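A minimal self-play sketch in that spirit (a toy game of Nim with a tabular value function, nothing like TD-Gammon's network, and all the constants are arbitrary): the only signal is who took the last stone, and that win/loss information percolates backward until basic play emerges.

```python
import random

# Nim: players alternate removing 1-3 stones from a pile; taking the last stone wins.
ACTIONS = [1, 2, 3]
Q = {}               # Q[(pile, action)] -> value from the player-to-move's perspective
alpha, eps = 0.1, 0.2

def q(pile, a):
    return Q.get((pile, a), 0.0)

def best_value(pile):
    return max(q(pile, a) for a in ACTIONS if a <= pile)

def choose(pile):
    legal = [a for a in ACTIONS if a <= pile]
    if random.random() < eps:
        return random.choice(legal)                 # explore
    return max(legal, key=lambda a: q(pile, a))     # exploit

for episode in range(20_000):                       # both "players" share Q: self play
    pile = 10
    while pile > 0:
        a = choose(pile)
        nxt = pile - a
        # Terminal reward only; otherwise negate the opponent's best value,
        # which is how the win/loss signal percolates back through the game.
        target = 1.0 if nxt == 0 else -best_value(nxt)
        Q[(pile, a)] = q(pile, a) + alpha * (target - q(pile, a))
        pile = nxt

# From winning piles, greedy play learns to leave the opponent a multiple of 4.
print({p: max((a for a in ACTIONS if a <= p), key=lambda a: q(p, a)) for p in range(1, 11)})
```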
01:12:04 How far do you think, okay, this is where you kind of
01:12:07 bring up the Elon Musk and the Sam Harris, right?
01:12:10 How far is your intuition about these kinds
01:12:13 of self play mechanisms being able to take us?
01:12:16 Cause it feels, one of the ominous but stated calmly things
01:12:23 that when I talked to David Silver, he said,
01:12:25 is that they have not yet discovered a ceiling
01:12:29 for Alpha Zero, for example, in the game of Go or chess.
01:12:32 Like it keeps, no matter how much they compute,
01:12:35 they throw at it, it keeps improving.
01:12:37 So it’s possible, it’s very possible that if you throw,
01:12:43 you know, some like 10 X compute that it will improve
01:12:46 by five X or something like that.
01:12:48 And when stated calmly, it’s so like, oh yeah, I guess so.
01:12:54 But like, and then you think like,
01:12:56 well, can we potentially have like continuations
01:13:00 of Moore’s law in totally different way,
01:13:02 like broadly defined Moore’s law,
01:13:04 not the exponential improvement, like,
01:13:08 are we going to have an Alpha Zero that swallows the world?
01:13:13 But notice it’s not getting better at other things.
01:13:15 It’s getting better at Go.
01:13:16 And I think that’s a big leap to say,
01:13:19 okay, well, therefore it’s better at other things.
01:13:22 Well, I mean, the question is how much of the game of life
01:13:26 can be turned into.
01:13:27 Right, so that I think is a really good question.
01:13:30 And I think that we don’t, I don’t think we as a,
01:13:32 I don’t know, community really know the answer to this,
01:13:34 but so, okay, so I went to a talk
01:13:39 by some experts on computer chess.
01:13:43 So in particular, computer chess is really interesting
01:13:45 because for, of course, for a thousand years,
01:13:49 humans were the best chess playing things on the planet.
01:13:52 And then computers like edged ahead of the best person.
01:13:56 And they’ve been ahead ever since.
01:13:57 It’s not like people have overtaken computers.
01:14:01 But computers and people together
01:14:05 have overtaken computers.
01:14:07 So at least last time I checked,
01:14:09 I don’t know what the very latest is,
01:14:10 but last time I checked that there were teams of people
01:14:14 who could work with computer programs
01:14:16 to defeat the best computer programs.
01:14:17 In the game of Go?
01:14:18 In the game of chess.
01:14:19 In the game of chess.
01:14:20 Right, and so using the information about how,
01:14:25 these things called ELO scores,
01:14:27 this sort of notion of how strong a player are you.
01:14:30 There’s kind of a range of possible scores.
01:14:32 And you increment in score,
01:14:35 basically if you can beat another player
01:14:37 of that lower score 62% of the time or something like that.
01:14:41 Like there’s some threshold
01:14:42 of if you can somewhat consistently beat someone,
01:14:46 then you are of a higher score than that person.
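For reference, the standard Elo expected-score formula behind that kind of threshold (a side note, not something derived in the conversation):

```python
# Expected score of player A against player B under the standard Elo model.
def elo_expected(r_a, r_b):
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# A roughly 100-point rating gap corresponds to winning about 64% of the time,
# in the ballpark of the "somewhat consistently beat someone" threshold above.
print(elo_expected(1600, 1500))   # ~0.64
```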
01:14:48 And there’s a question as to how many times
01:14:50 can you do that in chess, right?
01:14:52 And so we know that there’s a range of human ability levels
01:14:55 that cap out with the best playing humans.
01:14:57 And the computers went a step beyond that.
01:15:00 And computers and people together have not gone,
01:15:03 I think a full step beyond that.
01:15:05 It feels like the estimates that they have
01:15:07 are that it’s starting to asymptote.
01:15:09 That we’ve reached kind of the maximum,
01:15:11 the best possible chess playing.
01:15:13 And so that means that there’s kind of
01:15:15 a finite strategic depth, right?
01:15:18 At some point you just can’t get any better at this game.
01:15:21 Yeah, I mean, I don’t, so I’ll actually check that.
01:15:25 I think it’s interesting because if you have somebody
01:15:29 like Magnus Carlsen, who’s using these chess programs
01:15:34 to train his mind, like to learn about chess.
01:15:37 To become a better chess player, yeah.
01:15:38 And so like, that’s a very interesting thing
01:15:41 because we’re not static creatures.
01:15:43 We’re learning together.
01:15:45 I mean, just like we’re talking about social networks,
01:15:47 those algorithms are teaching us
01:15:49 just like we’re teaching those algorithms.
01:15:51 So that’s a fascinating thing.
01:15:52 But I think the best chess playing programs
01:15:57 are now better than the pairs.
01:15:58 Like they have competition between pairs,
01:16:00 but it’s still, even if they weren’t,
01:16:03 it’s an interesting question, where’s the ceiling?
01:16:06 So the David, the ominous David Silver kind of statement
01:16:09 is like, we have not found the ceiling.
01:16:12 Right, so the question is, okay,
01:16:14 so I don’t know his analysis on that.
01:16:16 My, from talking to Go experts,
01:16:20 the depth, the strategic depth of Go
01:16:22 seems to be substantially greater than that of chess.
01:16:25 That there’s more kind of steps of improvement
01:16:27 that you can make, getting better and better
01:16:29 and better and better.
01:16:30 But there’s no reason to think that it’s infinite.
01:16:32 Infinite, yeah.
01:16:33 And so it could be that what David is seeing
01:16:37 is a kind of asymptoting that you can keep getting better,
01:16:39 but with diminishing returns.
01:16:41 And at some point you hit optimal play.
01:16:43 Like in theory, all these finite games, they’re finite.
01:16:47 They have an optimal strategy.
01:16:49 There’s a strategy that is the minimax optimal strategy.
01:16:51 And so at that point, you can’t get any better.
01:16:54 You can’t beat that strategy.
01:16:56 Now that strategy may be,
01:16:58 from an information processing perspective, intractable.
01:17:02 Right, you need, all the situations
01:17:06 are sufficiently different that you can’t compress it at all.
01:17:08 It’s this giant mess of hardcoded rules.
01:17:12 And we can never achieve that.
01:17:14 But that still puts a cap on how many levels of improvement
01:17:17 that we can actually make.
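As a minimal sketch of the minimax idea being referenced (the helper names `is_terminal`, `utility`, `moves`, and `apply_move` are placeholders you would supply for a particular game): the value of a finite, two-player, zero-sum position is computed by backing up from terminal states, and no strategy can do better against it.

```python
def minimax(state, is_terminal, utility, moves, apply_move, maximizing=True):
    """Exact game value of `state` from the maximizing player's point of view."""
    if is_terminal(state):
        return utility(state)                  # payoff at the end of the game
    values = [
        minimax(apply_move(state, m), is_terminal, utility, moves, apply_move,
                not maximizing)                # players alternate turns
        for m in moves(state)
    ]
    return max(values) if maximizing else min(values)
```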
01:17:19 But the thing about self play is if you put it,
01:17:23 although I don’t like doing that,
01:17:24 in the broader category of self supervised learning,
01:17:28 is that it doesn’t require too much or any human input.
01:17:31 Human labeling, yeah.
01:17:32 Yeah, human label or just human effort.
01:17:34 The human involvement passed a certain point.
01:17:37 And the same thing you could argue is true
01:17:41 for the recent breakthroughs in natural language processing
01:17:44 with language models.
01:17:45 Oh, this is how you get to GPT3.
01:19:47 Yeah, see how I did that?
01:17:49 That was a good transition.
01:17:51 Yeah, I practiced that for days leading up to this now.
01:17:56 But like that’s one of the questions is,
01:17:59 can we find ways to formulate problems in this world
01:18:03 that are important to us humans,
01:18:05 like more important than the game of chess,
01:18:08 to which self supervised kinds of approaches
01:18:12 could be applied?
01:18:13 Whether it’s self play, for example,
01:18:15 for like maybe you could think of like autonomous vehicles
01:18:19 in simulation, that kind of stuff,
01:18:22 or just robotics applications and simulation,
01:18:25 or in the self supervised learning,
01:18:29 where unannotated data,
01:18:33 or data that’s generated by humans naturally
01:18:37 without extra costs, like Wikipedia,
01:18:41 or like all of the internet can be used
01:18:44 to learn something about,
01:18:46 to create intelligent systems that do something
01:18:49 really powerful, that pass the Turing test,
01:18:52 or that do some kind of superhuman level performance.
01:18:56 So what’s your intuition,
01:18:58 like trying to stitch all of it together
01:19:01 about our discussion of AGI,
01:19:05 the limits of self play,
01:19:07 and your thoughts about maybe the limits of neural networks
01:19:10 in the context of language models.
01:19:13 Is there some intuition in there
01:19:14 that might be useful to think about?
01:19:17 Yeah, yeah, yeah.
01:19:17 So first of all, the whole Transformer network
01:19:22 family of things is really cool.
01:19:26 It’s really, really cool.
01:19:28 I mean, if you’ve ever,
01:19:30 back in the day you played with,
01:19:31 I don’t know, Markov models for generating texts,
01:19:34 and you’ve seen the kind of texts that they spit out,
01:19:35 and you compare it to what’s happening now,
01:19:37 it’s amazing, it’s so amazing.
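For contrast, here is the kind of Markov-chain text generator being alluded to (a toy word-level bigram model over a stand-in corpus), which makes it obvious how far transformer models have moved beyond this.

```python
import random
from collections import defaultdict

corpus = "the robot learned to play the game and the robot kept playing".split()

bigrams = defaultdict(list)                   # word -> list of words that followed it
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev].append(nxt)

word = random.choice(corpus)
out = [word]
for _ in range(12):
    followers = bigrams[word]
    word = random.choice(followers) if followers else random.choice(corpus)
    out.append(word)
print(" ".join(out))                          # locally plausible, globally meaningless
```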
01:19:41 Now, it doesn’t take very long interacting
01:19:43 with one of these systems before you find the holes, right?
01:19:47 It’s not smart in any kind of general way.
01:19:53 It’s really good at a bunch of things.
01:19:55 And it does seem to understand
01:19:56 a lot of the statistics of language extremely well.
01:19:59 And that turns out to be very powerful.
01:20:01 You can answer many questions with that.
01:20:04 But it doesn’t make it a good conversationalist, right?
01:20:06 And it doesn’t make it a good storyteller.
01:20:08 It just makes it good at imitating
01:20:10 things that it has seen in the past.
01:20:12 The exact same thing could be said
01:20:14 by people who are voting for Donald Trump
01:20:16 about Joe Biden supporters,
01:20:18 and people voting for Joe Biden
01:20:19 about Donald Trump supporters is, you know.
01:20:22 That they’re not intelligent, they’re just following the.
01:20:25 Yeah, they’re following things they’ve seen in the past.
01:20:27 And it doesn’t take long to find the flaws
01:20:31 in their natural language generation abilities.
01:20:36 Yes, yes.
01:20:37 So we’re being very.
01:20:38 That’s interesting.
01:20:39 Critical of AI systems.
01:20:41 Right, so I’ve had a similar thought,
01:20:43 which was that the stories that GPT3 spits out
01:20:48 are amazing and very humanlike.
01:20:52 And it doesn’t mean that computers are smarter
01:20:55 than we realize necessarily.
01:20:57 It partly means that people are dumber than we realize.
01:21:00 Or that much of what we do day to day is not that deep.
01:21:04 Like we’re just kind of going with the flow.
01:21:07 We’re saying whatever feels like the natural thing
01:21:09 to say next.
01:21:10 Not a lot of it is creative or meaningful or intentional.
01:21:17 But enough is that we actually get by, right?
01:21:20 We do come up with new ideas sometimes,
01:21:22 and we do manage to talk each other into things sometimes.
01:21:24 And we do sometimes vote for reasonable people sometimes.
01:21:29 But it’s really hard to see in the statistics
01:21:32 because so much of what we’re saying is kind of rote.
01:21:35 And so our metrics that we use to measure
01:21:38 how these systems are doing don’t reveal that
01:21:41 because it’s in the interstices that is very hard to detect.
01:21:47 But is your, do you have an intuition
01:21:49 that with these language models, if they grow in size,
01:21:53 it’s already surprising when you go from GPT2 to GPT3
01:21:57 that there is a noticeable improvement.
01:21:59 So the question now goes back to the ominous David Silver
01:22:02 and the ceiling.
01:22:03 Right, so maybe there’s just no ceiling.
01:22:04 We just need more compute.
01:22:06 Now, I mean, okay, so now I’m speculating.
01:22:10 Yes.
01:22:11 As opposed to before when I was completely on firm ground.
01:22:13 All right, I don’t believe that you can get something
01:22:17 that really can do language and use language as a thing
01:22:21 that doesn’t interact with people.
01:22:24 Like I think that it’s not enough
01:22:25 to just take everything that we’ve said written down
01:22:28 and just say, that’s enough.
01:22:29 You can just learn from that and you can be intelligent.
01:22:32 I think you really need to be pushed back at.
01:22:35 I think that conversations,
01:22:36 even people who are pretty smart,
01:22:38 maybe the smartest thing that we know,
01:22:40 maybe not the smartest thing we can imagine,
01:22:43 but we get so much benefit
01:22:44 out of talking to each other and interacting.
01:22:48 That’s presumably why you have conversations live with guests
01:22:51 is that there’s something in that interaction
01:22:53 that would not be exposed by,
01:22:55 oh, I’ll just write you a story
01:22:57 and then you can read it later.
01:22:58 And I think because these systems
01:23:00 are just learning from our stories,
01:23:01 they’re not learning from being pushed back at by us,
01:23:05 that they’re fundamentally limited
01:23:06 into what they can actually become on this route.
01:23:08 They have to get shot down.
01:23:12 Like we have to have an argument,
01:23:14 they have to have an argument with us
01:23:15 and lose a couple of times
01:23:17 before they start to realize, oh, okay, wait,
01:23:20 there’s some nuance here that actually matters.
01:23:23 Yeah, that’s actually subtle sounding,
01:23:25 but quite profound that the interaction with humans
01:23:30 is essential and the limitation within that
01:23:34 is profound as well because the timescale,
01:23:37 like the bandwidth at which you can really interact
01:23:40 with humans is very low.
01:23:43 So it’s costly.
01:23:44 So you can’t, one of the underlying things about self play
01:23:47 is it has to do a very large number of interactions.
01:23:53 And so you can’t really deploy reinforcement learning systems
01:23:56 into the real world to interact.
01:23:58 Like you couldn’t deploy a language model
01:24:01 into the real world to interact with humans
01:24:04 because it was just not getting enough data
01:24:06 relative to the cost it takes to interact.
01:24:09 Like the time of humans is expensive,
01:24:12 which is really interesting.
01:24:13 That takes us back to reinforcement learning
01:24:16 and trying to figure out if there’s ways
01:24:18 to make algorithms that are more efficient at learning,
01:24:22 keep the spirit in reinforcement learning
01:24:24 and become more efficient.
01:24:26 In some sense, that seems to be the goal.
01:24:28 I’d love to hear what your thoughts are.
01:24:31 I don’t know if you got a chance to see
01:24:33 the blog post called Bitter Lesson.
01:24:35 Oh yes.
01:24:37 By Rich Sutton that makes an argument,
01:24:39 hopefully I can summarize it.
01:24:41 Perhaps you can.
01:24:43 Yeah, but do you want?
01:24:44 Okay.
01:24:45 So I mean, I could try and you can correct me,
01:24:47 which is he makes an argument that it seems
01:24:50 if we look at the long arc of the history
01:24:52 of the artificial intelligence field,
01:24:55 he calls 70 years that the algorithms
01:24:58 from which we’ve seen the biggest improvements in practice
01:25:02 are the very simple, like dumb algorithms
01:25:05 that are able to leverage computation.
01:25:08 And you just wait for the computation to improve.
01:25:11 Like all of the academics and so on have fun
01:25:13 by finding little tricks
01:25:15 and congratulate themselves on those tricks.
01:25:17 And sometimes those tricks can be like big,
01:25:20 that feel in the moment like big spikes and breakthroughs,
01:25:22 but in reality over the decades,
01:25:25 it’s still the same dumb algorithm
01:25:27 that just waits for the compute to get faster and faster.
01:25:31 Do you find that to be an interesting argument
01:25:36 against the entirety of the field of machine learning
01:25:39 as an academic discipline?
01:25:41 That we’re really just a subfield of computer architecture.
01:25:44 We’re just kind of waiting around
01:25:45 for them to do their next thing.
01:25:46 Who really don’t want to do hardware work.
01:25:48 So like.
01:25:48 That’s right.
01:25:49 I really don’t want to think about it.
01:25:50 We’re procrastinating.
01:25:51 Yes, that’s right, just waiting for them to do their jobs
01:25:53 so that we can pretend to have done ours.
01:25:55 So yeah, I mean, the argument reminds me a lot of,
01:26:00 I think it was a Fred Jelinek quote,
01:26:02 early computational linguist who said,
01:26:04 we’re building these computational linguistic systems
01:26:07 and every time we fire a linguist performance goes up
01:26:11 by 10%, something like that.
01:26:13 And so the idea of us building the knowledge in,
01:26:16 in that case was much less,
01:26:19 he was finding it to be much less successful
01:26:20 than get rid of the people who know about language as a,
01:26:25 from a kind of scholastic academic kind of perspective
01:26:29 and replace them with more compute.
01:26:32 And so I think this is kind of a modern version
01:26:34 of that story, which is, okay,
01:26:35 we want to do better on machine vision.
01:26:38 You could build in all these,
01:26:41 motivated part based models
01:26:45 that just feel like obviously the right thing
01:26:47 that you have to have,
01:26:48 or we can throw a lot of data at it
01:26:49 and guess what we’re doing better with a lot of data.
01:26:52 So I hadn’t thought about it until this moment in this way,
01:26:57 but what I believe, well, I’ve thought about what I believe.
01:27:00 What I believe is that, you know, compositionality
01:27:05 and what’s the right way to say it,
01:27:08 the complexity grows rapidly
01:27:12 as you consider more and more possibilities,
01:27:14 like explosively.
01:27:16 And so far Moore’s law has also been growing explosively
01:27:20 exponentially.
01:27:21 And so it really does seem like, well,
01:27:23 we don’t have to think really hard about the algorithm
01:27:27 design or the way that we build the systems,
01:27:29 because the best benefit we could get is exponential.
01:27:32 And the best benefit that we can get from waiting
01:27:34 is exponential.
01:27:35 So we can just wait.
01:27:38 It’s got, that’s gotta end, right?
01:27:39 And there’s hints now that,
01:27:41 that Moore’s law is starting to feel some friction,
01:27:44 starting to, the world is pushing back a little bit.
01:27:48 One thing that I don’t know, do lots of people know this?
01:27:50 I didn’t know this, I was trying to write an essay
01:27:54 and yeah, Moore’s law has been amazing
01:27:56 and it’s enabled all sorts of things,
01:27:58 but there’s also a kind of counter Moore’s law,
01:28:01 which is that the development cost
01:28:03 for each successive generation of chips also is doubling.
01:28:07 So it’s costing twice as much money.
01:28:09 So the amount of development money per cycle or whatever
01:28:12 is actually sort of constant.
01:28:14 And at some point we run out of money.
01:28:17 So, or we have to come up with an entirely different way
01:28:19 of doing the development process.
01:28:22 So like, I guess I always a bit skeptical of the look,
01:28:25 it’s an exponential curve, therefore it has no end.
01:28:28 Soon the number of people going to NeurIPS
01:28:30 will be greater than the population of the earth.
01:28:32 That means we’re gonna discover life on other planets.
01:28:35 No, it doesn’t.
01:28:36 It means that we’re in a sigmoid curve on the front half,
01:28:40 which looks a lot like an exponential.
01:28:42 The second half is gonna look a lot like diminishing returns.
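A quick numeric illustration of that sigmoid point (arbitrary constants, just to show the shape): well before its midpoint, a logistic curve tracks an exponential almost exactly, which is why the front half is so easy to mistake for one.

```python
import math

CAP = 1000.0   # the eventual ceiling of the logistic curve (arbitrary)

def logistic(t):
    return CAP / (1.0 + math.exp(-t))

for t in range(-8, 0):
    # The two columns track each other closely far from the midpoint
    # and only pull apart as the midpoint (t = 0) approaches.
    print(t, round(logistic(t), 2), round(CAP * math.exp(t), 2))
```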
01:28:46 Yeah, I mean, but the interesting thing about Moore’s law,
01:28:48 if you actually like look at the technologies involved,
01:28:52 it’s hundreds, if not thousands of S curves
01:28:55 stacked on top of each other.
01:28:56 It’s not actually an exponential curve,
01:28:58 it’s constant breakthroughs.
01:29:01 And then what becomes useful to think about,
01:29:04 which is exactly what you’re saying,
01:29:05 the cost of development, like the size of teams,
01:29:08 the amount of resources that are invested
01:29:10 in continuing to find new S curves, new breakthroughs.
01:29:14 And yeah, it’s an interesting idea.
01:29:19 If we live in the moment, if we sit here today,
01:29:22 it seems to be the reasonable thing
01:29:25 to say that exponentials end.
01:29:29 And yet in the software realm,
01:29:31 they just keep appearing to be happening.
01:29:34 And it’s so, I mean, it’s so hard to disagree
01:29:39 with Elon Musk on this.
01:29:41 Because it like, I’ve, you know,
01:29:45 I used to be one of those folks,
01:29:47 I’m still one of those folks that studied
01:29:49 autonomous vehicles, that’s what I worked on.
01:29:52 And it’s like, you look at what Elon Musk is saying
01:29:56 about autonomous vehicles, well, obviously,
01:29:58 in a couple of years, or in a year, or next month,
01:30:01 we’ll have fully autonomous vehicles.
01:30:03 Like there’s no reason why we can’t.
01:30:04 Driving is pretty simple, like it’s just a learning problem
01:30:07 and you just need to convert all the driving
01:30:11 that we’re doing into data and just have a neural net
01:30:13 that trains on that data.
01:30:14 And like, we use only our eyes, so you can use cameras
01:30:18 and you can train on it.
01:30:20 And it’s like, yeah, that should work.
01:30:26 And then you put that hat on, like the philosophical hat,
01:30:29 but then you put the pragmatic hat on and it’s like,
01:30:31 this is what the flaws of computer vision are.
01:30:33 Like, this is what it means to train at scale.
01:30:35 And then you put the human factors, the psychology hat on,
01:30:40 which is like, there’s actually a lot to driving,
01:30:43 the cognitive science or cognitive,
01:30:44 whatever the heck you call it, it’s really hard,
01:30:48 it’s much harder to drive than we realize,
01:30:50 there’s a much larger number of edge cases.
01:30:53 So building up an intuition around this is,
01:30:57 around exponentials is really difficult.
01:30:59 And on top of that, the pandemic is making us think
01:31:03 about exponentials, making us realize that like,
01:31:06 we don’t understand anything about it,
01:31:08 we’re not able to intuit exponentials,
01:31:11 we’re either ultra terrified, some part of the population
01:31:15 and some part is like the opposite of whatever
01:31:20 the different carefree and we’re not managing it very well.
01:31:24 Blasé, well, wow, is that French?
01:31:28 I assume so, it’s got an accent.
01:31:29 So it’s fascinating to think what the limits
01:31:35 of this exponential growth of technology,
01:31:41 not just Moore’s law, it’s technology,
01:31:44 how that rubs up against the bitter lesson
01:31:49 and GPT three and self play mechanisms.
01:31:53 Like it’s not obvious, I used to be much more skeptical
01:31:56 about neural networks.
01:31:58 Now I at least give a sliver of possibility
01:32:00 that we’ll be very much surprised
01:32:04 and also caught in a way that like,
01:32:10 we are not prepared for.
01:32:14 Like in applications of social networks, for example,
01:32:19 cause it feels like really good transformer models
01:32:23 that are able to do some kind of like very good
01:32:28 natural language generation of the same kind of models
01:32:31 that can be used to learn human behavior
01:32:33 and then manipulate that human behavior
01:32:35 to gain advertisers dollars and all those kinds of things
01:32:38 through the capitalist system.
01:32:41 And they arguably already are manipulating human behavior.
01:32:46 But not for self preservation, which I think is a big,
01:32:51 that would be a big step.
01:32:52 Like if they were trying to manipulate us
01:32:54 to convince us not to shut them off,
01:32:57 I would be very freaked out.
01:32:58 But I don’t see a path to that from where we are now.
01:33:01 They don’t have any of those abilities.
01:33:05 That’s not what they’re trying to do.
01:33:07 They’re trying to keep people on the site.
01:33:10 But see the thing is, this is the thing about life on earth
01:33:13 is they might be borrowing our consciousness
01:33:16 and sentience like, so like in a sense they do
01:33:20 because the creators of the algorithms have,
01:33:23 like they’re not, if you look at our body,
01:33:26 we’re not a single organism.
01:33:28 We’re a huge number of organisms
01:33:30 with like tiny little motivations
01:33:31 were built on top of each other.
01:33:33 In the same sense, the AI algorithms that are,
01:33:36 they’re not like.
01:33:37 It’s a system that includes companies and corporations,
01:33:40 because corporations are funny organisms
01:33:42 in and of themselves that really do seem
01:33:44 to have self preservation built in.
01:33:45 And I think that’s at the design level.
01:33:48 I think they’re designed to have self preservation
01:33:50 to be a focus.
01:33:52 So you’re right.
01:33:53 In that broader system that we’re also a part of
01:33:58 and can have some influence on,
01:34:02 it is much more complicated, much more powerful.
01:34:04 Yeah, I agree with that.
01:34:06 So people really love it when I ask,
01:34:09 what three books, technical, philosophical, fiction
01:34:13 had a big impact on your life?
01:34:14 Maybe you can recommend.
01:34:16 We went with movies, we went with Billy Joel,
01:34:21 and I forgot what music you recommended, but.
01:34:24 I didn’t, I just said I have no taste in music.
01:34:26 I just like pop music.
01:34:27 That was actually really skillful
01:34:30 the way you avoided that question.
01:34:30 Thank you, thanks.
01:34:31 I’m gonna try to do the same with the books.
01:34:33 So do you have a skillful way to avoid answering
01:34:37 the question about three books you would recommend?
01:34:39 I’d like to tell you a story.
01:34:42 So my first job out of college was at Bellcore.
01:34:45 I mentioned that before, where I worked with Dave Ackley.
01:34:48 The head of the group was a guy named Tom Landauer.
01:34:50 And I don’t know how well known he’s known now,
01:34:53 but arguably he’s the inventor
01:34:56 and the first proselytizer of word embeddings.
01:34:59 So they developed a system shortly before I got to the group
01:35:04 that was called latent semantic analysis
01:35:07 that would take words of English
01:35:09 and embed them in multi hundred dimensional space
01:35:12 and then use that as a way of assessing
01:35:15 similarity and basically doing reinforcement learning,
01:35:17 I’m sorry, not reinforcement, information retrieval,
01:35:20 sort of pre Google information retrieval.
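A minimal latent-semantic-analysis-style sketch of what that system did (toy documents and only two dimensions here, where the system described used multi-hundred-dimensional embeddings; the exact pipeline below is an assumption for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["robots learn to play games",
        "reinforcement learning for game playing agents",
        "elevators stop at floors in buildings"]

vectorizer = CountVectorizer()
term_doc = vectorizer.fit_transform(docs)      # document-by-term count matrix
svd = TruncatedSVD(n_components=2).fit(term_doc)

doc_vectors = svd.transform(term_doc)          # each document as a dense vector
word_vectors = svd.components_.T               # each vocabulary word as a dense vector
print(dict(zip(vectorizer.get_feature_names_out(), word_vectors.round(2))))
# Similarity between words or documents is then just a dot product / cosine.
```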
01:35:23 And he was trained as an anthropologist,
01:35:28 but then became a cognitive scientist.
01:35:29 So I was in the cognitive science research group.
01:35:32 Like I said, I’m a cognitive science groupie.
01:35:34 At the time I thought I’d become a cognitive scientist,
01:35:37 but then I realized in that group,
01:35:38 no, I’m a computer scientist,
01:35:40 but I’m a computer scientist who really loves
01:35:41 to hang out with cognitive scientists.
01:35:43 And he said, he studied language acquisition in particular.
01:35:48 He said, you know, humans have about this number of words
01:35:51 of vocabulary and most of that is learned from reading.
01:35:55 And I said, that can’t be true
01:35:57 because I have a really big vocabulary and I don’t read.
01:36:00 He’s like, you must.
01:36:01 I’m like, I don’t think I do.
01:36:03 I mean like stop signs, I definitely read stop signs,
01:36:05 but like reading books is not a thing that I do a lot of.
01:36:08 Do you really though?
01:36:09 It might be just visual, maybe the red color.
01:36:12 Do I read stop signs?
01:36:14 No, it’s just pattern recognition at this point.
01:36:15 I don’t sound it out.
01:36:19 So now I do.
01:36:21 I wonder what that, oh yeah, stop the guns.
01:36:25 So.
01:36:26 That’s fascinating.
01:36:27 So you don’t.
01:36:28 So I don’t read very, I mean, obviously I read
01:36:29 and I’ve read plenty of books,
01:36:31 but like some people like Charles,
01:36:34 my friend Charles and others,
01:36:35 like a lot of people in my field, a lot of academics,
01:36:38 like reading was really a central topic to them
01:36:42 in development and I’m not that guy.
01:36:45 In fact, I used to joke that when I got into college,
01:36:49 that it was on kind of a help out the illiterate
01:36:53 kind of program because I got to,
01:36:55 like in my house, I wasn’t a particularly bad
01:36:57 or good reader, but when I got to college,
01:36:58 I was surrounded by these people that were just voracious
01:37:01 in their reading appetite.
01:37:03 And they would like, have you read this?
01:37:04 Have you read this?
01:37:05 Have you read this?
01:37:06 And I’m like, no, I’m clearly not qualified
01:37:09 to be at this school.
01:37:10 Like there’s no way I should be here.
01:37:11 Now I’ve discovered books on tape, like audio books.
01:37:14 And so I’m much better.
01:37:17 I’m more caught up.
01:37:18 I read a lot of books.
01:37:20 The small tangent on that,
01:37:22 it is a fascinating open question to me
01:37:24 on the topic of driving.
01:37:27 Whether, you know, supervised learning people,
01:37:30 machine learning people think you have to like drive
01:37:33 to learn how to drive.
01:37:35 To me, it’s very possible that just by us humans,
01:37:40 by first of all, walking,
01:37:41 but also by watching other people drive,
01:37:44 not even being inside cars as a passenger,
01:37:46 but let’s say being inside the car as a passenger,
01:37:49 but even just like being a pedestrian and crossing the road,
01:37:53 you learn so much about driving from that.
01:37:56 It’s very possible that you can,
01:37:58 without ever being inside of a car,
01:38:01 be okay at driving once you get in it.
01:38:04 Or like watching a movie, for example.
01:38:06 I don’t know, something like that.
01:38:08 Have you taught anyone to drive?
01:38:11 No, except myself.
01:38:13 I have two children.
01:38:15 And I learned a lot about car driving
01:38:18 because my wife doesn’t want to be the one in the car
01:38:21 while they’re learning.
01:38:21 So that’s my job.
01:38:22 So I sit in the passenger seat and it’s really scary.
01:38:27 You know, I have wishes to live
01:38:30 and they’re figuring things out.
01:38:32 Now, they start off very much better
01:38:37 than I imagine like a neural network would, right?
01:38:39 They get that they’re seeing the world.
01:38:41 They get that there’s a road that they’re trying to be on.
01:38:44 They get that there’s a relationship
01:38:45 between the angle of the steering,
01:38:47 but it takes a while to not be very jerky.
01:38:51 And so that happens pretty quickly.
01:38:52 Like the ability to stay in lane at speed,
01:38:55 that happens relatively fast.
01:38:56 It’s not zero shot learning, but it’s pretty fast.
01:39:00 The thing that’s remarkably hard,
01:39:01 and this is I think partly why self driving cars
01:39:03 are really hard,
01:39:04 is the degree to which driving
01:39:06 is a social interaction activity.
01:39:09 And that blew me away.
01:39:10 I was completely unaware of it
01:39:11 until I watched my son learning to drive.
01:39:14 And I was realizing that he was sending signals
01:39:17 to all the cars around him.
01:39:19 And those in his case,
01:39:20 he’s always had social communication challenges.
01:39:25 He was sending very mixed confusing signals
01:39:28 to the other cars.
01:39:29 And that was causing the other cars
01:39:30 to drive weirdly and erratically.
01:39:32 And there was no question in my mind
01:39:34 that he would have an accident
01:39:36 because they didn’t know how to read him.
01:39:39 There’s things you do with the speed that you drive,
01:39:42 the positioning of your car,
01:39:43 that you’re constantly like in the head
01:39:46 of the other drivers.
01:39:47 And seeing him not knowing how to do that
01:39:50 and having to be taught explicitly,
01:39:52 okay, you have to be thinking
01:39:53 about what the other driver is thinking,
01:39:55 was a revelation to me.
01:39:57 I was stunned.
01:39:58 So creating kind of theories of mind of the other.
01:40:02 Theories of mind of the other cars.
01:40:04 Yeah, yeah.
01:40:05 Which I just hadn’t heard discussed
01:40:07 in the self driving car talks that I’ve been to.
01:40:09 Since then, there’s some people who do consider
01:40:13 those kinds of issues,
01:40:14 but it’s way more subtle than I think
01:40:16 there’s a little bit of work involved with that
01:40:19 when you realize like when you especially focus
01:40:21 not on other cars, but on pedestrians, for example,
01:40:24 it’s literally staring you in the face.
01:40:27 So then when you’re just like,
01:40:28 how do I interact with pedestrians?
01:40:32 Pedestrians, you’re practically talking
01:40:33 to an octopus at that point.
01:40:34 They’ve got all these weird degrees of freedom.
01:40:36 You don’t know what they’re gonna do.
01:40:37 They can turn around any second.
01:40:38 But the point is, we humans know what they’re gonna do.
01:40:42 Like we have a good theory of mind.
01:40:43 We have a good mental model of what they’re doing.
01:40:46 And we have a good model of the model they have a view
01:40:50 and the model of the model of the model.
01:40:52 Like we’re able to kind of reason about this kind of,
01:40:55 the social like game of it all.
01:40:59 The hope is that it’s quite simple actually,
01:41:03 that it could be learned.
01:41:04 That’s why, I just talked to Waymo.
01:41:06 I don’t know if you know that company.
01:41:07 It’s Google’s self driving car company.
01:41:09 They, I talked to their CTO about this podcast
01:41:12 and they like, I rode in their car
01:41:15 and it’s quite aggressive and it’s quite fast
01:41:17 and it’s good and it feels great.
01:41:20 It also, just like Tesla,
01:41:21 Waymo made me change my mind about like,
01:41:24 maybe driving is easier than I thought.
01:41:27 Maybe I’m just being speciesist, human centric, maybe.
01:41:33 It’s a speciesist argument.
01:41:35 Yeah, so I don’t know.
01:41:36 But it’s fascinating to think about like the same
01:41:41 as with reading, which I think you just said.
01:41:43 You avoided the question,
01:41:45 though I still hope you answered it somewhat.
01:41:47 You avoided it brilliantly.
01:41:48 It is, there’s blind spots as artificial intelligence,
01:41:52 that artificial intelligence researchers have
01:41:55 about what it actually takes to learn to solve a problem.
01:41:58 That’s fascinating.
01:41:59 Have you had Anca Dragan on?
01:42:00 Yeah.
01:42:01 Okay.
01:42:02 She’s one of my favorites.
01:42:03 So much energy.
01:42:04 She’s right.
01:42:05 Oh, yeah.
01:42:05 She’s amazing.
01:42:06 Fantastic.
01:42:07 And in particular, she thinks a lot about this kind of,
01:42:10 I know that you know that I know kind of planning.
01:42:12 And the last time I spoke with her,
01:42:14 she was very articulate about the ways
01:42:17 in which self driving cars are not solved.
01:42:20 Like what’s still really, really hard.
01:42:22 But even her intuition is limited.
01:42:23 Like we’re all like new to this.
01:42:26 So in some sense, the Elon Musk approach
01:42:27 of being ultra confident and just like plowing.
01:42:30 Put it out there.
01:42:31 Putting it out there.
01:42:32 Like some people say it’s reckless and dangerous and so on.
01:42:35 But like, partly it’s like, it seems to be one
01:42:39 of the only ways to make progress
01:42:40 in artificial intelligence.
01:42:41 So it’s, you know, these are difficult things.
01:42:45 You know, democracy is messy.
01:42:49 Implementation of artificial intelligence systems
01:42:51 in the real world is messy.
01:42:53 So many years ago, before self driving cars
01:42:56 were an actual thing you could have a discussion about,
01:42:58 somebody asked me, like, what if we could use
01:43:01 that robotic technology and use it to drive cars around?
01:43:04 Like, isn’t that, aren’t people gonna be killed?
01:43:06 And then it’s not, you know, blah, blah, blah.
01:43:08 I’m like, that’s not what’s gonna happen.
01:43:09 I said with confidence, incorrectly, obviously.
01:43:13 What I think is gonna happen is we’re gonna have a lot more,
01:43:15 like a very gradual kind of rollout
01:43:17 where people have these cars in like closed communities,
01:43:22 right, where it’s somewhat realistic,
01:43:24 but it’s still in a box, right?
01:43:26 So that we can really get a sense of what,
01:43:28 what are the weird things that can happen?
01:43:30 How do we, how do we have to change the way we behave
01:43:34 around these vehicles?
01:43:35 Like, it obviously requires a kind of coevolution,
01:43:39 that you can’t just plop them in and see what happens.
01:43:42 But of course, we’re basically plopping them in
01:43:44 and seeing what happens.
01:43:45 So I was wrong, but I do think that would have been
01:43:46 a better plan.
01:43:47 But your intuition, that’s funny,
01:43:50 just zooming out and looking at the forces of capitalism.
01:43:54 It seems that capitalism rewards
01:43:57 and punishes risk takers, like,
01:44:00 try it out.
01:44:03 The academic approach is, let’s try a small thing
01:44:11 and try to slowly understand the fundamentals
01:44:13 of the problem.
01:44:14 Let’s start with one, then do two, and then see.
01:44:18 Then do three. You know, the capitalist,
01:44:21 startup, entrepreneurial dream is, let’s build a thousand
01:44:26 and let’s…
01:44:27 Right, and 500 of them fail, but whatever,
01:44:28 the other 500, we learned from them.
01:44:30 But if you’re good enough, I mean, one thing is like,
01:44:33 your intuition would say like, that’s gonna be
01:44:35 hugely destructive to everything.
01:44:37 But actually, it’s kind of the forces of capitalism,
01:44:42 like, it’s easy to be critical,
01:44:44 but if you actually look at the data, at the way
01:44:47 our world has progressed in terms of the quality of life,
01:44:50 it seems like the competent, good people rise to the top.
01:44:54 This is coming from me from the Soviet Union and so on.
01:44:58 It’s like, it’s interesting that somebody like Elon Musk
01:45:03 is the way you push progress in artificial intelligence.
01:45:08 Like it’s forcing Waymo to step their stuff up
01:45:11 and Waymo is forcing Elon Musk to step up.
01:45:17 It’s fascinating, because I have this tension in my heart
01:45:21 and I’m just upset by the lack of progress
01:45:26 in autonomous vehicles within academia.
01:45:29 So there was huge progress in the early days
01:45:33 of the DARPA challenges.
01:45:35 And then it just kind of stopped, like at MIT,
01:45:39 but it’s true everywhere else, with the exception
01:45:43 of a few sponsors here and there.
01:45:46 It’s not seen as a sexy problem.
01:45:50 Like the moment artificial intelligence starts approaching
01:45:53 the problems of the real world,
01:45:56 like academics kind of like, all right, let the…
01:46:00 They get really hard in a different way.
01:46:01 In a different way, that’s right.
01:46:03 I think, yeah, right, some of us are not excited
01:46:05 about that other way.
01:46:07 But I still think there’s fundamental problems
01:46:09 to be solved in those difficult things.
01:46:12 It’s still publishable, I think.
01:46:14 Like we just need to, it’s the same criticism
01:46:17 you could have of all these conferences, NeurIPS, CVPR,
01:46:20 where application papers are often as powerful
01:46:24 and as important as like a theory paper.
01:46:27 Even so, theory just seems much more respectable and so on.
01:46:31 I mean, the machine learning community is changing
01:46:32 that a little bit.
01:46:33 I mean, at least in statements,
01:46:35 but it’s still not seen as the sexiest of pursuits,
01:46:40 which is like, how do I actually make this thing
01:46:42 work in practice as opposed to on this toy data set?
01:46:47 All that to say, are you still avoiding
01:46:49 the three books question?
01:46:50 Is there something on audio book that you can recommend?
01:46:54 Oh, yeah, I mean, yeah, I’ve read a lot of really fun stuff.
01:46:58 In terms of books that I find myself thinking back on
01:47:02 that I read a while ago,
01:47:03 like that stood the test of time to some degree.
01:47:06 I find myself thinking of Program or Be Programmed a lot,
01:47:09 by Douglas Rushkoff, which was,
01:47:13 it basically put out the premise
01:47:15 that we all need to become programmers
01:47:19 in one form or another.
01:47:21 And it was an analogy to once upon a time
01:47:24 we all had to become readers.
01:47:26 We had to become literate.
01:47:27 And there was a time before that
01:47:28 when not everybody was literate,
01:47:30 but once literacy was possible,
01:47:31 the people who were literate had more of a say in society
01:47:36 than the people who weren’t.
01:47:37 And so we made a big effort to get everybody up to speed.
01:47:39 And now it’s not 100% universal, but it’s quite widespread.
01:47:44 Like the assumption is generally that people can read.
01:47:48 The analogy that he makes is that programming
01:47:50 is a similar kind of thing,
01:47:51 that we need it in order to have a say, right?
01:47:57 So being literate, being a reader, means
01:47:59 you can receive all this information,
01:48:01 but you don’t get to put it out there.
01:48:04 And programming is the way that we get to put it out there.
01:48:06 And that was the argument that he made.
01:48:07 I think he specifically has now backed away from this idea.
01:48:11 He doesn’t think it’s happening quite this way.
01:48:14 And that might be true,
01:48:17 that society didn’t sort of play forward quite that way.
01:48:20 I still believe in the premise.
01:48:22 I still believe that at some point,
01:48:24 the relationship that we have to these machines
01:48:26 and these networks has to be one where each individual
01:48:29 has the wherewithal to make the machines help them
01:48:34 do the things that that person wants done.
01:48:37 And as software people, we know how to do that.
01:48:40 And when we have a problem, we’re like, okay,
01:48:41 I’ll just, I’ll hack up a Perl script or something
01:48:43 and make it so.
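Just to give the flavor of the kind of throwaway script that attitude produces, here is a small hypothetical example, in Python rather than Perl; the log file name and message format are made up for illustration.

```python
# Hypothetical everyday chore: pull every error line out of a log file and
# count how often each message appears. "app.log" and the ERROR tag are
# invented for this example.
from collections import Counter

counts = Counter()
with open("app.log") as log:
    for line in log:
        if "ERROR" in line:
            # keep everything after the ERROR tag as the message
            message = line.split("ERROR", 1)[1].strip()
            counts[message] += 1

# print the ten most common error messages with their counts
for message, n in counts.most_common(10):
    print(f"{n:5d}  {message}")
```

The point isn’t this particular script; it’s that a few lines like this are what it looks like to make the machine do the chore the way you want it done.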
01:48:44 If we lived in a world where everybody could do that,
01:48:47 that would be a better world.
01:48:49 And computers would, I think, have less sway over us.
01:48:53 And other people’s software would have less sway over us
01:48:56 as a group.
01:48:57 In some sense, software engineering, programming is power.
01:49:00 Programming is power, right?
01:49:03 Yeah, it’s like magic.
01:49:04 It’s like magic spells.
01:49:05 And it’s not out of reach of everyone.
01:49:09 But at the moment, it’s just a sliver of the population
01:49:11 who can commune with machines in this way.
01:49:15 So I don’t know, so that book had a big impact on me.
01:49:18 Currently, I’m reading The Alignment Problem,
01:49:20 actually by Brian Christian.
01:49:22 So I don’t know if you’ve seen this out there yet.
01:49:23 Is this similar to Stuart Russell’s work
01:49:25 with the control problem?
01:49:27 It’s in that same general neighborhood.
01:49:28 I mean, they have different emphases
01:49:31 that they’re concentrating on.
01:49:32 I think Stuart’s book did a remarkably good job,
01:49:36 like just a celebratory good job
01:49:38 at describing AI technology and sort of how it works.
01:49:43 I thought that was great.
01:49:44 It was really cool to see that in a book.
01:49:46 I think he has some experience writing some books.
01:49:49 You know, that’s probably a possible thing.
01:49:52 He’s maybe thought a thing or two
01:49:53 about how to explain AI to people.
01:49:56 Yeah, that’s a really good point.
01:49:57 This book so far has been remarkably good
01:50:00 at telling the story of sort of the history,
01:50:04 the recent history of some of the things
01:50:07 that have happened.
01:50:08 I’m in the first third.
01:50:09 He said this book is in three thirds.
01:50:10 The first third is essentially AI fairness
01:50:14 and implications of AI on society
01:50:16 that we’re seeing right now.
01:50:18 And that’s been great.
01:50:19 I mean, he’s telling the stories really well.
01:50:21 He went out and talked to the frontline people
01:50:23 whose names were associated with some of these ideas
01:50:26 and it’s been terrific.
01:50:28 He says the second third of the book
01:50:29 is on reinforcement learning.
01:50:30 So maybe that’ll be fun.
01:50:33 And then the third third
01:50:36 is on the superintelligence alignment problem.
01:50:39 And I suspect that that part will be less fun
01:50:43 for me to read.
01:50:44 Yeah.
01:50:46 Yeah, it’s an interesting problem to talk about.
01:50:48 I find it to be the most interesting,
01:50:50 just like thinking about whether we live
01:50:52 in a simulation or not,
01:50:54 as a thought experiment to think about our own existence.
01:50:58 So in the same way,
01:50:59 talking about the alignment problem with AGI
01:51:02 is a good way to think, similar
01:51:04 to like the trolley problem with autonomous vehicles.
01:51:06 It’s a useless thing for engineering,
01:51:08 but it’s a nice little thought experiment
01:51:10 for actually thinking about what
01:51:13 our own human ethical systems, our moral systems, are.
01:51:17 By thinking about how we engineer these things,
01:51:23 you start to understand yourself.
01:51:25 So sci fi can be good at that too.
01:51:27 So one sci fi book to recommend
01:51:29 is Exhalation by Ted Chiang,
01:51:31 a bunch of short stories.
01:51:33 This Ted Chiang is the guy who wrote the short story
01:51:35 that became the movie Arrival.
01:51:38 And all of his stories,
01:51:41 he was a computer scientist,
01:51:43 actually he studied at Brown.
01:51:44 And they all have this sort of really insightful bit
01:51:49 of science or computer science that drives them.
01:51:52 And so it’s just a romp, right?
01:51:54 He creates these artificial worlds
01:51:57 by extrapolating on these ideas
01:51:59 that we know about,
01:52:01 but hadn’t really thought through
01:52:02 to this kind of conclusion.
01:52:04 And so his stuff is, it’s really fun to read,
01:52:06 it’s mind warping.
01:52:08 So I’m not sure if you’re familiar,
01:52:10 I seem to mention this every other word,
01:52:13 that I’m from the Soviet Union and I’m Russian.
01:52:17 Way too much to see us.
01:52:18 My roots are Russian too,
01:52:20 but a couple generations back.
01:52:22 Well, it’s probably in there somewhere.
01:52:24 So maybe we can pull at that thread a little bit
01:52:28 of the existential dread that we all feel.
01:52:31 You mentioned,
01:52:32 I think somewhere in the conversation you mentioned,
01:52:34 that you pretty much don’t like dying.
01:52:38 I forget in which context,
01:52:39 it might’ve been a reinforcement learning perspective.
01:52:41 I don’t know.
01:52:42 No, you know what it was?
01:52:43 It was in teaching my kids to drive.
01:52:47 That’s how you face your mortality, yes.
01:52:49 From a human being’s perspective
01:52:52 or from a reinforcement learning researcher’s perspective,
01:52:55 let me ask you the most absurd question.
01:52:57 What do you think is the meaning of this whole thing?
01:53:01 The meaning of life on this spinning rock.
01:53:06 I mean, I think reinforcement learning researchers
01:53:08 maybe think about this from a science perspective
01:53:11 more often than a lot of other people, right?
01:53:13 As a supervised learning person,
01:53:14 you’re probably not thinking about the sweep of a lifetime,
01:53:18 but reinforcement learning agents
01:53:20 are having little lifetimes, little weird little lifetimes.
01:53:22 And it’s hard not to project yourself
01:53:25 into their world sometimes.
01:53:27 But as far as the meaning of life,
01:53:30 so when I turned 42, you may know,
01:53:34 from a book I read,
01:53:35 The Hitchhiker’s Guide to the Galaxy,
01:53:38 that that is the meaning of life.
01:53:40 So when I turned 42, I had a meaning of life party
01:53:43 where I invited people over
01:53:45 and everyone shared their meaning of life.
01:53:48 We had slides made up.
01:53:50 And so we all sat down and did a slide presentation
01:53:54 to each other about the meaning of life.
01:53:56 And mine was balance.
01:54:00 I think that life is balance.
01:54:02 And so the activity at the party,
01:54:06 for a 42 year old, maybe this is a little bit nonstandard,
01:54:09 but I found all the little toys and devices that I had
01:54:12 where you had to balance on them.
01:54:13 You had to like stand on it and balance,
01:54:15 or a pogo stick I brought,
01:54:17 a RipStik, which is like a weird two wheeled skateboard.
01:54:23 I got a unicycle, but I didn’t know how to do it.
01:54:26 I now can do it.
01:54:28 I would love watching you try.
01:54:29 Yeah, I’ll send you a video.
01:54:31 I’m not great, but I managed.
01:54:35 And so balance, yeah.
01:54:37 So my wife has a really good one that she sticks to
01:54:42 and is probably pretty accurate.
01:54:43 And it has to do with healthy relationships
01:54:47 with people that you love and working hard for good causes.
01:54:51 But to me, yeah, balance, balance in a word.
01:54:53 That works for me.
01:54:56 Not too much of anything,
01:54:57 because too much of anything is iffy.
01:55:00 That feels like a Rolling Stones song.
01:55:02 I feel like they must be.
01:55:03 You can’t always get what you want,
01:55:05 but if you try sometimes, you can strike a balance.
01:55:09 Yeah, I think that’s how it goes, Michael.
01:55:12 I’ll write you a parody.
01:55:14 It’s a huge honor to talk to you.
01:55:16 This is really fun.
01:55:17 Oh, no, the honor’s mine.
01:55:17 I’ve been a big fan of yours,
01:55:18 so can’t wait to see what you do next
01:55:24 in the world of education, in the world of parody,
01:55:27 in the world of reinforcement learning.
01:55:28 Thanks for talking to me.
01:55:29 My pleasure.
01:55:30 Thank you for listening to this conversation
01:55:32 with Michael Littman, and thank you to our sponsors,
01:55:35 SimpliSafe, a home security company I use
01:55:37 to monitor and protect my apartment, ExpressVPN,
01:55:41 the VPN I’ve used for many years
01:55:43 to protect my privacy on the internet,
01:55:45 Masterclass, online courses that I enjoy
01:55:48 from some of the most amazing humans in history,
01:55:51 and BetterHelp, online therapy with a licensed professional.
01:55:55 Please check out these sponsors in the description
01:55:58 to get a discount and to support this podcast.
01:56:00 If you enjoy this thing, subscribe on YouTube,
01:56:03 review it with five stars on Apple Podcast,
01:56:05 follow on Spotify, support it on Patreon,
01:56:08 or connect with me on Twitter at Lex Friedman.
01:56:12 And now, let me leave you with some words
01:56:14 from Groucho Marx.
01:56:16 If you’re not having fun, you’re doing something wrong.
01:56:20 Thank you for listening, and hope to see you next time.