Anca Dragan: Human-Robot Interaction and Reward Engineering #81

Transcript

00:00:00 The following is a conversation with Anca Dragan,

00:00:03 a professor at Berkeley working on human robot interaction,

00:00:08 algorithms that look beyond the robot’s function

00:00:10 in isolation and generate robot behavior

00:00:13 that accounts for interaction

00:00:15 and coordination with human beings.

00:00:18 She also consults at Waymo, the autonomous vehicle company,

00:00:22 but in this conversation,

00:00:23 she is 100% wearing her Berkeley hat.

00:00:27 She is one of the most brilliant and fun roboticists

00:00:30 in the world to talk with.

00:00:32 I had a tough and crazy day leading up to this conversation,

00:00:36 so I was a bit tired, even more so than usual,

00:00:41 but almost immediately as she walked in,

00:00:44 her energy, passion, and excitement

00:00:46 for human robot interaction were contagious.

00:00:48 So I had a lot of fun and really enjoyed this conversation.

00:00:52 This is the Artificial Intelligence Podcast.

00:00:55 If you enjoy it, subscribe on YouTube,

00:00:57 review it with five stars on Apple Podcasts,

00:01:00 support it on Patreon,

00:01:01 or simply connect with me on Twitter at Lex Fridman,

00:01:05 spelled F R I D M A N.

00:01:08 As usual, I’ll do one or two minutes of ads now

00:01:11 and never any ads in the middle

00:01:12 that can break the flow of the conversation.

00:01:14 I hope that works for you

00:01:16 and doesn’t hurt the listening experience.

00:01:20 This show is presented by Cash App,

00:01:22 the number one finance app in the App Store.

00:01:25 When you get it, use code LEXPODCAST.

00:01:29 Cash App lets you send money to friends,

00:01:31 buy Bitcoin, and invest in the stock market

00:01:33 with as little as one dollar.

00:01:36 Since Cash App does fractional share trading,

00:01:39 let me mention that the order execution algorithm

00:01:41 that works behind the scenes

00:01:43 to create the abstraction of fractional orders

00:01:45 is an algorithmic marvel.

00:01:48 So big props to the Cash App engineers

00:01:50 for solving a hard problem that in the end

00:01:53 provides an easy interface that takes a step up

00:01:56 to the next layer of abstraction over the stock market,

00:01:59 making trading more accessible for new investors

00:02:02 and diversification much easier.

00:02:05 So again, if you get Cash App from the App Store

00:02:08 or Google Play and use the code LEXPODCAST,

00:02:11 you get $10 and Cash App will also donate $10 to FIRST,

00:02:15 an organization that is helping to advance robotics

00:02:18 and STEM education for young people around the world.

00:02:22 And now, here’s my conversation with Anca Dragan.

00:02:26 When did you first fall in love with robotics?

00:02:29 I think it was a very gradual process

00:02:34 and it was somewhat accidental actually

00:02:37 because I first started getting into programming

00:02:41 when I was a kid and then into math

00:02:43 and then I decided computer science

00:02:46 was the thing I was gonna do

00:02:47 and then in college I got into AI

00:02:50 and then I applied to the Robotics Institute

00:02:52 at Carnegie Mellon and I was coming from this little school

00:02:56 in Germany that nobody had heard of

00:02:59 but I had spent an exchange semester at Carnegie Mellon

00:03:01 so I had letters from Carnegie Mellon.

00:03:04 So that was the only, you know, MIT said no,

00:03:06 Berkeley said no, Stanford said no.

00:03:09 That was the only place I got into

00:03:11 so I went there to the Robotics Institute

00:03:13 and I thought that robotics is a really cool way

00:03:16 to actually apply the stuff that I knew and loved

00:03:20 to like optimization so that’s how I got into robotics.

00:03:23 I have a better story how I got into cars

00:03:25 which is I used to do mostly manipulation in my PhD

00:03:31 but now I do kind of a bit of everything application wise

00:03:34 including cars and I got into cars

00:03:38 because I was here in Berkeley

00:03:42 while I was a PhD student still for RSS 2014,

00:03:46 Pieter Abbeel organized it and he arranged for,

00:03:50 it was Google at the time to give us rides

00:03:52 in self driving cars and I was in a robot

00:03:56 and it was just making decision after decision,

00:04:00 the right call and it was so amazing.

00:04:03 So it was a whole different experience, right?

00:04:05 Just I mean manipulation is so hard you can’t do anything

00:04:07 and there it was.

00:04:08 Was it the most magical robot you’ve ever met?

00:04:11 So like for me to meet a Google self driving car

00:04:14 for the first time was like a transformative moment.

00:04:18 Like I had two moments like that,

00:04:19 that and Spot Mini, I don’t know if you met Spot Mini

00:04:22 from Boston Dynamics.

00:04:24 I felt like I fell in love or something

00:04:27 like it, cause I know how a Spot Mini works, right?

00:04:30 It’s just, I mean there’s nothing truly special,

00:04:34 it’s great engineering work but the anthropomorphism

00:04:38 that went on in my brain that came to life

00:04:41 like it had a little arm and it looked at me,

00:04:45 he, she looked at me, I don’t know,

00:04:47 there’s a magical connection there

00:04:48 and it made me realize, wow, robots can be so much more

00:04:52 than things that manipulate objects.

00:04:54 They can be things that have a human connection.

00:04:56 Do you have, was the self driving car the moment like,

00:05:01 was there a robot that truly sort of inspired you?

00:05:04 That was, I remember that experience very viscerally,

00:05:08 riding in that car and being just wowed.

00:05:11 I had the, they gave us a sticker that said,

00:05:16 I rode in a self driving car

00:05:17 and it had this cute little firefly on and,

00:05:20 or logo or something like that.

00:05:21 Oh, that was like the smaller one, like the firefly.

00:05:23 Yeah, the really cute one, yeah.

00:05:25 And I put it on my laptop and I had that for years

00:05:30 until I finally changed my laptop out and you know.

00:05:33 What about if we walk back, you mentioned optimization,

00:05:36 like what beautiful ideas inspired you in math,

00:05:40 computer science early on?

00:05:42 Like why get into this field?

00:05:44 It seems like a cold and boring field of math.

00:05:47 Like what was exciting to you about it?

00:05:49 The thing is I liked math from very early on,

00:05:52 from fifth grade is when I got into the math Olympiad

00:05:56 and all of that.

00:05:57 Oh, you competed too?

00:05:58 Yeah, this, in Romania it’s like our national sport too,

00:06:01 you gotta understand.

00:06:02 So I got into that fairly early

00:06:05 and it was a little, maybe too much just theory

00:06:10 with no kind of, I didn’t kind of have a,

00:06:13 didn’t really have a goal.

00:06:15 And other than understanding, which was cool,

00:06:17 I always liked learning and understanding,

00:06:19 but there was no, okay,

00:06:20 what am I applying this understanding to?

00:06:22 And so I think that’s how I got into,

00:06:23 more heavily into computer science

00:06:25 because it was kind of math meets something

00:06:29 you can do tangibly in the world.

00:06:31 Do you remember like the first program you’ve written?

00:06:34 Okay, the first program I’ve written with,

00:06:37 I kind of do, it was in QBasic in fourth grade.

00:06:42 Wow.

00:06:43 And it was drawing like a circle.

00:06:46 Graphics.

00:06:47 Yeah, that was, I don’t know how to do that anymore,

00:06:51 but in fourth grade,

00:06:52 that’s the first thing that they taught me.

00:06:54 I was like, you could take a special,

00:06:56 I wouldn’t say it was an extracurricular,

00:06:57 it was in a sense an extracurricular,

00:06:59 so you could sign up for dance or music or programming.

00:07:03 And I did the programming thing

00:07:04 and my mom was like, what, why?

00:07:07 Did you compete in programming?

00:07:08 Like these days, Romania probably,

00:07:12 that’s like a big thing.

00:07:12 There’s a programming competition.

00:07:15 Was that, did that touch you at all?

00:07:17 I did a little bit of the computer science Olympiad,

00:07:21 but not as seriously as I did the math Olympiad.

00:07:24 So it was programming.

00:07:25 Yeah, it’s basically,

00:07:26 here’s a hard math problem,

00:07:27 solve it with a computer is kind of the deal.

00:07:29 Yeah, it’s more like algorithm.

00:07:30 Exactly, it’s always algorithmic.

00:07:32 So again, you kind of mentioned the Google self driving car,

00:07:36 but outside of that,

00:07:39 what’s like who or what is your favorite robot,

00:07:44 real or fictional that like captivated

00:07:46 your imagination throughout?

00:07:48 I mean, I guess you kind of alluded

00:07:49 to the Google self drive,

00:07:51 the Firefly was a magical moment,

00:07:53 but is there something else?

00:07:54 It wasn’t the Firefly there,

00:07:56 I think it was the Lexus, by the way.

00:07:58 This was back then.

00:07:59 But yeah, so good question.

00:08:02 Okay, my favorite fictional robot is WALL-E.

00:08:08 And I love how amazingly expressive it is.

00:08:15 I personally think a little bit

00:08:16 about expressive motion kinds of things you’re saying with,

00:08:18 you can do this and it’s a head and it’s the manipulator

00:08:20 and what does it all mean?

00:08:22 I like to think about that stuff.

00:08:24 I love Pixar, I love animation.

00:08:26 WALL-E has two big eyes, I think, or no?

00:08:28 Yeah, it has these cameras and they move.

00:08:34 So yeah, it goes and then it’s super cute.

00:08:38 Yeah, the way it moves is just so expressive,

00:08:41 the timing of that motion,

00:08:43 what it’s doing with its arms

00:08:44 and what it’s doing with these lenses is amazing.

00:08:48 And so I’ve really liked that from the start.

00:08:53 And then on top of that, sometimes I share this,

00:08:56 it’s a personal story I share with people

00:08:58 or when I teach about AI or whatnot.

00:09:01 My husband proposed to me by building a WALL-E

00:09:07 and he actuated it.

00:09:09 So it’s seven degrees of freedom, including the lens thing.

00:09:13 And it kind of came in and it had the,

00:09:17 he made it have like the belly box opening thing.

00:09:21 So it just did that.

00:09:23 And then it spewed out this box made out of Legos

00:09:27 that opened slowly and then bam, yeah.

00:09:31 Yeah, it was quite, it set a bar.

00:09:34 That could be like the most impressive thing I’ve ever heard.

00:09:37 Okay.

00:09:39 That was a special connection to WALL-E, long story short.

00:09:40 I like WALL-E because I like animation and I like robots

00:09:43 and I like the fact that this was,

00:09:46 we still have this robot to this day.

00:09:49 How hard is that problem,

00:09:50 do you think of the expressivity of robots?

00:09:54 Like with the Boston Dynamics, I never talked to those folks

00:09:59 about this particular element.

00:10:00 I’ve talked to them a lot,

00:10:02 but it seems to be like almost an accidental side effect

00:10:05 for them that they weren’t,

00:10:07 I don’t know if they’re faking it.

00:10:08 They weren’t trying to, okay.

00:10:11 They do say that the gripper,

00:10:14 it was not intended to be a face.

00:10:17 I don’t know if that’s an honest statement,

00:10:20 but I think they’re legitimate.

00:10:21 Probably yes. And so do we automatically just

00:10:25 anthropomorphize anything we can see about a robot?

00:10:29 So like the question is,

00:10:30 how hard is it to create a WALLI type robot

00:10:33 that connects so deeply with us humans?

00:10:35 What do you think?

00:10:36 It’s really hard, right?

00:10:37 So it depends on what setting.

00:10:39 So if you wanna do it in this very particular narrow setting

00:10:45 where it does only one thing and it’s expressive,

00:10:48 then you can get an animator, you know,

00:10:50 you can have Pixar on call come in,

00:10:52 design some trajectories.

00:10:53 There was a, Anki had a robot called Cozmo

00:10:56 where they put in some of these animations.

00:10:58 That part is easy, right?

00:11:00 The hard part is doing it not via these

00:11:04 kind of handcrafted behaviors,

00:11:06 but doing it generally autonomously.

00:11:09 Like I want robots, I don’t work on,

00:11:12 just to clarify, I don’t, I used to work a lot on this.

00:11:14 I don’t work on that quite as much these days,

00:11:17 but the notion of having robots that, you know,

00:11:21 when they pick something up and put it in a place,

00:11:24 they can do that with various forms of style,

00:11:28 or you can say, well, this robot is, you know,

00:11:30 succeeding at this task and is confident

00:11:32 versus it’s hesitant versus, you know,

00:11:34 maybe it’s happy or it’s, you know,

00:11:35 disappointed about something, some failure that it had.

00:11:38 I think that when robots move,

00:11:42 they can communicate so much about internal states

00:11:46 or perceived internal states that they have.

00:11:49 And I think that’s really useful

00:11:53 and an element that we’ll want in the future

00:11:55 because I was reading this article

00:11:58 about how kids are,

00:12:04 kids are being rude to Alexa

00:12:07 because they can be rude to it

00:12:09 and it doesn’t really get angry, right?

00:12:11 It doesn’t reply in any way, it just says the same thing.

00:12:15 So I think there’s, at least for that,

00:12:17 for the correct development of children,

00:12:20 it’s important that these things

00:12:21 kind of react differently.

00:12:22 I also think, you know, you walk in your home

00:12:24 and you have a personal robot and if you’re really pissed,

00:12:27 presumably the robot should kind of behave

00:12:28 slightly differently than when you’re super happy

00:12:31 and excited, but it’s really hard because it’s,

00:12:36 I don’t know, you know, the way I would think about it

00:12:38 and the way I thought about it when it came to

00:12:40 expressing goals or intentions for robots,

00:12:44 it’s, well, what’s really happening is that

00:12:47 instead of doing robotics where you have your state

00:12:51 and you have your action space and you have

00:12:55 the reward function that you’re trying to optimize,

00:12:57 now you kind of have to expand the notion of state

00:13:00 to include this human internal state.

00:13:02 What is the person actually perceiving?

00:13:05 What do they think about the robot’s

00:13:08 something or other,

00:13:10 and then you have to optimize in that system.

00:13:12 And so that means that you have to understand

00:13:14 how your motion, your actions end up sort of influencing

00:13:17 the observer’s kind of perception of you.

00:13:20 And it’s very hard to write math about that.
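
As a rough illustration of what that math could look like, here is a sketch in generic notation (my own, not a formula from the conversation): fold the human’s internal state into the state the robot plans over, and let the robot’s actions drive both the physical state and the human’s perception of the robot.

\[
s_t = (x_t, \theta^H_t), \qquad
x_{t+1} = f(x_t, u^R_t, u^H_t), \qquad
\theta^H_{t+1} = g(\theta^H_t, x_t, u^R_t),
\]
\[
u^{R\,*}_{0:T} = \arg\max_{u^R_{0:T}} \; \sum_{t=0}^{T} r\!\left(x_t, u^R_t, \theta^H_t\right),
\]

where \(x_t\) is the physical state, \(\theta^H_t\) stands for what the person perceives or believes about the robot, and \(g\), the model of how robot motion changes that perception, is exactly the part that is hard to write down.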

00:13:25 Right, so when you start to think about

00:13:27 incorporating the human into the state model,

00:13:31 apologize for the philosophical question,

00:13:33 but how complicated are human beings, do you think?

00:13:36 Like, can they be reduced to a kind of

00:13:40 almost like an object that moves

00:13:43 and maybe has some basic intents?

00:13:46 Or is there something, do we have to model things like mood

00:13:50 and general aggressiveness and time?

00:13:52 I mean, all these kinds of human qualities

00:13:54 or like game theoretic qualities, like what’s your sense?

00:13:58 How complicated is…

00:14:00 How hard is the problem of human robot interaction?

00:14:03 Yeah, should we talk about

00:14:05 what the problem of human robot interaction is?

00:14:07 Yeah, what is human robot interaction?

00:14:10 And then talk about how that, yeah.

00:14:12 So, and by the way, I’m gonna talk about

00:14:15 this very particular view of human robot interaction, right?

00:14:19 Which is not so much on the social side

00:14:21 or on the side of how do you have a good conversation

00:14:24 with the robot, what should the robot’s appearance be?

00:14:26 It turns out that if you make robots taller versus shorter,

00:14:29 this has an effect on how people act with them.

00:14:31 So I’m not talking about that.

00:14:34 But I’m talking about this very kind of narrow thing,

00:14:36 which is you take, if you wanna take a task

00:14:39 that a robot can do in isolation,

00:14:42 in a lab out there in the world, but in isolation,

00:14:46 and now you’re asking what does it mean for the robot

00:14:49 to be able to do this task for,

00:14:52 presumably what its actually end goal is,

00:14:54 which is to help some person.

00:14:56 That ends up changing the problem in two ways.

00:15:02 The first way it changes the problem is that

00:15:04 the robot is no longer the single agent acting.

00:15:08 That you have humans who also take actions

00:15:10 in that same space.

00:15:12 Cars navigating around people, robots around an office,

00:15:15 navigating around the people in that office.

00:15:18 If I send the robot over there in the cafeteria

00:15:20 to get me a coffee, then there’s probably other people

00:15:23 reaching for stuff in the same space.

00:15:25 And so now you have your robot and you’re in charge

00:15:28 of the actions that the robot is taking.

00:15:30 Then you have these people who are also making decisions

00:15:33 and taking actions in that same space.

00:15:36 And even if, you know, the robot knows what it should do

00:15:39 and all of that, just coexisting with these people, right?

00:15:42 Kind of getting the actions to gel well,

00:15:45 to mesh well together.

00:15:47 That’s sort of the kind of problem number one.

00:15:50 And then there’s problem number two,

00:15:51 which is, goes back to this notion of if I’m a programmer,

00:15:58 I can specify some objective for the robot

00:16:00 to go off and optimize and specify the task.

00:16:03 But if I put the robot in your home,

00:16:07 presumably you might have your own opinions about,

00:16:11 well, okay, I want my house clean,

00:16:12 but how do I want it cleaned?

00:14:14 And how should the robot move, how close to me should it come

00:16:16 and all of that.

00:16:17 And so I think those are the two differences that you have.

00:16:20 You’re acting around people and what you should be

00:16:24 optimizing for should satisfy the preferences

00:16:27 of that end user, not of your programmer who programmed you.

00:16:30 Yeah, and the preferences thing is tricky.

00:16:33 So figuring out those preferences,

00:16:35 being able to interactively adjust

00:16:38 to understand what the human is doing.

00:16:39 So really it boils down to understanding the humans

00:16:42 in order to interact with them and in order to please them.

00:16:45 Right.

00:16:47 So why is this hard?

00:16:48 Yeah, why is understanding humans hard?

00:16:51 So I think there’s two tasks about understanding humans

00:16:57 that in my mind are very, very similar,

00:16:59 but not everyone agrees.

00:17:00 So there’s the task of being able to just anticipate

00:17:04 what people will do.

00:17:05 We all know that cars need to do this, right?

00:17:07 We all know that, well, if I navigate around some people,

00:17:10 the robot has to get some notion of,

00:17:12 okay, where is this person gonna be?

00:17:15 So that’s kind of the prediction side.

00:17:17 And then there’s what you were saying,

00:17:19 satisfying the preferences, right?

00:17:21 So adapting to the person’s preferences,

00:17:22 knowing what to optimize for,

00:17:24 which is more this inference side,

00:17:25 this what does this person want?

00:17:28 What is their intent? What are their preferences?

00:17:31 And to me, those kind of go together

00:17:35 because I think that at the very least,

00:17:39 if you can understand, if you can look at human behavior

00:17:42 and understand what it is that they want,

00:17:45 then that’s sort of the key enabler

00:17:47 to being able to anticipate what they’ll do in the future.

00:17:50 Because I think that we’re not arbitrary.

00:17:53 We make these decisions that we make,

00:17:55 we act in the way we do

00:17:56 because we’re trying to achieve certain things.

00:17:59 And so I think that’s the relationship between them.

00:18:01 Now, how complicated do these models need to be

00:18:05 in order to be able to understand what people want?

00:18:10 So we’ve gotten a long way in robotics

00:18:15 with something called inverse reinforcement learning,

00:18:17 which is the notion of if someone acts,

00:18:19 demonstrates how they want the thing done.

00:18:22 What is inverse reinforcement learning?

00:18:24 You just briefly said it.

00:18:25 Right, so it’s the problem of take human behavior

00:18:30 and infer reward function from this.

00:18:33 So figure out what it is

00:18:34 that that behavior is optimal with respect to.

00:18:37 And it’s a great way to think

00:18:38 about learning human preferences

00:18:40 in the sense of you have a car and the person can drive it

00:18:45 and then you can say, well, okay,

00:18:46 I can actually learn what the person is optimizing for.

00:18:51 I can learn their driving style,

00:18:53 or you can have people demonstrate

00:18:55 how they want the house clean.

00:18:57 And then you can say, okay, this is,

00:18:59 I’m getting the trade offs that they’re making.

00:19:02 I’m getting the preferences that they want out of this.

00:19:06 And so we’ve been successful in robotics somewhat with this.

00:19:10 And it’s based on a very simple model of human behavior.

00:19:15 It was remarkably simple,

00:19:16 which is that human behavior is optimal

00:19:18 with respect to whatever it is that people want, right?

00:19:22 So you make that assumption

00:19:23 and now you can kind of inverse through.

00:19:24 That’s why it’s called inverse,

00:19:25 well, really optimal control,

00:19:27 but also inverse reinforcement learning.

00:19:30 So this is based on utility maximization in economics.

00:19:36 Back in the forties, von Neumann and Morgenstern

00:19:39 were like, okay, people are making choices

00:19:43 by maximizing utility, go.

00:19:45 And then in the late fifties,

00:19:48 we had Luce and Shepard come in and say,

00:19:52 people are a little bit noisy and approximate in that process.

00:19:57 So they might choose something kind of stochastically

00:20:01 with probability proportional to

00:20:03 how much utility something has.

00:20:07 So there’s a bit of noise in there.

00:20:09 This has translated into robotics

00:20:11 and something that we call Boltzmann rationality.

00:20:14 So it’s a kind of an evolution

00:20:15 of inverse reinforcement learning

00:20:16 that accounts for human noise.

00:20:19 And we’ve had some success with that too,

00:20:21 for these tasks where it turns out

00:20:23 people act noisily enough that you can’t just do vanilla,

00:20:28 the vanilla version.

00:20:29 You can account for noise

00:20:31 and still infer what they seem to want based on this.
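
To make the last two ideas concrete (inverse reinforcement learning with a Boltzmann-rational human model), here is a minimal sketch. The feature vectors, the candidate reward weights, and the rationality coefficient beta are invented placeholders for illustration, not anything from Dragan’s actual systems.

```python
import numpy as np

# Hypothetical demonstration: feature counts of the trajectory the person drove,
# e.g. [progress, closeness_to_other_cars, jerkiness]. Placeholder numbers.
demo_features = np.array([1.0, 0.2, 0.1])

# Candidate reward parameters theta (one row per hypothesis). In real IRL these
# live in a continuous space; a small grid keeps the sketch simple.
candidate_thetas = np.array([
    [1.0, -0.1, -0.1],   # "efficient" driving style
    [1.0, -2.0, -0.5],   # "defensive" driving style
    [1.0,  0.0, -2.0],   # "smooth" driving style
])

# A small, hypothetical choice set of trajectories the person could have driven,
# described by their features; the demonstrated one is the first row.
alternative_features = np.array([
    [1.0, 0.2,  0.1],
    [0.8, 0.05, 0.2],
    [0.9, 0.3,  0.02],
])

beta = 5.0  # Boltzmann rationality: higher beta = closer to perfectly optimal behavior

def boltzmann_likelihood(theta, chosen, alternatives, beta):
    """P(person picks `chosen`) is proportional to exp(beta * reward) -- noisy-rational choice."""
    rewards = alternatives @ theta
    return np.exp(beta * (chosen @ theta)) / np.sum(np.exp(beta * rewards))

# Bayesian inverse reinforcement learning over the tiny hypothesis grid:
# posterior over which reward (driving style) the demonstration reflects.
prior = np.ones(len(candidate_thetas)) / len(candidate_thetas)
likelihoods = np.array([
    boltzmann_likelihood(theta, demo_features, alternative_features, beta)
    for theta in candidate_thetas
])
posterior = prior * likelihoods
posterior /= posterior.sum()
print(posterior)
```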

00:20:36 Then now we’re hitting tasks where that’s not enough.

00:20:39 And because…

00:20:41 What are examples of such tasks?

00:20:43 So imagine you’re trying to control some robot,

00:20:45 that’s fairly complicated.

00:20:47 You’re trying to control a robot arm

00:20:49 because maybe you’re a patient with a motor impairment

00:20:52 and you have this wheelchair mounted arm

00:20:53 and you’re trying to control it around.

00:20:56 Or one task that we’ve looked at with Sergey is,

00:21:00 and our students did, is a lunar lander.

00:21:02 So I don’t know if you know this Atari game,

00:21:05 it’s called Lunar Lander.

00:21:06 It’s really hard.

00:21:07 People really suck at landing the thing.

00:21:09 Mostly they just crash it left and right.

00:21:11 Okay, so this is the kind of task we imagine

00:21:14 you’re trying to provide some assistance

00:21:16 to a person operating such a robot

00:21:20 where you want the kind of the autonomy to kick in,

00:21:21 figure out what it is that you’re trying to do

00:21:23 and help you do it.

00:21:25 It’s really hard to do that for, say, Lunar Lander

00:21:30 because people are all over the place.

00:21:32 And so they seem much more noisy than really rational.

00:21:36 That’s an example of a task

00:21:37 where these models are kind of failing us.

00:21:41 And it’s not surprising because

00:21:43 we’re talking about the 40s, utility, late 50s,

00:21:47 sort of noisy.

00:21:48 Then the 70s came and behavioral economics

00:21:52 started being a thing where people were like,

00:21:54 no, no, no, no, no, people are not rational.

00:21:58 People are messy and emotional and irrational

00:22:03 and have all sorts of heuristics

00:22:05 that might be domain specific.

00:22:06 And they’re just a mess.

00:22:08 The mess.

00:22:09 So what does my robot do to understand

00:22:13 what you want?

00:22:14 And it’s a very, it’s very, that’s why it’s complicated.

00:22:18 It’s, you know, for the most part,

00:22:19 we get away with pretty simple models until we don’t.

00:22:23 And then the question is, what do you do then?

00:22:26 And I had days when I wanted to, you know,

00:22:30 pack my bags and go home and switch jobs

00:22:32 because it’s just, it feels really daunting

00:22:35 to make sense of human behavior enough

00:22:37 that you can reliably understand what people want,

00:22:40 especially as, you know,

00:22:41 robot capabilities will continue to get developed.

00:22:44 You’ll get these systems that are more and more capable

00:22:47 of all sorts of things.

00:22:48 And then you really want to make sure

00:22:49 that you’re telling them the right thing to do.

00:22:51 What is that thing?

00:22:52 Well, read it in human behavior.

00:22:56 So if I just sat here quietly

00:22:58 and tried to understand something about you

00:23:00 by listening to you talk,

00:23:02 it would be harder than if I got to say something

00:23:06 and ask you and interact and control.

00:23:08 Can you, can the robot help its understanding of the human

00:23:13 by influencing the behavior by actually acting?

00:23:18 Yeah, absolutely.

00:23:19 So one of the things that’s been exciting to me lately

00:23:23 is this notion that when you try to,

00:23:28 that when you try to think of the robotics problem as,

00:23:31 okay, I have a robot and it needs to optimize

00:23:34 for whatever it is that a person wants it to optimize

00:23:37 as opposed to maybe what a programmer said.

00:23:40 That problem we think of as a human robot

00:23:44 collaboration problem in which both agents get to act

00:23:49 in which the robot knows less than the human

00:23:52 because the human actually has access to,

00:23:54 you know, at least implicitly to what it is that they want.

00:23:57 They can’t write it down, but they can talk about it.

00:24:00 They can give all sorts of signals.

00:24:02 They can demonstrate and,

00:24:04 but the robot doesn’t need to sit there

00:24:06 and passively observe human behavior

00:24:08 and try to make sense of it.

00:24:10 The robot can act too.

00:24:11 And so there’s these information gathering actions

00:24:15 that the robot can take to sort of solicit responses

00:24:19 that are actually informative.

00:24:21 So for instance, this is not for the purpose

00:24:22 of assisting people, but kind of going back to coordinating

00:24:25 with people in cars and all of that.

00:24:27 One thing that Dorsa did was,

00:24:31 so we were looking at cars being able to navigate

00:24:34 around people and you might not know exactly

00:24:39 the driving style of a particular individual

00:24:41 that’s next to you,

00:24:43 but you wanna change lanes in front of them.

00:24:45 Navigating around other humans inside cars.

00:24:48 Yeah, good, good clarification question.

00:24:50 So you have an autonomous car and it’s trying to navigate

00:24:55 the road around human driven vehicles.

00:24:58 Similar ideas apply to pedestrians as well,

00:25:01 but let’s just take human driven vehicles.

00:25:03 So now you’re trying to change a lane.

00:25:06 Well, you could be trying to infer the driving style

00:25:10 of this person next to you.

00:25:12 You’d like to know if they’re in particular,

00:25:13 if they’re sort of aggressive or defensive,

00:25:15 if they’re gonna let you kind of go in

00:25:18 or if they’re gonna not.

00:25:20 And it’s very difficult to just,

00:25:25 if you think that if you wanna hedge your bets

00:25:27 and say, ah, maybe they’re actually pretty aggressive,

00:25:30 I shouldn’t try this.

00:25:31 You kind of end up driving next to them

00:25:33 and driving next to them, right?

00:25:34 And then you don’t know

00:25:36 because you’re not actually getting the observations

00:25:39 that you need. The way

00:25:40 someone drives when they’re next to you

00:25:42 and they just need to go straight

00:25:44 is kind of the same

00:25:45 regardless of whether they’re aggressive or defensive.

00:25:47 And so you need to enable the robot

00:25:51 to reason about how it might actually be able

00:25:54 to gather information by changing the actions

00:25:57 that it’s taking.

00:25:58 And then the robot comes up with these cool things

00:25:59 where it kind of nudges towards you

00:26:02 and then sees if you’re gonna slow down or not.

00:26:05 Then if you slow down,

00:26:06 it sort of updates its model of you

00:26:07 and says, oh, okay, you’re more on the defensive side.

00:26:11 So now I can actually like.
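
A toy version of that nudge-and-observe loop might look like the sketch below. The two driver types, the observation probabilities, and the numbers are all invented for illustration; they are not the model from that work.

```python
import numpy as np

# Belief over the other driver's hidden style: [P(aggressive), P(defensive)].
belief = np.array([0.5, 0.5])

# Hypothetical observation model: probability the other driver slows down when
# the autonomous car nudges toward their lane, under each style.
p_slow_given_style = np.array([0.1, 0.8])  # aggressive rarely yields, defensive usually does

def update(belief, slowed_down):
    """Bayes update of the style belief after one nudge (information-gathering action)."""
    likelihood = p_slow_given_style if slowed_down else 1.0 - p_slow_given_style
    posterior = belief * likelihood
    return posterior / posterior.sum()

# The robot nudges, the other car slows down: belief shifts toward "defensive",
# and committing to the lane change now looks reasonable.
belief = update(belief, slowed_down=True)
print(belief)  # roughly [0.11, 0.89]
```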

00:26:12 That’s a fascinating dance.

00:26:14 That’s so cool that you could use your own actions

00:26:18 to gather information.

00:26:19 That feels like a totally open,

00:26:22 exciting new world of robotics.

00:26:24 I mean, how many people are even thinking

00:26:26 about that kind of thing?

00:26:28 A handful of us, I’d say.

00:26:30 It’s rare because it’s actually leveraging humans.

00:26:33 I mean, most roboticists,

00:26:34 I’ve talked to a lot of colleagues and so on,

00:26:38 are kind of, being honest, kind of afraid of humans.

00:26:42 Because they’re messy and complicated, right?

00:26:45 I understand.

00:26:47 Going back to what we were talking about earlier,

00:26:49 right now we’re kind of in this dilemma of, okay,

00:26:52 there are tasks that we can just assume

00:26:54 people are approximately rational for

00:26:55 and we can figure out what they want.

00:26:57 We can figure out their goals.

00:26:57 We can figure out their driving styles, whatever.

00:26:59 Cool.

00:27:00 There are these tasks that we can’t.

00:27:02 So what do we do, right?

00:27:03 Do we pack our bags and go home?

00:27:06 And this one, I’ve had a little bit of hope recently.

00:27:12 And I’m kind of doubting myself

00:27:13 because what do I know that, you know,

00:27:15 50 years of behavioral economics hasn’t figured out.

00:27:19 But maybe it’s not really in contradiction

00:27:21 with the way that field is headed.

00:27:23 But basically one thing that we’ve been thinking about is,

00:27:27 instead of kind of giving up and saying

00:27:30 people are too crazy and irrational

00:27:32 for us to make sense of them,

00:27:34 maybe we can give them a bit of the benefit of the doubt.

00:27:39 And maybe we can think of them

00:27:41 as actually being relatively rational,

00:27:43 but just under different assumptions about the world,

00:27:48 about how the world works, about, you know,

00:27:51 they don’t have, when we think about rationality,

00:27:54 the implicit assumption is, oh, they’re rational

00:27:56 under all the same assumptions and constraints

00:27:58 as the robot, right?

00:27:59 What, if this is the state of the world,

00:28:01 that’s what they know.

00:28:02 This is the transition function, that’s what they know.

00:28:05 This is the horizon, that’s what they know.

00:28:07 But maybe the kind of this difference,

00:28:11 the way, the reason they can seem a little messy

00:28:13 and hectic, especially to robots,

00:28:16 is that perhaps they just make different assumptions

00:28:20 or have different beliefs.

00:28:21 Yeah, I mean, that’s another fascinating idea

00:28:24 that this, our kind of anecdotal desire

00:28:29 to say that humans are irrational,

00:28:31 perhaps grounded in behavioral economics,

00:28:33 is that we just don’t understand the constraints

00:28:36 and the rewards under which they operate.

00:28:38 And so our goal shouldn’t be to throw our hands up

00:28:40 and say they’re irrational,

00:28:42 it’s to say, let’s try to understand

00:28:44 what are the constraints.

00:28:46 What it is that they must be assuming

00:28:48 that makes this behavior make sense.

00:28:51 Good life lesson, right?

00:28:52 Good life lesson.

00:28:53 That’s true even just outside of robotics.

00:28:55 That’s just good for communicating with humans.

00:28:58 That’s just a good assumption to make,

00:29:00 that there’s something you just don’t know, sort of empathy, right?

00:29:03 It’s a…

00:29:04 This is maybe there’s something you’re missing

00:29:06 and it’s, you know, it especially happens to robots

00:29:08 cause they’re kind of dumb and they don’t know things.

00:29:10 And oftentimes people are sort of supra rational

00:29:12 in that they actually know a lot of things

00:29:14 that robots don’t.

00:29:15 Sometimes like with the lunar lander,

00:29:17 the robot, you know, knows much more.

00:29:20 So it turns out that if you try to say,

00:29:23 look, maybe people are operating this thing

00:29:26 but assuming a much more simplified physics model

00:29:31 cause they don’t get the complexity of this kind of craft

00:29:33 or the robot arm with seven degrees of freedom

00:29:36 with these inertias and whatever.

00:29:38 So maybe they have this intuitive physics model

00:29:41 which is not, you know, this notion of intuitive physics

00:29:44 is something that’s studied actually in cognitive science,

00:29:46 like Josh Tenenbaum’s and Tom Griffiths’ work on this stuff.

00:29:49 And what we found is that you can actually try

00:29:54 to figure out what physics model

00:29:58 kind of best explains human actions.

00:30:01 And then you can use that to sort of correct what it is

00:30:06 that they’re commanding the craft to do.

00:30:08 So they might, you know, be sending the craft somewhere

00:30:11 but instead of executing that action,

00:30:13 you can sort of take a step back and say,

00:30:15 according to their intuitive,

00:30:16 if the world worked according to their intuitive physics

00:30:20 model, where do they think that the craft is going?

00:30:23 Where are they trying to send it to?

00:30:26 And then you can use the real physics, right?

00:30:28 The inverse of that to actually figure out

00:30:30 what you should do so that you do that

00:30:31 instead of where they were actually sending you

00:30:33 in the real world.
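
Sketched in code, that correction might look like the following: take the person’s command, compute where they think it will send the craft under a simplified intuitive model, then pick the real action whose outcome under the true dynamics matches that intent. The dynamics functions, candidate actions, and numbers here are stand-ins; the step of inferring which intuitive model best explains the person’s actions is skipped and a fixed simplified model is just assumed.

```python
import numpy as np

def intuitive_dynamics(state, action):
    """Hypothetical simplified physics the person is assumed to use (no inertia)."""
    return state + action  # the craft just goes where you push it

def true_dynamics(state, action):
    """Hypothetical true physics: the command has less authority and drift carries over."""
    drift = np.array([0.5, -0.2])  # placeholder inertia/drift term
    return state + 0.4 * action + 0.6 * drift

def assistive_action(state, human_action, candidate_actions):
    """Pick the real action whose true outcome best matches the person's intended outcome."""
    intended_next = intuitive_dynamics(state, human_action)  # where they THINK they're going
    errors = [np.linalg.norm(true_dynamics(state, a) - intended_next)
              for a in candidate_actions]
    return candidate_actions[int(np.argmin(errors))]

state = np.array([0.0, 1.0])
human_action = np.array([0.2, -0.1])
candidates = [np.array(a) for a in [(-0.5, 0.5), (0.0, 0.0), (0.5, -1.0), (0.2, -0.1)]]
print(assistive_action(state, human_action, candidates))  # executed instead of the raw command
```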

00:30:34 And I kid you not, it works, people land the damn thing

00:30:38 and you know, in between the two flags and all that.

00:30:42 So it’s not conclusive in any way

00:30:45 but I’d say it’s evidence that yeah,

00:30:47 maybe we’re kind of underestimating humans in some ways

00:30:50 when we’re giving up and saying,

00:30:51 yeah, they’re just crazy noisy.

00:30:53 So then you explicitly try to model

00:30:56 the kind of worldview that they have.

00:30:58 That they have, that’s right.

00:30:59 That’s right.

00:31:00 And it’s not too, I mean,

00:31:02 there are things in behavioral economics too

00:31:03 that for instance have touched upon the planning horizon.

00:31:06 So there’s this idea that there’s bounded rationality

00:31:09 essentially and the idea that, well,

00:31:11 maybe we work under computational constraints.

00:31:13 And I think kind of our view recently has been

00:31:17 take the Bellman update in AI

00:31:19 and just break it in all sorts of ways by saying state,

00:31:22 no, no, no, the person doesn’t get to see the real state.

00:31:25 Maybe they’re estimating somehow.

00:31:26 Transition function, no, no, no, no, no.

00:31:28 Even the actual reward evaluation,

00:31:31 maybe they’re still learning

00:31:32 about what it is that they want.

00:31:34 Like, you know, when you watch Netflix

00:31:37 and you know, you have all the things

00:31:39 and then you have to pick something,

00:31:41 imagine that, you know, the AI system interpreted

00:31:46 that choice as this is the thing you prefer to see.

00:31:48 Like, how are you going to know?

00:31:49 You’re still trying to figure out what you like,

00:31:51 what you don’t like, et cetera.

00:31:52 So I think it’s important to also account for that.

00:31:55 So it’s not irrationality,

00:31:56 because they’re doing the right thing

00:31:58 under the things that they know.
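
One way to read that list of relaxations is against the standard Bellman update, sketched here in generic notation (mine, not a formula from the conversation), with the ingredients a person might get "wrong" marked:

\[
Q(s,a) \;=\; \underbrace{r(s,a)}_{\substack{\text{reward: may still be} \\ \text{being learned}}}
\;+\; \gamma \sum_{s'} \underbrace{T(s' \mid s,a)}_{\substack{\text{transition: may be an} \\ \text{intuitive, simplified model}}}\,
\underbrace{\max_{a'} Q(s',a')}_{\substack{\text{horizon and max: may be} \\ \text{bounded or approximate}}}
\]

with the additional caveat that the person may only have an estimate \(\hat{s}\) of the true state \(s\) rather than the state itself.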

00:31:59 Yeah, that’s brilliant.

00:32:01 You mentioned recommender systems.

00:32:03 What kind of, and we were talking

00:32:05 about human robot interaction,

00:32:07 what kind of problem spaces are you thinking about?

00:32:10 So is it robots, like wheeled robots

00:32:14 with autonomous vehicles?

00:32:16 Is it object manipulation?

00:32:18 Like when you think

00:32:19 about human robot interaction in your mind,

00:32:21 and maybe I’m sure you can speak

00:32:24 for the entire community of human robot interaction.

00:32:27 But like, what are the problems of interest here?

00:32:30 And does it, you know, I kind of think

00:32:34 of open domain dialogue as human robot interaction,

00:32:40 and that happens not in the physical space,

00:32:43 but it could just happen in the virtual space.

00:32:46 So where’s the boundaries of this field for you

00:32:49 when you’re thinking about the things

00:32:50 we’ve been talking about?

00:32:51 Yeah, so I try to find kind of underlying,

00:33:00 I don’t know what to even call them.

00:33:02 I try to work on, you know, I might call what I do,

00:33:05 the kind of working on the foundations

00:33:07 of algorithmic human robot interaction

00:33:09 and trying to make contributions there.

00:33:12 And it’s important to me that whatever we do

00:33:15 is actually somewhat domain agnostic when it comes to,

00:33:19 is it about, you know, autonomous cars

00:33:23 or is it about quadrotors or is it about,

00:33:27 is this sort of the same underlying principles apply?

00:33:30 Of course, when you’re trying to get

00:33:31 a particular domain to work,

00:33:32 you usually have to do some extra work

00:33:34 to adapt that to that particular domain.

00:33:36 But these things that we were talking about around,

00:33:40 well, you know, how do you model humans?

00:33:42 It turns out that a lot of systems need

00:33:44 to, or could benefit from, a better understanding

00:33:47 of how human behavior relates to what people want

00:33:50 and need to predict human behavior,

00:33:53 physical robots of all sorts and beyond that.

00:33:56 And so I used to do manipulation.

00:33:58 I used to be, you know, picking up stuff

00:34:00 and then I was picking up stuff with people around.

00:34:03 And now it’s sort of very broad

00:34:05 when it comes to the application level,

00:34:07 but in a sense, very focused on, okay,

00:34:11 how does the problem need to change?

00:34:14 How do the algorithms need to change

00:34:15 when we’re not doing a robot by itself?

00:34:19 You know, emptying the dishwasher,

00:34:21 but we’re stepping outside of that.

00:34:23 A thought that popped into my head just now.

00:34:26 On the game theoretic side,

00:34:27 I think you said this really interesting idea

00:34:29 of using actions to gain more information.

00:34:33 But if we think of sort of game theory,

00:34:39 the humans that are interacting with you,

00:34:43 with you, the robot?

00:34:44 Wow, I’m taking on the identity of the robot.

00:34:46 Yeah, I do that all the time.

00:34:47 Yeah, is they also have a world model of you

00:34:55 and you can manipulate that.

00:34:57 I mean, if we look at autonomous vehicles,

00:34:59 people have a certain viewpoint.

00:35:01 You said with the kids, people see Alexa in a certain way.

00:35:07 Is there some value in trying to also optimize

00:35:10 how people see you as a robot?

00:35:15 Or is that a little too far away from the specifics

00:35:20 of what we can solve right now?

00:35:21 So, well, both, right?

00:35:24 So it’s really interesting.

00:35:26 And we’ve seen a little bit of progress on this problem,

00:35:30 on pieces of this problem.

00:35:32 So you can, again, it kind of comes down

00:35:36 to how complicated does the human model need to be?

00:35:38 But in one piece of work that we were looking at,

00:35:42 we just said, okay, there’s these parameters

00:35:46 that are internal to the robot

00:35:47 and what the robot is about to do,

00:35:51 or maybe what objective,

00:35:52 what driving style the robot has or something like that.

00:35:55 And what we’re gonna do is we’re gonna set up a system

00:35:58 where part of the state is the person’s belief

00:36:00 over those parameters.

00:36:02 And now when the robot acts,

00:36:05 that the person gets new evidence

00:36:07 about this robot internal state.

00:36:10 And so they’re updating their mental model of the robot.

00:36:13 So if they see a car that sort of cuts someone off,

00:36:16 they’re like, oh, that’s an aggressive car.

00:36:18 They know more.

00:36:20 If they see sort of a robot head towards a particular door,

00:36:24 they’re like, oh yeah, the robot’s trying to get

00:36:25 to that door.

00:36:26 So this thing that we have to do with humans

00:36:27 to try and understand their goals and intentions,

00:36:31 humans are inevitably gonna do that to robots.

00:36:34 And then that raises this interesting question

00:36:36 that you asked, which is, can we do something about that?

00:36:38 This is gonna happen inevitably,

00:36:40 but we can sort of be more confusing

00:36:42 or less confusing to people.

00:36:44 And it turns out you can optimize

00:36:45 for being more informative and less confusing

00:36:48 if you have an understanding of how your actions

00:36:51 are being interpreted by the human,

00:36:53 and how they’re using these actions to update their belief.

00:36:56 And honestly, all we did is just Bayes rule.

00:36:59 Basically, okay, the person has a belief,

00:37:02 they see an action, they make some assumptions

00:37:04 about how the robot generates its actions,

00:37:06 presumably as being rational,

00:37:07 because robots are rational.

00:37:09 It’s reasonable to assume that about them.

00:37:11 And then they incorporate that new piece of evidence

00:37:17 in the Bayesian sense in their belief,

00:37:19 and they obtain a posterior.

00:37:20 And now the robot is trying to figure out

00:37:23 what actions to take such that it steers

00:37:25 the person’s belief to put as much probability mass

00:37:27 as possible on the correct parameters.

00:37:31 So that’s kind of a mathematical formalization of that.
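
In symbols, a hedged sketch of that formalization (generic notation, not the exact formulation from the paper): the observer runs Bayes’ rule over the robot’s hidden parameter, and the robot plans so that the resulting posterior concentrates on the truth.

\[
b_{t+1}(\theta) \;\propto\; b_t(\theta)\, P\!\left(u^R_t \mid x_t, \theta\right),
\qquad
u^{R\,*}_{0:T} \;=\; \arg\max_{u^R_{0:T}} \; b_{T+1}\!\left(\theta^{*}\right),
\]

where \(\theta\) is the robot’s internal parameter (its objective or driving style), \(P(u^R \mid x, \theta)\) is the observer’s assumption that the robot acts rationally for \(\theta\), and \(\theta^{*}\) is the robot’s true parameter.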

00:37:33 But my worry, and I don’t know if you wanna go there

00:37:38 with me, but I talk about this quite a bit.

00:37:44 The kids talking to Alexa disrespectfully worries me.

00:37:49 I worry in general about human nature.

00:37:52 Like I said, I grew up in Soviet Union, World War II,

00:37:54 I’m a Jew too, so with the Holocaust and everything.

00:37:58 I just worry about how we humans sometimes treat the other,

00:38:02 the group that we call the other, whatever it is.

00:38:05 Through human history, the group that’s the other

00:38:07 has changed faces.

00:38:09 But it seems like the robot will be the other, the other,

00:38:13 the next other.

00:38:15 And one thing is it feels to me

00:38:19 that robots don’t get no respect.

00:38:22 They get shoved around.

00:38:23 Shoved around, and is there, one, at the shallow level,

00:38:27 for a better experience, it seems that robots

00:38:29 need to talk back a little bit.

00:38:31 Like my intuition says, I mean, most companies

00:38:35 from sort of Roomba, autonomous vehicle companies

00:38:38 might not be so happy with the idea that a robot

00:38:41 has a little bit of an attitude.

00:38:43 But I feel, it feels to me that that’s necessary

00:38:46 to create a compelling experience.

00:38:48 Like we humans don’t seem to respect anything

00:38:50 that doesn’t give us some attitude.

00:38:52 That, or like a mix of mystery and attitude and anger

00:38:58 and that threatens us subtly, maybe passive aggressively.

00:39:03 I don’t know.

00:39:04 It seems like we humans, yeah, need that.

00:39:08 Do you, what are your, is there something,

00:39:10 you have thoughts on this?

00:39:11 All right, I’ll give you two thoughts on this.

00:39:13 Okay, sure.

00:39:13 One is, one is, it’s, we respond to, you know,

00:39:18 someone being assertive, but we also respond

00:39:24 to someone being vulnerable.

00:39:26 So I think robots, my first thought is that

00:39:28 robots get shoved around and bullied a lot

00:39:31 because they’re sort of, you know, tempting

00:39:32 and they’re sort of showing off

00:39:34 or they appear to be showing off.

00:39:35 And so I think going back to these things

00:39:38 we were talking about in the beginning

00:39:39 of making robots a little more, a little more expressive,

00:39:43 a little bit more like, eh, that wasn’t cool to do.

00:39:46 And now I’m bummed, right?

00:39:49 I think that that can actually help

00:39:51 because people can’t help but anthropomorphize

00:39:53 and respond to that.

00:39:54 Even that though, the emotion being communicated

00:39:56 is not in any way a real thing.

00:39:58 And people know that it’s not a real thing

00:40:00 because they know it’s just a machine.

00:40:01 We’re still interpreting, you know, we watch,

00:40:04 there’s this famous psychology experiment

00:40:07 with little triangles and kind of dots on a screen

00:40:11 and a triangle is chasing the square

00:40:12 and you get really angry at the darn triangle

00:40:15 because why is it not leaving the square alone?

00:40:18 So that’s, yeah, we can’t help.

00:40:20 So that was the first thought.

00:40:21 The vulnerability, that’s really interesting that,

00:40:25 I think of like being, pushing back, being assertive

00:40:31 as the only mechanism of getting,

00:40:33 of forming a connection, of getting respect,

00:40:36 but perhaps vulnerability,

00:40:37 perhaps there’s other mechanisms that are less threatening.

00:40:40 Yeah.

00:40:40 Is there?

00:40:41 Well, I think, well, a little bit, yes,

00:40:43 but then this other thing that we can think about is,

00:40:47 it goes back to what you were saying,

00:40:48 that interaction is really game theoretic, right?

00:40:50 So the moment you’re taking actions in a space,

00:40:52 the humans are taking actions in that same space,

00:40:55 but you have your own objective, which is, you know,

00:40:58 you’re a car, you need to get your passenger

00:40:59 to the destination.

00:41:00 And then the human nearby has their own objective,

00:41:03 which somewhat overlaps with you, but not entirely.

00:41:07 You’re not interested in getting into an accident

00:41:09 with each other, but you have different destinations

00:41:11 and you wanna get home faster

00:41:13 and they wanna get home faster.

00:41:14 And that’s a general sum game at that point.

00:41:17 And so that’s, I think that’s what,

00:41:22 treating it as such is kind of a way we can step outside

00:41:25 of this kind of mode that,

00:41:29 where you try to anticipate what people do

00:41:32 and you don’t realize you have any influence over it

00:41:35 while still protecting yourself

00:41:37 because you’re understanding that people also understand

00:41:40 that they can influence you.

00:41:42 And it’s just kind of back and forth is this negotiation,

00:41:45 which is really talking about different equilibria

00:41:49 of a game.

00:41:50 The very basic way to solve coordination

00:41:53 is to just make predictions about what people will do

00:41:55 and then stay out of their way.

00:41:57 And that’s hard for the reasons we talked about,

00:41:59 which is how you have to understand people’s intentions

00:42:02 implicitly, explicitly, who knows,

00:42:05 but somehow you have to get enough of an understanding

00:42:07 of that to be able to anticipate what happens next.

00:42:10 And so that’s challenging.

00:42:11 But then it’s further challenged by the fact

00:42:13 that people change what they do based on what you do

00:42:17 because they don’t plan in isolation either, right?

00:42:21 So when you see cars trying to merge on a highway

00:42:25 and not succeeding, one of the reasons this can be

00:42:27 is because they look at traffic that keeps coming,

00:42:33 they predict what these people are planning on doing,

00:42:35 which is to just keep going,

00:42:37 and then they stay out of the way

00:42:39 because there’s no feasible plan, right?

00:42:42 Any plan would actually intersect

00:42:44 with one of these other people.

00:42:46 So that’s bad, so you get stuck there.

00:42:49 So now kind of if you start thinking about it as no, no, no,

00:42:53 actually these people change what they do

00:42:58 depending on what the car does.

00:42:59 Like if the car actually tries to kind of inch itself forward,

00:43:03 they might actually slow down and let the car in.

00:43:07 And now taking advantage of that,

00:43:10 well, that’s kind of the next level.

00:43:13 We call this like this underactuated system idea

00:43:16 where it’s kind of underactuated system robotics,

00:43:18 but it’s kind of, you’re influenced

00:43:22 these other degrees of freedom,

00:43:23 but you don’t get to decide what they do.

00:43:25 I’ve somewhere seen you mention it,

00:43:28 the human element in this picture as underactuated.

00:43:32 So you understand underactuated robotics

00:43:35 is that you can’t fully control the system.

00:43:41 You can’t go in arbitrary directions

00:43:43 in the configuration space.

00:43:44 Under your control.

00:43:46 Yeah, it’s a very simple way of underactuation

00:43:48 where basically there’s literally these degrees of freedom

00:43:51 that you can control,

00:43:52 and these degrees of freedom that you can’t,

00:43:53 but you influence them.

00:43:54 And I think that’s the important part

00:43:55 is that they don’t do whatever, regardless of what you do,

00:43:59 that what you do influences what they end up doing.
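
Written as a small sketch (notation mine, not from the conversation), the "underactuated" framing says the human’s degrees of freedom appear in the joint dynamics, but the robot only chooses its own controls and influences the human’s through their response:

\[
x_{t+1} = f\!\left(x_t,\; u^R_t,\; u^H_t\right),
\qquad
u^H_t = \pi^H\!\left(x_t,\; u^R_t\right),
\]

so the robot directly actuates only \(u^R\), while \(u^H\) is shaped indirectly through the human’s policy \(\pi^H\), which is what planning has to account for when, say, inching forward during a merge.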

00:44:02 I just also like the poetry of calling human robot

00:44:05 interaction an underactuated robotics problem.

00:44:09 And you also mentioned sort of nudging.

00:44:11 It seems that they’re, I don’t know.

00:44:14 I think about this a lot in the case of pedestrians

00:44:16 I’ve collected hundreds of hours of videos.

00:44:18 I like to just watch pedestrians.

00:44:21 And it seems that.

00:44:22 It’s a funny hobby.

00:44:24 Yeah, it’s weird.

00:44:25 Cause I learn a lot.

00:44:27 I learned a lot about myself,

00:44:28 about our human behavior, from watching pedestrians,

00:44:32 watching people in their environment.

00:44:35 Basically crossing the street

00:44:37 is like you’re putting your life on the line.

00:44:41 I don’t know, tens of millions of time in America every day

00:44:44 is people are just like playing this weird game of chicken

00:44:48 when they cross the street,

00:44:49 especially when there’s some ambiguity

00:44:51 about the right of way.

00:44:54 That has to do either with the rules of the road

00:44:56 or with the general personality of the intersection

00:44:59 based on the time of day and so on.

00:45:02 And this nudging idea,

00:45:05 it seems that people don’t even nudge.

00:45:07 They just aggressively take, make a decision.

00:45:10 Somebody, there’s a runner that gave me this advice.

00:45:14 I sometimes run in the street,

00:45:17 not in the street, on the sidewalk.

00:45:18 And he said that if you don’t make eye contact with people

00:45:22 when you’re running, they will all move out of your way.

00:45:25 It’s called civil inattention.

00:45:27 Civil inattention, that’s a thing.

00:45:29 Oh wow, I need to look this up, but it works.

00:45:32 What is that?

00:45:32 My sense was if you communicate like confidence

00:45:37 in your actions that you’re unlikely to deviate

00:45:41 from the action that you’re following,

00:45:43 that’s a really powerful signal to others

00:45:44 that they need to plan around your actions.

00:45:47 As opposed to nudging where you’re sort of hesitantly,

00:45:50 then the hesitation might communicate

00:45:53 that you’re still in the dance and the game

00:45:56 that they can influence with their own actions.

00:45:59 I’ve recently had a conversation with Jim Keller,

00:46:03 who’s a sort of this legendary chip architect,

00:46:08 but he also led the autopilot team for a while.

00:46:12 And his intuition is that driving is fundamentally

00:46:16 still like a ballistics problem.

00:46:18 Like you can ignore the human element

00:46:22 that it’s just about not hitting things.

00:46:24 And you can kind of learn the right dynamics

00:46:26 required to do the merger and all those kinds of things.

00:46:29 And then my sense is, and I don’t know if I can provide

00:46:32 sort of definitive proof of this,

00:46:34 but my sense is it’s like an order of magnitude

00:46:38 more difficult when humans are involved.

00:46:41 Like it’s not simply an object collision avoidance problem.

00:46:48 Where does your intuition,

00:46:49 of course, nobody knows the right answer here,

00:46:51 but where does your intuition fall on the difficulty,

00:46:54 fundamental difficulty of the driving problem

00:46:57 when humans are involved?

00:46:58 Yeah, good question.

00:47:00 I have many opinions on this.

00:47:03 Imagine downtown San Francisco.

00:47:07 Yeah, it’s crazy, busy, everything.

00:47:10 Okay, now take all the humans out.

00:47:12 No pedestrians, no human driven vehicles,

00:47:15 no cyclists, no people on little electric scooters

00:47:18 zipping around, nothing.

00:47:19 I think we’re done.

00:47:21 I think driving at that point is done.

00:47:23 We’re done.

00:47:25 There’s nothing really that still needs

00:47:27 to be solved about that.

00:47:28 Well, let’s pause there.

00:47:30 I think I agree with you and I think a lot of people

00:47:34 that will hear will agree with that,

00:47:37 but we need to sort of internalize that idea.

00:47:41 So what’s the problem there?

00:47:42 Cause we might not quite yet be done with that.

00:47:45 Cause a lot of people kind of focus

00:47:46 on the perception problem.

00:47:48 A lot of people kind of map autonomous driving

00:47:52 into how close are we to solving,

00:47:55 being able to detect all the, you know,

00:47:57 the drivable area, the objects in the scene.

00:48:02 Do you see that as a, how hard is that problem?

00:48:07 So your intuition there behind your statement

00:48:09 was we might have not solved it yet,

00:48:11 but we’re close to solving basically the perception problem.

00:48:14 I think the perception problem, I mean,

00:48:17 and by the way, a bunch of years ago,

00:48:19 this would not have been true.

00:48:21 And a lot of issues in the space were coming

00:48:24 from the fact that, oh, we don’t really, you know,

00:48:27 we don’t know what’s where.

00:48:29 But I think it’s fairly safe to say that at this point,

00:48:33 although you could always improve on things

00:48:35 and all of that, you can drive through downtown San Francisco

00:48:38 if there are no people around.

00:48:40 There’s no really perception issues

00:48:42 standing in your way there.

00:48:44 I think perception is hard, but yeah, it’s, we’ve made

00:48:47 a lot of progress on the perception,

00:48:49 so I hate to undermine the difficulty of the problem.

00:48:50 I think everything about robotics is really difficult,

00:48:53 of course, I think that, you know, the planning problem,

00:48:57 the control problem, all very difficult,

00:48:59 but I think what’s, what makes it really kind of, yeah.

00:49:03 It might be, I mean, you know,

00:49:05 and I picked downtown San Francisco,

00:49:07 it’s adapting to, well, now it’s snowing,

00:49:11 now it’s no longer snowing, now it’s slippery in this way,

00:49:14 now it’s the dynamics part could,

00:49:16 I could imagine being still somewhat challenging, but.

00:49:24 No, the thing that I think worries us,

00:49:26 and our intuition’s not good there,

00:49:27 is the perception problem at the edge cases.

00:49:31 Sort of downtown San Francisco, the nice thing,

00:49:35 it’s not actually, it may not be a good example because.

00:49:39 Because you know what you’re getting from,

00:49:41 well, there’s like crazy construction zones

00:49:43 and all of that. Yeah, but the thing is,

00:49:44 you’re traveling at slow speeds,

00:49:46 so like it doesn’t feel dangerous.

00:49:47 To me, what feels dangerous is highway speeds,

00:49:51 when everything is, to us humans, super clear.

00:49:54 Yeah, I’m assuming LiDAR here, by the way.

00:49:57 I think it’s kind of irresponsible to not use LiDAR.

00:49:59 That’s just my personal opinion.

00:50:02 That’s, I mean, depending on your use case,

00:50:04 but I think like, you know, if you have the opportunity

00:50:07 to use LiDAR, in a lot of cases, you might not.

00:50:11 Good, your intuition makes more sense now.

00:50:13 So you don’t think vision.

00:50:15 I really just don’t know enough to say,

00:50:18 well, vision alone, what, you know, what’s like,

00:50:21 there’s a lot of, how many cameras do you have?

00:50:24 Is it, how are you using them?

00:50:25 I don’t know. There’s details.

00:50:26 There’s all, there’s all sorts of details.

00:50:28 I imagine there’s stuff that’s really hard

00:50:30 to actually see, you know, how do you deal with glare,

00:50:33 exactly what you were saying,

00:50:34 stuff that people would see that you don’t.

00:50:37 I think I have, more of my intuition comes from systems

00:50:40 that can actually use LiDAR as well.

00:50:44 Yeah, and until we know for sure,

00:50:45 it makes sense to be using LiDAR.

00:50:48 That’s kind of the safety focus.

00:50:50 But then the sort of the,

00:50:52 I also sympathize with the Elon Musk statement

00:50:55 of LiDAR is a crutch.

00:50:57 It’s a fun notion to think that the things that work today

00:51:04 are a crutch for the invention of the things

00:51:08 that will work tomorrow, right?

00:51:09 Like it, it’s kind of true in the sense that if,

00:51:15 you know, we want to stick to the comfort zone,

00:51:17 you see this in academic and research settings

00:51:19 all the time, the things that work force you

00:51:22 to not explore outside, think outside the box.

00:51:25 I mean, that happens all the time.

00:51:26 The problem is in the safety critical systems,

00:51:29 you kind of want to stick with the things that work.

00:51:32 So it’s an interesting and difficult trade off

00:51:34 in the case of real world sort of safety critical

00:51:38 robotic systems, but so your intuition is,

00:51:44 just to clarify, how, I mean,

00:51:48 how hard is this human element for,

00:51:51 like how hard is driving

00:51:52 when this human element is involved?

00:51:55 Are we years, decades away from solving it?

00:52:00 But perhaps actually the year isn’t the thing I’m asking.

00:52:03 It doesn’t matter what the timeline is,

00:52:05 but do you think we’re, how many breakthroughs

00:52:09 are we away from in solving

00:52:12 the human robotic interaction problem

00:52:13 to get this, to get this right?

00:52:15 I think it, in a sense, it really depends.

00:52:20 I think that, you know, we were talking about how,

00:52:24 well, look, it’s really hard

00:52:25 because anticipate what people do is hard.

00:52:27 And on top of that, playing the game is hard.

00:52:30 But I think we sort of have the fundamental,

00:52:35 some of the fundamental understanding for that.

00:52:38 And then you already see that these systems

00:52:41 are being deployed in the real world,

00:52:45 you know, even driverless.

00:52:47 Like there’s, I think now a few companies

00:52:50 that don’t have a driver in the car in some small areas.

00:52:55 I got a chance to, I went to Phoenix and I,

00:52:59 I shot a video with Waymo and I needed to get

00:53:03 that video out.

00:53:04 People have been giving me flak,

00:53:06 but there’s incredible engineering work being done there.

00:53:09 And it’s one of those other seminal moments

00:53:11 for me in my life to be able to, it sounds silly,

00:53:13 but to be able to drive without a ride, sorry,

00:53:17 without a driver in the seat.

00:53:19 I mean, that was an incredible robotics.

00:53:22 I was driven by a robot without being able to take over,

00:53:27 without being able to take the steering wheel.

00:53:31 That’s a magical, that’s a magical moment.

00:53:33 So in that regard, in those domains,

00:53:35 at least for like Waymo, they’re solving that human,

00:53:39 there’s, I mean, they’re going, I mean, it felt fast

00:53:43 because you’re like freaking out at first.

00:53:45 That was, this is my first experience,

00:53:47 but it’s going like the speed limit, right?

00:53:49 30, 40, whatever it is.

00:53:51 And there’s humans and it deals with them quite well.

00:53:53 It detects them, it negotiates the intersections,

00:53:57 the left turns and all of that.

00:53:58 So at least in those domains, it’s solving them.

00:54:01 The open question for me is like, how quickly can we expand?

00:54:06 You know, that’s the, you know,

00:54:08 outside of the weather conditions,

00:54:10 all of those kinds of things,

00:54:11 how quickly can we expand to like cities like San Francisco?

00:54:14 Yeah, and I wouldn’t say that it’s just, you know,

00:54:17 now it’s just pure engineering and it’s probably the,

00:54:20 I mean, and by the way,

00:54:22 I’m speaking kind of very generally here as hypothesizing,

00:54:26 but I think that there are successes

00:54:31 and yet no one is everywhere out there.

00:54:34 So that seems to suggest that things can be expanded

00:54:38 and can be scaled and we know how to do a lot of things,

00:54:41 but there’s still probably, you know,

00:54:44 new algorithms or modified algorithms

00:54:46 that you still need to put in there

00:54:49 as you learn more and more about new challenges

00:54:53 that you get faced with.

00:54:55 How much of this problem do you think can be learned

00:54:58 through end to end?

00:54:59 Is it the success of machine learning

00:55:00 and reinforcement learning?

00:55:02 How much of it can be learned from sort of data

00:55:05 from scratch and how much,

00:55:07 which most of the success of autonomous vehicle systems

00:55:10 have a lot of heuristics and rule based stuff on top,

00:55:14 like human expertise injected forced into the system

00:55:19 to make it work.

00:55:20 What’s your sense?

00:55:22 How much, what will be the role of learning

00:55:26 in the near term and long term?

00:55:28 I think on the one hand that learning is inevitable here,

00:55:36 right?

00:55:37 I think on the other hand that when people characterize

00:55:39 the problem as it’s a bunch of rules

00:55:42 that some people wrote down,

00:55:44 versus it’s an end to end RL system or imitation learning,

00:55:49 then maybe there’s kind of something missing

00:55:53 from maybe that’s more.

00:55:57 So for instance, I think a very, very useful tool

00:56:02 in this sort of problem,

00:56:04 both in how to generate the car’s behavior

00:56:07 and robots in general and how to model human beings

00:56:11 is actually planning, search optimization, right?

00:56:15 So robotics is the sequential decision making problem.

00:56:18 And when a robot can figure out on its own

00:56:26 how to achieve its goal without hitting stuff

00:56:28 and all that stuff, right?

00:56:30 All the good stuff for motion planning 101,

00:56:33 I think of that as very much AI,

00:56:36 not this is some rule or something.

00:56:38 There’s nothing rule based around that, right?

00:56:40 It’s just you’re searching through a space,

00:56:42 or you’re optimizing through a space,

00:56:43 and figuring out what seems to be the right thing to do.
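
To make the “motion planning 101” picture concrete, here is a minimal sketch of a robot searching through a space for a way to reach its goal without hitting anything: A* on a small grid. The grid, costs, and heuristic are illustrative assumptions for a toy example, not a description of any particular system.

```python
import heapq

def plan(grid, start, goal):
    """Minimal A* search on a 4-connected grid.
    grid[r][c] == 1 means an obstacle; 0 means free space."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):  # admissible heuristic: Manhattan distance to the goal
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start, [start])]  # (f, g, cell, path)
    seen = set()
    while frontier:
        f, g, cell, path = heapq.heappop(frontier)
        if cell == goal:
            return path  # cheapest collision-free path found
        if cell in seen:
            continue
        seen.add(cell)
        r, c = cell
        for dr, dc in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nxt = (nr, nc)
                if nxt not in seen:
                    heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None  # no collision-free path exists

# Example: robot at top-left, goal at bottom-right, a wall in the middle.
world = [[0, 0, 0, 0],
         [1, 1, 1, 0],
         [0, 0, 0, 0],
         [0, 1, 1, 0]]
print(plan(world, (0, 0), (3, 3)))
```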

00:56:47 And I think it’s hard to just do that

00:56:49 because you need to learn models of the world.

00:56:52 And I think it’s hard to just do the learning part

00:56:55 where you don’t bother with any of that,

00:56:58 because then you’re saying, well, I could do imitation,

00:57:01 but then when I go off distribution, I’m really screwed.

00:57:04 Or you can say, I can do reinforcement learning,

00:57:08 which adds a lot of robustness,

00:57:09 but then you have to do either reinforcement learning

00:57:12 in the real world, which sounds a little challenging

00:57:15 or that trial and error, you know,

00:57:18 or you have to do reinforcement learning in simulation.

00:57:21 And then that means, well, guess what?

00:57:23 You need to model things, at least to model people,

00:57:27 model the world enough that whatever policy you get of that

00:57:31 is actually fine to roll out in the world

00:57:34 and do some additional learning there.

00:57:36 So. Do you think simulation, by the way, just a quick tangent

00:57:40 has a role in the human robot interaction space?

00:57:44 Like, is it useful?

00:57:46 It seems like humans, everything we’ve been talking about

00:57:48 are difficult to model and simulate.

00:57:51 Do you think simulation has a role in this space?

00:57:53 I do.

00:57:54 I think so because you can take models

00:57:58 and train with them ahead of time, for instance.

00:58:04 You can.

00:58:06 But the models, sorry to interrupt,

00:58:07 the models are sort of human constructed or learned?

00:58:10 I think they have to be a combination

00:58:14 because if you get some human data and then you say,

00:58:20 this is how, this is gonna be my model of the person.

00:58:22 Are these for simulation and training

00:58:24 or for just deployment time?

00:58:25 And that’s what I’m planning with

00:58:27 as my model of how people work.

00:58:29 Regardless, if you take some data

00:58:33 and you don’t assume anything else and you just say,

00:58:35 okay, this is some data that I’ve collected.

00:58:39 Let me fit a policy to how people work based on that.

00:58:42 What tends to happen is you collected some data

00:58:45 and some distribution, and then now your robot

00:58:50 sort of computes a best response to that, right?

00:58:52 It’s sort of like, what should I do

00:58:54 if this is how people work?

00:58:56 And easily goes off of distribution

00:58:58 where that model that you’ve built of the human

00:59:01 completely sucks because out of distribution,

00:59:03 you have no idea, right?

00:59:05 If you think of all the possible policies

00:59:07 and then you take only the ones that are consistent

00:59:10 with the human data that you’ve observed,

00:59:13 that still leaves a lot of, a lot of things could happen

00:59:15 outside of that distribution where you’re confident

00:59:18 then you know what’s going on.
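
A toy sketch of the failure mode being described: fit a predictor of human actions only to the states in some dataset, and it will still happily output a prediction for states far from that data, where it actually knows nothing. The nearest-neighbor model and the distance-based “out of distribution” flag below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend human data: states (e.g., relative position/speed features) and the
# action the human took there. All of it comes from one narrow region.
train_states = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
train_actions = (train_states[:, 0] > 0).astype(float)  # some simple behavior

def predict_action(state, k=5):
    """Predict the human's action as the average of the k nearest training examples."""
    dists = np.linalg.norm(train_states - state, axis=1)
    nearest = np.argsort(dists)[:k]
    return train_actions[nearest].mean(), dists[nearest].mean()

def predict_with_ood_check(state, dist_threshold=2.0):
    """Same prediction, but flag states far from anything seen at training time."""
    action, avg_dist = predict_action(state)
    in_distribution = avg_dist < dist_threshold
    return action, in_distribution

# In-distribution query: the model has real support for its prediction.
print(predict_with_ood_check(np.array([0.5, -0.3])))
# Far outside the training data: the prediction is numerically there,
# but the flag says the model has no business being confident.
print(predict_with_ood_check(np.array([8.0, 9.0])))
```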

00:59:19 By the way, that’s, I mean, I’ve gotten used

00:59:22 to this terminology of out of distribution,

00:59:25 but it’s such a machine learning terminology

00:59:29 because it kind of assumes,

00:59:30 so distribution is referring to the data

00:59:36 that you’ve seen.

00:59:36 The set of states that you encounter

00:59:38 at training time. They’ve encountered so far

00:59:39 at training time. Yeah.

00:59:40 But it kind of also implies that there’s a nice

00:59:43 like statistical model that represents that data.

00:59:47 So out of distribution feels like, I don’t know,

00:59:50 it raises to me philosophical questions

00:59:54 of how we humans reason out of distribution,

00:59:58 reason about things that are completely,

01:00:01 we haven’t seen before.

01:00:03 And so, and what we’re talking about here is

01:00:05 how do we reason about what other people do

01:00:09 in situations where we haven’t seen them?

01:00:11 And somehow we just magically navigate that.

01:00:14 I can anticipate what will happen in situations

01:00:18 that are even novel in many ways.

01:00:21 And I have a pretty good intuition for,

01:00:22 I don’t always get it right, but you know,

01:00:24 and I might be a little uncertain and so on.

01:00:26 But I think it’s this that if you just rely on data,

01:00:33 you know, there’s just too many possibilities,

01:00:36 there’s too many policies out there that fit the data.

01:00:37 And by the way, it’s not just state,

01:00:39 it’s really kind of history of state,

01:00:40 cause to really be able to anticipate

01:00:41 what the person will do,

01:00:43 it kind of depends on what they’ve been doing so far,

01:00:45 cause that’s the information you need to kind of,

01:00:47 at least implicitly sort of say,

01:00:49 oh, this is the kind of person that this is,

01:00:51 this is probably what they’re trying to do.

01:00:53 So anyway, it’s like you’re trying to map history of states

01:00:55 to actions, there’s many mappings.

01:00:56 And history meaning like the last few seconds

01:00:59 or the last few minutes or the last few months.

01:01:02 Who knows, who knows how much you need, right?

01:01:04 In terms of if your state is really like the positions

01:01:07 of everything or whatnot and velocities,

01:01:09 who knows how much you need.

01:01:10 And then there’s so many mappings.

01:01:14 And so now you’re talking about

01:01:16 how do you regularize that space?

01:01:17 What priors do you impose or what’s the inductive bias?

01:01:21 So, you know, there’s all very related things

01:01:23 to think about it.

01:01:25 Basically, what are assumptions that we should be making

01:01:29 such that these models actually generalize

01:01:32 outside of the data that we’ve seen?

01:01:35 And now you’re talking about, well, I don’t know,

01:01:37 what can you assume?

01:01:38 Maybe you can assume that people like actually

01:01:40 have intentions and that’s what drives their actions.

01:01:43 Maybe that’s, you know, the right thing to do

01:01:46 when you haven’t seen data very nearby

01:01:49 that tells you otherwise.

01:01:51 I don’t know, it’s a very open question.
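
One concrete form the “people have intentions and that’s what drives their actions” assumption often takes is a noisily rational model: the person is more likely to pick actions that are better for their goal, and the robot does Bayesian inference over goals from the actions it observes. The goals, candidate actions, and temperature below are assumptions for a toy example.

```python
import numpy as np

# Candidate goals the person might be heading toward (e.g., two objects on a table).
goals = {"cup": np.array([1.0, 0.0]), "phone": np.array([0.0, 1.0])}

def q_value(state, action, goal):
    """How good is taking 'action' from 'state' if the person wants 'goal'?
    Here: negative distance to the goal after the step (a crude proxy)."""
    return -np.linalg.norm((state + action) - goal)

def action_likelihood(state, action, goal, candidate_actions, beta=5.0):
    """Noisily rational model: P(action | state, goal) is proportional to exp(beta * Q)."""
    scores = np.array([q_value(state, a, goal) for a in candidate_actions])
    probs = np.exp(beta * scores - np.max(beta * scores))
    probs /= probs.sum()
    idx = next(i for i, a in enumerate(candidate_actions) if np.allclose(a, action))
    return probs[idx]

def infer_goal(state, observed_action, candidate_actions):
    """Bayesian update over goals given one observed action (uniform prior)."""
    posterior = {name: action_likelihood(state, observed_action, g, candidate_actions)
                 for name, g in goals.items()}
    z = sum(posterior.values())
    return {name: p / z for name, p in posterior.items()}

actions = [np.array([0.2, 0.0]), np.array([0.0, 0.2]), np.array([-0.2, 0.0])]
state = np.array([0.0, 0.0])
# The person moves toward the cup; the posterior should favor "cup".
print(infer_goal(state, np.array([0.2, 0.0]), actions))
```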

01:01:53 Do you think sort of that one of the dreams

01:01:55 of artificial intelligence was to solve

01:01:58 common sense reasoning, whatever the heck that means.

01:02:02 Do you think something like common sense reasoning

01:02:04 has to be solved in part to be able to solve this dance

01:02:09 of human robot interaction, the driving space

01:02:12 or human robot interaction in general?

01:02:14 Do you have to be able to reason about these kinds

01:02:16 of common sense concepts of physics,

01:02:21 of, you know, all the things we’ve been talking about

01:02:27 humans, I don’t even know how to express them with words,

01:02:30 but the basics of human behavior, a fear of death.

01:02:34 So like, to me, it’s really important to encode

01:02:38 in some kind of sense, maybe not, maybe it’s implicit,

01:02:41 but it feels that it’s important to explicitly encode

01:02:44 the fear of death, that people don’t wanna die.

01:02:48 Because it seems silly, but like the game of chicken

01:02:56 that involves with the pedestrian crossing the street

01:02:59 is playing with the idea of mortality.

01:03:03 Like we really don’t wanna die.

01:03:04 It’s not just like a negative reward.

01:03:07 I don’t know, it just feels like all these human concepts

01:03:10 have to be encoded.

01:03:11 Do you share that sense or is this a lot simpler

01:03:14 than I’m making out to be?

01:03:15 I think it might be simpler.

01:03:17 And I’m the person who likes to complicate things.

01:03:18 I think it might be simpler than that.

01:03:21 Because it turns out, for instance,

01:03:24 if you say model people in the very,

01:03:29 I’ll call it traditional, I don’t know if it’s fair

01:03:31 to look at it as a traditional way,

01:03:33 but you know, calling people as,

01:03:35 okay, they’re rational somehow,

01:03:37 the utilitarian perspective.

01:03:40 Well, in that, once you say that,

01:03:45 you automatically capture that they have an incentive

01:03:48 to keep on being.

01:03:50 You know, Stuart likes to say,

01:03:53 you can’t fetch the coffee if you’re dead.

01:03:56 Stuart Russell, by the way.

01:03:59 That’s a good line.

01:04:01 So when you’re sort of treating agents

01:04:05 as having these objectives, these incentives,

01:04:10 humans or artificial, you’re kind of implicitly modeling

01:04:14 that they’d like to stick around

01:04:16 so that they can accomplish those goals.

01:04:20 So I think in a sense,

01:04:22 maybe that’s what draws me so much

01:04:24 to the rationality framework,

01:04:25 even though it’s so broken,

01:04:26 we’ve been able to, it’s been such a useful perspective.

01:04:30 And like we were talking about earlier,

01:04:32 what’s the alternative?

01:04:33 I give up and go home or, you know,

01:04:34 I just use complete black boxes,

01:04:36 but then I don’t know what to assume out of distribution

01:04:37 and that comes back to this.

01:04:40 It’s just, it’s been a very fruitful way

01:04:42 to think about the problem

01:04:43 in a much more positive way, right?

01:04:47 People aren’t just crazy.

01:04:49 Maybe they make more sense than we think.

01:04:51 But I think we also have to somehow be ready for it

01:04:55 to be wrong, be able to detect

01:04:58 when these assumptions aren’t holding,

01:05:00 be all of that stuff.

01:05:02 Let me ask sort of another small side of this

01:05:06 that we’ve been talking about

01:05:07 the pure autonomous driving problem,

01:05:09 but there’s also relatively successful systems

01:05:13 already deployed out there in what you may call

01:05:17 like level two autonomy or semi autonomous vehicles,

01:05:20 whether that’s Tesla Autopilot,

01:06:23 work quite a bit with the Cadillac Super Cruise system,

01:05:27 which has a driver facing camera that detects your state.

01:05:31 There’s a bunch of basically lane centering systems.

01:05:35 What’s your sense about this kind of way of dealing

01:05:41 with the human robot interaction problem

01:05:43 by having a really dumb robot

01:05:46 and relying on the human to help the robot out

01:05:50 to keep them both alive?

01:05:53 Is that from the research perspective,

01:05:57 how difficult is that problem?

01:05:59 And from a practical deployment perspective,

01:06:02 is that a fruitful way to approach

01:06:05 this human robot interaction problem?

01:06:08 I think what we have to be careful about there

01:06:12 is to not, it seems like some of these systems,

01:06:16 not all are making this underlying assumption

01:06:19 that if, so I’m a driver and I’m now really not driving,

01:06:25 but supervising and my job is to intervene, right?

01:06:28 And so we have to be careful with this assumption

01:06:31 that when I’m, if I’m supervising,

01:06:36 I will be just as safe as when I’m driving.

01:06:41 That I will, if I wouldn’t get into some kind of accident,

01:06:46 if I’m driving, I will be able to avoid that accident

01:06:50 when I’m supervising too.

01:06:52 And I think I’m concerned about this assumption

01:06:55 from a few perspectives.

01:06:56 So from a technical perspective,

01:06:58 it’s that when you let something kind of take control

01:07:01 and do its thing, and it depends on what that thing is,

01:07:03 obviously, and how much it’s taking control

01:07:05 and how, what things are you trusting it to do.

01:07:07 But if you let it do its thing and take control,

01:07:11 it will go to what we might call off policy

01:07:15 from the person’s perspective state.

01:07:16 So states that the person wouldn’t actually

01:07:18 find themselves in if they were the ones driving.

01:07:22 And the assumption that the person functions

01:07:24 just as well there as they function in the states

01:07:26 that they would normally encounter

01:07:28 is a little questionable.

01:07:30 Now, another part is the kind of the human factor side

01:07:34 of this, which is that I don’t know about you,

01:07:38 but I think I definitely feel like I’m experiencing things

01:07:42 very differently when I’m actively engaged in the task

01:07:45 versus when I’m a passive observer.

01:07:47 Like even if I try to stay engaged, right?

01:07:49 It’s very different than when I’m actually

01:07:51 actively making decisions.

01:07:53 And you see this in life in general.

01:07:55 Like you see students who are actively trying

01:07:58 to come up with the answer, learn this thing better

01:08:00 than when they’re passively told the answer.

01:08:03 I think that’s somewhat related.

01:08:04 And I think people have studied this in human factors

01:08:06 for airplanes.

01:08:07 And I think it’s actually fairly established

01:08:10 that these two are not the same.

01:08:12 So.

01:08:13 On that point, because I’ve gotten a huge amount

01:08:14 of heat on this and I stand by it.

01:08:17 Okay.

01:08:18 Because I know the human factors community well

01:08:22 and the work here is really strong.

01:08:24 And there’s many decades of work showing exactly

01:08:27 what you’re saying.

01:08:28 Nevertheless, I’ve been continuously surprised

01:08:30 that much of the predictions of that work has been wrong

01:08:33 in what I’ve seen.

01:08:35 So what we have to do,

01:08:37 I still agree with everything you said,

01:08:40 but we have to be a little bit more open minded.

01:08:45 So the, I’ll tell you, there’s a few surprising things

01:09:49 that surprised me, like everything you said to the word

01:08:52 is actually exactly correct.

01:08:54 But it doesn’t say, what you didn’t say

01:08:57 is that these systems are,

01:09:00 you said you can’t assume a bunch of things,

01:09:02 but we don’t know if these systems are fundamentally unsafe.

01:09:06 That’s still unknown.

01:09:08 There’s a lot of interesting things,

01:09:11 like I’m surprised by the fact, not the fact,

01:09:15 that what seems to be anecdotally from,

01:09:18 well, from large data collection that we’ve done,

01:09:21 but also from just talking to a lot of people,

01:09:23 when in the supervisory role of semi autonomous systems

01:09:27 that are sufficiently dumb, at least,

01:09:29 which is, that might be the key element,

01:09:33 is the systems have to be dumb.

01:09:35 The people are actually more energized as observers.

01:09:38 So they’re actually better,

01:09:40 they’re better at observing the situation.

01:09:43 So there might be cases in systems,

01:09:46 if you get the interaction right,

01:09:48 where you, as a supervisor,

01:09:50 will do a better job with the system together.

01:09:53 I agree, I think that is actually really possible.

01:09:56 I guess mainly I’m pointing out that if you do it naively,

01:10:00 you’re implicitly assuming something,

01:10:02 that assumption might actually really be wrong.

01:10:04 But I do think that if you explicitly think about

01:10:09 what the agent should do

01:10:10 so that the person still stays engaged.

01:10:13 What the, so that you essentially empower the person

01:10:16 to do more than they could,

01:10:17 that’s really the goal, right?

01:10:19 Is you still have a driver,

01:10:20 so you wanna empower them to be so much better

01:10:25 than they would be by themselves.

01:10:27 And that’s different, it’s a very different mindset

01:10:29 than I want them to basically not drive, right?

01:10:33 And, but be ready to sort of take over.

01:10:40 So one of the interesting things we’ve been talking about

01:10:42 is the rewards, that they seem to be fundamental to

01:10:47 the way robots behave.

01:10:49 So broadly speaking,

01:10:52 we’ve been talking about utility functions and so on,

01:10:54 but could you comment on how do we approach

01:10:56 the design of reward functions?

01:10:59 Like, how do we come up with good reward functions?

01:11:02 Well, really good question,

01:11:05 because the answer is we don’t.

01:11:10 This was, you know, I used to think,

01:11:13 I used to think about how,

01:11:16 well, it’s actually really hard to specify rewards

01:11:18 for interaction because it’s really supposed to be

01:11:22 what the people want, and then you really, you know,

01:11:25 we talked about how you have to customize

01:11:26 what you wanna do to the end user.

01:11:30 But I kind of realized that even if you take

01:11:36 the interactive component away,

01:11:39 it’s still really hard to design reward functions.

01:11:42 So what do I mean by that?

01:11:43 I mean, if we assume this sort of AI paradigm

01:11:47 in which there’s an agent and its job is to optimize

01:11:51 some objectives, some reward, utility, loss, whatever, cost,

01:11:58 if you write it out, maybe it’s a set,

01:12:00 depending on the situation or whatever it is,

01:12:03 if you write that out and then you deploy the agent,

01:12:06 you’d wanna make sure that whatever you specified

01:12:10 incentivizes the behavior you want from the agent

01:12:14 in any situation that the agent will be faced with, right?

01:12:18 So I do motion planning on my robot arm,

01:12:22 I specify some cost function like, you know,

01:12:25 this is how far away you should try to stay,

01:12:28 this is how much it matters to stay away from people,

01:12:29 and this is how much it matters to be able to be efficient

01:12:31 and blah, blah, blah, right?

01:12:33 I need to make sure that whatever I specified,

01:12:36 those constraints or trade offs or whatever they are,

01:12:40 that when the robot goes and solves that problem

01:12:43 in every new situation,

01:12:45 that behavior is the behavior that I wanna see.
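
To make the “tune some trade-offs and hope the optimum is what you wanted” point concrete, here is a toy version of the kind of hand-specified cost a designer might write for a robot arm’s trajectory: a weighted sum of features like proximity to the person and path length. The features and the weights 10.0 and 1.0 are made up; those numbers are exactly the knobs that end up getting tuned and retuned.

```python
import numpy as np

def trajectory_cost(traj, person_pos,
                    w_distance=10.0,    # how much it matters to stay away from people
                    w_efficiency=1.0):  # how much it matters to keep the path short
    """Hand-specified cost: weighted sum of designer-chosen features."""
    traj = np.asarray(traj)
    # Feature 1: proximity to the person (penalize being inside a 1 m bubble).
    dists = np.linalg.norm(traj - person_pos, axis=1)
    proximity_penalty = np.clip(1.0 - dists, 0.0, None).sum()
    # Feature 2: path length (sum of segment lengths).
    path_length = np.linalg.norm(np.diff(traj, axis=0), axis=1).sum()
    return w_distance * proximity_penalty + w_efficiency * path_length

person = np.array([0.5, 0.5])
direct = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]  # short, but brushes past the person
detour = [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # longer, but keeps its distance
print(trajectory_cost(direct, person), trajectory_cost(detour, person))
```

With these particular weights the detour comes out cheaper; change 10.0 to 1.0 and the optimizer will happily brush past the person, which is exactly the kind of outcome the designer only discovers by checking sampled situations.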

01:12:47 And what I’ve been finding is

01:12:50 that we have no idea how to do that.

01:12:52 Basically, what I can do is I can sample,

01:12:56 I can think of some situations

01:12:58 that I think are representative of what the robot will face,

01:13:02 and I can tune and add and tune some reward function

01:13:08 until the optimal behavior is what I want

01:13:11 on those situations,

01:13:13 which first of all is super frustrating

01:13:15 because, you know, through the miracle of AI,

01:13:19 we’ve taken, we don’t have to specify rules

01:13:21 for behavior anymore, right?

01:13:22 Like we were saying before,

01:13:24 the robot comes up with the right thing to do,

01:13:27 you plug in this situation,

01:13:28 it optimizes right in that situation, it optimizes,

01:13:31 but you have to spend still a lot of time

01:13:34 on actually defining what it is

01:13:37 that that criteria should be,

01:13:39 making sure you didn’t forget

01:13:40 about 50 bazillion things that are important

01:13:42 and how they all should be combining together

01:13:44 to tell the robot what’s good and what’s bad

01:13:46 and how good and how bad.

01:13:48 And so I think this is a lesson that I don’t know,

01:13:55 kind of, I guess I closed my eyes to it for a while

01:13:59 cause I’ve been, you know,

01:14:00 tuning cost functions for 10 years now,

01:14:03 but it really strikes me that,

01:14:07 yeah, we’ve moved the tuning

01:14:09 and the like designing of features or whatever

01:14:13 from the behavior side into the reward side.

01:14:19 And yes, I agree that there’s way less of it,

01:14:22 but it still seems really hard

01:14:24 to anticipate any possible situation

01:14:26 and make sure you specify a reward function

01:14:30 that when optimized will work well

01:14:32 in every possible situation.

01:14:35 So you’re kind of referring to unintended consequences

01:14:38 or just in general, any kind of suboptimal behavior

01:14:42 that emerges outside of the things you said,

01:14:44 out of distribution.

01:14:46 Suboptimal behavior that is, you know, actually optimal.

01:14:49 I mean, this, I guess the idea of unintended consequences,

01:14:51 you know, it’s optimal with respect to what you specified,

01:14:53 but it’s not what you want.

01:14:55 And there’s a difference between those.

01:14:57 But that’s not fundamentally a robotics problem, right?

01:14:59 That’s a human problem.

01:15:01 So like. That’s the thing, right?

01:15:03 So there’s this thing called Goodhart’s law,

01:15:05 which is you set a metric for an organization

01:15:07 and the moment it becomes a target

01:15:10 that people actually optimize for,

01:15:13 it’s no longer a good metric.

01:15:15 What’s it called?

01:15:15 Goodhart’s law.

01:15:16 Goodhart’s law.

01:15:17 So the moment you specify a metric,

01:15:20 it stops doing its job.

01:15:21 Yeah, it stops doing its job.

01:15:24 So there’s, yeah, there’s such a thing

01:15:25 as optimizing for things and, you know,

01:15:27 failing to think ahead of time

01:15:32 of all the possible things that might be important.

01:15:35 And so that’s, so that’s interesting

01:15:38 because historically, I’ve worked a lot on reward learning

01:15:41 from the perspective of customizing to the end user,

01:15:44 but it really seems like it’s not just the interaction

01:15:48 with the end user that’s a problem of the human

01:15:50 and the robot collaborating

01:15:52 so that the robot can do what the human wants, right?

01:15:55 This kind of back and forth, the robot probing,

01:15:57 the person being informative, all of that stuff

01:16:00 might be actually just as applicable

01:16:04 to this kind of maybe new form of human robot interaction,

01:16:07 which is the interaction between the robot

01:16:10 and the expert programmer, roboticist designer

01:16:14 in charge of actually specifying

01:16:16 what the heck the robot should do,

01:16:18 specifying the task for the robot.

01:16:20 That’s fascinating.

01:16:21 That’s so cool, like collaborating on the reward design.

01:16:23 Right, collaborating on the reward design.

01:16:26 And so what does it mean, right?

01:16:28 What does it, when we think about the problem,

01:16:29 not as someone specifies all of your job is to optimize,

01:16:34 and we start thinking about you’re in this interaction

01:16:37 and this collaboration.

01:16:39 And the first thing that comes up is

01:16:42 when the person specifies a reward, it’s not, you know,

01:16:46 gospel, it’s not like the letter of the law.

01:16:48 It’s not the definition of the reward function

01:16:52 you should be optimizing,

01:16:53 because they’re doing their best,

01:16:54 but they’re not some magic perfect oracle.

01:16:57 And the sooner we start understanding that,

01:16:58 I think the sooner we’ll get to more robust robots

01:17:02 that function better in different situations.

01:17:06 And then you have kind of say, okay, well,

01:17:08 it’s almost like robots are over learning,

01:17:12 over putting too much weight on the reward specified

01:17:16 by definition, and maybe leaving a lot of other information

01:17:21 on the table, like what are other things we could do

01:17:23 to actually communicate to the robot

01:17:25 about what we want them to do besides attempting

01:17:28 to specify a reward function.

01:17:29 Yeah, you have this awesome,

01:17:31 and again, I love the poetry of it, of leaked information.

01:17:34 So you mentioned humans leak information

01:17:38 about what they want, you know,

01:17:40 leak reward signal for the robot.

01:17:44 So how do we detect these leaks?

01:17:47 What is that?

01:17:48 Yeah, what are these leaks?

01:17:49 Whether it’s just, I don’t know,

01:17:51 I just recently saw it, read it,

01:17:54 I don’t know where, maybe from you,

01:17:55 and it’s gonna stick with me for a while for some reason,

01:17:58 because it’s not explicitly expressed.

01:18:00 It kind of leaks indirectly from our behavior.

01:18:04 From what we do, yeah, absolutely.

01:18:06 So I think maybe some surprising bits, right?

01:18:11 So we were talking before about, I’m a robot arm,

01:18:14 it needs to move around people, carry stuff,

01:18:18 put stuff away, all of that.

01:18:20 And now imagine that, you know,

01:18:25 the robot has some initial objective

01:18:27 that the programmer gave it

01:18:28 so they can do all these things functionally.

01:18:30 It’s capable of doing that.

01:18:32 And now I noticed that it’s doing something

01:18:35 and maybe it’s coming too close to me, right?

01:18:39 And maybe I’m the designer,

01:18:40 maybe I’m the end user and this robot is now in my home.

01:18:43 And I push it away.

01:18:47 So I push it away because, you know,

01:18:49 it’s a reaction to what the robot is currently doing.

01:18:52 And this is what we call physical human robot interaction.

01:18:55 And now there’s a lot of interesting work

01:18:58 on how the heck do you respond to physical human

01:19:00 robot interaction?

01:19:01 What should the robot do if such an event occurs?

01:19:03 And there’s sort of different schools of thought.

01:19:05 Well, you know, you can sort of treat it

01:19:07 the control theoretic way and say,

01:19:08 this is a disturbance that you must reject.

01:19:11 You can sort of treat it more kind of heuristically

01:19:15 and say, I’m gonna go into some like gravity compensation

01:19:18 mode so that I’m easily maneuverable around.

01:19:19 I’m gonna go in the direction that the person pushed me.

01:19:22 And to us, part of the realization has been

01:19:27 that that is signal that communicates about the reward.

01:19:30 Because if my robot was moving in an optimal way

01:19:34 and I intervened, that means that I disagree

01:19:37 with its notion of optimality, right?

01:19:40 Whatever it thinks is optimal is not actually optimal.

01:19:43 And sort of optimization problems aside,

01:19:45 that means that the cost function,

01:19:47 the reward function is incorrect,

01:19:51 or at least is not what I want it to be.

01:19:53 How difficult is that signal to interpret

01:19:58 and make actionable?

01:19:59 So like, cause this connects

01:20:00 to our autonomous vehicle discussion

01:20:02 where they’re in the semi autonomous vehicle

01:20:03 or autonomous vehicle when a safety driver

01:20:06 disengages the car, like,

01:20:08 but they could have disengaged it for a million reasons.

01:20:11 Yeah, so that’s true.

01:20:15 Again, it comes back to, can you structure a little bit

01:20:19 your assumptions about how human behavior

01:20:22 relates to what they want?

01:20:24 And you can, one thing that we’ve done is

01:20:26 literally just treated this external torque

01:20:29 that they applied as, when you take that

01:20:32 and you add it with what the torque

01:20:34 the robot was already applying,

01:20:36 that overall action is probably relatively optimal

01:20:39 with respect to whatever it is that the person wants.

01:20:41 And then that gives you information

01:20:43 about what it is that they want.

01:20:44 So you can learn that people want you

01:20:45 to stay further away from them.
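
A simplified sketch of the idea just described, roughly in the spirit of learning from physical corrections: treat the human’s push as evidence that the corrected motion scores better under their true preferences than the planned one, and nudge the cost weights accordingly. The features and the update rule are illustrative assumptions, not a faithful reproduction of any specific published system.

```python
import numpy as np

def features(traj, person_pos):
    """Trajectory features the cost is assumed to be linear in:
    [total closeness to the person, total path length]."""
    traj = np.asarray(traj)
    closeness = np.clip(1.0 - np.linalg.norm(traj - person_pos, axis=1), 0.0, None).sum()
    length = np.linalg.norm(np.diff(traj, axis=0), axis=1).sum()
    return np.array([closeness, length])

def update_from_correction(weights, planned, corrected, person_pos, lr=0.5):
    """If the human physically deformed the motion, assume the corrected
    trajectory is better under their true (unknown) cost, and adjust weights."""
    delta = features(corrected, person_pos) - features(planned, person_pos)
    # Increase the cost weight on features the correction reduced,
    # decrease it on features the correction added.
    return weights - lr * delta

person = np.array([0.5, 0.5])
planned = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]  # robot's plan, close to the person
pushed = [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # trajectory after the human pushed it away

w = np.array([1.0, 1.0])  # initial cost weights: [closeness-to-person, path-length]
w = update_from_correction(w, planned, pushed, person)
print(w)  # the penalty on closeness grows, the penalty on path length shrinks
```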

01:20:47 Now you’re right that there might be many things

01:20:49 that explain just that one signal

01:20:51 and that you might need much more data than that

01:20:53 for the person to be able to shape

01:20:55 your reward function over time.

01:20:58 You can also do this info gathering stuff

01:21:00 that we were talking about.

01:21:01 Not that we’ve done that in that context,

01:21:03 just to clarify, but it’s definitely something

01:21:04 we thought about where you can have the robot

01:21:09 start acting in a way, like if there’s

01:21:11 a bunch of different explanations, right?

01:21:13 It moves in a way where it sees if you correct it

01:21:16 in some other way or not,

01:21:17 and then kind of actually plans its motion

01:21:19 so that it can disambiguate

01:21:21 and collect information about what you want.
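
A toy sketch of that kind of information gathering: among a few candidate motions, pick the one whose predicted human response differs most across the competing reward hypotheses, so that observing whether the person corrects it actually disambiguates between them. The two hypotheses and the correction model below are invented for illustration.

```python
import numpy as np

# Two competing hypotheses about the person's true cost weights [stay-away, efficiency].
hypotheses = {"cares_about_distance": np.array([5.0, 1.0]),
              "cares_about_speed":    np.array([0.5, 1.0])}

def cost(traj_feats, w):
    return float(np.dot(w, traj_feats))

def correction_prob(traj_feats, alternative_feats, w, beta=2.0):
    """Probability the person corrects the robot toward the alternative,
    modeled as higher when the alternative is much cheaper under their weights."""
    advantage = cost(traj_feats, w) - cost(alternative_feats, w)
    return 1.0 / (1.0 + np.exp(-beta * advantage))

# Candidate motions, each summarized by [closeness-to-person, path-length] features,
# plus the "pushed away" alternative the person could correct it toward.
candidates = {
    "hug_the_person": (np.array([1.6, 1.4]), np.array([0.9, 2.0])),
    "already_far":    (np.array([0.1, 2.2]), np.array([0.05, 2.5])),
}

def disambiguation_score(traj_feats, alt_feats):
    """How differently do the hypotheses predict a correction for this motion?"""
    probs = [correction_prob(traj_feats, alt_feats, w) for w in hypotheses.values()]
    return abs(probs[0] - probs[1])

best = max(candidates, key=lambda k: disambiguation_score(*candidates[k]))
print(best)  # the motion whose outcome tells the robot the most about the weights
```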

01:21:24 Anyway, so that’s one way,

01:21:26 that’s kind of sort of leaked information,

01:21:27 maybe even more subtle leaked information

01:21:29 is if I just press the E stop, right?

01:21:32 I just, I’m doing it out of panic

01:21:34 because the robot is about to do something bad.

01:21:36 There’s again, information there, right?

01:21:38 Okay, the robot should definitely stop,

01:21:40 but it should also figure out

01:21:42 that whatever it was about to do was not good.

01:21:45 And in fact, it was so not good

01:21:46 that stopping and remaining stopped for a while

01:21:48 was a better trajectory for it

01:21:51 than whatever it is that it was about to do.

01:21:52 And that again is information about

01:21:54 what are my preferences, what do I want?

01:21:57 Speaking of E stops, what are your expert opinions

01:22:03 on the three laws of robotics from Isaac Asimov

01:22:08 that don’t harm humans, obey orders, protect yourself?

01:22:11 I mean, it’s such a silly notion,

01:22:13 but I speak to so many people these days,

01:22:15 just regular folks, just, I don’t know,

01:22:17 my parents and so on about robotics.

01:22:19 And they kind of operate in that space of,

01:22:23 you know, imagining our future with robots

01:22:25 and thinking what are the ethical,

01:22:28 how do we get that dance right?

01:22:31 I know the three laws might be a silly notion,

01:22:34 but do you think about like

01:22:35 what universal reward functions that might be

01:22:39 that we should enforce on the robots of the future?

01:22:44 Or is that a little too far out and it doesn’t,

01:22:48 or is the mechanism that you just described,

01:22:51 it shouldn’t be three laws,

01:22:52 it should be constantly adjusting kind of thing.

01:22:55 I think it should constantly be adjusting kind of thing.

01:22:57 You know, the issue with the laws is,

01:23:01 I don’t even, you know, they’re words

01:23:02 and I have to write math

01:23:04 and have to translate them into math.

01:23:06 What does it mean to?

01:23:07 What does harm mean?

01:23:08 What is, it’s not math.

01:23:11 Obey what, right?

01:23:12 Cause we just talked about how

01:23:14 you try to say what you want,

01:23:17 but you don’t always get it right.

01:23:19 And you want these machines to do what you want,

01:23:22 not necessarily exactly what you literally,

01:23:24 so you don’t want them to take you literally.

01:23:26 You wanna take what you say and interpret it in context.

01:23:31 And that’s what we do with the specified rewards.

01:23:33 We don’t take them literally anymore from the designer.

01:23:36 We, not we as a community, we as, you know,

01:23:39 some members of my group, we,

01:23:44 and some of our collaborators like Pieter Abbeel

01:23:46 and Stuart Russell, we sort of say,

01:23:50 okay, the designer specified this thing,

01:23:53 but I’m gonna interpret it not as,

01:23:55 this is the universal reward function

01:23:57 that I shall always optimize always and forever,

01:23:59 but as this is good evidence about what the person wants.

01:24:05 And I should interpret that evidence

01:24:07 in the context of these situations that it was specified for.

01:24:11 Cause ultimately that’s what the designer thought about.

01:24:12 That’s what they had in mind.

01:24:14 And really them specifying reward function

01:24:16 that works for me in all these situations

01:24:18 is really kind of telling me that whatever behavior

01:24:22 that incentivizes must be good behavior

01:24:24 with respect to the thing

01:24:25 that I should actually be optimizing for.
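
A minimal sketch of what “treat the specified reward as evidence, in the context it was designed for” can look like, loosely in the spirit of what the literature calls inverse reward design: the robot keeps a distribution over candidate true reward weights and upweights those under which the behavior the proxy reward induces in the training environments looks good. The candidate weights, environments, and likelihood model below are an invented toy.

```python
import numpy as np

# Candidate "true" reward weights over features [reach_goal, avoid_lava].
candidates = [np.array([1.0, 0.0]),   # only the goal matters
              np.array([1.0, 5.0]),   # avoiding lava matters a lot
              np.array([1.0, 1.0])]

proxy = np.array([1.0, 0.0])  # what the designer actually wrote down

# Training environments, described by the feature counts of the behavior the
# proxy reward induces there. (No lava was present at design time.)
train_behaviors = [np.array([1.0, 0.0]), np.array([1.0, 0.0])]

def likelihood(true_w, behavior, beta=3.0):
    """The designer is assumed to pick a proxy whose induced behavior scores
    well under the true reward: weight proportional to exp(beta * true_w . behavior)."""
    return np.exp(beta * float(np.dot(true_w, behavior)))

posterior = np.ones(len(candidates))
for b in train_behaviors:
    posterior *= np.array([likelihood(w, b) for w in candidates])
posterior /= posterior.sum()

for w, p in zip(candidates, posterior):
    print(w, round(float(p), 3))
# All three hypotheses explain the training behavior equally well, so the robot
# stays uncertain about how bad lava is, rather than taking the proxy's
# "lava is free" literally in situations the designer never considered.
```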

01:24:28 And so now the robot kind of has uncertainty

01:24:30 about what it is that it should be,

01:24:32 what its reward function is.

01:24:34 And then there’s all these additional signals

01:24:36 that we’ve been finding that it can kind of continually

01:24:39 learn from and adapt its understanding of what people want.

01:24:41 Every time the person corrects it, maybe they demonstrate,

01:24:44 maybe they stop, hopefully not, right?

01:24:48 One really, really crazy one is the environment itself.

01:24:54 Like our world, you don’t, it’s not, you know,

01:24:58 you observe our world and the state of it.

01:25:01 And it’s not that you’re seeing behavior

01:25:03 and you’re saying, oh, people are making decisions

01:25:05 that are rational, blah, blah, blah.

01:25:07 It’s, but our world is something that we’ve been acting with

01:25:12 according to our preferences.

01:25:14 So I have this example where like,

01:25:15 the robot walks into my home and my shoes are laid down

01:25:18 on the floor kind of in a line, right?

01:25:21 It took effort to do that.

01:25:23 So even though the robot doesn’t see me doing this,

01:25:27 you know, actually aligning the shoes,

01:25:29 it should still be able to figure out

01:25:31 that I want the shoes aligned

01:25:33 because there’s no way for them to have magically,

01:25:35 you know, be instantiated themselves in that way.

01:25:39 Someone must have actually taken the time to do that.

01:25:43 So it must be important.

01:25:44 So the environment actually tells, the environment is.

01:25:46 Leaks information.

01:25:48 It leaks information.

01:25:48 I mean, the environment is the way it is

01:25:50 because humans somehow manipulated it.

01:25:52 So you have to kind of reverse engineer the narrative

01:25:55 that happened to create the environment as it is

01:25:57 and that leaks the preference information.

01:26:00 Yeah, and you have to be careful, right?

01:26:03 Because people don’t have the bandwidth to do everything.

01:26:06 So just because, you know, my house is messy

01:26:08 doesn’t mean that I want it to be messy, right?

01:26:10 But that just, you know, I didn’t put the effort into that.

01:26:14 I put the effort into something else.

01:26:16 So the robot should figure out,

01:26:17 well, that something else was more important,

01:26:19 but it doesn’t mean that, you know,

01:26:20 the house being messy is not.

01:26:21 So it’s a little subtle, but yeah, we really think of it.

01:26:24 The state itself is kind of like a choice

01:26:26 that people implicitly made about how they want their world.
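
A toy sketch of reading preferences out of the state itself: if the observed state, shoes neatly in a line, is unlikely to arise unless someone valued it enough to spend effort on it, then hypotheses under which that effort makes sense get upweighted. The effort and likelihood model below are invented for illustration.

```python
import numpy as np

# Hypotheses about how much the person values tidiness (weight on "shoes aligned").
tidiness_weights = [0.0, 1.0, 3.0]

def p_state_given_weight(aligned, w, effort_cost=1.0, beta=2.0):
    """A person spends effort when the value exceeds the cost of the effort.
    Model: P(aligned | w) via a soft comparison of value w against effort_cost."""
    utility_of_aligning = w - effort_cost
    p_align = 1.0 / (1.0 + np.exp(-beta * utility_of_aligning))
    return p_align if aligned else 1.0 - p_align

observed_aligned = True  # the robot walks in and sees the shoes in a neat line

posterior = np.array([p_state_given_weight(observed_aligned, w)
                      for w in tidiness_weights])
posterior /= posterior.sum()
for w, p in zip(tidiness_weights, posterior):
    print(f"tidiness weight {w}: {p:.2f}")
# Seeing the aligned shoes shifts belief toward hypotheses where tidiness is
# worth the effort. Note this crude model would also read a messy floor as
# strong evidence of not caring; the limited-bandwidth caveat in the
# conversation is exactly what it leaves out.
```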

01:26:31 What book or books, technical or fiction or philosophical,

01:26:34 when you like look back, you know, life had a big impact,

01:26:39 maybe it was a turning point, it was inspiring in some way.

01:26:42 Maybe we’re talking about some silly book

01:26:45 that nobody in their right mind would want to read.

01:26:48 Or maybe it’s a book that you would recommend

01:26:51 to others to read.

01:26:52 Or maybe those could be two different recommendations

01:26:56 of books that could be useful for people on their journey.

01:27:00 When I was in, it’s kind of a personal story.

01:27:03 When I was in 12th grade,

01:27:05 I got my hands on a PDF copy in Romania

01:27:10 of Russell and Norvig’s AI: A Modern Approach.

01:27:14 I didn’t know anything about AI at that point.

01:27:16 I was, you know, I had watched the movie,

01:27:19 The Matrix was my exposure.

01:27:22 And so I started going through this thing

01:27:28 and, you know, you were asking in the beginning,

01:27:31 what are, you know, it’s math and it’s algorithms,

01:27:35 what’s interesting.

01:27:36 It was so captivating.

01:27:38 This notion that you could just have a goal

01:27:41 and figure out your way through

01:27:44 kind of a messy, complicated situation.

01:27:47 So what sequence of decisions you should make

01:27:50 to autonomously achieve that goal.

01:27:53 That was so cool.

01:27:55 I’m, you know, I’m biased, but that’s a cool book to look at.

01:28:00 You can convert, you know, the goal of intelligence,

01:28:03 the process of intelligence and mechanize it.

01:28:06 I had the same experience.

01:28:07 I was really interested in psychiatry

01:28:09 and trying to understand human behavior.

01:28:11 And then AI modern approach is like, wait,

01:28:14 you can just reduce it all to.

01:28:15 You can write math about human behavior, right?

01:28:18 Yeah.

01:28:19 So that’s, and I think that stuck with me

01:28:21 because, you know, a lot of what I do, a lot of what we do

01:28:25 in my lab is write math about human behavior,

01:28:28 combine it with data and learning, put it all together,

01:28:31 give it to robots to plan with, and, you know,

01:28:33 hope that instead of writing rules for the robots,

01:28:37 writing heuristics, designing behavior,

01:28:39 they can actually autonomously come up with the right thing

01:28:42 to do around people.

01:28:43 That’s kind of our, you know, that’s our signature move.

01:28:46 We wrote some math and then instead of kind of hand crafting

01:28:49 this and that and that and the robot figuring stuff out

01:28:52 and isn’t that cool.

01:28:53 And I think that is the same enthusiasm that I got from

01:28:56 the robot figured out how to reach that goal in that graph.

01:28:59 Isn’t that cool?

01:29:02 So apologize for the romanticized questions,

01:29:05 but, and the silly ones,

01:29:07 if a doctor gave you five years to live,

01:29:11 sort of emphasizing the finiteness of our existence,

01:29:15 what would you try to accomplish?

01:29:20 It’s like my biggest nightmare, by the way.

01:29:22 I really like living.

01:29:24 So I’m actually, I really don’t like the idea of being told

01:29:28 that I’m going to die.

01:29:30 Sorry to linger on that for a second.

01:29:32 Do you, I mean, do you meditate or ponder on your mortality

01:29:36 or human, the fact that this thing ends,

01:29:38 it seems to be a fundamental feature.

01:29:41 Do you think of it as a feature or a bug too?

01:29:44 Is it, you said you don’t like the idea of dying,

01:29:47 but if I were to give you a choice of living forever,

01:29:50 like you’re not allowed to die.

01:29:52 Now I’ll say that I want to live forever,

01:29:54 but I watched this show.

01:29:55 It’s very silly.

01:29:56 It’s called The Good Place and they reflect a lot on this.

01:29:59 And you know, the,

01:30:00 the moral of the story is that you have to make the afterlife

01:30:03 be finite too.

01:30:05 Cause otherwise people just kind of, it’s like WALL-E.

01:30:08 It’s like, ah, whatever.

01:30:10 So, so I think the finiteness helps, but,

01:30:13 but yeah, it’s just, you know, I don’t, I don’t,

01:30:16 I’m not a religious person.

01:30:18 I don’t think that there’s something after.

01:30:21 And so I think it just ends and you stop existing.

01:30:25 And I really like existing.

01:30:26 It’s just, it’s such a great privilege to exist that,

01:30:31 that yeah, it’s just, I think that’s the scary part.

01:30:35 I still think that we like existing so much because it ends.

01:30:40 And that’s so sad.

01:30:41 Like it’s so sad to me every time.

01:30:43 Like I find almost everything about this life beautiful.

01:30:46 Like the silliest, most mundane things are just beautiful.

01:30:49 And I think I’m cognizant of the fact that I find it beautiful

01:30:52 because it ends like it.

01:30:55 And it’s so, I don’t know.

01:30:57 I don’t know how to feel about that.

01:30:59 I also feel like there’s a lesson in there for robotics

01:31:03 and AI that is not like the finiteness of things seems

01:31:10 to be a fundamental nature of human existence.

01:31:13 I think some people sort of accuse me of just being Russian

01:31:16 and melancholic and romantic or something,

01:31:19 but that seems to be a fundamental nature of our existence

01:31:24 that should be incorporated in our reward functions.

01:31:28 But anyway, if you were speaking of reward functions,

01:31:34 if you only had five years, what would you try to accomplish?

01:31:38 This is the thing.

01:31:41 I’m thinking about this question and have a pretty joyous moment

01:31:45 because I don’t know that I would change much.

01:31:49 I’m trying to make some contributions to how we understand

01:31:55 human AI interaction.

01:31:57 I don’t think I would change that.

01:32:00 Maybe I’ll take more trips to the Caribbean or something,

01:32:04 but I tried some of that already from time to time.

01:32:08 So, yeah, I try to do the things that bring me joy

01:32:13 and thinking about these things brings me joy. It’s the Marie Kondo thing.

01:32:17 Don’t do stuff that doesn’t spark joy.

01:32:19 For the most part, I do things that spark joy.

01:32:22 Maybe I’ll do less service in the department or something.

01:32:25 I’m not dealing with admissions anymore.

01:32:30 But no, I think I have amazing colleagues and amazing students

01:32:36 and amazing family and friends and spending time in some balance

01:32:40 with all of them is what I do and that’s what I’m doing already.

01:32:44 So, I don’t know that I would really change anything.

01:32:47 So, on the spirit of positiveness, what small act of kindness,

01:32:52 if one pops to mind, were you once shown that you will never forget?

01:32:57 When I was in high school, my friends, my classmates did some tutoring.

01:33:08 We were gearing up for our baccalaureate exam

01:33:11 and they did some tutoring on, well, some on math, some on whatever.

01:33:15 I was comfortable enough with some of those subjects,

01:33:19 but physics was something that I hadn’t focused on in a while.

01:33:22 And so, they were all working with this one teacher

01:33:28 and I started working with that teacher.

01:33:31 Her name is Nicole Beccano.

01:33:33 And she was the one who kind of opened up this whole world for me

01:33:39 because she sort of told me that I should take the SATs

01:33:44 and apply to go to college abroad and do better on my English and all of that.

01:33:51 And when it came to, well, financially I couldn’t,

01:33:55 my parents couldn’t really afford to do all these things,

01:33:58 she started tutoring me on physics for free

01:34:01 and on top of that sitting down with me to kind of train me for SATs

01:34:06 and all that jazz that she had experience with.

01:34:09 Wow. And obviously that has taken you to be here today,

01:34:15 sort of one of the world experts in robotics.

01:34:17 It’s funny those little… For no reason really.

01:34:24 Just out of karma.

01:34:27 Wanting to support someone, yeah.

01:34:29 Yeah. So, we talked a ton about reward functions.

01:34:33 Let me talk about the most ridiculous big question.

01:34:37 What is the meaning of life?

01:34:39 What’s the reward function under which we humans operate?

01:34:42 Like what, maybe to your life, maybe broader to human life in general,

01:34:47 what do you think…

01:34:51 What gives life fulfillment, purpose, happiness, meaning?

01:34:57 You can’t even ask that question with a straight face.

01:34:59 That’s how ridiculous this is.

01:35:00 I can’t, I can’t.

01:35:01 Okay. So, you know…

01:35:05 You’re going to try to answer it anyway, aren’t you?

01:35:09 So, I was in a planetarium once.

01:35:13 Yes.

01:35:14 And, you know, they show you the thing and then they zoom out and zoom out

01:35:18 and this whole, like, you’re a speck of dust kind of thing.

01:35:20 I think I was conceptualizing that we’re kind of, you know, what are humans?

01:35:23 We’re just on this little planet, whatever.

01:35:26 We don’t matter much in the grand scheme of things.

01:35:29 And then my mind got really blown because they talked about this multiverse theory

01:35:35 where they kind of zoomed out and were like, this is our universe.

01:35:38 And then, like, there’s a bazillion other ones and they just pop in and out of existence.

01:36:42 So, like, our whole thing that we can’t even fathom how big it is was like a blip that went in and out.

01:35:48 And at that point, I was like, okay, like, I’m done.

01:35:51 This is not, there is no meaning.

01:35:54 And clearly what we should be doing is try to impact whatever local thing we can impact,

01:35:59 our communities, leave a little bit behind there, our friends, our family, our local communities,

01:36:05 and just try to be there for other humans because I just, everything beyond that seems ridiculous.

01:36:13 I mean, are you, like, how do you make sense of these multiverses?

01:36:16 Like, are you inspired by the immensity of it?

01:36:21 Do you, I mean, is there, like, is it amazing to you or is it almost paralyzing in the mystery of it?

01:36:34 It’s frustrating.

01:36:35 I’m frustrated by my inability to comprehend.

01:36:41 It just feels very frustrating.

01:36:43 It’s like there’s some stuff that, you know, space, time, blah, blah, blah, that we should really be understanding.

01:36:48 And I definitely don’t understand it.

01:36:50 But, you know, the amazing physicists of the world have a much better understanding than me.

01:36:56 But it still seems epsilon in the grand scheme of things.

01:36:58 So, it’s very frustrating.

01:37:00 It just, it sort of feels like our brains don’t have some fundamental capacity yet, well, yet or ever.

01:37:06 I don’t know.

01:37:07 Well, that’s one of the dreams of artificial intelligence is to create systems that will aid,

01:37:12 expand our cognitive capacity in order to understand, build the theory of everything with the physics

01:37:19 and understand what the heck these multiverses are.

01:37:24 So, I think there’s no better way to end it than talking about the meaning of life and the fundamental nature of the universe and the multiverses.

01:37:32 And the multiverse.

01:37:33 So, Anca, it is a huge honor.

01:37:35 One of my favorite conversations I’ve had.

01:37:38 I really, really appreciate your time.

01:37:40 Thank you for talking today.

01:37:41 Thank you for coming.

01:37:42 Come back again.

01:37:44 Thanks for listening to this conversation with Anca Dragan.

01:37:47 And thank you to our presenting sponsor, Cash App.

01:37:50 Please consider supporting the podcast by downloading Cash App and using code LexPodcast.

01:37:56 If you enjoy this podcast, subscribe on YouTube, review it with 5 stars on Apple Podcast,

01:38:01 support it on Patreon, or simply connect with me on Twitter at LexFriedman.

01:38:07 And now, let me leave you with some words from Isaac Asimov.

01:38:12 Your assumptions are your windows in the world.

01:38:15 Scrub them off every once in a while or the light won’t come in.

01:38:20 Thank you for listening and hope to see you next time.