Transcript
00:00:00 The following is a conversation with Jim Keller,
00:00:03 legendary microprocessor engineer
00:00:05 who has worked at AMD, Apple, Tesla, and now Intel.
00:00:10 He’s known for his work on AMD K7, K8, K12,
00:00:13 and Zen microarchitectures, Apple A4 and A5 processors,
00:00:18 and as coauthor of the specification
00:00:20 for the x86-64 instruction set
00:00:23 and HyperTransport interconnect.
00:00:26 He’s a brilliant first principles engineer
00:00:28 and out of the box thinker,
00:00:30 and just an interesting and fun human being to talk to.
00:00:33 This is the Artificial Intelligence Podcast.
00:00:36 If you enjoy it, subscribe on YouTube,
00:00:38 give it five stars on Apple Podcast,
00:00:40 follow on Spotify, support it on Patreon,
00:00:43 or simply connect with me on Twitter,
00:00:45 at Lex Fridman, spelled F R I D M A N.
00:00:49 I recently started doing ads
00:00:51 at the end of the introduction.
00:00:52 I’ll do one or two minutes after introducing the episode
00:00:55 and never any ads in the middle
00:00:57 that can break the flow of the conversation.
00:00:59 I hope that works for you
00:01:00 and doesn’t hurt the listening experience.
00:01:04 This show is presented by Cash App,
00:01:06 the number one finance app in the App Store.
00:01:08 I personally use Cash App to send money to friends,
00:01:11 but you can also use it to buy, sell,
00:01:13 and deposit Bitcoin in just seconds.
00:01:15 Cash App also has a new investing feature.
00:01:18 You can buy fractions of a stock, say $1 worth,
00:01:21 no matter what the stock price is.
00:01:23 Brokerage services are provided by Cash App Investing,
00:01:26 a subsidiary of Square and member SIPC.
00:01:29 I’m excited to be working with Cash App
00:01:32 to support one of my favorite organizations called FIRST,
00:01:35 best known for their FIRST Robotics and LEGO competitions.
00:01:38 They educate and inspire hundreds of thousands of students
00:01:42 in over 110 countries and have a perfect rating
00:01:45 at Charity Navigator,
00:01:46 which means that donated money
00:01:48 is used to maximum effectiveness.
00:01:50 When you get Cash App from the App Store or Google Play
00:01:53 and use code LEXPODCAST,
00:01:56 you’ll get $10 and Cash App will also donate $10 to FIRST,
00:02:00 which again is an organization
00:02:02 that I’ve personally seen inspire girls and boys
00:02:04 to dream of engineering a better world.
00:02:08 And now here’s my conversation with Jim Keller.
00:02:12 What are the differences and similarities
00:02:14 between the human brain and a computer
00:02:17 with the microprocessor at its core?
00:02:19 Let’s start with the philosophical question perhaps.
00:02:22 Well, since people don’t actually understand
00:02:25 how human brains work, I think that’s true.
00:02:29 I think that’s true.
00:02:30 So it’s hard to compare them.
00:02:32 Computers are, you know, there’s really two things.
00:02:37 There’s memory and there’s computation, right?
00:02:40 And to date, almost all computer architectures
00:02:43 are global memory, which is a thing, right?
00:02:47 And then computation where you pull data
00:02:49 and you do relatively simple operations on it
00:02:52 and write data back.
00:02:53 So it’s decoupled in modern computers.
00:02:57 And you think in the human brain,
00:02:59 everything’s a mesh, a mess that’s combined together?
00:03:02 What people observe is there’s, you know,
00:03:04 some number of layers of neurons
00:03:06 which have local and global connections
00:03:09 and information is stored in some distributed fashion
00:03:13 and people build things called neural networks in computers
00:03:18 where the information is distributed
00:03:21 in some kind of fashion.
00:03:22 You know, there’s a mathematics behind it.
00:03:25 I don’t know that the understanding of that is super deep.
00:03:29 The computations we run on those
00:03:31 are straightforward computations.
00:03:33 I don’t believe anybody has said
00:03:35 a neuron does this computation.
00:03:37 So to date, it’s hard to compare them, I would say.
00:03:44 So let’s get into the basics before we zoom back out.
00:03:48 How do you build a computer from scratch?
00:03:51 What is a microprocessor?
00:03:52 What is a microarchitecture?
00:03:54 What’s an instruction set architecture?
00:03:56 Maybe even as far back as what is a transistor?
00:04:01 So the special charm of computer engineering
00:04:05 is there’s a relatively good understanding
00:04:08 of abstraction layers.
00:04:10 So down at the bottom, you have atoms
00:04:12 and atoms get put together in materials like silicon
00:04:15 or doped silicon or metal, and we build transistors.
00:04:19 On top of that, we build logic gates, right?
00:04:23 And then functional units, like an adder or a subtractor
00:04:27 or an instruction parsing unit.
00:04:28 And then we assemble those into processing elements.
00:04:32 Modern computers are built out of probably 10 to 20
00:04:37 locally organized processing elements
00:04:40 or coherent processing elements.
00:04:42 And then that runs computer programs, right?
00:04:46 So there’s abstraction layers and then software,
00:04:49 there’s an instruction set you run
00:04:51 and then there's assembly language, C, C++, Java, JavaScript.
00:04:56 There’s abstraction layers,
00:04:58 essentially from the atom to the data center, right?
00:05:02 So when you build a computer,
00:05:06 first there’s a target, like what’s it for?
00:05:08 Like how fast does it have to be?
00:05:09 Which today there’s a whole bunch of metrics
00:05:12 about what that is.
00:05:13 And then in an organization of 1,000 people
00:05:17 who build a computer, there’s lots of different disciplines
00:05:22 that you have to operate on.
00:05:24 Does that make sense?
00:05:25 And so…
00:05:27 So there’s a bunch of levels of abstraction
00:05:30 in an organization like Intel and in your own vision,
00:05:35 there’s a lot of brilliance that comes in
00:05:37 at every one of those layers.
00:05:39 Some of it is science, some of it is engineering,
00:05:41 some of it is art, what’s the most,
00:05:45 if you could pick favorites,
00:05:46 what’s the most important, your favorite layer
00:05:49 on these layers of abstractions?
00:05:51 Where does the magic enter this hierarchy?
00:05:55 I don’t really care.
00:05:57 That’s the fun, you know, I’m somewhat agnostic to that.
00:06:00 So I would say for relatively long periods of time,
00:06:05 instruction sets are stable.
00:06:08 So the x86 instruction set, the ARM instruction set.
00:06:12 What’s an instruction set?
00:06:13 So it says, how do you encode the basic operations?
00:06:16 Load, store, multiply, add, subtract, conditional, branch.
00:06:20 You know, there aren’t that many interesting instructions.
00:06:23 Look, if you look at a program and it runs,
00:06:26 you know, 90% of the execution is on 25 opcodes,
00:06:29 you know, 25 instructions.
00:06:31 And those are stable, right?
00:06:33 What does it mean, stable?
00:06:35 Intel architecture’s been around for 25 years.
00:06:38 It works.
00:06:38 It works.
00:06:39 And that’s because the basics, you know,
00:06:42 are defined a long time ago, right?
00:06:45 Now, the way an old computer ran is you fetched
00:06:49 instructions and you executed them in order.
00:06:52 Do the load, do the add, do the compare.
00:06:57 The way a modern computer works is you fetch
00:06:59 large numbers of instructions, say 500.
00:07:03 And then you find the dependency graph
00:07:06 between the instructions.
00:07:07 And then you execute in independent units
00:07:12 those little micrographs.
00:07:15 So a modern computer, like people like to say,
00:07:17 computers should be simple and clean.
00:07:20 But it turns out the market for simple,
00:07:22 clean, slow computers is zero, right?
00:07:26 We don’t sell any simple, clean computers.
00:07:29 No, you can, how you build it can be clean,
00:07:33 but the computer people want to buy,
00:07:36 that’s, say, in a phone or a data center,
00:07:40 fetches a large number of instructions,
00:07:42 computes the dependency graph,
00:07:45 and then executes it in a way that gets the right answers.
00:07:49 And optimizes that graph somehow.
00:07:50 Yeah, they run deeply out of order.
00:07:53 And then there’s semantics around how memory ordering works
00:07:57 and other things work.
00:07:58 So the computer sort of has a bunch of bookkeeping tables
00:08:01 that says what order should these operations finish in
00:08:05 or appear to finish in?
00:08:07 But to go fast, you have to fetch a lot of instructions
00:08:10 and find all the parallelism.
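The fetch-a-window, find-the-dependency-graph idea Keller describes can be sketched in a few lines. This is a toy model only: a made-up three-register instruction format, unit latencies, and no register renaming, nothing like real hardware, but it shows how a serial instruction stream collapses into waves of independent work.

```python
# Toy sketch of "found parallelism": take a serial instruction window,
# build the register dependency graph, and group instructions into
# waves that could issue together. The (dest, src1, src2) format and
# register names are made up for illustration; real hardware also
# renames registers, which this ignores.

def schedule(window):
    """window: list of (dest, src1, src2) register tuples in program
    order. Returns waves; instructions in the same wave are independent."""
    waves = []
    ready_at = {}  # register name -> wave index that produces it
    for dest, src1, src2 in window:
        # An instruction can issue one wave after its latest input.
        depth = max(ready_at.get(src1, -1), ready_at.get(src2, -1)) + 1
        while len(waves) <= depth:
            waves.append([])
        waves[depth].append((dest, src1, src2))
        ready_at[dest] = depth
    return waves

# A serial narrative of five instructions...
program = [
    ("r1", "r0", "r0"),  # independent
    ("r2", "r0", "r0"),  # independent
    ("r3", "r1", "r2"),  # depends on r1 and r2
    ("r4", "r0", "r0"),  # independent
    ("r5", "r3", "r4"),  # depends on r3 and r4
]
waves = schedule(program)
# ...collapses into 3 waves instead of 5 serial steps.
```
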
00:08:12 Now, there’s a second kind of computer,
00:08:15 which we call GPUs today.
00:08:17 And I call it the difference.
00:08:19 There’s found parallelism, like you have a program
00:08:21 with a lot of dependent instructions.
00:08:24 You fetch a bunch and then you go figure out
00:08:26 the dependency graph and you issue instructions out of order.
00:08:29 That’s because you have one serial narrative to execute,
00:08:32 which, in fact, can be done out of order.
00:08:35 Did you call it a narrative?
00:08:37 Yeah.
00:08:37 Oh, wow.
00:08:38 Yeah, so humans think of serial narrative.
00:08:40 So read a book, right?
00:08:42 There’s a sentence after sentence after sentence,
00:08:45 and there’s paragraphs.
00:08:46 Now, you could diagram that.
00:08:49 Imagine you diagrammed it properly and you said,
00:08:52 which sentences could be read in any order,
00:08:55 any order without changing the meaning, right?
00:08:59 That’s a fascinating question to ask of a book, yeah.
00:09:02 Yeah, you could do that, right?
00:09:04 So some paragraphs could be reordered,
00:09:06 some sentences can be reordered.
00:09:08 You could say, he is tall and smart and X, right?
00:09:15 And it doesn’t matter the order of tall and smart.
00:09:19 But if you say the tall man is wearing a red shirt,
00:09:22 what colors, you can create dependencies, right?
00:09:28 And so GPUs, on the other hand,
00:09:32 run simple programs on pixels,
00:09:35 but you’re given a million of them.
00:09:36 And the first order, the screen you’re looking at
00:09:40 doesn’t care which order you do it in.
00:09:42 So I call that given parallelism.
00:09:44 Simple narratives around the large numbers of things
00:09:48 where you can just say,
00:09:49 it’s parallel because you told me it was.
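The given-parallelism point can be made concrete with a toy per-pixel operation. The "shader" here is a made-up brightness tweak, not any real graphics API: because each pixel's result depends only on that pixel, any execution order produces the same screen.

```python
# Toy sketch of "given parallelism": a per-pixel operation where no
# pixel depends on any other, so execution order is irrelevant.
# shade() is a made-up stand-in for a pixel shader.

def shade(pixel):
    return min(255, pixel + 16)  # clamp to 8-bit brightness

pixels = list(range(0, 240, 16))  # a tiny 15-"pixel" screen

forward = [shade(p) for p in pixels]
backward = [shade(p) for p in reversed(pixels)][::-1]
# Same screen either way: parallel because the problem says so.
```
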
00:09:52 So found parallelism where the narrative is sequential,
00:09:57 but you discover like little pockets of parallelism versus.
00:10:01 Turns out large pockets of parallelism.
00:10:03 Large, so how hard is it to discover?
00:10:05 Well, how hard is it?
00:10:06 That’s just transistor count, right?
00:10:08 So once you crack the problem, you say,
00:10:11 here’s how you fetch 10 instructions at a time.
00:10:13 Here’s how you calculate the dependencies between them.
00:10:16 Here’s how you describe the dependencies.
00:10:18 Here’s, you know, these are pieces, right?
00:10:20 So once you describe the dependencies,
00:10:25 then it’s just a graph.
00:10:27 Sort of, it’s an algorithm that finds,
00:10:31 what is that?
00:10:31 I’m sure there’s a graph theoretical answer here
00:10:34 that’s solvable.
00:10:35 In general, programs, modern programs
00:10:40 that human beings write,
00:10:42 how much found parallelism is there in them?
00:10:45 What does 10X mean?
00:10:47 So if you execute it in order, you would get
00:10:52 what’s called cycles per instruction,
00:10:53 and it would be about, you know,
00:10:57 three instructions, three cycles per instruction
00:11:00 because of the latency of the operations and stuff.
00:11:02 And a modern computer executes it at,
00:11:05 like, 0.2, 0.25 cycles per instruction.
00:11:08 So it’s about, we today find 10X.
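The "about 10X" figure falls straight out of the two cycles-per-instruction numbers quoted in the conversation; a quick check:

```python
# The ~10X "found parallelism" figure, from the CPI numbers quoted:
# in-order execution at ~3 cycles per instruction vs. a modern
# out-of-order machine at ~0.25 cycles per instruction.
in_order_cpi = 3.0
out_of_order_cpi = 0.25
speedup = in_order_cpi / out_of_order_cpi  # 12x, i.e. "about 10X"
```
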
00:11:11 And there’s two things.
00:11:13 One is the found parallelism in the narrative, right?
00:11:17 And the other is the predictability of the narrative, right?
00:11:21 So certain operations say, do a bunch of calculations,
00:11:25 and if greater than one, do this, else do that.
00:11:30 That decision is predicted in modern computers
00:11:33 to high 90% accuracy.
00:11:36 So branches happen a lot.
00:11:38 So imagine you have a decision
00:11:40 to make every six instructions,
00:11:41 which is about the average, right?
00:11:43 But you want to fetch 500 instructions,
00:11:45 figure out the graph, and execute them all in parallel.
00:11:48 That means you have, let’s say,
00:11:51 if you fetch 600 instructions and it’s every six,
00:11:54 you have to predict
00:11:56 99 out of 100 branches correctly
00:12:00 for that window to be effective.
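The back-of-envelope math here, using the numbers from the conversation: a branch every six instructions means a 600-instruction window holds about 100 branches, and the whole window is only on the correct path if every one of them is predicted right.

```python
# Probability that an entire fetch window is on the correct path,
# given per-branch prediction accuracy. A 600-instruction window with
# a branch every 6 instructions holds ~100 branches.
branches_in_window = 600 // 6  # 100

def window_survival(accuracy, n=branches_in_window):
    return accuracy ** n

p85 = window_survival(0.85)  # old-style predictor: essentially never
p99 = window_survival(0.99)  # ~37%: the big window starts paying off
```
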
00:12:02 Okay, so parallelism, you can’t parallelize branches.
00:12:06 Or you can.
00:12:07 No, you can predict.
00:12:08 You can predict.
00:12:09 What does predicted branch mean?
00:12:10 What does predicted branch mean?
00:12:11 So imagine you do a computation over and over.
00:12:13 You’re in a loop.
00:12:14 So while n is greater than one, do.
00:12:19 And you go through that loop a million times.
00:12:21 So every time you look at the branch,
00:12:22 you say, it’s probably still greater than one.
00:12:25 And you’re saying you could do that accurately.
00:12:27 Very accurately.
00:12:28 Modern computers.
00:12:29 My mind is blown.
00:12:30 How the heck do you do that?
00:12:31 Wait a minute.
00:12:32 Well, you want to know?
00:12:33 This is really sad.
00:12:35 20 years ago, you simply recorded
00:12:38 which way the branch went last time
00:12:40 and predicted the same thing.
00:12:42 Right.
00:12:43 Okay.
00:12:44 What’s the accuracy of that?
00:12:46 85%.
00:12:48 So then somebody said, hey, let’s keep a couple of bits
00:12:51 and have a little counter so when it predicts one way,
00:12:54 we count up, and then it pins.
00:12:56 So say you have a three bit counter.
00:12:58 So you count up and then you count down.
00:13:00 And you can use the top bit as the signed bit
00:13:03 so you have a signed two bit number.
00:13:05 So if it’s greater than one, you predict taken.
00:13:07 And less than one, you predict not taken, right?
00:13:11 Or less than zero, whatever the thing is.
00:13:14 And that got us to 92%.
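The saturating-counter scheme just described can be sketched as a toy two-bit predictor: count up on taken, down on not taken, saturate at the ends, and predict from the top half of the range. The hysteresis means a single off-pattern branch, like a loop exit, doesn't flip the prediction.

```python
# Toy 2-bit saturating counter predictor, as described above: count up
# on taken, down on not-taken, saturate, predict from the top half.

class TwoBitPredictor:
    def __init__(self):
        self.counter = 2  # states 0..3; start weakly "taken"

    def predict(self):
        return self.counter >= 2  # True = predict taken

    def update(self, taken):
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

p = TwoBitPredictor()
# A loop branch: taken over and over (here 10 times), then the exit.
history = [True] * 10 + [False]
correct = 0
for actual in history:
    correct += (p.predict() == actual)
    p.update(actual)
# 10 of 11 right: only the loop exit is mispredicted.
```
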
00:13:16 Oh.
00:13:17 Okay, no, it gets better.
00:13:19 This branch depends on how you got there.
00:13:22 So if you came down the code one way,
00:13:25 you’re talking about Bob and Jane, right?
00:13:28 And then said, does Bob like Jane?
00:13:30 It went one way.
00:13:31 But if you’re talking about Bob and Jill,
00:13:32 does Bob like Jane?
00:13:33 You go a different way.
00:13:35 Right, so that’s called history.
00:13:36 So you take the history and a counter.
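"History and a counter" can be sketched as a table of saturating counters indexed by the last few branch outcomes, so the same branch gets a different prediction depending on how you got there. This is a simplified, gshare-flavored toy: real designs also hash in the branch address, which is omitted here.

```python
# Toy history-based predictor: 2-bit counters indexed by the last
# HISTORY_BITS branch outcomes. Simplified for illustration; real
# predictors also fold in the branch's address.

HISTORY_BITS = 4

class HistoryPredictor:
    def __init__(self):
        self.history = 0                        # recent outcomes as bits
        self.table = [2] * (1 << HISTORY_BITS)  # counters, weakly taken

    def predict(self):
        return self.table[self.history] >= 2

    def update(self, taken):
        c = self.table[self.history]
        self.table[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        self.history = ((self.history << 1) | int(taken)) & ((1 << HISTORY_BITS) - 1)

# A strictly alternating branch: hopeless for a single counter,
# perfectly learnable once the predictor sees the history pattern.
p = HistoryPredictor()
outcomes = [i % 2 == 0 for i in range(200)]
correct = 0
for actual in outcomes:
    correct += (p.predict() == actual)
    p.update(actual)
# Only the first couple of warm-up branches are mispredicted.
```
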
00:13:40 That’s cool, but that’s not how anything works today.
00:13:43 They use something that looks a little like a neural network.
00:13:48 So in modern predictors, you take all the execution flows.
00:13:52 And then you do basically deep pattern recognition
00:13:56 of how the program is executing.
00:13:59 And you do that multiple different ways.
00:14:03 And you have something that chooses what the best result is.
00:14:07 There’s a little supercomputer inside the computer.
00:14:10 That’s trying to predict branching.
00:14:11 That calculates which way branches go.
00:14:14 So the effective window that it’s worth finding graphs
00:14:17 in gets bigger.
00:14:19 Why was that gonna make me sad?
00:14:21 Because that’s amazing.
00:14:22 It’s amazingly complicated.
00:14:24 Oh, well.
00:14:25 Well, here’s the funny thing.
00:14:27 So to get to 85% took 1,000 bits.
00:14:31 To get to 99% takes tens of megabits.
00:14:38 So this is one of those, to get the result,
00:14:42 to get from a window of say 50 instructions to 500,
00:14:47 it took three orders of magnitude
00:14:49 or four orders of magnitude more bits.
00:14:52 Now if you get the prediction of a branch wrong,
00:14:55 what happens then?
00:14:56 You flush the pipe.
00:14:57 You flush the pipe, so it’s just the performance cost.
00:14:59 But it gets even better.
00:15:00 Yeah.
00:15:01 So we’re starting to look at stuff that says,
00:15:03 so they executed down this path,
00:15:06 and then you had two ways to go.
00:15:09 But far away, there’s something that doesn’t matter
00:15:12 which path you went.
00:15:14 So you took the wrong path.
00:15:17 You executed a bunch of stuff.
00:15:20 Then you had the misprediction.
00:15:21 You backed it up.
00:15:22 You remembered all the results you already calculated.
00:15:25 Some of those are just fine.
00:15:27 Like if you read a book and you misunderstand a paragraph,
00:15:30 your understanding of the next paragraph
00:15:32 sometimes is invariant to that understanding.
00:15:35 Sometimes it depends on it.
00:15:38 And you can kind of anticipate that invariance.
00:15:43 Yeah, well, you can keep track of whether the data changed.
00:15:47 And so when you come back through a piece of code,
00:15:49 should you calculate it again or do the same thing?
00:15:51 Okay, how much of this is art and how much of it is science?
00:15:55 Because it sounds pretty complicated.
00:15:59 Well, how do you describe a situation?
00:16:00 So imagine you come to a point in the road
00:16:02 where you have to make a decision, right?
00:16:05 And you have a bunch of knowledge about which way to go.
00:16:07 Maybe you have a map.
00:16:08 So you wanna go the shortest way,
00:16:11 or do you wanna go the fastest way,
00:16:13 or do you wanna take the nicest road?
00:16:14 So there’s some set of data.
00:16:17 So imagine you’re doing something complicated
00:16:19 like building a computer.
00:16:21 And there’s hundreds of decision points,
00:16:24 all with hundreds of possible ways to go.
00:16:27 And the ways you pick interact in a complicated way.
00:16:32 Right.
00:16:33 And then you have to pick the right spot.
00:16:35 Right, so that’s.
00:16:36 So that’s art or science, I don’t know.
00:16:37 You avoided the question.
00:16:38 You just described the Robert Frost problem
00:16:41 of road less taken.
00:16:43 I described the Robert Frost problem?
00:16:45 That’s what we do as computer designers.
00:16:49 It’s all poetry.
00:16:50 Okay.
00:16:51 Great.
00:16:52 Yeah, I don’t know how to describe that
00:16:54 because some people are very good
00:16:56 at making those intuitive leaps.
00:16:57 It seems like just combinations of things.
00:17:00 Some people are less good at it,
00:17:02 but they’re really good at evaluating the alternatives.
00:17:05 Right, and everybody has a different way to do it.
00:17:09 And some people can’t make those leaps,
00:17:11 but they’re really good at analyzing it.
00:17:14 So when you see computers are designed
00:17:16 by teams of people who have very different skill sets.
00:17:19 And a good team has lots of different kinds of people.
00:17:24 I suspect you would describe some of them
00:17:26 as artistic, but not very many.
00:17:30 Unfortunately, or fortunately.
00:17:32 Fortunately.
00:17:33 Well, you know, computer design’s hard.
00:17:36 It’s 99% perspiration.
00:17:40 And the 1% inspiration is really important.
00:17:44 But you still need the 99.
00:17:45 Yeah, you gotta do a lot of work.
00:17:47 And then there are interesting things to do
00:17:50 at every level of that stack.
00:17:52 So at the end of the day,
00:17:55 if you run the same program multiple times,
00:17:58 does it always produce the same result?
00:18:01 Is there some room for fuzziness there?
00:18:04 That’s a math problem.
00:18:06 So if you run a correct C program,
00:18:08 the definition is every time you run it,
00:18:11 you get the same answer.
00:18:12 Yeah, well that’s a math statement.
00:18:14 But that’s a language definitional statement.
00:18:17 So for years when people did,
00:18:19 when we first did 3D acceleration of graphics,
00:18:24 you could run the same scene multiple times
00:18:27 and get different answers.
00:18:28 Right.
00:18:29 Right, and then some people thought that was okay
00:18:32 and some people thought it was a bad idea.
00:18:34 And then when the HPC world used GPUs for calculations,
00:18:39 they thought it was a really bad idea.
00:18:41 Okay, now in modern AI stuff,
00:18:44 people are looking at networks
00:18:48 where the precision of the data is low enough
00:18:51 that the data is somewhat noisy.
00:18:53 And the observation is the input data is unbelievably noisy.
00:18:57 So why should the calculation be not noisy?
00:19:00 And people have experimented with algorithms
00:19:02 that say can get faster answers by being noisy.
00:19:05 Like as a network starts to converge,
00:19:08 if you look at the computation graph,
00:19:09 it starts out really wide and then it gets narrower.
00:19:12 And you can say is that last little bit that important
00:19:14 or should I start the graph on the next rev
00:19:17 before we whittle it all the way down to the answer, right?
00:19:21 So you can create algorithms that are noisy.
00:19:24 Now if you’re developing something
00:19:25 and every time you run it, you get a different answer,
00:19:27 it’s really annoying.
00:19:29 And so most people think even today,
00:19:33 every time you run the program, you get the same answer.
00:19:36 No, I know, but the question is
00:19:38 that’s the formal definition of a programming language.
00:19:42 There is a definition of languages
00:19:44 that don’t get the same answer,
00:19:45 but people who use those, you always want something
00:19:49 because you get a bad answer and then you’re wondering
00:19:51 is it because of something in the algorithm
00:19:54 or because of this?
00:19:55 And so everybody wants a little switch that says
00:19:57 no matter what, do it deterministically.
00:20:00 And it’s really weird because almost everything
00:20:02 going into modern calculations is noisy.
00:20:05 So why do the answers have to be so clear?
00:20:08 Right, so where do you stand?
00:20:09 I design computers for people who run programs.
00:20:12 So if somebody says I want a deterministic answer,
00:20:16 like most people want that.
00:20:18 Can you deliver a deterministic answer,
00:20:20 I guess is the question.
00:20:21 Like when you.
00:20:22 Yeah, hopefully, sure.
00:20:24 What people don’t realize is you get a deterministic answer
00:20:27 even though the execution flow is very nondeterministic.
00:20:31 So you run this program 100 times,
00:20:33 it never runs the same way twice, ever.
00:20:36 And the answer, it arrives at the same answer.
00:20:37 But it gets the same answer every time.
00:20:39 It’s just amazing.
00:20:42 Okay, you’ve achieved, in the eyes of many people,
00:20:49 legend status as a chip architect.
00:20:53 What design creation are you most proud of?
00:20:56 Perhaps because it was challenging,
00:20:59 because of its impact, or because of the set
00:21:01 of brilliant ideas that were involved in bringing it to life?
00:21:06 I find that description odd.
00:21:10 And I have two small children, and I promise you,
00:21:14 they think it’s hilarious.
00:21:15 This question.
00:21:16 Yeah.
00:21:17 I do it for them.
00:21:18 So I’m really interested in building computers.
00:21:23 And I’ve worked with really, really smart people.
00:21:27 I’m not unbelievably smart.
00:21:30 I’m fascinated by how they go together,
00:21:32 both as a thing to do and as an endeavor that people do.
00:21:38 How people and computers go together?
00:21:40 Yeah.
00:21:40 Like how people think and build a computer.
00:21:44 And I find sometimes that the best computer architects
00:21:47 aren’t that interested in people,
00:21:49 or the best people managers aren’t that good
00:21:51 at designing computers.
00:21:54 So the whole stack of human beings is fascinating.
00:21:56 So the managers, the individual engineers.
00:21:58 Yeah, yeah.
00:21:59 Yeah, I said I realized after a lot of years
00:22:02 of building computers, where you sort of build them
00:22:04 out of transistors, logic gates, functional units,
00:22:06 computational elements, that you could think of people
00:22:09 the same way, so people are functional units.
00:22:12 And then you could think of organizational design
00:22:14 as a computer architecture problem.
00:22:16 And then it was like, oh, that’s super cool,
00:22:19 because the people are all different,
00:22:20 just like the computational elements are all different.
00:22:23 And they like to do different things.
00:22:25 And so I had a lot of fun reframing
00:22:29 how I think about organizations.
00:22:31 Just like with computers, we were saying execution paths,
00:22:35 you can have a lot of different paths that end up
00:22:37 at the same good destination.
00:22:41 So what have you learned about the human abstractions
00:22:45 from individual functional human units
00:22:48 to the broader organization?
00:22:51 What does it take to create something special?
00:22:55 Well, most people don’t think simple enough.
00:23:00 All right, so the difference between a recipe
00:23:02 and the understanding.
00:23:04 There’s probably a philosophical description of this.
00:23:09 So imagine you’re gonna make a loaf of bread.
00:23:11 The recipe says get some flour, add some water,
00:23:14 add some yeast, mix it up, let it rise,
00:23:16 put it in a pan, put it in the oven.
00:23:19 It’s a recipe.
00:23:21 Understanding bread, you can understand biology,
00:23:24 supply chains, grain grinders, yeast, physics,
00:23:29 thermodynamics, there’s so many levels of understanding.
00:23:37 And then when people build and design things,
00:23:40 they frequently are executing some stack of recipes.
00:23:45 And the problem with that is the recipes
00:23:46 all have limited scope.
00:23:48 Like if you have a really good recipe book
00:23:50 for making bread, it won’t tell you anything
00:23:52 about how to make an omelet.
00:23:54 But if you have a deep understanding of cooking,
00:23:57 right, then bread, omelets, you know, sandwiches,
00:24:03 you know, there’s a different way of viewing everything.
00:24:07 And most people, when you get to be an expert at something,
00:24:13 you know, you’re hoping to achieve deeper understanding,
00:24:16 not just a large set of recipes to go execute.
00:24:20 And it’s interesting to walk groups of people
00:24:22 because executing recipes is unbelievably efficient
00:24:27 if it’s what you want to do.
00:24:30 If it’s not what you want to do, you’re really stuck.
00:24:34 And that difference is crucial.
00:24:36 And everybody has a balance of, let’s say,
00:24:39 deeper understanding of recipes.
00:24:40 And some people are really good at recognizing
00:24:43 when the problem is to understand something deeply.
00:24:47 Does that make sense?
00:24:49 It totally makes sense. Is deep understanding
00:24:52 needed on the team at every stage of development?
00:24:55 Oh, this goes back to the art versus science question.
00:24:58 Sure.
00:24:59 If you constantly unpack everything
00:25:01 for deeper understanding, you never get anything done.
00:25:04 And if you don’t unpack understanding when you need to,
00:25:06 you’ll do the wrong thing.
00:25:09 And then at every juncture, like human beings
00:25:12 are these really weird things because everything you tell them
00:25:15 has a million possible outputs, right?
00:25:18 And then they all interact in a hilarious way.
00:25:21 Yeah, it’s very interesting.
00:25:21 And then having some intuition about what you tell them,
00:25:24 what you do, when do you intervene, when do you not,
00:25:26 it’s complicated.
00:25:28 Right, so.
00:25:29 It’s essentially computationally unsolvable.
00:25:33 Yeah, it’s an intractable problem, sure.
00:25:36 Humans are a mess.
00:25:37 But with deep understanding,
00:25:41 do you mean also sort of fundamental questions
00:25:44 of things like what is a computer?
00:25:51 Or why, like the why questions,
00:25:55 why are we even building this, like of purpose?
00:25:58 Or do you mean more like going towards
00:26:02 the fundamental limits of physics,
00:26:04 sort of really getting into the core of the science?
00:26:07 In terms of building a computer, think a little simpler.
00:26:11 So common practice is you build a computer,
00:26:14 and then when somebody says, I wanna make it 10% faster,
00:26:17 you’ll go in and say, all right,
00:26:19 I need to make this buffer bigger,
00:26:20 and maybe I’ll add an add unit.
00:26:23 Or I have this thing that’s three instructions wide,
00:26:25 I’m gonna make it four instructions wide.
00:26:27 And what you see is each piece
00:26:30 gets incrementally more complicated, right?
00:26:34 And then at some point you hit this limit,
00:26:37 like adding another feature or buffer
00:26:39 doesn’t seem to make it any faster.
00:26:41 And then people will say,
00:26:42 well, that’s because it’s a fundamental limit.
00:26:45 And then somebody else will look at it and say,
00:26:46 well, actually the way you divided the problem up
00:26:49 and the way the different features are interacting
00:26:52 is limiting you, and it has to be rethought, rewritten.
00:26:56 So then you refactor it and rewrite it,
00:26:58 and what people commonly find is the rewrite
00:27:00 is not only faster, but half as complicated.
00:27:03 From scratch? Yes.
00:27:05 So how often in your career, or just what you’ve seen,
00:27:08 is it needed, maybe more generally,
00:27:11 to just throw the whole thing out and start over?
00:27:14 This is where I’m on one end of it,
00:27:17 every three to five years.
00:27:19 Which end are you on?
00:27:21 Rewrite more often.
00:27:22 Rewrite, and three to five years is?
00:27:25 If you wanna really make a lot of progress
00:27:27 on computer architecture, every five years
00:27:28 you should do one from scratch.
00:27:31 So where does the x86-64 standard come in?
00:27:36 How often do you?
00:27:38 I was the coauthor of that spec in 98.
00:27:42 That’s 20 years ago.
00:27:43 Yeah, so that’s still around.
00:27:45 The instruction set itself has been extended
00:27:48 quite a few times.
00:27:50 And instruction sets are less interesting
00:27:52 than the implementation underneath.
00:27:54 There’s been, on x86 architecture, Intel’s designed a few,
00:27:58 AMD designed a few very different architectures.
00:28:02 And I don’t wanna go into too much of the detail
00:28:06 about how often, but there’s a tendency
00:28:10 to rewrite it every 10 years,
00:28:12 and it really should be every five.
00:28:15 So you’re saying you’re an outlier in that sense.
00:28:17 Rewrite more often.
00:28:19 Rewrite more often.
00:28:20 Well, and here’s the problem.
00:28:20 Isn’t that scary?
00:28:22 Yeah, of course.
00:28:23 Well, scary to who?
00:28:25 To everybody involved, because like you said,
00:28:28 repeating the recipe is efficient.
00:28:30 Companies wanna make money.
00:28:34 No, individual engineers wanna succeed,
00:28:36 so you wanna incrementally improve,
00:28:39 increase the buffer from three to four.
00:28:41 Well, this is where you get
00:28:42 into the diminishing return curves.
00:28:45 I think Steve Jobs said this, right?
00:28:46 So every, you have a project, and you start here,
00:28:49 and it goes up, and you have diminishing return.
00:28:52 And to get to the next level, you have to do a new one,
00:28:54 and the initial starting point will be lower
00:28:57 than the old optimization point, but it’ll get higher.
00:29:01 So now you have two kinds of fear,
00:29:03 short term disaster and long term disaster.
00:29:07 And you’re, you’re haunted.
00:29:08 So grown ups, right, like, you know,
00:29:12 people with a quarter by quarter business objective
00:29:15 are terrified about changing everything.
00:29:17 And people who are trying to run a business
00:29:21 or build a computer for a long term objective
00:29:23 know that the short term limitations block them
00:29:27 from the long term success.
00:29:29 So if you look at leaders of companies
00:29:32 that had really good long term success,
00:29:35 every time they saw that they had to redo something, they did.
00:29:39 And so somebody has to speak up.
00:29:41 Or you do multiple projects in parallel,
00:29:43 like you optimize the old one while you build a new one.
00:29:46 But the marketing guys are always like,
00:29:48 promise me that the new computer
00:29:49 is faster on every single thing.
00:29:52 And the computer architect says,
00:29:53 well, the new computer will be faster on the average,
00:29:56 but there’s a distribution of results and performance,
00:29:59 and you’ll have some outliers that are slower.
00:30:01 And that’s very hard,
00:30:02 because they have one customer who cares about that one.
00:30:05 So speaking of the long term, for over 50 years now,
00:30:08 Moore’s Law has served, for me and millions of others,
00:30:12 as an inspiring beacon of what kind of amazing future
00:30:16 brilliant engineers can build.
00:30:18 Yep.
00:30:19 I’m just making your kids laugh all of today.
00:30:21 That was great.
00:30:23 So first, in your eyes, what is Moore’s Law,
00:30:27 if you could define for people who don’t know?
00:30:29 Well, the simple statement, from Gordon Moore,
00:30:34 was double the number of transistors every two years.
00:30:37 Something like that.
00:30:39 And then my operational model is,
00:30:43 we increase the performance of computers
00:30:45 by two X every two or three years.
00:30:48 And it’s wiggled around substantially over time.
00:30:51 And also, how we deliver performance has changed.
00:30:55 But the foundational idea was
00:31:00 two X the transistors every two years.
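The compounding behind that statement can be sketched in a few lines of Python; the 50-year span and two-year cadence are the round figures from the conversation, not measured data:

```python
# Compounding "2x the transistors every two years" over five decades.
# Round figures from the conversation, not measured data.
years = 50
doublings = years // 2         # 25 doublings in 50 years
growth = 2 ** doublings        # ~33.5 million-fold transistor growth
print(growth)                  # 33554432
```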
00:31:02 The current cadence is something like,
00:31:05 they call it a shrink factor, like 0.6 every two years,
00:31:10 which is not 0.5.
00:31:11 But that’s referring strictly, again,
00:31:13 to the original definition of just.
00:31:15 A transistor count.
00:31:16 A shrink factor’s just getting them
00:31:18 smaller and smaller and smaller.
00:31:19 Well, it’s for a constant chip area.
00:31:21 If you make the transistors smaller by 0.6,
00:31:24 then you get one over 0.6 more transistors.
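That cadence arithmetic works out as follows; the 0.6 shrink factor is the number quoted above, applied to a constant chip area:

```python
# A 0.6 shrink factor per generation (constant chip area) versus the
# 0.5 that would be a strict doubling of transistor count.
shrink_factor = 0.6
density_gain = 1 / shrink_factor    # ~1.67x more transistors per generation
strict_doubling = 1 / 0.5           # 2.0x, the original Moore's-law rate
```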
00:31:27 So can you linger on it a little longer?
00:31:29 What’s a broader, what do you think should be
00:31:31 the broader definition of Moore’s Law?
00:31:33 When you mentioned how you think of performance,
00:31:37 just broadly, what’s a good way to think about Moore’s Law?
00:31:42 Well, first of all, I’ve been aware
00:31:45 of Moore’s Law for 30 years.
00:31:48 In which sense?
00:31:49 Well, I’ve been designing computers for 40.
00:31:52 You’re just watching it before your eyes kind of thing.
00:31:56 And somewhere around when I became aware of it,
00:31:58 I was also informed that Moore’s Law
00:31:59 was gonna die in 10 to 15 years.
00:32:02 And then I thought that was true at first.
00:32:03 But then after 10 years, it was gonna die in 10 to 15 years.
00:32:07 And then at one point, it was gonna die in five years.
00:32:09 And then it went back up to 10 years.
00:32:11 And at some point, I decided not to worry
00:32:13 about that particular prognostication
00:32:16 for the rest of my life, which is fun.
00:32:19 And then I joined Intel and everybody said
00:32:21 Moore’s Law is dead.
00:32:22 And I thought that’s sad,
00:32:23 because it’s the Moore’s Law company.
00:32:25 And it’s not dead.
00:32:26 And it’s always been gonna die.
00:32:29 And humans like these apocryphal kind of statements,
00:32:33 like we’ll run out of food, or we’ll run out of air,
00:32:36 or we’ll run out of room, or we’ll run out of something.
00:32:39 Right, but it’s still incredible
00:32:41 that it’s lived for as long as it has.
00:32:44 And yes, there’s many people who believe now
00:32:47 that Moore’s Law is dead.
00:32:50 You know, they can join the last 50 years
00:32:52 of people who had the same idea.
00:32:53 Yeah, there’s a long tradition.
00:32:55 But why do you think, if you can try to understand it,
00:33:00 why do you think it’s not dead?
00:33:03 Well, let’s just think, people think Moore’s Law
00:33:06 is one thing, transistors get smaller.
00:33:09 But actually, under the sheet,
00:33:10 there’s literally thousands of innovations.
00:33:12 And almost all those innovations
00:33:14 have their own diminishing return curves.
00:33:17 So if you graph it, it looks like a cascade
00:33:19 of diminishing return curves.
00:33:21 I don’t know what to call that.
00:33:22 But the result is an exponential curve.
00:33:26 Well, at least it has been.
00:33:27 So, and we keep inventing new things.
00:33:30 So if you’re an expert in one of the things
00:33:32 on a diminishing return curve, right,
00:33:35 and you can see its plateau,
00:33:38 you will probably tell people, well, this is done.
00:33:42 Meanwhile, some other pile of people
00:33:43 are doing something different.
00:33:46 So that’s just normal.
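That cascade-of-S-curves picture can be illustrated with a toy model; the logistic curves, onsets, and scales below are illustrative assumptions, not industry data:

```python
import math

def s_curve(t, onset, scale):
    """One innovation's diminishing-return (logistic) curve."""
    return scale / (1.0 + math.exp(-(t - onset)))

def stacked_progress(t, generations=10):
    """Overall progress: each new technology starts later and, because it
    builds on everything before it, contributes a larger absolute gain."""
    return sum(s_curve(t, onset=2 * g, scale=2 ** g) for g in range(generations))

# Every individual curve flattens, yet total progress keeps roughly
# doubling each period -- a cascade of S curves looks exponential.
ratios = [stacked_progress(t + 2) / stacked_progress(t) for t in range(0, 16, 2)]
```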
00:33:48 So then there’s the observation of
00:33:50 how small could a switching device be?
00:33:54 So a modern transistor is something like
00:33:55 a thousand by a thousand by a thousand atoms, right?
00:33:59 And you get quantum effects down around two to 10 atoms.
00:34:04 So you can imagine the transistor
00:34:06 as small as 10 by 10 by 10.
00:34:08 So that’s a million times smaller.
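The scale comparison works out like this, using the round atom counts from the conversation:

```python
# A modern transistor at ~1000 atoms on a side versus a hypothetical
# 10x10x10-atom device near where quantum effects (2-10 atoms) take over.
atoms_per_side_now = 1000
atoms_per_side_limit = 10
headroom = (atoms_per_side_now // atoms_per_side_limit) ** 3  # 1,000,000x by volume
```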
00:34:12 And then the quantum computational people
00:34:14 are working away at how to use quantum effects.
00:34:17 So.
00:34:20 A thousand by a thousand by a thousand.
00:34:21 Atoms.
00:34:23 That’s a really clean way of putting it.
00:34:26 Well, a fin, like a modern transistor,
00:34:28 if you look at the fin, it’s like 120 atoms wide,
00:34:32 but we can make that thinner.
00:34:33 And then there’s a gate wrapped around it,
00:34:35 and then there’s spacing.
00:34:36 There’s a whole bunch of geometry.
00:34:38 And a competent transistor designer
00:34:42 could count the atoms in every single direction.
00:34:48 Like there’s techniques now to already put down atoms
00:34:50 in a single atomic layer.
00:34:53 And you can place atoms if you want to.
00:34:55 It’s just from a manufacturing process,
00:34:59 if placing an atom takes 10 minutes
00:35:01 and you need to put 10 to the 23rd atoms together
00:35:05 to make a computer, it would take a long time.
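A quick back-of-the-envelope check shows why serial atom placement is hopeless; the 10-minutes-per-atom rate and the 10^23 atom count are the hypotheticals from the conversation:

```python
# Serial atom-by-atom assembly of a computer, at the quoted rates.
atoms = 10 ** 23
minutes_per_atom = 10
total_minutes = atoms * minutes_per_atom
years = total_minutes / (60 * 24 * 365)   # on the order of 10^18 years
```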
00:35:08 So the methods are both shrinking things
00:35:13 and then coming up with effective ways
00:35:15 to control what’s happening.
00:35:17 Manufacture stably and cheaply.
00:35:20 Yeah.
00:35:21 So the innovation stack’s pretty broad.
00:35:23 There’s equipment, there’s optics, there’s chemistry,
00:35:26 there’s physics, there’s material science,
00:35:29 there’s metallurgy, there’s lots of ideas
00:35:31 about when you put different materials together,
00:35:33 how do they interact, are they stable,
00:35:35 is it stable over temperature, like are they repeatable?
00:35:40 There’s like literally thousands of technologies involved.
00:35:45 But just for the shrinking, you don’t think
00:35:46 we’re quite yet close to the fundamental limits of physics?
00:35:50 I did a talk on Moore’s Law and I asked for a roadmap,
00:35:53 a path to 100, and after two weeks,
00:35:56 they said we only got to 50.
00:35:58 100 what, sorry?
00:35:59 100 X shrink.
00:36:00 100 X shrink?
00:36:01 We only got to 50.
00:36:02 And I said, why don’t you give it another two weeks?
00:36:05 Well, here’s the thing about Moore’s Law, right?
00:36:09 So I believe that the next 10 or 20 years
00:36:14 of shrinking is gonna happen, right?
00:36:16 Now, as a computer designer, you have two stances.
00:36:20 You think it’s going to shrink, in which case
00:36:23 you’re designing and thinking about architecture
00:36:26 in a way that you’ll use more transistors.
00:36:29 Or conversely, not be swamped by the complexity
00:36:32 of all the transistors you get, right?
00:36:36 You have to have a strategy, you know?
00:36:39 So you’re open to the possibility and waiting
00:36:42 for the possibility of a whole new army
00:36:44 of transistors ready to work.
00:36:45 I’m expecting more transistors every two or three years
00:36:50 by a number large enough that how you think about design,
00:36:54 how you think about architecture has to change.
00:36:57 Like, imagine you build buildings out of bricks,
00:37:01 and every year the bricks are half the size,
00:37:04 or every two years.
00:37:05 Well, if you kept building bricks the same way,
00:37:08 so many bricks per person per day,
00:37:11 the amount of time to build a building
00:37:13 would go up exponentially, right?
00:37:16 But if you said, I know that’s coming,
00:37:19 so now I’m gonna design equipment that moves bricks faster,
00:37:22 uses them better, because maybe you’re getting something
00:37:24 out of the smaller bricks, more strength, thinner walls,
00:37:27 you know, less material, efficiency out of that.
00:37:30 So once you have a roadmap with what’s gonna happen,
00:37:33 transistors, we’re gonna get more of them,
00:37:36 then you design all this collateral around it
00:37:38 to take advantage of it, and also to cope with it.
00:37:42 Like, that’s the thing people don’t understand.
00:37:43 It’s like, if I didn’t believe in Moore’s Law,
00:37:46 and then Moore’s Law transistors showed up,
00:37:48 my design teams would all drown.
00:37:50 So what’s the hardest part of this inflow
00:37:56 of new transistors?
00:37:57 I mean, even if you just look historically,
00:37:59 throughout your career, what’s the thing,
00:38:03 what fundamentally changes when you add more transistors
00:38:06 in the task of designing an architecture?
00:38:10 Well, there’s two constants, right?
00:38:12 One is people don’t get smarter.
00:38:16 By the way, there’s some science showing
00:38:17 that we do get smarter because of nutrition or whatever.
00:38:21 Sorry to bring that up.
00:38:22 Flynn effect.
00:38:22 Yes.
00:38:23 Yeah, I’m familiar with it.
00:38:24 Nobody understands it, nobody knows if it’s still going on.
00:38:26 So that’s a…
00:38:27 Or whether it’s real or not.
00:38:28 But yeah, it’s a…
00:38:30 I sort of…
00:38:31 Anyway, but not exponentially.
00:38:32 I would believe for the most part,
00:38:33 people aren’t getting much smarter.
00:38:35 The evidence doesn’t support it, that’s right.
00:38:37 And then teams can’t grow that much.
00:38:40 Right.
00:38:40 Right, so human beings, you know,
00:38:43 we’re really good in teams of 10,
00:38:45 you know, up to teams of 100, they can know each other.
00:38:48 Beyond that, you have to have organizational boundaries.
00:38:50 So you’re kind of, you have,
00:38:51 those are pretty hard constraints, right?
00:38:54 So then you have to divide and conquer,
00:38:56 like as the designs get bigger,
00:38:57 you have to divide it into pieces.
00:39:00 You know, the power of abstraction layers is really high.
00:39:03 We used to build computers out of transistors.
00:39:06 Now we have a team that turns transistors into logic cells
00:39:08 and another team that turns them into functional units,
00:39:10 another one that turns them into computers, right?
00:39:13 So we have abstraction layers in there
00:39:16 and you have to think about when do you shift gears on that.
00:39:21 We also use faster computers to build faster computers.
00:39:24 So some algorithms run twice as fast on new computers,
00:39:27 but a lot of algorithms are N squared.
00:39:30 So, you know, a computer with twice as many transistors
00:39:33 might take four times as long to run.
00:39:36 So you have to refactor the software.
00:39:39 Like simply using faster computers
00:39:41 to build bigger computers doesn’t work.
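The N-squared point can be made concrete with a sketch; the transistor counts and the O(N^2) tool model are illustrative assumptions:

```python
# If design size doubles and the design tool is O(N^2), a 2x faster
# computer still takes 2x the wall-clock time: software must be refactored.
def tool_runtime(n_transistors, machine_speed):
    """Hypothetical O(N^2) design-tool runtime, in arbitrary units."""
    return n_transistors ** 2 / machine_speed

old = tool_runtime(1_000_000, machine_speed=1)
new = tool_runtime(2_000_000, machine_speed=2)  # 2x design on a 2x machine
slowdown = new / old                            # 2.0: still twice as long
```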
00:39:44 So you have to think about all these things.
00:39:46 So in terms of computing performance
00:39:47 and the exciting possibility
00:39:49 that more powerful computers bring,
00:39:51 is shrinking the thing which you’ve been talking about,
00:39:57 for you, one of the biggest exciting possibilities
00:39:59 of advancement in performance?
00:40:01 Or is there other directions that you’re interested in,
00:40:03 like in the direction of sort of enforcing given parallelism
00:40:08 or like doing massive parallelism
00:40:12 in terms of many, many CPUs,
00:40:15 you know, stacking CPUs on top of each other,
00:40:17 that kind of parallelism or any kind of parallelism?
00:40:20 Well, think about it a different way.
00:40:22 So old computers, you know, slow computers,
00:40:25 you said A equal B plus C times D, pretty simple, right?
00:40:30 And then we made faster computers with vector units
00:40:33 and you can do proper equations and matrices, right?
00:40:38 And then modern like AI computations
00:40:41 or like convolutional neural networks,
00:40:43 where you convolve one large data set against another.
00:40:47 And so there’s sort of this hierarchy of mathematics,
00:40:51 you know, from simple equation to linear equations,
00:40:54 to matrix equations, to deeper kind of computation.
00:40:58 And the data sets are getting so big
00:41:00 that people are thinking of data as a topology problem.
00:41:04 You know, data is organized in some immense shape.
00:41:07 And then the computation sort of wants to
00:41:11 get data from that immense shape and do some computation on it.
00:41:15 So what computers have allowed people to do
00:41:18 is have algorithms go much, much further.
00:41:22 So that paper you reference, the Sutton paper,
00:41:26 they talked about, you know, like when AI started,
00:41:29 it was apply rule sets to something.
00:41:31 That’s a very simple computational situation.
00:41:35 And then when they did first chess thing,
00:41:37 they solved deep searches.
00:41:39 So have a huge database of moves and results, deep search,
00:41:44 but it’s still just a search, right?
00:41:48 Now we take large numbers of images
00:41:51 and we use it to train these weight sets
00:41:54 that we convolve across.
00:41:56 It’s a completely different kind of phenomena.
00:41:58 We call that AI.
00:41:59 Now they’re doing the next generation.
00:42:02 And if you look at it,
00:42:03 they’re going up this mathematical graph, right?
00:42:07 And then computations, both computation and data sets
00:42:11 support going up that graph.
00:42:13 Yeah, the kind of computation that might,
00:42:15 I mean, I would argue that all of it is still a search,
00:42:18 right?
00:42:20 Just like you said, a topology problem with data sets,
00:42:22 you’re searching the data sets for valuable data
00:42:27 and also the actual optimization of neural networks
00:42:30 is a kind of search for the…
00:42:33 I don’t know, if you looked at the inner layers
00:42:34 of finding a cat, it’s not a search.
00:42:39 It’s a set of endless projections.
00:42:41 So, you know, a projection,
00:42:42 here’s a shadow of this phone, right?
00:42:45 And then you can have a shadow of that on the something
00:42:47 and a shadow on that of something.
00:42:49 And if you look in the layers, you’ll see
00:42:51 this layer actually describes pointy ears
00:42:53 and round eyeness and fuzziness.
00:42:56 But the computation to tease out the attributes
00:43:02 is not search.
00:43:03 Like the inference part might be search,
00:43:05 but the training’s not search.
00:43:07 And then in deep networks, they look at layers
00:43:10 and they don’t even know it’s represented.
00:43:14 And yet, if you take the layers out, it doesn’t work.
00:43:16 So I don’t think it’s search.
00:43:18 But you’d have to talk to a mathematician
00:43:21 about what that actually is.
00:43:22 Well, we could disagree, but it’s just semantics,
00:43:27 I think, it’s not, but it’s certainly not…
00:43:29 I would say it’s absolutely not semantics, but…
00:43:31 Okay, all right, well, if you want to go there.
00:43:37 So optimization to me is search,
00:43:39 and we’re trying to optimize the ability
00:43:42 of a neural network to detect cat ears.
00:43:45 And the difference between chess and the space,
00:43:51 the incredibly multidimensional,
00:43:54 100,000 dimensional space that neural networks
00:43:57 are trying to optimize over is nothing like
00:44:00 the chessboard database.
00:44:02 So it’s a totally different kind of thing.
00:44:04 And okay, in that sense, you can say it loses the meaning.
00:44:07 I can see how you might say, if you…
00:44:11 The funny thing is, it’s the difference
00:44:12 between given search space and found search space.
00:44:16 Right, exactly.
00:44:17 Yeah, maybe that’s a different way to describe it.
00:44:18 That’s a beautiful way to put it, okay.
00:44:19 But you’re saying, what’s your sense
00:44:21 in terms of the basic mathematical operations
00:44:24 and the architectures, computer hardware
00:44:27 that enables those operations?
00:44:29 Do you see the CPUs of today still being
00:44:33 a really core part of executing
00:44:36 those mathematical operations?
00:44:37 Yes.
00:44:38 Well, the operations continue to be add, subtract,
00:44:42 load, store, compare, and branch.
00:44:44 It’s remarkable.
00:44:46 So it’s interesting, the building blocks
00:44:48 of computers are transistors, and under that, atoms.
00:44:52 So you got atoms, transistors, logic gates, computers,
00:44:56 functional units of computers.
00:44:58 The building blocks of mathematics at some level
00:45:01 are things like adds and subtracts and multiplies,
00:45:04 but the space mathematics can describe
00:45:08 is, I think, essentially infinite.
00:45:11 But the computers that run the algorithms
00:45:14 are still doing the same things.
00:45:16 Now, a given algorithm might say, I need sparse data,
00:45:20 or I need 32 bit data, or I need, you know,
00:45:24 like a convolution operation that naturally takes
00:45:27 eight bit data, multiplies it, and sums it up a certain way.
00:45:31 So like the data types in TensorFlow
00:45:35 imply an optimization set.
00:45:38 But when you go right down and look at the computers,
00:45:40 it’s AND and OR gates doing adds and multiplies.
00:45:42 Like that hasn’t changed much.
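The "eight bit data, multiplies it, and sums it up" operation described above is just a multiply-accumulate; here is a minimal sketch, with the function name made up for illustration:

```python
def dot_8bit(weights, activations):
    """Multiply-accumulate over 8-bit operands: the inner loop of a
    convolution, still built from plain multiplies and adds."""
    acc = 0  # accumulator kept wider than 8 bits to avoid overflow
    for w, a in zip(weights, activations):
        assert 0 <= w < 256 and 0 <= a < 256  # 8-bit operands
        acc += w * a
    return acc

dot_8bit([1, 2, 3], [4, 5, 6])  # 4 + 10 + 18 = 32
```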
00:45:46 Now, the quantum researchers think
00:45:48 they’re going to change that radically,
00:45:50 and then there’s people who think about analog computing
00:45:52 because you look in the brain, and it
00:45:53 seems to be more analogish.
00:45:55 You know, that maybe there’s a way to do that more
00:45:58 efficiently.
00:45:59 But we have a million X on computation,
00:46:03 and I don’t know the relationship
00:46:07 between computational, let’s say,
00:46:09 intensity and ability to hit mathematical abstractions.
00:46:15 I don’t know any way to describe that, but just like you saw
00:46:19 in AI, you went from rule sets to simple search
00:46:23 to complex search to, say, found search.
00:46:26 Like those are orders of magnitude more computation
00:46:30 to do.
00:46:31 And as we get the next two orders of magnitude,
00:46:34 like a friend, Raja Koduri, said,
00:46:36 like every order of magnitude changes the computation.
00:46:40 Fundamentally changes what the computation is doing.
00:46:42 Yeah.
00:46:44 Oh, you know the expression the difference in quantity
00:46:46 is the difference in kind.
00:46:49 You know, the difference between ant and anthill, right?
00:46:53 Or neuron and brain.
00:46:56 You know, there’s this indefinable place
00:46:58 where the quantity changed the quality, right?
00:47:02 And we’ve seen that happen in mathematics multiple times,
00:47:05 and you know, my guess is it’s going to keep happening.
00:47:08 So your sense is, yeah, if you focus head down
00:47:12 and shrinking the transistor.
00:47:14 Well, it’s not just head down, we’re aware of the software
00:47:18 stacks that are running in the computational loads,
00:47:20 and we’re kind of pondering what do you
00:47:22 do with a petabyte of memory that wants
00:47:24 to be accessed in a sparse way and have, you know,
00:47:28 the kind of calculations AI programmers want.
00:47:32 So there’s a dialogue interaction,
00:47:34 but when you go in the computer chip,
00:47:38 you know, you find adders and subtractors and multipliers.
00:47:43 So if you zoom out then with, as you mentioned very sudden,
00:47:46 the idea that most of the development in the last many
00:47:50 decades in AI research came from just leveraging computation
00:47:54 and just simple algorithms waiting for the computation
00:47:59 to improve.
00:48:00 Well, software guys have a thing they call
00:48:03 the problem of early optimization.
00:48:07 So you write a big software stack,
00:48:09 and if you start optimizing like the first thing you write,
00:48:12 the odds of that being the performance limiter is low.
00:48:15 But when you get the whole thing working,
00:48:17 can you make it 2x faster by optimizing the right things?
00:48:19 Sure.
00:48:21 While you’re optimizing that, could you
00:48:22 have written a new software stack, which
00:48:24 would have been a better choice?
00:48:26 Maybe.
00:48:27 Now you have creative tension.
00:48:29 So.
00:48:30 But the whole time as you’re doing the writing,
00:48:33 that’s the software we’re talking about.
00:48:34 The hardware underneath gets faster and faster.
00:48:36 Well, this goes back to the Moore’s law.
00:48:38 If Moore’s law is going to continue, then your AI research
00:48:43 should expect that to show up, and then you
00:48:46 make a slightly different set of choices then.
00:48:48 We’ve hit the wall.
00:48:49 Nothing’s going to happen.
00:48:51 And from here, it’s just us rewriting algorithms.
00:48:55 That seems like a failed strategy, given the last 30
00:48:57 years of predictions of Moore’s Law’s death.
00:49:00 So can you just linger on it?
00:49:03 I think you’ve answered it, but I’ll just
00:49:05 ask the same dumb question over and over.
00:49:06 So why do you think Moore’s law is not going to die?
00:49:12 Which is the most promising, exciting possibility
00:49:15 of why it won’t die in the next 5, 10 years?
00:49:17 So is it the continued shrinking of the transistor,
00:49:20 or is it another S curve that steps in and it totally sort
00:49:25 of matches up?
00:49:26 Shrinking the transistor is literally
00:49:28 thousands of innovations.
00:49:30 Right, so there’s stacks of S curves in there.
00:49:33 There’s a whole bunch of S curves just kind
00:49:35 of running their course and being reinvented
00:49:38 and new things.
00:49:41 The semiconductor fabricators and technologists have all
00:49:45 announced what’s called nanowires.
00:49:47 So they took a fin, which had a gate around it,
00:49:51 and turned that into little wires
00:49:52 so you have better control of that, and they’re smaller.
00:49:55 And then from there, there are some obvious steps
00:49:57 about how to shrink that.
00:49:59 The metallurgy around wire stacks and stuff
00:50:03 has very obvious abilities to shrink.
00:50:07 And there’s a whole combination of things there to do.
00:50:11 Your sense is that we’re going to get a lot
00:50:13 of performance from this innovation, just from shrinking.
00:50:16 Yeah, like a factor of 100 is a lot.
00:50:19 Yeah, I would say that’s incredible.
00:50:22 And it’s totally unknown.
00:50:23 It’s only 10 or 15 years.
00:50:25 Now, you’re smarter, you might know,
00:50:26 but to me it’s totally unpredictable
00:50:28 of what that 100x would bring in terms
00:50:30 of the nature of the computation that people would be doing.
00:50:34 Yeah, are you familiar with Bell’s law?
00:50:37 So for a long time, it was mainframes, minis, workstation,
00:50:40 PC, mobile.
00:50:42 Moore’s law drove faster, smaller computers.
00:50:46 And then when we were thinking about Moore’s law,
00:50:49 Raja Koduri said, every 10x generates a new computation.
00:50:53 So scalar, vector, matrix, topological computation.
00:51:01 And if you go look at the industry trends,
00:51:03 there was mainframes, and then minicomputers, and then PCs,
00:51:07 and then the internet took off.
00:51:08 And then we got mobile devices.
00:51:10 And now we’re building 5G wireless
00:51:12 with one millisecond latency.
00:51:14 And people are starting to think about the smart world
00:51:17 where everything knows you, recognizes you.
00:51:23 The transformations are going to be unpredictable.
00:51:27 How does it make you feel that you’re
00:51:29 one of the key architects of this kind of future?
00:51:35 So we’re not talking about the architects
00:51:37 of the high level people who build the Angry Bird apps,
00:51:42 and Snapchat.
00:51:43 Angry Bird apps.
00:51:44 Who knows?
00:51:45 Maybe that’s the whole point of the universe.
00:51:47 I’m going to take a stand on that,
00:51:48 and on the attention distracting nature of mobile phones.
00:51:52 I’ll take a stand.
00:51:53 But anyway, in terms of the side effects of smartphones,
00:52:01 or the attention distraction, which part?
00:52:03 Well, who knows where this is all leading?
00:52:06 It’s changing so fast.
00:52:08 My parents used to yell at my sisters
00:52:09 for hiding in the closet with a wired phone with a dial on it.
00:52:13 Stop talking to your friends all day.
00:52:15 Now my wife yells at my kids for talking to their friends
00:52:18 all day on text.
00:52:20 It looks the same to me.
00:52:21 It’s always echoes of the same thing.
00:52:23 But you are one of the key people
00:52:26 architecting the hardware of this future.
00:52:29 How does that make you feel?
00:52:30 Do you feel responsible?
00:52:33 Do you feel excited?
00:52:36 So we’re in a social context.
00:52:38 So there’s billions of people on this planet.
00:52:40 There are literally millions of people working on technology.
00:52:45 I feel lucky to be doing what I do and getting paid for it,
00:52:50 and there’s an interest in it.
00:52:52 But there’s so many things going on in parallel.
00:52:56 The actions are so unpredictable.
00:52:58 If I wasn’t here, somebody else would do it.
00:53:01 The vectors of all these different things
00:53:03 are happening all the time.
00:53:06 You know, there’s a, I’m sure, some philosopher
00:53:10 or metaphilosopher is wondering about how
00:53:12 we transform our world.
00:53:16 So you can’t deny the fact that these tools are
00:53:22 changing our world.
00:53:24 That’s right.
00:53:25 Do you think it’s changing for the better?
00:53:29 I read this thing recently.
00:53:31 It said the two disciplines with the highest GRE scores in college
00:53:36 are physics and philosophy.
00:53:39 And they’re both sort of trying to answer the question,
00:53:41 why is there anything?
00:53:43 And the philosophers are on the kind of theological side,
00:53:47 and the physicists are obviously on the material side.
00:53:52 And there’s 100 billion galaxies with 100 billion stars.
00:53:56 It seems, well, repetitive at best.
00:54:01 So you know, there’s on our way to 10 billion people.
00:54:06 I mean, it’s hard to say what it’s all for,
00:54:08 if that’s what you’re asking.
00:54:09 Yeah, I guess I am.
00:54:11 Things do tend to significantly increase in complexity.
00:54:16 And I’m curious about how computation,
00:54:21 like our physical world inherently
00:54:24 generates mathematics.
00:54:25 It’s kind of obvious, right?
00:54:26 So we have x, y, z coordinates.
00:54:28 You take a sphere, you make it bigger.
00:54:30 You get a surface that grows by r squared.
00:54:34 Like, it generally generates mathematics.
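The sphere example above is just the surface-area formula: doubling the radius quadruples the surface, which is the r-squared growth being described:

```python
import math

def sphere_surface_area(r):
    """Surface area of a sphere: A = 4 * pi * r^2."""
    return 4 * math.pi * r ** 2

ratio = sphere_surface_area(2.0) / sphere_surface_area(1.0)  # r doubles -> 4x area
```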
00:54:36 And the mathematicians and the physicists
00:54:38 have been having a lot of fun talking to each other for years.
00:54:41 And computation has been, let’s say, relatively pedestrian.
00:54:46 Like, computation in terms of mathematics
00:54:48 has been doing binary algebra, while those guys have
00:54:52 been gallivanting through the other realms of possibility.
00:54:58 Now recently, the computation lets
00:55:01 you do mathematical computations that
00:55:04 are sophisticated enough that nobody understands
00:55:07 how the answers came out.
00:55:10 Machine learning.
00:55:10 Machine learning.
00:55:12 It used to be you’d get a data set, you’d guess at a function.
00:55:16 The function is considered physics
00:55:18 if it’s predictive of new data sets.
00:55:23 Modern, you can take a large data set
00:55:28 with no intuition about what it is
00:55:29 and use machine learning to find a pattern that
00:55:31 has no function, right?
00:55:34 And it can arrive at results that I
00:55:37 don’t know if they’re completely mathematically describable.
00:55:39 So computation has kind of done something interesting compared
00:55:44 to a equal b plus c.
00:55:47 There’s something reminiscent of that step
00:55:49 from the basic operations of addition
00:55:54 to taking a step towards neural networks that’s
00:55:56 reminiscent of what life on Earth at its origins was doing.
00:56:01 Do you think we’re creating sort of the next step
00:56:03 in our evolution in creating artificial intelligence
00:56:06 systems that will?
00:56:07 I don’t know.
00:56:08 I mean, there’s so much in the universe already,
00:56:11 it’s hard to say.
00:56:12 Where we stand in this whole thing.
00:56:14 Are human beings working on additional abstraction
00:56:17 layers and possibilities?
00:56:18 Yeah, it appears so.
00:56:20 Does that mean that human beings don’t need dogs?
00:56:22 You know, no.
00:56:24 Like, there’s so many things that
00:56:26 are all simultaneously interesting and useful.
00:56:30 Well, you’ve seen, throughout your career,
00:56:32 you’ve seen greater and greater level abstractions built
00:56:35 in artificial machines, right?
00:56:39 Do you think, when you look at humans,
00:56:41 do you think that the look of all life on Earth
00:56:44 is a single organism building this thing,
00:56:46 this machine with greater and greater levels of abstraction?
00:56:49 Do you think humans are the peak,
00:56:52 the top of the food chain in this long arc of history
00:56:57 on Earth?
00:56:58 Or do you think we’re just somewhere in the middle?
00:57:00 Are we the basic functional operations of a CPU?
00:57:05 Are we the C++ program, the Python program,
00:57:09 or the neural network?
00:57:10 Like, somebody’s, you know, people
00:57:12 have calculated, like, how many operations does the brain do?
00:57:14 Something, you know, I’ve seen the number 10 to the 18th
00:57:17 a bunch of times, arrived at different ways.
00:57:20 So could you make a computer that
00:57:22 did 10 to the 20th operations?
00:57:23 Yes.
00:57:24 Sure.
00:57:24 Do you think?
00:57:25 We’re going to do that.
00:57:27 Now, is there something magical about how brains compute things?
00:57:31 I don’t know.
00:57:32 You know, my personal experience is interesting,
00:57:35 because, you know, you think you know how you think,
00:57:37 and then you have all these ideas,
00:57:39 and you can’t figure out how they happened.
00:57:41 And if you meditate, you know, what you can be aware of
00:57:47 is interesting.
00:57:48 So I don’t know if brains are magical or not.
00:57:51 You know, the physical evidence says no.
00:57:54 Lots of people’s personal experience says yes.
00:57:57 So what would be funny is if brains are magical,
00:58:01 and yet we can make brains with more computation.
00:58:04 You know, I don’t know what to say about that.
00:58:07 But do you think magic is an emergent phenomena?
00:58:11 Could be.
00:58:12 I have no explanation for it.
00:58:13 Let me ask Jim Keller, what in your view is consciousness?
00:58:19 With consciousness?
00:58:20 Yeah, like what, you know, consciousness, love,
00:58:25 things that are these deeply human things that
00:58:27 seems to emerge from our brain, is that something
00:58:30 that we’ll be able to encode in chips that get
00:58:36 faster and faster and faster and faster?
00:58:38 That’s like a 10 hour conversation.
00:58:40 Nobody really knows.
00:58:41 Can you summarize it in a couple of sentences?
00:58:45 Many people have observed that organisms run
00:58:48 at lots of different levels, right?
00:58:51 If you had two neurons, somebody said
00:58:52 you’d have one sensory neuron and one motor neuron, right?
00:58:56 So we move towards things and away from things.
00:58:58 And we have physical integrity and safety or not, right?
00:59:03 And then if you look at the animal kingdom,
00:59:05 you can see brains that are a little more complicated.
00:59:08 And at some point, there’s a planning system.
00:59:10 And then there’s an emotional system
00:59:11 that’s happy about being safe or unhappy about being threatened.
00:59:17 And then our brains have massive numbers of structures,
00:59:21 like planning and movement and thinking and feeling
00:59:25 and drives and emotions.
00:59:27 And we seem to have multiple layers of thinking systems.
00:59:31 And we have a dream system that nobody understands whatsoever,
00:59:35 which I find completely hilarious.
00:59:37 And you can think in a way that those systems are
00:59:44 more independent.
00:59:45 And the different parts of yourself
00:59:47 can observe them.
00:59:49 I don’t know which one’s magical.
00:59:51 I don’t know which one’s not computational.
00:59:55 So.
00:59:56 Is it possible that it’s all computation?
00:59:58 Probably.
01:00:00 Is there a limit to computation?
01:00:01 I don’t think so.
01:00:03 Do you think the universe is a computer?
01:00:06 It seems to be.
01:00:07 It’s a weird kind of computer.
01:00:09 Because if it was a computer, like when
01:00:13 they do calculations on how much calculation
01:00:16 it takes to describe quantum effects, it’s unbelievably high.
01:00:20 So if it was a computer, wouldn’t you
01:00:22 have built it out of something that was easier to compute?
01:00:26 That’s a funny system.
01:00:29 But then the simulation guys pointed out
01:00:31 that the rules are kind of interesting.
01:00:32 When you look really close, it’s uncertain.
01:00:35 And the speed of light says you can only look so far.
01:00:37 And things can’t be simultaneous,
01:00:39 except for the odd entanglement problem where they seem to be.
01:00:42 The rules are all kind of weird.
01:00:45 And somebody said physics is like having
01:00:47 50 equations with 50 variables to define 50 variables.
01:00:55 Physics itself has been a shit show for thousands of years.
01:00:59 It seems odd when you get to the corners of everything.
01:01:02 It’s either uncomputable or undefinable or uncertain.
01:01:07 It’s almost like the designers of the simulation
01:01:09 are trying to prevent us from understanding it perfectly.
01:01:12 But also, the things that require calculations
01:01:16 require so much calculation that our idea
01:01:18 of the universe of a computer is absurd,
01:01:20 because every single little bit of it
01:01:23 takes all the computation in the universe to figure out.
01:01:26 So that’s a weird kind of computer.
01:01:28 You say the simulation is running
01:01:29 in a computer, which has, by definition, infinite computation.
01:01:34 Not infinite.
01:01:35 Oh, you mean if the universe is infinite?
01:01:37 Yeah.
01:01:38 Well, every little piece of our universe
01:01:40 seems to take infinite computation to figure out.
01:01:43 Not infinite, just a lot.
01:01:44 Well, a lot.
01:01:44 Some pretty big number.
01:01:46 To compute this little teeny spot takes all the mass
01:01:50 in the local one light year by one light year space.
01:01:53 It’s close enough to infinite.
01:01:54 Well, it’s a heck of a computer if it is one.
01:01:56 I know.
01:01:57 It’s a weird description, because the simulation
01:02:01 description seems to break when you look closely at it.
01:02:04 But the rules of the universe seem to imply something’s up.
01:02:08 That seems a little arbitrary.
01:02:10 The universe, the whole thing, the laws of physics,
01:02:14 it just seems like, how did it come out to be the way it is?
01:02:20 Well, lots of people talk about that.
01:02:22 Like I said, the two smartest groups of humans
01:02:24 are working on the same problem.
01:02:26 From different aspects.
01:02:27 And they’re both complete failures.
01:02:29 So that’s kind of cool.
01:02:32 They might succeed eventually.
01:02:34 Well, after 2,000 years, the trend isn’t good.
01:02:37 Oh, 2,000 years is nothing in the span
01:02:39 of the history of the universe.
01:02:40 That’s for sure.
01:02:41 We have some time.
01:02:42 But the next 1,000 years doesn’t look good either.
01:02:46 That’s what everybody says at every stage.
01:02:48 But with Moore’s law, as you’ve just described,
01:02:50 not being dead, the exponential growth of technology,
01:02:54 the future seems pretty incredible.
01:02:57 Well, it’ll be interesting, that’s for sure.
01:02:59 That’s right.
01:03:00 So what are your thoughts on Ray Kurzweil’s sense
01:03:03 that exponential improvement in technology
01:03:05 will continue indefinitely?
01:03:07 Is that how you see Moore’s law?
01:03:09 Do you see Moore’s law more broadly,
01:03:12 in the sense that technology of all kinds
01:03:15 has a way of stacking S curves on top of each other,
01:03:20 where it’ll be exponential, and then we’ll see all kinds of…
01:03:24 What does an exponential of a million mean?
01:03:27 That’s a pretty amazing number.
01:03:29 And that’s just for a local little piece of silicon.
01:03:32 Now let’s imagine you, say, decided
01:03:35 to get 1,000 tons of silicon to collaborate in one computer
01:03:41 at a million times the density.
01:03:44 Now you’re talking, I don’t know, 10 to the 20th more
01:03:47 computation power than our current, already unbelievably
01:03:51 fast computers.
01:03:54 Nobody knows what that’s going to mean.
01:03:55 The sci-fi guys call it computronium,
01:03:58 like when a local civilization turns the nearby star
01:04:02 into a computer.
01:04:05 I don’t know if that’s true, but…
01:04:06 So just even when you shrink a transistor, the…
01:04:11 That’s only one dimension.
01:04:12 The ripple effects of that.
01:04:14 People tend to think about computers as a cost problem.
01:04:17 So computers are made out of silicon and minor amounts
01:04:20 of metals and this and that.
01:04:24 None of those things cost any money.
01:04:27 There’s plenty of sand.
01:04:30 You could just turn the beach and a little bit of ocean water
01:04:32 into computers.
01:04:33 So all the cost is in the equipment to do it.
01:04:36 And the trend on equipment is once you
01:04:39 figure out how to build the equipment,
01:04:40 the trend of cost is zero.
01:04:41 Elon said, first you figure out what
01:04:44 configuration you want the atoms in,
01:04:47 and then how to put them there.
01:04:50 His great insight is people are “how” constrained.
01:04:56 I have this thing, I know how it works,
01:04:58 and then little tweaks to that will generate something,
01:05:02 as opposed to what do I actually want,
01:05:05 and then figure out how to build it.
01:05:07 It’s a very different mindset.
01:05:09 And almost nobody has it, obviously.
01:05:12 Well, let me ask on that topic,
01:05:15 you were one of the key early people
01:05:18 in the development of autopilot, at least in the hardware
01:05:21 side, Elon Musk believes that autopilot
01:05:24 and vehicle autonomy, if you just look at that problem,
01:05:26 can follow this kind of exponential improvement.
01:05:29 In terms of the how question that we’re talking about,
01:05:32 there’s no reason why you can’t.
01:05:34 What are your thoughts on this particular space
01:05:37 of vehicle autonomy, and your part of it
01:05:42 and Elon Musk’s and Tesla’s vision for vehicle autonomy?
01:05:45 Well, the computer you need to build is straightforward.
01:05:48 And you could argue, well, does it need to be
01:05:51 two times faster or five times or 10 times?
01:05:54 But that’s just a matter of time or price in the short run.
01:05:58 So that’s not a big deal.
01:06:00 You don’t have to be especially smart to drive a car.
01:06:03 So it’s not like a super hard problem.
01:06:05 I mean, the big problem with safety is attention,
01:06:07 which computers are really good at, not skills.
01:06:11 Well, let me push back on one.
01:06:15 You see, everything you said is correct,
01:06:17 but we as humans tend to take for granted
01:06:24 how incredible our vision system is.
01:06:26 So you can drive a car with 20/50 vision,
01:06:30 and you can train a neural network to extract
01:06:33 the distance of any object in the shape of any surface
01:06:36 from a video and data.
01:06:38 Yeah, but that’s really simple.
01:06:40 No, it’s not simple.
01:06:42 That’s a simple data problem.
01:06:44 It’s not, it’s not simple.
01:06:46 It’s because it’s not just detecting objects,
01:06:50 it’s understanding the scene,
01:06:52 and it’s being able to do it in a way
01:06:54 that doesn’t make errors.
01:06:56 So the beautiful thing about the human vision system
01:07:00 and our entire brain around the whole thing
01:07:02 is we’re able to fill in the gaps.
01:07:05 It’s not just about perfectly detecting cars.
01:07:08 It’s inferring the occluded cars.
01:07:09 It’s trying to, it’s understanding the physics.
01:07:12 I think that’s mostly a data problem.
01:07:14 So you think it’s a data problem,
01:07:17 with improvement of computation,
01:07:19 with improvement in collection of data?
01:07:20 Well, there is a, you know, when you’re driving a car
01:07:22 and somebody cuts you off, your brain has theories
01:07:24 about why they did it.
01:07:26 You know, they’re a bad person, they’re distracted,
01:07:28 they’re dumb, you know, you can listen to yourself, right?
01:07:32 So, you know, if you think that narrative is important
01:07:37 to be able to successfully drive a car,
01:07:38 then current autopilot systems can’t do it.
01:07:41 But if cars are ballistic things with tracks
01:07:44 and probabilistic changes of speed and direction,
01:07:47 and roads are fixed and given, by the way,
01:07:50 they don’t change dynamically, right?
01:07:53 You can map the world really thoroughly.
01:07:56 You can place every object really thoroughly.
01:08:01 Right, you can calculate trajectories
01:08:03 of things really thoroughly, right?
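The “ballistic” framing described here, cars as objects with positions, velocities, and probabilistic trajectory changes on a fixed, pre-mapped road, can be sketched as a toy calculation. This is purely illustrative and not any real autopilot code; the function name and units are assumptions for the example:

```python
# Toy sketch of treating cars as ballistic objects: extrapolate a trajectory
# from position and velocity on a fixed, given road map. Purely illustrative.

def predict_positions(pos, vel, dt, steps):
    """Constant-velocity extrapolation; pos and vel are (x, y) in meters and m/s."""
    x, y = pos
    vx, vy = vel
    path = []
    for _ in range(steps):
        x += vx * dt
        y += vy * dt
        path.append((x, y))
    return path

# A car doing 10 m/s straight down the road, predicted half a second ahead.
path = predict_positions(pos=(0.0, 0.0), vel=(10.0, 0.0), dt=0.1, steps=5)
print(path[-1])  # → (5.0, 0.0)
```

A planner layered on top would compare extrapolated paths like this against the mapped lane geometry; what the conversation contests is everything such a model leaves out, namely the theories drivers form about each other.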
01:08:06 But everything you said about really thoroughly
01:08:09 has a different degree of difficulty, so.
01:08:13 And you could say at some point,
01:08:15 computer autonomous systems will be way better
01:08:17 at things that humans are lousy at.
01:08:20 Like, they’ll be better at attention,
01:08:22 they’ll always remember there was a pothole in the road
01:08:25 that humans keep forgetting about,
01:08:27 they’ll remember that this set of roads
01:08:29 has these weirdo lines on it
01:08:31 that the computers figured out once,
01:08:32 and especially if they get updates,
01:08:35 so if somebody changes a given,
01:08:38 like, the key to robots and stuff somebody said
01:08:41 is to maximize the givens, right?
01:08:44 Right.
01:08:45 So having a robot pick up this bottle cap
01:08:47 is way easier if you put a red dot on the top,
01:08:51 because then you won’t have to figure it out,
01:08:52 and if you wanna do a certain thing with it,
01:08:54 maximize the givens is the thing.
01:08:57 And autonomous systems are happily maximizing the givens.
01:09:01 Like, humans, when you drive someplace new,
01:09:04 you remember it, because you’re processing it
01:09:06 the whole time, and after the 50th time you drove to work,
01:09:08 you get to work, you don’t know how you got there, right?
01:09:11 You’re on autopilot, right?
01:09:14 Autonomous cars are always on autopilot.
01:09:17 But the cars have no theories about why they got cut off,
01:09:20 or why they’re in traffic.
01:09:22 So they also never stop paying attention.
01:09:24 Right, so I tend to believe you do have to have theories,
01:09:28 meta models of other people,
01:09:30 especially with pedestrian cyclists,
01:09:31 but also with other cars.
01:09:32 So everything you said is actually essential to driving.
01:09:38 Driving is a lot more complicated than people realize,
01:09:41 I think, so to push back slightly, but to…
01:09:44 So to cut into traffic, right?
01:09:46 Yep.
01:09:47 You can’t just wait for a gap,
01:09:48 you have to be somewhat aggressive.
01:09:50 You’ll be surprised how simple a calculation for that is.
01:09:53 I may be on that particular point,
01:09:55 but there’s, maybe I actually have to push back.
01:10:00 I would be surprised.
01:10:01 You know what, yeah, I’ll just say where I stand.
01:10:03 I would be very surprised,
01:10:04 but I think you might be surprised how complicated it is.
01:10:10 I tell people, progress disappoints in the short run,
01:10:12 and surprises in the long run.
01:10:13 It’s very possible, yeah.
01:10:15 I suspect in 10 years it’ll be just taken for granted.
01:10:19 Yeah, probably.
01:10:19 But you’re probably right, not look like…
01:10:22 It’s gonna be a $50 solution that nobody cares about.
01:10:25 It’s like GPSes, like, wow, GPSes.
01:10:27 We have satellites in space
01:10:29 that tell you where your location is.
01:10:31 It was a really big deal, now everything has a GPS in it.
01:10:33 Yeah, that’s true, but I do think that systems
01:10:36 that involve human behavior are more complicated
01:10:39 than we give them credit for.
01:10:40 So we can do incredible things with technology
01:10:43 that don’t involve humans, but when you…
01:10:45 I think humans are less complicated than people,
01:10:48 you know, frequently ascribe.
01:10:50 Maybe I feel…
01:10:51 We tend to operate out of large numbers of patterns
01:10:53 and just keep doing it over and over.
01:10:55 But I can’t trust you because you’re a human.
01:10:58 That’s something a human would say.
01:11:00 But my hope is on the point you’ve made is,
01:11:04 even if, no matter who’s right,
01:11:08 I’m hoping that there’s a lot of things
01:11:10 that humans aren’t good at
01:11:11 that machines are definitely good at,
01:11:13 like you said, attention and things like that.
01:11:15 Well, they’ll be so much better
01:11:17 that the overall picture of safety and autonomy
01:11:21 will be, obviously cars will be safer,
01:11:22 even if they’re not as good at understanding.
01:11:24 I’m a big believer in safety.
01:11:26 I mean, there are already the current safety systems,
01:11:29 like cruise control that doesn’t let you run into people
01:11:32 and lane keeping.
01:11:33 There are so many features
01:11:34 that you just look at the parade of accidents
01:11:37 and knocking off like 80% of them is super doable.
01:11:42 Just to linger on the autopilot team
01:11:44 and the efforts there,
01:11:48 it seems to be that there’s a very intense scrutiny
01:11:51 by the media and the public in terms of safety,
01:11:54 the pressure, the bar put before autonomous vehicles.
01:11:58 What are your, sort of as a person there
01:12:01 working on the hardware and trying to build a system
01:12:03 that builds a safe vehicle and so on,
01:12:07 what was your sense about that pressure?
01:12:08 Is it unfair?
01:12:09 Is it expected of new technology?
01:12:12 Yeah, it seems reasonable.
01:12:13 I was interested, I talked to both American
01:12:15 and European regulators,
01:12:17 and I was worried that the regulations
01:12:21 would write into the rules technology solutions,
01:12:25 like modern brake systems imply hydraulic brakes.
01:12:30 So if you read the regulations,
01:12:32 to meet the letter of the law for brakes,
01:12:35 it sort of has to be hydraulic, right?
01:12:37 And the regulator said they’re interested in the use cases,
01:12:42 like a head on crash, an offset crash,
01:12:44 don’t hit pedestrians, don’t run into people,
01:12:47 don’t leave the road, don’t run a red light or a stoplight.
01:12:50 They were very much into the scenarios.
01:12:53 And they had all the data about which scenarios
01:12:56 injured or killed the most people.
01:12:59 And for the most part, those conversations were like,
01:13:04 what’s the right thing to do to take the next step?
01:13:08 Now, Elon’s very interested also in the benefits
01:13:12 of autonomous driving or freeing people’s time
01:13:14 and attention, as well as safety.
01:13:18 And I think that’s also an interesting thing,
01:13:20 but building autonomous systems so they’re safe
01:13:25 and safer than people,
01:13:27 since the goal is to be 10X safer than people,
01:13:30 having the bar to be safer than people
01:13:32 and scrutinizing accidents seems philosophically correct.
01:13:39 So I think that’s a good thing.
01:13:41 What is different, compared to the things you worked on at
01:13:46 Intel, AMD, Apple? With autopilot chip design
01:13:51 and hardware design, what are interesting
01:13:54 or challenging aspects of building this specialized
01:13:56 kind of computing system in the automotive space?
01:14:00 I mean, there’s two tricks to building
01:14:01 like an automotive computer.
01:14:02 One is the software team, the machine learning team
01:14:07 is developing algorithms that are changing fast.
01:14:10 So as you’re building the accelerator,
01:14:14 you have this, you know, worry or intuition
01:14:16 that the algorithms will change enough
01:14:18 that the accelerator will be the wrong one, right?
01:14:22 And there’s the generic thing, which is,
01:14:25 if you build a really good general purpose computer,
01:14:27 say its performance is one, and then GPU guys
01:14:31 will deliver about 5X the performance
01:14:34 for the same amount of silicon,
01:14:35 because instead of discovering parallelism,
01:14:37 you’re given parallelism.
01:14:39 And then special accelerators get another two to 5X
01:14:43 on top of a GPU, because you say,
01:14:46 I know the math is always eight bit integers
01:14:49 into 32 bit accumulators, and the operations
01:14:52 are the subset of mathematical possibilities.
01:14:55 So AI accelerators have a claimed performance benefit
01:15:00 over GPUs because in the narrow math space,
01:15:05 you’re nailing the algorithm.
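The narrowed math described here, eight-bit integer multiplies accumulated into 32-bit registers, can be sketched in a few lines. This is a purely illustrative model of the arithmetic contract an accelerator hardwires, not any real chip’s datapath:

```python
# Illustrative model of the fixed-function math an AI accelerator hardwires:
# a dot product of 8-bit integers accumulated into a 32-bit register.
# Not any real chip's design; just the arithmetic contract.

INT8_MIN, INT8_MAX = -128, 127
INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1

def int8_dot(a, b):
    """Multiply-accumulate two int8 vectors into an int32 accumulator."""
    assert all(INT8_MIN <= v <= INT8_MAX for v in list(a) + list(b))
    acc = 0  # the 32-bit accumulator
    for x, y in zip(a, b):
        acc += x * y
        # Real hardware would saturate or wrap; here we just check the range.
        assert INT32_MIN <= acc <= INT32_MAX
    return acc

# Worst-case product is 127 * 127, about 16k, so roughly 130,000 terms fit
# in an int32 accumulator before overflow, which is why 32 bits suffices.
print(int8_dot([127, -128, 5], [127, 127, 2]))  # → -117
```

Because the operand widths and the operation set are fixed in advance, the hardware spends nothing discovering parallelism in this inner loop, which is where the claimed two to 5X over a GPU comes from.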
01:15:07 Now, you still try to make it programmable,
01:15:10 but the AI field is changing really fast.
01:15:13 So there’s a, you know, there’s a little
01:15:15 creative tension there of, I want the acceleration
01:15:18 afforded by specialization without being over specialized
01:15:22 so that the new algorithm is so much more effective
01:15:25 that you’d have been better off on a GPU.
01:15:27 So there’s a tension there.
01:15:30 To build a good computer for an application
01:15:33 like automotive, there’s all kinds of sensor inputs
01:15:36 and safety processors and a bunch of stuff.
01:15:39 So one of Elon’s goals is to make it super affordable.
01:15:42 So every car gets an autopilot computer.
01:15:44 So some of the recent startups you look at,
01:15:46 and they have a server in the trunk,
01:15:48 because they’re saying, I’m gonna build
01:15:49 this autopilot computer, replaces the driver.
01:15:52 So their cost budget’s 10 or $20,000.
01:15:55 And Elon’s constraint was, I’m gonna put one in every car,
01:15:58 whether people buy autonomous driving or not.
01:16:01 So the cost constraint he had in mind was great, right?
01:16:05 And to hit that, you had to think about the system design.
01:16:08 That’s complicated, and it’s fun.
01:16:09 You know, it’s like craftsman’s work.
01:16:12 Like, you know, a violin maker, right?
01:16:14 You can say, Stradivarius is this incredible thing,
01:16:16 the musicians are incredible.
01:16:18 But the guy making the violin, you know,
01:16:20 picked wood and sanded it, and then he cut it,
01:16:24 you know, and he glued it, you know,
01:16:25 and he waited for the right day
01:16:27 so that when he put the finish on it,
01:16:29 it didn’t, you know, do something dumb.
01:16:31 That’s craftsman’s work, right?
01:16:33 You may be a genius craftsman
01:16:35 because you have the best techniques
01:16:36 and you discover a new one,
01:16:38 but most engineers, craftsman’s work.
01:16:41 And humans really like to do that.
01:16:44 You know the expression?
01:16:45 Smart humans.
01:16:45 No, everybody.
01:16:46 All humans.
01:16:47 I don’t know.
01:16:48 I used to, I dug ditches when I was in college.
01:16:50 I got really good at it.
01:16:51 Satisfying.
01:16:52 Yeah.
01:16:53 So.
01:16:54 Digging ditches is also craftsman’s work.
01:16:55 Yeah, of course.
01:16:56 So there’s an expression called complex mastery behavior.
01:17:00 So when you’re learning something,
01:17:02 that’s fine, because you’re learning something.
01:17:04 When you do something, it’s relatively simple.
01:17:05 It’s not that satisfying.
01:17:06 But if the steps that you have to do are complicated
01:17:10 and you’re good at them, it’s satisfying to do them.
01:17:14 And then if you’re intrigued by it all,
01:17:16 as you’re doing them, you sometimes learn new things
01:17:19 that you can raise your game.
01:17:21 But craftsman’s work is good.
01:17:23 And engineers, like engineering is complicated enough
01:17:27 that you have to learn a lot of skills.
01:17:28 And then a lot of what you do is then craftsman’s work,
01:17:32 which is fun.
01:17:33 Autonomous driving, building a very resource
01:17:37 constrained computer.
01:17:37 So a computer has to be cheap enough
01:17:39 to put in every single car.
01:17:41 That essentially boils down to craftsman’s work.
01:17:45 It’s engineering, it’s innovation.
01:17:45 Yeah, you know, there’s thoughtful decisions
01:17:47 and problems to solve and trade offs to make.
01:17:50 Do you need 10 camera ports or eight?
01:17:52 You know, you’re building for the current car
01:17:54 or the next one.
01:17:56 You know, how do you do the safety stuff?
01:17:57 You know, there’s a whole bunch of details.
01:18:00 But it’s fun.
01:18:01 It’s not like I’m building a new type of neural network,
01:18:04 which has a new mathematics and a new computer to work.
01:18:08 You know, that’s like, there’s more invention than that.
01:18:12 But the reduction to practice,
01:18:14 once you pick the architecture, you look inside
01:18:16 and what do you see?
01:18:17 Adders and multipliers and memories and, you know,
01:18:20 the basics.
01:18:21 So computers are always this weird set of abstraction layers
01:18:25 of ideas and thinking that reduction to practice
01:18:29 is transistors and wires and, you know, pretty basic stuff.
01:18:33 And that’s an interesting phenomenon.
01:18:37 By the way, like factory work,
01:18:38 like lots of people think factory work
01:18:40 is rote assembly stuff.
01:18:42 I’ve been on the assembly line.
01:18:44 Like the people who work there really like it.
01:18:46 It’s a really great job.
01:18:47 It’s really complicated.
01:18:48 Putting cars together is hard, right?
01:18:50 And the car is moving and the parts are moving
01:18:53 and sometimes the parts are damaged
01:18:55 and you have to coordinate putting all the stuff together
01:18:57 and people are good at it.
01:18:59 They’re good at it.
01:19:00 And I remember one day I went to work
01:19:01 and the line was shut down for some reason
01:19:03 and some of the guys sitting around were really bummed
01:19:06 because they had reorganized a bunch of stuff
01:19:09 and they were gonna hit a new record
01:19:10 for the number of cars built that day.
01:19:12 And they were all gung ho to do it.
01:19:14 And these were big, tough buggers.
01:19:15 And, you know, but what they did was complicated
01:19:19 and you couldn’t do it.
01:19:20 Yeah, and I mean.
01:19:21 Well, after a while you could,
01:19:22 but you’d have to work your way up
01:19:24 because, you know, like putting the bright,
01:19:27 what’s called the brights, the trim on a car
01:19:30 on a moving assembly line
01:19:32 where it has to be attached 25 places
01:19:34 in a minute and a half is unbelievably complicated.
01:19:39 And human beings can do it, it’s really good.
01:19:42 I think that’s harder than driving a car, by the way.
01:19:45 Putting together, working at a.
01:19:47 Working on a factory.
01:19:48 Two smart people can disagree.
01:19:51 Yay.
01:19:52 I think driving a car.
01:19:54 We’ll get you in the factory someday
01:19:56 and then we’ll see how you do.
01:19:57 No, not for us humans driving a car is easy.
01:19:59 I’m saying building a machine that drives a car
01:20:03 is not easy.
01:20:04 No, okay.
01:20:05 Okay.
01:20:05 Driving a car is easy for humans
01:20:07 because we’ve been evolving for billions of years.
01:20:10 To drive cars?
01:20:11 Yeah, I noticed that.
01:20:13 The Paleolithic cars were super cool.
01:20:16 No, now you join the rest of the internet
01:20:18 and mocking me.
01:20:19 Okay.
01:20:20 I wasn’t mocking, I was just.
01:20:22 Yeah, yeah.
01:20:23 Intrigued by your anthropology.
01:20:26 Yeah, it’s.
01:20:27 I’ll have to go dig into that.
01:20:28 There’s some inaccuracies there, yes.
01:20:31 Okay, but in general,
01:20:35 what have you learned in terms of
01:20:39 thinking about passion, craftsmanship,
01:20:44 tension, chaos.
01:20:47 Jesus.
01:20:48 The whole mess of it.
01:20:50 What have you learned, or taken away from your time
01:20:54 working with Elon Musk, working at Tesla,
01:20:57 which is known to be a place of chaos innovation,
01:21:02 craftsmanship, and all of those things.
01:21:03 I really like the way you thought.
01:21:06 You think you have an understanding
01:21:07 about what first principles of something is,
01:21:10 and then you talk to Elon about it,
01:21:11 and you didn’t scratch the surface.
01:21:15 He has a deep belief that no matter what you do,
01:21:18 it’s a local maximum, right?
01:21:21 And I had a friend, he invented a better electric motor,
01:21:24 and it was a lot better than what we were using.
01:21:26 And one day he came by, he said,
01:21:28 I’m a little disappointed, because this is really great,
01:21:31 and you didn’t seem that impressed.
01:21:33 And I said, when the super intelligent aliens come,
01:21:37 are they going to be looking for you?
01:21:38 Like, where is he?
01:21:39 The guy who built the motor.
01:21:41 Yeah.
01:21:42 Probably not.
01:21:43 You know, like, but doing interesting work
01:21:48 that’s both innovative and, let’s say,
01:21:49 craftsman’s work on the current thing
01:21:51 is really satisfying, and it’s good.
01:21:54 And that’s cool.
01:21:55 And then Elon was good at taking everything apart,
01:21:59 and like, what’s the deep first principle?
01:22:01 Oh, no, what’s really, no, what’s really?
01:22:03 You know, that ability to look at it without assumptions
01:22:08 and “how” constraints is super wild.
01:22:13 You know, he built a rocket ship, and an electric car,
01:22:17 and you know, everything.
01:22:19 And that’s super fun, and he’s into it, too.
01:22:21 Like, when they first landed two SpaceX rockets at Tesla,
01:22:25 we had a video projector in the big room,
01:22:27 and like, 500 people came down,
01:22:29 and when they landed, everybody cheered,
01:22:30 and some people cried.
01:22:32 It was so cool.
01:22:34 All right, but how did you do that?
01:22:35 Well, it was super hard, and then people say,
01:22:40 well, it’s chaotic, really?
01:22:42 To get out of all your assumptions,
01:22:44 you think that’s not gonna be unbelievably painful?
01:22:47 And is Elon tough?
01:22:49 Yeah, probably.
01:22:50 Do people look back on it and say,
01:22:52 boy, I’m really happy I had that experience
01:22:57 to go take apart that many layers of assumptions?
01:23:02 Sometimes super fun, sometimes painful.
01:23:04 So it could be emotionally and intellectually painful,
01:23:07 that whole process of just stripping away assumptions.
01:23:10 Yeah, imagine 99% of your thought process
01:23:13 is protecting your self conception,
01:23:16 and 98% of that’s wrong.
01:23:20 Now you got the math right.
01:23:22 How do you think you’re feeling
01:23:23 when you get back into that one bit that’s useful,
01:23:26 and now you’re open,
01:23:27 and you have the ability to do something different?
01:23:30 I don’t know if I got the math right.
01:23:33 It might be 99.9, but it ain’t 50.
01:23:38 Imagining it at 50% is hard enough.
01:23:44 Now, for a long time, I’ve suspected you could get better.
01:23:48 Like you can think better, you can think more clearly,
01:23:50 you can take things apart.
01:23:52 And there’s lots of examples of that, people who do that.
01:23:56 And Elon is an example of that, you are an example.
01:24:02 I don’t know if I am, I’m fun to talk to.
01:24:06 Certainly.
01:24:07 I’ve learned a lot of stuff.
01:24:09 Well, here’s the other thing, I joke, like I read books,
01:24:12 and people think, oh, you read books.
01:24:14 Well, no, I’ve read a couple of books a week for 55 years.
01:24:20 Well, maybe 50,
01:24:21 because I didn’t learn to read until I was eight or something.
01:24:24 And it turns out when people write books,
01:24:28 they often take 20 years of their life
01:24:31 where they passionately did something,
01:24:33 reduce it to 200 pages.
01:24:36 That’s kind of fun.
01:24:37 And then you go online,
01:24:38 and you can find out who wrote the best books
01:24:41 and who liked, you know, that’s kind of wild.
01:24:43 So there’s this wild selection process,
01:24:45 and then you can read it,
01:24:46 and for the most part, understand it.
01:24:49 And then you can go apply it.
01:24:51 Like I went to one company,
01:24:53 I thought, I haven’t managed much before.
01:24:55 So I read 20 management books,
01:24:57 and I started talking to them,
01:24:58 and basically compared to all the VPs running around,
01:25:01 I’d read 19 more management books than anybody else.
01:25:05 It wasn’t even that hard.
01:25:08 And half the stuff worked, like first time.
01:25:11 It wasn’t even rocket science.
01:25:13 But at the core of that is questioning the assumptions,
01:25:16 or sort of entering the thinking,
01:25:20 first principles thinking,
01:25:21 sort of looking at the reality of the situation,
01:25:24 and using that knowledge, applying that knowledge.
01:25:28 So that’s.
01:25:29 So I would say my brain has this idea
01:25:31 that you can question first assumptions.
01:25:35 But I can go days at a time and forget that,
01:25:38 and you have to kind of like circle back that observation.
01:25:42 Because it is emotionally challenging.
01:25:45 Well, it’s hard to just keep it front and center,
01:25:47 because you operate on so many levels all the time,
01:25:50 and getting this done takes priority,
01:25:53 or being happy takes priority,
01:25:56 or screwing around takes priority.
01:25:59 Like how you go through life is complicated.
01:26:03 And then you remember, oh yeah,
01:26:04 I could really think first principles.
01:26:06 Oh shit, that’s tiring.
01:26:09 But you do for a while, and that’s kind of cool.
01:26:12 So just as a last question in your sense,
01:26:16 from the big picture, from the first principles,
01:26:19 do you think, you kind of answered it already,
01:26:21 but do you think autonomous driving is something
01:26:25 we can solve on a timeline of years?
01:26:28 So one, two, three, five, 10 years,
01:26:32 as opposed to a century?
01:26:33 Yeah, definitely.
01:26:35 Just to linger on it a little longer,
01:26:37 where’s the confidence coming from?
01:26:40 Is it the fundamentals of the problem,
01:26:42 the fundamentals of building the hardware and the software?
01:26:46 As a computational problem, understanding ballistics,
01:26:50 roads, topography, it seems pretty solvable.
01:26:56 And you can see this, like speech recognition,
01:26:59 for a long time people were doing frequency
01:27:01 domain analysis, and all kinds of stuff,
01:27:04 and that didn’t work at all, right?
01:27:07 And then they did deep learning about it,
01:27:09 and it worked great.
01:27:11 And it took multiple iterations.
01:27:13 And autonomous driving is way past
01:27:18 the frequency analysis point.
01:27:21 Use radar, don’t run into things.
01:27:23 And the data gathering’s going up,
01:27:25 and the computation’s going up,
01:27:26 and the algorithm understanding’s going up,
01:27:28 and there’s a whole bunch of problems
01:27:30 getting solved like that.
01:27:32 The data side is really powerful,
01:27:33 but I disagree with both you and Elon.
01:27:35 I’ll tell Elon once again, as I did before,
01:27:38 that when you add human beings into the picture,
01:27:42 it’s no longer a ballistics problem.
01:27:45 It’s something more complicated,
01:27:47 but I could be very well proven wrong.
01:27:50 Cars are highly damped in terms of rate of change.
01:27:53 Like the steering system’s really slow
01:27:56 compared to a computer.
01:27:57 The acceleration of the acceleration’s really slow.
01:28:01 Yeah, on a certain timescale, on a ballistics timescale,
01:28:04 but human behavior, I don’t know.
01:28:07 I shouldn’t say.
01:28:08 Human beings are really slow too.
01:28:09 Weirdly, we operate half a second behind reality.
01:28:13 Nobody really understands that one either.
01:28:15 It’s pretty funny.
01:28:16 Yeah, yeah.
01:28:20 We very well could be surprised,
01:28:23 and I think with the rate of improvement
01:28:25 in all aspects on both the compute
01:28:26 and the software and the hardware,
01:28:29 there’s gonna be pleasant surprises all over the place.
01:28:34 Speaking of unpleasant surprises,
01:28:36 many people have worries about a singularity
01:28:39 in the development of AI.
01:28:41 Forgive me for such questions.
01:28:43 Yeah.
01:28:44 When AI improves at an exponential rate
01:28:46 and reaches a point of superhuman level
01:28:48 general intelligence, beyond that point,
01:28:52 there’s no looking back.
01:28:53 Do you share this worry of existential threats
01:28:56 from artificial intelligence,
01:28:57 from computers becoming superhuman level intelligent?
01:29:01 No, not really.
01:29:04 We already have a very stratified society,
01:29:07 and then if you look at the whole animal kingdom
01:29:09 of capabilities and abilities and interests,
01:29:12 and smart people have their niche,
01:29:15 and normal people have their niche,
01:29:17 and craftsmen have their niche,
01:29:19 and animals have their niche.
01:29:22 I suspect that the domains of interest
01:29:26 for things that are astronomically different,
01:29:29 like the whole something got 10 times smarter than us
01:29:32 and wanted to track us all down because what?
01:29:34 We like to have coffee at Starbucks?
01:29:36 Like, it doesn’t seem plausible.
01:29:38 No, is there an existential problem
01:29:40 that how do you live in a world
01:29:42 where there’s something way smarter than you,
01:29:44 and you based your kind of self-esteem
01:29:46 on being the smartest local person?
01:29:48 Well, there’s what, 0.1% of the population who thinks that?
01:29:52 Because the rest of the population’s been dealing with it
01:29:54 since they were born.
01:29:56 So the breadth of possible experience
01:30:00 that can be interesting is really big.
01:30:03 And, you know, superintelligence seems likely,
01:30:11 although we still don’t know if we’re magical,
01:30:14 but I suspect we’re not.
01:30:16 And it seems likely that it’ll create possibilities
01:30:18 that are interesting for us,
01:30:20 and its interests will be interesting for that,
01:30:24 for whatever it is.
01:30:26 It’s not obvious why its interests would somehow
01:30:30 want to fight over some square foot of dirt,
01:30:32 or, you know, whatever the usual fears are about.
01:30:37 So you don’t think it’ll inherit
01:30:38 some of the darker aspects of human nature?
01:30:42 Depends on how you think reality’s constructed.
01:30:45 So for whatever reason,
01:30:48 human beings are in, let’s say,
01:30:50 creative tension and opposition
01:30:52 with both our good and bad forces.
01:30:55 Like, there’s lots of philosophical understanding of that.
01:30:58 I don’t know why that would be different.
01:31:03 So you think the evil is necessary for the good?
01:31:06 I mean, the tension.
01:31:08 I don’t know about evil,
01:31:09 but like we live in a competitive world
01:31:11 where your good is somebody else’s evil.
01:31:16 You know, there’s the malignant part of it,
01:31:19 but that seems to be self-limiting,
01:31:22 although occasionally it’s super horrible.
01:31:26 But yes, there’s a debate over ideas,
01:31:29 and some people have different beliefs,
01:31:32 and that debate itself is a process.
01:31:34 So it’s arriving at something.
01:31:37 Yeah, and why wouldn’t that continue?
01:31:39 Yeah.
01:31:41 But you don’t think that whole process
01:31:43 will leave humans behind in a way that’s painful?
01:31:47 Emotionally painful, yes.
01:31:48 For the 0.1%, it will be.
01:31:51 Why isn’t it already painful
01:31:52 for a large percentage of the population?
01:31:54 And it is.
01:31:54 I mean, society does have a lot of stress in it,
01:31:57 about the 1%, and about the this, and about the that,
01:32:00 but you know, everybody has a lot of stress in their life
01:32:03 about what they find satisfying,
01:32:05 and you know, know yourself seems to be the proper dictum,
01:32:10 and pursue something that makes your life meaningful
01:32:14 seems proper, and there’s so many avenues on that.
01:32:18 Like, there’s so much unexplored space
01:32:21 at every single level, you know.
01:32:25 I’m somewhat of, my nephew called me a jaded optimist.
01:32:29 And you know, so it’s.
01:32:33 There’s a beautiful tension in that label,
01:32:37 but if you were to look back at your life,
01:32:40 and could relive a moment, a set of moments,
01:32:45 because there were the happiest times of your life,
01:32:49 outside of family, what would that be?
01:32:54 I don’t want to relive any moments.
01:32:56 I like that.
01:32:58 I like that situation where you have some amount of optimism
01:33:01 and then the anxiety of the unknown.
01:33:06 So you love the unknown, the mystery of it.
01:33:10 I don’t know about the mystery.
01:33:11 It sure gets your blood pumping.
01:33:14 What do you think is the meaning of this whole thing?
01:33:17 Of life, on this pale blue dot?
01:33:21 It seems to be what it does.
01:33:25 Like, the universe, for whatever reason,
01:33:29 makes atoms, which makes us, which we do stuff.
01:33:34 And we figure out things, and we explore things, and.
01:33:38 That’s just what it is.
01:33:39 It’s not just.
01:33:41 Yeah, it is.
01:33:44 Jim, I don’t think there’s a better place to end it.
01:33:46 It’s been a huge honor, and.
01:33:50 Well, that was super fun.
01:33:51 Thank you so much for talking today.
01:33:52 All right, great.
01:33:54 Thanks for listening to this conversation,
01:33:56 and thank you to our presenting sponsor, Cash App.
01:33:59 Download it, use code LexPodcast.
01:34:02 You’ll get $10, and $10 will go to FIRST,
01:34:04 a STEM education nonprofit that inspires hundreds
01:34:07 of thousands of young minds to become future leaders
01:34:10 and innovators.
01:34:12 If you enjoy this podcast, subscribe on YouTube.
01:34:15 Give it five stars on Apple Podcast.
01:34:17 Follow on Spotify, support it on Patreon,
01:34:19 or simply connect with me on Twitter.
01:34:22 And now, let me leave you with some words of wisdom
01:34:24 from Gordon Moore.
01:34:26 If everything you try works,
01:34:28 you aren’t trying hard enough.
01:34:30 Thank you for listening, and hope to see you next time.