Jim Keller: Moore's Law, Microprocessors, and First Principles #70

Transcript

00:00:00 The following is a conversation with Jim Keller,

00:00:03 legendary microprocessor engineer

00:00:05 who has worked at AMD, Apple, Tesla, and now Intel.

00:00:10 He’s known for his work on AMD K7, K8, K12,

00:00:13 and Zen microarchitectures, Apple A4 and A5 processors,

00:00:18 and coauthor of the specification

00:00:20 for the x86-64 instruction set

00:00:23 and the HyperTransport interconnect.

00:00:26 He’s a brilliant first principles engineer

00:00:28 and out of the box thinker,

00:00:30 and just an interesting and fun human being to talk to.

00:00:33 This is the Artificial Intelligence Podcast.

00:00:36 If you enjoy it, subscribe on YouTube,

00:00:38 give it five stars on Apple Podcast,

00:00:40 follow on Spotify, support it on Patreon,

00:00:43 or simply connect with me on Twitter,

00:00:45 at Lex Fridman, spelled F R I D M A N.

00:00:49 I recently started doing ads

00:00:51 at the end of the introduction.

00:00:52 I’ll do one or two minutes after introducing the episode

00:00:55 and never any ads in the middle

00:00:57 that can break the flow of the conversation.

00:00:59 I hope that works for you

00:01:00 and doesn’t hurt the listening experience.

00:01:04 This show is presented by Cash App,

00:01:06 the number one finance app in the App Store.

00:01:08 I personally use Cash App to send money to friends,

00:01:11 but you can also use it to buy, sell,

00:01:13 and deposit Bitcoin in just seconds.

00:01:15 Cash App also has a new investing feature.

00:01:18 You can buy fractions of a stock, say $1 worth,

00:01:21 no matter what the stock price is.

00:01:23 Broker services are provided by Cash App Investing,

00:01:26 a subsidiary of Square and member SIPC.

00:01:29 I’m excited to be working with Cash App

00:01:32 to support one of my favorite organizations called FIRST,

00:01:35 best known for their FIRST Robotics and Lego competitions.

00:01:38 They educate and inspire hundreds of thousands of students

00:01:42 in over 110 countries and have a perfect rating

00:01:45 at Charity Navigator,

00:01:46 which means that donated money

00:01:48 is used to maximum effectiveness.

00:01:50 When you get Cash App from the App Store or Google Play

00:01:53 and use code LEXPODCAST,

00:01:56 you’ll get $10 and Cash App will also donate $10 to FIRST,

00:02:00 which again is an organization

00:02:02 that I’ve personally seen inspire girls and boys

00:02:04 to dream of engineering a better world.

00:02:08 And now here’s my conversation with Jim Keller.

00:02:12 What are the differences and similarities

00:02:14 between the human brain and a computer

00:02:17 with the microprocessor at its core?

00:02:19 Let’s start with the philosophical question perhaps.

00:02:22 Well, since people don’t actually understand

00:02:25 how human brains work, I think that’s true.

00:02:29 I think that’s true.

00:02:30 So it’s hard to compare them.

00:02:32 Computers are, you know, there’s really two things.

00:02:37 There’s memory and there’s computation, right?

00:02:40 And to date, almost all computer architectures

00:02:43 are global memory, which is a thing, right?

00:02:47 And then computation where you pull data

00:02:49 and you do relatively simple operations on it

00:02:52 and write data back.

00:02:53 So it’s decoupled in modern computers.

00:02:57 And you think in the human brain,

00:02:59 everything’s a mesh, a mess that’s combined together?

00:03:02 What people observe is there’s, you know,

00:03:04 some number of layers of neurons

00:03:06 which have local and global connections

00:03:09 and information is stored in some distributed fashion

00:03:13 and people build things called neural networks in computers

00:03:18 where the information is distributed

00:03:21 in some kind of fashion.

00:03:22 You know, there’s a mathematics behind it.

00:03:25 I don’t know that the understanding of that is super deep.

00:03:29 The computations we run on those

00:03:31 are straightforward computations.

00:03:33 I don’t believe anybody has said

00:03:35 a neuron does this computation.

00:03:37 So to date, it’s hard to compare them, I would say.

00:03:44 So let’s get into the basics before we zoom back out.

00:03:48 How do you build a computer from scratch?

00:03:51 What is a microprocessor?

00:03:52 What is a microarchitecture?

00:03:54 What’s an instruction set architecture?

00:03:56 Maybe even as far back as what is a transistor?

00:04:01 So the special charm of computer engineering

00:04:05 is there’s a relatively good understanding

00:04:08 of abstraction layers.

00:04:10 So down at the bottom, you have atoms

00:04:12 and atoms get put together in materials like silicon

00:04:15 or doped silicon or metal and we build transistors.

00:04:19 On top of that, we build logic gates, right?

00:04:23 And then functional units, like an adder or a subtractor

00:04:27 or an instruction parsing unit.

00:04:28 And then we assemble those into processing elements.

00:04:32 Modern computers are built out of probably 10 to 20

00:04:37 locally organic processing elements

00:04:40 or coherent processing elements.

00:04:42 And then that runs computer programs, right?

00:04:46 So there’s abstraction layers and then software,

00:04:49 there’s an instruction set you run

00:04:51 and then there’s assembly language, C, C++, Java, JavaScript.

00:04:56 There’s abstraction layers,

00:04:58 essentially from the atom to the data center, right?

00:05:02 So when you build a computer,

00:05:06 first there’s a target, like what’s it for?

00:05:08 Like how fast does it have to be?

00:05:09 Which today there’s a whole bunch of metrics

00:05:12 about what that is.

00:05:13 And then in an organization of 1,000 people

00:05:17 who build a computer, there’s lots of different disciplines

00:05:22 that you have to operate on.

00:05:24 Does that make sense?

00:05:25 And so…

00:05:27 So there’s a bunch of levels of abstraction

00:05:30 in an organization like Intel and in your own vision,

00:05:35 there’s a lot of brilliance that comes in

00:05:37 at every one of those layers.

00:05:39 Some of it is science, some of it is engineering,

00:05:41 some of it is art, what’s the most,

00:05:45 if you could pick favorites,

00:05:46 what’s the most important, your favorite layer

00:05:49 on these layers of abstractions?

00:05:51 Where does the magic enter this hierarchy?

00:05:55 I don’t really care.

00:05:57 That’s the fun, you know, I’m somewhat agnostic to that.

00:06:00 So I would say for relatively long periods of time,

00:06:05 instruction sets are stable.

00:06:08 So the x86 instruction set, the ARM instruction set.

00:06:12 What’s an instruction set?

00:06:13 So it says, how do you encode the basic operations?

00:06:16 Load, store, multiply, add, subtract, conditional, branch.

00:06:20 You know, there aren’t that many interesting instructions.

00:06:23 Look, if you look at a program and it runs,

00:06:26 you know, 90% of the execution is on 25 opcodes,

00:06:29 you know, 25 instructions.

00:06:31 And those are stable, right?

00:06:33 What does it mean, stable?

00:06:35 Intel architecture’s been around for 25 years.

00:06:38 It works.

00:06:38 It works.

00:06:39 And that’s because the basics, you know,

00:06:42 are defined a long time ago, right?

00:06:45 Now, the way an old computer ran is you fetched

00:06:49 instructions and you executed them in order.

00:06:52 Do the load, do the add, do the compare.

00:06:57 The way a modern computer works is you fetch

00:06:59 large numbers of instructions, say 500.

00:07:03 And then you find the dependency graph

00:07:06 between the instructions.

00:07:07 And then you execute in independent units

00:07:12 those little micrographs.

00:07:15 So a modern computer, like people like to say,

00:07:17 computers should be simple and clean.

00:07:20 But it turns out the market for simple,

00:07:22 clean, slow computers is zero, right?

00:07:26 We don’t sell any simple, clean computers.

00:07:29 No, you can, how you build it can be clean,

00:07:33 but the computer people want to buy,

00:07:36 that’s, say, in a phone or a data center,

00:07:40 fetches a large number of instructions,

00:07:42 computes the dependency graph,

00:07:45 and then executes it in a way that gets the right answers.

00:07:49 And optimizes that graph somehow.

00:07:50 Yeah, they run deeply out of order.

00:07:53 And then there’s semantics around how memory ordering works

00:07:57 and other things work.

00:07:58 So the computer sort of has a bunch of bookkeeping tables

00:08:01 that says what order should these operations finish in

00:08:05 or appear to finish in?

00:08:07 But to go fast, you have to fetch a lot of instructions

00:08:10 and find all the parallelism.

00:08:12 Now, there’s a second kind of computer,

00:08:15 which we call GPUs today.

00:08:17 And I call it the difference.

00:08:19 There’s found parallelism, like you have a program

00:08:21 with a lot of dependent instructions.

00:08:24 You fetch a bunch and then you go figure out

00:08:26 the dependency graph and you issue instructions out of order.

00:08:29 That’s because you have one serial narrative to execute,

00:08:32 which, in fact, can be done out of order.

00:08:35 Did you call it a narrative?

00:08:37 Yeah.

00:08:37 Oh, wow.

00:08:38 Yeah, so humans think of serial narrative.

00:08:40 So read a book, right?

00:08:42 There’s a sentence after sentence after sentence,

00:08:45 and there’s paragraphs.

00:08:46 Now, you could diagram that.

00:08:49 Imagine you diagrammed it properly and you said,

00:08:52 which sentences could be read in any order,

00:08:55 any order without changing the meaning, right?

00:08:59 That’s a fascinating question to ask of a book, yeah.

00:09:02 Yeah, you could do that, right?

00:09:04 So some paragraphs could be reordered,

00:09:06 some sentences can be reordered.

00:09:08 You could say, he is tall and smart and X, right?

00:09:15 And it doesn’t matter the order of tall and smart.

00:09:19 But if you say the tall man is wearing a red shirt,

00:09:22 what colors, you can create dependencies, right?

00:09:28 And so GPUs, on the other hand,

00:09:32 run simple programs on pixels,

00:09:35 but you’re given a million of them.

00:09:36 And to first order, the screen you’re looking at

00:09:40 doesn’t care which order you do it in.

00:09:42 So I call that given parallelism.

00:09:44 Simple narratives around the large numbers of things

00:09:48 where you can just say,

00:09:49 it’s parallel because you told me it was.

00:09:52 So found parallelism where the narrative is sequential,

00:09:57 but you discover like little pockets of parallelism versus.

00:10:01 Turns out large pockets of parallelism.

00:10:03 Large, so how hard is it to discover?

00:10:05 Well, how hard is it?

00:10:06 That’s just transistor count, right?

00:10:08 So once you crack the problem, you say,

00:10:11 here’s how you fetch 10 instructions at a time.

00:10:13 Here’s how you calculate the dependencies between them.

00:10:16 Here’s how you describe the dependencies.

00:10:18 Here’s, you know, these are pieces, right?

00:10:20 So once you describe the dependencies,

00:10:25 then it’s just a graph.

00:10:27 Sort of, it’s an algorithm that finds,

00:10:31 what is that?

00:10:31 I’m sure there’s a graph theoretical answer here

00:10:34 that’s solvable.
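The window-of-instructions idea can be sketched concretely. Below is an illustrative toy (the instruction format, register names, and the level numbering are all invented for the example; real hardware also renames registers to remove false dependencies):

```python
# Toy three-address instructions: (dest, src1, src2).
# A true dependency exists on the most recent earlier writer of each source.
def dependency_levels(instrs):
    last_writer = {}  # register name -> index of its most recent writer
    deps = []         # deps[i] = indices instruction i must wait for
    for i, (dest, *srcs) in enumerate(instrs):
        deps.append({last_writer[s] for s in srcs if s in last_writer})
        last_writer[dest] = i
    # Instructions at the same level have no dependency on each other
    # and could, in principle, issue in the same cycle.
    levels = []
    for i in range(len(instrs)):
        levels.append(1 + max((levels[j] for j in deps[i]), default=0))
    return levels

prog = [("r1", "r0", "r0"),   # r1 = r0 op r0
        ("r2", "r0", "r0"),   # independent of the first
        ("r3", "r1", "r2")]   # depends on both
print(dependency_levels(prog))  # -> [1, 1, 2]
```

The two level-1 instructions are the found parallelism: they came out of a serial narrative, but nothing orders them relative to each other.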

00:10:35 In general, programs, modern programs

00:10:40 that human beings write,

00:10:42 how much found parallelism is there in them?

00:10:45 About 10X. What does 10X mean?

00:10:47 So if you execute it in order, you would get

00:10:52 what’s called cycles per instruction,

00:10:53 and it would be about, you know,

00:10:57 three instructions, three cycles per instruction

00:11:00 because of the latency of the operations and stuff.

00:11:02 And in a modern computer, it executes,

00:11:05 you know, at like 0.2, 0.25 cycles per instruction.

00:11:08 So it’s about, we today find 10X.
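The 10X figure follows directly from those two cycles-per-instruction numbers (the exact ratio depends on which end of the 0.2 to 0.25 range you take):

```python
in_order_cpi = 3.0       # ~3 cycles per instruction executing strictly in order
out_of_order_cpi = 0.25  # ~0.2-0.25 CPI in a modern out-of-order core
speedup = in_order_cpi / out_of_order_cpi
print(speedup)  # -> 12.0, on the order of the 10X cited
```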

00:11:11 And there’s two things.

00:11:13 One is the found parallelism in the narrative, right?

00:11:17 And the other is the predictability of the narrative, right?

00:11:21 So certain operations say, do a bunch of calculations,

00:11:25 and if greater than one, do this, else do that.

00:11:30 That decision is predicted in modern computers

00:11:33 to high 90% accuracy.

00:11:36 So branches happen a lot.

00:11:38 So imagine you have a decision

00:11:40 to make every six instructions,

00:11:41 which is about the average, right?

00:11:43 But you want to fetch 500 instructions,

00:11:45 figure out the graph, and execute them all in parallel.

00:11:48 That means you have, let’s say,

00:11:51 if you fetch 600 instructions and it’s every six,

00:11:54 you have to fetch, you have to predict

00:11:56 99 out of 100 branches correctly

00:12:00 for that window to be effective.
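That arithmetic is easy to check: with a branch roughly every six instructions, a 600-instruction window spans about 100 branches, and the whole window survives only if every one of them is predicted correctly. The branch spacing and window size below come from the conversation, treated as exact and independent for illustration:

```python
def p_window_valid(per_branch_accuracy, window, branch_every=6):
    """Probability that every branch in the window is predicted correctly,
    assuming independent predictions at a fixed per-branch accuracy."""
    branches = window // branch_every
    return per_branch_accuracy ** branches

for acc in (0.85, 0.99, 0.999):
    print(acc, round(p_window_valid(acc, 600), 3))
# 85% per-branch accuracy makes a 600-instruction window essentially
# worthless; ~99.9% keeps it intact roughly 90% of the time.
```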

00:12:02 Okay, so parallelism, you can’t parallelize branches.

00:12:06 Or you can.

00:12:07 No, you can predict.

00:12:08 You can predict.

00:12:09 What does predicted branch mean?

00:12:10 What does predicted branch mean?

00:12:11 So imagine you do a computation over and over.

00:12:13 You’re in a loop.

00:12:14 So while n is greater than one, do.

00:12:19 And you go through that loop a million times.

00:12:21 So every time you look at the branch,

00:12:22 you say, it’s probably still greater than one.

00:12:25 And you’re saying you could do that accurately.

00:12:27 Very accurately.

00:12:28 Modern computers.

00:12:29 My mind is blown.

00:12:30 How the heck do you do that?

00:12:31 Wait a minute.

00:12:32 Well, you want to know?

00:12:33 This is really sad.

00:12:35 20 years ago, you simply recorded

00:12:38 which way the branch went last time

00:12:40 and predicted the same thing.

00:12:42 Right.

00:12:43 Okay.

00:12:44 What’s the accuracy of that?

00:12:46 85%.

00:12:48 So then somebody said, hey, let’s keep a couple of bits

00:12:51 and have a little counter so when it predicts one way,

00:12:54 we count up, and it pins at the max.

00:12:56 So say you have a three bit counter.

00:12:58 So you count up and then you count down.

00:13:00 And you can use the top bit as the signed bit

00:13:03 so you have a signed two bit number.

00:13:05 So if it’s greater than one, you predict taken.

00:13:07 And less than one, you predict not taken, right?

00:13:11 Or less than zero, whatever the thing is.

00:13:14 And that got us to 92%.
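A sketch of both predictors just described: the 1-bit "same as last time" scheme and the 2-bit saturating counter. The loop below is a made-up workload, so the hit counts it prints are illustrative, not the 85% and 92% measured on real programs:

```python
def one_bit(history):
    """Predict whichever way the branch went last time."""
    pred, correct = True, 0
    for taken in history:
        correct += (pred == taken)
        pred = taken  # just remember the last outcome
    return correct

def two_bit(history):
    """Saturating counter: 0-1 predict not-taken, 2-3 predict taken.
    It takes two wrong outcomes in a row to flip the prediction."""
    counter, correct = 2, 0
    for taken in history:
        correct += ((counter >= 2) == taken)
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return correct

# A loop branch that is taken 5 times, then falls through, repeated:
history = ([True] * 5 + [False]) * 3
print(one_bit(history), two_bit(history), len(history))  # -> 13 15 18
```

On this workload the 1-bit predictor mispredicts twice per loop exit (once leaving, once re-entering), while the saturating counter absorbs the single anomalous outcome and mispredicts only once.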

00:13:16 Oh.

00:13:17 Okay, no, it gets better.

00:13:19 This branch depends on how you got there.

00:13:22 So if you came down the code one way,

00:13:25 you’re talking about Bob and Jane, right?

00:13:28 And then said, does Bob like Jane?

00:13:30 It went one way.

00:13:31 But if you’re talking about Bob and Jill,

00:13:32 does Bob like Jane?

00:13:33 You go a different way.

00:13:35 Right, so that’s called history.

00:13:36 So you take the history and a counter.
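History plus a counter is the classic two-level predictor. Here is a gshare-style sketch; the table size, XOR indexing, and initial counter values are illustrative choices, not any shipping design:

```python
class GsharePredictor:
    """Global branch history XORed with the branch address indexes a table
    of 2-bit saturating counters, so the same branch can be predicted
    differently depending on how you got there."""
    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.table = [2] * (1 << bits)  # counters start weakly taken
        self.history = 0                # last `bits` branch outcomes

    def _index(self, pc):
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)
        self.history = ((self.history << 1) | taken) & self.mask

# An alternating branch defeats a lone counter, but the history
# disambiguates the two contexts:
g, pc, hits = GsharePredictor(), 0x40, []
for i in range(200):
    taken = (i % 2 == 0)
    hits.append(g.predict(pc) == taken)
    g.update(pc, taken)
print(sum(hits[100:]) / 100)  # near 1.0 once the history warms up
```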

00:13:40 That’s cool, but that’s not how anything works today.

00:13:43 They use something that looks a little like a neural network.

00:13:48 So modern, you take all the execution flows.

00:13:52 And then you do basically deep pattern recognition

00:13:56 of how the program is executing.

00:13:59 And you do that multiple different ways.

00:14:03 And you have something that chooses what the best result is.

00:14:07 There’s a little supercomputer inside the computer.

00:14:10 That’s trying to predict branching.

00:14:11 That calculates which way branches go.

00:14:14 So the effective window that it’s worth finding graphs

00:14:17 in gets bigger.

00:14:19 Why was that gonna make me sad?

00:14:21 Because that’s amazing.

00:14:22 It’s amazingly complicated.

00:14:24 Oh, well.

00:14:25 Well, here’s the funny thing.

00:14:27 So to get to 85% took 1,000 bits.

00:14:31 To get to 99% takes tens of megabits.

00:14:38 So this is one of those, to get the result,

00:14:42 to get from a window of say 50 instructions to 500,

00:14:47 it took three orders of magnitude

00:14:49 or four orders of magnitude more bits.

00:14:52 Now if you get the prediction of a branch wrong,

00:14:55 what happens then?

00:14:56 You flush the pipe.

00:14:57 You flush the pipe, so it’s just the performance cost.

00:14:59 But it gets even better.

00:15:00 Yeah.

00:15:01 So we’re starting to look at stuff that says,

00:15:03 so they executed down this path,

00:15:06 and then you had two ways to go.

00:15:09 But far away, there’s something that doesn’t matter

00:15:12 which path you went.

00:15:14 So you took the wrong path.

00:15:17 You executed a bunch of stuff.

00:15:20 Then you had the mispredicting.

00:15:21 You backed it up.

00:15:22 You remembered all the results you already calculated.

00:15:25 Some of those are just fine.

00:15:27 Like if you read a book and you misunderstand a paragraph,

00:15:30 your understanding of the next paragraph

00:15:32 sometimes is invariant to that understanding.

00:15:35 Sometimes it depends on it.

00:15:38 And you can kind of anticipate that invariance.

00:15:43 Yeah, well, you can keep track of whether the data changed.

00:15:47 And so when you come back through a piece of code,

00:15:49 should you calculate it again or do the same thing?

00:15:51 Okay, how much of this is art and how much of it is science?

00:15:55 Because it sounds pretty complicated.

00:15:59 Well, how do you describe a situation?

00:16:00 So imagine you come to a point in the road

00:16:02 where you have to make a decision, right?

00:16:05 And you have a bunch of knowledge about which way to go.

00:16:07 Maybe you have a map.

00:16:08 So you wanna go the shortest way,

00:16:11 or do you wanna go the fastest way,

00:16:13 or do you wanna take the nicest road?

00:16:14 So there’s some set of data.

00:16:17 So imagine you’re doing something complicated

00:16:19 like building a computer.

00:16:21 And there’s hundreds of decision points,

00:16:24 all with hundreds of possible ways to go.

00:16:27 And the ways you pick interact in a complicated way.

00:16:32 Right.

00:16:33 And then you have to pick the right spot.

00:16:35 Right, so that’s.

00:16:36 So that’s art or science, I don’t know.

00:16:37 You avoided the question.

00:16:38 You just described the Robert Frost problem

00:16:41 of road less taken.

00:16:43 I described the Robert Frost problem?

00:16:45 That’s what we do as computer designers.

00:16:49 It’s all poetry.

00:16:50 Okay.

00:16:51 Great.

00:16:52 Yeah, I don’t know how to describe that

00:16:54 because some people are very good

00:16:56 at making those intuitive leaps.

00:16:57 It seems like just combinations of things.

00:17:00 Some people are less good at it,

00:17:02 but they’re really good at evaluating the alternatives.

00:17:05 Right, and everybody has a different way to do it.

00:17:09 And some people can’t make those leaps,

00:17:11 but they’re really good at analyzing it.

00:17:14 So when you see computers are designed

00:17:16 by teams of people who have very different skill sets.

00:17:19 And a good team has lots of different kinds of people.

00:17:24 I suspect you would describe some of them

00:17:26 as artistic, but not very many.

00:17:30 Unfortunately, or fortunately.

00:17:32 Fortunately.

00:17:33 Well, you know, computer design’s hard.

00:17:36 It’s 99% perspiration.

00:17:40 And the 1% inspiration is really important.

00:17:44 But you still need the 99.

00:17:45 Yeah, you gotta do a lot of work.

00:17:47 And then there are interesting things to do

00:17:50 at every level of that stack.

00:17:52 So at the end of the day,

00:17:55 if you run the same program multiple times,

00:17:58 does it always produce the same result?

00:18:01 Is there some room for fuzziness there?

00:18:04 That’s a math problem.

00:18:06 So if you run a correct C program,

00:18:08 the definition is every time you run it,

00:18:11 you get the same answer.

00:18:12 Yeah, well that’s a math statement.

00:18:14 But that’s a language definitional statement.

00:18:17 So for years when people did,

00:18:19 when we first did 3D acceleration of graphics,

00:18:24 you could run the same scene multiple times

00:18:27 and get different answers.

00:18:28 Right.

00:18:29 Right, and then some people thought that was okay

00:18:32 and some people thought it was a bad idea.

00:18:34 And then when the HPC world used GPUs for calculations,

00:18:39 they thought it was a really bad idea.

00:18:41 Okay, now in modern AI stuff,

00:18:44 people are looking at networks

00:18:48 where the precision of the data is low enough

00:18:51 that the data is somewhat noisy.

00:18:53 And the observation is the input data is unbelievably noisy.

00:18:57 So why should the calculation be not noisy?

00:19:00 And people have experimented with algorithms

00:19:02 that say can get faster answers by being noisy.

00:19:05 Like as a network starts to converge,

00:19:08 if you look at the computation graph,

00:19:09 it starts out really wide and then it gets narrower.

00:19:12 And you can say is that last little bit that important

00:19:14 or should I start the graph on the next rev

00:19:17 before we whittle it all the way down to the answer, right?

00:19:21 So you can create algorithms that are noisy.

00:19:24 Now if you’re developing something

00:19:25 and every time you run it, you get a different answer,

00:19:27 it’s really annoying.

00:19:29 And so most people think even today,

00:19:33 every time you run the program, you get the same answer.

00:19:36 No, I know, but the question is

00:19:38 that’s the formal definition of a programming language.

00:19:42 There is a definition of languages

00:19:44 that don’t get the same answer,

00:19:45 but people who use those, you always want something

00:19:49 because you get a bad answer and then you’re wondering

00:19:51 is it because of something in the algorithm

00:19:54 or because of this?

00:19:55 And so everybody wants a little switch that says

00:19:57 no matter what, do it deterministically.

00:20:00 And it’s really weird because almost everything

00:20:02 going into modern calculations is noisy.

00:20:05 So why do the answers have to be so clear?

00:20:08 Right, so where do you stand?

00:20:09 I design computers for people who run programs.

00:20:12 So if somebody says I want a deterministic answer,

00:20:16 like most people want that.

00:20:18 Can you deliver a deterministic answer,

00:20:20 I guess is the question.

00:20:21 Like when you.

00:20:22 Yeah, hopefully, sure.

00:20:24 What people don’t realize is you get a deterministic answer

00:20:27 even though the execution flow is very nondeterministic.

00:20:31 So you run this program 100 times,

00:20:33 it never runs the same way twice, ever.

00:20:36 And the answer, it arrives at the same answer.

00:20:37 But it gets the same answer every time.

00:20:39 It’s just amazing.

00:20:42 Okay, you’ve achieved, in the eyes of many people,

00:20:49 legend status as a chip architect.

00:20:53 What design creation are you most proud of?

00:20:56 Perhaps because it was challenging,

00:20:59 because of its impact, or because of the set

00:21:01 of brilliant ideas that were involved in bringing it to life?

00:21:06 I find that description odd.

00:21:10 And I have two small children, and I promise you,

00:21:14 they think it’s hilarious.

00:21:15 This question.

00:21:16 Yeah.

00:21:17 I do it for them.

00:21:18 So I’m really interested in building computers.

00:21:23 And I’ve worked with really, really smart people.

00:21:27 I’m not unbelievably smart.

00:21:30 I’m fascinated by how they go together,

00:21:32 both as a thing to do and as an endeavor that people do.

00:21:38 How people and computers go together?

00:21:40 Yeah.

00:21:40 Like how people think and build a computer.

00:21:44 And I find sometimes that the best computer architects

00:21:47 aren’t that interested in people,

00:21:49 or the best people managers aren’t that good

00:21:51 at designing computers.

00:21:54 So the whole stack of human beings is fascinating.

00:21:56 So the managers, the individual engineers.

00:21:58 Yeah, yeah.

00:21:59 Yeah, I said I realized after a lot of years

00:22:02 of building computers, where you sort of build them

00:22:04 out of transistors, logic gates, functional units,

00:22:06 computational elements, that you could think of people

00:22:09 the same way, so people are functional units.

00:22:12 And then you could think of organizational design

00:22:14 as a computer architecture problem.

00:22:16 And then it was like, oh, that’s super cool,

00:22:19 because the people are all different,

00:22:20 just like the computational elements are all different.

00:22:23 And they like to do different things.

00:22:25 And so I had a lot of fun reframing

00:22:29 how I think about organizations.

00:22:31 Just like with computers, we were saying execution paths,

00:22:35 you can have a lot of different paths that end up

00:22:37 at the same good destination.

00:22:41 So what have you learned about the human abstractions

00:22:45 from individual functional human units

00:22:48 to the broader organization?

00:22:51 What does it take to create something special?

00:22:55 Well, most people don’t think simple enough.

00:23:00 All right, so the difference between a recipe

00:23:02 and the understanding.

00:23:04 There’s probably a philosophical description of this.

00:23:09 So imagine you’re gonna make a loaf of bread.

00:23:11 The recipe says get some flour, add some water,

00:23:14 add some yeast, mix it up, let it rise,

00:23:16 put it in a pan, put it in the oven.

00:23:19 It’s a recipe.

00:23:21 Understanding bread, you can understand biology,

00:23:24 supply chains, grain grinders, yeast, physics,

00:23:29 thermodynamics, there’s so many levels of understanding.

00:23:37 And then when people build and design things,

00:23:40 they frequently are executing some stack of recipes.

00:23:45 And the problem with that is the recipes

00:23:46 all have limited scope.

00:23:48 Like if you have a really good recipe book

00:23:50 for making bread, it won’t tell you anything

00:23:52 about how to make an omelet.

00:23:54 But if you have a deep understanding of cooking,

00:23:57 right, then bread, omelets, you know, sandwiches,

00:24:03 you know, there’s a different way of viewing everything.

00:24:07 And most people, when you get to be an expert at something,

00:24:13 you know, you’re hoping to achieve deeper understanding,

00:24:16 not just a large set of recipes to go execute.

00:24:20 And it’s interesting to walk groups of people

00:24:22 because executing recipes is unbelievably efficient

00:24:27 if it’s what you want to do.

00:24:30 If it’s not what you want to do, you’re really stuck.

00:24:34 And that difference is crucial.

00:24:36 And everybody has a balance of, let’s say,

00:24:39 deeper understanding versus recipes.

00:24:40 And some people are really good at recognizing

00:24:43 when the problem is to understand something deeply.

00:24:47 Does that make sense?

00:24:49 It totally makes sense. At every stage of development,

00:24:52 is deep understanding needed on the team?

00:24:55 Oh, this goes back to the art versus science question.

00:24:58 Sure.

00:24:59 If you constantly unpack everything

00:25:01 for deeper understanding, you never get anything done.

00:25:04 And if you don’t unpack understanding when you need to,

00:25:06 you’ll do the wrong thing.

00:25:09 And then at every juncture, like human beings

00:25:12 are these really weird things because everything you tell them

00:25:15 has a million possible outputs, right?

00:25:18 And then they all interact in a hilarious way.

00:25:21 Yeah, it’s very interesting.

00:25:21 And then having some intuition about what you tell them,

00:25:24 what you do, when do you intervene, when do you not,

00:25:26 it’s complicated.

00:25:28 Right, so.

00:25:29 It’s essentially computationally unsolvable.

00:25:33 Yeah, it’s an intractable problem, sure.

00:25:36 Humans are a mess.

00:25:37 But with deep understanding,

00:25:41 do you mean also sort of fundamental questions

00:25:44 of things like what is a computer?

00:25:51 Or why, like the why questions,

00:25:55 why are we even building this, like of purpose?

00:25:58 Or do you mean more like going towards

00:26:02 the fundamental limits of physics,

00:26:04 sort of really getting into the core of the science?

00:26:07 In terms of building a computer, think a little simpler.

00:26:11 So common practice is you build a computer,

00:26:14 and then when somebody says, I wanna make it 10% faster,

00:26:17 you’ll go in and say, all right,

00:26:19 I need to make this buffer bigger,

00:26:20 and maybe I’ll add an add unit.

00:26:23 Or I have this thing that’s three instructions wide,

00:26:25 I’m gonna make it four instructions wide.

00:26:27 And what you see is each piece

00:26:30 gets incrementally more complicated, right?

00:26:34 And then at some point you hit this limit,

00:26:37 like adding another feature or buffer

00:26:39 doesn’t seem to make it any faster.

00:26:41 And then people will say,

00:26:42 well, that’s because it’s a fundamental limit.

00:26:45 And then somebody else will look at it and say,

00:26:46 well, actually the way you divided the problem up

00:26:49 and the way the different features are interacting

00:26:52 is limiting you, and it has to be rethought, rewritten.

00:26:56 So then you refactor it and rewrite it,

00:26:58 and what people commonly find is the rewrite

00:27:00 is not only faster, but half as complicated.

00:27:03 From scratch? Yes.

00:27:05 So how often in your career, or maybe more generally,

00:27:08 have you seen it be needed

00:27:11 to just throw the whole thing out and start over?

00:27:14 This is where I’m on one end of it,

00:27:17 every three to five years.

00:27:19 Which end are you on?

00:27:21 Rewrite more often.

00:27:22 Rewrite, and three to five years is?

00:27:25 If you wanna really make a lot of progress

00:27:27 on computer architecture, every five years

00:27:28 you should do one from scratch.

00:27:31 So where does the x86-64 standard come in?

00:27:36 How often do you?

00:27:38 I was the coauthor of that spec in 98.

00:27:42 That’s 20 years ago.

00:27:43 Yeah, so that’s still around.

00:27:45 The instruction set itself has been extended

00:27:48 quite a few times.

00:27:50 And instruction sets are less interesting

00:27:52 than the implementation underneath.

00:27:54 There’s been, on x86 architecture, Intel’s designed a few,

00:27:58 AMD designed a few very different architectures.

00:28:02 And I don’t wanna go into too much of the detail

00:28:06 about how often, but there’s a tendency

00:28:10 to rewrite it every 10 years,

00:28:12 and it really should be every five.

00:28:15 So you’re saying you’re an outlier in that sense.

00:28:17 Rewrite more often.

00:28:19 Rewrite more often.

00:28:20 Well, and here’s the problem.

00:28:20 Isn’t that scary?

00:28:22 Yeah, of course.

00:28:23 Well, scary to who?

00:28:25 To everybody involved, because like you said,

00:28:28 repeating the recipe is efficient.

00:28:30 Companies wanna make money.

00:28:34 No, individual engineers wanna succeed,

00:28:36 so you wanna incrementally improve,

00:28:39 increase the buffer from three to four.

00:28:41 Well, this is where you get

00:28:42 into the diminishing return curves.

00:28:45 I think Steve Jobs said this, right?

00:28:46 So every, you have a project, and you start here,

00:28:49 and it goes up, and you have diminishing return.

00:28:52 And to get to the next level, you have to do a new one,

00:28:54 and the initial starting point will be lower

00:28:57 than the old optimization point, but it’ll get higher.

00:29:01 So now you have two kinds of fear,

00:29:03 short term disaster and long term disaster.

00:29:07 And you’re, you’re haunted.

00:29:08 So grown ups, right, like, you know,

00:29:12 people with a quarter by quarter business objective

00:29:15 are terrified about changing everything.

00:29:17 And people who are trying to run a business

00:29:21 or build a computer for a long term objective

00:29:23 know that the short term limitations block them

00:29:27 from the long term success.

00:29:29 So if you look at leaders of companies

00:29:32 that had really good long term success,

00:29:35 every time they saw that they had to redo something, they did.

00:29:39 And so somebody has to speak up.

00:29:41 Or you do multiple projects in parallel,

00:29:43 like you optimize the old one while you build a new one.

00:29:46 But the marketing guys are always like,

00:29:48 promise me that the new computer

00:29:49 is faster on every single thing.

00:29:52 And the computer architect says,

00:29:53 well, the new computer will be faster on the average,

00:29:56 but there’s a distribution of results and performance,

00:29:59 and you’ll have some outliers that are slower.

00:30:01 And that’s very hard,

00:30:02 because they have one customer who cares about that one.

00:30:05 So speaking of the long term, for over 50 years now,

00:30:08 Moore’s Law has served, for me and millions of others,

00:30:12 as an inspiring beacon of what kind of amazing future

00:30:16 brilliant engineers can build.

00:30:18 Yep.

00:30:19 I’m just making your kids laugh all of today.

00:30:21 That was great.

00:30:23 So first, in your eyes, what is Moore’s Law,

00:30:27 if you could define for people who don’t know?

00:30:29 Well, the simple statement was, from Gordon Moore,

00:30:34 was double the number of transistors every two years.

00:30:37 Something like that.

00:30:39 And then my operational model is,

00:30:43 we increase the performance of computers

00:30:45 by two X every two or three years.

00:30:48 And it’s wiggled around substantially over time.

00:30:51 And also, in how we deliver, performance has changed.

00:30:55 But the foundational idea was

00:31:00 2x the transistors every two years.

00:31:02 The current cadence is something like,

00:31:05 they call it a shrink factor, like 0.6 every two years,

00:31:10 which is not 0.5.

00:31:11 But that’s referring strictly, again,

00:31:13 to the original definition of just.

00:31:15 A transistor count.

00:31:16 A shrink factor’s just getting them

00:31:18 smaller and smaller and smaller.

00:31:19 Well, it’s for a constant chip area.

00:31:21 If you make the transistors smaller by 0.6,

00:31:24 then you get one over 0.6 more transistors.
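
The arithmetic is easy to check. A rough sketch, assuming the 0.6 shrink applies to chip area as described above, so each two-year node gives 1/0.6 ≈ 1.67x the transistors, versus the 2x of an exact 0.5 shrink:

```python
# Compound transistor-density growth under the shrink cadence above:
# a 0.6 area shrink per two-year node gives 1/0.6 ~ 1.67x more
# transistors per chip, versus 2.0x for an ideal 0.5 shrink.

def density_after(nodes: int, shrink: float = 0.6) -> float:
    """Relative transistor density after `nodes` two-year generations."""
    return (1.0 / shrink) ** nodes

for nodes in range(1, 6):
    print(f"{2 * nodes:2d} years: {density_after(nodes):6.2f}x "
          f"(vs {2.0 ** nodes:4.0f}x at a 0.5 shrink)")
```

Over a decade (five nodes) that compounds to roughly 13x, rather than the 32x an exact halving would give.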

00:31:27 So can you linger on it a little longer?

00:31:29 What’s a broader, what do you think should be

00:31:31 the broader definition of Moore’s Law?

00:31:33 When you mentioned how you think of performance,

00:31:37 just broadly, what’s a good way to think about Moore’s Law?

00:31:42 Well, first of all, I’ve been aware

00:31:45 of Moore’s Law for 30 years.

00:31:48 In which sense?

00:31:49 Well, I’ve been designing computers for 40.

00:31:52 You’re just watching it before your eyes kind of thing.

00:31:56 And somewhere where I became aware of it,

00:31:58 I was also informed that Moore’s Law

00:31:59 was gonna die in 10 to 15 years.

00:32:02 And then I thought that was true at first.

00:32:03 But then after 10 years, it was gonna die in 10 to 15 years.

00:32:07 And then at one point, it was gonna die in five years.

00:32:09 And then it went back up to 10 years.

00:32:11 And at some point, I decided not to worry

00:32:13 about that particular prognostication

00:32:16 for the rest of my life, which is fun.

00:32:19 And then I joined Intel and everybody said

00:32:21 Moore’s Law is dead.

00:32:22 And I thought that’s sad,

00:32:23 because it’s the Moore’s Law company.

00:32:25 And it’s not dead.

00:32:26 And it’s always been gonna die.

00:32:29 And humans like these apocalyptic kinds of statements,

00:32:33 like we’ll run out of food, or we’ll run out of air,

00:32:36 or we’ll run out of room, or we’ll run out of something.

00:32:39 Right, but it’s still incredible

00:32:41 that it’s lived for as long as it has.

00:32:44 And yes, there’s many people who believe now

00:32:47 that Moore’s Law is dead.

00:32:50 You know, they can join the last 50 years

00:32:52 of people who had the same idea.

00:32:53 Yeah, there’s a long tradition.

00:32:55 But why do you think, if you can try to understand it,

00:33:00 why do you think it’s not dead?

00:33:03 Well, let’s just think, people think Moore’s Law

00:33:06 is one thing, transistors get smaller.

00:33:09 But actually, under the sheet,

00:33:10 there’s literally thousands of innovations.

00:33:12 And almost all those innovations

00:33:14 have their own diminishing return curves.

00:33:17 So if you graph it, it looks like a cascade

00:33:19 of diminishing return curves.

00:33:21 I don’t know what to call that.

00:33:22 But the result is an exponential curve.

00:33:26 Well, at least it has been.

00:33:27 So, and we keep inventing new things.

00:33:30 So if you’re an expert in one of the things

00:33:32 on a diminishing return curve, right,

00:33:35 and you can see its plateau,

00:33:38 you will probably tell people, well, this is done.

00:33:42 Meanwhile, some other pile of people

00:33:43 are doing something different.

00:33:46 So that’s just normal.
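
That cascade can be sketched numerically. This is a toy model only, with illustrative curve shapes, start times, and heights that don't correspond to any real technology: each innovation is a logistic S-curve that plateaus, but successive curves start later and contribute more, and their sum keeps growing roughly exponentially.

```python
import math

# Toy model of stacked diminishing-return curves: each "innovation"
# is a logistic S-curve that plateaus, but successive curves start
# later and contribute more, so their sum keeps growing exponentially.

def s_curve(t: float, start: float, height: float, rate: float = 1.5) -> float:
    """One innovation: ramps up after `start`, then flattens out."""
    return height / (1.0 + math.exp(-rate * (t - start - 2.0)))

def stacked(t: float, generations: int = 12) -> float:
    """Sum of staggered S-curves: curve g starts at year 2g, peaks at 2^g."""
    return sum(s_curve(t, start=2.0 * g, height=2.0 ** g)
               for g in range(generations))

for year in range(0, 17, 4):
    print(f"year {year:2d}: capability ~{stacked(year):9.1f}")
```

Any single curve flattens out, but the envelope of the sum tracks an exponential for as long as new curves keep arriving.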

00:33:48 So then there’s the observation of

00:33:50 how small could a switching device be?

00:33:54 So a modern transistor is something like

00:33:55 a thousand by a thousand by a thousand atoms, right?

00:33:59 And you get quantum effects down around two to 10 atoms.

00:34:04 So you can imagine the transistor

00:34:06 as small as 10 by 10 by 10.

00:34:08 So that’s a million times smaller.
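
The "million times" figure follows directly from those rough numbers: shrinking each dimension from roughly 1,000 atoms to roughly 10 atoms is 100x per side, and 100 cubed is a million in volume.

```python
# Volume comparison behind the "million times smaller" estimate:
# ~1000 atoms per side today versus ~10 atoms at the quantum floor.

big_transistor = 1000 ** 3   # ~1e9 atoms (rough order of magnitude)
small_transistor = 10 ** 3   # ~1e3 atoms

ratio = big_transistor // small_transistor
print(f"{ratio:,}x smaller by volume")   # 1,000,000x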

00:34:12 And then the quantum computational people

00:34:14 are working away at how to use quantum effects.

00:34:17 So.

00:34:20 A thousand by a thousand by a thousand.

00:34:21 Atoms.

00:34:23 That’s a really clean way of putting it.

00:34:26 Well, the fin of a modern transistor,

00:34:28 if you look at the fin, it’s like 120 atoms wide,

00:34:32 but we can make that thinner.

00:34:33 And then there’s a gate wrapped around it,

00:34:35 and then there’s spacing.

00:34:36 There’s a whole bunch of geometry.

00:34:38 And a competent transistor designer

00:34:42 could count the atoms in every single direction.

00:34:48 Like there’s techniques now to already put down atoms

00:34:50 in a single atomic layer.

00:34:53 And you can place atoms if you want to.

00:34:55 It’s just from a manufacturing process,

00:34:59 if placing an atom takes 10 minutes

00:35:01 and you need to put 10 to the 23rd atoms together

00:35:05 to make a computer, it would take a long time.

00:35:08 So the methods are both shrinking things

00:35:13 and then coming up with effective ways

00:35:15 to control what’s happening.

00:35:17 Manufacture stably and cheaply.

00:35:20 Yeah.

00:35:21 So the innovation stack’s pretty broad.

00:35:23 There’s equipment, there’s optics, there’s chemistry,

00:35:26 there’s physics, there’s material science,

00:35:29 there’s metallurgy, there’s lots of ideas

00:35:31 about when you put different materials together,

00:35:33 how do they interact, are they stable,

00:35:35 is it stable over temperature, like are they repeatable?

00:35:40 There’s like literally thousands of technologies involved.

00:35:45 But just for the shrinking, you don’t think

00:35:46 we’re quite yet close to the fundamental limits of physics?

00:35:50 I did a talk on Moore’s Law and I asked for a roadmap

00:35:53 to a path of 100 and after two weeks,

00:35:56 they said we only got to 50.

00:35:58 100 what, sorry?

00:35:59 100 X shrink.

00:36:00 100 X shrink?

00:36:01 We only got to 50.

00:36:02 And I said, why don’t you give it another two weeks?

00:36:05 Well, here’s the thing about Moore’s Law, right?

00:36:09 So I believe that the next 10 or 20 years

00:36:14 of shrinking is gonna happen, right?

00:36:16 Now, as a computer designer, you have two stances.

00:36:20 You think it’s going to shrink, in which case

00:36:23 you’re designing and thinking about architecture

00:36:26 in a way that you’ll use more transistors.

00:36:29 Or conversely, not be swamped by the complexity

00:36:32 of all the transistors you get, right?

00:36:36 You have to have a strategy, you know?

00:36:39 So you’re open to the possibility and waiting

00:36:42 for the possibility of a whole new army

00:36:44 of transistors ready to work.

00:36:45 I’m expecting more transistors every two or three years

00:36:50 by a number large enough that how you think about design,

00:36:54 how you think about architecture has to change.

00:36:57 Like, imagine you build buildings out of bricks,

00:37:01 and every year the bricks are half the size,

00:37:04 or every two years.

00:37:05 Well, if you kept building bricks the same way,

00:37:08 so many bricks per person per day,

00:37:11 the amount of time to build a building

00:37:13 would go up exponentially, right?

00:37:16 But if you said, I know that’s coming,

00:37:19 so now I’m gonna design equipment that moves bricks faster,

00:37:22 uses them better, because maybe you’re getting something

00:37:24 out of the smaller bricks, more strength, thinner walls,

00:37:27 you know, less material, efficiency out of that.
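
The brick analogy can be made concrete. A sketch with purely illustrative numbers: the wall is a fixed size, brick edge length halves each generation (so each brick has 1/8 the volume), and the crew lays a constant number of bricks per day.

```python
# Brick analogy from above: fixed wall, bricks whose edge halves each
# generation (so 1/8 the volume), constant laying rate. Build time
# grows 8x per generation unless the process changes too.

WALL_VOLUME = 1_000_000      # arbitrary units
BRICKS_PER_DAY = 1_000       # constant crew productivity

def build_days(generation: int, brick_edge: float = 10.0) -> float:
    edge = brick_edge / (2 ** generation)   # edge halves each generation
    bricks = WALL_VOLUME / edge ** 3        # 8x more bricks each time
    return bricks / BRICKS_PER_DAY

for g in range(4):
    print(f"generation {g}: {build_days(g):6.0f} days")
```

Keeping the same process, build time is 8x longer each generation; the only way out is to change the equipment and methods along with the bricks, which is the point of the analogy.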

00:37:30 So once you have a roadmap with what’s gonna happen,

00:37:33 transistors, we’re gonna get more of them,

00:37:36 then you design all this collateral around it

00:37:38 to take advantage of it, and also to cope with it.

00:37:42 Like, that’s the thing people don’t understand.

00:37:43 It’s like, if I didn’t believe in Moore’s Law,

00:37:46 and then Moore’s Law transistors showed up,

00:37:48 my design teams would all drown.

00:37:50 So what’s the hardest part of this inflow

00:37:56 of new transistors?

00:37:57 I mean, even if you just look historically,

00:37:59 throughout your career, what’s the thing,

00:38:03 what fundamentally changes when you add more transistors

00:38:06 in the task of designing an architecture?

00:38:10 Well, there’s two constants, right?

00:38:12 One is people don’t get smarter.

00:38:16 By the way, there’s some science showing

00:38:17 that we do get smarter because of nutrition or whatever.

00:38:21 Sorry to bring that up.

00:38:22 Blend effect.

00:38:22 Yes.

00:38:23 Yeah, I’m familiar with it.

00:38:24 Nobody understands it, nobody knows if it’s still going on.

00:38:26 So that’s a…

00:38:27 Or whether it’s real or not.

00:38:28 But yeah, it’s a…

00:38:30 I sort of…

00:38:31 Anyway, but not exponentially.

00:38:32 I would believe for the most part,

00:38:33 people aren’t getting much smarter.

00:38:35 The evidence doesn’t support it, that’s right.

00:38:37 And then teams can’t grow that much.

00:38:40 Right.

00:38:40 Right, so human beings, you know,

00:38:43 we’re really good in teams of 10,

00:38:45 you know, up to teams of 100, they can know each other.

00:38:48 Beyond that, you have to have organizational boundaries.

00:38:50 So you’re kind of, you have,

00:38:51 those are pretty hard constraints, right?

00:38:54 So then you have to divide and conquer,

00:38:56 like as the designs get bigger,

00:38:57 you have to divide it into pieces.

00:39:00 You know, the power of abstraction layers is really high.

00:39:03 We used to build computers out of transistors.

00:39:06 Now we have a team that turns transistors into logic cells

00:39:08 and another team that turns them into functional units,

00:39:10 another one that turns them into computers, right?

00:39:13 So we have abstraction layers in there

00:39:16 and you have to think about when do you shift gears on that.

00:39:21 We also use faster computers to build faster computers.

00:39:24 So some algorithms run twice as fast on new computers,

00:39:27 but a lot of algorithms are N squared.

00:39:30 So, you know, a computer with twice as many transistors

00:39:33 and it might take four times as long to run.

00:39:36 So you have to refactor the software.

00:39:39 Like simply using faster computers

00:39:41 to build bigger computers doesn’t work.

00:39:44 So you have to think about all these things.
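
The N-squared point can be sketched with a toy cost model. The block size and the hierarchical cost formula here are illustrative assumptions, not any real tool's behavior: a flat all-pairs pass over a design quadruples when the design doubles, while partitioning the design into fixed-size blocks keeps the growth near linear, which is one way the software gets refactored.

```python
import math

# Toy cost model for the refactoring point above: a flat O(N^2) pass
# over a design quadruples when transistor count doubles, so even a
# 2x-faster computer falls behind; a divide-and-conquer pass scales
# roughly linearly.

def flat_cost(n: int) -> int:
    """All-pairs work over n transistors: O(n^2)."""
    return n * n

def hierarchical_cost(n: int, block: int = 1_000) -> int:
    """Partition into fixed blocks, solve each, stitch: ~O(n * block)."""
    blocks = math.ceil(n / block)
    return blocks * block * block + n   # per-block work plus stitching

for n in (10_000, 20_000, 40_000):
    print(f"N={n:6,}: flat {flat_cost(n):>13,}   "
          f"hierarchical {hierarchical_cost(n):>11,}")
```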

00:39:46 So in terms of computing performance

00:39:47 and the exciting possibility

00:39:49 that more powerful computers bring,

00:39:51 is shrinking the thing which you’ve been talking about,

00:39:57 for you, one of the biggest exciting possibilities

00:39:59 of advancement in performance?

00:40:01 Or is there other directions that you’re interested in,

00:40:03 like in the direction of sort of enforcing given parallelism

00:40:08 or like doing massive parallelism

00:40:12 in terms of many, many CPUs,

00:40:15 you know, stacking CPUs on top of each other,

00:40:17 that kind of parallelism or any kind of parallelism?

00:40:20 Well, think about it a different way.

00:40:22 So old computers, you know, slow computers,

00:40:25 you said A equal B plus C times D, pretty simple, right?

00:40:30 And then we made faster computers with vector units

00:40:33 and you can do proper equations and matrices, right?

00:40:38 And then modern like AI computations

00:40:41 or like convolutional neural networks,

00:40:43 where you convolve one large data set against another.

00:40:47 And so there’s sort of this hierarchy of mathematics,

00:40:51 you know, from simple equation to linear equations,

00:40:54 to matrix equations, to deeper kind of computation.
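
That ladder, from scalar arithmetic up to convolutions, can be written out in miniature (the specific numbers are arbitrary illustrations):

```python
# The ladder of computation described above, in miniature:
# scalar -> vector -> matrix -> convolution.

# Scalar: A = B + C * D
a = 2.0 + 3.0 * 4.0

# Vector: the same operation elementwise, what a vector unit does
b, c, d = [1, 2, 3], [4, 5, 6], [7, 8, 9]
v = [bi + ci * di for bi, ci, di in zip(b, c, d)]

# Matrix: row-by-column sums, linear algebra
M = [[1, 2], [3, 4]]
N = [[5, 6], [7, 8]]
mm = [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]

# Convolution: slide a small weight set across a larger signal,
# the core operation of a convolutional neural network
signal = [0, 1, 2, 3, 4, 5]
kernel = [1, 0, -1]
conv = [sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(len(signal) - len(kernel) + 1)]

print(a, v, mm, conv)
```

Each rung reuses the rung below it: the convolution is a loop of dot products, the dot product a loop of multiply-adds.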

00:40:58 And the data sets are getting so big

00:41:00 that people are thinking of data as a topology problem.

00:41:04 You know, data is organized in some immense shape.

00:41:07 And then the computation sort of wants to

00:41:11 get data from that immense shape and do some computation on it.

00:41:15 So what computers have allowed people to do

00:41:18 is have algorithms go much, much further.

00:41:22 So that paper you reference, the Sutton paper,

00:41:26 they talked about, you know, like when AI started,

00:41:29 it was apply rule sets to something.

00:41:31 That’s a very simple computational situation.

00:41:35 And then when they did first chess thing,

00:41:37 they solved deep searches.

00:41:39 So have a huge database of moves and results, deep search,

00:41:44 but it’s still just a search, right?

00:41:48 Now we take large numbers of images

00:41:51 and we use it to train these weight sets

00:41:54 that we convolve across.

00:41:56 It’s a completely different kind of phenomena.

00:41:58 We call that AI.

00:41:59 Now they’re doing the next generation.

00:42:02 And if you look at it,

00:42:03 they’re going up this mathematical graph, right?

00:42:07 And then computations, both computation and data sets

00:42:11 support going up that graph.

00:42:13 Yeah, the kind of computation that might,

00:42:15 I mean, I would argue that all of it is still a search,

00:42:18 right?

00:42:20 Just like you said, a topology problem with data sets,

00:42:22 you’re searching the data sets for valuable data

00:42:27 and also the actual optimization of neural networks

00:42:30 is a kind of search for the…

00:42:33 I don’t know, if you had looked at the interlayers

00:42:34 of finding a cat, it’s not a search.

00:42:39 It’s a set of endless projections.

00:42:41 So, you know, a projection,

00:42:42 here’s a shadow of this phone, right?

00:42:45 And then you can have a shadow of that on the something

00:42:47 and a shadow on that of something.

00:42:49 And if you look in the layers, you’ll see

00:42:51 this layer actually describes pointy ears

00:42:53 and round eyeness and fuzziness.

00:42:56 But the computation to tease out the attributes

00:43:02 is not search.

00:43:03 Like the inference part might be search,

00:43:05 but the training’s not search.

00:43:07 And then in deep networks, they look at layers

00:43:10 and they don’t even know it’s represented.

00:43:14 And yet, if you take the layers out, it doesn’t work.

00:43:16 So I don’t think it’s search.

00:43:18 But you’d have to talk to a mathematician

00:43:21 about what that actually is.

00:43:22 Well, we could disagree, but it’s just semantics,

00:43:27 I think, it’s not, but it’s certainly not…

00:43:29 I would say it’s absolutely not semantics, but…

00:43:31 Okay, all right, well, if you want to go there.

00:43:37 So optimization to me is search,

00:43:39 and we’re trying to optimize the ability

00:43:42 of a neural network to detect cat ears.

00:43:45 And the difference between chess and the space,

00:43:51 the incredibly multidimensional,

00:43:54 100,000 dimensional space that neural networks

00:43:57 are trying to optimize over is nothing like

00:44:00 the chessboard database.

00:44:02 So it’s a totally different kind of thing.

00:44:04 And okay, in that sense, you can say it loses the meaning.

00:44:07 I can see how you might say, if you…

00:44:11 The funny thing is, it’s the difference

00:44:12 between given search space and found search space.

00:44:16 Right, exactly.

00:44:17 Yeah, maybe that’s a different way to describe it.

00:44:18 That’s a beautiful way to put it, okay.

00:44:19 But you’re saying, what’s your sense

00:44:21 in terms of the basic mathematical operations

00:44:24 and the architectures, computer hardware

00:44:27 that enables those operations?

00:44:29 Do you see the CPUs of today still being

00:44:33 a really core part of executing

00:44:36 those mathematical operations?

00:44:37 Yes.

00:44:38 Well, the operations continue to be add, subtract,

00:44:42 load, store, compare, and branch.

00:44:44 It’s remarkable.

00:44:46 So it’s interesting, the building blocks

00:46:48 of computers are transistors; under that, atoms.

00:44:52 So you got atoms, transistors, logic gates, computers,

00:44:56 functional units of computers.

00:44:58 The building blocks of mathematics at some level

00:45:01 are things like adds and subtracts and multiplies,

00:45:04 but the space mathematics can describe

00:45:08 is, I think, essentially infinite.

00:45:11 But the computers that run the algorithms

00:45:14 are still doing the same things.

00:45:16 Now, a given algorithm might say, I need sparse data,

00:45:20 or I need 32 bit data, or I need, you know,

00:45:24 like a convolution operation that naturally takes

00:45:27 eight bit data, multiplies it, and sums it up a certain way.

00:45:31 So like the data types in TensorFlow

00:45:35 imply an optimization set.

00:45:38 But when you go right down and look at the computers,

00:45:40 it’s AND and OR gates doing adds and multiplies.

00:45:42 Like that hasn’t changed much.
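
A sketch of the 8-bit convolution arithmetic mentioned above: int8 operands multiplied and summed into a wide accumulator, the pattern a quantized dot-product or convolution datapath implements. The clamping ranges here are the usual signed int8 conventions, not any specific chip's datasheet.

```python
# Quantized multiply-accumulate: int8 x int8 products summed in a
# wide accumulator, which is what dedicated 8-bit datapaths build
# out of the same adders and multipliers.

def clamp_i8(x: int) -> int:
    """Saturate to the signed 8-bit range [-128, 127]."""
    return max(-128, min(127, x))

def int8_dot(a, b):
    """Dot product of int8 vectors, summed in a wide accumulator."""
    acc = 0                                  # wide (int32-style) accumulator
    for ai, bi in zip(a, b):
        acc += clamp_i8(ai) * clamp_i8(bi)   # int8 x int8 products
    return acc

print(int8_dot([127, -128, 5], [2, 2, 2]))   # 254 - 256 + 10 = 8
```

Narrow operands with a wide accumulator is the trade the data types in a framework like TensorFlow are signaling to the hardware underneath.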

00:45:46 Now, the quantum researchers think

00:45:48 they’re going to change that radically,

00:45:50 and then there’s people who think about analog computing

00:45:52 because you look in the brain, and it

00:45:53 seems to be more analogish.

00:45:55 You know, that maybe there’s a way to do that more

00:45:58 efficiently.

00:45:59 But we have a million X on computation,

00:46:03 and I don’t know the relationship

00:46:07 between computational, let’s say,

00:46:09 intensity and ability to hit mathematical abstractions.

00:46:15 I don’t know any way to describe that, but just like you saw

00:46:19 in AI, you went from rule sets to simple search

00:46:23 to complex search to, say, found search.

00:46:26 Like those are orders of magnitude more computation

00:46:30 to do.

00:46:31 And as we get the next two orders of magnitude,

00:46:34 like a friend, Raja Koduri, said,

00:46:36 like every order of magnitude changes the computation.

00:46:40 Fundamentally changes what the computation is doing.

00:46:42 Yeah.

00:46:44 Oh, you know the expression the difference in quantity

00:46:46 is the difference in kind.

00:46:49 You know, the difference between ant and anthill, right?

00:46:53 Or neuron and brain.

00:46:56 You know, there’s this indefinable place

00:46:58 where the quantity changed the quality, right?

00:47:02 And we’ve seen that happen in mathematics multiple times,

00:47:05 and you know, my guess is it’s going to keep happening.

00:47:08 So your sense is, yeah, if you focus head down

00:47:12 and shrinking the transistor.

00:47:14 Well, it’s not just head down, we’re aware of the software

00:47:18 stacks that are running in the computational loads,

00:47:20 and we’re kind of pondering what do you

00:47:22 do with a petabyte of memory that wants

00:47:24 to be accessed in a sparse way and have, you know,

00:47:28 the kind of calculations AI programmers want.

00:47:32 So there’s a dialogue interaction,

00:47:34 but when you go in the computer chip,

00:47:38 you know, you find adders and subtractors and multipliers.

00:47:43 So if you zoom out then, as you mentioned Rich Sutton,

00:47:46 the idea that most of the development in the last many

00:47:50 decades in AI research came from just leveraging computation

00:47:54 and just simple algorithms waiting for the computation

00:47:59 to improve.

00:48:00 Well, software guys have a thing that they call it

00:48:03 the problem of early optimization.

00:48:07 So you write a big software stack,

00:48:09 and if you start optimizing like the first thing you write,

00:48:12 the odds of that being the performance limiter is low.

00:48:15 But when you get the whole thing working,

00:48:17 can you make it 2x faster by optimizing the right things?

00:48:19 Sure.

00:48:21 While you’re optimizing that, could you

00:48:22 have written a new software stack, which

00:48:24 would have been a better choice?

00:48:26 Maybe.

00:48:27 Now you have creative tension.

00:48:29 So.

00:48:30 But the whole time as you’re doing the writing,

00:48:33 that’s the software we’re talking about.

00:48:34 The hardware underneath gets faster and faster.

00:48:36 Well, this goes back to the Moore’s law.

00:48:38 If Moore’s law is going to continue, then your AI research

00:48:43 should expect that to show up, and then you

00:48:46 make a slightly different set of choices then.

00:48:48 We’ve hit the wall.

00:48:49 Nothing’s going to happen.

00:48:51 And from here, it’s just us rewriting algorithms.

00:48:55 That seems like a failed strategy for the last 30

00:48:57 years of Moore’s law’s death.

00:49:00 So can you just linger on it?

00:49:03 I think you’ve answered it, but I’ll just

00:49:05 ask the same dumb question over and over.

00:49:06 So why do you think Moore’s law is not going to die?

00:49:12 Which is the most promising, exciting possibility

00:49:15 of why it won’t die in the next 5, 10 years?

00:49:17 So is it the continued shrinking of the transistor,

00:49:20 or is it another S curve that steps in and it totally sort

00:49:25 of matches up?

00:49:26 Shrinking the transistor is literally

00:49:28 thousands of innovations.

00:49:30 Right, so there’s stacks of S curves in there.

00:49:33 There’s a whole bunch of S curves just kind

00:49:35 of running their course and being reinvented

00:49:38 and new things.

00:49:41 The semiconductor fabricators and technologists have all

00:49:45 announced what’s called nanowires.

00:49:47 So they took a fin, which had a gate around it,

00:49:51 and turned that into little wires

00:49:52 so you have better control of that, and they’re smaller.

00:49:55 And then from there, there are some obvious steps

00:49:57 about how to shrink that.

00:49:59 The metallurgy around wire stacks and stuff

00:50:03 has very obvious abilities to shrink.

00:50:07 And there’s a whole combination of things there to do.

00:50:11 Your sense is that we’re going to get a lot

00:50:13 if this innovation performed just that, shrinking.

00:50:16 Yeah, like a factor of 100 is a lot.

00:50:19 Yeah, I would say that’s incredible.

00:50:22 And it’s totally unknown.

00:50:23 It’s only 10 or 15 years.

00:50:25 Now, you’re smarter, you might know,

00:50:26 but to me it’s totally unpredictable

00:50:28 what that 100x would bring in terms

00:50:30 of the nature of the computation that people would be doing.

00:50:34 Yeah, are you familiar with Bell’s law?

00:50:37 So for a long time, it was mainframes, minis, workstation,

00:50:40 PC, mobile.

00:50:42 Moore’s law drove faster, smaller computers.

00:50:46 And then when we were thinking about Moore’s law,

00:50:49 Raja Koduri said, every 10x generates a new computation.

00:50:53 So scalar, vector, matrix, topological computation.

00:51:01 And if you go look at the industry trends,

00:51:03 there was mainframes, and then minicomputers, and then PCs,

00:51:07 and then the internet took off.

00:51:08 And then we got mobile devices.

00:51:10 And now we’re building 5G wireless

00:51:12 with one millisecond latency.

00:51:14 And people are starting to think about the smart world

00:51:17 where everything knows you, recognizes you.

00:51:23 The transformations are going to be unpredictable.

00:51:27 How does it make you feel that you’re

00:51:29 one of the key architects of this kind of future?

00:51:35 So we’re not talking about the architects

00:51:37 of the high level people who build the Angry Birds apps,

00:51:42 and Snapchat.

00:51:43 Angry Birds apps.

00:51:44 Who knows?

00:51:45 Maybe that’s the whole point of the universe.

00:51:47 I’m going to take a stand at that,

00:51:48 and at the attention-distracting nature of mobile phones.

00:51:52 I’ll take a stand.

00:51:53 But anyway, in terms of the side effects of smartphones,

00:52:01 or the attention distraction, which part?

00:52:03 Well, who knows where this is all leading?

00:52:06 It’s changing so fast.

00:52:08 My parents used to yell at my sisters

00:52:09 for hiding in the closet with a wired phone with a dial on it.

00:52:13 Stop talking to your friends all day.

00:52:15 Now my wife yells at my kids for talking to their friends

00:52:18 all day on text.

00:52:20 It looks the same to me.

00:52:21 It’s always echoes of the same thing.

00:52:23 But you are one of the key people

00:52:26 architecting the hardware of this future.

00:52:29 How does that make you feel?

00:52:30 Do you feel responsible?

00:52:33 Do you feel excited?

00:52:36 So we’re in a social context.

00:52:38 So there’s billions of people on this planet.

00:52:40 There are literally millions of people working on technology.

00:52:45 I feel lucky to be doing what I do and getting paid for it,

00:52:50 and there’s an interest in it.

00:52:52 But there’s so many things going on in parallel.

00:52:56 The actions are so unpredictable.

00:52:58 If I wasn’t here, somebody else would do it.

00:53:01 The vectors of all these different things

00:53:03 are happening all the time.

00:53:06 You know, there’s a, I’m sure, some philosopher

00:53:10 or metaphilosopher is wondering about how

00:53:12 we transform our world.

00:53:16 So you can’t deny the fact that these tools are

00:53:22 changing our world.

00:53:24 That’s right.

00:53:25 Do you think it’s changing for the better?

00:53:29 I read this thing recently.

00:53:31 It said the two disciplines with the highest GRE scores in college

00:53:36 are physics and philosophy.

00:53:39 And they’re both sort of trying to answer the question,

00:53:41 why is there anything?

00:53:43 And the philosophers are on the kind of theological side,

00:53:47 and the physicists are obviously on the material side.

00:53:52 And there’s 100 billion galaxies with 100 billion stars.

00:53:56 It seems, well, repetitive at best.

00:54:01 So you know, there’s on our way to 10 billion people.

00:54:06 I mean, it’s hard to say what it’s all for,

00:54:08 if that’s what you’re asking.

00:54:09 Yeah, I guess I am.

00:54:11 Things do tend to significantly increase in complexity.

00:54:16 And I’m curious about how computation,

00:54:21 like our physical world inherently

00:54:24 generates mathematics.

00:54:25 It’s kind of obvious, right?

00:54:26 So we have x, y, z coordinates.

00:54:28 You take a sphere, you make it bigger.

00:54:30 You get a surface that grows by r squared.

00:54:34 Like, it generally generates mathematics.

00:54:36 And the mathematicians and the physicists

00:54:38 have been having a lot of fun talking to each other for years.

00:54:41 And computation has been, let’s say, relatively pedestrian.

00:54:46 Like, computation in terms of mathematics

00:54:48 has been doing Boolean algebra, while those guys have

00:54:52 been gallivanting through the other realms of possibility.

00:54:58 Now recently, the computation lets

00:55:01 you do mathematical computations that

00:55:04 are sophisticated enough that nobody understands

00:55:07 how the answers came out.

00:55:10 Machine learning.

00:55:10 Machine learning.

00:55:12 It used to be you get data set, you guess at a function.

00:55:16 The function is considered physics

00:55:18 if it’s predictive of new functions, new data sets.

00:55:23 Modern, you can take a large data set

00:55:28 with no intuition about what it is

00:55:29 and use machine learning to find a pattern that

00:55:31 has no function, right?

00:55:34 And it can arrive at results that I

00:55:37 don’t know if they’re completely mathematically describable.

00:55:39 So computation has kind of done something interesting compared

00:55:44 to a equal b plus c.

00:55:47 There’s something reminiscent of that step

00:55:49 from the basic operations of addition

00:55:54 to taking a step towards neural networks that’s

00:55:56 reminiscent of what life on Earth at its origins was doing.

00:56:01 Do you think we’re creating sort of the next step

00:56:03 in our evolution in creating artificial intelligence

00:56:06 systems that will?

00:56:07 I don’t know.

00:56:08 I mean, there’s so much in the universe already,

00:56:11 it’s hard to say.

00:56:12 Where we stand in this whole thing.

00:56:14 Are human beings working on additional abstraction

00:56:17 layers and possibilities?

00:56:18 Yeah, it appears so.

00:56:20 Does that mean that human beings don’t need dogs?

00:56:22 You know, no.

00:56:24 Like, there’s so many things that

00:56:26 are all simultaneously interesting and useful.

00:56:30 Well, you’ve seen, throughout your career,

00:56:32 you’ve seen greater and greater level abstractions built

00:56:35 in artificial machines, right?

00:56:39 Do you think, when you look at humans,

00:56:41 do you think of the whole of life on Earth

00:56:44 as a single organism building this thing,

00:56:46 this machine with greater and greater levels of abstraction?

00:56:49 Do you think humans are the peak,

00:56:52 the top of the food chain in this long arc of history

00:56:57 on Earth?

00:56:58 Or do you think we’re just somewhere in the middle?

00:57:00 Are we the basic functional operations of a CPU?

00:57:05 Are we the C++ program, the Python program,

00:57:09 or the neural network?

00:57:10 Like, somebody’s, you know, people

00:57:12 have calculated, like, how many operations does the brain do?

00:57:14 Something, you know, I’ve seen the number 10 to the 18th

00:57:17 a bunch of times, arrived at different ways.

00:57:20 So could you make a computer that

00:57:22 did 10 to the 20th operations?

00:57:23 Yes.

00:57:24 Sure.

00:57:24 Do you think?

00:57:25 We’re going to do that.

00:57:27 Now, is there something magical about how brains compute things?

00:57:31 I don’t know.

00:57:32 You know, my personal experience is interesting,

00:57:35 because, you know, you think you know how you think,

00:57:37 and then you have all these ideas,

00:57:39 and you can’t figure out how they happened.

00:57:41 And if you meditate, you know, what you can be aware of

00:57:47 is interesting.

00:57:48 So I don’t know if brains are magical or not.

00:57:51 You know, the physical evidence says no.

00:57:54 Lots of people’s personal experience says yes.

00:57:57 So what would be funny is if brains are magical,

00:58:01 and yet we can make brains with more computation.

00:58:04 You know, I don’t know what to say about that.

00:58:07 But do you think magic is an emergent phenomena?

00:58:11 Could be.

00:58:12 I have no explanation for it.

00:58:13 Let me ask Jim Keller of what in your view is consciousness?

00:58:19 Consciousness?

00:58:20 Yeah, like what, you know, consciousness, love,

00:58:25 things that are these deeply human things that

00:58:27 seems to emerge from our brain, is that something

00:58:30 that we’ll be able to encode in chips that get

00:58:36 faster and faster and faster and faster?

00:58:38 That’s like a 10 hour conversation.

00:58:40 Nobody really knows.

00:58:41 Can you summarize it in a couple of sentences?

00:58:45 Many people have observed that organisms run

00:58:48 at lots of different levels, right?

00:58:51 If you had two neurons, somebody said

00:58:52 you’d have one sensory neuron and one motor neuron, right?

00:58:56 So we move towards things and away from things.

00:58:58 And we have physical integrity and safety or not, right?

00:59:03 And then if you look at the animal kingdom,

00:59:05 you can see brains that are a little more complicated.

00:59:08 And at some point, there’s a planning system.

00:59:10 And then there’s an emotional system

00:59:11 that’s happy about being safe or unhappy about being threatened.

00:59:17 And then our brains have massive numbers of structures,

00:59:21 like planning and movement and thinking and feeling

00:59:25 and drives and emotions.

00:59:27 And we seem to have multiple layers of thinking systems.

00:59:31 And we have a dream system that nobody understands whatsoever,

00:59:35 which I find completely hilarious.

00:59:37 And you can think in a way that those systems are

00:59:44 more independent.

00:59:45 And the different parts of yourself

00:59:47 can observe them.

00:59:49 I don’t know which one’s magical.

00:59:51 I don’t know which one’s not computational.

00:59:55 So.

00:59:56 Is it possible that it’s all computation?

00:59:58 Probably.

01:00:00 Is there a limit to computation?

01:00:01 I don’t think so.

01:00:03 Do you think the universe is a computer?

01:00:06 It seems to be.

01:00:07 It’s a weird kind of computer.

01:00:09 Because if it was a computer, like when

01:00:13 they do calculations on how much calculation

01:00:16 it takes to describe quantum effects, it’s unbelievably high.

01:00:20 So if it was a computer, wouldn’t you

01:00:22 have built it out of something that was easier to compute?

01:00:26 That’s a funny system.

01:00:29 But then the simulation guys pointed out

01:00:31 that the rules are kind of interesting.

01:00:32 When you look really close, it’s uncertain.

01:00:35 And the speed of light says you can only look so far.

01:00:37 And things can’t be simultaneous,

01:00:39 except for the odd entanglement problem where they seem to be.

01:00:42 The rules are all kind of weird.

01:00:45 And somebody said physics is like having

01:00:47 50 equations with 50 variables to define 50 variables.

01:00:55 Physics itself has been a shit show for thousands of years.

01:00:59 It seems odd when you get to the corners of everything.

01:01:02 It’s either uncomputable or undefinable or uncertain.

01:01:07 It’s almost like the designers of the simulation

01:01:09 are trying to prevent us from understanding it perfectly.

01:01:12 But also, the things that require calculations

01:01:16 require so much calculation that our idea

01:01:18 of the universe of a computer is absurd,

01:01:20 because every single little bit of it

01:01:23 takes all the computation in the universe to figure out.

01:01:26 So that’s a weird kind of computer.

01:01:28 You say the simulation is running

01:01:29 in a computer, which has, by definition, infinite computation.

01:01:34 Not infinite.

01:01:35 Oh, you mean if the universe is infinite?

01:01:37 Yeah.

01:01:38 Well, every little piece of our universe

01:01:40 seems to take infinite computation to figure out.

01:01:43 Not infinite, just a lot.

01:01:44 Well, a lot.

01:01:44 Some pretty big number.

01:00:46 To compute this little teeny spot takes all the mass

01:01:50 in the local one light year by one light year space.

01:01:53 It’s close enough to infinite.

01:01:54 Well, it’s a heck of a computer if it is one.

01:01:56 I know.

01:01:57 It’s a weird description, because the simulation

01:02:01 description seems to break when you look closely at it.

01:02:04 But the rules of the universe seem to imply something’s up.

01:02:08 That seems a little arbitrary.

01:02:10 The universe, the whole thing, the laws of physics,

01:02:14 it just seems like, how did it come out to be the way it is?

01:02:20 Well, lots of people talk about that.

01:02:22 Like I said, the two smartest groups of humans

01:02:24 are working on the same problem.

01:02:26 From different aspects.

01:02:27 And they’re both complete failures.

01:02:29 So that’s kind of cool.

01:02:32 They might succeed eventually.

01:02:34 Well, after 2,000 years, the trend isn’t good.

01:02:37 Oh, 2,000 years is nothing in the span

01:02:39 of the history of the universe.

01:02:40 That’s for sure.

01:02:41 We have some time.

01:02:42 But the next 1,000 years doesn’t look good either.

01:02:46 That’s what everybody says at every stage.

01:02:48 But with Moore’s law, as you’ve just described,

01:02:50 not being dead, the exponential growth of technology,

01:02:54 the future seems pretty incredible.

01:02:57 Well, it’ll be interesting, that’s for sure.

01:02:59 That’s right.

01:03:00 So what are your thoughts on Ray Kurzweil’s sense

01:03:03 that exponential improvement in technology

01:03:05 will continue indefinitely?

01:03:07 Is that how you see Moore’s law?

01:03:09 Do you see Moore’s law more broadly,

01:03:12 in the sense that technology of all kinds

01:03:15 has a way of stacking S curves on top of each other,

01:03:20 where it’ll be exponential, and then we’ll see all kinds of…

01:03:24 What does an exponential of a million mean?

01:03:27 That’s a pretty amazing number.

01:03:29 And that’s just for a local little piece of silicon.

01:03:32 Now let’s imagine you, say, decided

01:03:35 to get 1,000 tons of silicon to collaborate in one computer

01:03:41 at a million times the density.

01:03:44 Now you’re talking, I don’t know, 10 to the 20th more

01:03:47 computation power than our current, already unbelievably

01:03:51 fast computers.

01:03:54 Nobody knows what that’s going to mean.

01:03:55 The sci-fi guys call it computronium,

01:03:58 like when a local civilization turns the nearby star

01:04:02 into a computer.

01:04:05 I don’t know if that’s true, but…

01:04:06 So just even when you shrink a transistor, the…

01:04:11 That’s only one dimension.

01:04:12 The ripple effects of that.

01:04:14 People tend to think about computers as a cost problem.

01:04:17 So computers are made out of silicon and minor amounts

01:04:20 of metals and this and that.

01:04:24 None of those things cost any money.

01:04:27 There’s plenty of sand.

01:04:30 You could just turn the beach and a little bit of ocean water

01:04:32 into computers.

01:04:33 So all the cost is in the equipment to do it.

01:04:36 And the trend on equipment is once you

01:04:39 figure out how to build the equipment,

01:04:40 the trend of cost is zero.

01:04:41 Elon said, first you figure out what

01:04:44 configuration you want the atoms in,

01:04:47 and then how to put them there.

01:04:50 His great insight is people are “how” constrained.

01:04:56 I have this thing, I know how it works,

01:04:58 and then little tweaks to that will generate something,

01:05:02 as opposed to what do I actually want,

01:05:05 and then figure out how to build it.

01:05:07 It’s a very different mindset.

01:05:09 And almost nobody has it, obviously.

01:05:12 Well, let me ask on that topic,

01:05:15 you were one of the key early people

01:05:18 in the development of autopilot, at least in the hardware

01:05:21 side, Elon Musk believes that autopilot

01:05:24 and vehicle autonomy, if you just look at that problem,

01:05:26 can follow this kind of exponential improvement.

01:05:29 In terms of the how question that we’re talking about,

01:05:32 there’s no reason why you can’t.

01:05:34 What are your thoughts on this particular space

01:05:37 of vehicle autonomy, and your part of it

01:05:42 and Elon Musk’s and Tesla’s vision for vehicle autonomy?

01:05:45 Well, the computer you need to build is straightforward.

01:05:48 And you could argue, well, does it need to be

01:05:51 two times faster or five times or 10 times?

01:05:54 But that’s just a matter of time or price in the short run.

01:05:58 So that’s not a big deal.

01:06:00 You don’t have to be especially smart to drive a car.

01:06:03 So it’s not like a super hard problem.

01:06:05 I mean, the big problem with safety is attention,

01:06:07 which computers are really good at, not skills.

01:06:11 Well, let me push back on one.

01:06:15 You see, everything you said is correct,

01:06:17 but we as humans tend to take for granted

01:06:24 how incredible our vision system is.

01:06:26 So you can drive a car with 20/50 vision,

01:06:30 and you can train a neural network to extract

01:06:33 the distance of any object and the shape of any surface

01:06:36 from a video and data.

01:06:38 Yeah, but that’s really simple.

01:06:40 No, it’s not simple.

01:06:42 That’s a simple data problem.

01:06:44 It’s not, it’s not simple.

01:06:46 It’s because it’s not just detecting objects,

01:06:50 it’s understanding the scene,

01:06:52 and it’s being able to do it in a way

01:06:54 that doesn’t make errors.

01:06:56 So the beautiful thing about the human vision system

01:07:00 and our entire brain around the whole thing

01:07:02 is we’re able to fill in the gaps.

01:07:05 It’s not just about perfectly detecting cars.

01:07:08 It’s inferring the occluded cars.

01:07:09 It’s trying to, it’s understanding the physics.

01:07:12 I think that’s mostly a data problem.

01:07:14 So you think it’s what, a data problem,

01:07:17 with improvement of computation

01:07:19 and improvement in collection of data?

01:07:20 Well, there is a, you know, when you’re driving a car

01:07:22 and somebody cuts you off, your brain has theories

01:07:24 about why they did it.

01:07:26 You know, they’re a bad person, they’re distracted,

01:07:28 they’re dumb, you know, you can listen to yourself, right?

01:07:32 So, you know, if you think that narrative is important

01:07:37 to be able to successfully drive a car,

01:07:38 then current autopilot systems can’t do it.

01:07:41 But if cars are ballistic things with tracks

01:07:44 and probabilistic changes of speed and direction,

01:07:47 and roads are fixed and given, by the way,

01:07:50 they don’t change dynamically, right?

01:07:53 You can map the world really thoroughly.

01:07:56 You can place every object really thoroughly.

01:08:01 Right, you can calculate trajectories

01:08:03 of things really thoroughly, right?

01:08:06 But everything you said about really thoroughly

01:08:09 has a different degree of difficulty, so.

01:08:13 And you could say at some point,

01:08:15 computer autonomous systems will be way better

01:08:17 at things that humans are lousy at.

01:08:20 Like, they’ll be better at attention,

01:08:22 they’ll always remember there was a pothole in the road

01:08:25 that humans keep forgetting about,

01:08:27 they’ll remember that this set of roads

01:08:29 has these weirdo lines on it

01:08:31 that the computers figured out once,

01:08:32 and especially if they get updates,

01:08:35 so if somebody changes a given,

01:08:38 like, the key to robots and stuff somebody said

01:08:41 is to maximize the givens, right?

01:08:44 Right.

01:08:45 So having a robot pick up this bottle cap

01:08:47 is way easier if you put a red dot on the top,

01:08:51 because then you won’t have to figure it out,

01:08:52 and if you wanna do a certain thing with it,

01:08:54 maximize the givens is the thing.

01:08:57 And autonomous systems are happily maximizing the givens.

01:09:01 Like, humans, when you drive someplace new,

01:09:04 you remember it, because you’re processing it

01:09:06 the whole time, and after the 50th time you drove to work,

01:09:08 you get to work, you don’t know how you got there, right?

01:09:11 You’re on autopilot, right?

01:09:14 Autonomous cars are always on autopilot.

01:09:17 But the cars have no theories about why they got cut off,

01:09:20 or why they’re in traffic.

01:09:22 So they also never stop paying attention.

01:09:24 Right, so I tend to believe you do have to have theories,

01:09:28 meta models of other people,

01:09:30 especially with pedestrian cyclists,

01:09:31 but also with other cars.

01:09:32 So everything you said is actually essential to driving.

01:09:38 Driving is a lot more complicated than people realize,

01:09:41 I think, so to push back slightly, but to…

01:09:44 So to cut into traffic, right?

01:09:46 Yep.

01:09:47 You can’t just wait for a gap,

01:09:48 you have to be somewhat aggressive.

01:09:50 You’ll be surprised how simple a calculation for that is.
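A hypothetical sketch of the kind of gap-acceptance calculation Keller may have in mind here. The function name, numbers, and safety margin are all invented for illustration; nothing in the conversation specifies an actual algorithm:

```python
def accept_gap(gap_m, closing_speed_mps, merge_time_s, margin_s=1.5):
    """Decide whether to cut into traffic: is the time until the
    approaching car arrives comfortably longer than the time needed
    to complete the merge? All thresholds are illustrative."""
    if closing_speed_mps <= 0:  # the other car is not approaching
        return True
    time_to_arrival = gap_m / closing_speed_mps
    return time_to_arrival > merge_time_s + margin_s

# A 60 m gap closing at 15 m/s gives 4 s; a 2 s merge plus 1.5 s margin fits.
print(accept_gap(gap_m=60, closing_speed_mps=15, merge_time_s=2.0))  # True
# A 30 m gap gives only 2 s, which does not.
print(accept_gap(gap_m=30, closing_speed_mps=15, merge_time_s=2.0))  # False
```

The point of the sketch is that the core decision reduces to a couple of divisions and a comparison, which is the sense in which the calculation is "simple"; the hard part, as the pushback below argues, is estimating those inputs reliably.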

01:09:53 I may be on that particular point,

01:09:55 but there’s, maybe I actually have to push back.

01:10:00 I would be surprised.

01:10:01 You know what, yeah, I’ll just say where I stand.

01:10:03 I would be very surprised,

01:10:04 but I think you might be surprised how complicated it is.

01:10:10 I tell people, progress disappoints in the short run,

01:10:12 and surprises in the long run.

01:10:13 It’s very possible, yeah.

01:10:15 I suspect in 10 years it’ll be just taken for granted.

01:10:19 Yeah, probably.

01:10:19 But you’re probably right, it won’t look like…

01:10:22 It’s gonna be a $50 solution that nobody cares about.

01:10:25 It’s like GPSes, like, wow, GPSes.

01:10:27 We have satellites in space

01:10:29 that tell you where your location is.

01:10:31 It was a really big deal, now everything has a GPS in it.

01:10:33 Yeah, that’s true, but I do think that systems

01:10:36 that involve human behavior are more complicated

01:10:39 than we give them credit for.

01:10:40 So we can do incredible things with technology

01:10:43 that don’t involve humans, but when you…

01:10:45 I think humans are less complicated than people,

01:10:48 you know, frequently ascribe.

01:10:50 Maybe I feel…

01:10:51 We tend to operate out of large numbers of patterns

01:10:53 and just keep doing it over and over.

01:10:55 But I can’t trust you because you’re a human.

01:10:58 That’s something a human would say.

01:11:00 But my hope is on the point you’ve made is,

01:11:04 even if, no matter who’s right,

01:11:08 I’m hoping that there’s a lot of things

01:11:10 that humans aren’t good at

01:11:11 that machines are definitely good at,

01:11:13 like you said, attention and things like that.

01:11:15 Well, they’ll be so much better

01:11:17 that the overall picture of safety and autonomy

01:11:21 will be, obviously cars will be safer,

01:11:22 even if they’re not as good at understanding.

01:11:24 I’m a big believer in safety.

01:11:26 I mean, there are already the current safety systems,

01:11:29 like cruise control that doesn’t let you run into people

01:11:32 and lane keeping.

01:11:33 There are so many features

01:11:34 that you just look at the parade of accidents

01:11:37 and knocking off like 80% of them is super doable.

01:11:42 Just to linger on the autopilot team

01:11:44 and the efforts there,

01:11:48 it seems to be that there’s a very intense scrutiny

01:11:51 by the media and the public in terms of safety,

01:11:54 the pressure, the bar put before autonomous vehicles.

01:11:58 What are your, sort of as a person there

01:12:01 working on the hardware and trying to build a system

01:12:03 that builds a safe vehicle and so on,

01:12:07 what was your sense about that pressure?

01:12:08 Is it unfair?

01:12:09 Is it expected of new technology?

01:12:12 Yeah, it seems reasonable.

01:12:13 I was interested, I talked to both American

01:12:15 and European regulators,

01:12:17 and I was worried that the regulations

01:12:21 would write into the rules technology solutions,

01:12:25 like modern brake systems imply hydraulic brakes.

01:12:30 So if you read the regulations,

01:12:32 to meet the letter of the law for brakes,

01:12:35 it sort of has to be hydraulic, right?

01:12:37 And the regulator said they’re interested in the use cases,

01:12:42 like a head on crash, an offset crash,

01:12:44 don’t hit pedestrians, don’t run into people,

01:12:47 don’t leave the road, don’t run a red light or a stoplight.

01:12:50 They were very much into the scenarios.

01:12:53 And they had all the data about which scenarios

01:12:56 injured or killed the most people.

01:12:59 And for the most part, those conversations were like,

01:13:04 what’s the right thing to do to take the next step?

01:13:08 Now, Elon’s very interested also in the benefits

01:13:12 of autonomous driving or freeing people’s time

01:13:14 and attention, as well as safety.

01:13:18 And I think that’s also an interesting thing,

01:13:20 but building autonomous systems so they’re safe

01:13:25 and safer than people seemed,

01:13:27 since the goal is to be 10X safer than people,

01:13:30 having the bar to be safer than people

01:13:32 and scrutinizing accidents seems philosophically correct.

01:13:39 So I think that’s a good thing.

01:13:41 What, as different from the things you worked on at

01:13:46 Intel, AMD, Apple, with autopilot chip design

01:13:51 and hardware design, what are interesting

01:13:54 or challenging aspects of building this specialized

01:13:56 kind of computing system in the automotive space?

01:14:00 I mean, there’s two tricks to building

01:14:01 like an automotive computer.

01:14:02 One is the software team, the machine learning team

01:14:07 is developing algorithms that are changing fast.

01:14:10 So as you’re building the accelerator,

01:14:14 you have this, you know, worry or intuition

01:14:16 that the algorithms will change enough

01:14:18 that the accelerator will be the wrong one, right?

01:14:22 And there’s the generic thing, which is,

01:14:25 if you build a really good general purpose computer,

01:14:27 say its performance is one, and then GPU guys

01:14:31 will deliver about 5X the performance

01:14:34 for the same amount of silicon,

01:14:35 because instead of discovering parallelism,

01:14:37 you’re given parallelism.

01:14:39 And then special accelerators get another two to 5X

01:14:43 on top of a GPU, because you say,

01:14:46 I know the math is always eight bit integers

01:14:49 into 32 bit accumulators, and the operations

01:14:52 are the subset of mathematical possibilities.

01:14:55 So AI accelerators have a claimed performance benefit

01:15:00 over GPUs because in the narrow math space,

01:15:05 you’re nailing the algorithm.

01:15:07 Now, you still try to make it programmable,

01:15:10 but the AI field is changing really fast.

01:15:13 So there’s a, you know, there’s a little

01:15:15 creative tension there of, I want the acceleration

01:15:18 afforded by specialization without being over specialized

01:15:22 so that the new algorithm is so much more effective

01:15:25 that you’d have been better off on a GPU.

01:15:27 So there’s a tension there.
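The specialization Keller describes, fixing the math to 8-bit integers accumulated into 32-bit registers, can be sketched in scalar form. This is an illustrative model of one multiply-accumulate lane, not any real accelerator's datapath:

```python
def int8_dot(a, b):
    """Dot product of two int8 vectors into a 32-bit accumulator.

    Inputs are checked against the int8 range [-128, 127]; each product
    then fits in 16 bits, and a 32-bit accumulator holds the running sum
    without overflow for any realistic vector length.
    """
    assert all(-128 <= x <= 127 for x in a + b), "inputs must fit in int8"
    acc = 0  # plays the role of the 32-bit accumulator
    for x, y in zip(a, b):
        acc += x * y  # int8 * int8 -> accumulate into the wide register
    return acc

print(int8_dot([127, -128, 5], [2, 2, 2]))  # 254 - 256 + 10 = 8
```

In hardware, many thousands of these fixed-width multiply-accumulate units run in parallel; committing to the datatypes ahead of time is what buys the claimed 2X to 5X over a GPU, and also what makes the accelerator fragile if the algorithms move to different math.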

01:15:30 To build a good computer for an application

01:15:33 like automotive, there’s all kinds of sensor inputs

01:15:36 and safety processors and a bunch of stuff.

01:15:39 So one of Elon’s goals is to make it super affordable.

01:15:42 So every car gets an autopilot computer.

01:15:44 So some of the recent startups you look at,

01:15:46 and they have a server in the trunk,

01:15:48 because they’re saying, I’m gonna build

01:15:49 this autopilot computer, replaces the driver.

01:15:52 So their cost budget’s 10 or $20,000.

01:15:55 And Elon’s constraint was, I’m gonna put one in every car,

01:15:58 whether people buy autonomous driving or not.

01:16:01 So the cost constraint he had in mind was great, right?

01:16:05 And to hit that, you had to think about the system design.

01:16:08 That’s complicated, and it’s fun.

01:16:09 You know, it’s like, it’s like, it’s craftsman’s work.

01:16:12 Like, you know, a violin maker, right?

01:16:14 You can say, Stradivarius is this incredible thing,

01:16:16 the musicians are incredible.

01:16:18 But the guy making the violin, you know,

01:16:20 picked wood and sanded it, and then he cut it,

01:16:24 you know, and he glued it, you know,

01:16:25 and he waited for the right day

01:16:27 so that when he put the finish on it,

01:16:29 it didn’t, you know, do something dumb.

01:16:31 That’s craftsman’s work, right?

01:16:33 You may be a genius craftsman

01:16:35 because you have the best techniques

01:16:36 and you discover a new one,

01:16:38 but most engineering is craftsman’s work.

01:16:41 And humans really like to do that.

01:16:44 You know the expression?

01:16:45 Smart humans.

01:16:45 No, everybody.

01:16:46 All humans.

01:16:47 I don’t know.

01:16:48 I used to, I dug ditches when I was in college.

01:16:50 I got really good at it.

01:16:51 Satisfying.

01:16:52 Yeah.

01:16:53 So.

01:16:54 Digging ditches is also craftsman’s work.

01:16:55 Yeah, of course.

01:16:56 So there’s an expression called complex mastery behavior.

01:17:00 So when you’re learning something,

01:17:02 that’s fine, because you’re learning something.

01:17:04 When you do something and it’s relatively simple,

01:17:05 it’s not that satisfying.

01:17:06 But if the steps that you have to do are complicated

01:17:10 and you’re good at them, it’s satisfying to do them.

01:17:14 And then if you’re intrigued by it all,

01:17:16 as you’re doing them, you sometimes learn new things

01:17:19 that you can raise your game.

01:17:21 But craftsman’s work is good.

01:17:23 And engineers, like engineering is complicated enough

01:17:27 that you have to learn a lot of skills.

01:17:28 And then a lot of what you do is then craftsman’s work,

01:17:32 which is fun.

01:17:33 Autonomous driving, building a very resource

01:17:37 constrained computer.

01:17:37 So a computer has to be cheap enough

01:17:39 to put in every single car.

01:17:41 That essentially boils down to craftsman’s work.

01:17:45 It’s engineering, it’s innovation.

01:17:45 Yeah, you know, there’s thoughtful decisions

01:17:47 and problems to solve and trade offs to make.

01:17:50 Do you need 10 camera ports or eight?

01:17:52 You know, you’re building for the current car

01:17:54 or the next one.

01:17:56 You know, how do you do the safety stuff?

01:17:57 You know, there’s a whole bunch of details.

01:18:00 But it’s fun.

01:18:01 It’s not like I’m building a new type of neural network,

01:18:04 which has new mathematics and needs a new computer to work.

01:18:08 You know, that’s like, there’s more invention than that.

01:18:12 But the reduction to practice,

01:18:14 once you pick the architecture, you look inside

01:18:16 and what do you see?

01:18:17 Adders and multipliers and memories and, you know,

01:18:20 the basics.

01:18:21 So computers are always this weird set of abstraction layers

01:18:25 of ideas and thinking that reduction to practice

01:18:29 is transistors and wires and, you know, pretty basic stuff.

01:18:33 And that’s an interesting phenomenon.

01:18:37 By the way, like factory work,

01:18:38 like lots of people think factory work

01:18:40 is rote assembly stuff.

01:18:42 I’ve been on the assembly line.

01:18:44 Like the people who work there really like it.

01:18:46 It’s a really great job.

01:18:47 It’s really complicated.

01:18:48 Putting cars together is hard, right?

01:18:50 And the car is moving and the parts are moving

01:18:53 and sometimes the parts are damaged

01:18:55 and you have to coordinate putting all the stuff together

01:18:57 and people are good at it.

01:18:59 They’re good at it.

01:19:00 And I remember one day I went to work

01:19:01 and the line was shut down for some reason

01:19:03 and some of the guys sitting around were really bummed

01:19:06 because they had reorganized a bunch of stuff

01:19:09 and they were gonna hit a new record

01:19:10 for the number of cars built that day.

01:19:12 And they were all gung ho to do it.

01:19:14 And these were big, tough buggers.

01:19:15 And, you know, but what they did was complicated

01:19:19 and you couldn’t do it.

01:19:20 Yeah, and I mean.

01:19:21 Well, after a while you could,

01:19:22 but you’d have to work your way up

01:19:24 because, you know, like putting the bright,

01:19:27 what’s called the brights, the trim on a car

01:19:30 on a moving assembly line

01:19:32 where it has to be attached 25 places

01:19:34 in a minute and a half is unbelievably complicated.

01:19:39 And human beings can do it, it’s really good.

01:19:42 I think that’s harder than driving a car, by the way.

01:19:45 Putting together, working at a.

01:19:47 Working on a factory.

01:19:48 Two smart people can disagree.

01:19:51 Yay.

01:19:52 I think driving a car.

01:19:54 We’ll get you in the factory someday

01:19:56 and then we’ll see how you do.

01:19:57 No, not for us humans driving a car is easy.

01:19:59 I’m saying building a machine that drives a car

01:20:03 is not easy.

01:20:04 No, okay.

01:20:05 Okay.

01:20:05 Driving a car is easy for humans

01:20:07 because we’ve been evolving for billions of years.

01:20:10 To drive cars.

01:20:11 Yeah, I noticed that.

01:20:13 The Paleolithic cars were super cool.

01:20:16 No, now you join the rest of the internet

01:20:18 and mocking me.

01:20:19 Okay.

01:20:20 I wasn’t mocking, I was just.

01:20:22 Yeah, yeah.

01:20:23 Intrigued by your anthropology.

01:20:26 Yeah, it’s.

01:20:27 I’ll have to go dig into that.

01:20:28 There’s some inaccuracies there, yes.

01:20:31 Okay, but in general,

01:20:35 what have you learned in terms of

01:20:39 thinking about passion, craftsmanship,

01:20:44 tension, chaos.

01:20:47 Jesus.

01:20:48 The whole mess of it.

01:20:50 What have you learned, have taken away from your time

01:20:54 working with Elon Musk, working at Tesla,

01:20:57 which is known to be a place of chaos innovation,

01:21:02 craftsmanship, and all of those things.

01:21:03 I really like the way you thought.

01:21:06 You think you have an understanding

01:21:07 about what first principles of something is,

01:21:10 and then you talk to Elon about it,

01:21:11 and you didn’t scratch the surface.

01:21:15 He has a deep belief that no matter what you do,

01:21:18 it’s a local maximum, right?

01:21:21 And I had a friend, he invented a better electric motor,

01:21:24 and it was a lot better than what we were using.

01:21:26 And one day he came by, he said,

01:21:28 I’m a little disappointed, because this is really great,

01:21:31 and you didn’t seem that impressed.

01:21:33 And I said, when the super intelligent aliens come,

01:21:37 are they going to be looking for you?

01:21:38 Like, where is he?

01:21:39 The guy who built the motor.

01:21:41 Yeah.

01:21:42 Probably not.

01:21:43 You know, like, but doing interesting work

01:21:48 that’s both innovative and, let’s say,

01:21:49 craftsman’s work on the current thing

01:21:51 is really satisfying, and it’s good.

01:21:54 And that’s cool.

01:21:55 And then Elon was good at taking everything apart,

01:21:59 and like, what’s the deep first principle?

01:22:01 Oh, no, what’s really, no, what’s really?

01:22:03 You know, that ability to look at it without assumptions

01:22:08 and without “how” constraints is super wild.

01:22:13 You know, he built a rocket ship, and an electric car,

01:22:17 and you know, everything.

01:22:19 And that’s super fun, and he’s into it, too.

01:22:21 Like, when they first landed two SpaceX rockets at Tesla,

01:22:25 we had a video projector in the big room,

01:22:27 and like, 500 people came down,

01:22:29 and when they landed, everybody cheered,

01:22:30 and some people cried.

01:22:32 It was so cool.

01:22:34 All right, but how did you do that?

01:22:35 Well, it was super hard, and then people say,

01:22:40 well, it’s chaotic, really?

01:22:42 To get out of all your assumptions,

01:22:44 you think that’s not gonna be unbelievably painful?

01:22:47 And is Elon tough?

01:22:49 Yeah, probably.

01:22:50 Do people look back on it and say,

01:22:52 boy, I’m really happy I had that experience

01:22:57 to go take apart that many layers of assumptions?

01:23:02 Sometimes super fun, sometimes painful.

01:23:04 So it could be emotionally and intellectually painful,

01:23:07 that whole process of just stripping away assumptions.

01:23:10 Yeah, imagine 99% of your thought process

01:23:13 is protecting your self conception,

01:23:16 and 98% of that’s wrong.

01:23:20 Now you got the math right.

01:23:22 How do you think you’re feeling

01:23:23 when you get back into that one bit that’s useful,

01:23:26 and now you’re open,

01:23:27 and you have the ability to do something different?

01:23:30 I don’t know if I got the math right.

01:23:33 It might be 99.9, but it ain’t 50.

01:23:38 Imagining it, the 50% is hard enough.

01:23:44 Now, for a long time, I’ve suspected you could get better.

01:23:48 Like you can think better, you can think more clearly,

01:23:50 you can take things apart.

01:23:52 And there’s lots of examples of that, people who do that.

01:23:56 And Elon is an example of that, you are an example.

01:24:02 I don’t know if I am, I’m fun to talk to.

01:24:06 Certainly.

01:24:07 I’ve learned a lot of stuff.

01:24:09 Well, here’s the other thing, I joke, like I read books,

01:24:12 and people think, oh, you read books.

01:24:14 Well, no, I’ve read a couple of books a week for 55 years.

01:24:20 Well, maybe 50,

01:24:21 because I didn’t learn to read until I was eight or something.

01:24:24 And it turns out when people write books,

01:24:28 they often take 20 years of their life

01:24:31 where they passionately did something,

01:24:33 reduce it to 200 pages.

01:24:36 That’s kind of fun.

01:24:37 And then you go online,

01:24:38 and you can find out who wrote the best books

01:24:41 and who liked, you know, that’s kind of wild.

01:24:43 So there’s this wild selection process,

01:24:45 and then you can read it,

01:24:46 and for the most part, understand it.

01:24:49 And then you can go apply it.

01:24:51 Like I went to one company,

01:24:53 I thought, I haven’t managed much before.

01:24:55 So I read 20 management books,

01:24:57 and I started talking to them,

01:24:58 and basically compared to all the VPs running around,

01:25:01 I’d read 19 more management books than anybody else.

01:25:05 It wasn’t even that hard.

01:25:08 And half the stuff worked, like first time.

01:25:11 It wasn’t even rocket science.

01:25:13 But at the core of that is questioning the assumptions,

01:25:16 or sort of entering the thinking,

01:25:20 first principles thinking,

01:25:21 sort of looking at the reality of the situation,

01:25:24 and using that knowledge, applying that knowledge.

01:25:28 So that’s.

01:25:29 So I would say my brain has this idea

01:25:31 that you can question first assumptions.

01:25:35 But I can go days at a time and forget that,

01:25:38 and you have to kind of circle back to that observation.

01:25:42 Because it is emotionally challenging.

01:25:45 Well, it’s hard to just keep it front and center,

01:25:47 because you operate on so many levels all the time,

01:25:50 and getting this done takes priority,

01:25:53 or being happy takes priority,

01:25:56 or screwing around takes priority.

01:25:59 Like how you go through life is complicated.

01:26:03 And then you remember, oh yeah,

01:26:04 I could really think first principles.

01:26:06 Oh shit, that’s tiring.

01:26:09 But you do for a while, and that’s kind of cool.

01:26:12 So just as a last question in your sense,

01:26:16 from the big picture, from the first principles,

01:26:19 do you think, you kind of answered it already,

01:26:21 but do you think autonomous driving is something

01:26:25 we can solve on a timeline of years?

01:26:28 So one, two, three, five, 10 years,

01:26:32 as opposed to a century?

01:26:33 Yeah, definitely.

01:26:35 Just to linger on it a little longer,

01:26:37 where’s the confidence coming from?

01:26:40 Is it the fundamentals of the problem,

01:26:42 the fundamentals of building the hardware and the software?

01:26:46 As a computational problem, understanding ballistics,

01:26:50 roads, topography, it seems pretty solvable.

01:26:56 And you can see this, like speech recognition,

01:26:59 for a long time people were doing frequency

01:27:01 domain analysis, and all kinds of stuff,

01:27:04 and that didn’t work at all, right?

01:27:07 And then they did deep learning about it,

01:27:09 and it worked great.

01:27:11 And it took multiple iterations.

01:27:13 And autonomous driving is way past

01:27:18 the frequency analysis point.

01:27:21 Use radar, don’t run into things.

01:27:23 And the data gathering’s going up,

01:27:25 and the computation’s going up,

01:27:26 and the algorithm understanding’s going up,

01:27:28 and there’s a whole bunch of problems

01:27:30 getting solved like that.

01:27:32 The data side is really powerful,

01:27:33 but I disagree with both you and Elon.

01:27:35 I’ll tell Elon once again, as I did before,

01:27:38 that when you add human beings into the picture,

01:27:42 it’s no longer a ballistics problem.

01:27:45 It’s something more complicated,

01:27:47 but I could be very well proven wrong.

01:27:50 Cars are highly damped in terms of rate of change.

01:27:53 Like the steering system’s really slow

01:27:56 compared to a computer.

01:27:57 The acceleration of the acceleration’s really slow.

01:28:01 Yeah, on a certain timescale, on a ballistics timescale,

01:28:04 but human behavior, I don’t know.

01:28:07 I shouldn’t say.

01:28:08 Human beings are really slow too.

01:28:09 Weirdly, we operate half a second behind reality.

01:28:13 Nobody really understands that one either.

01:28:15 It’s pretty funny.

01:28:16 Yeah, yeah.

01:28:20 We very well could be surprised,

01:28:23 and I think with the rate of improvement

01:28:25 in all aspects on both the compute

01:28:26 and the software and the hardware,

01:28:29 there’s gonna be pleasant surprises all over the place.

01:28:34 Speaking of unpleasant surprises,

01:28:36 many people have worries about a singularity

01:28:39 in the development of AI.

01:28:41 Forgive me for such questions.

01:28:43 Yeah.

01:28:44 When AI improves exponentially

01:28:46 and reaches a point of superhuman level

01:28:48 general intelligence, beyond that point,

01:28:52 there’s no looking back.

01:28:53 Do you share this worry of existential threats

01:28:56 from artificial intelligence,

01:28:57 from computers becoming superhuman level intelligent?

01:29:01 No, not really.

01:29:04 We already have a very stratified society,

01:29:07 and then if you look at the whole animal kingdom

01:29:09 of capabilities and abilities and interests,

01:29:12 and smart people have their niche,

01:29:15 and normal people have their niche,

01:29:17 and craftsmen have their niche,

01:29:19 and animals have their niche.

01:29:22 I suspect that the domains of interest

01:29:26 for things that are astronomically different,

01:29:29 like the whole something got 10 times smarter than us

01:29:32 and wanted to track us all down because what?

01:29:34 We like to have coffee at Starbucks?

01:29:36 Like, it doesn’t seem plausible.

01:29:38 No, is there an existential problem

01:29:40 that how do you live in a world

01:29:42 where there’s something way smarter than you,

01:29:44 and you based your kind of self-esteem

01:29:46 on being the smartest local person?

01:29:48 Well, there’s what, 0.1% of the population who thinks that?

01:29:52 Because the rest of the population’s been dealing with it

01:29:54 since they were born.

01:29:56 So the breadth of possible experience

01:30:00 that can be interesting is really big.

01:30:03 And, you know, superintelligence seems likely,

01:30:11 although we still don’t know if we’re magical,

01:30:14 but I suspect we’re not.

01:30:16 And it seems likely that it’ll create possibilities

01:30:18 that are interesting for us,

01:30:20 and its interests will be interesting for that,

01:30:24 for whatever it is.

01:30:26 It’s not obvious why its interests would somehow

01:30:30 want to fight over some square foot of dirt,

01:30:32 or, you know, whatever the usual fears are about.

01:30:37 So you don’t think it’ll inherit

01:30:38 some of the darker aspects of human nature?

01:30:42 Depends on how you think reality’s constructed.

01:30:45 So for whatever reason,

01:30:48 human beings are in, let’s say,

01:30:50 creative tension and opposition

01:30:52 with both our good and bad forces.

01:30:55 Like, there’s lots of philosophical understanding of that.

01:30:58 I don’t know why that would be different.

01:31:03 So you think the evil is necessary for the good?

01:31:06 I mean, the tension.

01:31:08 I don’t know about evil,

01:31:09 but like we live in a competitive world

01:31:11 where your good is somebody else’s evil.

01:31:16 You know, there’s the malignant part of it,

01:31:19 but that seems to be self limiting,

01:31:22 although occasionally it’s super horrible.

01:31:26 But yes, there’s a debate over ideas,

01:31:29 and some people have different beliefs,

01:31:32 and that debate itself is a process.

01:31:34 So the arriving at something.

01:31:37 Yeah, and why wouldn’t that continue?

01:31:39 Yeah.

01:31:41 But you don’t think that whole process

01:31:43 will leave humans behind in a way that’s painful?

01:31:47 Emotionally painful, yes.

01:31:48 For the 0.1%, they’ll be.

01:31:51 Why isn’t it already painful

01:31:52 for a large percentage of the population?

01:31:54 And it is.

01:31:54 I mean, society does have a lot of stress in it,

01:31:57 about the 1%, and about the this, and about the that,

01:32:00 but you know, everybody has a lot of stress in their life

01:32:03 about what they find satisfying,

01:32:05 and you know, know yourself seems to be the proper dictum,

01:32:10 and pursue something that makes your life meaningful

01:32:14 seems proper, and there’s so many avenues on that.

01:32:18 Like, there’s so much unexplored space

01:32:21 at every single level, you know.

01:32:25 I’m somewhat of, my nephew called me a jaded optimist.

01:32:29 And you know, so it’s.

01:32:33 There’s a beautiful tension in that label,

01:32:37 but if you were to look back at your life,

01:32:40 and could relive a moment, a set of moments,

01:32:45 because there were the happiest times of your life,

01:32:49 outside of family, what would that be?

01:32:54 I don’t want to relive any moments.

01:32:56 I like that.

01:32:58 I like that situation where you have some amount of optimism

01:33:01 and then the anxiety of the unknown.

01:33:06 So you love the unknown, the mystery of it.

01:33:10 I don’t know about the mystery.

01:33:11 It sure gets your blood pumping.

01:33:14 What do you think is the meaning of this whole thing?

01:33:17 Of life, on this pale blue dot?

01:33:21 It seems to be what it does.

01:33:25 Like, the universe, for whatever reason,

01:33:29 makes atoms, which makes us, which we do stuff.

01:33:34 And we figure out things, and we explore things, and.

01:33:38 That’s just what it is.

01:33:39 It’s not just.

01:33:41 Yeah, it is.

01:33:44 Jim, I don’t think there’s a better place to end it.

01:33:46 It’s been a huge honor, and.

01:33:50 Well, that was super fun.

01:33:51 Thank you so much for talking today.

01:33:52 All right, great.

01:33:54 Thanks for listening to this conversation,

01:33:56 and thank you to our presenting sponsor, Cash App.

01:33:59 Download it, use code LexPodcast.

01:34:02 You’ll get $10, and $10 will go to FIRST,

01:34:04 a STEM education nonprofit that inspires hundreds

01:34:07 of thousands of young minds to become future leaders

01:34:10 and innovators.

01:34:12 If you enjoy this podcast, subscribe on YouTube.

01:34:15 Give it five stars on Apple Podcast.

01:34:17 Follow on Spotify, support it on Patreon,

01:34:19 or simply connect with me on Twitter.

01:34:22 And now, let me leave you with some words of wisdom

01:34:24 from Gordon Moore.

01:34:26 If everything you try works,

01:34:28 you aren’t trying hard enough.

01:34:30 Thank you for listening, and hope to see you next time.