David Patterson: Computer Architecture and Data Storage #104

Transcript

00:00:00 The following is a conversation with David Patterson, Turing Award winner and professor

00:00:05 of computer science at Berkeley. He’s known for pioneering contributions to RISC processor

00:00:10 architecture used by 99% of new chips today and for co-creating RAID storage. The impact that

00:00:18 these two lines of research and development have had in our world is immeasurable. He’s also one of

00:00:25 the great educators of computer science in the world. His book with John Hennessy is how I first

00:00:30 learned about and was humbled by the inner workings of machines at the lowest level.

00:01:21 This episode is supported by the Jordan Harbinger Show.

00:01:24 Go to Jordan Harbinger.com slash Lex. It’s how he knows I sent you. On that page, there’s links

00:01:30 to subscribe to it on Apple podcast, Spotify, and everywhere else. I’ve been binging on this podcast.

00:01:36 It’s amazing. Jordan is a great human being. He gets the best out of his guests, dives deep,

00:01:41 calls them out when it’s needed, and makes the whole thing fun to listen to. He’s interviewed

00:01:46 Kobe Bryant, Mark Cuban, Neil deGrasse Tyson, Garry Kasparov, and many more. I recently listened

00:01:52 to his conversation with Frank Abagnale, author of Catch Me If You Can, and one of the world’s

00:01:58 most famous con men. Perfect podcast length and topic for a recent long distance run that I did.

00:02:05 Again, go to Jordan Harbinger.com slash Lex to give him my love and to support this podcast.

00:02:13 Subscribe also on Apple podcast, Spotify, and everywhere else.

00:02:17 This show is presented by Cash App, the greatest sponsor of this podcast ever, and the number one

00:02:23 finance app in the App Store. When you get it, use code LEX PODCAST. Cash App lets you send money

00:02:29 to friends, buy Bitcoin, and invest in the stock market with as little as $1. Since Cash App allows

00:02:35 you to buy Bitcoin, let me mention that cryptocurrency in the context of the history

00:02:39 of money is fascinating. I recommend Ascent of Money as a great book on this history.

00:02:44 Also, the audiobook is amazing. Debits and credits on ledgers started around 30,000 years ago.

00:02:50 The US dollar was created over 200 years ago, and the first decentralized cryptocurrency was released just

00:02:55 over 10 years ago. So given that history, cryptocurrency is still very much in its early

00:03:00 days of development, but it’s still aiming to and just might redefine the nature of money.

00:03:06 So again, if you get Cash App from the App Store or Google Play, and use the code LEX PODCAST,

00:03:12 you get $10, and Cash App will also donate $10 to FIRST, an organization that is helping to

00:03:19 advance robotics and STEM education for young people around the world. And now, here’s my

00:03:25 conversation with David Patterson. Let’s start with the big historical question. How have computers

00:03:32 changed in the past 50 years at both the fundamental architectural level and in general, in your eyes?

00:03:38 David Patterson Well, the biggest thing that happened was the invention of the microprocessor.

00:03:42 So computers that used to fill up several rooms could fit inside your cell phone. And not only

00:03:52 did they get smaller, they got a lot faster. So they’re a million times faster than they were

00:03:58 50 years ago, and they’re much cheaper, and they’re ubiquitous. There’s 7.8 billion people

00:04:06 on this planet. Probably half of them have cell phones right now, which is remarkable.

00:04:10 Lex Fridman That’s probably more microprocessors than there are people.

00:04:14 David Patterson Sure. I don’t know what the ratio is,

00:04:16 but I’m sure it’s above one. Maybe it’s 10 to 1 or some number like that.

00:04:21 Lex Fridman What is a microprocessor?

00:04:23 David Patterson So a way to say what a microprocessor is,

00:04:27 is to tell you what’s inside a computer. So a computer forever has classically had

00:04:32 five pieces. There’s input and output, which kind of naturally, as you’d expect, is input is like

00:04:38 speech or typing, and output is displays. There’s a memory, and like the name sounds, it remembers

00:04:48 things. So it’s integrated circuits whose job is you put information in, then when you ask for it,

00:04:54 it comes back out. That’s memory. And the third part is the processor, where the microprocessor

00:05:00 comes from. And that has two pieces as well. And that is the control, which is kind of the brain

00:05:07 of the processor. And what’s called the arithmetic unit, it’s kind of the brawn of the computer. So

00:05:15 if you think of it as a human body, the arithmetic unit, the thing that does the

00:05:19 number crunching, is the body, and the control is the brain. So those five pieces, input, output,

00:05:25 memory, arithmetic unit, and control have been in computers since the very dawn. And the

00:05:34 last two are considered the processor. So a microprocessor simply means a processor that

00:05:39 fits on a microchip. And that was invented, you know, about 40 years ago; that was the first microprocessor.

00:05:46 It’s interesting that you refer to the arithmetic unit as, like, connected to the body and

00:05:52 the control as the brain. So I guess I never thought of it that way. It’s a nice way to think

00:05:57 of it, because most of the actions the microprocessor does are literally sort of computation.

00:06:05 The microprocessor does computation. It processes information. And most of what it

00:06:10 does is basic arithmetic operations. What are the operations, by the way?

00:06:16 It’s a lot like a calculator. So there are add instructions, subtract instructions,

00:06:22 multiply and divide. And kind of the brilliance of the invention of the computer or the processor

00:06:32 is that it performs very trivial operations, but it just performs billions of them per second.

00:06:39 And what we’re capable of doing is writing software that can take these very trivial instructions

00:06:44 and have them create tasks that can do things better than human beings can do today.
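As an illustration of this point, here is a small sketch in plain Python (the function and its name are mine, purely illustrative): a "bigger" task built entirely out of the kind of trivial operations a processor provides.

```python
# Illustrative sketch: a task (multiplication) built purely from trivial
# operations like addition. Real hardware multiplies directly, but the
# principle is the same as described above: complex behavior emerges from
# performing very simple operations, just many of them, very fast.

def multiply_by_addition(a: int, b: int) -> int:
    """Multiply two non-negative integers using only repeated addition."""
    total = 0
    for _ in range(b):   # repeat b times...
        total += a       # ...one trivial "add" operation per step
    return total
```

A processor running billions of such trivial steps per second is what makes the composition practical.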

00:06:49 Just looking back through your career, did you anticipate how good we would be able

00:06:55 to get at doing these small, basic operations? How many surprises along the way where you just

00:07:03 kind of sat back and said, wow, I didn’t expect it to go this fast, this good?

00:07:09 David Patterson Well, the fundamental driving force is what’s called Moore’s law, which was named after Gordon

00:07:17 Moore, who’s a Berkeley alumnus. And he made this observation very early in what are called

00:07:23 semiconductors. And semiconductors are these ideas, you can build these very simple switches,

00:07:29 and you can put them on these microchips. And he made this observation over 50 years ago.

00:07:34 He looked at a few years and said, I think what’s going to happen is the number of these little

00:07:38 switches called transistors is going to double every year for the next decade. And he said this

00:07:44 in 1965. And in 1975, he said, well, maybe it’s going to double every two years. And what

00:07:51 other people have since named Moore’s law guided the industry. And when Gordon Moore made that

00:07:58 prediction, he wrote a paper back in, I think, the 70s and said, not only is this going to happen,

00:08:08 he wrote, what would be the implications of that? And in this article from 1965,

00:08:13 he shows ideas like computers being in cars and computers being in something that you would buy

00:08:21 in the grocery store and stuff like that. So he kind of not only called his shot, he called the

00:08:26 implications of it. So if you were in the computing field, and if you believed Moore’s prediction,

00:08:33 he kind of said what would be happening in the future. So it’s not kind of, it’s at one sense,

00:08:41 this is what was predicted. And you could imagine it was easy to believe that Moore’s law was going

00:08:46 to continue. And so this would be the implications. On the other side, there are these kind of

00:08:51 shocking events in your life. Like I remember driving in Marin across the Bay in San Francisco

00:08:59 and seeing a bulletin board at a local civic center and it had a URL on it. And it was like,

00:09:07 for the people at the time, these first URLs, and that’s, you know, the www stuff with the

00:09:13 HTTP. People thought it looked like alien writing, right? You’d see these advertisements and

00:09:22 commercials or bulletin boards that had this alien writing on it. So for the lay people, it’s like,

00:09:26 what the hell is going on here? And for those people in the industry, it was, oh my God,

00:09:32 this stuff is getting so popular, it’s actually leaking out of our nerdy world into the real

00:09:37 world. So that, I mean, there was events like that. I think another one was, I remember in the

00:09:42 early days of the personal computer, when we started seeing advertisements in magazines

00:09:46 for personal computers, like it’s so popular that it’s made the newspapers. So at one hand,

00:09:52 you know, Gordon Moore predicted it and you kind of expected it to happen, but when it really hit

00:09:56 and you saw it affecting society, it was shocking. So maybe taking a step back and looking from both

00:10:05 the engineering and the philosophical perspective, what do you see as the layers of abstraction

00:10:11 in the computer? Do you see a computer as a set of layers of abstractions?

00:10:16 David Patterson Yeah, I think that’s one of the fundamentals of computer science:

00:10:20 these things are really complicated, and the way we cope with complicated

00:10:26 software and complicated hardware is these layers of abstraction. And that simply means that we,

00:10:33 you know, suspend disbelief and pretend that the only thing you know is that layer,

00:10:39 and you don’t know anything about the layer below it. And that’s the way we can make very complicated

00:10:44 things. And probably it started with hardware that that’s the way it was done, but it’s been

00:10:50 proven extremely useful. And, you know, I would say in a modern computer today, there might be

00:10:56 10 or 20 layers of abstraction, and they’re all trying to kind of enforce this contract: all

00:11:02 you know is this interface. There’s a set of commands that you are allowed to use,

00:11:09 and you stick to those commands, and we will faithfully execute that. And it’s like peeling

00:11:13 the layers of an onion: you get down, there’s a new set of layers, and so forth.

00:11:19 So for people who want to study computer science, the exciting part about it is you can

00:11:27 keep peeling those layers. You take your first course, and you might learn to program in Python,

00:11:32 and then you can take a follow on course, and you can get it down to a lower level language like C,

00:11:37 and you know, you can go and then you can, if you want to, you can start getting into the hardware

00:11:42 layers, and you keep getting down all the way to that transistor that I talked about that Gordon

00:11:47 Moore predicted. And you can understand all those layers all the way up to the highest level

00:11:53 application software. So it’s a very kind of magnetic field. If you’re interested, you can go

00:12:02 into any depth and keep going. In particular, what’s happening right now, or it’s happened

00:12:08 in software the last 20 years and recently in hardware, there’s getting to be open source

00:12:12 versions of all of these things. So what open source means is that what the engineer, the programmer

00:12:18 designs is not secret, belonging to a company; it’s out there on the World Wide Web,

00:12:26 so you can see it. So you can look at, for lots of pieces of software that you use, you can see

00:12:33 exactly what the programmer does if you want to get involved. That used to stop at the hardware.

00:12:39 Recently, there’s been an effort to make open source hardware and those interfaces open,

00:12:46 so you can see that. So instead of before you had to stop at the hardware, you can now start going

00:12:51 layer by layer below that and see what’s inside there. So it’s a remarkable time that for the

00:12:56 interested individual can really see in great depth what’s really going on in the computers

00:13:01 that power everything that we see around us. Are you thinking also when you say open source at

00:13:07 the hardware level, is this going to the design architecture instruction set level or is it going

00:13:14 to literally the manufacturer of the actual hardware, of the actual chips, whether that’s ASIC

00:13:24 specialized to a particular domain or the general? Yeah, so let’s talk about that a little bit.

00:13:30 So when you get down to the bottom layer of software, the way software talks to hardware

00:13:38 is in a vocabulary. And what we call that vocabulary, we call that, the words of that

00:13:45 vocabulary are called instructions. And the technical term for the vocabulary is instruction

00:13:50 set. So those instructions are like we talked about earlier, that can be instructions like

00:13:55 add, subtract and multiply, divide. There’s instructions to put data into memory, which

00:14:01 is called a store instruction and to get data back, which is called the load instructions.

00:14:05 And those simple instructions go back to the very dawn of computing; in 1950, the first commercial

00:14:12 computer had these instructions. So that’s the instruction set that we’re talking about.
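To make the idea of an instruction-set vocabulary concrete, here is a toy sketch of an interpreter for instructions like the ones just described: add, subtract, load, store. The encoding is entirely hypothetical and my own, not x86, ARM, or any real instruction set.

```python
# Toy interpreter for a made-up instruction set. The mnemonics echo the
# vocabulary described above (ADD, SUB, LOAD, STORE); the tuple encoding
# is purely illustrative.

def run(program, memory):
    """Execute a list of (opcode, dest, src1, src2) tuples."""
    regs = [0] * 4                      # four general-purpose registers
    for op, a, b, c in program:
        if op == "ADD":
            regs[a] = regs[b] + regs[c]
        elif op == "SUB":
            regs[a] = regs[b] - regs[c]
        elif op == "LOAD":              # regs[a] = memory[address b]
            regs[a] = memory[b]
        elif op == "STORE":             # memory[address b] = regs[a]
            memory[b] = regs[a]
    return regs, memory

# Example program: load two values, add them, store the result.
program = [
    ("LOAD",  0, 0, 0),   # r0 = mem[0]
    ("LOAD",  1, 1, 0),   # r1 = mem[1]
    ("ADD",   2, 0, 1),   # r2 = r0 + r1
    ("STORE", 2, 2, 0),   # mem[2] = r2
]
regs, mem = run(program, [5, 7, 0])
```

Software distributed as a binary is, in effect, a long list of exactly this kind of instruction, just in the vendor's real encoding rather than a toy one.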

00:14:17 So up until, I’d say 10 years ago, these instruction sets were all proprietary. So

00:14:23 a very popular one is owned by Intel, the one that’s in the cloud and in all the PCs in the

00:14:29 world. Intel owns that instruction set. It’s referred to as the x86. There’ve been a sequence

00:14:36 of ones that the first number was called 8086. And since then, there’s been a lot of numbers,

00:14:41 but they all end in 86. So there’s been that kind of family of instruction sets.

00:14:47 And that’s proprietary.

00:14:49 That’s proprietary. The other one that’s very popular is from ARM. That kind of powers all

00:14:55 the cell phones in the world, all the iPads in the world, and a lot of things that are so called

00:15:02 Internet of Things devices. ARM and that one is also proprietary. ARM will license it to people

00:15:09 for a fee, but they own that. So the new idea that got started at Berkeley kind of unintentionally

00:15:16 10 years ago is this. Early in my career, we pioneered a way to do these vocabularies, these instruction sets,

00:15:25 that was very controversial at the time. At the time in the 1980s, conventional wisdom was these

00:15:32 vocabularies instruction sets should have powerful instructions. So polysyllabic kind of words,

00:15:38 you can think of that. And so instead of just add, subtract, and multiply, they would have

00:15:44 polynomial divide, or sort a list. And the hope was that those powerful vocabularies

00:15:51 would make it easier for software. So we thought that didn’t make sense for microprocessors. There

00:15:57 was people at Berkeley and Stanford and IBM who argued the opposite. And what we called that was

00:16:03 a reduced instruction set computer. And the abbreviation was RISC. And typical for computer

00:16:10 people, we took the abbreviation and started pronouncing it. So RISC was the thing. So we said for

00:16:15 microprocessors, which with Moore’s law are changing really fast, we think it’s better to have

00:16:21 a pretty simple set of instructions, reduced set of instructions. That that would be a better way

00:16:28 to build microprocessors since they’re going to be changing so fast due to Moore’s law. And then

00:16:32 we’ll just use standard software to, you know, generate more of those simple instructions. And

00:16:41 one of the pieces of software that’s in that software stack going between these layers of

00:16:45 abstractions is called a compiler. And it’s basically translates, it’s a translator between

00:16:50 levels. We said the translator will handle that. So the technical question was, well, since there

00:16:57 are these reduced instructions, you have to execute more of them. Yeah, that’s right. But

00:17:01 maybe you could execute them faster. Yeah, that’s right. They’re simpler so they could go faster,

00:17:05 but you have to do more of them. So what’s that trade off look like? And it ended up that we ended

00:17:12 up executing maybe 50% more instructions, maybe a third more instructions, but they ran four times

00:17:19 faster. So these controversial RISC ideas proved to be maybe factors of three or four

00:17:26 better. I love that this idea was controversial and almost kind of like rebellious. So that’s

00:17:33 in the context of what was more conventional, the complex instruction set computing. So

00:17:40 how would you pronounce that? CISC. CISC versus RISC. RISC versus CISC. And believe it or not,

00:17:46 this sounds very, who cares about this? It was violently debated at several conferences. It’s

00:17:54 like, what’s the right way to go? And people thought RISC was a devolution. We’re going to

00:18:01 make software worse by making those instructions simpler. And there are fierce debates at several

00:18:07 conferences in the 1980s. And then later in the 80s, it kind of settled to these benefits.
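The tradeoff at the heart of those debates can be put into rough numbers. A back-of-the-envelope sketch, using only the approximate factors quoted above ("maybe a third more instructions, but they ran four times faster"), not measured data:

```python
# Rough arithmetic for the RISC/CISC tradeoff described above: RISC
# executes more instructions, but each instruction runs faster on average.

def net_speedup(instruction_ratio, per_instruction_speedup):
    """Overall speedup when you run more instructions, each one faster."""
    return per_instruction_speedup / instruction_ratio

# "a third more instructions" -> ratio 4/3; "four times faster" -> 4x
speedup = net_speedup(instruction_ratio=4 / 3, per_instruction_speedup=4)
# roughly 3x overall, matching the "factors of three or four" quoted above
```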

00:18:14 It’s not completely intuitive to me why RISC has, for the most part, won.

00:18:19 Yeah. So why did that happen? Yeah. Yeah. And maybe I can sort of say a bunch of dumb things

00:18:24 that could lay the land for further commentary. So to me, this is kind of an interesting thing.

00:18:30 If you look at C++ versus C, with modern compilers, you really could write faster code

00:18:36 with C++. So relying on the compiler to reduce your complicated code into something simple and

00:18:44 fast. So to me, comparing risk, maybe this is a dumb question, but why is it that focusing the

00:18:53 definition of the design of the instruction set on very few simple instructions in the long run

00:19:00 provide faster execution versus coming up with, like you said, a ton of complicated instructions

00:19:09 that over time, you know, years, maybe decades, you come up with compilers that can reduce those

00:19:16 into simple instructions for you. Yeah. So let’s try and split that into two pieces.

00:19:22 So if the compiler can do that for you, if the compiler can take, you know, a complicated program

00:19:29 and produce simpler instructions, then the programmer doesn’t care, right? I don’t care just

00:19:37 how fast is the computer I’m using, how much does it cost? And so what happened kind of in the

00:19:43 software industry is right around before the 1980s, critical pieces of software were still written

00:19:50 not in languages like C or C++, they were written in what’s called assembly language, where there’s

00:19:57 this kind of thing where humans write exactly the instructions at the level that a computer can

00:20:04 understand. So they were writing add, subtract, multiply, you know, instructions. It’s very tedious.

00:20:11 But the belief was to write this lowest level of software that people use, which are called operating

00:20:17 systems, they had to be written in assembly language because these high level languages were just too

00:20:22 inefficient. They were too slow, or the programs would be too big. So that changed with a famous

00:20:31 operating system called Unix, which is kind of the grandfather of all the operating systems today.

00:20:37 So Unix demonstrated that you could write something as complicated as an operating system in a

00:20:43 language like C. So once that was true, then that meant we could hide the instruction set from the

00:20:51 programmer. And so that meant then it didn’t really matter. The programmer didn’t have to write

00:20:58 lots of these simple instructions, that was up to the compiler. So that was part of our arguments

00:21:02 for risk is, if you were still writing assembly language, there’s maybe a better case for CISC

00:21:07 instructions. But if the compiler can do that, it’s going to be, you know, done once; the

00:21:13 compiler translates it once. And then every time you run the program, it runs these potentially

00:21:19 simpler instructions. And so that was the debate, right? And people would acknowledge that the

00:21:26 simpler instructions could lead to a faster computer. You can think of monosyllabic instructions,

00:21:33 you could say them, you know, if you think of reading, you can probably read them faster or say

00:21:36 them faster than long instructions. The same thing, that analogy works pretty well for hardware.

00:21:42 And as long as you didn’t have to read a lot more of those instructions, you could win. So that’s

00:21:46 kind of, that’s the basic idea for risk. But it’s interesting that in that discussion of Unix and C,

00:21:54 that there’s only one step of levels of abstraction from the code that’s really the closest to the

00:22:02 machine to the code that’s written by human. It’s, at least to me again, perhaps a dumb intuition,

00:22:09 but it feels like there might’ve been more layers, sort of different kinds of humans stacked on top

00:22:16 of each other. So what’s true and not true about what you said is several of the layers of software,

00:22:27 like, suppose we just talked about two layers:

00:22:32 that would be the operating system, like you get from Microsoft or from Apple, like iOS,

00:22:38 or the Windows operating system. And let’s say applications that run on top of it, like Word

00:22:44 or Excel. So both the operating system could be written in C and the application could be written

00:22:52 in C. But you could construct those two layers and the applications absolutely do call upon the

00:22:58 operating system. And the change was that both of them could be written in higher level languages.

00:23:04 So it’s one step of a translation, but you can still build many layers of abstraction

00:23:10 of software on top of that. And that’s how things are done today. So still today,

00:23:17 many of the layers that you’ll deal with, you may deal with debuggers, you may deal with linkers,

00:23:25 there’s libraries. Many of those today will be written in C++, say, even though that language is

00:23:34 pretty ancient. And even the Python interpreter is probably written in C or C++. So lots of

00:23:41 layers there are probably written in these, some old fashioned efficient languages that

00:23:48 still take one step to produce these instructions, produce RISC instructions, but they’re composed,

00:23:56 each layer of software invokes one another through these interfaces. And you can get 10 layers of

00:24:02 software that way. So in general, RISC was developed here at Berkeley? The

00:24:08 three places that were these radicals, that advocated for this against the rest of the community,

00:24:14 were IBM, Berkeley, and Stanford. You’re one of these radicals. And how radical did you feel?

00:24:24 How confident did you feel? How doubtful were you that RISC might be the right approach? Because

00:24:31 it may, you can also intuit that is kind of taking a step back into simplicity, not forward into

00:24:37 simplicity. Yeah, no, it was easy to make, yeah, it was easy to make the argument against it. Well,

00:24:44 this was my colleague John Hennessy at Stanford and I. We were both assistant professors. And

00:24:50 for me, I just believed in the power of our ideas. I thought what we were saying made sense.

00:24:57 Moore’s law is going to move fast. The other thing that I didn’t mention is one of the surprises of

00:25:03 these complex instruction sets. You could certainly write these complex instructions

00:25:08 if the programmer is writing them themselves. It turned out to be kind of difficult for the

00:25:13 compiler to generate those complex instructions. Kind of ironically, you’d have to find the right

00:25:18 circumstances that just exactly fit this complex instruction. It was actually easier for the

00:25:22 compiler to generate these simple instructions. So not only did these complex instructions make

00:25:28 the hardware more difficult to build, often the compiler wouldn’t even use them. And so

00:25:35 it’s harder to build. The compiler doesn’t use them that much. The simple instructions go better

00:25:40 with Moore’s law. The number of transistors is doubling every two years. So we’re going to have,

00:25:46 you want to reduce the time to design the microprocessor, that may be more important

00:25:51 than these number of instructions. So I think we believed that we were right, that this was

00:25:58 the best idea. Then the question became in these debates, well, yeah, that’s a good technical idea,

00:26:03 but in the business world, this doesn’t matter. There’s other things that matter. It’s like

00:26:08 arguing that if there’s a standard with the railroad tracks and you’ve come up with a better

00:26:14 width, but the whole world is covered in railroad tracks, so your ideas have no chance of success.

00:26:20 Right. Commercial success. It was technically right, but commercially it’ll be insignificant.

00:26:25 Yeah, it’s kind of sad that this world, the history of human civilization is full of good ideas that

00:26:33 lost because somebody else came along first with a worse idea. And it’s good that in the

00:26:39 computing world, at least some of these have, well, you could, I mean, there’s probably still

00:26:43 CISC people that say, yeah, there still are. And what happened was, what was interesting, Intel,

00:26:50 a bunch of the CISC companies with CISC instruction sets of vocabulary, they gave up,

00:26:57 but not Intel. What Intel did to its credit, because Intel’s vocabulary was in the personal

00:27:06 computer. And so that was a very valuable vocabulary because the way we distribute software

00:27:11 is in those actual instructions. It’s in the instructions of that instruction set. So

00:27:17 you don’t get that source code, what the programmers wrote. You get, after it’s been translated into

00:27:22 the lowest level, that’s if you were to get a floppy disk or download software, it’s in the

00:27:27 instructions of that instruction set. So the x86 instruction set was very valuable. So what Intel

00:27:33 did cleverly and amazingly is they had their chips in hardware do a translation step.

00:27:40 They would take these complex instructions and translate them into essentially in RISC instructions

00:27:45 in hardware on the fly, at gigahertz clock speeds. And then any good idea that RISC people had,

00:27:52 they could use, and they could still be compatible with this really valuable PC software base,

00:28:01 which also had very high volumes, 100 million personal computers per year. So the CISC architecture

00:28:09 in the business world actually won in this PC era. So just going back to the

00:28:20 time of designing RISC, when you design an instruction set architecture, do you think

00:28:27 like a programmer? Do you think like a microprocessor engineer? Do you think like a

00:28:33 artist, a philosopher? Do you think in software and hardware? I mean, is it art? Is it science?

00:28:40 Yeah, I’d say, I think designing a good instruction set is an art. And I think you’re trying to

00:28:47 balance the simplicity and speed of execution with how easy it will be for compilers

00:28:57 to use it. You’re trying to create an instruction set that everything in there can be used by

00:29:03 compilers. There are not things missing that will make it difficult for the programs to run

00:29:10 efficiently. But you want it to be easy to build as well. So I’d say you’re thinking

00:29:16 hardware, trying to find a hardware software compromise that’ll work well. And it’s a matter

00:29:24 of taste. It’s kind of fun to build instruction sets. It’s not that hard to build an instruction

00:29:30 set, but to build one that catches on and people use, you have to be fortunate to be

00:29:38 in the right place at the right time or have a design that people really like. Are you using metrics?

00:29:43 So is it quantifiable? Because you kind of have to anticipate the kind of programs that people

00:29:49 write ahead of time. So can you use numbers? Can you use metrics? Can you quantify something ahead

00:29:56 of time? Or is this, again, that’s the art part where you’re kind of anticipating? No, it’s a big

00:30:00 change. Kind of what happened, I think from Hennessy’s and my perspective in the 1980s,

00:30:07 what happened was going from kind of really, you know, taste and hunches to quantifiable. And in

00:30:17 fact, he and I wrote a textbook at the end of the 1980s called Computer Architecture, A Quantitative

00:30:22 Approach. I heard of that. And it’s the thing, it had a pretty big impact in the field because we

00:30:30 went from textbooks that kind of listed, so here’s what this computer does, and here’s the pros and

00:30:36 cons, and here’s what this computer does and pros and cons to something where there were formulas

00:30:40 and equations where you could measure things. So specifically for instruction sets, what we do

00:30:47 and some other fields do is we agree upon a set of programs, which we call benchmarks,

00:30:53 and a suite of programs, and then you develop both the hardware and the compiler and you get

00:31:00 numbers on how well your computer does given its instruction set and how well you implemented it in

00:31:09 your microprocessor and how good your compilers are. In computer architecture, you know, using

00:31:14 professors’ terms, we grade on a curve rather than on an absolute scale. So when you say,

00:31:20 you know, these programs run this fast, well, that’s kind of interesting, but how do you know

00:31:24 it’s better? Well, you compare it to other computers at the same time. So the best way we

00:31:29 know how to turn it into more of a science, experimental and quantitative, is to compare

00:31:37 yourself to other computers of the same era that have the same access to the same kind of technology

00:31:41 on commonly agreed benchmark programs.
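The "grading on a curve" idea can be sketched numerically: compare machines by the ratio of their running times on a shared benchmark suite, then summarize. The geometric mean is a commonly used convention for averaging such ratios (it is what suites like SPEC report); all the numbers below are invented for illustration.

```python
from math import prod

# Hypothetical benchmark times (seconds) for two machines on the same
# five-program suite. All numbers are invented for illustration.
reference = {"prog_a": 10.0, "prog_b": 4.0, "prog_c": 25.0,
             "prog_d": 8.0,  "prog_e": 12.0}
machine   = {"prog_a": 5.0,  "prog_b": 2.0, "prog_c": 12.5,
             "prog_d": 4.0,  "prog_e": 6.0}

# Per-benchmark speedup of "machine" relative to the reference machine.
ratios = [reference[p] / machine[p] for p in reference]

# Geometric mean: the usual way to average speedup ratios, since it treats
# "2x faster on A" and "2x slower on B" symmetrically.
geo_mean = prod(ratios) ** (1 / len(ratios))
```

The point is the relative comparison: an absolute running time says little, but a ratio against a contemporary machine on an agreed suite does.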

00:31:44 So maybe to toss up two possible directions we can go. One is what are the different tradeoffs

00:31:51 in designing architectures? We’ve been already talking about SISC and RISC, but maybe a little

00:31:56 bit more detail in terms of specific features that you were thinking about. And the other side is

00:32:03 what are the metrics that you’re thinking about when looking at these tradeoffs?

00:32:08 Yeah, let’s talk about the metrics. So during these debates, we actually had kind of a hard

00:32:14 time explaining, convincing people the ideas, and partly we didn’t have a formula to explain it.

00:32:20 And a few years into it, we hit upon a formula that helped explain what was going on. And

00:32:27 I think if we can do this, see how it works orally to do this. So if I can do a formula

00:32:34 orally, let’s see. So fundamentally, the way you measure performance is how long does it take a

00:32:40 program to run? If you have 10 programs, and typically these benchmarks were a suite, because

00:32:47 you’d want to have 10 programs so they could represent lots of different applications. So for

00:32:51 these 10 programs, how long does it take to run? Well now, when you’re trying to explain why it

00:32:56 took so long, you could factor how long it takes a program to run into three factors.

00:33:01 The first one is how many instructions did it take to execute? So that’s what we’ve been

00:33:06 talking about, you know, the instructions, the vocabulary. How many did it take? All right. The

00:33:11 next question is how long did each instruction take to run on average? So you multiply the number

00:33:17 of instructions times how long it took to run, and that gets you time. Okay, so that’s, but now let’s

00:33:23 look at this metric of how long did it take the instruction to run. Well, it turns out,

00:33:28 the way we could build computers today is they all have a clock, and you’ve seen this when you,

00:33:33 if you buy a microprocessor, it’ll say 3.1 gigahertz or 2.5 gigahertz, and more gigahertz is

00:33:39 good. Well, what that is is the speed of the clock. So 2.5 gigahertz turns out to be 0.4 billionths of

00:33:47 a second, or 0.4 nanoseconds. So that’s the clock cycle time. But there’s another factor, which is

00:33:54 what’s the average number of clock cycles it takes per instruction? So it’s number of instructions,

00:33:59 average number of clock cycles, and the clock cycle time. So in these RISC-CISC debates, they

00:34:05 would concentrate on, but RISC needs to execute more instructions, and we’d argue maybe the clock cycle

00:34:12 is faster, but the real big difference was the number of clock cycles per instruction.

00:34:17 Per instruction, that’s fascinating. What about the mess of, the beautiful mess of parallelism in the

00:34:25 whole picture? Parallelism, which has to do with, say, how many instructions could execute in parallel

00:34:31 and things like that, you could think of that as affecting the clock cycles per instruction,

00:34:35 because it’s the average clock cycles per instruction. So when you’re running a program,

00:34:39 if it took 100 billion instructions, and on average it took two clock cycles per instruction,

00:34:45 and they were four nanoseconds, you could multiply that out and see how long it took to run.

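Patterson’s three-factor formula (time = instruction count x average clock cycles per instruction, or CPI, x clock cycle time) can be checked with the figures from this worked example, plus the 10-versus-2 CPI gap from the RISC-CISC comparison he gives next; a quick sketch, not anything from the original debate:

```python
# Time per program = instructions executed * average clocks per
# instruction (CPI) * clock cycle time.
def exec_time(instructions, cpi, cycle_time_s):
    return instructions * cpi * cycle_time_s

# The worked example: 100 billion instructions, 2 CPI, 4 ns cycles.
t = exec_time(100e9, 2, 4e-9)
print(t)  # 800.0 seconds

# The RISC-CISC tradeoff: CISC at ~10 CPI versus RISC at ~2 CPI,
# with RISC executing up to 50 percent more instructions.
cisc = exec_time(1.0e9, 10, 4e-9)
risc = exec_time(1.5e9, 2, 4e-9)
print(cisc / risc)  # ~3.3x in RISC's favor
```

Even charging RISC 50 percent more instructions, the factor-of-five CPI advantage nets out to roughly a 3.3x win, which is the shape of the argument that settled the debate.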
00:34:49 And there’s all kinds of tricks to try and reduce the number of clock cycles per instruction.

00:34:55 But it turned out that the way they would do these complex instructions is they would actually

00:35:00 build what we would call an interpreter, a very simple hardware interpreter.

00:35:05 But it turned out that for the CISC instructions, if you had to use one of those interpreters,

00:35:10 it would be like 10 clock cycles per instruction, where the RISC instructions could be two. So

00:35:16 there’d be this factor of five advantage in clock cycles per instruction. We’d have to execute, say,

00:35:21 25 or 50 percent more instructions, so that’s where the win would come. And then you could

00:35:25 make an argument whether the clock cycle times are the same or not. But pointing out that we

00:35:30 could divide the benchmark results time per program into three factors, and the biggest

00:35:36 difference between RISC and CISC was the clock cycles per instruction: you execute a few more instructions,

00:35:40 but the clock cycles per instruction is much less. And that was what this debate, once we

00:35:46 made that argument, then people said, oh, okay, I get it. And so we went from, it was outrageously

00:35:53 controversial in, you know, 1982 that maybe probably by 1984 or so, people said, oh, yeah,

00:35:59 technically, they’ve got a good argument. What are the instructions in the RISC instruction set,

00:36:05 just to get an intuition? Okay. 1995, I was asked to predict what the microprocessor of the

00:36:13 future could be. So I, and I’d seen these predictions and usually people predict something outrageous

00:36:20 just to be entertaining, right? And so my prediction for 2020 was, you know, things are

00:36:25 going to be pretty much, they’re going to look very familiar to what they are. And they are,

00:36:30 and if you were to read the article, you know, the things I said are pretty much true. The

00:36:34 instructions that have been around forever are kind of the same. And that’s the outrageous

00:36:38 prediction, actually. Yeah. Given how fast computers have been going. Well, and you know,

00:36:42 Moore’s law was going to go on, we thought for 25 more years, you know, who knows, but kind of the

00:36:47 surprising thing, in fact, you know, Hennessy and I, you know, won the ACM A.M. Turing Award for

00:36:55 both the RISC instruction set contributions and for that textbook I mentioned. But, you know,

00:37:00 we’re surprised that here we are 35, 40 years later after we did our work, and the conventional wisdom

00:37:10 of the best way to do instruction sets is still those RISC instruction sets that look very

00:37:15 similar to what, you know, we did in the 1980s. So, surprisingly, there hasn’t

00:37:21 been some radical new idea, even though we have, you know, a million times as many transistors as

00:37:21 we had back then. But what are the basic instructions and how do they change over the

00:37:26 years? So we’re talking about addition, subtraction, these are the specifics. So the things that are in

00:37:32 a calculator are in a computer. So any of the buttons that are in the calculator are in the computer.

00:37:39 So if there’s a memory function key, like I said, putting

00:37:44 something into memory is called a store, and bringing something back is a load. Just a quick tangent.

00:37:54 When you say memory, what does memory mean? Well, I told you there were five pieces of a computer.

00:38:00 And if you remember in a calculator, there’s a memory key. So you want to have intermediate

00:38:04 calculation and bring it back later. So you’d hit the memory plus key M plus maybe, and it would

00:38:09 put that into memory and then you’d hit an RM, like recall memory, and then bring it back

00:38:14 on the display. So you don’t have to type it. You don’t have to write it down and bring it back

00:38:17 again. So that’s exactly what memory is. You can put things into it as temporary storage and bring

00:38:22 it back when you need it later. So that’s memory and loads and stores. But the big thing, the

00:38:28 difference between a computer and a calculator is that the computer can make decisions. And

00:38:35 amazingly, decisions are as simple as, is this value less than zero? Or is this value bigger

00:38:41 than that value? And those instructions, which are called conditional branch instructions,

00:38:47 are what give computers all their power. If you were in the early days of computing before

00:38:53 what’s called the general purpose microprocessor, people would write these instructions kind of in

00:38:58 hardware, but it couldn’t make decisions. It would do the same thing over and over again.

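The conditional branches described here can be sketched with a toy interpreter (a made-up mini instruction set, just for illustration, not any real machine): a branch is only “compare two values and maybe jump,” yet it is what turns a fixed sequence of operations into a loop.

```python
# A toy machine: each instruction is a tuple. BLT ("branch if less
# than") is the only decision-maker; everything else runs straight
# through, one instruction after another.
def run(program, regs):
    pc = 0                              # program counter
    while pc < len(program):
        op, *args = program[pc]
        if op == "add":                 # add rd, rs1, rs2
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        elif op == "blt":               # blt rs1, rs2, target
            rs1, rs2, target = args
            if regs[rs1] < regs[rs2]:
                pc = target             # the decision: jump back
                continue
        pc += 1
    return regs

# Sum 1..5 using only add and one backward conditional branch.
prog = [
    ("add", "sum", "sum", "i"),   # 0: sum += i
    ("add", "i", "i", "one"),     # 1: i += 1
    ("blt", "i", "limit", 0),     # 2: if i < limit, loop back to 0
]
regs = {"sum": 0, "i": 1, "one": 1, "limit": 6}
print(run(prog, regs)["sum"])  # 15
```

Without the blt, this program could only run straight through once; with it, three instructions become a loop, which is the power being described.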
00:39:05 With the power of having branch instructions, it can look at things and make decisions

00:39:09 automatically. And it can make these decisions billions of times per second. And amazingly

00:39:15 enough, we can get, thanks to advanced machine learning, we can create programs that can do

00:39:20 something smarter than human beings can do. But if you go down that very basic level, it’s the

00:39:25 instructions are the keys on the calculator, plus the ability to make decisions, these conditional

00:39:30 branch instructions. And all decisions fundamentally can be reduced down to these

00:39:34 branch instructions. Yeah. So in fact, and so going way back in the stack back to,

00:39:42 we did four RISC projects at Berkeley in the 1980s. They did a couple at Stanford

00:39:47 in the 1980s. In 2010, we decided we wanted to do a new instruction set learning from the mistakes

00:39:54 of those RISC architectures in the 1980s. And that was done here at Berkeley almost exactly

00:40:00 10 years ago. And the people who did it, I participated, but Krste Asanović and others

00:40:07 drove it. They called it RISC-V, five, to honor those four RISC projects of the 1980s.

00:40:13 So what does RISC-V involve? So RISC-V is another instruction set, a vocabulary. It’s learned from

00:40:21 the mistakes of the past, but it still has, if you look at the, there’s a core set of instructions

00:40:25 that’s very similar to the simplest architectures from the 1980s. And the big difference about

00:40:31 RISC-V is it’s open. So I talked earlier about proprietary versus open software. So this is an instruction

00:40:41 set. So it’s a vocabulary, it’s not hardware, but by having an open instruction set, we can have

00:40:47 open source implementations, open source processors that people can use. Where do you see that

00:40:54 going? It’s a really exciting possibility, but just like in the Scientific American article,

00:41:00 if you were to predict 10, 20, 30 years from now, that kind of ability to utilize open source

00:41:07 instruction set architectures like RISC-V, what kind of possibilities might that unlock?

00:41:13 Yeah. And so just to make it clear, because this is confusing, the specification of RISC-V is

00:41:20 something that’s like in a textbook, there’s books about it. So that’s defining an interface.

00:41:27 There’s also the way you build hardware is you write it in languages that are kind of like C,

00:41:33 but they’re specialized for hardware that gets translated into hardware. And so these

00:41:39 implementations of this specification are the open source. So they’re written in something

00:41:44 that’s called Verilog or VHDL, but it’s put up on the web, just like you can see the C code for

00:41:53 Linux on the web. So the open instruction set enables open source implementations of RISC-V.

00:42:00 So you can literally build a processor using this instruction set.

00:42:04 People are, people are. So what happened to us, the story was: this was developed here for

00:42:09 our use, to do our research. And we made it, we licensed it under the Berkeley Software Distribution

00:42:15 License, like a lot of things get licensed here. So other academics use it, they wouldn’t be afraid

00:42:19 to use it. And then about 2014, we started getting complaints, though we were just using it in our research

00:42:27 and in our courses. We got complaints from people in industry: why did you change your

00:42:32 instruction set between the fall and the spring semester? And well, we said to these

00:42:37 industrial types, why the hell do you care what we do with our instruction set? And then when we

00:42:42 talked to them, we found out there was this thirst for this idea of an open instruction set

00:42:46 architecture. And they had been looking for one. They stumbled upon ours at Berkeley, thought it

00:42:51 was, boy, this looks great. We should use this one. And so once we realized there is this need

00:42:58 for an open instruction set architecture, we thought that’s a great idea. And then we started

00:43:02 supporting it and tried to make it happen. So this was kind of, we accidentally stumbled into this

00:43:09 and to this need and our timing was good. And so it’s really taking off. There’s,

00:43:16 you know, universities are good at starting things, but they’re not good at sustaining things. So like

00:43:20 Linux has a Linux Foundation, there’s a RISC-V Foundation that we started. There’s an annual

00:43:26 conference. And the first one was held, I think, January of 2015, and,

00:43:32 you know, it had 50 people at it. And the one just last December had, I don’t know,

00:43:38 1,700 people at it, and there are companies excited all over the world. So if predicting into the

00:43:44 future, you know, if we were doing 25 years, I would predict that RISC-V will be, you know,

00:43:51 possibly the most popular instruction set architecture out there, because it’s a pretty

00:43:57 good instruction set architecture and it’s open and free. And there’s no reason lots of people

00:44:03 shouldn’t use it. And there’s benefits just like Linux is so popular today compared to 20 years

00:44:10 ago. And, you know, the fact that you can get access to it for free, you can modify it, you can

00:44:17 improve it for all those same arguments. And so people collaborate to make it a better system

00:44:22 for everybody to use. And that works in software. And I expect the same thing will happen in

00:44:26 hardware. So if you look at ARM, Intel, MIPS, if you look at just the lay of the land,

00:44:34 and what do you think, just for me, because I’m not familiar how difficult this kind of transition

00:44:42 would, how much challenges this kind of transition would entail, do you see,

00:44:50 let me ask my dumb question in another way.

00:44:52 No, that’s, I know where you’re headed. Well, there’s a bunch, I think the thing you point out,

00:44:57 there’s these very popular proprietary instruction sets, the x86.

00:45:02 And so how do we move to RISC-V, potentially, in the span of 5, 10, 20 years,

00:45:09 a kind of unification, given that the devices, the kind of way we use devices,

00:45:15 IoT, mobile devices, and the cloud keeps changing?

00:45:20 Well, part of it, a big piece of it is the software stack. And right now, looking forward,

00:45:27 there seem to be three important markets. There’s the cloud. And the cloud is simply

00:45:34 companies like Alibaba and Amazon and Google, Microsoft, having these giant data centers with

00:45:42 tens of thousands of servers in maybe a hundred of these data centers all over the world.

00:45:48 And that’s what the cloud is. So the computer that dominates the cloud is the x86 instruction set.

00:45:54 So the instruction sets used in the cloud are the x86, almost 100% of that today is x86.

00:46:03 The other big thing are cell phones and laptops. Those are the big things today.

00:46:08 I mean, the PC is also dominated by the x86 instruction set, but those sales are dwindling.

00:46:14 You know, there’s maybe 200 million PCs a year, and there’s one and a half billion phones a year.

00:46:21 There’s numbers like that. So for the phones, that’s dominated by ARM.

00:46:26 And now, and a reason that I talked about the software stacks, and the third category is

00:46:33 Internet of Things, which is basically embedded devices, things in your cars and your microwaves

00:46:38 everywhere. So what’s different about those three categories is for the cloud, the software that

00:46:45 runs in the cloud is determined by these companies, Alibaba, Amazon, Google, Microsoft. So they

00:46:51 control that software stack. For the cell phones, there’s both for Android and Apple, the software

00:46:58 they supply, but both of them have marketplaces where anybody in the world can build software.

00:47:03 And that software is translated or, you know, compiled down and shipped in the vocabulary of ARM.

00:47:11 So that’s what’s referred to as binary compatible because the actual, it’s the instructions are

00:47:18 turned into numbers, binary numbers, and shipped around the world.

00:47:21 And sorry, just a quick interruption. So ARM, what is ARM? ARM is an instruction set, like a RISC-based…

00:47:29 Yeah, it’s a RISC-based instruction set. It’s a proprietary one. ARM stands for Advanced RISC

00:47:36 Machine. ARM is also the name of the company. So it’s a proprietary RISC architecture.

00:47:41 So, and it’s been around for a while and it’s, you know, surely the most popular instruction set

00:47:50 in the world right now. Every year, billions of chips are using the ARM design in this post-PC

00:47:56 era. Was it one of the early adopters of the RISC idea? Yeah. The first ARM goes back,

00:48:03 I don’t know, ’86 or so. Berkeley and Stanford did their work in the early 80s. The ARM guys needed

00:48:09 an instruction set and they read our papers and it heavily influenced them. So getting back to my

00:48:17 story, what about Internet of Things? Well, software is not shipped in Internet of Things. It’s the

00:48:22 embedded device people control that software stack. So the opportunities for RISC-V,

00:48:29 everybody thinks, is in the Internet of Things embedded things because there’s no dominant

00:48:34 player like there is in the cloud or the smartphones. And, you know, it

00:48:41 doesn’t have a lot of license fees associated with it, and you can enhance the instruction set if you want.

00:48:46 And people who have looked at the instruction set think it’s a very good instruction set.

00:48:52 So it appears to be very popular there. It’s possible that in the cloud people,

00:48:59 those companies control their software stacks. So it’s possible that they would decide to use

00:49:05 RISC-V if we’re talking about 10 and 20 years in the future. The one that would be harder would

00:49:10 be the cell phones. Since people ship software in the ARM instruction set, you’d think that would be

00:49:16 the more difficult one. But if RISC-V really catches on, you know,

00:49:20 in a period of a decade, you can imagine that’s changing over too. Do you have a sense why

00:49:25 RISC-V or ARM has dominated? You mentioned these three categories. Why has, why did ARM dominate,

00:49:31 why does it dominate the mobile device space? And maybe my naive intuition is that there are some

00:49:38 aspects of power efficiency that are important that somehow come along with RISC. Well, part of it is,

00:49:44 for these old CISC instruction sets, like the x86, it was more expensive because, you know,

00:49:55 they’re older, so they have disadvantages in them because they were designed 40 years ago. But also

00:50:01 they have to translate in hardware from CISC instructions to RISC instructions on the fly.

00:50:06 And that costs both silicon area that the chips are bigger to be able to do that.

00:50:12 And it uses more power. So ARM, which has, you know, followed this RISC philosophy, is

00:50:18 seen to be much more energy efficient. And in today’s computer world, both in the cloud

00:50:23 and the cell phone and, you know, embedded things, the limiting resource isn’t the number of

00:50:29 transistors you can fit on the chip. It’s how much power you can dissipate for your

00:50:34 application? So by having a reduced instruction set, that’s possible to have a simpler hardware,

00:50:42 which is more energy efficient. And energy efficiency is incredibly important in the cloud.

00:50:46 When you have tens of thousands of computers in a data center, you want to have the most energy

00:50:51 efficient ones there as well. And of course, for embedded things running off of batteries,

00:50:54 you want those to be energy efficient and the cell phones too. So I think it’s believed that

00:51:00 there’s an energy disadvantage of using these more complex instruction set architectures.

00:51:08 So the other aspect of this is if we look at Apple, Qualcomm, Samsung, Huawei, all use the

00:51:14 ARM architecture, and yet the performance of the systems varies. I mean, I don’t know

00:51:20 whose opinion you take on, but you know, Apple for some reason seems to perform better in terms of

00:51:26 their implementation of these architectures. So where’s the magic? How does it enter the picture?

00:51:30 How does that happen? Yeah. So what ARM pioneered was a new business model. They said, well,

00:51:35 here’s our proprietary instruction set, and we’ll give you two ways to do it.

00:51:41 We’ll give you one of these implementations written in languages like C, called Verilog,

00:51:46 and you can just use ours. Well, you have to pay money for that; you pay,

00:51:51 and, you know, we’ll license you to do that. Or you could design your own. And so

00:51:57 we’re talking about numbers like tens of millions of dollars to have the right to design your own,

00:52:02 since they, it’s the instruction set belongs to them. So Apple got one of those, the right to

00:52:08 build their own. Most of the other people who build like Android phones just get one of the designs

00:52:15 from ARM to do it themselves. So Apple developed a really good microprocessor design team. They,

00:52:24 you know, acquired a very good team that was building other microprocessors and brought them

00:52:30 into the company to build their designs. So the instruction sets are the same, the specifications

00:52:35 are the same, but their hardware design is much more efficient than I think everybody else’s.

00:52:40 And that’s given Apple an advantage in the marketplace in that the iPhones tend to be

00:52:49 faster than most everybody else’s phones that are there. It’d be nice to be able to jump around and

00:52:55 kind of explore different little sides of this, but let me ask one sort of romanticized question.

00:53:01 What to you is the most beautiful aspect or idea of RISC instruction set?

00:53:07 Most beautiful aspect or idea of RISC instruction set or instruction sets or this work that you’ve

00:53:13 done? You know, I’m, you know, I was always attracted to the idea of, you know, small is

00:53:20 beautiful, right? Is that the temptation in engineering, it’s kind of easy to make things

00:53:26 more complicated. It’s harder to come up with a, it’s more difficult, surprisingly, to come up with

00:53:32 a simple, elegant solution. And I think that there’s a bunch of small features of RISC in general

00:53:39 that, you know, where you can see this examples of keeping it simpler makes it more elegant.

00:53:45 Specifically in RISC-V, which, you know, I was kind of the mentor in the program, but it was

00:53:50 really driven by Krste Asanović and two grad students, Andrew Waterman and Yunsup Lee, is they

00:53:56 hit upon this idea of having a subset of instructions, a nice, simple subset of instructions,

00:54:05 like 40-ish instructions, such that all software, the software stack of RISC-V, can run just on those 40

00:54:12 instructions. And then they provide optional features that could accelerate performance,

00:54:20 instructions that, if you needed them, could be very helpful, but you don’t need to have them.

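One concrete case of this: RISC-V’s integer multiply and divide live in an optional extension (the M extension), so a core built with only the base instructions has no multiply instruction at all, and a compiler or library routine synthesizes it from the adds, shifts, and branches the base set does provide. A sketch of that shift-add idea, in Python rather than assembly:

```python
# Multiply two non-negative integers using only add, shift, and a
# conditional test -- operations a base-only RISC-V core provides.
def soft_multiply(a, b):
    product = 0
    while b:
        if b & 1:            # low bit set: add the current shifted a
            product += a
        a <<= 1              # shift multiplicand left
        b >>= 1              # shift multiplier right
    return product

print(soft_multiply(6, 7))  # 42
```

It runs slower than a hardware multiplier, but the software still works, which is exactly the point: the optional feature buys speed, not correctness.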
00:54:24 And that’s really a new idea. So RISC-V has right now maybe five optional subsets that

00:54:31 you can pull in, but the software runs without them. If you just want to build just the core

00:54:37 40 instructions, that’s fine. You can do that. So this is fantastic educationally: you can

00:54:43 explain computers. You only have to explain 40 instructions and not thousands of them. Also,

00:54:48 if you invent some wild and crazy new technology like, you know, biological computing, you’d like

00:54:56 a nice, simple instruction set. And with RISC-V, if you implement those core instructions, you

00:55:02 can run, you know, really interesting programs on top of that. So this idea of a core set of

00:55:07 instructions that the software stack runs on and then optional features that if you turn them on,

00:55:13 the compilers will use, but you don’t have to, I think is a powerful idea. What’s happened in

00:55:18 the past for the proprietary instruction sets is when they add new instructions, it becomes

00:55:25 a required piece, so that all microprocessors in the future have to use those instructions. So

00:55:33 it’s kind of like, for a lot of people as they get older, they gain weight, right? That weight and

00:55:39 age are correlated. And so you can see these instruction sets getting bigger and bigger as

00:55:44 they get older. So RISC-V, you know, lets you be as slim as you were as a teenager. And you only have to

00:55:50 add these extra features if you’re really going to use them rather than you have no choice. You have

00:55:55 to keep growing with the instruction set. I don’t know if the analogy holds up, but that’s a beautiful

00:56:00 notion that there’s, it’s almost like a nudge towards here’s the simple core. That’s the

00:56:06 essential. Yeah. And I think the surprising thing is still if we brought back, you know,

00:56:12 the pioneers from the 1950s and showed them the instruction set architectures, they’d understand

00:56:16 it. They’d say, wow, that doesn’t look that different. Well, you know, I’m surprised. And

00:56:21 it’s, there’s, it may be something, you know, to talk about philosophical things. I mean, there may

00:56:26 be something powerful about those, you know, 40 or 50 instructions that all you need is these

00:56:35 commands like these instructions that we talked about. And that is sufficient to build, to bring

00:56:42 up, you know, artificial intelligence. And so it’s remarkable, surprising to me, that as

00:56:50 complicated as it is to build these things, you know, microprocessors where the line widths are

00:56:58 narrower than the wavelength of light, you know, these amazing technologies, at some

00:57:05 fundamental level. The commands that software executes are really pretty straightforward and

00:57:10 haven’t changed that much in decades. What a surprising outcome. So underlying all computation,

00:57:18 all Turing machines, all artificial intelligence systems, perhaps might be a very simple instruction

00:57:23 set like RISC-V. Yeah. I mean, that’s kind of what I said. I was interested to see,

00:57:30 I had another more senior faculty colleague and he had written something in Scientific American

00:57:36 and, you know, his 25 years in the future, and his turned out about when I was a young professor, and

00:57:43 he said, yep, I checked it. And so I was interested to see how that was going to turn out for me. And

00:57:48 it’s held up pretty well. But yeah, so there’s probably, there’s some, you know,

00:57:54 there’s, there must be something fundamental about those instructions that we’re capable of

00:58:01 creating, you know, intelligence from pretty primitive operations and just doing them really

00:58:07 fast. You kind of mentioned a different, maybe radical computational medium like biological,

00:58:14 and there’s other ideas. So there’s a lot of space in ASICs, domain-specific ones, and then there

00:58:20 could be quantum computers. And so we can think of all of those different mediums and types of

00:58:25 computation. What’s the connection between swapping out different hardware systems and the

00:58:33 instruction set? Do you see those as disjoint or are they fundamentally coupled? Yeah. So what’s,

00:58:37 so kind of, if we go back to the history, you know, when Moore’s Law is in full effect and

00:58:45 you’re getting twice as many transistors every couple of years, you know, kind of the challenge

00:58:51 for computer designers is how can we take advantage of that? How can we turn those transistors into

00:58:56 better computers faster typically? And so there was an era, I guess in the 80s and 90s where

00:59:04 computers were doubling performance every 18 months. And if you weren’t around then,

00:59:11 what would happen is you had your computer and your friend’s computer, which was like a year,

00:59:18 a year and a half newer, and it was much faster than your computer. And he or she could get their

00:59:23 work done much faster than your computer because it was newer. So people took their computers,

00:59:27 perfectly good computers, and threw them away to buy a newer computer because the computer

00:59:33 one or two years later was so much faster. So that’s what the world was like in the 80s and

00:59:39 90s. Well, with the slowing down of Moore’s Law, that’s no longer true, right? Now with, you know,

00:59:46 not desktop computers but laptops, I only get a new laptop when it breaks,

00:59:51 right? Oh damn, the disk broke or this display broke, I gotta buy a new computer. But before

00:59:56 you would throw them away because it just, they were just so sluggish compared to the latest

01:00:01 computers. So that’s, you know, that’s a huge change of what’s gone on. So, but since this

01:00:11 lasted for decades, kind of programmers and maybe all of society is used to computers getting faster

01:00:18 regularly. We now believe, those of us who are in computer design, it’s called computer

01:00:24 architecture, that the path forward instead is to add accelerators that only work well for

01:00:33 certain applications. So since Moore’s Law is slowing down, we don’t think general purpose

01:00:41 computers are going to get a lot faster. So the Intel processors of the world are not going to,

01:00:46 haven’t been getting a lot faster. They’ve been barely improving, like a few percent a year.

01:00:51 It used to be doubling every 18 months and now it’s doubling every 20 years. So it was just

01:00:56 shocking. So to be able to deliver on what Moore’s Law used to do, we think what’s going to happen,

01:01:02 what is happening right now is people adding accelerators to their microprocessors that only

01:01:09 work well for some domains. And by sheer coincidence, at the same time that this is happening,

01:01:17 has been this revolution in artificial intelligence called machine learning. So with,

01:01:23 as I’m sure your other guests have said, you know, AI had these two competing schools of thought is

01:01:31 that we could figure out artificial intelligence by just writing the rules top down, or that was

01:01:36 wrong. You had to look at data and infer what the rules are, the machine learning, and what’s

01:01:41 happened in the last decade or eight years is machine learning has won. And it turns out that

01:01:48 machine learning, the hardware you build for machine learning is pretty much matrix multiply. The

01:01:55 matrix multiply is a key feature for the way machine learning is done. So that’s a godsend

01:02:03 for computer designers. We know how to make matrix multiply run really fast. So general purpose

01:02:08 microprocessors are slowing down. We’re adding accelerators for machine learning that fundamentally

01:02:13 are doing matrix multiplies much more efficiently than general purpose computers have done.

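The kernel in question is essentially just this triple loop (a naive sketch; real accelerators tile it and run many multiply-adds in parallel):

```python
# Naive matrix multiply: the computational core of neural-network
# inference and training (each layer is roughly activations x weights).
def matmul(A, B):
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```

Because almost all the time goes into that innermost multiply-add, hardware that does nothing but multiply-adds in bulk can deliver huge speedups for this one pattern.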
01:02:17 So we have to come up with a new way to accelerate things. The danger of only accelerating one

01:02:23 application is how important is that application. Turns out machine learning gets used for all

01:02:28 kinds of things. So serendipitously, we found something to accelerate that’s widely applicable.

01:02:36 And we don’t even, we’re in the middle of this revolution of machine learning. We’re not sure

01:02:40 what the limits of machine learning are. So this has been a kind of a godsend. If you’re going to

01:02:46 be able to accelerate, to deliver improved performance, as long as people are moving their programs to be

01:02:54 embracing more machine learning, we know how to give them more performance even as Moore’s law

01:02:58 is slowing down. And counterintuitively, the machine learning mechanism you can say is domain

01:03:06 specific, but because it’s leveraging data, it’s actually could be very broad in terms of

01:03:15 in terms of the domains it could be applied in. Yeah, that’s exactly right. Sort of, it’s almost

01:03:21 sort of people sometimes talk about the idea of software 2.0. We’re almost taking another step

01:03:27 up in the abstraction layer in designing machine learning systems, because now you’re programming

01:03:34 in the space of data, in the space of hyperparameters, it’s changing fundamentally

01:03:38 the nature of programming. And so the specialized devices that accelerate the performance, especially

01:03:45 neural network based machine learning systems might become the new general. Yeah. So the thing

01:03:52 that’s interesting to point out: these are not coupled, these are not tied together. The enthusiasm about

01:03:59 machine learning about creating programs driven from data that we should figure out the answers

01:04:05 from data rather than kind of top down, which classically the way most programming is done

01:04:10 and the way artificial intelligence used to be done. That’s a movement that’s going on at the

01:04:14 same time. Coincidentally, the first word in machine learning is machines, right? So that’s

01:04:21 going to increase the demand for computing, because instead of programmers being smart, writing those

01:04:27 things down, we’re going to instead use computers to examine a lot of data to kind of

01:04:31 create the programs. That’s the idea. And remarkably, this gets used for all kinds of

01:04:38 things very successfully. The image recognition, the language translation, the game playing,

01:04:43 and you know, it gets into pieces of the software stack like databases and stuff like that. We’re

01:04:50 not quite sure how general purpose it is, but that’s going on independent of this hardware stuff.

01:04:54 What’s happening on the hardware side is Moore’s law is slowing down right when we need a lot more

01:04:59 cycles. It’s failing us, it’s failing us right when we need it because there’s going to be a

01:05:03 greater increase in computing. And then this idea that we’re going to do so called domain

01:05:09 specific. Here’s a domain where your greatest fear is you’ll make this one thing work and that’ll

01:05:16 help, you know, five percent of the people in the world. Well, this looks like it’s a very

01:05:22 general purpose thing. So the timing is fortuitous that if we can perhaps, if we can keep building

01:05:28 hardware that will accelerate machine learning, the neural networks, then the timing will

01:05:36 be right. That neural network revolution will transform your software, the so called software

01:05:42 2.0. And the software of the future will be very different from the software of the past. And just

01:05:47 as our microprocessors, even though we’re still going to have that same basic RISC instructions

01:05:53 to run big pieces of the software stack like user interfaces and stuff like that,

01:05:58 we can accelerate the kind of the small piece that’s computationally intensive. It’s not lots

01:06:02 of lines of code, but it takes a lot of cycles to run that code that that’s going to be the

01:06:08 accelerator piece. And so that’s what makes this from a computer designers perspective a really

01:06:14 interesting decade. What Hennessy and I talked about in the title of our Turing Award speech

01:06:20 is a new golden age. We see this as a very exciting decade, much like when we were assistant

01:06:28 professors and the RISC stuff was going on. That was a very exciting time where we were changing

01:06:32 what was going on. We see this happening again. Tremendous opportunities for people because we’re

01:06:39 fundamentally changing how software is built and how we’re running it. So which layer of the

01:06:43 abstraction do you think most of the acceleration might be happening? If you look in the next 10

01:06:49 years, Google is working on a lot of exciting stuff with the TPU. Sort of closer to

01:06:54 the hardware, there could be optimizations around the instruction set.

01:07:00 There could be optimization at the compiler level. It could be even at the higher level software

01:07:05 stack. Yeah, it’s got to be, I mean, if you think about the old RISC-CISC debate, it was both,

01:07:11 it was software hardware. It was the compilers improving as well as the architecture improving.

01:07:18 And that’s likely to be the way things are now. With machine learning, they’re using

01:07:24 domain specific languages. The languages like TensorFlow and PyTorch are very popular with

01:07:30 the machine learning people. Those are raising the level of abstraction. It’s easier

01:07:35 for people to write machine learning in these domain specific languages like PyTorch and

01:07:41 TensorFlow. So that’s where the most optimization might be happening. Yeah. And so there’ll be both the

01:07:47 compiler piece and the hardware piece underneath it. So kind of the fatal flaw for hardware

01:07:53 people is to create really great hardware, but not have brought along the compilers. And what we’re

01:07:59 seeing right now in the marketplace because of this enthusiasm around hardware for machine

01:08:04 learning is getting, you know, probably billions of dollars invested in startup companies. We’re

01:08:10 seeing startup companies go belly up because they focus on the hardware, but didn’t bring the

01:08:15 software stack along. We talked about benchmarks earlier. So I participated in this. Machine learning

01:08:23 didn’t really have a set of benchmarks. I think just two years ago, they didn’t have a set of

01:08:27 benchmarks. And we’ve created something called MLPerf, which is a machine learning benchmark suite.

01:08:33 And pretty much the companies who didn’t invest in the software stack couldn’t run MLPerf very

01:08:39 well. And the ones who did invest in software stack did. And we’re seeing, you know, like kind

01:08:45 of in computer architecture, this is what happens. You have these arguments about RISC versus CISC.

01:08:48 People spend billions of dollars in the marketplace to see who wins. It’s not a perfect comparison,

01:08:54 but it kind of sorts things out. And we’re seeing companies go out of business and then companies

01:08:59 like there’s a company in Israel called Habana. They came up with machine learning accelerators.

01:09:08 They had good MLPerf scores. Intel had acquired a company earlier called Nervana a couple of years

01:09:14 ago. They didn’t reveal their MLPerf scores, which was suspicious. But a month ago, Intel

01:09:21 announced that they’re canceling the Nervana product line and they’ve bought Habana for $2

01:09:25 billion. And Intel’s going to be shipping Habana chips, which have hardware and software and run

01:09:32 the MLPerf programs pretty well. And that’s going to be their product line in the future.

01:09:36 Brilliant. So maybe just to linger briefly on ML Perf. I love metrics. I love standards that

01:09:42 everyone can gather around. What are some interesting aspects of that portfolio of metrics?

01:09:48 Well, one of the interesting things is how it got started. I was involved in the start.

01:09:57 Peter Mattson is leading the effort from Google. Google got it off the ground,

01:10:00 but we had to reach out to competitors and say, there’s no benchmarks here. We think this is

01:10:07 bad for the field. It’ll be much better if we look at examples like in the RISC days,

01:10:11 there was an effort to create a… the people in the RISC community got together,

01:10:16 competitors got together building RISC microprocessors to agree on a set of

01:10:19 benchmarks that were called SPEC. And that was good for the industry. Whereas before

01:10:24 the different RISC architectures were arguing, well, you can believe my performance numbers,

01:10:28 but those other guys are liars. And that didn’t do any good. So we agreed on a set of benchmarks

01:10:34 and then we could figure out who was faster between the various RISC architectures. Maybe

01:10:37 one was a little bit faster, but that grew the market rather than people being afraid to buy

01:10:42 anything. So we argued the same thing would happen with MLPerf. Companies like Nvidia were maybe

01:10:48 worried that it was some kind of trap, but eventually we all got together to create a

01:10:53 set of benchmarks and do the right thing. And we agree on the results. And so we can see whether

01:11:00 TPUs or GPUs or CPUs are really faster and how much faster. And I think from an engineer’s

01:11:06 perspective, as long as the results are fair, you can live with it. Okay, you kind of tip your hat

01:11:12 to your colleagues at another institution, boy, they did a better job than us. What you hate is

01:11:18 if it’s false, right? They’re making claims and it’s just marketing bullshit and that’s affecting

01:11:23 sales. So from an engineer’s perspective, as long as it’s a fair comparison and we don’t come in

01:11:28 first place, that’s too bad, but it’s fair. So we wanted to create that environment for MLPerf.
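[Editor's note: the fairness idea, everyone runs the same agreed-upon workloads under the same rules, can be sketched in a few lines. This toy harness is purely illustrative: the workloads and "systems" below are made up, and this is nothing like the real MLPerf suite.]

```python
import time

def bench(fn, arg, repeats=3):
    """Time fn(arg) a few times and keep the best run."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best

# Two hypothetical "systems" solving the same task (sum of squares).
def system_a(n):
    return sum(i * i for i in range(n))

def system_b(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

workloads = [10_000, 100_000]  # the shared, agreed-upon inputs
for name, fn in [("system_a", system_a), ("system_b", system_b)]:
    score = sum(bench(fn, w) for w in workloads)
    print(f"{name}: {score:.6f} s")
```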

01:11:33 And so now there’s 10 companies, I mean, 10 universities and 50 companies involved. So pretty

01:11:40 much MLPerf is the way you measure machine learning performance. And it didn’t exist even

01:11:50 two years ago. One of the cool things that I enjoy about the internet has a few downsides, but one of

01:11:56 the nice things is people can see through BS a little better with the presence of these kinds

01:12:02 of metrics. So it’s really nice that for companies like Google and Facebook and Twitter, now the

01:12:08 cool thing to do is to put your engineers forward and to actually show off how well you do on these

01:12:13 metrics. There’s less of a desire to do marketing, less so, in my sort of naive viewpoint.

01:12:22 I was trying to understand what’s changed from the 80s in this era. I think because of things

01:12:29 like social networking, Twitter and stuff like that, if you put up bullshit stuff that’s just

01:12:38 purposely misleading, you can get a violent reaction in social media pointing out the flaws

01:12:45 in your arguments. And so from a marketing perspective, you have to be careful today that

01:12:51 you didn’t have to be before, because there’ll be people who will point out the flaws. You can get the

01:12:57 word out about the flaws in what you’re saying much more easily today than in the past. It used

01:13:02 to be easier to get away with it. And the other thing that’s been happening in terms of showing

01:13:08 off engineers is just in the software side, people have largely embraced open source software.

01:13:16 20 years ago, it was a dirty word at Microsoft. And today Microsoft is one of the big proponents

01:13:22 of open source software. That’s the standard way most software gets built, which really shows off

01:13:28 your engineers because you can see if you look at the source code, you can see who are making the

01:13:34 commits, who’s making the improvements, who are the engineers at all these companies who are

01:13:41 really great programmers and engineers and making really solid contributions,

01:13:47 which enhances their reputations and the reputation of the companies.

01:13:50 But that’s, of course, not everywhere. Like in the space that I work more in is autonomous

01:13:56 vehicles. And there’s still the machinery of hype and marketing is still very strong there. And

01:14:02 there’s less willingness to be open in this kind of open source way and sort of benchmark. So

01:14:07 MLPerf represents that the machine learning world is much better at being open about holding

01:14:12 itself to standards. The number of benchmarks, in terms of the different

01:14:18 computer vision and natural language processing tasks, is incredible.

01:14:23 Historically, it wasn’t always that way.

01:14:26 I had a graduate student working with me, David Martin. So in some fields,

01:14:32 benchmarking has been around forever. So computer architecture, databases, maybe operating systems,

01:14:40 benchmarks are the way you measure progress. But he was working with me and then started working

01:14:47 with Jitendra Malik. And Jitendra Malik in computer vision space, I guess you’ve interviewed

01:14:53 Jitendra. And David Martin told me, they don’t have benchmarks. Everybody has their own vision

01:14:59 algorithm and the way, here’s my image, look at how well I do. And everybody had their own image.

01:15:04 So David Martin, back when he did his dissertation, figured out a way to do benchmarks. He had a bunch

01:15:10 of graduate students identify images and then ran benchmarks to see which algorithms run well. And

01:15:17 that was, as far as I know, kind of the first time people did benchmarks in computer vision, which

01:15:24 predated all the things that eventually led to ImageNet and stuff like that. But then the vision

01:15:29 community got religion. And then once we got as far as ImageNet, then that let the guys in Toronto

01:15:38 be able to win the ImageNet competition. And then that changed the whole world.

01:15:42 It’s a scary step actually, because when you enter the world of benchmarks, you actually have to be

01:15:47 good to participate as opposed to… Yeah, you can just, you just believe you’re the best in the

01:15:54 world. I think the people, I think they weren’t purposely misleading. I think if you don’t have

01:16:01 benchmarks, I mean, how do you know? Your intuition is kind of like the way we used to

01:16:06 do computer architecture. Your intuition is that this is the right instruction set to do this job.

01:16:11 I believe in my experience, my hunch is that’s true. We had to make things more quantitative

01:16:18 to make progress. And so I just don’t know how, you know, in fields that don’t have benchmarks,

01:16:23 I don’t understand how they figure out how they’re making progress.

01:16:28 We’re kind of in the vacuum tube days of quantum computing. What are your thoughts in this wholly

01:16:34 different kind of space of architectures? You know, I actually, you know, quantum computing

01:16:41 is, the idea has been around for a while and I actually thought, well, I sure hope I retire

01:16:46 before I have to start teaching this. I say that because I give these talks about the

01:16:53 slowing of Moore’s law and, you know, when we need to change by doing domain specific accelerators,

01:17:01 a common question is, what about quantum computing? The reason that comes up is

01:17:04 it’s in the news all the time. So I think the thing to keep in mind is

01:17:08 quantum computing is not right around the corner. There’ve been two national reports,

01:17:14 one by the National Academy of Engineering and the other by the Computing Community Consortium, where they

01:17:18 did a frank assessment of quantum computing. And both of those reports said, you know,

01:17:25 as far as we can tell, before you get error corrected quantum computing, it’s a decade away.

01:17:31 So I think of it like nuclear fusion, right? There’ve been people who’ve been excited about

01:17:35 nuclear fusion a long time. If we ever get nuclear fusion, it’s going to be fantastic

01:17:39 for the world. I’m glad people are working on it, but, you know, it’s not right around the corner.

01:17:45 Those two reports to me say probably it’ll be 2030 before quantum computing is something

01:17:52 that could happen. And when it does happen, you know, this is going to be big science stuff. This

01:17:58 is, you know, micro Kelvin, almost absolute zero things that if they vibrate, if a truck goes by,

01:18:04 it won’t work, right? So this will be in data center stuff. We’re not going to have a quantum

01:18:09 cell phone. And it’s probably a 2030 kind of thing. So I’m happy that our people are working on it,

01:18:16 but just, you know, it’s hard with all the news about it, not to think that it’s right around the

01:18:21 corner. And that’s why we need to do something as Moore’s Law is slowing down to provide the

01:18:27 computing, keep computing getting better for this next decade. And, you know, we shouldn’t

01:18:32 be betting on quantum computing or expecting quantum computing to deliver in the next few

01:18:39 years. It’s probably further off. You know, I’d be happy to be wrong. It’d be great if quantum

01:18:44 computing is going to be commercially viable, but it will be a set of applications. It’s not a general

01:18:49 purpose computation. So it’s going to do some amazing things, but there’ll be a lot of things

01:18:54 that probably, you know, the old fashioned computers are going to keep doing better for

01:18:59 quite a while. And there’ll be a teenager 50 years from now watching this video saying,

01:19:04 look how silly David Patterson was saying. No, I just said, I said 2030. I didn’t say,

01:19:09 I didn’t say never. We’re not going to have quantum cell phones. So he’s going to be watching it.

01:19:14 Well, I mean, I think this is such a, you know, given that we’ve had Moore’s Law, I just, I feel

01:19:21 comfortable trying to do projects that are thinking about the next decade. I admire people who are

01:19:27 trying to do things that are 30 years out, but it’s such a fast moving field. I just don’t know

01:19:32 how to, I’m not good enough to figure out what the problem is going to be in 30 years. You know,

01:19:38 10 years is hard enough for me. So maybe if it’s possible to untangle your intuition a little bit,

01:19:44 I spoke with Jim Keller. I don’t know if you’re familiar with Jim. And he is trying to sort of

01:19:50 be a little bit rebellious, and he quotes you as being wrong. Yeah. So this,

01:19:57 this is what you’re doing for the record. Jim talks about that. He has an intuition that Moore’s

01:20:04 Law is not in fact dead yet, and that it may continue for some time to come.

01:20:10 What are your thoughts about Jim’s ideas in this space? Yeah, this is just, this is just marketing.

01:20:16 So what Gordon Moore said is a quantitative prediction. We can check the facts, right? Which

01:20:22 is doubling the number of transistors every two years. So we can look back at Intel for the last

01:20:29 five years and check. Let’s look at DRAM chips six years ago. So that would be three two-year

01:20:38 periods. So then our DRAM chips should have eight times as many transistors as they did six years ago.

01:20:44 We can look up Intel microprocessors six years ago. If Moore’s Law is continuing, it should have

01:20:50 eight times as many transistors as six years ago. The answer in both those cases is no.
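[Editor's note: the arithmetic behind that check is simple enough to write down. A small sketch of the prediction being tested:]

```python
# Moore's Law as a quantitative prediction: transistor count doubles
# every two years, so six years is three two-year periods, 2**3 = 8x.
def moores_law_factor(years, doubling_period=2):
    return 2 ** (years / doubling_period)

print(moores_law_factor(6))   # 8.0, the predicted six-year growth
print(moores_law_factor(10))  # 32.0 over a decade
```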

01:20:57 The problem has been because Moore’s Law was kind of genuinely embraced by the semiconductor

01:21:05 industry as they would make investments in semiconductor equipment to make Moore’s Law come true.

01:21:10 Semiconductors improving and Moore’s Law are, in many people’s minds, the same thing. So when I say,

01:21:17 and I’m factually correct, that Moore’s Law no longer holds, we are not doubling transistors

01:21:24 every two years. The downside for a company like Intel is people think that means it’s stopped,

01:21:31 that technology is no longer improving. And so Jim is trying to

01:21:36 counteract the impression that semiconductors are frozen in 2019 and are never going to get better.

01:21:46 So I never said that. All I said was Moore’s Law is no more. And I’m strictly looking at the number

01:21:53 of transistors. That’s what Moore’s Law is. There’s the, I don’t know, there’s been this aura

01:22:01 associated with Moore’s Law that they’ve enjoyed for 50 years about, look at the field we’re in,

01:22:07 we’re doubling transistors every two years. What an amazing field, which is an amazing thing that

01:22:12 they were able to pull off. But even as Gordon Moore said, you know, no exponential can last

01:22:16 forever. It lasted for 50 years, which is amazing. And this is a huge impact on the industry because

01:22:22 of these changes that we’ve been talking about.

01:22:28 So he claims, because he’s trying to act on it, he claims,

01:22:33 you know, Patterson says Moore’s Law is no more and look at all, look at it, it’s still going.

01:22:38 And TSMC, they say it’s not over. But there’s quantitative evidence that Moore’s Law is not

01:22:44 continuing. So what I say now to try and, okay, I understand the perception problem when I say

01:22:58 Moore’s Law has stopped. Okay. So now I say Moore’s Law is slowing down. If it’s

01:23:03 predicting a doubling every two years and I say it’s slowing down, then that’s

01:23:09 another way of saying it doesn’t hold anymore. And I think Jim wouldn’t disagree that it’s

01:23:14 slowing down, because that sounds like things are still getting better, just not as fast,

01:23:14 which is another way of saying Moore’s Law isn’t working anymore.

01:23:18 It’s still good for marketing. But what’s your, you’re not,

01:23:22 you don’t like expanding the definition of Moore’s Law, sort of naturally.

01:23:27 Well, as an educator, you know, is this like modern politics? Does everybody get their own facts?

01:23:34 Or do we have, you know, Moore’s Law was a crisp thing. It was Carver Mead who looked at

01:23:41 Moore’s drawing on a log-log scale, a straight line. And that’s what the definition of

01:23:47 Moore’s Law is. There’s this other, what Intel did for a while, interestingly, before Jim joined

01:23:54 them, they said, oh, no, Moore’s Law isn’t really doubling the number of

01:23:58 transistors every two years. Moore’s Law is the cost of the individual transistor going down,

01:24:04 cutting in half every two years. Now, that’s not what he said, but they reinterpreted it

01:24:10 because they believed that the cost of transistors was continuing to drop,

01:24:15 even if they couldn’t get twice as many chips. Many people in industry have told me that’s not

01:24:20 true anymore, that basically in more recent technologies, they got more complicated,

01:24:26 the actual cost of a transistor went up. So even the corollary might not be true,

01:24:32 but certainly, you know, Moore’s Law, that was the beauty of Moore’s Law. It was a very simple,

01:24:38 it’s like E equals MC squared, right? It was like, wow, what an amazing prediction. It’s so easy

01:24:44 to understand, the implications are amazing, and that’s why it was so famous as a prediction.

01:24:50 And this reinterpretation of what it meant, and changing it, is, you know, revisionist history.

01:24:56 And I’d be happy, and they’re not claiming there’s a new Moore’s Law. They’re not saying,

01:25:04 by the way, instead of every two years, it’s every three years. I don’t think they want to

01:25:10 say that. I think what’s going to happen is new technology generations, each one is going to get

01:25:14 a little bit slower. So it is slowing down, the improvements won’t be as great, and that’s why we

01:25:21 need to do new things. Yeah, I don’t like that the idea of Moore’s Law is tied up with marketing.

01:25:28 It would be nice if… Whether it’s marketing or it’s, well, it could be affecting business,

01:25:34 but it could also be affecting the imagination of engineers. If Intel employees actually believe

01:25:40 that we’re frozen in 2019, well, that would be bad for Intel. Not just Intel, but everybody.

01:25:49 Moore’s Law is inspiring to everybody. But what’s happening right now, talking to people

01:25:57 who are working in national offices and stuff like that, a lot of the computer science community

01:26:02 is unaware that this is going on, that we are in an era that’s going to need radical change at lower

01:26:09 levels that could affect the whole software stack. If you’re using cloud stuff and the

01:26:18 servers that you get next year are basically only a little bit faster than the servers you got this

01:26:23 year, you need to know that, and we need to start innovating to start delivering on it. If you’re

01:26:30 counting on your software going to have a lot more features, assuming the computers are going to get

01:26:34 faster, that’s not true. So are you going to have to start making your software stack more efficient?

01:26:38 Are you going to have to start learning about machine learning? So it’s a warning or call

01:26:45 to arms that the world is changing right now. And a lot of computer science PhDs are unaware

01:26:51 of that. So a way to try and get their attention is to say that Moore’s Law is slowing down and

01:26:56 that’s going to affect your assumptions. And we’re trying to get the word out. And when companies

01:27:02 like TSMC and Intel say, oh, no, no, no, Moore’s Law is fine, then people think, oh, hey, I don’t

01:27:08 have to change my behavior. I’ll just get the next servers. And if they start doing measurements,

01:27:13 they’ll realize what’s going on. It’d be nice to have some transparency on metrics for the lay

01:27:18 person to be able to know if computers are getting faster and not to forget Moore’s Law.

01:27:24 Yeah. There are a bunch of, most people kind of use clock rate as a measure of performance.

01:27:31 It’s not a perfect one, but if you’ve noticed clock rates are more or less the same as they were

01:27:37 five years ago, computers are a little better than they were. They haven’t made zero progress,

01:27:42 but they’ve made small progress. So there’s some indications out there. And then our behavior,

01:27:47 right? Nobody buys the next laptop because it’s so much faster than the laptop from the past.

01:27:52 For cell phones, I think, I don’t know why people buy new cell phones, you know, because

01:28:00 the new one is announced. The cameras are better, but that’s kind of domain specific, right? They’re

01:28:04 putting special purpose hardware to make the processing of images go much better. So that’s

01:28:10 the way they’re doing it. They’re not particularly, it’s not that the ARM processor in there is twice

01:28:15 as fast as much as they’ve added accelerators to help the experience of the phone. Can we talk a

01:28:22 little bit about one other exciting space, arguably the same level of impact as your work with RISC

01:28:30 is RAID. In 1988, you coauthored a paper, A Case for Redundant Arrays of Inexpensive Disks, hence

01:28:41 RAID. So that’s where you introduced the idea of RAID. Incredible that that little,

01:28:49 I mean little, that paper kind of had this ripple effect and had a really a revolutionary effect.

01:28:55 So first, what is RAID? What is RAID? So this is work I did with my colleague Randy Katz and

01:29:01 a star graduate student, Garth Gibson. So we had just done the fourth generation RISC project

01:29:08 and Randy Katz, who had an early Apple Macintosh computer. At this time, everything was done with

01:29:17 floppy disks, which are old technologies for storing things that didn’t have much capacity,

01:29:26 and to get any work done, you were always sticking your little floppy disk in and out because

01:29:31 they didn’t have much capacity. But they started building what are called hard disk drives, which

01:29:36 is magnetic material that can remember information storage for the Mac. And Randy asked the question

01:29:44 when he saw this disk next to his Mac, gee, these are brand new small things. Before that,

01:29:51 for the big computers, the disk would be the size of washing machines. And here’s something

01:29:57 the size of a, kind of the size of a book or so. He says, I wonder what we could do with that? Well,

01:30:02 Randy was involved in the fourth generation RISC project here at Berkeley in the 80s. So we figured

01:30:11 out a way to make the computation part, the processor part, go a lot faster, but what about

01:30:15 the storage part? Can we do something to make it faster? So we hit upon the idea of taking a lot of

01:30:22 these disks developed for personal computers and Macintoshes and putting many of them together

01:30:27 instead of one of these washing machine size things. And so we wrote the first draft of the

01:30:32 paper and we’d have 40 of these little PC disks instead of one of these washing machine size

01:30:38 things. And they would be much cheaper because they’re made for PCs and they could actually kind

01:30:42 of be faster because there was 40 of them rather than one of them. And so we wrote a paper like

01:30:47 that and sent it to one of our former Berkeley students at IBM. And he said, well, this is all

01:30:51 great and good, but what about the reliability of these things? Now you have 40 of these things

01:30:56 and 40 of these devices, each of which are kind of PC quality. So they’re not as good as these

01:31:03 IBM washing machines. IBM dominated the storage businesses. So the reliability is going to be

01:31:10 awful. And so when we calculated it out, instead of it breaking on average once a year, it would

01:31:16 break every two weeks. So we thought about the idea and said, well, we got to address the

01:31:22 reliability. So we did it originally for performance, but we had to do reliability. So the name,

01:31:27 redundant array of inexpensive disks, is an array of these inexpensive disks, like for PCs, but we have

01:31:33 extra copies. So if one breaks, we won’t lose all the information. We’ll have enough redundancy that

01:31:40 we could let some break and we can still preserve the information. So the name is an array of

01:31:44 inexpensive disks. This is a collection of these PC disks, and the R part of the name was the redundancy

01:31:51 so they’d be reliable. And it turns out if you put a modest number of extra disks in one of

01:32:00 these arrays, it could actually not only be faster and cheaper than one of these washing

01:32:00 machine disks, it could be actually more reliable because you could have a couple of breaks even

01:32:05 with these cheap disks. Whereas one failure with the washing machine thing would knock it out.
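[Editor's note: both halves of that story can be sketched with back-of-the-envelope math. The numbers are illustrative, and the XOR parity below is a simplified stand-in for the actual parity schemes used in RAID levels.]

```python
# If one disk fails on average once a year, an array of 40 independent
# disks fails roughly 40x as often:
disk_mtbf_days = 365
array_mtbf_days = disk_mtbf_days / 40
print(array_mtbf_days)  # 9.125 days, roughly the "every couple of
                        # weeks" figure, hence the need for redundancy

# Toy XOR parity: store one extra parity block so any single lost
# data block can be reconstructed from the survivors.
data = [0b1010, 0b0111, 0b1100]  # blocks on three data disks
parity = 0
for block in data:
    parity ^= block              # the parity disk holds the XOR

# Disk 1 dies; rebuild its block from the survivors plus parity.
rebuilt = parity
for block in [data[0], data[2]]:
    rebuilt ^= block
print(rebuilt == data[1])        # True, the lost block is recovered
```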

01:32:10 Did you have a sense, just like with RISC, that in the 30 years that followed,

01:32:17 RAID would take over as a mechanism for storage? I think I’m naturally an optimist,

01:32:27 but I thought our ideas were right. I thought kind of like Moore’s law, it seemed to me,

01:32:33 if you looked at the history of the disk drives, they went from washing machine size things and

01:32:38 they were getting smaller and smaller and the volumes were with the smaller disk drives because

01:32:43 that’s where the PCs were. So we thought that was a technological trend that the volume of disk

01:32:51 drives was going to be in smaller and smaller devices, which was true. They were the size of,

01:32:56 I don’t know, eight inches diameter, then five inches, then three inches in diameters.

01:33:01 And so that it made sense to figure out how to deal things with an array of disks. So I think

01:33:06 it was one of those things where logically, we think the technological forces were on our side,

01:33:13 that it made sense. So we expected it to catch on, but there was that same kind of business question.

01:33:19 IBM was the big pusher of these disk drives. In the real world, would the technical advantage

01:33:25 get turned into a business advantage or not? It proved to be true. And so we thought we were

01:33:32 sound technically and it was unclear whether the business side, but we kind of, as academics,

01:33:38 we believe that technology should win and it did. And if you look at those 30 years,

01:33:44 just from your perspective, are there interesting developments in the space of storage

01:33:48 that have happened in that time? Yeah. The big thing that happened, well, a couple of things

01:33:53 that happened. What we did had a modest amount of redundancy. As people built bigger

01:34:00 and bigger storage systems, they’ve added more redundancy so they could tolerate more failures. And

01:34:05 the biggest thing that happened in storage is for decades, it was based on things physically spinning

01:34:14 called hard disk drives where you used to turn on your computer and it would make a noise.

01:34:18 What that noise was, was the disk drives spinning and they were rotating at like 60 revolutions per

01:34:25 second. And it’s like, if you remember the vinyl records, if you’ve ever seen those,

01:34:31 that’s what it looked like. And there was like a needle like on a vinyl record that was reading it.

01:34:36 So the big drive change is switching that over to a semiconductor technology called flash.

01:34:41 So within the last, I’d say about decade is increasing fraction of all the computers in the

01:34:47 world are using semiconductor for storage, the flash drive, instead of being magnetic,

01:34:54 they’re optical, well, they’re a semiconductor, writing information very densely.

01:35:04 And that’s been a huge difference. So all the cell phones in the world use flash.

01:35:08 Most of the laptops use flash. All the embedded devices use flash instead of disks for storage. Still in

01:35:13 the cloud, magnetic disks are more economical than flash, but they use both in the cloud.

01:35:20 So it’s been a huge change in the storage industry, the switching from primarily disk

01:35:26 to being primarily semiconductor. For the individual disk, but still the RAID mechanism

01:35:31 applies to those different kinds of disks. Yes. People will still use RAID ideas.

01:35:35 And what’s different is kind of interesting psychologically,

01:35:41 if you think about it. People have always worried about the reliability of computing since the

01:35:46 earliest days. But if we’re talking about computation, if your computer makes a

01:35:52 mistake, the computer has ways to check and say, oh, we screwed up.

01:35:59 We made a mistake. What happens is that program that was running, you have to redo it,

01:36:04 which is a hassle. For storage, if you’ve sent important information away and it loses that

01:36:12 information, you go nuts. This is the worst. Oh my God. So if you have a laptop and you’re not

01:36:18 backing it up on the cloud or something like this, and your disk drive breaks, which it can do,

01:36:24 you’ll lose all that information and you just go crazy. So the importance of reliability

01:36:29 for storage is tremendously higher than the importance of reliability for computation

01:36:34 because of the consequences of it. So yes, so RAID ideas are still very popular, even with

01:36:39 the switch of the technology. Although flash drives are more reliable, if you’re not doing

01:36:45 anything like backing them up to get some redundancy to handle failures, you’re taking great risks.
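The RAID idea he describes, adding redundancy so a failed disk doesn’t lose data, can be sketched in a few lines. This is a minimal Python illustration of the XOR parity scheme behind single-parity RAID; the disk contents and function names are invented for illustration, not taken from the conversation.

```python
# A minimal sketch of the single-parity idea behind RAID: keep one extra
# XOR "parity" block so that any one failed disk can be rebuilt from the rest.
from functools import reduce

def parity(blocks):
    """XOR equal-sized data blocks together to produce the parity block."""
    return bytes(reduce(lambda a, b: a ^ b, chunk) for chunk in zip(*blocks))

def reconstruct(survivors, parity_block):
    """Rebuild one missing block: XOR the surviving blocks with the parity."""
    return parity(survivors + [parity_block])

disks = [b"DATA-ON-DISK-ONE", b"data-on-disk-two", b"Data.On.Disk.Tre"]
p = parity(disks)  # stored on a dedicated parity disk

# Suppose disk 1 fails: its contents come back from the other disks plus parity.
recovered = reconstruct([disks[0], disks[2]], p)
assert recovered == disks[1]
```

Real RAID levels additionally stripe data and rotate the parity across the disks (as in RAID 5), but the recovery arithmetic is this same XOR, which is why the idea carries over unchanged from spinning disks to flash.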

01:36:53 You said that for you and possibly for many others, teaching and research don’t

01:36:58 conflict with each other as one might suspect. And in fact, they kind of complement each other. So

01:37:03 maybe a question I have is how has teaching helped you in your research or just in your

01:37:10 entirety as a person who both teaches and does research and just thinks and creates new ideas

01:37:17 in this world? Yes, I think what happens is when you’re a college student, you know there’s this

01:37:22 kind of tenure system in doing research. So kind of this model that is popular in America, I think

01:37:30 America really made it happen, is we can attract these really great faculty to research universities

01:37:36 because they get to do research as well as teach. And that, especially in fast moving fields,

01:37:40 this means people are up to date and they’re teaching those kinds of things. But when you run

01:37:44 into a really bad professor, a really bad teacher, I think the students think, well, this guy must be

01:37:50 a great researcher because why else would he be here? So after 40 years at Berkeley, we had a

01:37:57 retirement party and I got a chance to reflect and I looked back at some things. That is not my

01:38:02 experience. I saw a photograph of five of us in the department who won the Distinguished Teaching

01:38:09 Award from campus, a very high honor. I’ve got one of those, one of the highest honors. So there are

01:38:14 five of us on that picture. There’s Manuel Blum, Richard Karp, me, Randy Katz, and John Ousterhout,

01:38:23 contemporaries of mine. I mentioned Randy already. All of us are in the National Academy of

01:38:27 Engineering. We’ve all won the Distinguished Teaching Award. Blum, Karp, and I all have

01:38:34 Turing Awards, the highest award in computing. So that’s the opposite. What’s happened is they’re

01:38:45 highly correlated. So the other way to think of it is that very successful people are maybe

01:38:51 successful at everything they do. It’s not an either or. But it’s an interesting question

01:38:56 whether specifically, that’s probably true, but specifically for teaching, if there’s something

01:39:00 in teaching that, it’s the Richard Feynman idea, is there something about teaching that actually

01:39:06 makes your research, makes you think deeper and more outside the box and more insightful?

01:39:12 Absolutely. I was going to bring up Feynman. I mean, he criticized the Institute for Advanced

01:39:16 Study. So the Institute for Advanced Study was this thing that was created near Princeton

01:39:21 where Einstein and all these smart people went. And when he was invited, he thought it was a

01:39:26 terrible idea. This is a university. It was supposed to be heaven, right? A university

01:39:31 without any teaching. But he thought it was a mistake. Getting up in the classroom and

01:39:35 having to explain things to students and having them ask questions like, well, why is that true,

01:39:40 makes you stop and think. So he thought, and I agree, I think that interaction between a great

01:39:47 research university and having students with bright young minds asking hard questions the

01:39:52 whole time is synergistic. And a university without teaching wouldn’t be as vital and

01:40:00 exciting a place. And I think it helps stimulate the research. Another romanticized question,

01:40:07 but what’s your favorite concept or idea to teach? What inspires you or you see inspire the students?

01:40:15 Is there something that pops to mind or puts the fear of God in them? I don’t know,

01:40:19 whichever is most effective. I mean, in general, I think people are surprised.

01:40:25 I’ve seen a lot of people who don’t think they like teaching come give guest lectures or teach

01:40:31 a course and get hooked on seeing the lights turn on, right? You can explain something to

01:40:37 people that they don’t understand. And suddenly they get something that’s important and difficult.

01:40:44 And just seeing the lights turn on is a real satisfaction there. I don’t think there’s any

01:40:51 specific example of that. It’s just the general joy of seeing them understand.

01:40:58 I have to talk about this because I’ve wrestled. I do martial arts. Of course, I love wrestling. I’m a huge, I’m Russian. So I’ve talked to Dan Gable on the podcast.

01:41:11 So you wrestled at UCLA among many other things you’ve done in your life, competitively in sports

01:41:20 and science and so on. You’ve wrestled. Maybe, again, continue with the romanticized questions,

01:41:26 but what have you learned about life and maybe even science from wrestling or from?

01:41:32 Yeah, in fact, I wrestled at UCLA, but also at El Camino Community College. And

01:41:39 in the state of California, we were state champions at El Camino. And in fact, I was talking

01:41:44 to my mom and I got into UCLA, but I decided to go to the community college, which is, it’s much

01:41:52 harder to go to UCLA than the community college. And I asked, why did I make that decision? Because I

01:41:56 thought it was because of my girlfriend. She said, well, it was the girlfriend and you thought the

01:41:59 wrestling team was really good. And we were right. We had a great wrestling team. We actually

01:42:06 wrestled against UCLA at a tournament and we beat UCLA as a community college, which is just freshmen

01:42:12 and sophomores. And part of the reason I brought this up is I’m going to go, they’ve invited me back

01:42:17 at El Camino to give a lecture next month. And so, my friend who was on the wrestling team,

01:42:27 we’re still together, and we’re right now reaching out to other members of the wrestling team to see if we can

01:42:31 get together for a reunion. But in terms of me, it was a huge difference. The age cut off, it was

01:42:40 December 1st. And so, I was almost always the youngest person in my class and I matured later

01:42:47 on, our family matured later. So, I was almost always the smallest guy. So, I took kind of

01:42:54 nerdy courses, but I was wrestling. So, wrestling was huge for my self confidence in high school.

01:43:02 And then, I kind of got bigger at El Camino and in college. And so, I had this kind of physical

01:43:08 self confidence and it’s translated into research self confidence. And also kind of, I’ve had this

01:43:18 feeling even today in my 70s, if something is going on in the street that is bad physically, I’m not

01:43:27 going to ignore it. I’m going to stand up and try and straighten that out.

01:43:31 And that kind of confidence just carries through the entirety of your life.

01:43:34 Yeah. And the same thing happens intellectually. If there’s something going on where people are

01:43:39 saying something that’s not true, I feel it’s my job to stand up just like I would in the street.

01:43:44 If there’s something going on, somebody attacking some woman or something, I’m not standing by and

01:43:49 letting them get away with that. So, I feel it’s my job to stand up. So, it kind of ironically translates.

01:43:54 The other things that turned out for both, I had really great college and high school coaches and

01:44:00 they believed, even though wrestling is an individual sport, that we would be more successful

01:44:05 as a team if we bonded together, do things that we would support each other rather than everybody,

01:44:10 you know, in wrestling it’s a one on one and you could be everybody’s on their own, but he felt if

01:44:15 we bonded as a team, we’d succeed. So, I kind of picked up those skills of how to form successful

01:44:21 teams from wrestling. And so, I think, most people would say, one of my strengths

01:44:27 is I can create teams of faculty, large teams of faculty and grad students, and pull them all together for a

01:44:33 common goal and often be successful at it. But I got both of those things from wrestling. Also,

01:44:41 I think I heard this line about if people are in kind of collision sports, with physical contact

01:44:49 like wrestling or football and stuff like that, people are a little bit more assertive or something.

01:44:54 And so, I think that also comes through as, you know, and I didn’t shy away from the

01:45:02 RISC debates, you know, I enjoyed taking on the arguments and stuff like that. So,

01:45:08 I’m really glad I did wrestling. I think it was really good for my self image and I learned a lot

01:45:13 from it. So, I think that’s, you know, sports done well, you know, there’s really lots of positives

01:45:19 you can take about it, of leadership, you know, how to form teams and how to be successful.

01:45:26 So, we’ve talked about metrics a lot. There’s a really cool, in terms of bench press and

01:45:30 weightlifting, pound years metric that you’ve developed that we don’t have time to talk about,

01:45:34 but it’s a really cool one that people should look into. It’s rethinking the way we think about

01:45:39 metrics and weightlifting. But let me talk about metrics more broadly, since that appeals to you

01:45:43 in all forms. Let’s look at the most ridiculous, the biggest question of the meaning of life.

01:45:50 If you were to try to put metrics on a life well lived, what would those metrics be?

01:45:56 Yeah, a friend of mine, Randy Katz, said this. He said, you know, when it’s time to sign off,

01:46:06 the measure isn’t the number of zeros in your bank account, it’s the number of inches

01:46:09 in the obituary in the New York Times, is what he said. I think, you know, this is

01:46:17 a cliche, but people don’t die wishing they’d spent more time in the

01:46:21 office, right? As I reflect upon my career, there have been, you know, a half a dozen, a dozen things

01:46:29 I’d say I’ve been proud of. A lot of them aren’t papers or scientific results. Certainly, my family,

01:46:35 my wife, we’ve been married more than 50 years, kids and grandkids, that’s really precious.

01:46:42 Education things I’ve done, I’m very proud of, you know, books and courses. I did some help

01:46:50 with underrepresented groups that was effective. So it was interesting to see what were the things

01:46:55 I reflected. You know, I had hundreds of papers, but some of them were the papers, like the RISC and

01:47:00 RAID stuff, that I’m proud of, but a lot of them were not those things. So people who just

01:47:06 spend their lives, you know, going after the dollars or going after all the papers in the

01:47:11 world, you know, that’s probably not the things that are afterwards you’re going to care about.

01:47:15 When I was, just when I got the offer from Berkeley before I showed up, I read a book where

01:47:22 they interviewed a lot of people in all walks of life. And what I got out of that book was that the

01:47:27 people who felt good about what they did were the people who affected people, as opposed to things

01:47:31 that were more transitory. So I came into this job assuming that it wasn’t going to be the papers,

01:47:36 it was going to be relationships with the people over time that I would value, and that was a

01:47:42 correct assessment, right? It’s the people you work with, the people you can influence, the people

01:47:47 you can help. Those are the things that you feel good about towards the end of your career. It’s not

01:47:51 the stuff that’s more transitory.

01:47:53 I don’t think there’s a better way to end it than talking about your family,

01:47:58 the over 50 years of being married to your childhood sweetheart.

01:48:02 What I think I can add is,

01:48:05 when you tell people you’ve been married 50 years, they want to know why.

01:48:07 How? Why?

01:48:08 Yeah, I can tell you the nine

01:48:10 magic words that you need to say to your partner to keep a good relationship. And the nine magic

01:48:16 words are, I was wrong. You were right. I love you. Okay. And you got to say all nine. You can’t

01:48:22 say, I was wrong. You were right. You’re a jerk. You know, you can’t say that. So yeah, freely

01:48:28 acknowledging that you made a mistake, the other person was right, and that you love them really

01:48:34 gets over a lot of bumps in the road. So that’s what I pass along.

01:48:37 Beautifully put. David,

01:48:39 it’s a huge honor. Thank you so much for the book you’ve written, for the research you’ve done,

01:48:43 for changing the world. Thank you for talking today.

01:48:45 Thanks for the interview.

01:48:46 Thanks for listening to this

01:48:48 conversation with David Patterson. And thank you to our sponsors, The Jordan Harbinger Show, and

01:48:55 Cash App. Please consider supporting this podcast by going to JordanHarbinger.com slash Lex and

01:49:02 downloading Cash App and using code LexPodcast. Click the links, buy the stuff. It’s the best way

01:49:08 to support this podcast and the journey I’m on. If you enjoy this thing, subscribe on YouTube,

01:49:14 review it with five stars on Apple Podcast, support it on Patreon, or connect with me on Twitter at

01:49:19 Lex Friedman, spelled without the E, try to figure out how to do that. It’s just F R I D M A N.

01:49:27 And now let me leave you with some words from Henry David Thoreau.

01:49:32 Our life is frittered away by detail. Simplify, simplify. Thank you for listening and hope to

01:49:40 see you next time.