Chris Lattner: Compilers, LLVM, Swift, TPU, and ML Accelerators #21

Transcript

00:00:00 The following is a conversation with Chris Lattner.

00:00:02 Currently, he’s a senior director

00:00:04 at Google working on several projects, including CPU, GPU,

00:00:08 TPU accelerators for TensorFlow, Swift for TensorFlow,

00:00:12 and all kinds of machine learning compiler magic

00:00:14 going on behind the scenes.

00:00:16 He’s one of the top experts in the world

00:00:18 on compiler technologies, which means he deeply

00:00:21 understands the intricacies of how hardware and software come

00:00:25 together to create efficient code.

00:00:27 He created the LLVM compiler infrastructure project

00:00:31 and the Clang compiler.

00:00:33 He led major engineering efforts at Apple,

00:00:36 including the creation of the Swift programming language.

00:00:39 He also briefly spent time at Tesla

00:00:41 as vice president of Autopilot software

00:00:44 during the transition from Autopilot hardware 1

00:00:46 to hardware 2, when Tesla essentially

00:00:49 started from scratch to build an in-house software

00:00:52 infrastructure for Autopilot.

00:00:54 I could have easily talked to Chris for many more hours.

00:00:58 Compiling code down across the levels of abstraction

00:01:01 is one of the most fundamental and fascinating aspects

00:01:04 of what computers do, and he is one of the world

00:01:06 experts in this process.

00:01:08 It’s rigorous science, and it’s messy, beautiful art.

00:01:12 This conversation is part of the Artificial Intelligence

00:01:15 podcast.

00:01:16 If you enjoy it, subscribe on YouTube, iTunes,

00:01:19 or simply connect with me on Twitter at Lex Fridman,

00:01:22 spelled F R I D.

00:01:24 And now, here’s my conversation with Chris Lattner.

00:01:29 What was the first program you’ve ever written?

00:01:33 My first program.

00:01:34 Back, and when was it?

00:01:35 I think I started as a kid, and my parents

00:01:39 got a basic programming book.

00:01:41 And so when I started, it was typing out programs

00:01:44 from a book, and seeing how they worked,

00:01:46 and then typing them in wrong, and trying

00:01:49 to figure out why they were not working right,

00:01:51 that kind of stuff.

00:01:52 So BASIC, what was the first language

00:01:54 that you remember yourself maybe falling in love with,

00:01:58 like really connecting with?

00:01:59 I don’t know.

00:02:00 I mean, I feel like I’ve learned a lot along the way,

00:02:02 and each of them have a different special thing

00:02:05 about them.

00:02:06 So I started in BASIC, and then went like GW BASIC,

00:02:09 which was the thing back in the DOS days,

00:02:11 and then upgraded to QBASIC, and eventually QuickBASIC,

00:02:15 which are all slightly more fancy versions of Microsoft

00:02:18 BASIC.

00:02:19 Made the jump to Pascal, and started

00:02:21 doing machine language programming and assembly

00:02:23 in Pascal, which was really cool.

00:02:25 Turbo Pascal was amazing for its day.

00:02:28 Eventually got into C, C++, and then kind of did

00:02:31 lots of other weird things.

00:02:33 I feel like you took the dark path, which is the,

00:02:37 you could have gone Lisp.

00:02:39 Yeah.

00:02:40 You could have gone higher level sort

00:02:41 of functional philosophical hippie route.

00:02:44 Instead, you went into like the dark arts of the C.

00:02:48 It was straight into the machine.

00:02:49 Straight to the machine.

00:02:50 So I started with BASIC, Pascal, and then Assembly,

00:02:53 and then wrote a lot of Assembly.

00:02:55 And I eventually did Smalltalk and other things like that.

00:03:00 But that was not the starting point.

00:03:01 But so what is this journey to C?

00:03:05 Is that in high school?

00:03:06 Is that in college?

00:03:07 That was in high school, yeah.

00:03:09 And then that was really about trying

00:03:13 to be able to do more powerful things than what Pascal could

00:03:16 do, and also to learn a different world.

00:03:18 So C was really confusing to me with pointers

00:03:20 and the syntax and everything, and it took a while.

00:03:23 But Pascal’s much more principled in various ways.

00:03:28 C is more, I mean, it has its historical roots,

00:03:33 but it’s not as easy to learn.

00:03:35 With pointers, there’s this memory management thing

00:03:39 that you have to become conscious of.

00:03:41 Is that the first time you start to understand

00:03:43 that there’s resources that you’re supposed to manage?

00:03:46 Well, so you have that in Pascal as well.

00:03:48 But in Pascal, like the caret instead of the star,

00:03:51 there’s some small differences like that.

00:03:53 But it’s not about pointer arithmetic.

00:03:55 And in C, you end up thinking about how things get

00:03:58 laid out in memory a lot more.

00:04:00 And so in Pascal, you have allocating and deallocating

00:04:04 and owning the memory, but just the programs are simpler,

00:04:07 and you don’t have to.

00:04:10 Well, for example, Pascal has a string type.

00:04:12 And so you can think about a string

00:04:14 instead of an array of characters

00:04:15 which are consecutive in memory.

00:04:17 So it’s a little bit of a higher level abstraction.

00:04:20 So let’s get into it.

00:04:22 Let’s talk about LLVM, Clang, and compilers.

00:04:25 Sure.

00:04:26 So can you tell me first what LLVM and Clang are?

00:04:32 And how is it that you find yourself

00:04:33 the creator and lead developer, one

00:04:35 of the most powerful compiler optimization systems

00:04:39 in use today?

00:04:40 Sure.

00:04:40 So I guess they’re different things.

00:04:43 So let’s start with what is a compiler?

00:04:47 Is that a good place to start?

00:04:48 What are the phases of a compiler?

00:04:50 Where are the parts?

00:04:50 Yeah, what is it?

00:04:51 So what is even a compiler used for?

00:04:53 So the way I look at this is you have a two sided problem of you

00:04:57 have humans that need to write code.

00:05:00 And then you have machines that need to run

00:05:01 the program that the human wrote.

00:05:03 And for lots of reasons, the humans

00:05:05 don’t want to be writing in binary

00:05:07 and want to think about every piece of hardware.

00:05:09 And so at the same time that you have lots of humans,

00:05:12 you also have lots of kinds of hardware.

00:05:14 And so compilers are the art of allowing

00:05:17 humans to think at a level of abstraction

00:05:19 that they want to think about.

00:05:20 And then get that program, get the thing that they wrote,

00:05:23 to run on a specific piece of hardware.

00:05:26 And the interesting and exciting part of all this

00:05:29 is that there’s now lots of different kinds of hardware,

00:05:32 chips like x86 and PowerPC and ARM and things like that.

00:05:35 But also high performance accelerators

00:05:37 for machine learning and other things like that

00:05:38 are also just different kinds of hardware, GPUs.

00:05:41 These are new kinds of hardware.

00:05:42 And at the same time, on the programming side of it,

00:05:45 you have BASIC, you have C, you have JavaScript,

00:05:48 you have Python, you have Swift.

00:05:50 You have lots of other languages

00:05:52 that are all trying to talk to the human in a different way

00:05:55 to make them more expressive and capable and powerful.

00:05:58 And so compilers are the thing

00:06:01 that goes from one to the other.

00:06:03 End to end, from the very beginning to the very end.

00:06:05 End to end.

00:06:06 And so you go from what the human wrote

00:06:08 and programming languages end up being about

00:06:11 expressing intent, not just for the compiler

00:06:14 and the hardware, but the programming language’s job

00:06:17 is really to capture an expression

00:06:20 of what the programmer wanted

00:06:22 that then can be maintained and adapted

00:06:25 and evolved by other humans,

00:06:27 as well as interpreted by the compiler.

00:06:29 So when you look at this problem,

00:06:31 you have, on the one hand, humans, which are complicated.

00:06:34 And you have hardware, which is complicated.

00:06:36 And so compilers typically work in multiple phases.

00:06:39 And so the software engineering challenge

00:06:42 that you have here is try to get maximum reuse

00:06:45 out of the amount of code that you write,

00:06:47 because these compilers are very complicated.

00:06:49 And so the way it typically works out

00:06:51 is that you have something called a front end or a parser

00:06:54 that is language specific.

00:06:56 And so you’ll have a C parser, and that’s what Clang is,

00:07:00 or C++ or JavaScript or Python or whatever.

00:07:03 That’s the front end.

00:07:05 Then you’ll have a middle part,

00:07:07 which is often the optimizer.

00:07:09 And then you’ll have a late part,

00:07:11 which is hardware specific.

00:07:13 And so compilers end up,

00:07:15 there’s many different layers often,

00:07:16 but these three big groups are very common in compilers.

00:07:20 And what LLVM is trying to do

00:07:22 is trying to standardize that middle and last part.

00:07:25 And so one of the cool things about LLVM

00:07:27 is that there are a lot of different languages

00:07:29 that compile through to it.

00:07:31 And so things like Swift, but also Julia, Rust,

00:07:35 Clang for C, C++, Objective C,

00:07:39 like these are all very different languages

00:07:40 and they can all use the same optimization infrastructure,

00:07:43 which gets better performance,

00:07:45 and the same code generation infrastructure

00:07:47 for hardware support.

00:07:48 And so LLVM is really that layer that is common,

00:07:52 that all these different specific compilers can use.

00:07:55 And is it a standard, like a specification,

00:07:59 or is it literally an implementation?

00:08:01 It’s an implementation.

00:08:02 And so I think there’s a couple of different ways

00:08:05 of looking at it, right?

00:08:06 Because it depends on which angle you’re looking at it from.

00:08:09 LLVM ends up being a bunch of code, okay?

00:08:12 So it’s a bunch of code that people reuse

00:08:14 and they build compilers with.

00:08:16 We call it a compiler infrastructure

00:08:18 because it’s kind of the underlying platform

00:08:20 that you build a concrete compiler on top of.

00:08:22 But it’s also a community.

00:08:23 And the LLVM community is hundreds of people

00:08:26 that all collaborate.

00:08:27 And one of the most fascinating things about LLVM

00:08:30 over the course of time is that we’ve managed somehow

00:08:34 to successfully get harsh competitors

00:08:37 in the commercial space to collaborate

00:08:39 on shared infrastructure.

00:08:41 And so you have Google and Apple,

00:08:43 you have AMD and Intel,

00:08:45 you have Nvidia and AMD on the graphics side,

00:08:48 you have Cray and everybody else doing these things.

00:08:52 And all these companies are collaborating together

00:08:55 to make that shared infrastructure really, really great.

00:08:58 And they do this not out of the goodness of their heart,

00:09:01 but they do it because it’s in their commercial interest

00:09:03 of having really great infrastructure

00:09:05 that they can build on top of

00:09:06 and facing the reality that it’s so expensive

00:09:09 that no one company, even the big companies,

00:09:11 no one company really wants to implement it all themselves.

00:09:14 Expensive or difficult?

00:09:16 Both.

00:09:16 That’s a great point because it’s also about the skill sets.

00:09:20 And the skill sets are very hard to find.

00:09:26 How big is the LLVM?

00:09:27 It always seems like with open source projects,

00:09:30 the kind, an LLVM is open source?

00:09:33 Yes, it’s open source.

00:09:34 It’s about, it’s 19 years old now, so it’s fairly old.

00:09:38 It seems like the magic often happens

00:09:40 within a very small circle of people.

00:09:43 Yes.

00:09:43 At least their early birth and whatever.

00:09:46 Yes, so the LLVM came from a university project,

00:09:49 and so I was at the University of Illinois.

00:09:51 And there it was myself, my advisor,

00:09:53 and then a team of two or three research students

00:09:57 in the research group,

00:09:58 and we built many of the core pieces initially.

00:10:02 I then graduated and went to Apple,

00:10:03 and at Apple brought it to the products,

00:10:06 first in the OpenGL graphics stack,

00:10:09 but eventually to the C compiler realm,

00:10:11 and eventually built Clang,

00:10:12 and eventually built Swift and these things.

00:10:14 Along the way, building a team of people

00:10:16 that are really amazing compiler engineers

00:10:18 that helped build a lot of that.

00:10:20 And so as it was gaining momentum

00:10:21 and as Apple was using it, being open source and public

00:10:24 and encouraging contribution,

00:10:26 many others, for example, at Google,

00:10:28 came in and started contributing.

00:10:30 And in some cases, Google effectively owns Clang now

00:10:33 because it cares so much about C++

00:10:35 and the evolution of that ecosystem,

00:10:37 and so it’s investing a lot in the C++ world

00:10:41 and the tooling and things like that.

00:10:42 And so likewise, NVIDIA cares a lot about CUDA.

00:10:47 And so CUDA uses Clang and uses LLVM

00:10:50 for graphics and GPGPU.

00:10:54 And so when you first started as a master’s project,

00:10:58 I guess, did you think it was gonna go as far as it went?

00:11:02 Were you crazy ambitious about it?

00:11:06 No.

00:11:07 It seems like a really difficult undertaking, a brave one.

00:11:09 Yeah, no, no, no, it was nothing like that.

00:11:11 So my goal when I went to the University of Illinois

00:11:13 was to get in and out with a non-thesis master’s in a year

00:11:17 and get back to work.

00:11:18 So I was not planning to stay for five years

00:11:22 and build this massive infrastructure.

00:11:24 I got nerd sniped into staying.

00:11:27 And a lot of it was because LLVM was fun

00:11:29 and I was building cool stuff

00:11:30 and learning really interesting things

00:11:33 and facing both software engineering challenges,

00:11:36 but also learning how to work in a team

00:11:38 and things like that.

00:11:40 I had worked at many companies as interns before that,

00:11:43 but it was really a different thing

00:11:45 to have a team of people that are working together

00:11:48 and try and collaborate in version control.

00:11:50 And it was just a little bit different.

00:11:52 Like I said, I just talked to Don Knuth

00:11:54 and he believes that 2% of the world population

00:11:56 have something weird with their brain,

00:11:58 that they’re geeks, they understand computers,

00:12:01 they’re connected with computers.

00:12:02 He put it at exactly 2%.

00:12:04 Okay, so.

00:12:05 He’s a specific guy.

00:12:06 It’s very specific.

00:12:08 Well, he says, I can’t prove it,

00:12:10 but it’s very empirically there.

00:12:13 Is there something that attracts you

00:12:14 to the idea of optimizing code?

00:12:16 And he seems like that’s one of the biggest,

00:12:19 coolest things about LLVM.

00:12:20 Yeah, that’s one of the major things it does.

00:12:22 So I got into that because of a person, actually.

00:12:26 So when I was in my undergraduate,

00:12:28 I had an advisor, or a professor named Steve Vegdahl.

00:12:32 And he, I went to this little tiny private school.

00:12:35 There were like seven or nine people

00:12:38 in my computer science department,

00:12:40 students in my class.

00:12:43 So it was a very tiny, very small school.

00:12:47 It was kind of a wart on the side of the math department

00:12:49 kind of a thing at the time.

00:12:51 I think it’s evolved a lot in the many years since then.

00:12:53 But Steve Vegdahl was a compiler guy.

00:12:58 And he was super passionate.

00:12:59 And his passion rubbed off on me.

00:13:02 And one of the things I like about compilers

00:13:04 is that they’re large, complicated software pieces.

00:13:09 And so one of the culminating classes

00:13:12 that many computer science departments,

00:13:14 at least at the time, did was to say

00:13:16 that you would take algorithms and data structures

00:13:18 and all these core classes.

00:13:19 But then the compilers class was one of the last classes

00:13:21 you take because it pulls everything together.

00:13:24 And then you work on one piece of code

00:13:26 over the entire semester.

00:13:28 And so you keep building on your own work,

00:13:32 which is really interesting.

00:13:33 And it’s also very challenging because in many classes,

00:13:36 if you don’t get a project done, you just forget about it

00:13:38 and move on to the next one and get your B or whatever it is.

00:13:41 But here you have to live with the decisions you make

00:13:43 and continue to reinvest in it.

00:13:45 And I really like that.

00:13:48 And so I did an extra study project

00:13:50 with him the following semester.

00:13:52 And he was just really great.

00:13:53 And he was also a great mentor in a lot of ways.

00:13:56 And so from him and from his advice,

00:13:59 he encouraged me to go to graduate school.

00:14:01 I wasn’t super excited about going to grad school.

00:14:03 I wanted the master’s degree, but I

00:14:05 didn’t want to be an academic.

00:14:08 But like I said, I kind of got tricked into staying

00:14:11 and was having a lot of fun.

00:14:12 And I definitely do not regret it.

00:14:14 What aspects of compilers were the things you connected with?

00:14:17 So LLVM, there’s also the other part

00:14:22 that’s really interesting if you’re interested in languages

00:14:24 is parsing and just analyzing the language,

00:14:29 breaking it down, parsing, and so on.

00:14:31 Was that interesting to you, or were you

00:14:32 more interested in optimization?

00:14:34 For me, it was more so I’m not really a math person.

00:14:37 I could do math.

00:14:38 I understand some bits of it when I get into it.

00:14:41 But math is never the thing that attracted me.

00:14:43 And so a lot of the parser part of the compiler

00:14:46 has a lot of good formal theories

00:14:47 that Don, for example, knows quite well.

00:14:50 I’m still waiting for his book on that.

00:14:54 But I just like building a thing and seeing what it could do

00:14:57 and exploring and getting it to do more things

00:15:00 and then setting new goals and reaching for them.

00:15:04 And in the case of LLVM, when I started working on that,

00:15:09 my research advisor that I was working for was a compiler guy.

00:15:13 And so he and I specifically found each other

00:15:15 because we were both interested in compilers.

00:15:16 And so I started working with him and taking his class.

00:15:19 And a lot of LLVM initially was, it’s

00:15:21 fun implementing all the standard algorithms and all

00:15:24 the things that people had been talking about

00:15:26 and were well known.

00:15:27 And they were in the curricula for advanced studies

00:15:30 and compilers.

00:15:31 And so just being able to build that was really fun.

00:15:34 And I was learning a lot by, instead of reading about it,

00:15:37 just building.

00:15:38 And so I enjoyed that.

00:15:40 So you said compilers are these complicated systems.

00:15:42 Can you even just with language try

00:15:46 to describe how you turn a C++ program into code?

00:15:52 Like, what are the hard parts?

00:15:53 Why is it so hard?

00:15:54 So I’ll give you examples of the hard parts along the way.

00:15:57 So C++ is a very complicated programming language.

00:16:01 It’s something like 1,400 pages in the spec.

00:16:03 So C++ by itself is crazy complicated.

00:16:06 Can we just pause?

00:16:07 What makes the language complicated in terms

00:16:09 of what’s syntactically?

00:16:12 So it’s what they call syntax.

00:16:14 So the actual how the characters are arranged, yes.

00:16:16 It’s also semantics, how it behaves.

00:16:20 It’s also, in the case of C++, there’s

00:16:21 a huge amount of history.

00:16:23 C++ is built on top of C. You play that forward.

00:16:26 And then a bunch of suboptimal, in some cases, decisions

00:16:29 were made, and they compound.

00:16:31 And then more and more and more things

00:16:33 keep getting added to C++, and it will probably never stop.

00:16:36 But the language is very complicated

00:16:38 from that perspective.

00:16:39 And so the interactions between subsystems

00:16:41 is very complicated.

00:16:42 There’s just a lot there.

00:16:43 And when you talk about the front end,

00:16:45 one of the major challenges, which

00:16:47 Clang as a project, the C, C++ compiler that I built,

00:16:51 I and many people built, one of the challenges we took on

00:16:54 was we looked at GCC.

00:16:57 GCC, at the time, was a really good industry standardized

00:17:02 compiler that had really consolidated

00:17:05 a lot of the other compilers in the world and was a standard.

00:17:08 But it wasn’t really great for research.

00:17:10 The design was very difficult to work with.

00:17:12 And it was full of global variables and other things

00:17:16 that made it very difficult to reuse in ways

00:17:18 that it wasn’t originally designed for.

00:17:20 And so with Clang, one of the things that we wanted to do

00:17:22 is push forward on better user interface,

00:17:25 so make error messages that are just better than GCC’s.

00:17:28 And that’s actually hard, because you

00:17:29 have to do a lot of bookkeeping in an efficient way

00:17:32 to be able to do that.

00:17:33 We want to make compile time better.

00:17:35 And so compile time is about making it efficient,

00:17:37 which is also really hard when you’re keeping

00:17:38 track of extra information.

00:17:40 We wanted to make new tools available,

00:17:43 so refactoring tools and other analysis tools

00:17:46 that GCC never supported, also leveraging the extra information

00:17:50 we kept, but enabling those new classes of tools

00:17:54 that then get built into IDEs.

00:17:55 And so that’s been one of the areas that Clang has really

00:17:59 helped push the world forward in,

00:18:01 is in the tooling for C and C++ and things like that.

00:18:05 But C++ and the front end piece is complicated.

00:18:07 And you have to build syntax trees.

00:18:09 And you have to check every rule in the spec.

00:18:11 And you have to turn that back into an error message

00:18:14 to the human that the human can understand

00:18:16 when they do something wrong.

00:18:17 But then you start doing what’s called lowering,

00:18:20 so going from C++ and the way that it represents

00:18:23 code down to the machine.

00:18:24 And when you do that, there’s many different phases

00:18:27 you go through.

00:18:29 Often, there are, I think LLVM has something like 150

00:18:33 different what are called passes in the compiler

00:18:36 that the code passes through.

00:18:38 And these get organized in very complicated ways,

00:18:41 which affect the generated code and the performance

00:18:44 and compile time and many other things.

00:18:45 What are they passing through?

00:18:47 So after you do the Clang parsing, what’s the graph?

00:18:53 What does it look like?

00:18:54 What’s the data structure here?

00:18:56 Yeah, so in the parser, it’s usually a tree.

00:18:59 And it’s called an abstract syntax tree.

00:19:01 And so the idea is you have a node for the plus

00:19:04 that the human wrote in their code.

00:19:06 Or the function call, you’ll have a node for call

00:19:09 with the function that they call and the arguments they pass,

00:19:11 things like that.

00:19:14 This then gets lowered into what’s

00:19:16 called an intermediate representation.

00:19:18 And intermediate representations are like LLVM has one.

00:19:22 And there, it’s what’s called a control flow graph.

00:19:26 And so you represent each operation in the program

00:19:31 as a very simple, like this is going to add two numbers.

00:19:34 This is going to multiply two things.

00:19:35 Maybe we’ll do a call.

00:19:37 But then they get put in what are called blocks.

00:19:40 And so you get blocks of these straight line operations,

00:19:43 where instead of being nested like in a tree,

00:19:45 it’s straight line operations.

00:19:46 And so there’s a sequence and an ordering to these operations.

00:19:49 So within the block or outside the block?

00:19:51 That’s within the block.

00:19:52 And so it’s a straight line sequence of operations

00:19:54 within the block.

00:19:55 And then you have branches, like conditional branches,

00:19:58 between blocks.

00:20:00 And so when you write a loop, for example, in a syntax tree,

00:20:04 you would have a for node, like for a for statement

00:20:08 in a C like language, you’d have a for node.

00:20:10 And you have a pointer to the expression

00:20:12 for the initializer, a pointer to the expression

00:20:14 for the increment, a pointer to the expression

00:20:16 for the comparison, a pointer to the body.

00:20:18 And these are all nested underneath it.

00:20:21 In a control flow graph, you get a block

00:20:22 for the code that runs before the loop, so the initializer

00:20:26 code.

00:20:27 And you have a block for the body of the loop.

00:20:30 And so the body of the loop code goes in there,

00:20:33 but also the increment and other things like that.

00:20:35 And then you have a branch that goes back to the top

00:20:37 and a comparison and a branch that goes out.

00:20:39 And so it’s more of an assembly level kind of representation.

00:20:43 But the nice thing about this level of representation

00:20:46 is it’s much more language independent.

00:20:48 And so there’s lots of different kinds of languages

00:20:51 with different kinds of, you know,

00:20:54 JavaScript has a lot of different ideas of what

00:20:56 is false, for example.

00:20:58 And all that can stay in the front end.

00:21:00 But then that middle part can be shared across all those.

00:21:04 How close is that intermediate representation

00:21:07 to neural networks, for example?

00:21:10 Are they, because everything you describe

00:21:13 is a kind of echoes of a neural network graph.

00:21:16 Are they neighbors or what?

00:21:18 They’re quite different in details,

00:21:20 but they’re very similar in idea.

00:21:22 So one of the things that neural networks do

00:21:24 is they learn representations for data

00:21:26 at different levels of abstraction.

00:21:29 And then they transform those through layers, right?

00:21:33 So the compiler does very similar things.

00:21:35 But one of the things the compiler does

00:21:37 is it has relatively few different representations.

00:21:40 Where a neural network often, as you get deeper, for example,

00:21:43 you get many different representations

00:21:44 in each layer or set of ops.

00:21:47 It’s transforming between these different representations.

00:21:50 In a compiler, often you get one representation

00:21:53 and they do many transformations to it.

00:21:55 And these transformations are often applied iteratively.

00:21:59 And for programmers, there’s familiar types of things.

00:22:02 For example, trying to find expressions inside of a loop

00:22:06 and pulling them out of a loop so they execute fewer times.

00:22:08 Or find redundant computation.

00:22:10 Or find constant folding or other simplifications,

00:22:15 turning two times x into x shift left by one.

00:22:19 And things like this are all the examples

00:22:21 of the things that happen.

00:22:23 But compilers end up getting a lot of theorem proving

00:22:26 and other kinds of algorithms that

00:22:27 try to find higher level properties of the program that

00:22:30 then can be used by the optimizer.

00:22:32 Cool.

00:22:32 So what’s the biggest bang for the buck with optimization?

00:22:38 Today?

00:22:38 Yeah.

00:22:39 Well, no, not even today.

00:22:40 At the very beginning, the 80s, I don’t know.

00:22:42 Yeah, so for the 80s, a lot of it

00:22:44 was things like register allocation.

00:22:46 So the idea of in a modern microprocessor,

00:22:50 what you’ll end up having is you’ll

00:22:51 end up having memory, which is relatively slow.

00:22:54 And then you have registers that are relatively fast.

00:22:57 But registers, you don’t have very many of them.

00:23:00 And so when you’re writing a bunch of code,

00:23:02 you’re just saying, compute this,

00:23:04 put in a temporary variable, compute this, compute this,

00:23:05 compute this, put in a temporary variable.

00:23:07 I have a loop.

00:23:08 I have some other stuff going on.

00:23:09 Well, now you’re running on an x86,

00:23:11 like a desktop PC or something.

00:23:13 Well, it only has, in some cases, some modes,

00:23:16 eight registers.

00:23:18 And so now the compiler has to choose what values get

00:23:21 put in what registers at what points in the program.

00:23:24 And this is actually a really big deal.

00:23:26 So if you think about, you have a loop, an inner loop

00:23:29 that executes millions of times maybe.

00:23:31 If you’re doing loads and stores inside that loop,

00:23:33 then it’s going to be really slow.

00:23:35 But if you can somehow fit all the values inside that loop

00:23:37 in registers, now it’s really fast.

00:23:40 And so getting that right requires a lot of work,

00:23:43 because there’s many different ways to do that.

00:23:44 And often what the compiler ends up doing

00:23:46 is it ends up thinking about things

00:23:48 in a different representation than what the human wrote.

00:23:52 You wrote into x.

00:23:53 Well, the compiler thinks about that as four different values,

00:23:56 each which have different lifetimes across the function

00:23:59 that it’s in.

00:24:00 And each of those could be put in a register or memory

00:24:03 or different memory or maybe in some parts of the code

00:24:06 recomputed instead of stored and reloaded.

00:24:08 And there are many of these different kinds of techniques

00:24:10 that can be used.

00:24:11 So it’s adding almost like a time dimension. It’s

00:24:15 trying to optimize across time.

00:24:18 So it’s considering when you’re programming,

00:24:20 you’re not thinking in that way.

00:24:21 Yeah, absolutely.

00:24:23 And so the RISC era made things.

00:24:27 So RISC chips, R I S C,

00:24:32 as opposed to CISC chips.

00:24:33 The RISC chips made things more complicated for the compiler,

00:24:36 because what they ended up doing is ending up

00:24:40 adding pipelines to the processor, where

00:24:42 the processor can do more than one thing at a time.

00:24:45 But this means that the order of operations matters a lot.

00:24:47 So one of the classical compiler techniques that you use

00:24:50 is called scheduling.

00:24:51 And so moving the instructions around

00:24:54 so that the processor can keep its pipelines full instead

00:24:57 of stalling and getting blocked.

00:24:59 And so there’s a lot of things like that that

00:25:01 are kind of bread and butter compiler techniques

00:25:03 that have been studied a lot over the course of decades now.

00:25:06 But the engineering side of making them real

00:25:08 is also still quite hard.

00:25:10 And you talk about machine learning.

00:25:12 This is a huge opportunity for machine learning,

00:25:14 because many of these algorithms are full of these

00:25:17 hokey, hand rolled heuristics, which

00:25:19 work well on specific benchmarks that don’t generalize,

00:25:21 and full of magic numbers.

00:25:23 And I hear there’s some techniques that

00:25:26 are good at handling that.

00:25:28 So what would be the, if you were to apply machine learning

00:25:32 to this, what’s the thing you’re trying to optimize?

00:25:34 Is it ultimately the running time?

00:25:39 You can pick your metric, and there’s running time,

00:25:41 there’s memory use, there’s lots of different things

00:25:43 that you can optimize for.

00:25:44 Code size is another one that some people care about

00:25:47 in the embedded space.

00:25:48 Is this like the thinking into the future,

00:25:51 or has somebody actually been crazy enough

00:25:54 to try to have machine learning based parameter

00:25:58 tuning for the optimization of compilers?

00:26:01 So this is something that is, I would say, research right now.

00:26:04 There are a lot of research systems

00:26:06 that have been applying search in various forms.

00:26:09 And using reinforcement learning is one form,

00:26:11 but also brute force search has been tried for quite a while.

00:26:14 And usually, these are in small problem spaces.

00:26:18 So find the optimal way to code generate a matrix

00:26:21 multiply for a GPU, something like that,

00:26:24 where you say, there’s a lot of design space of,

00:26:28 do you unroll loops a lot?

00:26:29 Do you execute multiple things in parallel?

00:26:32 And there’s many different confounding factors here

00:26:35 because graphics cards have different numbers of threads

00:26:38 and registers and execution ports and memory bandwidth

00:26:41 and many different constraints that interact

00:26:42 in nonlinear ways.

00:26:44 And so search is very powerful for that.

00:26:46 And it gets used in certain ways,

00:26:49 but it’s not very structured.

00:26:51 This is something that we need,

00:26:52 we as an industry need to fix.
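The brute force search Chris mentions can be sketched as enumerate-and-keep-the-best over code-generation parameters. The cost model below is entirely invented (the toy_cost function, the register limit, and the parameter space are made-up stand-ins); a real autotuner would time candidate kernels on actual hardware, but the structure is the same, including the nonlinear interaction between parameters.

```python
# Minimal sketch of search-based tuning for a hypothetical GPU kernel:
# enumerate (unroll factor, thread count) pairs and keep the cheapest
# under a purely illustrative, made-up cost model.
import itertools

def toy_cost(unroll, threads):
    """Invented nonlinear cost: more parallelism helps, but once
    unroll * threads exceeds a pretend 256-register budget, a steep
    spill penalty kicks in -- parameters interact nonlinearly."""
    registers = unroll * threads
    cost = 1000 / (unroll * threads)   # parallelism helps
    cost += 0.02 * threads             # pretend per-thread overhead
    if registers > 256:                # pretend register spill penalty
        cost += (registers - 256) * 5
    return cost

def search():
    """Brute force over the whole (small) design space."""
    space = itertools.product([1, 2, 4, 8, 16], [32, 64, 128, 256])
    return min(space, key=lambda p: toy_cost(*p))

print(search())  # prints (8, 32)
```

Note the winner sits right at the pretend register budget — exactly the kind of answer a hand-rolled heuristic with magic numbers tends to miss.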

00:26:54 So you said the 80s, but have there been big jumps

00:26:59 in improvement and optimization?

00:27:01 Yeah.

00:27:02 Yeah, since then, what’s the coolest thing?

00:27:05 It’s largely been driven by hardware.

00:27:07 So, well, it’s hardware and software.

00:27:09 So in the mid nineties, Java totally changed the world,

00:27:13 right?

00:27:14 And I’m still amazed by how much change was introduced

00:27:17 by the way, in a good way.

00:27:19 So like reflecting back, Java

00:27:22 all at once introduced things like JIT compilation.

00:27:25 None of these were novel, but it pulled it together

00:27:27 and made it mainstream and made people invest in it.

00:27:30 JIT compilation, garbage collection, portable code,

00:27:33 safe code, like memory safe code,

00:27:36 like a very dynamic dispatch execution model.

00:27:41 Like many of these things,

00:27:42 which had been done in research systems

00:27:44 and had been done in small ways in various places,

00:27:46 really came to the forefront,

00:27:47 really changed how things worked

00:27:49 and therefore changed the way people thought

00:27:51 about the problem.

00:27:53 JavaScript was another major world change

00:27:56 based on the way it works.

00:27:59 But also on the hardware side of things,

00:28:01 multi core and vector instructions really changed

00:28:06 the problem space and are very,

00:28:09 they don’t remove any of the problems

00:28:10 that compilers faced in the past,

00:28:12 but they add new kinds of problems

00:28:14 of how do you find enough work

00:28:16 to keep a four wide vector busy, right?

00:28:20 Or if you’re doing a matrix multiplication,

00:28:22 how do you do different columns out of that matrix

00:28:25 at the same time?

00:28:26 And how do you maximally utilize the arithmetic compute

00:28:30 that one core has?

00:28:31 And then how do you take it to multiple cores?
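The “keep a four wide vector busy” problem can be made concrete with a toy example. The code below is purely illustrative — pure Python stands in for SIMD lanes, and matvec_4wide is an invented name — but it has the shape real vectorized loops take: four independent rows are processed per iteration, with a scalar epilogue cleaning up the remainder.

```python
# Toy illustration of vectorization: a matrix-vector product computed
# four output rows at a time, the way a vectorizer packs independent
# work into SIMD lanes.

def matvec_scalar(A, x):
    """Plain scalar version, one row at a time."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def matvec_4wide(A, x):
    """Process four rows per 'vector' iteration; a scalar loop
    handles the leftover rows, just like real vectorized code."""
    n = len(A)
    out = [0.0] * n
    i = 0
    while i + 4 <= n:
        # one "lane" per row: four independent accumulations per step
        acc = [0.0, 0.0, 0.0, 0.0]
        for j, xj in enumerate(x):
            for lane in range(4):
                acc[lane] += A[i + lane][j] * xj
        out[i:i + 4] = acc
        i += 4
    for k in range(i, n):  # scalar epilogue for the remainder
        out[k] = sum(a * b for a, b in zip(A[k], x))
    return out

A = [[float(r * 10 + c) for c in range(3)] for r in range(6)]
x = [1.0, 2.0, 3.0]
print(matvec_scalar(A, x) == matvec_4wide(A, x))  # prints True
```

The compiler’s job is to prove the four accumulations really are independent so it can pack them into one vector instruction; here the independence is by construction.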

00:28:33 How did the whole virtual machine thing change

00:28:35 the compilation pipeline?

00:28:38 Yeah, so what the Java virtual machine does

00:28:40 is it splits, just like I was talking about before,

00:28:44 where you have a front end that parses the code,

00:28:46 and then you have an intermediate representation

00:28:48 that gets transformed.

00:28:49 What Java did was they said,

00:28:51 we will parse the code and then compile to

00:28:53 what’s known as Java byte code.

00:28:55 And that byte code is now a portable code representation

00:28:58 that is industry standard and locked down and can’t change.

00:29:02 And then the back part of the compiler

00:29:05 that does optimization and code generation

00:29:07 can now be built by different vendors.

00:29:09 Okay.

00:29:10 And Java byte code can be shipped around across the wire.

00:29:13 It’s memory safe and relatively trusted.

00:29:16 And because of that, it can run in the browser.

00:29:18 And that’s why it runs in the browser, right?

00:29:20 And so that way you can be in,

00:29:22 again, back in the day, you would write a Java applet

00:29:25 and as a web developer, you’d build this mini app

00:29:29 that would run on a webpage.

00:29:30 Well, a user of that is running a web browser

00:29:33 on their computer.

00:29:34 You download that Java byte code, which can be trusted,

00:29:37 and then you do all the compiler stuff on your machine

00:29:41 so that you know that you trust that.
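The front end / portable byte code / back end split Chris describes for Java has a close analogue in CPython, which makes for a hands-on sketch. Python bytecode is not an industry standard the way Java’s is — it changes between versions — but the pipeline shape is the same: parse once to bytecode, then hand the bytecode to a separate execution engine. The "<applet>" filename below is just a placeholder label.

```python
# Sketch of the front end / bytecode / back end split, using CPython's
# own bytecode as the analogous intermediate form.
import dis

source = "def add(a, b):\n    return a + b\n"

code = compile(source, "<applet>", "exec")  # front end: source -> bytecode
namespace = {}
exec(code, namespace)                       # back end: execute the bytecode

add = namespace["add"]
print(add(2, 3))  # prints 5

# The addition is now an explicit bytecode operation (BINARY_ADD or
# BINARY_OP depending on the Python version).
ops = [ins.opname for ins in dis.get_instructions(add)]
print(any("BINARY" in op for op in ops))  # prints True
```

In Java’s case the bytecode format is locked down, so the “back end” — a vendor’s JIT — can be swapped out underneath unchanged class files; that stability is what this sketch can’t show.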

00:29:42 Now, is that a good idea or a bad idea?

00:29:44 It’s a great idea.

00:29:44 I mean, it’s a great idea for certain problems.

00:29:46 And I’m very much a believer that technology is itself

00:29:49 neither good nor bad.

00:29:50 It’s how you apply it.

00:29:52 You know, this would be a very, very bad thing

00:29:54 for very low levels of the software stack.

00:29:56 But in terms of solving some of these software portability

00:30:00 and transparency, or portability problems,

00:30:02 I think it’s been really good.

00:30:04 Now, Java ultimately didn’t win out on the desktop.

00:30:06 And like, there are good reasons for that.

00:30:09 But it’s been very successful on servers and in many places,

00:30:13 it’s been a very successful thing over decades.

00:30:16 So what have been LLVM’s and Clang’s improvements

00:30:21 in optimization throughout its history?

00:30:28 What are some moments where you had setbacks

00:30:31 and are really proud of what’s been accomplished?

00:30:33 Yeah, I think that the interesting thing about LLVM

00:30:36 is not the innovations and compiler research.

00:30:40 It has very good implementations

00:30:41 of various important algorithms, no doubt.

00:30:44 And a lot of really smart people have worked on it.

00:30:48 But I think that the thing that’s most profound about LLVM

00:30:50 is that through standardization, it made things possible

00:30:53 that otherwise wouldn’t have happened, okay?

00:30:56 And so interesting things that have happened with LLVM,

00:30:59 for example, Sony has picked up LLVM

00:31:01 and used it to do all the graphics compilation

00:31:03 in their movie production pipeline.

00:31:06 And so now they’re able to have better special effects

00:31:07 because of LLVM.

00:31:09 That’s kind of cool.

00:31:11 That’s not what it was designed for, right?

00:31:13 But that’s the sign of good infrastructure

00:31:15 when it can be used in ways it was never designed for

00:31:18 because it has good layering and software engineering

00:31:20 and it’s composable and things like that.

00:31:23 Which is where, as you said, it differs from GCC.

00:31:26 Yes, GCC is also great in various ways,

00:31:28 but it’s not as good as infrastructure technology.

00:31:31 It’s really a C compiler, or it’s a Fortran compiler.

00:31:36 It’s not infrastructure in the same way.

00:31:38 Now you can tell I don’t know what I’m talking about

00:31:41 because I keep saying C lang.

00:31:44 You can always tell whether a person has a clue

00:31:48 by the way they pronounce something.

00:31:49 I don’t think, have I ever used C lang?

00:31:52 Entirely possible, have you?

00:31:54 Well, so you’ve probably used code it’s generated.

00:31:58 So Clang and LLVM are used to compile

00:32:01 all the apps on the iPhone effectively and the OSs.

00:32:05 It compiles Google’s production server applications.

00:32:10 It’s used to build GameCube games and PlayStation 4

00:32:14 and things like that.

00:32:16 So as a user, I have, but just everything I’ve done

00:32:20 that I experienced with Linux has been,

00:32:22 I believe, always GCC.

00:32:23 Yeah, I think Linux still defaults to GCC.

00:32:26 And is there a reason for that?

00:32:27 Or is it because, I mean, is there a reason for that?

00:32:29 It’s a combination of technical and social reasons.

00:32:32 Many Linux developers do use Clang,

00:32:35 but the distributions, for lots of reasons,

00:32:40 use GCC historically, and they’ve not switched, yeah.

00:32:44 Because it’s just anecdotally online,

00:32:46 it seems that LLVM has either reached the level of GCC

00:32:50 or superseded it on different features or whatever.

00:32:53 The way I would say it is that they’re so close,

00:32:55 it doesn’t matter.

00:32:56 Yeah, exactly.

00:32:56 Like, they’re slightly better in some ways,

00:32:58 slightly worse in others,

00:32:59 but it doesn’t actually really matter anymore at that level.

00:33:03 So in terms of optimization breakthroughs,

00:33:06 it’s just been solid incremental work.

00:33:09 Yeah, yeah, which describes a lot of compilers.

00:33:12 The hard thing about compilers, in my experience,

00:33:15 is the engineering, the software engineering,

00:33:17 making it so that you can have hundreds of people

00:33:20 collaborating on really detailed, low level work

00:33:23 and scaling that.

00:33:25 And that’s really hard.

00:33:27 And that’s one of the things I think LLVM has done well.

00:33:32 And that kind of goes back to the original design goals

00:33:34 with it to be modular and things like that.

00:33:37 And incidentally, I don’t want to take all the credit

00:33:38 for this, right?

00:33:39 I mean, some of the best parts about LLVM

00:33:41 is that it was designed to be modular.

00:33:43 And when I started, I would write, for example,

00:33:45 a register allocator, and then somebody much smarter than me

00:33:48 would come in and pull it out and replace it

00:33:50 with something else that they would come up with.

00:33:52 And because it’s modular, they were able to do that.

00:33:55 And that’s one of the challenges with GCC, for example,

00:33:58 is replacing subsystems is incredibly difficult.

00:34:01 It can be done, but it wasn’t designed for that.

00:34:04 And that’s one of the reasons that LLVM’s been

00:34:06 very successful in the research world as well.

00:34:08 But in a community sense, Guido van Rossum, right,

00:34:12 from Python, just retired from, what is it?

00:34:18 Benevolent Dictator for Life, right?

00:34:20 So in managing this community of brilliant compiler folks,

00:34:24 is there, did it, for a time at least,

00:34:28 fall on you to approve things?

00:34:31 Oh yeah, so I mean, I still have something like

00:34:34 an order of magnitude more patches in LLVM

00:34:37 than anybody else, and many of those I wrote myself.

00:34:42 But you still write, I mean, you’re still close to the,

00:34:47 to the, I don’t know what the expression is,

00:34:49 to the metal, you still write code.

00:34:51 Yeah, I still write code.

00:34:52 Not as much as I was able to in grad school,

00:34:54 but that’s an important part of my identity.

00:34:56 But the way that LLVM has worked over time

00:34:58 is that when I was a grad student, I could do all the work

00:35:01 and steer everything and review every patch

00:35:04 and make sure everything was done

00:35:05 exactly the way my opinionated sense

00:35:09 felt like it should be done, and that was fine.

00:35:11 But as things scale, you can’t do that, right?

00:35:14 And so what ends up happening is LLVM

00:35:17 has a hierarchical system of what’s called code owners.

00:35:20 These code owners are given the responsibility

00:35:22 not to do all the work,

00:35:24 not necessarily to review all the patches,

00:35:26 but to make sure that the patches do get reviewed

00:35:28 and make sure that the right thing’s happening

00:35:30 architecturally in their area.

00:35:32 And so what you’ll see is you’ll see that, for example,

00:35:36 hardware manufacturers end up owning

00:35:38 the hardware specific parts of their hardware.

00:35:43 That’s very common.

00:35:45 Leaders in the community that have done really good work

00:35:47 naturally become the de facto owner of something.

00:35:50 And then usually somebody else is like,

00:35:53 how about we make them the official code owner?

00:35:55 And then we’ll have somebody to make sure

00:35:58 that all the patches get reviewed in a timely manner.

00:36:00 And then everybody’s like, yes, that’s obvious.

00:36:02 And then it happens, right?

00:36:03 And usually this is a very organic thing, which is great.

00:36:06 And so I’m nominally the top of that stack still,

00:36:08 but I don’t spend a lot of time reviewing patches.

00:36:11 What I do is I help negotiate a lot of the technical

00:36:16 disagreements that end up happening

00:36:18 and making sure that the community as a whole

00:36:19 makes progress and is moving in the right direction

00:36:22 and doing that.

00:36:23 So we also started a nonprofit six years ago,

00:36:28 seven years ago, time’s gone away.

00:36:30 And the LLVM Foundation nonprofit helps oversee

00:36:34 all the business sides of things and make sure

00:36:36 that the events that the LLVM community has

00:36:38 are funded and set up and run correctly

00:36:41 and stuff like that.

00:36:42 But the foundation is very much stays out

00:36:45 of the technical side of where the project is going.

00:36:49 Right, so it sounds like a lot of it is just organic.

00:36:53 Yeah, well, LLVM is almost 20 years old,

00:36:55 which is hard to believe.

00:36:56 Somebody pointed out to me recently that LLVM

00:36:59 is now older than GCC was when LLVM started, right?

00:37:04 So time has a way of getting away from you.

00:37:06 But the good thing about that is it has a really robust,

00:37:10 really amazing community of people that are

00:37:13 in their professional lives, spread across lots

00:37:15 of different companies, but it’s a community

00:37:17 of people that are interested in similar kinds of problems

00:37:21 and have been working together effectively for years

00:37:23 and have a lot of trust and respect for each other.

00:37:26 And even if they don’t always agree,

00:37:29 we’re able to find a path forward.

00:37:31 So then in a slightly different flavor of effort,

00:37:34 you started at Apple in 2005 with the task

00:37:38 of making, I guess, LLVM production ready.

00:37:41 And then eventually 2013 through 2017,

00:37:44 leading the entire developer tools department.

00:37:48 We’re talking about LLVM, Xcode, Objective C to Swift.

00:37:53 So in a quick overview of your time there,

00:37:58 what were the challenges?

00:37:59 First of all, leading such a huge group of developers,

00:38:03 what was the big motivator, dream, mission

00:38:06 behind creating Swift, the early birth of it

00:38:11 from Objective C and so on, and Xcode,

00:38:13 what are some challenges?

00:38:14 So these are different questions.

00:38:15 Yeah, I know, but I wanna talk about the other stuff too.

00:38:19 I’ll stay on the technical side,

00:38:21 then we can talk about the big team pieces, if that’s okay.

00:38:24 So, to really oversimplify many years of hard work:

00:38:29 LLVM started, joined Apple, became a thing,

00:38:32 became successful and became deployed.

00:38:34 But then there’s a question about

00:38:35 how do we actually parse the source code?

00:38:38 So LLVM is that back part,

00:38:40 the optimizer and the code generator.

00:38:42 And LLVM was really good for Apple

00:38:44 as it went through a couple of harder transitions.

00:38:46 I joined right at the time of the Intel transition,

00:38:47 for example, and 64 bit transitions,

00:38:51 and then the transition to ARM with the iPhone.

00:38:53 And so LLVM was very useful

00:38:54 for some of these kinds of things.

00:38:57 But at the same time, there’s a lot of questions

00:38:58 around developer experience.

00:39:00 And so if you’re a programmer pounding out

00:39:01 at the time Objective C code,

00:39:04 the error message you get, the compile time,

00:39:06 the turnaround cycle, the tooling and the IDE,

00:39:09 were not great, were not as good as they could be.

00:39:13 And so, as I occasionally do, I’m like,

00:39:18 well, okay, how hard is it to write a C compiler?

00:39:20 And so I’m not gonna commit to anybody,

00:39:22 I’m not gonna tell anybody, I’m just gonna just do it

00:39:25 nights and weekends and start working on it.

00:39:27 And then I built it up. In C,

00:39:29 there’s this thing called the preprocessor,

00:39:31 which people don’t like,

00:39:33 but it’s actually really hard and complicated

00:39:35 and includes a bunch of really weird things

00:39:37 like trigraphs and other stuff like that

00:39:39 that are really nasty,

00:39:40 and it’s the crux of a bunch of the performance issues

00:39:44 in the compiler.

00:39:45 Started working on the parser

00:39:46 and kind of got to the point where I’m like,

00:39:47 ah, you know what, we could actually do this.

00:39:49 Everybody’s saying that this is impossible to do,

00:39:51 but it’s actually just hard, it’s not impossible.

00:39:53 And eventually told my manager about it,

00:39:57 and he’s like, oh, wow, this is great,

00:39:59 we do need to solve this problem.

00:40:00 Oh, this is great, we can get you one other person

00:40:02 to work with you on this, you know?

00:40:04 And slowly a team is formed and it starts taking off.

00:40:08 And C++, for example, huge, complicated language.

00:40:12 People always assume that it’s impossible to implement

00:40:14 and it’s very nearly impossible,

00:40:16 but it’s just really, really hard.

00:40:18 And the way to get there is to build it

00:40:20 one piece at a time incrementally.

00:40:22 And that was only possible because we were lucky

00:40:26 to hire some really exceptional engineers

00:40:28 that knew various parts of it very well

00:40:30 and could do great things.

00:40:32 Swift was kind of a similar thing.

00:40:34 So Swift came from, we were just finishing off

00:40:39 the first version of C++ support in Clang.

00:40:42 And C++ is a very formidable and very important language,

00:40:47 but it’s also ugly in lots of ways.

00:40:49 And you can’t influence C++ without thinking

00:40:52 there has to be a better thing, right?

00:40:54 And so I started working on Swift, again,

00:40:56 with no hope or ambition that would go anywhere,

00:40:58 just let’s see what could be done,

00:41:00 let’s play around with this thing.

00:41:02 It was me in my spare time, not telling anybody about it,

00:41:06 kind of a thing, and it made some good progress.

00:41:09 I’m like, actually, it would make sense to do this.

00:41:11 At the same time, I started talking with the senior VP

00:41:14 of software at the time, a guy named Bertrand Serlet.

00:41:17 And Bertrand was very encouraging.

00:41:19 He was like, well, let’s have fun, let’s talk about this.

00:41:22 And he was a little bit of a language guy,

00:41:23 and so he helped guide some of the early work

00:41:26 and encouraged me and got things off the ground.

00:41:30 And eventually told my manager and told other people,

00:41:34 and it started making progress.

00:41:38 The complicating thing with Swift

00:41:40 was that the idea of doing a new language

00:41:43 was not obvious to anybody, including myself.

00:41:47 And the tone at the time was that the iPhone

00:41:50 was successful because of Objective C.

00:41:53 Oh, interesting.

00:41:54 Not despite of or just because of.

00:41:57 And you have to understand that at the time,

00:42:01 Apple was hiring software people that loved Objective C.

00:42:05 And it wasn’t that they came despite Objective C.

00:42:07 They loved Objective C, and that’s why they got hired.

00:42:10 And so you had a software team that the leadership,

00:42:13 in many cases, went all the way back to Next,

00:42:15 where Objective C really became real.

00:42:19 And so they, quote unquote, grew up writing Objective C.

00:42:23 And many of the individual engineers

00:42:25 all were hired because they loved Objective C.

00:42:28 And so this notion of, OK, let’s do new language

00:42:30 was kind of heretical in many ways.

00:42:34 Meanwhile, my sense was that the outside community wasn’t really

00:42:36 in love with Objective C. Some people were,

00:42:38 and some of the most outspoken people were.

00:42:40 But other people were hitting challenges

00:42:42 because it has very sharp corners

00:42:44 and it’s difficult to learn.

00:42:46 And so one of the challenges of making Swift happen that

00:42:50 was totally non technical is the social part of what do we do?

00:42:57 If we do a new language, which at Apple, many things

00:43:00 happen that don’t ship.

00:43:02 So if we ship it, what is the metrics of success?

00:43:05 Why would we do this?

00:43:06 Why wouldn’t we make Objective C better?

00:43:08 If Objective C has problems, let’s file off

00:43:10 those rough corners and edges.

00:43:12 And one of the major things that became the reason to do this

00:43:15 was this notion of safety, memory safety.

00:43:18 And the way Objective C works is that a lot of the object system

00:43:23 and everything else is built on top of pointers in C.

00:43:27 Objective C is an extension on top of C.

00:43:29 And so pointers are unsafe.

00:43:32 And if you get rid of the pointers,

00:43:34 it’s not Objective C anymore.

00:43:36 And so fundamentally, that was an issue

00:43:39 that you could not fix safety or memory safety

00:43:42 without fundamentally changing the language.

00:43:45 And so once we got through that part of the mental process

00:43:49 and the thought process, it became a design process

00:43:53 of saying, OK, well, if we’re going to do something new,

00:43:55 what is good?

00:43:56 How do we think about this?

00:43:57 And what do we like?

00:43:58 And what are we looking for?

00:44:00 And that was a very different phase of it.

00:44:02 So what are some design choices early on in Swift?

00:44:05 Like we’re talking about braces, are you

00:44:10 making a typed language or not, all those kinds of things.

00:44:13 Yeah, so some of those were obvious given the context.

00:44:16 So a typed language, for example,

00:44:17 Objective C is a typed language.

00:44:19 And going with an untyped language

00:44:22 wasn’t really seriously considered.

00:44:24 We wanted the performance, and we

00:44:26 wanted refactoring tools and other things

00:44:27 like that that go with typed languages.

00:44:29 Quick, dumb question.

00:44:31 Was it obvious, I think this would be a dumb question,

00:44:34 but was it obvious that the language

00:44:36 has to be a compiled language?

00:44:40 Yes, that’s not a dumb question.

00:44:42 Earlier, I think late 90s, Apple had seriously

00:44:44 considered moving its development experience to Java.

00:44:49 But Swift started in 2010, which was several years

00:44:53 after the iPhone.

00:44:53 It was when the iPhone was definitely

00:44:55 on an upward trajectory.

00:44:56 And the iPhone was still extremely,

00:44:58 and is still a bit memory constrained.

00:45:01 And so being able to compile the code

00:45:04 and then ship it and then having standalone code that

00:45:08 is not JIT compiled is a very big deal

00:45:11 and is very much part of the Apple value system.

00:45:15 Now, JavaScript’s also a thing.

00:45:17 I mean, it’s not that this is exclusive,

00:45:19 and technologies are good depending

00:45:21 on how they’re applied.

00:45:23 But in the design of Swift, saying,

00:45:26 how can we make Objective C better?

00:45:28 Objective C is statically compiled,

00:45:29 and that was the contiguous, natural thing to do.

00:45:32 Just skip ahead a little bit, and we’ll go right back.

00:45:35 Just as a question, as you think about today in 2019

00:45:40 in your work at Google, TensorFlow and so on,

00:45:42 is, again, compilation, static compilation, still

00:45:48 the right thing?

00:45:49 Yeah, so the funny thing after working

00:45:52 on compilers for a really long time is that,

00:45:55 and this is one of the things that LLVM has helped with,

00:45:59 is that I don’t look at compilation

00:46:01 as being static or dynamic or interpreted or not.

00:46:05 This is a spectrum.

00:46:07 And one of the cool things about Swift

00:46:09 is that Swift is not just statically compiled.

00:46:12 It’s actually dynamically compiled as well,

00:46:14 and it can also be interpreted.

00:46:15 Though, nobody’s actually done that.

00:46:17 And so what ends up happening when

00:46:20 you use Swift in a workbook, for example in Colab or in Jupyter,

00:46:24 is it’s actually dynamically compiling the statements

00:46:26 as you execute them.

00:46:28 And so this gets back to the software engineering problems,

00:46:32 where if you layer the stack properly,

00:46:34 you can actually completely change

00:46:37 how and when things get compiled because you

00:46:39 have the right abstractions there.

00:46:41 And so the way that a Colab workbook works with Swift

00:46:44 is that when you start typing into it,

00:46:47 it creates a process, a Unix process.

00:46:50 And then each line of code you type in,

00:46:52 it compiles it through the Swift compiler, the front end part,

00:46:56 and then sends it through the optimizer,

00:46:58 JIT compiles machine code, and then

00:47:01 injects it into that process.

00:47:03 And so as you’re typing new stuff,

00:47:05 it’s like squirting in new code and overwriting and replacing

00:47:09 and updating code in place.

00:47:11 And the fact that it can do this is not an accident.

00:47:13 Swift was designed for this.

00:47:15 But it’s an important part of how the language was set up

00:47:18 and how it’s layered, and this is a nonobvious piece.
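The workbook mechanism can be mimicked in miniature with CPython, where compiling each statement to bytecode plays the role Swift’s JIT plays in Colab. The Workbook class below is an invented sketch, not how Colab is actually implemented: each cell is compiled on its own and injected into one long-lived namespace, so later cells overwrite earlier definitions in place.

```python
# Minimal sketch of the workbook model: compile each "cell" separately
# and inject it into a single long-lived namespace, so redefinitions
# replace code in place.

class Workbook:
    def __init__(self):
        self.namespace = {}   # one shared process-like environment

    def run_cell(self, source):
        code = compile(source, "<cell>", "exec")  # compile just this cell
        exec(code, self.namespace)                # inject into the "process"

wb = Workbook()
wb.run_cell("def greet():\n    return 'hello'")
first = wb.namespace["greet"]()

# A later cell redefines greet; the old definition is overwritten.
wb.run_cell("def greet():\n    return 'hello, world'")
second = wb.namespace["greet"]()

print(first, "->", second)  # prints hello -> hello, world
```

The Swift version goes one step further — each statement is run through the real optimizer and JIT-compiled to machine code before being squirted into the running Unix process — but the layering that makes that possible is the same idea.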

00:47:21 And one of the things with Swift that

00:47:23 was, for me, a very strong design point

00:47:25 is to make it so that you can learn it very quickly.

00:47:29 And so from a language design perspective,

00:47:31 the thing that I always come back to

00:47:33 is this UI principle of progressive disclosure

00:47:36 of complexity.

00:47:37 And so in Swift, you can start by saying print, quote,

00:47:41 hello world, quote.

00:47:44 And there’s no slash n, just like Python, one line of code,

00:47:47 no main, no header files, no public static class void,

00:47:51 blah, blah, blah, string like Java has, one line of code.

00:47:55 And you can teach that, and it works great.

00:47:58 Then you can say, well, let’s introduce variables.

00:48:00 And so you can declare a variable with var.

00:48:02 So var x equals 4.

00:48:03 What is a variable?

00:48:04 You can use x, x plus 1.

00:48:06 This is what it means.

00:48:07 Then you can say, well, how about control flow?

00:48:09 Well, this is what an if statement is.

00:48:10 This is what a for statement is.

00:48:12 This is what a while statement is.

00:48:15 Then you can say, let’s introduce functions.

00:48:17 And many languages like Python have

00:48:20 had this kind of notion of let’s introduce small things,

00:48:22 and then you can add complexity.

00:48:24 Then you can introduce classes.

00:48:25 And then you can add generics, in the case of Swift.

00:48:28 And then you can build in modules

00:48:29 and build out in terms of the things that you’re expressing.

00:48:32 But this is not very typical for compiled languages.

00:48:35 And so this was a very strong design point,

00:48:38 and one of the reasons that Swift, in general,

00:48:40 is designed with this factoring of complexity in mind

00:48:43 so that the language can express powerful things.

00:48:46 You can write firmware in Swift if you want to.

00:48:49 But it has a very high level feel,

00:48:51 which is really this perfect blend, because often you

00:48:55 have very advanced library writers that

00:48:57 want to be able to use the nitty gritty details.

00:49:00 But then other people just want to use the libraries

00:49:02 and work at a higher abstraction level.

00:49:04 It’s kind of cool that I saw that you can just,

00:49:07 interoperability.

00:49:09 I don’t think I pronounced that word right.

00:49:11 But you can just drag in Python.

00:49:14 It’s just strange.

00:49:16 You can import, like I saw this in the demo.

00:49:19 How do you make that happen?

00:49:21 What’s up with that?

00:49:23 Is that as easy as it looks, or is it?

00:49:25 Yes, as easy as it looks.

00:49:27 That’s not a stage magic hack or anything like that.

00:49:29 I don’t mean from the user perspective.

00:49:31 I mean from the implementation perspective to make it happen.

00:49:34 So it’s easy once all the pieces are in place.

00:49:37 The way it works, so if you think about a dynamically typed

00:49:39 language like Python, you can think about it

00:49:41 in two different ways.

00:49:42 You can say it has no types, which

00:49:45 is what most people would say.

00:49:47 Or you can say it has one type,

00:49:50 and it’s the Python object.

00:49:53 And the Python object gets passed around.

00:49:55 And because there’s only one type, it’s implicit.

00:49:58 And so what happens with Swift and Python talking

00:50:00 to each other, Swift has lots of types.

00:50:02 It has arrays, and it has strings, and all classes,

00:50:05 and that kind of stuff.

00:50:07 But it now has a Python object type.

00:50:11 So there is one Python object type.

00:50:12 And so when you say import NumPy, what you get

00:50:16 is a Python object, which is the NumPy module.

00:50:19 And then you say np.array.

00:50:21 It says, OK, hey, Python object, I have no idea what you are.

00:50:24 Give me your array member.

00:50:27 OK, cool.

00:50:27 And it just uses dynamic stuff, talks to the Python interpreter,

00:50:31 and says, hey, Python, what’s the .array member

00:50:33 in that Python object?

00:50:35 It gives you back another Python object.

00:50:37 And now you say parentheses for the call and the arguments

00:50:40 you’re going to pass.

00:50:40 And so then it says, hey, a Python object

00:50:43 that is the result of np.array, call with these arguments.

00:50:47 Again, calling into the Python interpreter to do that work.
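What Swift’s dynamic member lookup and dynamic call features do can be sketched from the Python side with a single wrapper type. PyObjectHandle below is an invented stand-in for Swift’s Python object type: every member access and every call is forwarded dynamically to the interpreter’s objects, so the host language only ever needs this one static type.

```python
# Sketch of the "one Python object type" idea: a wrapper that forwards
# member lookups and calls dynamically, so all of Python is reachable
# through a single type.

class PyObjectHandle:
    def __init__(self, obj):
        self._obj = obj

    def __getattr__(self, name):
        # "give me your array member": ask the underlying object,
        # wrap whatever comes back in the same single type
        return PyObjectHandle(getattr(self._obj, name))

    def __call__(self, *args, **kwargs):
        # "parentheses for the call": forward to the wrapped callable
        return PyObjectHandle(self._obj(*args, **kwargs))

    def unwrap(self):
        return self._obj

import math
m = PyObjectHandle(math)   # like `import numpy` handing back one handle
result = m.sqrt(16.0)      # dynamic member lookup, then dynamic call
print(result.unwrap())     # prints 4.0
```

Swift spells the same two hooks as the @dynamicMemberLookup and @dynamicCallable language features mentioned below; here Python’s `__getattr__` and `__call__` play those roles.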

00:50:50 And so right now, this is all really simple.

00:50:53 And if you dive into the code, what you’ll see

00:50:55 is that the Python module in Swift

00:50:58 is something like 1,200 lines of code or something.

00:51:01 It’s written in pure Swift.

00:51:02 It’s super simple.

00:51:03 And it’s built on top of the C interoperability

00:51:06 because it just talks to the Python interpreter.

00:51:09 But making that possible required

00:51:11 us to add two major language features to Swift

00:51:13 to be able to express these dynamic calls

00:51:15 and the dynamic member lookups.

00:51:17 And so what we’ve done over the last year

00:51:19 is we’ve proposed, implemented, standardized, and contributed

00:51:23 new language features to the Swift language

00:51:26 in order to make it so it is really trivial.

00:51:29 And this is one of the things about Swift

00:51:31 that is critical to the Swift for TensorFlow work, which

00:51:35 is that we can actually add new language features.

00:51:37 And the bar for adding those is high,

00:51:39 but it’s what makes it possible.

00:51:42 So you’re now at Google doing incredible work

00:51:45 on several things, including TensorFlow.

00:51:47 So TensorFlow 2.0, or whatever’s leading up to 2.0,

00:51:53 has eager execution by default.

00:51:56 And yet, in order to make code optimized for GPU or TPU

00:52:00 or some of these systems, computation

00:52:04 needs to be converted to a graph.

00:52:06 So what’s that process like?

00:52:07 What are the challenges there?

00:52:08 Yeah, so I am tangentially involved in this.

00:52:11 But the way that it works with Autograph

00:52:15 is that you mark your function with a decorator.

00:52:21 And when Python calls it, that decorator is invoked.

00:52:24 And then it says, before I call this function,

00:52:28 you can transform it.

00:52:29 And so the way Autograph works, as far as I understand,

00:52:32 is it actually uses the Python parser

00:52:34 to go parse that, turn it into a syntax tree,

00:52:37 and now apply compiler techniques to, again,

00:52:39 transform this down into TensorFlow graphs.

00:52:42 And so you can think of it as saying, hey,

00:52:44 I have an if statement.

00:52:45 I’m going to create an if node in the graph,

00:52:48 like you say tf.cond.

00:52:51 You have a multiply.

00:52:53 Well, I’ll turn that into a multiply node in the graph.

00:52:55 And it becomes this tree transformation.
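The tree transformation just described can be sketched with Python's own ast module. This is a toy illustration, not the real Autograph code, and the emitted node names are hypothetical:

```python
import ast

# Toy sketch of the Autograph idea: parse Python source into a syntax
# tree, then walk it and emit a "graph node" per construct -- an If
# statement becomes a cond node (like tf.cond), a multiply becomes a
# multiply node. Node names here are hypothetical.

SOURCE = """
def scaled(x, y):
    if x > 0:
        return x * y
    return y
"""

def to_graph_nodes(source):
    nodes = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.If):
            nodes.append("cond")
        elif isinstance(node, ast.BinOp) and isinstance(node.op, ast.Mult):
            nodes.append("multiply")
    return nodes

print(to_graph_nodes(SOURCE))
```

The real system transforms the tree into TensorFlow graph-building code rather than a list of names, but the parse-then-transform shape is the same.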

00:52:57 So where does Swift for TensorFlow

00:53:00 come in, and how does it parallel this?

00:53:04 For one, Swift is an interface.

00:53:06 Like, Python is an interface to TensorFlow.

00:53:09 But it seems like there’s a lot more going on in just

00:53:11 a different language interface.

00:53:13 There’s optimization methodology.

00:53:15 So the TensorFlow world has a couple

00:53:17 of different what I’d call front end technologies.

00:53:21 And so Swift and Python and Go and Rust and Julia

00:53:25 and all these things share the TensorFlow graphs

00:53:29 and all the runtime and everything that’s later.

00:53:32 And so Swift for TensorFlow is merely another front end

00:53:36 for TensorFlow, just like any of these other systems are.

00:53:40 There’s a major difference between, I would say,

00:53:43 three camps of technologies here.

00:53:44 There’s Python, which is a special case,

00:53:46 because the vast majority of the community effort

00:53:49 is going to the Python interface.

00:53:51 And Python has its own approaches

00:53:52 for automatic differentiation.

00:53:54 It has its own APIs and all this kind of stuff.

00:53:58 There’s Swift, which I’ll talk about in a second.

00:54:00 And then there’s kind of everything else.

00:54:02 And so the everything else are effectively language bindings.

00:54:05 So they call into the TensorFlow runtime,

00:54:07 but they usually don’t have automatic differentiation

00:54:10 or they usually don’t provide anything other than APIs

00:54:14 that call the C APIs in TensorFlow.

00:54:16 And so they’re kind of wrappers for that.

00:54:18 Swift is really kind of special.

00:54:19 And it’s a very different approach.

00:54:22 Swift for TensorFlow, that is, is a very different approach.

00:54:25 Because there we’re saying, let’s

00:54:26 look at all the problems that need

00:54:28 to be solved in the full stack of the TensorFlow compilation

00:54:34 process, if you think about it that way.

00:54:35 Because TensorFlow is fundamentally a compiler.

00:54:38 It takes models, and then it makes them go fast on hardware.

00:54:42 That’s what a compiler does.

00:54:43 And it has a front end, it has an optimizer,

00:54:47 and it has many back ends.

00:54:49 And so if you think about it the right way,

00:54:51 or if you look at it in a particular way,

00:54:54 it is a compiler.

00:54:59 And so Swift is merely another front end.

00:55:02 But it’s saying, and the design principle is saying,

00:55:05 let’s look at all the problems that we face as machine

00:55:08 learning practitioners and what is the best possible way we

00:55:11 can do that, given the fact that we can change literally

00:55:13 anything in this entire stack.

00:55:15 And Python, for example, where the vast majority

00:55:18 of the engineering and effort has gone into,

00:55:22 is constrained by being the best possible thing you

00:55:25 can do with a Python library.

00:55:27 There are no Python language features

00:55:29 that are added because of machine learning

00:55:31 that I’m aware of.

00:55:32 They added a matrix multiplication operator

00:55:34 with the @ sign, but that’s as close as you get.

00:55:38 And so with Swift, it’s hard, but you

00:55:41 can add language features to the language.

00:55:43 And there’s a community process for that.

00:55:46 And so we look at these things and say, well,

00:55:48 what is the right division of labor

00:55:49 between the human programmer and the compiler?

00:55:52 And Swift has a number of things that shift that balance.

00:55:55 So because it has a type system, for example,

00:56:00 that makes certain things possible for analysis

00:56:02 of the code, and the compiler can automatically

00:56:05 build graphs for you without you thinking about them.

00:56:08 That’s a big deal for a programmer.

00:56:10 You just get free performance.

00:56:11 You get clustering and fusion and optimization,

00:56:14 things like that, without you as a programmer

00:56:17 having to manually do it because the compiler can do it for you.

00:56:20 Automatic differentiation is another big deal.

00:56:22 And I think one of the key contributions of the Swift

00:56:25 TensorFlow project is that there’s

00:56:29 this entire body of work on automatic differentiation

00:56:32 that dates back to the Fortran days.

00:56:34 People doing a tremendous amount of numerical computing

00:56:36 in Fortran used to write these what they call source

00:56:39 to source translators, where you take a bunch of code,

00:56:43 shove it into a mini compiler, and it would push out

00:56:46 more Fortran code.

00:56:48 But it would generate the backwards passes

00:56:50 for your functions for you, the derivatives.

00:56:53 And so in that work in the 70s, a tremendous number

00:56:57 of optimizations, a tremendous number of techniques

00:57:01 for fixing numerical instability,

00:57:02 and other kinds of problems were developed.

00:57:05 But they’re very difficult to port into a world

00:57:07 where, in eager execution, you get an op by op at a time.

00:57:11 You need to be able to look at an entire function

00:57:13 and be able to reason about what’s going on.

00:57:15 And so when you have a language integrated automatic

00:57:18 differentiation, which is one of the things

00:57:20 that the Swift project is focusing on,

00:57:22 you can open all these techniques

00:57:24 and reuse them in familiar ways.
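The backward-pass generation being discussed can be sketched in a few lines. This is a minimal tape-style reverse-mode autodiff illustration, not Swift's actual implementation; class and method names are made up:

```python
# Minimal reverse-mode autodiff sketch (illustrative only): the forward
# pass records how each value was produced, and a backward pass runs
# those records in reverse to accumulate derivatives -- the same idea
# the Fortran source-to-source translators mechanized.

class Var:
    def __init__(self, value):
        self.value = value
        self.grad = 0.0
        self._backward = lambda: None  # no-op for leaf variables

    def __mul__(self, other):
        out = Var(self.value * other.value)
        def backward():
            self.grad += other.value * out.grad   # d(x*y)/dx = y
            other.grad += self.value * out.grad   # d(x*y)/dy = x
            self._backward(); other._backward()
        out._backward = backward
        return out

    def __add__(self, other):
        out = Var(self.value + other.value)
        def backward():
            self.grad += out.grad                 # d(x+y)/dx = 1
            other.grad += out.grad                # d(x+y)/dy = 1
            self._backward(); other._backward()
        out._backward = backward
        return out

x, y = Var(3.0), Var(4.0)
f = x * y + x            # f = x*y + x, so df/dx = y + 1, df/dy = x
f.grad = 1.0
f._backward()
print(x.grad, y.grad)    # 5.0 3.0
```

Note that this needs the whole expression before the backward pass can run, which is exactly why op-by-op eager execution makes the classic whole-function techniques hard to apply.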

00:57:28 But the language integration piece

00:57:30 has a bunch of design room in it, and it’s also complicated.

00:57:33 The other piece of the puzzle here that’s kind of interesting

00:57:35 is TPUs at Google.

00:57:37 So we’re in a new world with deep learning.

00:57:40 It constantly is changing, and I imagine,

00:57:42 without disclosing anything, I imagine

00:57:46 you’re still innovating on the TPU front, too.

00:57:48 Indeed.

00:57:49 So how much interplay is there between software and hardware

00:57:53 in trying to figure out how to together move

00:57:55 towards an optimized solution?

00:57:56 There’s an incredible amount.

00:57:57 So we’re on our third generation of TPUs,

00:57:59 which are now 100 petaflops in a very large liquid cooled box,

00:58:04 virtual box with no cover.

00:58:07 And as you might imagine, we’re not out of ideas yet.

00:58:11 The great thing about TPUs is that they’re

00:58:14 a perfect example of hardware software co design.

00:58:17 And so it’s about saying, what hardware

00:58:19 do we build to solve certain classes of machine learning

00:58:23 problems?

00:58:23 Well, the algorithms are changing.

00:58:26 The hardware takes some cases years to produce.

00:58:30 And so you have to make bets and decide

00:58:32 what is going to happen and what is the best way to spend

00:58:36 the transistors to get the maximum performance per watt

00:58:39 or area per cost or whatever it is that you’re optimizing for.

00:58:44 And so one of the amazing things about TPUs

00:58:46 is this numeric format called bfloat16.

00:58:49 bfloat16 is a compressed 16 bit floating point format,

00:58:54 but it puts the bits in different places.

00:58:55 And in numeric terms, it has a smaller mantissa

00:58:58 and a larger exponent.

00:59:00 That means that it’s less precise,

00:59:02 but it can represent larger ranges of values,

00:59:05 which in the machine learning context

00:59:07 is really important and useful because sometimes you

00:59:09 have very small gradients you want to accumulate

00:59:13 and very, very small numbers that

00:59:17 are important to move things as you’re learning.

00:59:20 But sometimes you have very large magnitude numbers as well.

00:59:23 And bfloat16 is not as precise.

00:59:26 The mantissa is small.

00:59:28 But it turns out the machine learning algorithms actually

00:59:30 want to generalize.

00:59:31 And so there’s theories that this actually

00:59:34 increases the ability for the network

00:59:36 to generalize across data sets.

00:59:37 And regardless of whether it’s good or bad,

00:59:41 it’s much cheaper at the hardware level to implement

00:59:43 because the area and time of a multiplier

00:59:48 is n squared in the number of bits in the mantissa,

00:59:50 but it’s linear with the size of the exponent.
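The bit layout just described makes bfloat16 exactly the upper 16 bits of a float32, and by the quadratic rule above, shrinking the mantissa from float32's 23 bits toward 7 shrinks the multiplier by roughly (23/7) squared, around 10x. A simple truncating conversion sketch (real hardware typically rounds rather than truncates):

```python
import struct

# bfloat16 keeps float32's 8-bit exponent but only 7 mantissa bits,
# which is exactly the upper 16 bits of the IEEE-754 float32 encoding.
# Truncation is a simplification; hardware usually rounds-to-nearest.

def to_bfloat16(x: float) -> float:
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

print(to_bfloat16(1.0))    # exactly representable
print(to_bfloat16(1e30))   # huge values survive: the exponent is intact
print(to_bfloat16(1.001))  # the .001 is below bfloat16's ~2-3 digit precision
```

The second and third lines show the trade: the full float32 exponent range is preserved, while relative precision drops to about 2^-7, or under one percent.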

00:59:53 And you’re connected to both efforts

00:59:55 here both on the hardware and the software side?

00:59:57 Yeah, and so that was a breakthrough

00:59:58 coming from the research side and people

01:00:01 working on optimizing network transport of weights

01:00:06 across the network originally and trying

01:00:08 to find ways to compress that.

01:00:10 But then it got burned into silicon.

01:00:12 And it’s a key part of what makes TPU performance

01:00:14 so amazing and great.

01:00:17 Now, TPUs have many different aspects that are important.

01:00:20 But the co design between the low level compiler bits

01:00:25 and the software bits and the algorithms

01:00:27 is all super important.

01:00:28 And it’s this amazing trifecta that only Google can do.

01:00:32 Yeah, that’s super exciting.

01:00:34 So can you tell me about the MLIR project, previously

01:00:39 the secretive one?

01:00:41 Yeah, so MLIR is a project that we

01:00:43 announced at a compiler conference three weeks ago

01:00:47 or something at the Compilers for Machine Learning

01:00:49 conference.

01:00:50 Basically, again, if you look at TensorFlow as a compiler stack,

01:00:53 it has a number of compiler algorithms within it.

01:00:56 It also has a number of compilers

01:00:57 that get embedded into it.

01:00:59 And they’re made by different vendors.

01:01:00 For example, Google has XLA, which

01:01:02 is a great compiler system.

01:01:04 NVIDIA has TensorRT.

01:01:06 Intel has nGraph.

01:01:08 There’s a number of these different compiler systems.

01:01:10 And they’re very hardware specific.

01:01:13 And they’re trying to solve different parts of the problems.

01:01:16 But they’re all kind of similar in a sense of they

01:01:19 want to integrate with TensorFlow.

01:01:20 Now, TensorFlow has an optimizer.

01:01:22 And it has these different code generation technologies

01:01:25 built in.

01:01:26 The idea of MLIR is to build a common infrastructure

01:01:28 to support all these different subsystems.

01:01:31 And initially, it’s to be able to make it

01:01:33 so that they all plug in together

01:01:34 and they can share a lot more code and can be reusable.

01:01:37 But over time, we hope that the industry

01:01:39 will start collaborating and sharing code.

01:01:42 And instead of reinventing the same things over and over again,

01:01:45 we can actually foster some of that working together

01:01:49 to solve common problems, an energy that

01:01:51 has been useful in the compiler field before.

01:01:54 Beyond that, with MLIR, some people have joked

01:01:57 that it’s kind of LLVM 2.

01:01:59 It learns a lot about what LLVM has been good

01:02:01 and what LLVM has done wrong.

01:02:04 And it’s a chance to fix that.

01:02:06 And also, there are challenges in the LLVM ecosystem as well,

01:02:09 where LLVM is very good at the thing it was designed to do.

01:02:12 But 20 years later, the world has changed.

01:02:15 And people are trying to solve higher level problems.

01:02:17 And we need some new technology.

01:02:20 And what’s the future of open source in this context?

01:02:24 Very soon.

01:02:25 So it is not yet open source.

01:02:27 But it will be hopefully in the next couple months.

01:02:29 So you still believe in the value of open source

01:02:31 in these kinds of contexts?

01:02:31 Oh, yeah.

01:02:31 Absolutely.

01:02:32 And I think that the TensorFlow community at large

01:02:36 fully believes in open source.

01:02:37 So I mean, there is a difference between Apple,

01:02:40 where you were previously, and Google now,

01:02:42 in spirit and culture.

01:02:43 And I would say the open source in TensorFlow

01:02:45 was a seminal moment in the history of software,

01:02:48 because here’s this large company releasing

01:02:51 a very large code base that’s open sourcing.

01:02:56 What are your thoughts on that?

01:02:58 Were you happy or not to see that kind

01:03:00 of degree of open sourcing?

01:03:02 So between the two, I prefer the Google approach,

01:03:05 if that’s what you’re saying.

01:03:07 The Apple approach makes sense, given the historical context

01:03:12 that Apple came from.

01:03:13 But that was 35 years ago.

01:03:15 And I think that Apple is definitely adapting.

01:03:18 And the way I look at it is that there’s

01:03:20 different kinds of concerns in the space.

01:03:23 It is very rational for a business

01:03:24 to care about making money.

01:03:28 That fundamentally is what a business is about.

01:03:31 But I think it’s also incredibly realistic to say,

01:03:34 it’s not your string library that’s

01:03:36 the thing that’s going to make you money.

01:03:38 It’s going to be the amazing UI product differentiating

01:03:41 features and other things like that that you built on top

01:03:43 of your string library.

01:03:45 And so keeping your string library

01:03:48 proprietary and secret and things

01:03:50 like that is maybe not the important thing anymore.

01:03:54 Where before, platforms were different.

01:03:57 And even 15 years ago, things were a little bit different.

01:04:01 But the world is changing.

01:04:02 So Google strikes a very good balance,

01:04:04 I think.

01:04:05 And I think that TensorFlow being open source really

01:04:09 changed the entire machine learning field

01:04:12 and caused a revolution in its own right.

01:04:14 And so I think it’s amazingly forward looking

01:04:17 because I could have imagined, and I wasn’t at Google

01:04:20 at the time, but I could imagine a different context

01:04:23 and different world where a company says,

01:04:25 machine learning is critical to what we’re doing.

01:04:27 We’re not going to give it to other people.

01:04:29 And so that decision is a profoundly brilliant insight

01:04:35 that I think has really led to the world being

01:04:37 better and better for Google as well.

01:04:40 And has all kinds of ripple effects.

01:04:42 I think it is really, I mean, you

01:04:45 can’t overstate how profound that decision

01:04:48 by Google is for software.

01:04:49 It’s awesome.

01:04:50 Well, and again, I can understand the concern

01:04:54 about if we release our machine learning software,

01:04:58 our competitors could go faster.

01:05:00 But on the other hand, I think that open sourcing TensorFlow

01:05:02 has been fantastic for Google.

01:05:03 And I’m sure that decision was very nonobvious at the time,

01:05:09 but I think it’s worked out very well.

01:05:11 So let’s try this real quick.

01:05:13 You were at Tesla for five months

01:05:15 as the VP of autopilot software.

01:05:17 You led the team during the transition

01:05:20 from hardware 1 to hardware 2.

01:05:22 I have a couple of questions.

01:05:23 So one, first of all, to me, that’s

01:05:26 one of the bravest engineering decisions undertaking really

01:05:33 ever in the automotive industry to me, software wise,

01:05:36 starting from scratch.

01:05:37 It’s a really brave engineering decision.

01:05:39 So my one question there is, what was that like?

01:05:42 What was the challenge of that?

01:05:43 Do you mean the career decision of jumping

01:05:45 from a comfortable good job into the unknown, or?

01:05:48 That combined, so at the individual level,

01:05:51 you making that decision.

01:05:54 And then when you show up, it’s a really hard engineering

01:05:57 problem.

01:05:58 So you could just stay, maybe slow down,

01:06:03 say hardware one, or those kinds of decisions.

01:06:06 Just taking it full on, let’s do this from scratch.

01:06:10 What was that like?

01:06:11 Well, so I mean, I don’t think Tesla

01:06:12 has a culture of taking things slow and seeing how it goes.

01:06:16 And one of the things that attracted me about Tesla

01:06:18 is it’s very much a gung ho, let’s change the world,

01:06:20 let’s figure it out kind of a place.

01:06:21 And so I have a huge amount of respect for that.

01:06:25 Tesla has done very smart things with hardware one

01:06:28 in particular.

01:06:29 And the hardware one design was originally

01:06:32 designed to be very simple automation features

01:06:36 in the car for like traffic aware cruise control and things

01:06:39 like that.

01:06:39 And the fact that they were able to effectively feature creep

01:06:42 it into lane holding and a very useful driver assistance

01:06:47 feature is pretty astounding, particularly given

01:06:50 the details of the hardware.

01:06:52 Hardware two built on that in a lot of ways.

01:06:54 And the challenge there was that they

01:06:56 were transitioning from a third party provided vision stack

01:07:00 to an in house built vision stack.

01:07:01 And so for the first step, which I mostly helped with,

01:07:05 was getting onto that new vision stack.

01:07:08 And that was very challenging.

01:07:10 And it was time critical for various reasons,

01:07:14 and it was a big leap.

01:07:14 But it was fortunate that it built

01:07:16 on a lot of the knowledge and expertise and the team

01:07:18 that had built hardware one’s driver assistance features.

01:07:22 So you spoke in a collected and kind way

01:07:25 about your time at Tesla, but it was ultimately not a good fit.

01:07:28 Elon Musk, as we’ve talked about on this podcast

01:07:31 with several guests, of course,

01:07:33 continues to do some of the most bold and innovative engineering

01:07:36 work in the world, at times at the cost

01:07:39 of some of the members of the Tesla team.

01:07:41 What did you learn about working in this chaotic world

01:07:45 with Elon?

01:07:46 Yeah, so I guess I would say that when I was at Tesla,

01:07:50 I experienced and saw the highest degree of turnover

01:07:54 I’d ever seen in a company, which was a bit of a shock.

01:07:58 But one of the things I learned and I came to respect

01:08:00 is that Elon’s able to attract amazing talent because he

01:08:03 has a very clear vision of the future,

01:08:05 and he can get people to buy into it

01:08:07 because they want that future to happen.

01:08:09 And the power of vision is something

01:08:11 that I have a tremendous amount of respect for.

01:08:14 And I think that Elon is fairly singular

01:08:17 in the world in terms of the things

01:08:20 he’s able to get people to believe in.

01:08:22 And there are many people that stand in the street corner

01:08:27 and say, ah, we’re going to go to Mars, right?

01:08:30 But then there are a few people that

01:08:31 can get others to buy into it and believe and build the path

01:08:35 and make it happen.

01:08:36 And so I respect that.

01:08:39 I don’t respect all of his methods,

01:08:41 but I have a huge amount of respect for that.

01:08:45 You’ve mentioned in a few places,

01:08:46 including in this context, working hard.

01:08:50 What does it mean to work hard?

01:08:52 And when you look back at your life,

01:08:53 what were some of the most brutal periods

01:08:57 of having to really put everything

01:09:00 you have into something?

01:09:03 Yeah, good question.

01:09:05 So working hard can be defined a lot of different ways,

01:09:07 so a lot of hours, and so that is true.

01:09:12 The thing to me that’s the hardest

01:09:14 is both being short term focused on delivering and executing

01:09:18 and making a thing happen while also thinking

01:09:21 about the longer term and trying to balance that.

01:09:24 Because if you are myopically focused on solving a task

01:09:28 and getting that done and only think

01:09:31 about that incremental next step,

01:09:32 you will miss the next big hill you should jump over to.

01:09:36 And so I’ve been really fortunate that I’ve

01:09:39 been able to kind of oscillate between the two.

01:09:42 And historically at Apple, for example, that

01:09:45 was made possible because I was able to work with some really

01:09:47 amazing people and build up teams and leadership

01:09:50 structures and allow them to grow in their careers

01:09:55 and take on responsibility, thereby freeing up

01:09:58 me to be a little bit crazy and thinking about the next thing.

01:10:02 And so it’s a lot of that.

01:10:04 But it’s also about with experience,

01:10:06 you make connections that other people don’t necessarily make.

01:10:10 And so I think that’s a big part as well.

01:10:12 But the bedrock is just a lot of hours.

01:10:16 And that’s OK with me.

01:10:19 There’s different theories on work life balance.

01:10:21 And my theory for myself, which I do not project onto the team,

01:10:25 but my theory for myself is that I

01:10:28 want to love what I’m doing and work really hard.

01:10:30 And my purpose, I feel like, and my goal is to change the world

01:10:35 and make it a better place.

01:10:36 And that’s what I’m really motivated to do.

01:10:40 So last question, LLVM logo is a dragon.

01:10:44 You explain that this is because dragons have connotations

01:10:47 of power, speed, intelligence.

01:10:50 It can also be sleek, elegant, and modular,

01:10:53 though you removed the modular part.

01:10:56 What is your favorite dragon related character

01:10:58 from fiction, video games, or movies?

01:11:01 So those are all very kind ways of explaining it.

01:11:03 Do you want to know the real reason it’s a dragon?

01:11:06 Yeah.

01:11:07 Is that better?

01:11:07 So there is a seminal book on compiler design

01:11:11 called The Dragon Book.

01:11:12 And so this is a really old now book on compilers.

01:11:16 And so the dragon logo for LLVM came about because at Apple,

01:11:22 we kept talking about LLVM related technologies

01:11:24 and there’s no logo to put on a slide.

01:11:26 And so we’re like, what do we do?

01:11:28 And somebody’s like, well, what kind of logo

01:11:30 should a compiler technology have?

01:11:32 And I’m like, I don’t know.

01:11:33 I mean, the dragon is the best thing that we’ve got.

01:11:37 And Apple somehow magically came up with the logo.

01:11:41 And it was a great thing.

01:11:42 And the whole community rallied around it.

01:11:44 And then it got better as other graphic designers

01:11:46 got involved.

01:11:47 But that’s originally where it came from.

01:11:49 The story.

01:11:50 Is there dragons from fiction that you

01:11:51 connect with, that Game of Thrones, Lord of the Rings,

01:11:57 that kind of thing?

01:11:58 Lord of the Rings is great.

01:11:59 I also like role playing games and things

01:12:00 like computer role playing games.

01:12:02 And so dragons often show up in there.

01:12:04 But really, it comes back to the book.

01:12:07 Oh, no, we need a thing.

01:12:09 And hilariously, one of the funny things about LLVM

01:12:13 is that my wife, who’s amazing, runs the LLVM Foundation.

01:12:19 And she goes to Grace Hopper and is

01:12:21 trying to get more women involved in the.

01:12:23 She’s also a compiler engineer.

01:12:24 So she’s trying to get other women

01:12:26 to get interested in compilers and things like this.

01:12:28 And so she hands out the stickers.

01:12:30 And people like the LLVM sticker because of Game of Thrones.

01:12:34 And so sometimes culture has this helpful effect

01:12:36 to get the next generation of compiler engineers

01:12:39 engaged with the cause.

01:12:42 OK, awesome.

01:12:43 Chris, thanks so much for talking with us.

01:12:44 It’s been great talking with you.