Transcript
00:00:00 The following is a conversation with Chris Lattner.
00:00:02 Currently, he’s a senior director
00:00:04 at Google working on several projects, including CPU, GPU,
00:00:08 TPU accelerators for TensorFlow, Swift for TensorFlow,
00:00:12 and all kinds of machine learning compiler magic
00:00:14 going on behind the scenes.
00:00:16 He’s one of the top experts in the world
00:00:18 on compiler technologies, which means he deeply
00:00:21 understands the intricacies of how hardware and software come
00:00:25 together to create efficient code.
00:00:27 He created the LLVM compiler infrastructure project
00:00:31 and the Clang compiler.
00:00:33 He led major engineering efforts at Apple,
00:00:36 including the creation of the Swift programming language.
00:00:39 He also briefly spent time at Tesla
00:00:41 as vice president of Autopilot software
00:00:44 during the transition from Autopilot hardware 1
00:00:46 to hardware 2, when Tesla essentially
00:00:49 started from scratch to build an in house software
00:00:52 infrastructure for Autopilot.
00:00:54 I could have easily talked to Chris for many more hours.
00:00:58 Compiling code down across the levels of abstraction
00:01:01 is one of the most fundamental and fascinating aspects
00:01:04 of what computers do, and he is one of the world
00:01:06 experts in this process.
00:01:08 It’s rigorous science, and it’s messy, beautiful art.
00:01:12 This conversation is part of the Artificial Intelligence
00:01:15 podcast.
00:01:16 If you enjoy it, subscribe on YouTube, iTunes,
00:01:19 or simply connect with me on Twitter at Lex Fridman,
00:01:22 spelled F R I D.
00:01:24 And now, here’s my conversation with Chris Lattner.
00:01:29 What was the first program you’ve ever written?
00:01:33 My first program.
00:01:34 Back, and when was it?
00:01:35 I think I started as a kid, and my parents
00:01:39 got a basic programming book.
00:01:41 And so when I started, it was typing out programs
00:01:44 from a book, and seeing how they worked,
00:01:46 and then typing them in wrong, and trying
00:01:49 to figure out why they were not working right,
00:01:51 that kind of stuff.
00:01:52 So BASIC, what was the first language
00:01:54 that you remember yourself maybe falling in love with,
00:01:58 like really connecting with?
00:01:59 I don’t know.
00:02:00 I mean, I feel like I’ve learned a lot along the way,
00:02:02 and each of them have a different special thing
00:02:05 about them.
00:02:06 So I started in BASIC, and then went to GW-BASIC,
00:02:09 which was the thing back in the DOS days,
00:02:11 and then upgraded to QBASIC, and eventually QuickBASIC,
00:02:15 which are all slightly more fancy versions of Microsoft
00:02:18 BASIC.
00:02:19 Made the jump to Pascal, and started
00:02:21 doing machine language programming and assembly
00:02:23 in Pascal, which was really cool.
00:02:25 Turbo Pascal was amazing for its day.
00:02:28 Eventually got into C, C++, and then kind of did
00:02:31 lots of other weird things.
00:02:33 I feel like you took the dark path, which is the,
00:02:37 you could have gone Lisp.
00:02:39 Yeah.
00:02:40 You could have gone higher level sort
00:02:41 of functional philosophical hippie route.
00:02:44 Instead, you went into like the dark arts of the C.
00:02:48 It was straight into the machine.
00:02:49 Straight to the machine.
00:02:50 So I started with BASIC, Pascal, and then Assembly,
00:02:53 and then wrote a lot of Assembly.
00:02:55 And I eventually did Smalltalk and other things like that.
00:03:00 But that was not the starting point.
00:03:01 But so what is this journey to C?
00:03:05 Is that in high school?
00:03:06 Is that in college?
00:03:07 That was in high school, yeah.
00:03:09 And then that was really about trying
00:03:13 to be able to do more powerful things than what Pascal could
00:03:16 do, and also to learn a different world.
00:03:18 So C was really confusing to me with pointers
00:03:20 and the syntax and everything, and it took a while.
00:03:23 But Pascal’s much more principled in various ways.
00:03:28 C is more, I mean, it has its historical roots,
00:03:33 but it’s not as easy to learn.
00:03:35 With pointers, there’s this memory management thing
00:03:39 that you have to become conscious of.
00:03:41 Is that the first time you start to understand
00:03:43 that there’s resources that you’re supposed to manage?
00:03:46 Well, so you have that in Pascal as well.
00:03:48 But in Pascal, like the caret instead of the star,
00:03:51 there’s some small differences like that.
00:03:53 But it’s not about pointer arithmetic.
00:03:55 And in C, you end up thinking about how things get
00:03:58 laid out in memory a lot more.
00:04:00 And so in Pascal, you have allocating and deallocating
00:04:04 and owning the memory, but just the programs are simpler,
00:04:07 and you don’t have to.
00:04:10 Well, for example, Pascal has a string type.
00:04:12 And so you can think about a string
00:04:14 instead of an array of characters
00:04:15 which are consecutive in memory.
00:04:17 So it’s a little bit of a higher level abstraction.
00:04:20 So let’s get into it.
00:04:22 Let’s talk about LLVM, Clang, and compilers.
00:04:25 Sure.
00:04:26 So can you tell me first what LLVM and Clang are?
00:04:32 And how is it that you find yourself
00:04:33 the creator and lead developer, one
00:04:35 of the most powerful compiler optimization systems
00:04:39 in use today?
00:04:40 Sure.
00:04:40 So I guess they’re different things.
00:04:43 So let’s start with what is a compiler?
00:04:47 Is that a good place to start?
00:04:48 What are the phases of a compiler?
00:04:50 Where are the parts?
00:04:50 Yeah, what is it?
00:04:51 So what is even a compiler used for?
00:04:53 So the way I look at this is you have a two sided problem of you
00:04:57 have humans that need to write code.
00:05:00 And then you have machines that need to run
00:05:01 the program that the human wrote.
00:05:03 And for lots of reasons, the humans
00:05:05 don’t want to be writing in binary
00:05:07 and want to think about every piece of hardware.
00:05:09 And so at the same time that you have lots of humans,
00:05:12 you also have lots of kinds of hardware.
00:05:14 And so compilers are the art of allowing
00:05:17 humans to think at a level of abstraction
00:05:19 that they want to think about.
00:05:20 And then get that program, get the thing that they wrote,
00:05:23 to run on a specific piece of hardware.
00:05:26 And the interesting and exciting part of all this
00:05:29 is that there’s now lots of different kinds of hardware,
00:05:32 chips like x86 and PowerPC and ARM and things like that.
00:05:35 But also high performance accelerators
00:05:37 for machine learning and other things like that
00:05:38 are also just different kinds of hardware, GPUs.
00:05:41 These are new kinds of hardware.
00:05:42 And at the same time, on the programming side of it,
00:05:45 you have BASIC, you have C, you have JavaScript,
00:05:48 you have Python, you have Swift.
00:05:50 You have lots of other languages
00:05:52 that are all trying to talk to the human in a different way
00:05:55 to make them more expressive and capable and powerful.
00:05:58 And so compilers are the thing
00:06:01 that goes from one to the other.
00:06:03 End to end, from the very beginning to the very end.
00:06:05 End to end.
00:06:06 And so you go from what the human wrote
00:06:08 and programming languages end up being about
00:06:11 expressing intent, not just for the compiler
00:06:14 and the hardware, but the programming language’s job
00:06:17 is really to capture an expression
00:06:20 of what the programmer wanted
00:06:22 that then can be maintained and adapted
00:06:25 and evolved by other humans,
00:06:27 as well as interpreted by the compiler.
00:06:29 So when you look at this problem,
00:06:31 you have, on the one hand, humans, which are complicated.
00:06:34 And you have hardware, which is complicated.
00:06:36 And so compilers typically work in multiple phases.
00:06:39 And so the software engineering challenge
00:06:42 that you have here is try to get maximum reuse
00:06:45 out of the amount of code that you write,
00:06:47 because these compilers are very complicated.
00:06:49 And so the way it typically works out
00:06:51 is that you have something called a front end or a parser
00:06:54 that is language specific.
00:06:56 And so you’ll have a C parser, and that’s what Clang is,
00:07:00 or C++ or JavaScript or Python or whatever.
00:07:03 That’s the front end.
00:07:05 Then you’ll have a middle part,
00:07:07 which is often the optimizer.
00:07:09 And then you’ll have a late part,
00:07:11 which is hardware specific.
00:07:13 And so compilers end up,
00:07:15 there’s many different layers often,
00:07:16 but these three big groups are very common in compilers.
00:07:20 And what LLVM is trying to do
00:07:22 is trying to standardize that middle and last part.
00:07:25 And so one of the cool things about LLVM
00:07:27 is that there are a lot of different languages
00:07:29 that compile through to it.
00:07:31 And so things like Swift, but also Julia, Rust,
00:07:35 Clang for C, C++, Objective-C,
00:07:39 like these are all very different languages
00:07:40 and they can all use the same optimization infrastructure,
00:07:43 which gets better performance,
00:07:45 and the same code generation infrastructure
00:07:47 for hardware support.
00:07:48 And so LLVM is really that layer that is common,
00:07:52 that all these different specific compilers can use.
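[Editor’s note: the front end / optimizer / back end split described above can be sketched in a few lines of Python. Everything here is invented for illustration — the toy stack machine, the instruction names, and the use of Python’s own `ast` module as a stand-in front end; it is not LLVM’s actual design, just the shape of the phases.]

```python
# A toy three-phase "compiler" for arithmetic expressions, mirroring the
# structure described: a language-specific front end, a shared middle-end
# optimizer, and a target-specific back end.
import ast  # reuse Python's parser as the stand-in front end

def front_end(source):
    """Front end: parse the language-specific surface syntax into a tree."""
    return ast.parse(source, mode="eval").body

def optimize(node):
    """Middle end: one shared optimization, constant folding."""
    if isinstance(node, ast.BinOp):
        node.left = optimize(node.left)
        node.right = optimize(node.right)
        if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            fold = {ast.Add: lambda a, b: a + b,
                    ast.Mult: lambda a, b: a * b}.get(type(node.op))
            if fold:
                return ast.Constant(fold(node.left.value, node.right.value))
    return node

def back_end(node):
    """Back end: emit instructions for an invented stack machine."""
    if isinstance(node, ast.Constant):
        return [("PUSH", node.value)]
    if isinstance(node, ast.Name):
        return [("LOAD", node.id)]
    op = {ast.Add: "ADD", ast.Mult: "MUL"}[type(node.op)]
    return back_end(node.left) + back_end(node.right) + [(op, None)]

def compile_expr(source):
    return back_end(optimize(front_end(source)))
```

For example, `compile_expr("2 * 3 + x")` folds `2 * 3` in the middle end and emits `[("PUSH", 6), ("LOAD", "x"), ("ADD", None)]` — the point being that only `front_end` knows the source language and only `back_end` knows the target, so the middle can be shared.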
00:07:55 And is it a standard, like a specification,
00:07:59 or is it literally an implementation?
00:08:01 It’s an implementation.
00:08:02 And so I think there’s a couple of different ways
00:08:05 of looking at it, right?
00:08:06 Because it depends on which angle you’re looking at it from.
00:08:09 LLVM ends up being a bunch of code, okay?
00:08:12 So it’s a bunch of code that people reuse
00:08:14 and they build compilers with.
00:08:16 We call it a compiler infrastructure
00:08:18 because it’s kind of the underlying platform
00:08:20 that you build a concrete compiler on top of.
00:08:22 But it’s also a community.
00:08:23 And the LLVM community is hundreds of people
00:08:26 that all collaborate.
00:08:27 And one of the most fascinating things about LLVM
00:08:30 over the course of time is that we’ve managed somehow
00:08:34 to successfully get harsh competitors
00:08:37 in the commercial space to collaborate
00:08:39 on shared infrastructure.
00:08:41 And so you have Google and Apple,
00:08:43 you have AMD and Intel,
00:08:45 you have Nvidia and AMD on the graphics side,
00:08:48 you have Cray and everybody else doing these things.
00:08:52 And all these companies are collaborating together
00:08:55 to make that shared infrastructure really, really great.
00:08:58 And they do this not out of the goodness of their heart,
00:09:01 but they do it because it’s in their commercial interest
00:09:03 of having really great infrastructure
00:09:05 that they can build on top of
00:09:06 and facing the reality that it’s so expensive
00:09:09 that no one company, even the big companies,
00:09:11 no one company really wants to implement it all themselves.
00:09:14 Expensive or difficult?
00:09:16 Both.
00:09:16 That’s a great point because it’s also about the skill sets.
00:09:20 And the skill sets are very hard to find.
00:09:26 How big is LLVM?
00:09:27 It always seems like with open source projects,
00:09:30 the kind... LLVM is open source?
00:09:33 Yes, it’s open source.
00:09:34 It’s about, it’s 19 years old now, so it’s fairly old.
00:09:38 It seems like the magic often happens
00:09:40 within a very small circle of people.
00:09:43 Yes.
00:09:43 At least at their early birth and whatever.
00:09:46 Yes, so the LLVM came from a university project,
00:09:49 and so I was at the University of Illinois.
00:09:51 And there it was myself, my advisor,
00:09:53 and then a team of two or three research students
00:09:57 in the research group,
00:09:58 and we built many of the core pieces initially.
00:10:02 I then graduated and went to Apple,
00:10:03 and at Apple brought it to the products,
00:10:06 first in the OpenGL graphics stack,
00:10:09 but eventually to the C compiler realm,
00:10:11 and eventually built Clang,
00:10:12 and eventually built Swift and these things.
00:10:14 Along the way, building a team of people
00:10:16 that are really amazing compiler engineers
00:10:18 that helped build a lot of that.
00:10:20 And so as it was gaining momentum
00:10:21 and as Apple was using it, being open source and public
00:10:24 and encouraging contribution,
00:10:26 many others, for example, at Google,
00:10:28 came in and started contributing.
00:10:30 And in some cases, Google effectively owns Clang now
00:10:33 because it cares so much about C++
00:10:35 and the evolution of that ecosystem,
00:10:37 and so it’s investing a lot in the C++ world
00:10:41 and the tooling and things like that.
00:10:42 And so likewise, NVIDIA cares a lot about CUDA.
00:10:47 And so CUDA uses Clang and uses LLVM
00:10:50 for graphics and GPGPU.
00:10:54 And so when you first started as a master’s project,
00:10:58 I guess, did you think it was gonna go as far as it went?
00:11:02 Were you crazy ambitious about it?
00:11:06 No.
00:11:07 It seems like a really difficult undertaking, a brave one.
00:11:09 Yeah, no, no, no, it was nothing like that.
00:11:11 So my goal when I went to the University of Illinois
00:11:13 was to get in and out with a non thesis masters in a year
00:11:17 and get back to work.
00:11:18 So I was not planning to stay for five years
00:11:22 and build this massive infrastructure.
00:11:24 I got nerd sniped into staying.
00:11:27 And a lot of it was because LLVM was fun
00:11:29 and I was building cool stuff
00:11:30 and learning really interesting things
00:11:33 and facing both software engineering challenges,
00:11:36 but also learning how to work in a team
00:11:38 and things like that.
00:11:40 I had worked at many companies as interns before that,
00:11:43 but it was really a different thing
00:11:45 to have a team of people that are working together
00:11:48 and try and collaborate in version control.
00:11:50 And it was just a little bit different.
00:11:52 Like I said, I just talked to Don Knuth
00:11:54 and he believes that 2% of the world population
00:11:56 have something weird with their brain,
00:11:58 that they’re geeks, they understand computers,
00:12:01 they’re connected with computers.
00:12:02 He put it at exactly 2%.
00:12:04 Okay, so.
00:12:05 He’s a specific guy.
00:12:06 It’s very specific.
00:12:08 Well, he says, I can’t prove it,
00:12:10 but it’s very empirically there.
00:12:13 Is there something that attracts you
00:12:14 to the idea of optimizing code?
00:12:16 And it seems like that’s one of the biggest,
00:12:19 coolest things about LLVM.
00:12:20 Yeah, that’s one of the major things it does.
00:12:22 So I got into that because of a person, actually.
00:12:26 So when I was in my undergraduate,
00:12:28 I had an advisor, or a professor named Steve Vegdahl.
00:12:32 And he, I went to this little tiny private school.
00:12:35 There were like seven or nine people
00:12:38 in my computer science department,
00:12:40 students in my class.
00:12:43 So it was a very tiny, very small school.
00:12:47 It was kind of a wart on the side of the math department
00:12:49 kind of a thing at the time.
00:12:51 I think it’s evolved a lot in the many years since then.
00:12:53 But Steve Vegdahl was a compiler guy.
00:12:58 And he was super passionate.
00:12:59 And his passion rubbed off on me.
00:13:02 And one of the things I like about compilers
00:13:04 is that they’re large, complicated software pieces.
00:13:09 And so one of the culminating classes
00:13:12 that many computer science departments,
00:13:14 at least at the time, did was to say
00:13:16 that you would take algorithms and data structures
00:13:18 and all these core classes.
00:13:19 But then the compilers class was one of the last classes
00:13:21 you take because it pulls everything together.
00:13:24 And then you work on one piece of code
00:13:26 over the entire semester.
00:13:28 And so you keep building on your own work,
00:13:32 which is really interesting.
00:13:33 And it’s also very challenging because in many classes,
00:13:36 if you don’t get a project done, you just forget about it
00:13:38 and move on to the next one and get your B or whatever it is.
00:13:41 But here you have to live with the decisions you make
00:13:43 and continue to reinvest in it.
00:13:45 And I really like that.
00:13:48 And so I did an extra study project
00:13:50 with him the following semester.
00:13:52 And he was just really great.
00:13:53 And he was also a great mentor in a lot of ways.
00:13:56 And so from him and from his advice,
00:13:59 he encouraged me to go to graduate school.
00:14:01 I wasn’t super excited about going to grad school.
00:14:03 I wanted the master’s degree, but I
00:14:05 didn’t want to be an academic.
00:14:08 But like I said, I kind of got tricked into staying
00:14:11 and was having a lot of fun.
00:14:12 And I definitely do not regret it.
00:14:14 What aspects of compilers were the things you connected with?
00:14:17 So LLVM, there’s also the other part
00:14:22 that’s really interesting if you’re interested in languages
00:14:24 is parsing and just analyzing the language,
00:14:29 breaking it down, parsing, and so on.
00:14:31 Was that interesting to you, or were you
00:14:32 more interested in optimization?
00:14:34 For me, it was more so I’m not really a math person.
00:14:37 I could do math.
00:14:38 I understand some bits of it when I get into it.
00:14:41 But math is never the thing that attracted me.
00:14:43 And so a lot of the parser part of the compiler
00:14:46 has a lot of good formal theories
00:14:47 that Don, for example, knows quite well.
00:14:50 I’m still waiting for his book on that.
00:14:54 But I just like building a thing and seeing what it could do
00:14:57 and exploring and getting it to do more things
00:15:00 and then setting new goals and reaching for them.
00:15:04 And in the case of LLVM, when I started working on that,
00:15:09 my research advisor that I was working for was a compiler guy.
00:15:13 And so he and I specifically found each other
00:15:15 because we were both interested in compilers.
00:15:16 And so I started working with him and taking his class.
00:15:19 And a lot of LLVM initially was, it’s
00:15:21 fun implementing all the standard algorithms and all
00:15:24 the things that people had been talking about
00:15:26 and were well known.
00:15:27 And they were in the curricula for advanced studies
00:15:30 and compilers.
00:15:31 And so just being able to build that was really fun.
00:15:34 And I was learning a lot by, instead of reading about it,
00:15:37 just building.
00:15:38 And so I enjoyed that.
00:15:40 So you said compilers are these complicated systems.
00:15:42 Can you even just with language try
00:15:46 to describe how you turn a C++ program into code?
00:15:52 Like, what are the hard parts?
00:15:53 Why is it so hard?
00:15:54 So I’ll give you examples of the hard parts along the way.
00:15:57 So C++ is a very complicated programming language.
00:16:01 It’s something like 1,400 pages in the spec.
00:16:03 So C++ by itself is crazy complicated.
00:16:06 Can we just pause?
00:16:07 What makes the language complicated in terms
00:16:09 of what’s syntactically?
00:16:12 So it’s what they call syntax.
00:16:14 So the actual how the characters are arranged, yes.
00:16:16 It’s also semantics, how it behaves.
00:16:20 It’s also, in the case of C++, there’s
00:16:21 a huge amount of history.
00:16:23 C++ is built on top of C. You play that forward.
00:16:26 And then a bunch of suboptimal, in some cases, decisions
00:16:29 were made, and they compound.
00:16:31 And then more and more and more things
00:16:33 keep getting added to C++, and it will probably never stop.
00:16:36 But the language is very complicated
00:16:38 from that perspective.
00:16:39 And so the interactions between subsystems
00:16:41 is very complicated.
00:16:42 There’s just a lot there.
00:16:43 And when you talk about the front end,
00:16:45 one of the major challenges, which
00:16:47 clang as a project, the C, C++ compiler that I built,
00:16:51 I and many people built, one of the challenges we took on
00:16:54 was we looked at GCC.
00:16:57 GCC, at the time, was a really good industry standardized
00:17:02 compiler that had really consolidated
00:17:05 a lot of the other compilers in the world and was a standard.
00:17:08 But it wasn’t really great for research.
00:17:10 The design was very difficult to work with.
00:17:12 And it was full of global variables and other things
00:17:16 that made it very difficult to reuse in ways
00:17:18 that it wasn’t originally designed for.
00:17:20 And so with clang, one of the things that we wanted to do
00:17:22 is push forward on better user interface,
00:17:25 so make error messages that are just better than GCC’s.
00:17:28 And that’s actually hard, because you
00:17:29 have to do a lot of bookkeeping in an efficient way
00:17:32 to be able to do that.
00:17:33 We want to make compile time better.
00:17:35 And so compile time is about making it efficient,
00:17:37 which is also really hard when you’re keeping
00:17:38 track of extra information.
00:17:40 We wanted to make new tools available,
00:17:43 so refactoring tools and other analysis tools
00:17:46 that GCC never supported, also leveraging the extra information
00:17:50 we kept, but enabling those new classes of tools
00:17:54 that then get built into IDEs.
00:17:55 And so that’s been one of the areas that clang has really
00:17:59 helped push the world forward in,
00:18:01 is in the tooling for C and C++ and things like that.
00:18:05 But C++ and the front end piece is complicated.
00:18:07 And you have to build syntax trees.
00:18:09 And you have to check every rule in the spec.
00:18:11 And you have to turn that back into an error message
00:18:14 to the human that the human can understand
00:18:16 when they do something wrong.
00:18:17 But then you start doing what’s called lowering,
00:18:20 so going from C++ and the way that it represents
00:18:23 code down to the machine.
00:18:24 And when you do that, there’s many different phases
00:18:27 you go through.
00:18:29 Often, there are, I think LLVM has something like 150
00:18:33 different what are called passes in the compiler
00:18:36 that the code passes through.
00:18:38 And these get organized in very complicated ways,
00:18:41 which affect the generated code and the performance
00:18:44 and compile time and many other things.
00:18:45 What are they passing through?
00:18:47 So after you do the clang parsing, what’s the graph?
00:18:53 What does it look like?
00:18:54 What’s the data structure here?
00:18:56 Yeah, so in the parser, it’s usually a tree.
00:18:59 And it’s called an abstract syntax tree.
00:19:01 And so the idea is you have a node for the plus
00:19:04 that the human wrote in their code.
00:19:06 Or the function call, you’ll have a node for call
00:19:09 with the function that they call and the arguments they pass,
00:19:11 things like that.
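[Editor’s note: a minimal sketch of the syntax-tree nodes being described — one node per construct the human wrote. The node classes and the example expression are invented for illustration; real compiler ASTs carry far more (source locations, types, and so on).]

```python
# Tiny AST node classes: one node per source construct,
# nested to mirror the structure of what the programmer wrote.
from dataclasses import dataclass
from typing import List

@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class Add:            # the node for the "+" the human wrote
    left: object
    right: object

@dataclass
class Call:           # a call node: the callee plus the argument subtrees
    callee: str
    args: List[object]

# The source expression  f(a + b, 2)  parses into a tree of these nodes:
tree = Call("f", [Add(Var("a"), Var("b")), Num(2)])
```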
00:19:14 This then gets lowered into what’s
00:19:16 called an intermediate representation.
00:19:18 And intermediate representations are like LLVM has one.
00:19:22 And there, it’s what’s called a control flow graph.
00:19:26 And so you represent each operation in the program
00:19:31 as a very simple, like this is going to add two numbers.
00:19:34 This is going to multiply two things.
00:19:35 Maybe we’ll do a call.
00:19:37 But then they get put in what are called blocks.
00:19:40 And so you get blocks of these straight line operations,
00:19:43 where instead of being nested like in a tree,
00:19:45 it’s straight line operations.
00:19:46 And so there’s a sequence and an ordering to these operations.
00:19:49 So within the block or outside the block?
00:19:51 That’s within the block.
00:19:52 And so it’s a straight line sequence of operations
00:19:54 within the block.
00:19:55 And then you have branches, like conditional branches,
00:19:58 between blocks.
00:20:00 And so when you write a loop, for example, in a syntax tree,
00:20:04 you would have a for node, like for a for statement
00:20:08 in a C like language, you’d have a for node.
00:20:10 And you have a pointer to the expression
00:20:12 for the initializer, a pointer to the expression
00:20:14 for the increment, a pointer to the expression
00:20:16 for the comparison, a pointer to the body.
00:20:18 And these are all nested underneath it.
00:20:21 In a control flow graph, you get a block
00:20:22 for the code that runs before the loop, so the initializer
00:20:26 code.
00:20:27 And you have a block for the body of the loop.
00:20:30 And so the body of the loop code goes in there,
00:20:33 but also the increment and other things like that.
00:20:35 And then you have a branch that goes back to the top
00:20:37 and a comparison and a branch that goes out.
00:20:39 And so it’s more of an assembly level kind of representation.
00:20:43 But the nice thing about this level of representation
00:20:46 is it’s much more language independent.
00:20:48 And so there’s lots of different kinds of languages
00:20:51 with different kinds of, you know,
00:20:54 JavaScript has a lot of different ideas of what
00:20:56 is false, for example.
00:20:58 And all that can stay in the front end.
00:21:00 But then that middle part can be shared across all those.
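[Editor’s note: here is the for-loop lowering just described, written out by hand as a toy control-flow graph — an entry block for the initializer, a header block for the comparison, a body block holding the straight-line body plus the increment, a back edge to the top, and an exit. The dict encoding and the tiny interpreter are invented for illustration, not LLVM IR.]

```python
# Hand-lowered CFG for:  for (i = 0; i < n; i = i + 1) sum = sum + i;
cfg = {
    "entry": {                       # runs once, before the loop
        "ops": [("assign", "i", 0)],
        "branch": ("jump", "header"),
    },
    "header": {                      # the comparison
        "ops": [("lt", "t0", "i", "n")],
        "branch": ("cond", "t0", "body", "exit"),
    },
    "body": {                        # straight-line body + increment
        "ops": [("add", "sum", "sum", "i"),
                ("add", "i", "i", 1)],
        "branch": ("jump", "header"),   # back edge to the top
    },
    "exit": {"ops": [], "branch": ("ret", "sum")},
}

def val(env, x):
    return env[x] if isinstance(x, str) else x

def interpret(cfg, env):
    """Execute the CFG directly, showing it means what the loop meant."""
    block = "entry"
    while True:
        b = cfg[block]
        for op in b["ops"]:
            if op[0] == "assign":
                env[op[1]] = op[2]
            elif op[0] == "add":
                env[op[1]] = val(env, op[2]) + val(env, op[3])
            elif op[0] == "lt":
                env[op[1]] = env[op[2]] < env[op[3]]
        kind, *rest = b["branch"]
        if kind == "jump":
            block = rest[0]
        elif kind == "cond":
            block = rest[1] if env[rest[0]] else rest[2]
        else:                         # ("ret", var)
            return env[rest[0]]
```

Running it with `{"sum": 0, "n": 5}` sums 0 through 4 and returns 10 — and notice that nothing in the CFG is C-specific anymore, which is the language-independence point above.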
00:21:04 How close is that intermediate representation
00:21:07 to neural networks, for example?
00:21:10 Are they, because everything you describe
00:21:13 is a kind of echoes of a neural network graph.
00:21:16 Are they neighbors or what?
00:21:18 They’re quite different in details,
00:21:20 but they’re very similar in idea.
00:21:22 So one of the things that neural networks do
00:21:24 is they learn representations for data
00:21:26 at different levels of abstraction.
00:21:29 And then they transform those through layers, right?
00:21:33 So the compiler does very similar things.
00:21:35 But one of the things the compiler does
00:21:37 is it has relatively few different representations.
00:21:40 Where a neural network often, as you get deeper, for example,
00:21:43 you get many different representations
00:21:44 in each layer or set of ops.
00:21:47 It’s transforming between these different representations.
00:21:50 In a compiler, often you get one representation
00:21:53 and they do many transformations to it.
00:21:55 And these transformations are often applied iteratively.
00:21:59 And for programmers, there’s familiar types of things.
00:22:02 For example, trying to find expressions inside of a loop
00:22:06 and pulling them out of a loop so they execute fewer times.
00:22:08 Or find redundant computation.
00:22:10 Or find constant folding or other simplifications,
00:22:15 turning two times x into x shift left by one.
00:22:19 And things like this are all the examples
00:22:21 of the things that happen.
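[Editor’s note: two of the rewrites just mentioned — constant folding, and the strength reduction of `2 * x` into `x << 1` — sketched over an invented three-address IR. Only `add` and `mul` are handled; the instruction tuples are made up for illustration.]

```python
def fold_and_reduce(ops):
    """ops: list of (kind, dest, a, b) three-address instructions
    (only "add" and "mul" are handled in this sketch)."""
    out = []
    consts = {}  # dest -> value known at compile time
    for kind, dst, a, b in ops:
        # propagate already-folded constants into the operands
        a = consts.get(a, a)
        b = consts.get(b, b)
        if kind in ("add", "mul") and isinstance(a, int) and isinstance(b, int):
            # constant folding: compute now, emit nothing
            consts[dst] = a + b if kind == "add" else a * b
            continue
        if kind == "mul" and 2 in (a, b):
            # strength reduction: 2 * x  ==>  x << 1
            other = a if b == 2 else b
            out.append(("shl", dst, other, 1))
            continue
        out.append((kind, dst, a, b))
    return out
```

So `t0 = 1 + 3; t1 = x * 2; t2 = y * t0` becomes `t1 = x << 1; t2 = y * 4`: the first instruction folds away entirely, and its constant propagates into the third.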
00:22:23 But compilers end up getting a lot of theorem proving
00:22:26 and other kinds of algorithms that
00:22:27 try to find higher level properties of the program that
00:22:30 then can be used by the optimizer.
00:22:32 Cool.
00:22:32 So what’s the biggest bang for the buck with optimization?
00:22:38 Today?
00:22:38 Yeah.
00:22:39 Well, no, not even today.
00:22:40 At the very beginning, the 80s, I don’t know.
00:22:42 Yeah, so for the 80s, a lot of it
00:22:44 was things like register allocation.
00:22:46 So the idea of in a modern microprocessor,
00:22:50 what you’ll end up having is you’ll
00:22:51 end up having memory, which is relatively slow.
00:22:54 And then you have registers that are relatively fast.
00:22:57 But registers, you don’t have very many of them.
00:23:00 And so when you’re writing a bunch of code,
00:23:02 you’re just saying, compute this,
00:23:04 put in a temporary variable, compute this, compute this,
00:23:05 compute this, put in a temporary variable.
00:23:07 I have a loop.
00:23:08 I have some other stuff going on.
00:23:09 Well, now you’re running on an x86,
00:23:11 like a desktop PC or something.
00:23:13 Well, it only has, in some modes,
00:23:16 eight registers.
00:23:18 And so now the compiler has to choose what values get
00:23:21 put in what registers at what points in the program.
00:23:24 And this is actually a really big deal.
00:23:26 So if you think about, you have a loop, an inner loop
00:23:29 that executes millions of times maybe.
00:23:31 If you’re doing loads and stores inside that loop,
00:23:33 then it’s going to be really slow.
00:23:35 But if you can somehow fit all the values inside that loop
00:23:37 in registers, now it’s really fast.
00:23:40 And so getting that right requires a lot of work,
00:23:43 because there’s many different ways to do that.
00:23:44 And often what the compiler ends up doing
00:23:46 is it ends up thinking about things
00:23:48 in a different representation than what the human wrote.
00:23:52 You wrote into x.
00:23:53 Well, the compiler thinks about that as four different values,
00:23:56 each which have different lifetimes across the function
00:23:59 that it’s in.
00:24:00 And each of those could be put in a register or memory
00:24:03 or different memory or maybe in some parts of the code
00:24:06 recomputed instead of stored and reloaded.
00:24:08 And there are many of these different kinds of techniques
00:24:10 that can be used.
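[Editor’s note: the problem being described — values live over intervals of the program, and values whose intervals overlap cannot share a register — can be sketched with a simplified linear-scan allocator. The interval encoding, register names, and spill handling here are all invented for illustration; real allocators are far more elaborate.]

```python
def linear_scan(intervals, num_regs):
    """intervals: {name: (start, end)} live ranges.
    Returns {name: 'r0'/'r1'/... or 'spill'} (spill = keep it in memory)."""
    assignment = {}
    active = []                                  # (end, name, reg) still live
    free = [f"r{i}" for i in range(num_regs)]
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # expire intervals that ended before this one starts,
        # returning their registers to the free pool
        for item in list(active):
            if item[0] < start:
                active.remove(item)
                free.append(item[2])
        if free:
            reg = free.pop(0)
            active.append((end, name, reg))
            assignment[name] = reg
        else:
            assignment[name] = "spill"           # no register left for this value
    return assignment
```

With one register and live ranges a:(0,3), b:(1,2), c:(4,5), `a` and `b` overlap so `b` spills, but `c` can reuse `a`’s register once `a`’s range ends — exactly the kind of lifetime reasoning described above.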
00:24:11 So it’s adding almost like a time dimension.
00:24:15 It’s trying to optimize across time.
00:24:18 So it’s considering when you’re programming,
00:24:20 you’re not thinking in that way.
00:24:21 Yeah, absolutely.
00:24:23 And so the RISC era made things.
00:24:27 So RISC chips, R I S C. The RISC chips,
00:24:32 as opposed to CISC chips.
00:24:33 The RISC chips made things more complicated for the compiler,
00:24:36 because what they ended up doing is ending up
00:24:40 adding pipelines to the processor, where
00:24:42 the processor can do more than one thing at a time.
00:24:45 But this means that the order of operations matters a lot.
00:24:47 So one of the classical compiler techniques that you use
00:24:50 is called scheduling.
00:24:51 And so moving the instructions around
00:24:54 so that the processor can keep its pipelines full instead
00:24:57 of stalling and getting blocked.
00:24:59 And so there’s a lot of things like that that
00:25:01 are kind of bread and butter compiler techniques
00:25:03 that have been studied a lot over the course of decades now.
00:25:06 But the engineering side of making them real
00:25:08 is also still quite hard.
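[Editor’s note: a toy list scheduler in the spirit of what is described — reorder independent instructions so a pipelined processor stalls less. The fixed latency, the instruction names, and the one-instruction-per-cycle model are all invented for illustration.]

```python
def schedule(instrs, latency):
    """instrs: list of (name, deps) in program order. Greedily issue one
    instruction per cycle, taking the first whose dependencies have
    already produced their results; emit 'stall' when nothing is ready."""
    done_at = {}          # name -> cycle its result becomes available
    order = []
    cycle = 0
    remaining = list(instrs)
    while remaining:
        ready = [(n, d) for (n, d) in remaining
                 if all(done_at.get(x, float("inf")) <= cycle for x in d)]
        if ready:
            name, deps = ready[0]
            remaining.remove((name, deps))
            order.append(name)
            done_at[name] = cycle + latency
        else:
            order.append("stall")   # pipeline bubble: nothing was ready
        cycle += 1
    return order
```

With a 2-cycle latency, `load_a; add(load_a); load_b` schedules as `load_a, load_b, add` — the independent `load_b` fills the slot where `add` would otherwise have stalled waiting for `load_a`.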
00:25:10 And you talk about machine learning.
00:25:12 This is a huge opportunity for machine learning,
00:25:14 because many of these algorithms are full of these
00:25:17 hokey, hand rolled heuristics, which
00:25:19 work well on specific benchmarks that don’t generalize,
00:25:21 and full of magic numbers.
00:25:23 And I hear there’s some techniques that
00:25:26 are good at handling that.
00:25:28 So what would be the, if you were to apply machine learning
00:25:32 to this, what’s the thing you’re trying to optimize?
00:25:34 Is it ultimately the running time?
00:25:39 You can pick your metric, and there’s running time,
00:25:41 there’s memory use, there’s lots of different things
00:25:43 that you can optimize for.
00:25:44 Code size is another one that some people care about
00:25:47 in the embedded space.
00:25:48 Is this like the thinking into the future,
00:25:51 or has somebody actually been crazy enough
00:25:54 to try to have machine learning based parameter
00:25:58 tuning for the optimization of compilers?
00:26:01 So this is something that is, I would say, research right now.
00:26:04 There are a lot of research systems
00:26:06 that have been applying search in various forms.
00:26:09 And using reinforcement learning is one form,
00:26:11 but also brute force search has been tried for quite a while.
00:26:14 And usually, these are in small problem spaces.
00:26:18 So, find the optimal way to code generate a matrix
00:26:21 multiply for a GPU, something like that,
00:26:24 where there's a lot of design space of,
00:26:28 do you unroll loops a lot?
00:26:29 Do you execute multiple things in parallel?
00:26:32 And there’s many different confounding factors here
00:26:35 because graphics cards have different numbers of threads
00:26:38 and registers and execution ports and memory bandwidth
00:26:41 and many different constraints that interact
00:26:42 in nonlinear ways.
00:26:44 And so search is very powerful for that.
00:26:46 And it gets used in certain ways,
00:26:49 but it’s not very structured.
00:26:51 This is something that we need,
00:26:52 we as an industry need to fix.
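The search idea here can be sketched in a few lines. The cost model below is entirely made up, a stand-in for actually timing each candidate kernel on real hardware, and the knob names are hypothetical; it only illustrates brute-force search over a code-generation design space.

```python
# Brute-force "autotuning" over made-up matmul code-generation knobs.
# A real system would time each candidate kernel on the actual GPU;
# this invented cost model just stands in for that measurement.
import itertools

def cost(tile, unroll, vector):
    """Pretend cost: penalize register spills, underused parallelism,
    and code size. Every constant here is fictional."""
    registers = tile * unroll * vector
    spill_penalty = max(0, registers - 256) * 10    # pretend 256 registers
    parallelism = tile * vector
    underuse_penalty = max(0, 128 - parallelism)    # pretend 128 lanes wanted
    code_size_penalty = unroll                      # unrolling grows the code
    return spill_penalty + underuse_penalty + code_size_penalty

# The design space: tile sizes x unroll factors x vector widths.
candidates = itertools.product([8, 16, 32, 64], [1, 2, 4, 8], [1, 2, 4])
best = min(candidates, key=lambda p: cost(*p))
print(best)   # a point that fills the lanes without spilling
```

The interesting part is the shape of the problem, not this search: the penalties interact nonlinearly, which is why exhaustive or learned search beats a hand-rolled heuristic with magic numbers.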
00:26:54 So you said the '80s, but have there been big jumps
00:26:59 in improvement and optimization?
00:27:01 Yeah.
00:27:02 Yeah, since then, what’s the coolest thing?
00:27:05 It’s largely been driven by hardware.
00:27:07 So, well, it’s hardware and software.
00:27:09 So in the mid nineties, Java totally changed the world,
00:27:13 right?
00:27:14 And I’m still amazed by how much change was introduced
00:27:17 by Java, and in a good way.
00:27:19 So, reflecting back, Java all at once
00:27:22 introduced things like JIT compilation.
00:27:25 None of these were novel, but it pulled it together
00:27:27 and made it mainstream and made people invest in it.
00:27:30 JIT compilation, garbage collection, portable code,
00:27:33 safe code, like memory safe code,
00:27:36 like a very dynamic dispatch execution model.
00:27:41 Like many of these things,
00:27:42 which had been done in research systems
00:27:44 and had been done in small ways in various places,
00:27:46 really came to the forefront,
00:27:47 really changed how things worked
00:27:49 and therefore changed the way people thought
00:27:51 about the problem.
00:27:53 JavaScript was another major world change
00:27:56 based on the way it works.
00:27:59 But also on the hardware side of things,
00:28:01 multicore and vector instructions really changed
00:28:06 the problem space.
00:28:09 They don’t remove any of the problems
00:28:10 that compilers faced in the past,
00:28:12 but they add new kinds of problems
00:28:14 of how do you find enough work
00:28:16 to keep a four wide vector busy, right?
00:28:20 Or if you’re doing a matrix multiplication,
00:28:22 how do you do different columns out of that matrix
00:28:25 at the same time?
00:28:26 And how do you maximally utilize the arithmetic compute
00:28:30 that one core has?
00:28:31 And then how do you take it to multiple cores?
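The "different columns at the same time" point can be made concrete. A sketch with toy 2x2 matrices in pure Python, where threads stand in for cores (and, because of Python's GIL, only illustrate the independence rather than giving a real speedup; a compiler would use vector units and actual threads):

```python
# Each column of a matmul result depends only on A and one column
# of B, so columns can be computed independently -- one per core.
# Toy sizes; threads here illustrate independence, not performance.
from concurrent.futures import ThreadPoolExecutor

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

def column(j):
    # Result column j never touches the other columns' work.
    return [sum(A[i][k] * B[k][j] for k in range(2)) for i in range(2)]

with ThreadPoolExecutor() as pool:
    cols = list(pool.map(column, range(2)))

# Transpose the computed columns back into result rows.
C = [[cols[j][i] for j in range(2)] for i in range(2)]
print(C)   # [[19, 22], [43, 50]]
```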
00:28:33 How did the whole virtual machine thing change
00:28:35 the compilation pipeline?
00:28:38 Yeah, so what the Java virtual machine does
00:28:40 is it splits, just like I was talking about before,
00:28:44 where you have a front end that parses the code,
00:28:46 and then you have an intermediate representation
00:28:48 that gets transformed.
00:28:49 What Java did was they said,
00:28:51 we will parse the code and then compile to
00:28:53 what’s known as Java byte code.
00:28:55 And that byte code is now a portable code representation
00:28:58 that is industry standard and locked down and can’t change.
00:29:02 And then the back part of the compiler
00:29:05 that does optimization and code generation
00:29:07 can now be built by different vendors.
00:29:09 Okay.
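CPython happens to make the same split observable from its standard library, which gives a convenient way to see the idea. This is an analogy using Python's VM, not the JVM itself.

```python
# The split Java made mainstream -- a front end compiles source to a
# portable bytecode, and a separate back end executes it later --
# is visible in CPython too. An analogy, not the JVM itself.
import dis

source = "x = 4\nprint(x + 1)"

# "Front end": parse and compile the source to bytecode once.
code = compile(source, "<example>", "exec")

# The bytecode is a self-contained artifact; inspect the
# instruction stream much as javap shows Java bytecode.
dis.dis(code)

# "Back end": hand the artifact to the virtual machine to run.
exec(code)   # prints 5
```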
00:29:10 And Java byte code can be shipped around across the wire.
00:29:13 It’s memory safe and relatively trusted.
00:29:16 And because of that, it can run in the browser.
00:29:18 And that’s why it runs in the browser, right?
00:29:20 And so that way, again,
00:29:22 back in the day, you would write a Java applet
00:29:25 and as a web developer, you’d build this mini app
00:29:29 that would run on a webpage.
00:29:30 Well, a user of that is running a web browser
00:29:33 on their computer.
00:29:34 You download that Java byte code, which can be trusted,
00:29:37 and then you do all the compiler stuff on your machine
00:29:41 so that you know you can trust it.
00:29:42 Now, is that a good idea or a bad idea?
00:29:44 It’s a great idea.
00:29:44 I mean, it’s a great idea for certain problems.
00:29:46 And I’m very much a believer that technology is itself
00:29:49 neither good nor bad.
00:29:50 It’s how you apply it.
00:29:52 You know, this would be a very, very bad thing
00:29:54 for very low levels of the software stack.
00:29:56 But in terms of solving some of these software portability
00:30:00 and transparency, or portability problems,
00:30:02 I think it’s been really good.
00:30:04 Now, Java ultimately didn’t win out on the desktop.
00:30:06 And like, there are good reasons for that.
00:30:09 But it’s been very successful on servers and in many places,
00:30:13 it’s been a very successful thing over decades.
00:30:16 So what have been LLVM’s and C lang’s improvements
00:30:21 in optimization throughout its history,
00:30:28 what are some moments where you sat back
00:30:31 and were really proud of what’s been accomplished?
00:30:33 Yeah, I think that the interesting thing about LLVM
00:30:36 is not the innovations in compiler research.
00:30:40 It has very good implementations
00:30:41 of various important algorithms, no doubt.
00:30:44 And a lot of really smart people have worked on it.
00:30:48 But I think that the thing that’s most profound about LLVM
00:30:50 is that through standardization, it made things possible
00:30:53 that otherwise wouldn’t have happened, okay?
00:30:56 And so interesting things that have happened with LLVM,
00:30:59 for example, Sony has picked up LLVM
00:31:01 and used it to do all the graphics compilation
00:31:03 in their movie production pipeline.
00:31:06 And so now they’re able to have better special effects
00:31:07 because of LLVM.
00:31:09 That’s kind of cool.
00:31:11 That’s not what it was designed for, right?
00:31:13 But that’s the sign of good infrastructure
00:31:15 when it can be used in ways it was never designed for
00:31:18 because it has good layering and software engineering
00:31:20 and it’s composable and things like that.
00:31:23 Which is where, as you said, it differs from GCC.
00:31:26 Yes, GCC is also great in various ways,
00:31:28 but it’s not as good as infrastructure technology.
00:31:31 It’s really a C compiler, or it’s a Fortran compiler.
00:31:36 It’s not infrastructure in the same way.
00:31:38 Now you can tell I don’t know what I’m talking about
00:31:41 because I keep saying C lang.
00:31:44 You can always tell whether a person has a clue
00:31:48 by the way they pronounce something.
00:31:49 I don’t think, have I ever used C lang?
00:31:52 Entirely possible, have you?
00:31:54 Well, so you’ve used code, it’s generated probably.
00:31:58 So C lang and LLVM are used to compile
00:32:01 all the apps on the iPhone effectively and the OSs.
00:32:05 It compiles Google’s production server applications.
00:32:10 It’s used to build GameCube games and PlayStation 4
00:32:14 and things like that.
00:32:16 So as a user, I have, but just everything I’ve done
00:32:20 that I experienced with Linux has been,
00:32:22 I believe, always GCC.
00:32:23 Yeah, I think Linux still defaults to GCC.
00:32:26 And is there a reason for that?
00:32:27 Or is it because, I mean, is there a reason for that?
00:32:29 It’s a combination of technical and social reasons.
00:32:32 Many Linux developers do use C lang,
00:32:35 but the distributions, for lots of reasons,
00:32:40 use GCC historically, and they’ve not switched, yeah.
00:32:44 Because, just anecdotally online,
00:32:46 it seems that LLVM has either reached the level of GCC
00:32:50 or superseded it on different features or whatever.
00:32:53 The way I would say it is that they’re so close,
00:32:55 it doesn’t matter.
00:32:56 Yeah, exactly.
00:32:56 Like, they’re slightly better in some ways,
00:32:58 slightly worse in others,
00:32:59 but it doesn’t actually really matter anymore at that level.
00:33:03 So in terms of optimization breakthroughs,
00:33:06 it’s just been solid incremental work.
00:33:09 Yeah, yeah, which describes a lot of compilers.
00:33:12 The hard thing about compilers, in my experience,
00:33:15 is the engineering, the software engineering,
00:33:17 making it so that you can have hundreds of people
00:33:20 collaborating on really detailed, low level work
00:33:23 and scaling that.
00:33:25 And that’s really hard.
00:33:27 And that’s one of the things I think LLVM has done well.
00:33:32 And that kind of goes back to the original design goals
00:33:34 with it to be modular and things like that.
00:33:37 And incidentally, I don’t want to take all the credit
00:33:38 for this, right?
00:33:39 I mean, some of the best parts about LLVM
00:33:41 is that it was designed to be modular.
00:33:43 And when I started, I would write, for example,
00:33:45 a register allocator, and then somebody much smarter than me
00:33:48 would come in and pull it out and replace it
00:33:50 with something else that they would come up with.
00:33:52 And because it’s modular, they were able to do that.
00:33:55 And that’s one of the challenges with GCC, for example,
00:33:58 is replacing subsystems is incredibly difficult.
00:34:01 It can be done, but it wasn’t designed for that.
00:34:04 And that’s one of the reasons that LLVM’s been
00:34:06 very successful in the research world as well.
00:34:08 But in a community sense, Guido van Rossum, right,
00:34:12 from Python, just retired from, what is it?
00:34:18 Benevolent Dictator for Life, right?
00:34:20 So in managing this community of brilliant compiler folks,
00:34:24 is there, did it, for a time at least,
00:34:28 fall on you to approve things?
00:34:31 Oh yeah, so I mean, I still have something like
00:34:34 an order of magnitude more patches in LLVM
00:34:37 than anybody else, and many of those I wrote myself.
00:34:42 But you still write, I mean, you’re still close to the,
00:34:47 to the, I don’t know what the expression is,
00:34:49 to the metal, you still write code.
00:34:51 Yeah, I still write code.
00:34:52 Not as much as I was able to in grad school,
00:34:54 but that’s an important part of my identity.
00:34:56 But the way that LLVM has worked over time
00:34:58 is that when I was a grad student, I could do all the work
00:35:01 and steer everything and review every patch
00:35:04 and make sure everything was done
00:35:05 exactly the way my opinionated sense
00:35:09 felt like it should be done, and that was fine.
00:35:11 But as things scale, you can’t do that, right?
00:35:14 And so what ends up happening is LLVM
00:35:17 has a hierarchical system of what’s called code owners.
00:35:20 These code owners are given the responsibility
00:35:22 not to do all the work,
00:35:24 not necessarily to review all the patches,
00:35:26 but to make sure that the patches do get reviewed
00:35:28 and make sure that the right thing’s happening
00:35:30 architecturally in their area.
00:35:32 And so what you’ll see is that, for example,
00:35:36 hardware manufacturers end up owning
00:35:38 the hardware specific parts of their hardware.
00:35:43 That’s very common.
00:35:45 Leaders in the community that have done really good work
00:35:47 naturally become the de facto owner of something.
00:35:50 And then usually somebody else is like,
00:35:53 how about we make them the official code owner?
00:35:55 And then we’ll have somebody to make sure
00:35:58 that all the patches get reviewed in a timely manner.
00:36:00 And then everybody’s like, yes, that’s obvious.
00:36:02 And then it happens, right?
00:36:03 And usually this is a very organic thing, which is great.
00:36:06 And so I’m nominally the top of that stack still,
00:36:08 but I don’t spend a lot of time reviewing patches.
00:36:11 What I do is I help negotiate a lot of the technical
00:36:16 disagreements that end up happening
00:36:18 and making sure that the community as a whole
00:36:19 makes progress and is moving in the right direction
00:36:22 and doing that.
00:36:23 So we also started a nonprofit six years ago,
00:36:28 seven years ago, time’s gone away.
00:36:30 And the LLVM Foundation nonprofit helps oversee
00:36:34 all the business sides of things and make sure
00:36:36 that the events that the LLVM community has
00:36:38 are funded and set up and run correctly
00:36:41 and stuff like that.
00:36:42 But the foundation very much stays out
00:36:45 of the technical side of where the project is going.
00:36:49 Right, so it sounds like a lot of it is just organic.
00:36:53 Yeah, well, LLVM is almost 20 years old,
00:36:55 which is hard to believe.
00:36:56 Somebody pointed out to me recently that LLVM
00:36:59 is now older than GCC was when LLVM started, right?
00:37:04 So time has a way of getting away from you.
00:37:06 But the good thing about that is it has a really robust,
00:37:10 really amazing community of people that are
00:37:13 in their professional lives, spread across lots
00:37:15 of different companies, but it’s a community
00:37:17 of people that are interested in similar kinds of problems
00:37:21 and have been working together effectively for years
00:37:23 and have a lot of trust and respect for each other.
00:37:26 And even if they don’t always agree, we’re able
00:37:29 to find a path forward.
00:37:31 So then in a slightly different flavor of effort,
00:37:34 you started at Apple in 2005 with the task
00:37:38 of making, I guess, LLVM production ready.
00:37:41 And then eventually 2013 through 2017,
00:37:44 leading the entire developer tools department.
00:37:48 We’re talking about LLVM, Xcode, Objective C to Swift.
00:37:53 So in a quick overview of your time there,
00:37:58 what were the challenges?
00:37:59 First of all, leading such a huge group of developers,
00:38:03 what was the big motivator, dream, mission
00:38:06 behind creating Swift, the early birth of it
00:38:11 from Objective C and so on, and Xcode,
00:38:13 what are some challenges?
00:38:14 So these are different questions.
00:38:15 Yeah, I know, but I wanna talk about the other stuff too.
00:38:19 I’ll stay on the technical side,
00:38:21 then we can talk about the big team pieces, if that’s okay.
00:38:24 So, to really oversimplify many years of hard work:
00:38:29 LLVM started, I joined Apple, it became a thing,
00:38:32 became successful and became deployed.
00:38:34 But then there’s a question about
00:38:35 how do we actually parse the source code?
00:38:38 So LLVM is that back part,
00:38:40 the optimizer and the code generator.
00:38:42 And LLVM was really good for Apple
00:38:44 as it went through a couple of harder transitions.
00:38:46 I joined right at the time of the Intel transition,
00:38:47 for example, and 64 bit transitions,
00:38:51 and then the transition to ARM with the iPhone.
00:38:53 And so LLVM was very useful
00:38:54 for some of these kinds of things.
00:38:57 But at the same time, there’s a lot of questions
00:38:58 around developer experience.
00:39:00 And so if you’re a programmer pounding out
00:39:01 at the time Objective C code,
00:39:04 the error message you get, the compile time,
00:39:06 the turnaround cycle, the tooling and the IDE,
00:39:09 were not great, were not as good as they could be.
00:39:13 And so, as I occasionally do, I’m like,
00:39:18 well, okay, how hard is it to write a C compiler?
00:39:20 And so I’m not gonna commit to anybody,
00:39:22 I’m not gonna tell anybody, I’m just gonna just do it
00:39:25 nights and weekends and start working on it.
00:39:27 And then I built it up.
00:39:29 In C, there’s this thing called the preprocessor,
00:39:31 which people don’t like,
00:39:33 but it’s actually really hard and complicated
00:39:35 and includes a bunch of really weird things
00:39:37 like trigraphs and other stuff like that
00:39:39 that are really nasty,
00:39:40 and it’s the crux of a bunch of the performance issues
00:39:44 in the compiler.
00:39:45 Started working on the parser
00:39:46 and kind of got to the point where I’m like,
00:39:47 ah, you know what, we could actually do this.
00:39:49 Everybody’s saying that this is impossible to do,
00:39:51 but it’s actually just hard, it’s not impossible.
00:39:53 And eventually told my manager about it,
00:39:57 and he’s like, oh, wow, this is great,
00:39:59 we do need to solve this problem.
00:40:00 Oh, this is great, we can get you one other person
00:40:02 to work with you on this, you know?
00:40:04 And slowly a team is formed and it starts taking off.
00:40:08 And C++, for example, huge, complicated language.
00:40:12 People always assume that it’s impossible to implement
00:40:14 and it’s very nearly impossible,
00:40:16 but it’s just really, really hard.
00:40:18 And the way to get there is to build it
00:40:20 one piece at a time incrementally.
00:40:22 And that was only possible because we were lucky
00:40:26 to hire some really exceptional engineers
00:40:28 that knew various parts of it very well
00:40:30 and could do great things.
00:40:32 Swift was kind of a similar thing.
00:40:34 So Swift came from, we were just finishing off
00:40:39 the first version of C++ support in Clang.
00:40:42 And C++ is a very formidable and very important language,
00:40:47 but it’s also ugly in lots of ways.
00:40:49 And you can’t influence C++ without thinking
00:40:52 there has to be a better thing, right?
00:40:54 And so I started working on Swift, again,
00:40:56 with no hope or ambition that would go anywhere,
00:40:58 just let’s see what could be done,
00:41:00 let’s play around with this thing.
00:41:02 It was me in my spare time, not telling anybody about it,
00:41:06 kind of a thing, and it made some good progress.
00:41:09 I’m like, actually, it would make sense to do this.
00:41:11 At the same time, I started talking with the senior VP
00:41:14 of software at the time, a guy named Bertrand Serlet.
00:41:17 And Bertrand was very encouraging.
00:41:19 He was like, well, let’s have fun, let’s talk about this.
00:41:22 And he was a little bit of a language guy,
00:41:23 and so he helped guide some of the early work
00:41:26 and encouraged me and got things off the ground.
00:41:30 And eventually told my manager and told other people,
00:41:34 and it started making progress.
00:41:38 The complicating thing with Swift
00:41:40 was that the idea of doing a new language
00:41:43 was not obvious to anybody, including myself.
00:41:47 And the tone at the time was that the iPhone
00:41:50 was successful because of Objective C.
00:41:53 Oh, interesting.
00:41:54 Not despite it, but because of it.
00:41:57 And you have to understand that at the time,
00:42:01 Apple was hiring software people that loved Objective C.
00:42:05 And it wasn’t that they came despite Objective C.
00:42:07 They loved Objective C, and that’s why they got hired.
00:42:10 And so you had a software team that the leadership,
00:42:13 in many cases, went all the way back to Next,
00:42:15 where Objective C really became real.
00:42:19 And so they, quote unquote, grew up writing Objective C.
00:42:23 And many of the individual engineers
00:42:25 all were hired because they loved Objective C.
00:42:28 And so this notion of, OK, let’s do new language
00:42:30 was kind of heretical in many ways.
00:42:34 Meanwhile, my sense was that the outside community wasn’t really
00:42:36 in love with Objective C. Some people were,
00:42:38 and some of the most outspoken people were.
00:42:40 But other people were hitting challenges
00:42:42 because it has very sharp corners
00:42:44 and it’s difficult to learn.
00:42:46 And so one of the challenges of making Swift happen that
00:42:50 was totally non technical is the social part of what do we do?
00:42:57 If we do a new language, which at Apple, many things
00:43:00 happen that don’t ship.
00:43:02 So if we ship it, what is the metrics of success?
00:43:05 Why would we do this?
00:43:06 Why wouldn’t we make Objective C better?
00:43:08 If Objective C has problems, let’s file off
00:43:10 those rough corners and edges.
00:43:12 And one of the major things that became the reason to do this
00:43:15 was this notion of safety, memory safety.
00:43:18 And the way Objective C works is that a lot of the object system
00:43:23 and everything else is built on top of pointers in C.
00:43:27 Objective C is an extension on top of C.
00:43:29 And so pointers are unsafe.
00:43:32 And if you get rid of the pointers,
00:43:34 it’s not Objective C anymore.
00:43:36 And so fundamentally, that was an issue
00:43:39 that you could not fix safety or memory safety
00:43:42 without fundamentally changing the language.
00:43:45 And so once we got through that part of the mental process
00:43:49 and the thought process, it became a design process
00:43:53 of saying, OK, well, if we’re going to do something new,
00:43:55 what is good?
00:43:56 How do we think about this?
00:43:57 And what do we like?
00:43:58 And what are we looking for?
00:44:00 And that was a very different phase of it.
00:44:02 So what are some design choices early on in Swift?
00:44:05 Like we’re talking about braces, are you
00:44:10 making a typed language or not, all those kinds of things.
00:44:13 Yeah, so some of those were obvious given the context.
00:44:16 So a typed language, for example,
00:44:17 Objective C is a typed language.
00:44:19 And going with an untyped language
00:44:22 wasn’t really seriously considered.
00:44:24 We wanted the performance, and we
00:44:26 wanted refactoring tools and other things
00:44:27 like that that go with typed languages.
00:44:29 Quick, dumb question.
00:44:31 Was it obvious, I think this would be a dumb question,
00:44:34 but was it obvious that the language
00:44:36 has to be a compiled language?
00:44:40 Yes, that’s not a dumb question.
00:44:42 Earlier, I think late 90s, Apple had seriously
00:44:44 considered moving its development experience to Java.
00:44:49 But Swift started in 2010, which was several years
00:44:53 after the iPhone.
00:44:53 It was when the iPhone was definitely
00:44:55 on an upward trajectory.
00:44:56 And the iPhone was still extremely,
00:44:58 and is still a bit memory constrained.
00:45:01 And so being able to compile the code
00:45:04 and then ship it and then having standalone code that
00:45:08 is not JIT compiled is a very big deal
00:45:11 and is very much part of the Apple value system.
00:45:15 Now, JavaScript’s also a thing.
00:45:17 I mean, it’s not that this is exclusive,
00:45:19 and technologies are good depending
00:45:21 on how they’re applied.
00:45:23 But in the design of Swift, saying,
00:45:26 how can we make Objective C better?
00:45:28 Objective C is statically compiled,
00:45:29 and that was the contiguous, natural thing to do.
00:45:32 Just skip ahead a little bit, and we’ll go right back.
00:45:35 Just as a question, as you think about today in 2019
00:45:40 in your work at Google, TensorFlow and so on,
00:45:42 is, again, compilation, static compilation, still
00:45:48 the right thing?
00:45:49 Yeah, so the funny thing after working
00:45:52 on compilers for a really long time is that,
00:45:55 and this is one of the things that LLVM has helped with,
00:45:59 is that I don’t look at compilation as
00:46:01 being static or dynamic or interpreted or not.
00:46:05 This is a spectrum.
00:46:07 And one of the cool things about Swift
00:46:09 is that Swift is not just statically compiled.
00:46:12 It’s actually dynamically compiled as well,
00:46:14 and it can also be interpreted.
00:46:15 Though, nobody’s actually done that.
00:46:17 And so what ends up happening when
00:46:20 you use Swift in a workbook, for example in Colab or in Jupyter,
00:46:24 is it’s actually dynamically compiling the statements
00:46:26 as you execute them.
00:46:28 And so this gets back to the software engineering problems,
00:46:32 where if you layer the stack properly,
00:46:34 you can actually completely change
00:46:37 how and when things get compiled because you
00:46:39 have the right abstractions there.
00:46:41 And so the way that a Colab workbook works with Swift
00:46:44 is that when you start typing into it,
00:46:47 it creates a process, a Unix process.
00:46:50 And then each line of code you type in,
00:46:52 it compiles it through the Swift compiler, the front end part,
00:46:56 and then sends it through the optimizer,
00:46:58 JIT compiles machine code, and then
00:47:01 injects it into that process.
00:47:03 And so as you’re typing new stuff,
00:47:05 it’s like squirting in new code and overwriting and replacing
00:47:09 and updating code in place.
00:47:11 And the fact that it can do this is not an accident.
00:47:13 Swift was designed for this.
00:47:15 But it’s an important part of how the language was set up
00:47:18 and how it’s layered, and this is a nonobvious piece.
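The per-statement flow described here, compile a line, inject it into a live process, overwrite earlier definitions, has a small analog in Python's own standard library. A sketch of the loop's shape, using Python's interpreter as a stand-in for Swift's JIT:

```python
# Sketch of the notebook loop described above: each "cell" is
# compiled and injected into one live session, and later cells can
# overwrite earlier definitions in place. Python's stdlib interpreter
# stands in for Swift's machine-code JIT; the shape is the point.
import code

session = code.InteractiveInterpreter()

# Each line is compiled and run against the same live state.
session.runsource("x = 10")
session.runsource("f = lambda: x + 1")
session.runsource("f = lambda: x * 2")   # redefinition replaces f in place
session.runsource("result = f()")

print(session.locals["result"])   # 20
```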
00:47:21 And one of the things with Swift that
00:47:23 was, for me, a very strong design point
00:47:25 is to make it so that you can learn it very quickly.
00:47:29 And so from a language design perspective,
00:47:31 the thing that I always come back to
00:47:33 is this UI principle of progressive disclosure
00:47:36 of complexity.
00:47:37 And so in Swift, you can start by saying print, quote,
00:47:41 hello world, quote.
00:47:44 And there’s no slash n, just like Python. One line of code,
00:47:47 no main, no header files, no public static void main,
00:47:51 string args, blah, blah, blah, like Java has. One line of code.
00:47:55 And you can teach that, and it works great.
00:47:58 Then you can say, well, let’s introduce variables.
00:48:00 And so you can declare a variable with var.
00:48:02 So var x equals 4.
00:48:03 What is a variable?
00:48:04 You can use x, x plus 1.
00:48:06 This is what it means.
00:48:07 Then you can say, well, how about control flow?
00:48:09 Well, this is what an if statement is.
00:48:10 This is what a for statement is.
00:48:12 This is what a while statement is.
00:48:15 Then you can say, let’s introduce functions.
00:48:17 And many languages like Python have
00:48:20 had this kind of notion of let’s introduce small things,
00:48:22 and then you can add complexity.
00:48:24 Then you can introduce classes.
00:48:25 And then you can add generics, in the case of Swift.
00:48:28 And then you can build in modules
00:48:29 and build out in terms of the things that you’re expressing.
00:48:32 But this is not very typical for compiled languages.
00:48:35 And so this was a very strong design point,
00:48:38 and one of the reasons that Swift, in general,
00:48:40 is designed with this factoring of complexity in mind
00:48:43 so that the language can express powerful things.
00:48:46 You can write firmware in Swift if you want to.
00:48:49 But it has a very high level feel,
00:48:51 which is really this perfect blend, because often you
00:48:55 have very advanced library writers that
00:48:57 want to be able to use the nitty gritty details.
00:49:00 But then other people just want to use the libraries
00:49:02 and work at a higher abstraction level.
00:49:04 It’s kind of cool that I saw that you can just
00:49:07 interoperability.
00:49:09 I don’t think I pronounced that word enough.
00:49:11 But you can just drag in Python.
00:49:14 It’s just strange.
00:49:16 You can import, like I saw this in the demo.
00:49:19 How do you make that happen?
00:49:21 What’s up with that?
00:49:23 Is that as easy as it looks, or is it?
00:49:25 Yes, as easy as it looks.
00:49:27 That’s not a stage magic hack or anything like that.
00:49:29 I don’t mean from the user perspective.
00:49:31 I mean from the implementation perspective to make it happen.
00:49:34 So it’s easy once all the pieces are in place.
00:49:37 The way it works, so if you think about a dynamically typed
00:49:39 language like Python, you can think about it
00:49:41 in two different ways.
00:49:42 You can say it has no types, which
00:49:45 is what most people would say.
00:49:47 Or you can say it has one type.
00:49:50 And you can say it has one type, and it’s the Python object.
00:49:53 And the Python object gets passed around.
00:49:55 And because there’s only one type, it’s implicit.
00:49:58 And so what happens with Swift and Python talking
00:50:00 to each other, Swift has lots of types.
00:50:02 It has arrays, and it has strings, and all classes,
00:50:05 and that kind of stuff.
00:50:07 But it now has a Python object type.
00:50:11 So there is one Python object type.
00:50:12 And so when you say import NumPy, what you get
00:50:16 is a Python object, which is the NumPy module.
00:50:19 And then you say np.array.
00:50:21 It says, OK, hey, Python object, I have no idea what you are.
00:50:24 Give me your array member.
00:50:27 OK, cool.
00:50:27 And it just uses dynamic stuff, talks to the Python interpreter,
00:50:31 and says, hey, Python, what’s the.array member
00:50:33 in that Python object?
00:50:35 It gives you back another Python object.
00:50:37 And now you say parentheses for the call and the arguments
00:50:40 you’re going to pass.
00:50:40 And so then it says, hey, a Python object
00:50:43 that is the result of np.array, call with these arguments.
00:50:47 Again, calling into the Python interpreter to do that work.
00:50:50 And so right now, this is all really simple.
00:50:53 And if you dive into the code, what you’ll see
00:50:55 is that the Python module in Swift
00:50:58 is something like 1,200 lines of code or something.
00:51:01 It’s written in pure Swift.
00:51:02 It’s super simple.
00:51:03 And it’s built on top of the C interoperability
00:51:06 because it just talks to the Python interpreter.
00:51:09 But making that possible required
00:51:11 us to add two major language features to Swift
00:51:13 to be able to express these dynamic calls
00:51:15 and the dynamic member lookups.
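Those two Swift features are @dynamicMemberLookup and @dynamicCallable. The forwarding behavior they enable can be mimicked in a few lines of Python itself; this proxy only illustrates the mechanism, it is not the actual Swift PythonObject implementation.

```python
# Mimicking the "one Python object type" idea in Python itself:
# a single wrapper type whose member accesses and calls are resolved
# dynamically at run time, the way Swift's PythonObject does via
# @dynamicMemberLookup and @dynamicCallable. Illustration only.
import math

class PythonObjectProxy:
    """Every wrapped value has the same static type; all structure
    is discovered at run time."""
    def __init__(self, value):
        self._value = value

    def __getattr__(self, name):
        # "Hey, object, I have no idea what you are -- give me your
        # `name` member," answered dynamically and wrapped again.
        return PythonObjectProxy(getattr(self._value, name))

    def __call__(self, *args, **kwargs):
        # Calls are forwarded the same way, and the result comes
        # back as another opaque object of the same type.
        return PythonObjectProxy(self._value(*args, **kwargs))

m = PythonObjectProxy(math)   # like `import numpy` handing back one object
result = m.sqrt(16.0)         # `.sqrt` looked up, then called, at run time
print(result._value)          # 4.0
```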
00:51:17 And so what we’ve done over the last year
00:51:19 is we’ve proposed, implemented, standardized, and contributed
00:51:23 new language features to the Swift language
00:51:26 in order to make it so it is really trivial.
00:51:29 And this is one of the things about Swift
00:51:31 that is critical to the Swift for TensorFlow work, which
00:51:35 is that we can actually add new language features.
00:51:37 And the bar for adding those is high,
00:51:39 but it’s what makes it possible.
00:51:42 So you’re now at Google doing incredible work
00:51:45 on several things, including TensorFlow.
00:51:47 So TensorFlow 2.0, or whatever is leading up to 2.0,
00:51:53 has, by default, eager execution.
00:51:56 And yet, in order to make code optimized for GPU or TPU
00:52:00 or some of these systems, computation
00:52:04 needs to be converted to a graph.
00:52:06 So what’s that process like?
00:52:07 What are the challenges there?
00:52:08 Yeah, so I am tangentially involved in this.
00:52:11 But the way that it works with Autograph
00:52:15 is that you mark your function with a decorator.
00:52:21 And when Python calls it, that decorator is invoked.
00:52:24 And then it says, before I call this function,
00:52:28 you can transform it.
00:52:29 And so the way Autograph works is, as far as I understand,
00:52:32 is it actually uses the Python parser
00:52:34 to go parse that, turn it into a syntax tree,
00:52:37 and now apply compiler techniques to, again,
00:52:39 transform this down into TensorFlow graphs.
00:52:42 And so you can think of it as saying, hey,
00:52:44 I have an if statement.
00:52:45 I’m going to create an if node in the graph,
00:52:48 like you say tf.cond.
00:52:51 You have a multiply.
00:52:53 Well, I’ll turn that into a multiply node in the graph.
00:52:55 And it becomes this tree transformation.
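That decorator-plus-syntax-tree flow can be sketched with the standard library. This toy version records multiplies into a "graph" list instead of building TensorFlow nodes; `autograph_like` and `traced_mul` are invented names for illustration, not the real Autograph API.

```python
# Toy version of the Autograph flow described above: a decorator
# grabs the function's source, parses it into a syntax tree, rewrites
# operations into "graph" nodes, and compiles the transformed tree.
# `autograph_like` and `traced_mul` are invented for illustration;
# the real Autograph lowers to TensorFlow ops, not a Python list.
import ast
import inspect

GRAPH = []   # stand-in "graph": a record of lowered operations

def traced_mul(a, b):
    GRAPH.append(("mul", a, b))
    return a * b

class LowerMultiply(ast.NodeTransformer):
    """Rewrite `a * b` into `traced_mul(a, b)`, the way Autograph
    turns a multiply into a multiply node in the graph."""
    def visit_BinOp(self, node):
        self.generic_visit(node)   # transform children first
        if isinstance(node.op, ast.Mult):
            return ast.Call(
                func=ast.Name(id="traced_mul", ctx=ast.Load()),
                args=[node.left, node.right], keywords=[])
        return node

def autograph_like(fn):
    tree = ast.parse(inspect.getsource(fn))
    tree.body[0].decorator_list = []   # don't re-apply the decorator
    tree = ast.fix_missing_locations(LowerMultiply().visit(tree))
    namespace = {"traced_mul": traced_mul}
    exec(compile(tree, "<autograph>", "exec"), namespace)
    return namespace[fn.__name__]

@autograph_like
def f(x):
    return x * 3 + 1

print(f(2))     # runs the rewritten function
print(GRAPH)    # the multiply was recorded as a graph node
```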
00:52:57 So where does the Swift for TensorFlow
00:53:00 come in? What are the parallels?
00:53:04 For one, Swift is an interface.
00:53:06 Like, Python is an interface to TensorFlow.
00:53:09 But it seems like there’s a lot more going on than just
00:53:11 a different language interface.
00:53:13 There’s optimization methodology.
00:53:15 So the TensorFlow world has a couple
00:53:17 of different what I’d call front end technologies.
00:53:21 And so Swift and Python and Go and Rust and Julia
00:53:25 and all these things share the TensorFlow graphs
00:53:29 and all the runtime and everything that’s later.
00:53:32 And so Swift for TensorFlow is merely another front end
00:53:36 for TensorFlow, just like any of these other systems are.
00:53:40 There’s a major difference between, I would say,
00:53:43 three camps of technologies here.
00:53:44 There’s Python, which is a special case,
00:53:46 because the vast majority of the community effort
00:53:49 is going to the Python interface.
00:53:51 And Python has its own approaches
00:53:52 for automatic differentiation.
00:53:54 It has its own APIs and all this kind of stuff.
00:53:58 There’s Swift, which I’ll talk about in a second.
00:54:00 And then there’s kind of everything else.
00:54:02 And so the everything else are effectively language bindings.
00:54:05 So they call into the TensorFlow runtime,
00:54:07 but they usually don’t have automatic differentiation
00:54:10 or they usually don’t provide anything other than APIs
00:54:14 that call the C APIs in TensorFlow.
00:54:16 And so they’re kind of wrappers for that.
00:54:18 Swift is really kind of special.
00:54:19 And it’s a very different approach.
00:54:22 Swift for TensorFlow, that is, is a very different approach.
00:54:25 Because there we’re saying, let’s
00:54:26 look at all the problems that need
00:54:28 to be solved in the full stack of the TensorFlow compilation
00:54:34 process, if you think about it that way.
00:54:35 Because TensorFlow is fundamentally a compiler.
00:54:38 It takes models, and then it makes them go fast on hardware.
00:54:42 That’s what a compiler does.
00:54:43 And it has a front end, it has an optimizer,
00:54:47 and it has many back ends.
00:54:49 And so if you think about it the right way,
00:54:51 or if you look at it in a particular way,
00:54:54 it is a compiler.
00:54:59 And so Swift is merely another front end.
00:55:02 But it’s saying, and the design principle is saying,
00:55:05 let’s look at all the problems that we face as machine
00:55:08 learning practitioners and what is the best possible way we
00:55:11 can do that, given the fact that we can change literally
00:55:13 anything in this entire stack.
00:55:15 And Python, for example, where the vast majority
00:55:18 of the engineering and effort has gone into,
00:55:22 is constrained by being the best possible thing you
00:55:25 can do with a Python library.
00:55:27 There are no Python language features
00:55:29 that are added because of machine learning
00:55:31 that I’m aware of.
00:55:32 They added a matrix multiplication operator,
00:55:34 the @ sign, but that’s as close as you get.
00:55:38 And so with Swift, it’s hard, but you
00:55:41 can add language features to the language.
00:55:43 And there’s a community process for that.
00:55:46 And so we look at these things and say, well,
00:55:48 what is the right division of labor
00:55:49 between the human programmer and the compiler?
00:55:52 And Swift has a number of things that shift that balance.
00:55:55 So because it has a type system, for example,
00:56:00 certain kinds of analysis of the code become possible,
00:56:02 and the compiler can automatically
00:56:05 build graphs for you without you thinking about them.
00:56:08 That’s a big deal for a programmer.
00:56:10 You just get free performance.
00:56:11 You get clustering and fusion and optimization,
00:56:14 things like that, without you as a programmer
00:56:17 having to manually do it because the compiler can do it for you.
00:56:20 Automatic differentiation is another big deal.
00:57:22 And I think one of the key contributions of the Swift
00:57:25 for TensorFlow project is that there’s
00:56:29 this entire body of work on automatic differentiation
00:56:32 that dates back to the Fortran days.
00:56:34 People doing a tremendous amount of numerical computing
00:56:36 in Fortran used to write these what they call source
00:56:39 to source translators, where you take a bunch of code,
00:56:43 shove it into a mini compiler, and it would push out
00:56:46 more Fortran code.
00:56:48 But it would generate the backwards passes
00:56:50 for your functions for you, the derivatives.
00:56:53 And so in that work in the 70s, a tremendous number
00:56:57 of optimizations, a tremendous number of techniques
00:57:01 for fixing numerical instability,
00:57:02 and other kinds of problems were developed.
00:57:05 But they’re very difficult to port into a world
00:57:07 where, in eager execution, you get one op at a time.
00:57:11 You need to be able to look at an entire function
00:57:13 and be able to reason about what’s going on.
00:57:15 And so when you have a language integrated automatic
00:57:18 differentiation, which is one of the things
00:57:20 that the Swift project is focusing on,
00:57:22 you can open all these techniques
00:57:24 and reuse them in familiar ways.
00:57:28 But the language integration piece
00:57:30 has a bunch of design room in it, and it’s also complicated.
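To make the contrast concrete, here is a minimal tape-style reverse-mode automatic differentiation in plain Python; every name is illustrative rather than any real framework's API. Each operation records its inputs and local derivatives, and a backwards walk over that record produces the gradients — the "backwards pass" that the old source-to-source Fortran tools emitted as code:

```python
# Minimal reverse-mode automatic differentiation sketch.
class Var:
    def __init__(self, value, grad_fn=()):
        self.value = value
        self.grad = 0.0
        self.grad_fn = grad_fn   # pairs of (input Var, local derivative)

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

def backward(out):
    # Walk the recorded ops in reverse, accumulating gradients with the
    # chain rule. This naive traversal is fine for this expression; a
    # real system processes nodes in reverse topological order.
    out.grad = 1.0
    stack = [out]
    while stack:
        node = stack.pop()
        for inp, local in node.grad_fn:
            inp.grad += node.grad * local
            stack.append(inp)

x, y = Var(3.0), Var(4.0)
z = x * y + x        # z = x*y + x, so dz/dx = y + 1, dz/dy = x
backward(z)
print(z.value, x.grad, y.grad)   # 15.0 5.0 3.0
```

Because the whole expression was recorded, the backwards pass can see all of it at once, which is exactly what pure op-by-op eager execution gives up and what language-integrated AD recovers.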
00:57:33 The other piece of the puzzle here that’s kind of interesting
00:57:35 is TPUs at Google.
00:57:37 So we’re in a new world with deep learning.
00:57:40 It constantly is changing, and I imagine,
00:57:42 without disclosing anything, I imagine
00:57:46 you’re still innovating on the TPU front, too.
00:57:48 Indeed.
00:57:49 So how much interplay is there between software and hardware
00:57:53 in trying to figure out how to together move
00:57:55 towards an optimized solution?
00:57:56 There’s an incredible amount.
00:57:57 So we’re on our third generation of TPUs,
00:57:59 which are now 100 petaflops in a very large liquid cooled box,
00:58:04 virtual box with no cover.
00:58:07 And as you might imagine, we’re not out of ideas yet.
00:58:11 The great thing about TPUs is that they’re
00:58:14 a perfect example of hardware software co design.
00:58:17 And so it’s about saying, what hardware
00:58:19 do we build to solve certain classes of machine learning
00:58:23 problems?
00:58:23 Well, the algorithms are changing.
00:58:26 The hardware takes, in some cases, years to produce.
00:58:30 And so you have to make bets and decide
00:58:32 what is going to happen and what is the best way to spend
00:58:36 the transistors to get the maximum performance per watt
00:58:39 or area per cost or whatever it is that you’re optimizing for.
00:58:44 And so one of the amazing things about TPUs
00:58:46 is this numeric format called bfloat16.
00:58:49 bfloat16 is a compressed 16 bit floating point format,
00:58:54 but it puts the bits in different places.
00:58:55 And in numeric terms, it has a smaller mantissa
00:58:58 and a larger exponent.
00:59:00 That means that it’s less precise,
00:59:02 but it can represent larger ranges of values,
00:59:05 which in the machine learning context
00:59:07 is really important and useful because sometimes you
00:59:09 have very small gradients you want to accumulate
00:59:13 and very, very small numbers that
00:59:17 are important to move things as you’re learning.
00:59:20 But sometimes you have very large magnitude numbers as well.
00:59:23 And bfloat16 is not as precise.
00:59:26 The mantissa is small.
00:59:28 But it turns out the machine learning algorithms actually
00:59:30 want to generalize.
00:59:31 And so there’s theories that this actually
00:59:34 increases the ability for the network
00:59:36 to generalize across data sets.
00:59:37 And regardless of whether it’s good or bad,
00:59:41 it’s much cheaper at the hardware level to implement
00:59:43 because the area and time of a multiplier
00:59:48 is n squared in the number of bits in the mantissa,
00:59:50 but it’s linear with the size of the exponent.
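Since a bfloat16 is just the top 16 bits of an IEEE float32 (1 sign bit, the same 8 exponent bits, and 7 of the 23 mantissa bits), the format is easy to emulate in a few lines of plain Python. This sketch truncates, whereas real hardware typically rounds to nearest even, so treat it as an approximation:

```python
import struct

def to_bfloat16(x: float) -> float:
    """Keep only the top 16 bits of x's float32 representation."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    bits &= 0xFFFF0000                                   # drop low 16 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# The exponent is intact, so huge magnitudes survive...
print(to_bfloat16(3.0e38) > 1.0e38)   # True
# ...but with only 7 mantissa bits, the step size near 1.0 is about
# 2**-7, so small differences vanish.
print(to_bfloat16(1.001))             # 1.0
```

The hardware payoff follows from the n-squared multiplier cost just mentioned: a float32 significand multiplier is roughly a 24x24 array (counting the hidden bit) versus 8x8 for bfloat16, around nine times fewer partial products.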
00:59:53 And you’re connected to both efforts
00:59:55 here, on the hardware and the software side?
00:59:57 Yeah, and so that was a breakthrough
00:59:58 coming from the research side and people
01:00:01 working on optimizing network transport of weights
01:00:06 across the network originally and trying
01:00:08 to find ways to compress that.
01:00:10 But then it got burned into silicon.
01:00:12 And it’s a key part of what makes TPU performance
01:00:14 so amazing and great.
01:00:17 Now, TPUs have many different aspects that are important.
01:00:20 But the co design between the low level compiler bits
01:00:25 and the software bits and the algorithms
01:00:27 is all super important.
01:00:28 And it’s this amazing trifecta that only Google can do.
01:00:32 Yeah, that’s super exciting.
01:00:34 So can you tell me about the MLIR project, previously
01:00:39 the secretive one?
01:00:41 Yeah, so MLIR is a project that we
01:00:43 announced at a compiler conference three weeks ago
01:00:47 or something at the Compilers for Machine Learning
01:00:49 conference.
01:00:50 Basically, again, if you look at TensorFlow as a compiler stack,
01:00:53 it has a number of compiler algorithms within it.
01:00:56 It also has a number of compilers
01:00:57 that get embedded into it.
01:00:59 And they’re made by different vendors.
01:01:00 For example, Google has XLA, which
01:01:02 is a great compiler system.
01:01:04 NVIDIA has TensorRT.
01:01:06 Intel has nGraph.
01:01:08 There’s a number of these different compiler systems.
01:01:10 And they’re very hardware specific.
01:01:13 And they’re trying to solve different parts of the problems.
01:01:16 But they’re all kind of similar in a sense of they
01:01:19 want to integrate with TensorFlow.
01:01:20 Now, TensorFlow has an optimizer.
01:01:22 And it has these different code generation technologies
01:01:25 built in.
01:01:26 The idea of MLIR is to build a common infrastructure
01:01:28 to support all these different subsystems.
01:01:31 And initially, it’s to be able to make it
01:01:33 so that they all plug in together
01:01:34 and they can share a lot more code and can be reusable.
01:01:37 But over time, we hope that the industry
01:01:39 will start collaborating and sharing code.
01:01:42 And instead of reinventing the same things over and over again,
01:01:45 we can actually foster some of that working together
01:01:49 to solve common problems energy that
01:01:51 has been useful in the compiler field before.
01:01:54 Beyond that, some people have joked
01:01:57 that MLIR is kind of LLVM 2.
01:01:59 It learns a lot from what LLVM has done well
01:02:01 and what LLVM has done wrong.
01:02:04 And it’s a chance to fix that.
01:02:06 And also, there are challenges in the LLVM ecosystem as well,
01:02:09 where LLVM is very good at the thing it was designed to do.
01:02:12 But 20 years later, the world has changed.
01:02:15 And people are trying to solve higher level problems.
01:02:17 And we need some new technology.
01:02:20 And what’s the future of open source in this context?
01:02:24 Very soon.
01:02:25 So it is not yet open source.
01:02:27 But it will be, hopefully, in the next couple of months.
01:02:29 So you still believe in the value of open source
01:02:31 in these kinds of contexts?
01:02:31 Oh, yeah.
01:02:31 Absolutely.
01:02:32 And I think that the TensorFlow community at large
01:02:36 fully believes in open source.
01:02:37 So I mean, there is a difference between Apple,
01:02:40 where you were previously, and Google now,
01:02:42 in spirit and culture.
01:02:43 And I would say the open sourcing of TensorFlow
01:02:45 was a seminal moment in the history of software,
01:02:48 because here’s this large company releasing
01:02:51 a very large code base as open source.
01:02:56 What are your thoughts on that?
01:02:58 Were you happy or not to see
01:03:00 that degree of open sourcing?
01:03:02 So between the two, I prefer the Google approach,
01:03:05 if that’s what you’re saying.
01:03:07 The Apple approach makes sense, given the historical context
01:03:12 that Apple came from.
01:03:13 But that was 35 years ago.
01:03:15 And I think that Apple is definitely adapting.
01:03:18 And the way I look at it is that there’s
01:03:20 different kinds of concerns in the space.
01:03:23 It is very rational for a business
01:03:24 to care about making money.
01:03:28 That fundamentally is what a business is about.
01:03:31 But I think it’s also incredibly realistic to say,
01:03:34 it’s not your string library that’s
01:03:36 the thing that’s going to make you money.
01:03:38 It’s going to be the amazing UI product differentiating
01:03:41 features and other things like that that you built on top
01:03:43 of your string library.
01:03:45 And so keeping your string library
01:03:48 proprietary and secret and things
01:03:50 like that is maybe not the important thing anymore.
01:03:54 Where before, platforms were different.
01:03:57 And even 15 years ago, things were a little bit different.
01:04:01 But the world is changing.
01:04:02 So Google strikes a very good balance,
01:04:04 I think.
01:04:05 And I think that TensorFlow being open source really
01:04:09 changed the entire machine learning field
01:04:12 and caused a revolution in its own right.
01:04:14 And so I think it’s amazingly forward looking
01:04:17 because I could have imagined, and I wasn’t at Google
01:04:20 at the time, but I could imagine a different context
01:04:23 and different world where a company says,
01:04:25 machine learning is critical to what we’re doing.
01:04:27 We’re not going to give it to other people.
01:04:29 And so that decision is a profoundly brilliant insight
01:04:35 that I think has really led to the world being
01:04:37 better and better for Google as well.
01:04:40 And has all kinds of ripple effects.
01:04:42 I think it is really, I mean, you
01:04:45 can’t overstate how profound that decision
01:04:48 by Google is for software.
01:04:49 It’s awesome.
01:04:50 Well, and again, I can understand the concern
01:04:54 about if we release our machine learning software,
01:04:58 our competitors could go faster.
01:05:00 But on the other hand, I think that open sourcing TensorFlow
01:05:02 has been fantastic for Google.
01:05:03 And I’m sure that decision was very nonobvious at the time,
01:05:09 but I think it’s worked out very well.
01:05:11 So let’s try this real quick.
01:05:13 You were at Tesla for five months
01:05:15 as the VP of autopilot software.
01:05:17 You led the team during the transition from hardware
01:05:20 one to hardware two.
01:05:22 I have a couple of questions.
01:05:23 So one, first of all, to me, that’s
01:05:26 one of the bravest engineering decisions, undertakings really,
01:05:33 ever in the automotive industry, software-wise,
01:05:36 starting from scratch.
01:05:37 It’s a really brave engineering decision.
01:05:39 So my one question there is, what was that like?
01:05:42 What was the challenge of that?
01:05:43 Do you mean the career decision of jumping
01:05:45 from a comfortable good job into the unknown, or?
01:05:48 That combined, so at the individual level,
01:05:51 you making that decision.
01:05:54 And then when you show up, it’s a really hard engineering
01:05:57 problem.
01:05:58 So you could just stay the course, maybe slow down,
01:06:03 stay on hardware one, those kinds of decisions.
01:06:06 Just taking it full on, let’s do this from scratch.
01:06:10 What was that like?
01:06:11 Well, so I mean, I don’t think Tesla
01:06:12 has a culture of taking things slow and seeing how it goes.
01:06:16 And one of the things that attracted me about Tesla
01:06:18 is it’s very much a gung ho, let’s change the world,
01:06:20 let’s figure it out kind of a place.
01:06:21 And so I have a huge amount of respect for that.
01:06:25 Tesla has done very smart things with hardware one
01:06:28 in particular.
01:06:29 And the hardware one design was originally
01:06:32 designed to be very simple automation features
01:06:36 in the car for like traffic aware cruise control and things
01:06:39 like that.
01:06:39 And the fact that they were able to effectively feature creep
01:06:42 it into lane holding and a very useful driver assistance
01:06:47 feature is pretty astounding, particularly given
01:06:50 the details of the hardware.
01:06:52 Hardware two built on that in a lot of ways.
01:06:54 And the challenge there was that they
01:06:56 were transitioning from a third party provided vision stack
01:07:00 to an in house built vision stack.
01:07:01 And so for the first step, which I mostly helped with,
01:07:05 was getting onto that new vision stack.
01:07:08 And that was very challenging.
01:07:10 And it was time critical for various reasons,
01:07:14 and it was a big leap.
01:07:14 But it was fortunate that it built
01:07:16 on a lot of the knowledge and expertise and the team
01:07:18 that had built hardware one’s driver assistance features.
01:07:22 So you spoke in a collected and kind way
01:07:25 about your time at Tesla, but it was ultimately not a good fit.
01:07:28 Elon Musk, as we’ve talked about on this podcast
01:07:31 with several guests, of course, Elon Musk
01:07:33 continues to do some of the most bold and innovative engineering
01:07:36 work in the world, at times at the cost of
01:07:39 some of the members of the Tesla team.
01:07:41 What did you learn about working in this chaotic world
01:07:45 with Elon?
01:07:46 Yeah, so I guess I would say that when I was at Tesla,
01:07:50 I experienced and saw the highest degree of turnover
01:07:54 I’d ever seen in a company, which was a bit of a shock.
01:07:58 But one of the things I learned and I came to respect
01:08:00 is that Elon’s able to attract amazing talent because he
01:08:03 has a very clear vision of the future,
01:08:05 and he can get people to buy into it
01:08:07 because they want that future to happen.
01:08:09 And the power of vision is something
01:08:11 that I have a tremendous amount of respect for.
01:08:14 And I think that Elon is fairly singular
01:08:17 in the world in terms of the things
01:08:20 he’s able to get people to believe in.
01:08:22 And there are many people that stand in the street corner
01:08:27 and say, ah, we’re going to go to Mars, right?
01:08:30 But then there are a few people that
01:08:31 can get others to buy into it and believe and build the path
01:08:35 and make it happen.
01:08:36 And so I respect that.
01:08:39 I don’t respect all of his methods,
01:08:41 but I have a huge amount of respect for that.
01:08:45 You’ve mentioned in a few places,
01:08:46 including in this context, working hard.
01:08:50 What does it mean to work hard?
01:08:52 And when you look back at your life,
01:08:53 what were some of the most brutal periods
01:08:57 of having to really put everything
01:09:00 you have into something?
01:09:03 Yeah, good question.
01:09:05 So working hard can be defined a lot of different ways,
01:09:07 so a lot of hours, and so that is true.
01:09:12 The thing to me that’s the hardest
01:09:14 is both being short term focused on delivering and executing
01:09:18 and making a thing happen while also thinking
01:09:21 about the longer term and trying to balance that.
01:09:24 Because if you are myopically focused on solving a task
01:09:28 and getting that done and only think
01:09:31 about that incremental next step,
01:09:32 you will miss the next big hill you should jump over to.
01:09:36 And so I’ve been really fortunate that I’ve
01:09:39 been able to kind of oscillate between the two.
01:09:42 And historically at Apple, for example, that
01:09:45 was made possible because I was able to work with some really
01:09:47 amazing people and build up teams and leadership
01:09:50 structures and allow them to grow in their careers
01:09:55 and take on responsibility, thereby freeing up
01:09:58 me to be a little bit crazy and thinking about the next thing.
01:10:02 And so it’s a lot of that.
01:10:04 But it’s also about with experience,
01:10:06 you make connections that other people don’t necessarily make.
01:10:10 And so I think that’s a big part as well.
01:10:12 But the bedrock is just a lot of hours.
01:10:16 And that’s OK with me.
01:10:19 There’s different theories on work life balance.
01:10:21 And my theory for myself, which I do not project onto the team,
01:10:25 but my theory for myself is that I
01:10:28 want to love what I’m doing and work really hard.
01:10:30 And my purpose, I feel like, and my goal is to change the world
01:10:35 and make it a better place.
01:10:36 And that’s what I’m really motivated to do.
01:10:40 So last question, LLVM logo is a dragon.
01:10:44 You explain that this is because dragons have connotations
01:10:47 of power, speed, intelligence.
01:10:50 It can also be sleek, elegant, and modular,
01:10:53 though you remove the modular part.
01:10:56 What is your favorite dragon related character
01:10:58 from fiction, video, or movies?
01:11:01 So those are all very kind ways of explaining it.
01:11:03 Do you want to know the real reason it’s a dragon?
01:11:06 Yeah.
01:11:07 Is that better?
01:11:07 So there is a seminal book on compiler design
01:11:11 called The Dragon Book.
01:11:12 And so this is a really old now book on compilers.
01:11:16 And so the dragon logo for LLVM came about because at Apple,
01:11:22 we kept talking about LLVM related technologies
01:11:24 and there’s no logo to put on a slide.
01:11:26 And so we’re like, what do we do?
01:11:28 And somebody’s like, well, what kind of logo
01:11:30 should a compiler technology have?
01:11:32 And I’m like, I don’t know.
01:11:33 I mean, the dragon is the best thing that we’ve got.
01:11:37 And Apple somehow magically came up with the logo.
01:11:41 And it was a great thing.
01:11:42 And the whole community rallied around it.
01:11:44 And then it got better as other graphic designers
01:11:46 got involved.
01:11:47 But that’s originally where it came from.
01:11:49 The story.
01:11:50 Is there dragons from fiction that you
01:11:51 connect with, that Game of Thrones, Lord of the Rings,
01:11:57 that kind of thing?
01:11:58 Lord of the Rings is great.
01:11:59 I also like role playing games and things
01:12:00 like computer role playing games.
01:12:02 And so dragons often show up in there.
01:12:04 But really, it comes back to the book.
01:12:07 Oh, no, we need a thing.
01:12:09 And hilariously, one of the funny things about LLVM
01:12:13 is that my wife, who’s amazing, runs the LLVM Foundation.
01:12:19 And she goes to Grace Hopper and is
01:12:21 trying to get more women involved in the…
01:12:23 She’s also a compiler engineer.
01:12:24 So she’s trying to get other women
01:12:26 to get interested in compilers and things like this.
01:12:28 And so she hands out the stickers.
01:12:30 And people like the LLVM sticker because of Game of Thrones.
01:12:34 And so sometimes culture has this helpful effect
01:12:36 to get the next generation of compiler engineers
01:12:39 engaged with the cause.
01:12:42 OK, awesome.
01:12:43 Chris, thanks so much for talking with us.
01:12:44 It’s been great talking with you.