Transcript
00:00:00 The following is a conversation with Travis Oliphant,
00:00:03 one of the most impactful programmers
00:00:05 and data scientists ever.
00:00:07 He created NumPy, SciPy, and Anaconda.
00:00:12 NumPy formed the foundation
00:00:14 of tensor based machine learning in Python,
00:00:17 SciPy formed the foundation
00:00:18 of scientific programming in Python,
00:00:20 and Anaconda, specifically with Conda,
00:00:23 made Python more accessible to a much larger audience.
00:00:27 Travis’s life work across a large number of programming
00:00:31 and entrepreneurial efforts has and will continue
00:00:34 to have immeasurable impact on millions of lives
00:00:38 by empowering scientists and engineers
00:00:41 in big companies, small companies,
00:00:43 and open source communities to take on difficult problems
00:00:47 and solve them with the power of programming.
00:00:50 Plus, he’s a truly kind human being,
00:00:53 which is something that when combined with vision
00:00:56 and ambition makes for a great leader
00:00:58 and a great person to chat with.
00:01:01 To support this podcast,
00:01:02 please check out our sponsors in the description.
00:01:04 This is the Lex Fridman Podcast,
00:01:06 and here is my conversation with Travis Oliphant.
00:01:11 What was the first computer program you’ve ever written?
00:01:14 Do you remember?
00:01:15 Whoa, that’s a good question.
00:01:16 I think it was in fourth grade.
00:01:18 Just a simple loop in BASIC.
00:01:20 BASIC. BASIC, yeah, on an Atari 800,
00:01:23 Atari 400, I think, or maybe it was an Atari 800.
00:01:26 It was a part of a class,
00:01:28 and we were just writing BASIC loops to print things out.
00:01:32 Did you use goto statements?
00:01:34 Yes, yes, we used goto statements.
00:01:38 I remember in the early days,
00:01:39 that’s when I first realized
00:01:41 there’s like principles to programming,
00:01:43 when I was told, don’t use goto statements.
00:01:45 Those are bad software engineering principles,
00:01:48 like it goes against what great, beautiful code is.
00:01:52 I was like, oh, okay, there’s rules to this game.
00:01:54 I didn’t see that until high school
00:01:56 when I took an AP computer science course.
00:01:58 I did a lot of other kinds of just programming in TI,
00:02:02 but finally, when I took an AP computer science course
00:02:04 in Pascal.
00:02:05 Wow.
00:02:06 That’s, yeah, it was Pascal.
00:02:07 That’s when I realized, oh, there are these principles.
00:02:09 Not C or C++?
00:02:11 No, I didn’t take C until the next year in college.
00:02:14 I had a course in C, but I haven’t done much in Pascal,
00:02:18 just that AP computer science course.
00:02:20 Now, sorry for the romanticized question,
00:02:23 but when did you first fall in love with programming?
00:02:26 Oh, man, good question.
00:02:27 I think actually when I was 10,
00:02:30 my dad got us a TI Timex Sinclair,
00:02:33 and he was excited about the spreadsheet capability,
00:02:37 and then, but I made him get the BASIC,
00:02:39 the add-ons so we could actually program in BASIC,
00:02:41 and just being able to write instructions
00:02:44 and have the computer do something.
00:02:45 Then we got a TI-99/4A when I was about 12,
00:02:50 and I would just, it had sprites and graphics and music.
00:02:52 You could actually program it to do music.
00:02:55 That’s when I really sort of fell in love with programming.
00:02:58 So this is a full, like a real computer
00:03:01 with like, with memory and storage,
00:03:04 processors and whatnot,
00:03:05 because you say TI. Yeah, the Timex Sinclair
00:03:07 was one of the very first, it was a cheap, cheap,
00:03:09 like, I think it was, well, it was still expensive,
00:03:12 but it was 2K of memory.
00:03:14 We got the 16K add on pack,
00:03:16 but yeah, it had memory, and you could program it.
00:03:19 You had the, in order to store your programs,
00:03:20 you had to attach a tape drive.
00:03:22 Remember that old sound that would play,
00:03:24 like when modems would convert digital bits
00:03:29 to audio that sat on a tape drive?
00:03:31 Still remember that sound, but that was the storage.
00:03:34 And what was the programming language, do you remember?
00:03:36 It was BASIC. It was BASIC.
00:03:37 And then they had a VisiCalc,
00:03:38 and so a little bit of spreadsheet programming
00:03:40 in VisiCalc, but mostly just some BASIC.
00:03:42 Do you remember what kind of things drew you to programming?
00:03:46 Was it working with data, was it video games?
00:03:50 Games, math, mathy stuff?
00:03:52 Yeah, I’ve always loved math,
00:03:54 and a lot of people think they don’t like math
00:03:58 because I think when they’re exposed to it early,
00:04:00 it’s about memory.
00:04:02 When you’re exposed to math early,
00:04:03 if you have a good short-term memory,
00:04:04 you can remember your times tables.
00:04:05 And I do have a reasonably, I mean, not perfect,
00:04:08 but a reasonably long little short-term memory buffer.
00:04:12 And so I did great at times tables.
00:04:14 I said, oh, I’m good at math.
00:04:15 But I started to really like math,
00:04:17 just the problem solving aspect.
00:04:20 And so computing was problem solving applied.
00:04:25 And so that’s always kind of been the draw,
00:04:28 kind of coupled with the mathematics.
00:04:30 Did you ever see the computer as like an extension
00:04:33 of your mind, like something able to achieve?
00:04:36 Not till later.
00:04:37 Okay.
00:04:38 Yeah, not then.
00:04:39 It’s just like a little set of puzzles
00:04:40 that you can play with and you can play with math puzzles.
00:04:43 Yeah, it was too rudimentary early on.
00:04:46 Like it was sort of, yeah, it was a lot of work
00:04:49 to actually take a thought you’d have
00:04:51 and actually get it implemented.
00:04:53 And that’s still work, but it’s getting easier.
00:04:56 And so yeah, I would say that’s definitely
00:04:58 what attracted me to Python,
00:04:59 is that that was more real, right?
00:05:02 I could think in Python.
00:05:04 Speaking of foreign languages,
00:05:05 I only speak one other language fluently besides English,
00:05:08 which is Spanish.
00:05:09 And I remember the day when I would dream in Spanish
00:05:11 and you start to think in that language.
00:05:13 And then you actually, I do definitely believe
00:05:15 that language limits or expands your thinking.
00:05:19 There’s some languages that actually lead you
00:05:21 to certain thought processes.
00:05:23 Yeah, like, so I speak Russian fluently
00:05:27 and that’s certainly a language that leads you
00:05:30 down certain thought processes.
00:05:33 Well, yeah, I mean, there’s a history
00:05:36 of the two world wars of millions of people starving
00:05:41 to death or near to death throughout its history
00:05:44 of suffering, of injustice, like this promise sold
00:05:48 to the people and then the carpet
00:05:50 or whatever is swept from under them.
00:05:53 And it’s like broken promises.
00:05:54 And all of that pain and melancholy is in the language,
00:05:58 the sad songs, the sad hopeful songs,
00:06:01 the over romanticized, like, I love you, I hate you,
00:06:05 the sort of the swings between all the various spectrums
00:06:09 of emotion, so that’s all within the language.
00:06:13 The way it’s twisted, there’s a strong culture
00:06:18 of rhyming poetry, so like the bards,
00:06:20 like the sync, there’s a musicality to the language too.
00:06:24 Did Dostoevsky write in Russian?
00:06:27 Yeah, so like Dostoevsky, Tolstoy, all the,
00:06:32 all the.
00:06:32 The ones that I know about, which are translated
00:06:34 and I’m curious how the translations.
00:06:36 So Dostoevsky did not use the musicality
00:06:40 of the language too much.
00:06:42 So it actually translates pretty well
00:06:44 because it’s so philosophically dense
00:06:46 that the story does a lot of the work,
00:06:48 but there’s a bunch of things that are untranslatable.
00:06:51 Certainly the poetry is not translatable.
00:06:53 I actually have a few conversations coming up offline
00:06:57 and also in this podcast with people
00:06:59 who’ve translated Dostoevsky.
00:07:01 And people who work in this field
00:07:06 know how difficult that is.
00:07:07 Sometimes you can spend months thinking
00:07:10 about a single sentence, right?
00:07:12 In context, like, cause there’s just the magic
00:07:15 captured by that sentence and how do you translate
00:07:17 just in the right way?
00:07:18 Because those words can be really powerful.
00:07:22 There’s a famous line,
00:07:24 beauty will save the world from Dostoevsky.
00:07:27 You know, there’s so many ways to translate that.
00:07:29 And you’re right, the language gives you the tools
00:07:32 with which to tell the story,
00:07:34 but it also leads your mind down certain trajectories
00:07:37 and paths to where over time,
00:07:39 as you think in that language,
00:07:41 you become a different human being.
00:07:42 Yes. Yeah.
00:07:43 Yeah, that’s a fascinating reality, I think.
00:07:45 I know people have explored that,
00:07:47 but it’s just rediscovered.
00:07:49 Well, we don’t, we live in our own like little pockets.
00:07:52 Like this is the sad thing is I feel like unfortunately,
00:07:56 given time and given getting older,
00:07:59 I’ll never know China, the Chinese world,
00:08:03 because I don’t truly know the language.
00:08:05 Same with Japanese, I don’t truly know Japanese
00:08:08 and Portuguese and Brazil,
00:08:10 that whole South American continent.
00:08:12 Like, yeah, I’ll go to Brazil and Argentina,
00:08:14 but will I truly understand the people
00:08:17 if I don’t understand the language?
00:08:18 It’s sad because I wonder how much,
00:08:23 how many geniuses we’re missing
00:08:25 because so much of the scientific world,
00:08:28 so much of the technical world is in English,
00:08:31 and so much of it might be lost
00:08:33 because it’s just we don’t have the common language.
00:08:36 I completely agree.
00:08:36 I’m very much in that vein of there’s a lot of genius
00:08:40 out there that we miss,
00:08:41 and it’s sort of fortunate when it bubbles up
00:08:45 into something that we can understand or process,
00:08:48 there’s a lot we miss.
00:08:50 So I tend to lean towards really loving democratization
00:08:54 or things that empower people
00:08:55 and I’m very resistant to sort of authoritarian structures.
00:09:00 Fundamentally for that reason,
00:09:01 well, several reasons, but it just hurts us.
00:09:04 We’re soft.
00:09:06 So speaking of languages that empower you,
00:09:09 so Python was the first language for me
00:09:11 that I really enjoyed thinking in, as you said.
00:09:16 Sounds like you shared my experience too.
00:09:18 So when did you first,
00:09:19 do you remember when you first kind of connected with Python,
00:09:21 maybe even fell in love with Python?
00:09:23 It’s a good question.
00:09:24 It was a process.
00:09:25 It took about a year.
00:09:26 I first encountered Python in 1997.
00:09:29 I was a graduate student studying biomedical engineering
00:09:31 at the Mayo Clinic.
00:09:32 And I had previously,
00:09:34 I’d been involved in taking information from satellites.
00:09:39 I was an electrical engineering student
00:09:41 used to taking information
00:09:42 and trying to get something out of it,
00:09:44 doing some data processing, getting information out of it.
00:09:46 And I’d done that in MATLAB.
00:09:47 I’d done that in Perl.
00:09:49 I’d done that in scripting on a VMS.
00:09:52 There’s actually a VAX VMS system,
00:09:54 they had their own little scripting tools around Fortran.
00:09:57 Done a lot of that.
00:09:58 And then as a graduate student,
00:10:00 I was looking for something and encountered Python.
00:10:04 And because Python had arrays,
00:10:06 it had two things that made me not filter it away.
00:10:09 Because I was filtering a bunch of stuff,
00:10:10 like Yorick, I looked at Yorick,
00:10:11 I looked at a few other languages that were out there
00:10:14 at the time in 1997, but it had arrays.
00:10:17 There’s a library called Numeric
00:10:19 that had just been written in 95,
00:10:20 like not very, not too much earlier.
00:10:23 By an MIT alum, Jim Hugunin.
00:10:26 You know, and I went back and read the mailing list
00:10:29 to see the history of how it grew.
00:10:30 And there was a very interesting,
00:10:31 it’s fascinating to do that actually,
00:10:32 to see how this emergent cooperation,
00:10:36 unstructured cooperation happens in the open source world
00:10:39 that led to a lot of this collective programming,
00:10:43 which is something maybe we might get into a little later,
00:10:45 but what that looks like.
00:10:46 What gap did Numeric fill?
00:10:48 Numeric filled the gap of having an array object.
00:10:50 There was no array object.
00:10:51 There was no array.
00:10:52 There was a one dimensional byte concept,
00:10:55 but there was no n-dimensional,
00:10:57 two, three, four dimensional tensor, as they call it now.
00:11:00 I’m still in the category that a tensor is another thing
00:11:03 and it’s just an ndarray we should call it,
00:11:05 but kind of lost that battle.
00:11:08 There’s many battles in this world,
00:11:10 some of which we win, some we lose.
00:11:12 That’s exactly right.
00:11:13 So, but it had no math to it.
00:11:17 So Numeric had math and a basic way to think in arrays.
00:11:20 So I was looking for that,
00:11:21 and it had complex numbers,
00:11:24 which a lot of programming languages don’t have.
00:11:26 And you can see it because,
00:11:28 if you’re just a computer scientist,
00:11:29 you think, ah, complex numbers are just two floats.
00:11:32 So you can, people can build that on.
00:11:34 But in practice, a complex number
00:11:36 is one of the significant algebras
00:11:38 that helps connect a lot of physical
00:11:40 and mathematical ideas,
00:11:42 particularly FFT for an electrical engineer.
00:11:45 And it’s a really important concept
00:11:48 and not having it means you have to develop it
00:11:50 several times and those times may not share an approach.
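As a rough illustration of why a built-in complex type matters for Fourier work, here is a minimal sketch of a naive discrete Fourier transform in plain Python. The function name `dft` and the test signal are assumptions made for this example; this is not the algorithm an FFT library uses, only the textbook O(n^2) definition written with the language's native complex numbers.

```python
import cmath
import math

def dft(samples):
    """Naive O(n^2) discrete Fourier transform of a sequence of numbers."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]

# One full cycle of a cosine sampled at 8 points.
signal = [math.cos(2 * math.pi * t / 8) for t in range(8)]
spectrum = dft(signal)

# Magnitudes per frequency bin; the cosine shows up in bins 1 and 7.
magnitudes = [round(abs(c), 6) for c in spectrum]
```

Because the complex type is shared by the whole language, `abs`, multiplication, and `cmath.exp` all agree on what a complex number is, which is the "shared abstraction" point made above.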
00:11:54 One of the common things in programming,
00:11:55 one of the things programming enables is abstractions.
00:11:59 But when you have shared abstractions, it’s even better.
00:12:01 It sort of gets to the level of language
00:12:02 of actually we all think of this the same way,
00:12:05 which is both powerful and dangerous, right?
00:12:07 Because powerful in that we now can quickly
00:12:11 make bigger and higher level things
00:12:13 on top of those abstractions, and dangerous
00:12:14 because it also limits us as to the things
00:12:17 we maybe left behind in producing that abstraction,
00:12:20 which is at the heart of programming today
00:12:21 and actually building around the programming world.
00:12:24 I think it’s a fascinating philosophical topic.
00:12:26 Yeah, they will continue for many years, I think.
00:12:28 They’ll continue for many years.
00:12:29 As we build more and more and more abstractions.
00:12:31 Yes, I often think about, you know,
00:12:32 we have a world that’s built on these abstractions
00:12:35 that were they the only ones possible?
00:12:37 Certainly not, but they led to,
00:12:39 you know, it’s very hard to do it differently.
00:12:42 Like there’s an inertia that’s very hard to,
00:12:44 you know, push out, push away from.
00:12:47 That has implications for things like,
00:12:49 you know, the Julia language,
00:12:50 which you have heard of, I’m sure.
00:12:52 And I’ve met the creators and I liked Julia.
00:12:55 It’s a really cool language,
00:12:56 but they struggle kind of against the,
00:12:59 just the tide of like this inertia of people using Python.
00:13:03 And, you know, there’s strategies to approach that,
00:13:05 but nonetheless, it’s a phenomenon.
00:13:07 And sometimes, so I love complex numbers
00:13:09 and I love arrays, so I looked at Python.
00:13:12 And then I had the experience, I did some stuff in Python
00:13:15 and I was just doing my PhD.
00:13:16 So I was out, my focus was on,
00:13:19 I was actually doing a combination of MRI and ultrasound
00:13:22 and looking at a phenomenon called elastography,
00:13:24 which is you push waves into the body
00:13:27 and observe those waves, like you can actually measure them.
00:13:30 And then you do mathematical inversion
00:13:32 to see what the elasticity is.
00:13:35 And so that’s the problem I was solving
00:13:36 is how to do that with both ultrasound and MRI.
00:13:39 I needed some tool to do that with.
00:13:41 So I was starting to use Python in 97.
00:13:44 In 98, I went back, looked at what I’d written
00:13:47 and realized I could still understand it,
00:13:49 which is not the experience I’d had
00:13:50 when doing Perl in 95, right?
00:13:53 I’d done the same thing and then I looked back
00:13:55 and I’d forgotten what I was even saying.
00:13:58 Now, you know, I’m not saying, so that made me think,
00:14:00 hey, this may work, I like this.
00:14:02 This is something I can retain
00:14:04 without becoming an expert per se.
00:14:07 And so that led me to go, I’m gonna push more into this.
00:14:10 And then that 98 was kind of when I started
00:14:14 to fall in love with Python, I would say.
00:14:18 A few peculiar things about Python.
00:14:20 So maybe compare it to Perl,
00:14:22 compare it to some of the other languages.
00:14:24 So there’s no braces.
00:14:26 Yeah.
00:14:27 So space is used, indentation, I should say,
00:14:31 is used as part of the language.
00:14:33 Yeah, right.
00:14:35 So did you, I mean, that’s quite a leap.
00:14:39 Were you comfortable with that leap
00:14:41 or were you just very open minded?
00:14:42 It’s a good question.
00:14:43 I was open minded, so I was cognizant of the concern.
00:14:48 And it definitely has, it has specific challenges.
00:14:52 You know, cut and pasting.
00:14:53 For example, when you’re cut and pasting code,
00:14:55 and if your editors aren’t supportive of that,
00:14:57 if you’re putting it into a terminal,
00:14:58 and particularly in the past when terminals
00:15:01 didn’t necessarily have the intelligence to manage it.
00:15:03 Now, IPython and Jupyter Notebooks
00:15:05 handle that just fine, so there’s really no problem.
00:15:06 But in the past, it created some challenges,
00:15:08 formatting challenges, also mixed tabs and spaces.
00:15:12 If editors weren’t, you weren’t clear
00:15:14 on what was happening, you would have these issues.
00:15:16 So there were really concrete reasons about it
00:15:19 that I heard and understood.
00:15:20 I never really encountered a problem with it personally.
00:15:23 Like, it was occasional annoyances,
00:15:26 but I really liked the fact
00:15:28 that it didn’t have all these extra characters, right?
00:15:31 That these extra characters didn’t show up
00:15:33 in my visual field when I was just trying
00:15:35 to process understanding a snippet of code.
00:15:38 Yeah, there’s a cleanness to it.
00:15:39 But, I mean, the idea is supposed to be
00:15:41 that Perl also has a cleanness to it
00:15:43 because of the minimalism of how many characters
00:15:46 it takes to express a certain thing.
00:15:48 So it’s very compact.
00:15:49 But what you realize is that with that compactness comes
00:15:53 a culture that prizes compactness,
00:15:57 and so the code gets more and more compact
00:15:58 and less and less readable to a point where it’s like,
00:16:03 like, to be a good programmer in Perl,
00:16:05 you write code that’s basically unreadable.
00:16:07 There’s a culture, like.
00:16:09 Correct, and you’re proud of it.
00:16:10 Yeah, you’re proud of it.
00:16:12 Right, exactly, and it’s like, feels good.
00:16:14 And it’s really selective.
00:16:16 It means you have to be an expert in Perl to understand it.
00:16:20 Whereas Python allowed you not to have to be an expert.
00:16:22 You didn’t have to take all this brain energy.
00:16:24 You could leverage, what I say,
00:16:25 you could leverage your English language center,
00:16:28 which you’re using all the time.
00:16:29 I’ve wondered about other languages,
00:16:31 particularly non Latin based languages.
00:16:34 Latin based languages with the characters are at least similar.
00:16:37 I think people have an easier time,
00:16:38 but I don’t know what it’s like to be a Japanese
00:16:41 or a Chinese person trying to learn different syntax.
00:16:46 Like, what would computer programming look like in that?
00:16:49 I haven’t looked at that at all,
00:16:50 but it certainly doesn’t,
00:16:52 you know, leveraging your Chinese language center,
00:16:54 I’m not sure Python or any programming does that.
00:16:57 But that was a big deal.
00:16:58 The fact that it was accessible, I could be a scientist.
00:17:00 What I really liked is many programming languages
00:17:02 really demand a lot of you, and you can get a lot,
00:17:04 you know, you do a lot if you learn it.
00:17:07 But Python enables you to do a lot
00:17:08 without demanding a lot of you.
00:17:11 There’s nuance to that statement,
00:17:13 but it certainly was, it’s more accessible.
00:17:15 So more people could actually, as a scientist,
00:17:18 as somebody who, or an engineer,
00:17:19 who was trying to solve another problem
00:17:21 besides pure programming,
00:17:23 I could still use this language and get things done
00:17:26 and be happy about it.
00:17:27 And I was also comfortable in C at that time.
00:17:30 And MATLAB, you did a little bit of that.
00:17:31 And MATLAB, I did a lot before that, exactly.
00:17:33 So I was comfortable in,
00:17:34 those three languages were really the tools I used
00:17:37 during my studies and schooling.
00:17:40 But to your point about language helping you think,
00:17:42 one of the big things about MATLAB was it was,
00:17:44 and APL before it, I don’t know if you remember APL.
00:17:48 APL is actually the predecessor of array based programming,
00:17:51 which I think is really an underappreciated,
00:17:54 if I talk to people who are just steeped
00:17:55 in computer programming, computer science,
00:17:57 like most of the people that Microsoft has hired
00:17:59 in the past, for example,
00:18:01 Microsoft as a company generally did not understand
00:18:03 array based programming.
00:18:05 Like culturally, they didn’t understand it.
00:18:06 So they kept missing the boat,
00:18:08 kept missing the understanding of what this was.
00:18:11 They’ve gotten better,
00:18:12 but there’s still a whole culture of folks
00:18:14 that doesn’t, programming, that’s systems programming
00:18:17 or web programming or lists and maps.
00:18:20 And what about an n dimensional array?
00:18:22 Oh yeah, that’s just an implementation detail.
00:18:24 Well, you can think that,
00:18:26 but then actually if you have that as a construct,
00:18:28 you actually think differently.
00:18:29 APL was the first language to understand that.
00:18:31 And it was in the sixties, right?
00:18:33 The challenge of APL is APL had very dense,
00:18:36 not only glyphs, like new characters, new glyphs,
00:18:39 but they even had a new keyboard
00:18:40 because to produce those glyphs,
00:18:42 this is back in the early days in computing
00:18:43 when the QWERTY keyboard maybe wasn’t as established,
00:18:47 like, well, we can have a new keyboard, no big deal.
00:18:50 But it was a big deal and it didn’t catch on.
00:18:52 And the language APL was very much like Perl,
00:18:56 in that people would pride themselves on how much,
00:18:58 could they write the Game of Life
00:18:59 in 30 characters of APL.
00:19:03 APL has characters that mean summation
00:19:06 and they have adverbs,
00:19:08 they would have adjectives and these things called adverbs,
00:19:10 which are like methods, like reduction,
00:19:12 reduction would be an adverb on an add operator, right?
00:19:15 So, but doing, using these tools you could construct
00:19:18 and then you start to think at that level,
00:19:20 you think in n dimensions is something I like to say,
00:19:22 and you start to think differently about data at that point.
00:19:25 Now you’re, it really helps.
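The "adverb on an operator" idea can be sketched in Python's standard library, where `functools.reduce` plays the role of the adverb and `operator.add` the verb it modifies. This is only an analogy under that assumption, not a rendering of APL's actual syntax or semantics.

```python
from functools import reduce
from itertools import accumulate
import operator

values = [1, 2, 3, 4]

# "Reduce" applied to "add": collapse the whole sequence with +.
total = reduce(operator.add, values)

# The same adverb applied to a different operator gives a product.
product = reduce(operator.mul, values)

# A related adverb, "scan", keeps the running intermediate results.
running = list(accumulate(values, operator.add))
```

The point of the construct is that the reduction is written once and reused with any binary operator, instead of writing a new loop for every kind of summary.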
00:19:27 Yeah, I mean, outside of programming,
00:19:30 if you really internalize linear algebra as a course,
00:19:33 I mean, it’s philosophically allows you
00:19:35 to think of the world differently.
00:19:37 It’s almost like liberating, you don’t have to,
00:19:39 you don’t have to think about the individual numbers
00:19:42 in the n dimensional array.
00:19:44 You could think of it as an object in itself
00:19:46 and all of a sudden this world can open up.
00:19:48 You’re saying MATLAB and APL were like the early C,
00:19:52 I don’t know if many languages got that right ever.
00:19:54 No, no, no they didn’t.
00:19:56 Even still.
00:19:57 Even still, I would say.
00:19:58 I mean, NumPy is an inheritor of the traditions
00:20:02 of, I would say, APL. J was another version that was,
00:20:06 what it did is not have the glyphs,
00:20:08 just have short characters,
00:20:09 but still a Latin keyboard could type them.
00:20:11 And then Numeric inherited from that
00:20:14 in terms of let’s add arrays plus broadcasting
00:20:17 plus methods, reduction,
00:20:19 even some of the language like rank is a concept
00:20:21 that was in Python and is still in Python
00:20:24 for the number of dimensions, right?
00:20:27 That’s different than say the rank of a matrix
00:20:29 which people think of as well.
00:20:31 So it came from that tradition,
00:20:33 but NumPy is a very pragmatic, practical tool.
00:20:37 NumPy inherited from Numeric
00:20:39 and we can get to where NumPy came from
00:20:40 which is the current array,
00:20:43 at least current as of 2015, 2017.
00:20:46 Now there’s a ton of them over the past two or three years.
00:20:49 We can get into that too.
00:20:50 So if we just linger on the early days
00:20:52 of what was your favorite feature of Python?
00:20:56 Do you remember like what?
00:20:58 So it’s so interesting to linger on like the,
00:21:02 what really makes you connect with a language?
00:21:06 I’m not sure it’s obvious to introspect that.
00:21:09 No, it isn’t.
00:21:10 And I’ve thought about that at some length.
00:21:12 I think definitely the fact that I could read it later,
00:21:16 that I could use it productively
00:21:18 without becoming an expert.
00:21:19 Other languages I had to put more effort into.
00:21:22 That’s like an empirical observation.
00:21:23 Like you’re not analyzing any one aspect of the language.
00:21:26 It just seems time after time when you look back,
00:21:29 it’s somehow readable.
00:21:30 It’s somehow readable.
00:21:31 Then it was sort of, I could take executable English
00:21:35 and translate it to Python more easily.
00:21:36 Like I didn’t have to go, there was no translation layer.
00:21:39 As an engineer or as a scientist,
00:21:41 I could think about what I wanted to do.
00:21:43 And then the syntax wasn’t that far behind it, right?
00:21:46 Now there are some warts there still.
00:21:49 It wasn’t perfect.
00:21:50 Like there’s some areas where I’m like,
00:21:51 ah, it’d be better if this were different
00:21:52 or if this were different.
00:21:54 Some of those things got added to the language too.
00:21:56 I was really grateful for some of the early pioneers
00:21:58 in the Python ecosystem back,
00:22:00 because Python got written in 91.
00:22:01 That’s when the first version came out.
00:22:03 But Guido was very open to users.
00:22:06 And one of the sets of users were people like Jim Hugunin
00:22:08 and David Ascher and Paul Dubois and Konrad Hinsen.
00:22:13 These were people that were on the mailing list.
00:22:15 And they were just asking for things like,
00:22:16 hey, we really should have complex numbers in this language.
00:22:19 So let’s, you know, there’s a j, there’s a 1j, right?
00:22:22 And the fact that they went the engineering route of j
00:22:24 is interesting.
00:22:26 I don’t think that’s entirely favoring engineers.
00:22:28 I think it’s because i is so often used
00:22:30 as the index of a for loop.
00:22:32 So I think that’s actually why.
00:22:34 Probably, I mean, there’s a pragmatic aspect.
00:22:36 But the fact that complex numbers were there, I love that.
00:22:39 The fact that I could write in the array constructs
00:22:41 and that reduction was there,
00:22:42 very simple to write summations and broadcasting was there.
00:22:46 I could do addition of whole arrays.
00:22:49 So that was cool.
00:22:50 Those are some things I loved about it.
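To make "addition of whole arrays" and broadcasting concrete, here is a toy sketch in plain Python. The `Array` class is hypothetical and exists only for this illustration; it is not how Numeric or NumPy are implemented, and it handles only one dimension and scalar broadcasting.

```python
class Array:
    """Toy 1-D array; a hypothetical stand-in, not Numeric's or NumPy's design."""

    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        if isinstance(other, Array):
            if len(other.items) != len(self.items):
                raise ValueError("shapes do not match")
            # Element-wise addition of two whole arrays.
            return Array(a + b for a, b in zip(self.items, other.items))
        # Broadcasting: a scalar is stretched across every element.
        return Array(a + other for a in self.items)

    def sum(self):
        """Reduction: collapse the array with addition."""
        return sum(self.items)

a = Array([1, 2, 3])
b = Array([10, 20, 30])

whole = a + b      # no explicit loop at the call site
shifted = a + 1j   # complex literal with the j suffix, broadcast over a
```

Once `a + b` means "add every element", the loop disappears from the code a scientist reads, which is the readability benefit being described.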
00:22:52 I don’t know what to start talking to you about
00:22:54 because you’ve created so many incredible projects
00:22:57 that basically changed the whole landscape of programming.
00:23:00 But okay, let’s start with,
00:23:02 let’s go chronologically with SciPy.
00:23:06 You created SciPy over two decades ago now?
00:23:09 Yes, yes, I love to talk about SciPy.
00:23:11 SciPy was really my baby.
00:23:12 What is it?
00:23:14 What was its goal?
00:23:15 What is its goal?
00:23:16 How does it work?
00:23:17 Yeah, fantastic.
00:23:18 So SciPy was effectively, here I am using Python
00:23:21 to do stuff that I previously used MATLAB to do.
00:23:24 And I was using Numeric, which is an array library
00:23:26 that made a lot of it possible.
00:23:28 But there’s things that were missing.
00:23:29 Like I didn’t have an ordinary differential equation solver
00:23:32 I could just call, right?
00:23:33 I didn’t have integration.
00:23:35 Hey, I wanted to integrate this function.
00:23:37 Okay, well, I don’t have just a function
00:23:38 I can call to do that.
00:23:40 These are things I remember being critical things
00:23:42 that I was missing.
00:23:43 Optimization.
00:23:44 I just wanna pass a function to an optimizer
00:23:46 and have it tell me what the optimal value is.
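The kind of interface being asked for here, "pass a function in, get a number back", can be sketched in a few lines of plain Python. The names `integrate` and `minimize` and their parameters are assumptions for this example, and the methods (composite Simpson's rule and golden-section search) were chosen for brevity; they are not the routines SciPy wraps.

```python
import math

def integrate(f, a, b, n=1000):
    """Composite Simpson's rule for f on [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate between weight 4 (odd) and 2 (even).
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def minimize(f, lo, hi, tol=1e-8):
    """Golden-section search for a minimum of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) < f(d):
            hi = d  # the minimum lies in [lo, d]
        else:
            lo = c  # the minimum lies in [c, hi]
    return (lo + hi) / 2

area = integrate(math.sin, 0.0, math.pi)                 # close to 2
best_x = minimize(lambda x: (x - 2.0) ** 2, 0.0, 5.0)    # close to 2
```

The caller only supplies a Python callable and a range; everything numerical stays inside the library function.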
00:23:50 Those are things I’m like, well,
00:23:51 why don’t we just write a library that adds these tools?
00:23:54 And I started to post on the mailing list
00:23:55 and there’d previously been, people had discussed,
00:23:58 I remember Konrad Hinsen saying,
00:23:59 wouldn’t it be great if we had this optimizer library
00:24:00 or David Ascher would say this stuff.
00:24:02 And I’m an ambitious, ambitious is the wrong word,
00:24:06 an eager person, with probably more time than sense.
00:24:11 I was a poor graduate student.
00:24:13 My wife thinks I’m working on my PhD and I am,
00:24:15 but part of the PhD that I loved
00:24:17 was the fact that it’s exploratory.
00:24:19 You’re not just taking orders,
00:24:21 fulfilling a list of things to do,
00:24:23 you’re trying to figure out what to do.
00:24:25 And so I thought, well, I’m writing tools
00:24:27 for my own use in a PhD,
00:24:29 so I’ll just start this project.
00:24:32 And so in 99, 98 was when I first started
00:24:34 to write libraries for Python.
00:24:36 Definitely when I fell in love with Python in 98,
00:24:38 I thought, oh, well, there’s just a few things missing.
00:24:39 Like, oh, I need a reader to read DICOM files.
00:24:42 I was in medical imaging and DICOM was a format
00:24:44 that I want to be able to load that into Python.
00:24:46 Okay, how do I write a reader for that?
00:24:48 So I wrote something called, it was an IO package, right?
00:24:51 And that was my very first extension module, which is C.
00:24:55 So I wrote C code to extend Python
00:24:57 so that in Python I could write things more easily.
00:24:59 That combination kind of hooked me.
00:25:02 It was the idea that I could,
00:25:03 here’s this powerful tool I can use as a scripting language
00:25:05 and a high level language to think about,
00:25:07 but that I can extend easily, easily in C,
00:25:11 easily for me because I knew enough C.
00:25:13 And then Guido had written a link.
00:25:15 I mean, the only, the hard part of extending Python
00:25:17 was the way memory management works,
00:25:19 and you have to do reference counting.
00:25:21 And so there’s a tracking of reference counting
00:25:23 you have to do manually.
00:25:25 And if you don’t, you have memory leaks.
00:25:27 And so that’s hard.
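The reference counting described here is what a C extension author manages by hand with the C API's `Py_INCREF` and `Py_DECREF`. From Python the same bookkeeping can only be observed, not managed. A minimal sketch, assuming CPython (`sys.getrefcount` is CPython-specific) and using `Scan` as a hypothetical stand-in class that does not come from the conversation:

```python
import sys


class Scan:
    """Hypothetical stand-in for an object an extension module might hand back."""


def refcount_deltas():
    scan = Scan()
    baseline = sys.getrefcount(scan)

    holders = [scan]                  # the list now owns one more reference
    held = sys.getrefcount(scan) - baseline

    holders.clear()                   # the list gives its reference back
    released = sys.getrefcount(scan) - baseline
    return held, released


held, released = refcount_deltas()
print(held, released)
```

In C, a missing decrement leaves the count above zero forever, so the object is never freed; that is the memory leak mentioned above.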
00:25:29 Plus then C, you know, it’s just much more,
00:25:31 you have to put more effort into it.
00:25:32 It’s not just, I have to now think about pointers
00:25:34 and I have to think about stuff that is different.
00:25:37 I have to kind of,
00:25:38 you’re like putting a new cartridge in your brain.
00:25:40 Like, okay, I’m thinking about MRI.
00:25:42 Now I’m thinking about programming.
00:25:43 And there are distinct modules
00:25:45 you end up having to think about.
00:25:46 So it’s harder.
00:25:47 And when I was just in Python,
00:25:48 I could just think about MRI and high level writing,
00:25:51 but I could do that.
00:25:52 And that kind of, I liked it.
00:25:54 I found that to be enjoyable and fun.
00:25:55 And so I ended up, oh,
00:25:57 well, let me just add a bunch of stuff to Python
00:25:59 to do integration.
00:26:00 Well, and the cool thing is,
00:26:01 is that the power of the internet,
00:26:03 just looking around and I found,
00:26:04 oh, there’s this Netlib,
00:26:06 which has hundreds of Fortran routines
00:26:08 that people have written in the 60s and the 70s and the 80s
00:26:12 in Fortran 77, fortunately, it wasn’t Fortran 66.
00:26:14 So it had been ported to Fortran 77.
00:26:18 And Fortran 77 is actually a really great language.
00:26:21 Fortran 90 probably is my favorite Fortran
00:26:24 because it’s also, it’s got complex numbers,
00:26:26 got arrays and it’s pretty high level.
00:26:27 Now, the problem with it
00:26:28 is you’d never want to write a program in Fortran 90
00:26:31 or Fortran 77,
00:26:32 but it’s totally fine to write a subroutine in, right?
00:26:34 And so, and then Fortran kind of got a little off course
00:26:37 when they tried to compete with C++.
00:26:39 But at the time,
00:26:40 I just want libraries to do something like,
00:26:42 oh, here’s an ordinary differential equation.
00:26:43 Here’s integration.
00:26:44 Here’s Runge-Kutta integration.
00:26:46 Already done.
00:26:47 I don’t have to think about that algorithm.
00:26:48 I mean, you could,
00:26:49 but it’s nice to have somebody who’s already done one
00:26:51 and tested it.
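For readers who have not met Runge-Kutta integration, the idea can be sketched in a few lines of pure Python. This is the classical fourth-order method written for illustration only; it is not the Netlib Fortran code discussed here, and the function name `rk4` and the test equation dy/dt = -y are assumptions chosen because the exact answer, exp(-t), is known:

```python
import math


def rk4(f, t0, y0, t_end, steps):
    """Integrate dy/dt = f(t, y) from t0 to t_end with classical RK4."""
    h = (t_end - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)   # weighted average of the slopes
        t += h
    return y


# Exponential decay: the exact solution at t = 1 is exp(-1).
approx = rk4(lambda t, y: -y, 0.0, 1.0, 1.0, 100)
error = abs(approx - math.exp(-1.0))
```

A tested library routine adds what this sketch leaves out, such as adaptive step sizes and error control, which is the appeal of reusing one.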
00:26:51 And so I sort of started this journey in 98, really.
00:26:55 If you look back at the mailing list,
00:26:55 there’s sort of this productive era of me
00:26:59 writing an extension module
00:27:01 to connect Runge-Kutta integration to Python
00:27:04 and making an ordinary differential equation solver.
00:27:06 And then releasing that as a package.
00:27:09 So we could call ODEPACK, I think I called it then.
00:27:11 QUADPACK.
00:27:12 And then I just made these packages.
00:27:14 Eventually that became Multipack
00:27:16 because they’re originally modular.
00:27:17 You can install them separately.
00:27:19 But a massive problem in Python
00:27:20 was actually just getting your stuff installed.
00:27:23 At the time, releasing software for me,
00:27:25 like today it’s people think, what does that mean?
00:27:27 Well, then it meant some poorly written webpage.
00:27:30 I had some bad webpage up and I put a tarball,
00:27:33 just a GZIP tarball of source code.
00:27:35 That was the release.
00:27:37 But okay, can we just stay on that?
00:27:39 Because the community aspect
00:27:43 of creating the package and sharing that, that’s rare.
00:27:47 That, to have, to both have the, at that time,
00:27:50 so like the raw.
00:27:51 Yeah, it was pretty early, yeah.
00:27:52 Oh, well, not rare.
00:27:54 Maybe you can correct me on this,
00:27:57 but it seems like in the scientific community,
00:27:59 so many people, you were basically solving the problems
00:28:02 you needed to solve to process the particular application,
00:28:07 the data that you need.
00:28:08 And to also have the mind
00:28:10 that I’m going to make this usable for others, that’s.
00:28:15 I would say I was inspired.
00:28:16 I’d been inspired by Linux,
00:28:18 been inspired by Linus and him making his code available.
00:28:21 And I was starting to use Linux at the time.
00:28:23 And I went, this is cool.
00:28:24 So I’d kind of been previously primed that way.
00:28:27 And generally I was into science
00:28:29 because I liked the sharing notion.
00:28:30 I liked the idea of, hey, let’s,
00:28:32 if collectively we build knowledge and share it,
00:28:34 we can all be better off.
00:28:35 Okay, so you were energized by that idea.
00:28:37 So I was energized by that idea already, right?
00:28:39 And I can’t deny that I was.
00:28:40 I sort of had this very,
00:28:42 I liked that part of science, that part of sharing.
00:28:45 And then all of a sudden, oh, wait, here’s something.
00:28:47 And here’s something I could do.
00:28:49 And then I slowly over years learned how to share better
00:28:52 so that you could actually engage more people faster.
00:28:55 One of the key things was actually giving people a binary
00:28:57 they could install, right?
00:28:58 So that it wasn’t just your source code, good luck.
00:29:01 Compile this and then.
00:29:02 It’s compiled, ready to install, just, you know.
00:29:05 So in fact, a lot of the journey from 98,
00:29:07 even through 2012 when I started Anaconda was about that.
00:29:10 Like it’s why, you know, it’s really the key
00:29:13 as to why a scientist with dreams of doing MRI research
00:29:17 ended up starting a software company
00:29:19 that installs software.
00:29:22 I work with a few folks now that don’t program
00:29:26 like on the creative side and the video side,
00:29:28 the audio side.
00:29:29 And because my whole life is running on scripts,
00:29:32 I have to try to get them,
00:29:34 I have the task of teaching them
00:29:35 how to do Python enough to run the scripts.
00:29:39 And so I’ve been actually facing this,
00:29:40 whether it’s Anaconda or something else, with the task of
00:29:44 how do I minimally explain basically to my mom
00:29:46 how to write a Python script.
00:29:48 And it’s an interesting challenge.
00:29:50 I have to, it’s a to do item for me to figure out like,
00:29:53 what is the minimal amount of information I have to teach?
00:29:56 What are the tools you use that one, you enjoy it,
00:29:59 two, you’re effective at it.
00:30:00 And they’re related, those are two related questions.
00:30:02 And then the debugging, like the iterative process
00:30:05 of running the script to figure out what the error is,
00:30:07 maybe even for some people to do the fix yourself.
00:30:11 So do you compile it?
00:30:12 Do you, like how do you distribute that code to them?
00:30:15 And it’s interesting because I think
00:30:18 it’s exactly what you’re talking about.
00:30:20 If you increase the circle of empathy,
00:30:24 the circle of people that are able to use your programs,
00:30:28 you increase its, like, effectiveness and its power.
00:30:32 And so you have to think, can I write scripts?
00:30:37 Can I write programs that can be used by medical engineers,
00:30:40 by all kinds of people that don’t know programming
00:30:43 and actually maybe plant a seed,
00:30:46 have them catch the bug of programming
00:30:48 so that they start on a journey.
00:30:50 That’s a huge responsibility.
00:30:51 And ultimately it has to do with the Amazon one click buy.
00:30:55 Like how frictionless can you make the early steps?
00:30:58 Frictionless is actually really key.
00:31:00 To grow any community, any friction point,
00:31:03 you’re just gonna lose some people, right?
00:31:05 Now sometimes you may wanna intentionally do that.
00:31:09 If you’re early enough on, you need a lot of help.
00:31:11 You need people who have the skills.
00:31:13 You might actually, it’s helpful.
00:31:14 You don’t necessarily want too many users
00:31:16 as opposed to contributors if you’re early on.
00:31:20 Anyway, SciPy started in 98,
00:31:23 but it really emerged as this collection of modules
00:31:25 that I was just putting on the net.
00:31:27 People were downloading and I think I got 100 users, right?
00:31:31 By the end of that year.
00:31:32 But the fact that I got 100 users and more than that,
00:31:35 people started to email me with fixes.
00:31:39 And that was actually intoxicating, right?
00:31:41 That was the, here I’m writing papers
00:31:44 and I’m giving conferences and I get people to say hello,
00:31:46 but yeah, good job.
00:31:47 But mostly it was, you’re viewed with,
00:31:49 it’s competitive, right?
00:31:51 You publish a paper and people are like,
00:31:52 oh, it wasn’t my paper.
00:31:55 I was starting to see that sense of academic life
00:31:59 where it was so much,
00:32:00 I thought there was this cooperative effort,
00:32:01 but it sounds like we’re here just to one up each other.
00:32:04 And it’s not true across the board,
00:32:07 but a lot of that’s there.
00:32:08 But here in this world,
00:32:09 I was getting responses from people all over the world.
00:32:13 I remember Pearu Peterson in Estonia, right?
00:32:16 Was one of the first people.
00:32:17 And he sent me back this makefile,
00:32:18 cause the first thing it is, yeah, your build thing stinks
00:32:21 and here’s a better makefile.
00:32:23 Now it was a complex makefile.
00:32:24 I don’t think I ever understood that makefile actually,
00:32:26 but it worked and it did a lot more.
00:32:29 And so I said, thanks, this is cool.
00:32:30 And that was my first kind of engagement
00:32:32 with community development.
00:32:35 But the process was, he sent me a patch file.
00:32:37 I had to upload a new tarball.
00:32:39 And I just found, I really love that.
00:32:41 And the style back then was here’s a mailing list.
00:32:43 It’s very, it wasn’t as,
00:32:45 there certainly weren’t the tools that are available today.
00:32:47 It was very early on, but I really started to,
00:32:49 that’s the whole year.
00:32:50 I think I did about seven packages that year, right?
00:32:54 And then by the end of the year,
00:32:55 I collected them into a thing called Multipack.
00:32:57 So in 99, there was this thing called Multipack.
00:32:59 And that’s when a high school student,
00:33:01 no, he was a high school student at the time,
00:33:03 guy named Robert Kern,
00:33:04 took that package and made a Windows installer, right?
00:33:09 And then of course, a massive increase of usage.
00:33:12 So by the way, most of this development was under Linux.
00:33:15 Yes, yes, it was on Linux.
00:33:17 I was a Linux developer doing it on a Unix box.
00:33:20 I mean, at the time I was actually getting into,
00:33:23 I had a new hard drive,
00:33:24 did some kernel programming to make the hard drive work.
00:33:26 I mean, not programming, but modification to the kernel
00:33:28 so I could actually get a hard drive working.
00:33:31 I love that aspect of it.
00:33:32 I was also in, at school, I was building a cluster.
00:33:36 I took Mac computers and put Yellow Dog Linux on them.
00:33:40 At the Mayo Clinic, they were just,
00:33:42 they had all these Macs that were older,
00:33:43 they were just getting rid of.
00:33:44 And so I kind of got permission to go grab them together.
00:33:46 I put about 24 of them together in a cluster, in a cabinet,
00:33:50 and put Yellow Dog Linux on them all.
00:33:51 And I wrote a C++ program to do MRI simulation.
00:33:56 That was what I was doing at the same time
00:33:58 for my day job, so to speak.
00:34:01 So I was loving the whole process.
00:34:03 And the same time I was,
00:34:04 oh, I need an ordinary differential equation.
00:34:06 That’s why ordinary differential equations were key,
00:34:08 because that’s the heart of a Bloch equation
00:34:09 for simulating MRI, is an ODE solver.
00:34:12 And so that’s, but I actually did that,
00:34:15 it just happened at the same time.
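To make the link between MRI simulation and ODE solving concrete, here is a rough sketch of integrating the Bloch equations, dM/dt = γ M × B minus T1 and T2 relaxation terms, with a hand-written Runge-Kutta step. It is an illustration and not the C++ simulator described in the conversation. The relaxation times, the 10 Hz off-resonance field along z, and all function names are assumed values picked so the result can be checked against the known closed-form decay:

```python
import math


def bloch_rhs(m, omega, t1, t2, m0=1.0):
    """dM/dt = M x omega, minus T2 decay (transverse) and T1 recovery (longitudinal)."""
    mx, my, mz = m
    wx, wy, wz = omega              # gamma * B, in rad/s
    return (
        my * wz - mz * wy - mx / t2,
        mz * wx - mx * wz - my / t2,
        mx * wy - my * wx - (mz - m0) / t1,
    )


def rk4_step(f, y, h):
    """One classical fourth-order Runge-Kutta step for a tuple-valued state."""
    def shifted(k, scale):
        return tuple(yi + scale * ki for yi, ki in zip(y, k))

    k1 = f(y)
    k2 = f(shifted(k1, h / 2))
    k3 = f(shifted(k2, h / 2))
    k4 = f(shifted(k3, h))
    return tuple(
        yi + h / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    )


def simulate(m, t_end, steps, omega, t1, t2):
    """Advance the magnetization vector m from time 0 to t_end."""
    h = t_end / steps
    for _ in range(steps):
        m = rk4_step(lambda s: bloch_rhs(s, omega, t1, t2), m, h)
    return m


# Assumed values: T1 = 1 s, T2 = 0.1 s, 10 Hz off-resonance along z.
T1, T2 = 1.0, 0.1
OMEGA = (0.0, 0.0, 2 * math.pi * 10.0)

# Start just after a 90-degree pulse: all magnetization is transverse.
mx, my, mz = simulate((1.0, 0.0, 0.0), 0.1, 1000, OMEGA, T1, T2)
```

With the field along z only, the transverse magnitude should decay as exp(-t/T2) and the longitudinal component should recover as 1 - exp(-t/T1), which gives a simple check on the integrator.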
00:34:16 That’s why it was kind of what you’re working on
00:34:18 and what you’re interested in, they’re coinciding.
00:34:20 I was definitely scratching my own itch
00:34:22 in terms of building stuff.
00:34:24 And which helped in the sense that I was using it for me,
00:34:27 so at least I had one user.
00:34:28 I had one person who was like, well, no, this is better.
00:34:30 I like this interface better.
00:34:31 And I had the experience of MATLAB
00:34:33 to guide some of what those APIs might look like.
00:34:36 But you’re just doing it yourself,
00:34:37 you’re building all this stuff.
00:34:39 But with the Windows installer,
00:34:40 it was the first time I realized, oh yeah,
00:34:41 the binary installer really helps people.
00:34:43 And so that led to spending more time
00:34:46 on that side of things.
00:34:49 So around 2000, so I graduated my PhD in 2000,
00:34:52 end of year, end of 2000.
00:34:53 So 99 doing a lot of work there,
00:34:56 98 doing a lot of work there,
00:34:57 99 kind of spending more time on my PhD,
00:35:00 helping people use the tools,
00:35:02 thinking about what do I want to go from here.
00:35:04 There was a company, there was a guy actually,
00:35:05 Eric Jones and Travis Vaught.
00:35:07 They were two friends who founded a company called Enthought.
00:35:11 It’s here in Austin, still here.
00:35:13 And they, Eric contacted me at the time
00:35:16 when I was a graduate student still.
00:35:19 And he said, hey, why don’t you come down?
00:35:20 We want to build a company.
00:35:22 We’re thinking of a scientific company
00:35:25 and we want to take what you’re doing
00:35:27 and kind of add it to some stuff that he’d done.
00:35:29 He’d written some tools.
00:35:31 And then Pearu Peterson had done f2py.
00:35:32 Let’s come together and build,
00:35:34 pull this all together and call it SciPy.
00:35:36 So that’s the origin of the SciPy brand.
00:35:39 It came from Multipack
00:35:41 and a whole bunch of modules I’d written,
00:35:42 plus a few things from some other folks
00:35:44 and then pulled together in a single installer.
00:35:47 SciPy was really a distribution of Python
00:35:49 masquerading as a library.
00:35:51 How did you think about SciPy in context of Python,
00:35:54 in context of Numeric, like what?
00:35:56 So we saw SciPy as a way to make an R&D environment
00:35:59 for Python, like use Python, depended on Numeric.
00:36:03 So Numeric was the array library we depended on.
00:36:05 And then from there, extend it with a bunch of modules
00:36:08 that allowed for, and at the time,
00:36:10 the original vision of SciPy was to have plotting,
00:36:13 was to have the REPL environment
00:36:16 and kind of really a whole data environment
00:36:19 that you could then install and get going with.
00:36:21 And that was kind of the thinking.
00:36:23 It didn’t really evolve that way, right?
00:36:25 It sort of had a, for one,
00:36:27 it’s really hard to do massive scale projects
00:36:31 with open source collectives.
00:36:34 Actually, there’s sort of an intrinsic cooperation limit
00:36:38 as to which, too many cooks in the kitchen,
00:36:40 you can do amazing infrastructure work.
00:36:42 When it comes down to bringing it all together
00:36:44 into a single deliverable,
00:36:45 that actually requires a little more product management
00:36:49 that is not, that doesn’t really emerge
00:36:52 from the same dynamic.
00:36:53 So it struggled, struggled to get almost too many voices.
00:36:57 It’s hard to have everybody agree.
00:36:59 Consensus doesn’t really work at that scale.
00:37:02 You end up with politics,
00:37:03 with the same kind of things that’s happened
00:37:05 in large organizations trying to decide
00:37:07 what to do together.
00:37:09 So consensus building was challenging at scale
00:37:12 as more people came in, right?
00:37:13 Early on, it’s fine, because there’s nobody there.
00:37:15 So it works, but then as you get more successful
00:37:17 and more people use it, all of a sudden,
00:37:18 oh, there’s this scale at which this doesn’t work anymore
00:37:22 and we have to come up with different approaches.
00:37:23 So SciPy came out officially in 2001,
00:37:26 was the first release, most of the time.
00:37:28 I remember the days of getting that release ready.
00:37:31 It was a Windows installer and there were bugs
00:37:33 on how the Windows compiler handled complex numbers
00:37:36 and you were chasing segmentation faults.
00:37:38 And it was, it’s a lot of work.
00:37:40 There was a lot of effort that had nothing to do
00:37:43 with my area of study.
00:37:45 And at the same time, I had just gotten an offer.
00:37:47 So he wondered if I wanted to come down
00:37:48 and help him start that company with his friend.
00:37:51 And at the time I was like, I was intrigued,
00:37:53 but I was pursuing a path, an academic path.
00:37:56 And I had just got an offer to go and teach at my alma mater.
00:37:59 So I took that tenure track position.
00:38:02 And SciPy, and kind of, then I started to work on SciPy
00:38:05 as a professor too.
00:38:07 So that’s, I left the Mayo Clinic,
00:38:09 graduated, wrote my thesis using SciPy,
00:38:11 wrote, you know, there’s images that were created.
00:38:15 Now the plotting tool I used was something
00:38:17 from Yorick actually.
00:38:18 It was a plotting, a PLT kind of a plotting language
00:38:21 that I used.
00:38:22 Yorick is a programming language?
00:38:23 It was a programming language, had a plotting tool,
00:38:26 DISLIN, it had integration to DISLIN.
00:38:28 I ended up using DISLIN plus some of the plotting
00:38:31 from Yorick linked to from Python.
00:38:33 Anyway, it was, people don’t plot that way now,
00:38:37 but this is before, and SciPy was trying to add plotting.
00:38:40 Yeah. Right?
00:38:41 It didn’t have much success.
00:38:42 Really the success of plotting came from John Hunter,
00:38:45 who had a similar experience to my experience,
00:38:47 my kind of maverick experience as a person
00:38:49 just trying to get stuff done and kind of having more time
00:38:51 than money maybe, right?
00:38:53 And John Hunter created what?
00:38:55 Matplotlib.
00:38:56 He’s the creator of Matplotlib.
00:38:57 Yeah, so John Hunter was, you know,
00:38:59 he wasn’t a student at the time, but he was an,
00:39:00 he was working in a quant field and he said,
00:39:02 we need better plotting.
00:39:03 So he just went out and said, cool, I’ll make a new project
00:39:05 and we’ll call it Matplotlib.
00:39:06 And he released in 2001,
00:39:08 about the same time that SciPy came out
00:39:09 and it was a separate library, separate install,
00:39:12 used Numeric, SciPy used Numeric.
00:39:15 And so SciPy, you know, in 2001, we released SciPy
00:39:18 and then Enthought created a conference called SciPy,
00:39:22 which brought people together to talk about the space.
00:39:25 And that conference is still ongoing.
00:39:26 It’s one of the favorite conferences of a lot of people
00:39:28 because it’s, you know, it’s changed over the years,
00:39:30 but early on it was, you know, a collection of 50 people
00:39:33 who care about, scientists mostly, you know,
00:39:36 practicing scientists who want, who care about coding
00:39:39 and doing it well and not using MATLAB.
00:39:42 And I remember being driven by, you know, I liked MATLAB,
00:39:44 but I didn’t like the fact that,
00:39:46 so I’m not opposed to proprietary software.
00:39:48 I’m actually not an open source zealot.
00:39:50 I love open source for the, what it brings,
00:39:52 but I also see the role for proprietary software.
00:39:54 But what I didn’t like was the fact that I would develop
00:39:56 code and publish it and then effectively be telling somebody,
00:39:59 here, to run my code, you have to have
00:40:01 this proprietary software.
00:40:02 Right, and there’s also culture around MATLAB as much,
00:40:05 because I’ve talked to a few folks in,
00:40:08 MathWorks creates MATLAB?
00:40:09 Yeah.
00:40:10 I mean, there’s just a culture, they try really hard,
00:40:13 but it just, there’s this corporate IBM style culture
00:40:16 that’s like, or whatever.
00:40:18 I don’t want to say negative things about IBM or whatever,
00:40:20 but there’s a…
00:40:22 No, it’s really that connection.
00:40:23 It’s something I’m in the middle of right now
00:40:24 is the business of open source.
00:40:27 And how do you connect the ethos of cooperative development
00:40:30 with the necessity of creating profits, right?
00:40:34 And like right now today, I’m still in the middle of that.
00:40:38 That’s actually the early days of me exploring this question.
00:40:42 Cause I was writing SciPy, I mean, as an aside,
00:40:44 I also had, so I had three kids at the time.
00:40:46 I have six kids now.
00:40:47 I got married early, wanted a family.
00:40:50 I had three kids and I remember reading,
00:40:52 I read Richard Stallman’s post and I was a fan of Stallman.
00:40:55 I would read his work, I liked these collective ideas
00:40:58 he would have.
00:40:58 Certainly the ideas on IP law, I read a lot of his stuff.
00:41:01 But then he said, okay, well,
00:41:04 how do I make money with this?
00:41:05 How do I make a living?
00:41:06 How do I pay for my kids?
00:41:07 All this stuff was in my mind,
00:41:09 young graduate student making no money,
00:41:10 thinking I got to get a job.
00:41:12 And he said, well, I think just be like me
00:41:14 and don’t have kids, right?
00:41:15 That’s just, don’t, don’t.
00:41:17 That’s his take on it.
00:41:18 That was what he said in that moment, right?
00:41:20 That’s the thing I read and I went,
00:41:22 okay, this is a train I can’t get on.
00:41:24 There has to be a way to preserve the culture
00:41:26 of open source and still be able to make sufficient money
00:41:29 to feed your kids.
00:41:30 Yes, exactly, there’s gotta be.
00:41:31 Well, so that actually led me to a study of economics.
00:41:34 Because at the time I was ignorant and I really was.
00:41:36 I’m actually, I’m embarrassed for the educational system
00:41:39 that they could let me and I was valedictorian
00:41:41 in my high school class and I did super well in college.
00:41:43 And like academically I did great, right?
00:41:47 But the fact that I could do that and then be clueless
00:41:49 about this key part of life,
00:41:52 it led me to go, there’s a problem.
00:41:54 Like I should have learned this in fifth grade.
00:41:56 I should have learned this in eighth grade.
00:41:58 Like everybody should come out
00:41:59 with a basic knowledge of economics.
00:42:01 You’re an interesting example because you’ve created tools
00:42:04 that change the lives of probably millions of people
00:42:07 and the fact that you didn’t understand at the time
00:42:10 of the creation of those tools, the basic economics
00:42:12 of how like to build up a giant system is the problem.
00:42:15 Yeah, it’s a problem.
00:42:16 And so during my PhD at the same time,
00:42:18 this is back in 98, 99 at the same time,
00:42:20 I was in a library, I was reading books on capitalism,
00:42:23 I was reading books on Marxism,
00:42:24 I was reading books on what is this thing?
00:42:27 What does it mean?
00:42:29 And I encountered, basically I encountered a set of writings
00:42:33 from people that said they were the inheritors of Adam Smith.
00:42:35 Read Adam Smith for the first time, right?
00:42:37 Which is The Wealth of Nations
00:42:38 and kind of this notion of emergent societies
00:42:42 and realized, oh, there’s this whole world out here
00:42:45 of people and the challenge of economics is also political.
00:42:49 Like, cause economics, people, different parties
00:42:53 running for office, they want their economic friends.
00:42:58 They want their economists to back them up, right?
00:43:00 Or to be their magicians, like the magicians
00:43:03 in Pharaoh’s court, right?
00:43:04 The people that are kind of say, hey, this is,
00:43:06 you should listen to me because I’ve got the expert
00:43:08 who says this.
00:43:09 And so it gets really muddled, right?
00:43:11 But I was looking at it from as a scientist going,
00:43:14 what is this space?
00:43:14 What does this mean?
00:43:15 How does Paris get fed?
00:43:16 How does, what is money?
00:43:18 How does it work?
00:43:19 And I found a lot of writings that I really loved.
00:43:21 I found some things that I really loved
00:43:22 and I learned from that.
00:43:23 It was writings from people like von Mises.
00:43:26 He wrote a paper in 1920 that still should be read
00:43:29 more than it is.
00:43:29 It was the economic calculation problem
00:43:33 of the socialist commonwealth.
00:43:34 It was basically in response
00:43:35 to the Bolshevik revolution in 1917.
00:43:37 And his basic argument was it’s not gonna work
00:43:40 to not have private property.
00:43:41 You’re not gonna be able to come up with prices.
00:43:43 The bureaucrats aren’t gonna be able to determine
00:43:45 how to allocate resources without a price system.
00:43:47 And a price system emerges from people making trades.
00:43:51 And they can only make trades if they have authority
00:43:53 over the thing they’re trading.
00:43:55 And that creates information flow
00:43:58 that you just don’t have if you try to top down it.
00:44:01 Right.
00:44:02 And it’s like, huh, that’s a really good point.
00:44:04 Yeah, the prices have a signal that’s used.
00:44:06 And it’s important to have that signal
00:44:09 when you’re trying to build a community
00:44:11 of productive people like you would
00:44:12 in the software engineering space.
00:44:13 Yeah, the prices are actually
00:44:14 an important signaling mechanism.
00:44:17 Right, and that money is just a bartering tool.
00:44:20 Right, so this is the first time I’ve encountered
00:44:22 any of this concept, right, and the fact that,
00:44:24 oh, this is actually really critical.
00:44:26 Like it’s so critical to our prosperity
00:44:29 and that we’re dangerously not learning about this,
00:44:34 not teaching our children about this.
00:44:36 So you had the three kids,
00:44:37 you had to make some hard decisions.
00:44:38 I had to make some money, right, had to figure it out.
00:44:39 But I didn’t really care.
00:44:40 I mean, I’ve never been driven by money, just need it.
00:44:43 Yeah, right, need to eat.
00:44:45 So how did that resolve itself in terms of SciPy?
00:44:49 So I would say it didn’t really resolve itself.
00:44:51 It sort of started a journey that I’m continuing on.
00:44:53 I’m still on, I would say.
00:44:54 I don’t think it resolved itself.
00:44:55 But I will say I went in eyes wide open.
00:44:59 Like I knew that there were problems
00:45:00 with giving stuff away and creating the market externalities
00:45:07 that the fact that, yeah, people might use it
00:45:09 and I might not get paid for it
00:45:10 and I’ll have to figure something else out to get paid.
00:45:13 Like at least I can say I’m not bitter
00:45:14 that a lot of people have used stuff that I’ve written
00:45:17 and I haven’t necessarily benefited economically from it.
00:45:20 I’ve heard other people be bitter about that
00:45:22 when they write or they talk.
00:45:23 Like, oh, I should’ve got more value out of this.
00:45:24 And I’m also, I want to create systems
00:45:27 that let people like me who might have these desires
00:45:31 to do things, let them benefit.
00:45:32 So it actually creates more of the same.
00:45:34 Not to turn on your bitterness module,
00:45:36 but there’s some aspect, I wish there was mechanisms for me
00:45:40 to reward whoever created SciPy and NumPy
00:45:43 because it brought so much joy to my life.
00:45:45 I appreciate that.
00:45:46 You know what I mean?
00:45:46 The tip jar notion was there.
00:45:48 I appreciate that.
00:45:49 But there should be a very frictionless mechanism.
00:45:51 There should be a frictionless mechanism.
00:45:52 I totally agree.
00:45:53 I would love to talk about some of the ideas I have
00:45:55 because I actually came across,
00:45:56 I think I’ve come up with some interesting notions
00:45:58 that could work, but they’ll require anything that will work
00:46:01 takes time to emerge, right?
00:46:03 Like things don’t just turn overnight.
00:46:04 That’s definitely one thing I’ve also understood
00:46:06 and learned is any fixes, that’s why it’s kind of funny.
00:46:10 We often give credit to, oh, this president gets elected
00:46:12 and oh, look how great things have done.
00:46:14 And I saw that when I had a transition at Anaconda
00:46:18 when a new CEO came in, right?
00:46:19 And it’s like the success that’s happening,
00:46:22 there’s an inertia there.
00:46:23 Yeah, and sometimes the decision you made
00:46:25 like 10 years before is the reason why the success is the.
00:46:28 Right, exactly.
00:46:29 So we’re sort of just running around taking credit
00:46:31 for stuff.
00:46:32 The credit assignment has like a delay to it
00:46:35 that makes the credit assignment basically wrong
00:46:38 more than right.
00:46:39 Wrong more than right, exactly.
00:46:40 And so I’m like, oh, this is, you know,
00:46:42 that’s the stuff I would read a ton about, you know,
00:46:44 early on.
00:46:45 So I don’t, I feel like I’m with you.
00:46:47 Like I want the same thing.
00:46:48 I want to be able to, and honestly, not for personally,
00:46:50 I’ve been happy.
00:46:51 I’ve been happy.
00:46:52 I feel like I don’t have any, I mean,
00:46:53 we’ve done reasonably okay, but I’ve had to pursue it.
00:46:56 Like that’s really what started my trajectory from academia
00:47:01 is reading that stuff led me to say,
00:47:02 oh, entrepreneurship matters.
00:47:05 So I love software, but we need more entrepreneurs
00:47:09 and I wanna understand that better.
00:47:10 So once I kind of had that virus infect my brain,
00:47:16 even though I was on a trajectory
00:47:17 to go to a tenure track position at a university
00:47:20 and I was there for six years,
00:47:22 I was kind of already out the door when I started.
00:47:26 And we can get into that, but.
00:47:27 Well, can I just ask you a quick question on,
00:47:30 is there some design principles
00:47:32 that were in your mind around SciPy?
00:47:34 Like, is there some key ideas
00:47:36 that were just like sticking to you
00:47:38 that this is the fundamental ideas?
00:47:40 Yeah, I would say so.
00:47:41 I would think it’s basically accessibility to scientists,
00:47:43 like give them, give scientists and engineers tools
00:47:46 that they don’t have to think a lot about programming.
00:47:48 So give them really good building blocks,
00:47:50 give them functions that they wanna call
00:47:51 and sort of just the right length of spelling.
00:47:55 There’s one tradition in programming where it’s like,
00:47:59 make very, very long names, right?
00:48:01 And you can see it in some programming languages
00:48:03 where the names get, take half the screen.
00:48:06 And in the Fortran world, names had to be six characters
00:48:11 early on, right?
00:48:12 And that’s way too little.
00:48:14 But I was like, I liked to have names
00:48:16 that were informative but short.
00:48:18 So even though Python, well this is a different conversation,
00:48:22 but documentation is doing some work there.
00:48:25 So when you look at great scientific libraries
00:48:29 and functions, there’s a richness of documentation
00:48:32 that helps you get into the details.
00:48:34 The first glance at a function gives you the intuition
00:48:37 of all it needs to do by looking at the headers and so on.
00:48:40 But to get the depths of all the complexities involved,
00:48:43 all the options involved,
00:48:44 documentation does some of the work.
00:48:45 Documentation is essential, yeah.
00:48:47 So that was actually a, so we thought about several things.
00:48:50 One is we wanted plotting.
00:48:51 We wanted interactive environment.
00:48:53 We wanted good documentation.
00:48:54 These are things we knew, we wanted.
00:48:56 The reality is those took about 10 years to evolve, right?
00:49:00 Given the fact that we didn’t have a big budget,
00:49:02 it was all volunteer labor.
00:49:03 It was sort of, when Enthought got created
00:49:06 and they started to try to find projects,
00:49:10 people would pay for pieces
00:49:11 and they were able to fund some of it.
00:49:13 Not nearly enough to keep up with what was necessary.
00:49:15 And no criticism, just simply the reality.
00:49:18 I mean, it’s hard to start a business
00:49:21 and then do consulting and then also
00:49:23 promote an open source project that’s still fairly new.
00:49:26 SciPy is fairly niche.
00:49:27 We stayed connected all while I was a student,
00:49:30 sorry, a professor.
00:49:30 I went to BYU and started to teach.
00:49:32 Electrical engineering, all the applied math courses.
00:49:35 I loved teaching signal processing,
00:49:36 probability theory, electromagnetism.
00:49:39 I was, if you look at Rate My Professors,
00:49:40 which my kids loved to do,
00:49:42 I wasn’t, I got some bad reviews because people.
00:49:46 What was the criticism?
00:49:48 I would speak too high of a level.
00:49:50 Like I definitely had a calibration problem
00:49:52 coming out of graduate work
00:49:54 where I hate to be condescending to people.
00:49:56 Like I really have a ton of respect for people fundamentally.
00:49:59 Like my fundamental thing is I respect people.
00:50:02 Sometimes that can lead to a,
00:50:03 I was thinking they had more knowledge than they did.
00:50:07 And so I would just speak at a very high level,
00:50:10 assume they got it.
00:50:11 But they need to rise to the standard that you set.
00:50:14 I mean, that’s one of the,
00:50:15 some of the greatest teachers do that.
00:50:17 And I agree.
00:50:18 And that was kind of what was inspiring me.
00:50:19 But you also have to,
00:50:22 I cannot say I was as articulate
00:50:24 as some of the greatest teachers, right?
00:50:26 I was, like one classic example,
00:50:28 when I first taught at BYU,
00:50:30 my very first class, it was overheads,
00:50:31 transparencies, overheads.
00:50:34 Before projectors were really that common,
00:50:35 I taught with transparencies.
00:50:37 I’m writing my notes out.
00:50:38 I go in, room’s half dark.
00:50:40 I’m just blaring through these transparencies.
00:50:42 Here it is, here it is, here it is.
00:50:44 And I did give a quiz after two weeks.
00:50:47 No one knew anything.
00:50:48 Nothing I had taught had gotten anywhere.
00:50:50 And I realized, okay, I’m not, this is not working.
00:50:54 So I put away the transparencies
00:50:56 and I turned around and just started using the chalkboard.
00:50:58 And what it did is it slowed me down, right?
00:51:00 The chalkboard just slowed me down
00:51:02 and gave people time to process and to think.
00:51:04 And then that made me focus.
00:51:06 My writing wasn’t great on the chalkboard,
00:51:07 but I really love that part of like the teaching.
00:51:10 So that entered SciPy’s world in terms of,
00:51:12 we always understood that there’s a didactic aspect
00:51:14 of SciPy, kind of how do you take the knowledge
00:51:17 and then produce it?
00:51:18 The challenge we had was the scope.
00:51:21 Like ultimately SciPy was everything, right?
00:51:23 And so 2001, when it first came out,
00:51:25 people were starting to use it.
00:51:26 No, this is cool, this is a tool we actually use.
00:51:29 At the same time, 2001 timeframe,
00:51:31 there was a little bit of like the Hubble Space Telescope,
00:51:33 the folks at Hubble that started to say,
00:51:35 hey, Python, we’re gonna use Python
00:51:36 for processing images from Hubble.
00:51:38 And so Perry Greenfield was a good friend
00:51:40 in running that program.
00:51:42 And he had called me before I left for BYU and said,
00:51:45 you know, we wanna do this,
00:51:47 but Numeric actually has some challenges in terms of,
00:51:50 you know, it’s not, the array doesn’t have enough types.
00:51:52 We need more operations.
00:51:54 You know, broadcasting needs to be a little more settled.
00:51:56 They wanted record arrays.
00:51:57 They wanted, you know, record arrays are like a data frame,
00:52:00 but a little bit different,
00:52:02 but they wanted more structured data.
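A minimal sketch of the record arrays described here, using NumPy's structured dtypes. The field names and values are hypothetical, chosen only for illustration; they are not taken from the Hubble software.

```python
import numpy as np

# Hypothetical fields: each element of the array is one record.
star_dtype = np.dtype([("name", "U10"), ("ra", "f8"), ("dec", "f8")])
stars = np.array(
    [("Vega", 279.23, 38.78), ("Sirius", 101.29, -16.72)],
    dtype=star_dtype,
)

ra_column = stars["ra"]   # a field name selects that field across all records
first_record = stars[0]   # an integer index selects one whole record
```

Unlike a data frame, all fields of one record sit next to each other in a single memory buffer.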
00:52:03 So he had called me even early on then,
00:52:06 and he said, you know, what,
00:52:06 would you wanna work on something to make this work?
00:52:08 And I said, yeah, I’m interested, but I’m going here,
00:52:10 and I, you know, we’ll see if I have time.
00:52:12 So in the meantime, while I was teaching
00:52:13 and SciPy was emerging, and I had a student,
00:52:15 I was constantly, while I was teaching,
00:52:16 trying to figure a way to fund this stuff.
00:52:18 So I had a graduate student, my only graduate student,
00:52:21 a Chinese fellow, Liu Hongze is his name, great guy.
00:52:26 He wrote a bunch of stuff for iterative linear algebra,
00:52:29 like got into writing some of the iterative
00:52:31 linear algebra tools that are currently there in SciPy,
00:52:34 and they’ve gotten better since,
00:52:36 but this is in 2005, kept working on SciPy,
00:52:39 but Perry has started working on a replacement
00:52:43 to Numeric called NumArray.
00:52:45 And in 2004, a package called ndimage,
00:52:49 it was an image processing library
00:52:50 that was written for NumArray,
00:52:53 and it had in it a morphology tool.
00:52:55 I don’t know if you know what morphology is.
00:52:56 It’s opening, dilation, closing, you know,
00:52:58 there was sort of this, as a medical imaging student,
00:53:01 I knew what it was,
00:53:02 because it was used in segmentation a lot.
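For readers who have not met morphology, here is a small sketch using today's scipy.ndimage (the descendant of the package mentioned above). The toy mask and the 3x3 structuring element are assumptions made for illustration.

```python
import numpy as np
from scipy import ndimage

# Toy binary image: one 3x3 blob plus an isolated one-pixel speck.
mask = np.zeros((7, 7), dtype=bool)
mask[2:5, 2:5] = True
mask[0, 6] = True

square = np.ones((3, 3), dtype=bool)  # assumed structuring element

# Opening = erosion followed by dilation: removes the speck, keeps the blob.
opened = ndimage.binary_opening(mask, structure=square)

# Dilation grows every foreground region by one pixel in each direction.
dilated = ndimage.binary_dilation(mask, structure=square)
```

This kind of cleanup is a common preprocessing step before segmenting an image.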
00:53:04 And in fact, I’d wanted to do something like that
00:53:06 in Python, in SciPy, but just had never gotten around to it.
00:53:10 So when it came out, but it worked only on NumArray,
00:53:14 and SciPy needed Numeric,
00:53:16 and so we effectively had the beginning of this split.
00:53:20 And Numeric and NumArray didn’t share data,
00:53:22 they were just two, so you could have a gigabyte
00:53:24 of NumArray data, and a gigabyte of Numeric data,
00:53:26 and they wouldn’t share it.
00:53:27 And so you had these,
00:53:28 then you had these scientific libraries written on top.
00:53:31 I got really bugged by that.
00:53:32 I got really like, oh man, this is not good,
00:53:35 we’re not cooperating now,
00:53:36 we’re sort of redoing each other’s work,
00:53:37 and we’re just this young community.
00:53:40 So that’s what led me, even though I knew it was risky,
00:53:43 because my, you know, I was on a tenure track position,
00:53:47 2004 I got reviewed.
00:53:48 They said, hey, things are going okay,
00:53:49 you’re doing well, paper’s coming out,
00:53:51 but you’re kind of spending a lot of time
00:53:52 doing this open source stuff, maybe do a little less of that,
00:53:54 and a little more of the paper writing and grant writing,
00:53:57 which was naive, but it was definitely the thinking.
00:54:00 It still goes on.
00:54:01 Still goes on.
00:54:03 You’re basically creating a thing
00:54:05 which enables science in the 21st century.
00:54:08 Right.
00:54:09 Maybe don’t emphasize that so much in your free year tenure.
00:54:11 Right.
00:54:13 It illustrates some of the challenges.
00:54:14 Yes.
00:54:15 It does, and it’s, people mean well.
00:54:18 Yes.
00:54:19 Like, but we’ve gotten broken in a bunch of ways.
00:54:22 Certain things, programming,
00:54:23 understanding the role of software engineering,
00:54:25 programming in society is a little bit lacking.
00:54:27 Exactly.
00:54:28 Now, I was in an electrical engineering position.
00:54:30 Right.
00:54:30 That’s even worse there.
00:54:33 Yeah, it was very, they were very focused,
00:54:34 and so, you know, good people, and I had a great time,
00:54:37 I loved my time, I loved my teaching,
00:54:38 I loved all the things I did there.
00:54:40 The problem was, the split was happening
00:54:42 in this community that I loved, right?
00:54:43 I saw people, and I went, oh my gosh,
00:54:45 this is gonna be, this is not great,
00:54:47 and so I happened, you know, fate,
00:54:50 I had a class I had signed up for,
00:54:52 it’s a, I was trying to build an MRI system,
00:54:54 so I had a kind of a radio, instead of a radio,
00:54:58 a digital radio class, it was a digital MRI class.
00:55:01 And I had people sign up, two people signed up,
00:55:04 then they dropped, and so I had nobody in this class.
00:55:06 So, and I didn’t have any other courses to teach,
00:55:08 and I thought, oh, I’ve got some time,
00:55:10 and I’ll just write, I’ll just write a replacement,
00:55:13 a merger of Numeric and NumArray.
00:55:14 Like, I’ll basically take the Numeric code base,
00:55:16 add the features NumArray was adding,
00:55:19 and then kind of come up with a single array library
00:55:21 that everybody can use.
00:55:22 So that’s where NumPy came from,
00:55:24 was my thinking, hey, I can do this,
00:55:26 and who else is going to?
00:55:27 Because at that point, I’d been around the community
00:55:29 long enough, and I’d written enough C code,
00:55:30 I knew, I knew the structures, and I,
00:55:33 in fact, my first contribution to Numeric
00:55:35 had been writing the C API documentation
00:55:38 that went in the first documentation for NumPy,
00:55:41 for Numeric, sorry, this is Paul Dubois,
00:55:43 David Ascher, Konrad Hinsen, and myself.
00:55:45 I got credit because I wrote this chapter,
00:55:47 which is all the C API of Numeric, all the C stuff.
00:55:51 So I said, I’m probably the one to do it,
00:55:53 and nobody else is gonna do this.
00:55:54 So it was sort of, out of a sense of duty and passion,
00:55:58 knowing that, eh, I don’t think my academic,
00:56:01 I don’t think the department here is gonna appreciate this,
00:56:03 but it’s the right thing to do.
00:56:06 It was like.
00:56:06 Can we just linger on that moment?
00:56:08 Yeah, yeah.
00:56:09 Because the importance of the way you thought
00:56:11 and the action you took, I feel is understated
00:56:16 and is rare and I would love to see so much more of it
00:56:19 because what happens as the tools become more popular,
00:56:24 there’s a split that happens.
00:56:27 And it’s a truly heroic and impactful action
00:56:30 to in those early, in that early split,
00:56:33 to step up and it’s like great leaders throughout history,
00:56:37 like, what is it, Braveheart,
00:56:39 like get on a horse and rally the troops
00:56:42 because I think that can have, make a big difference.
00:56:46 We have TensorFlow versus PyTorch
00:56:48 in the machine learning community.
00:56:49 We have the same problem today.
00:56:50 Yeah, I wonder.
00:56:51 It’s actually bigger.
00:56:52 I wonder if it’s possible in the early days
00:56:56 to rally the troops.
00:56:58 It is possible, especially in the early days.
00:57:00 The longer it goes, the harder, right?
00:57:01 The more energy in the factions, the harder.
00:57:03 But in the early days, it is possible
00:57:05 and it’s extremely helpful
00:57:07 and there’s a willingness there,
00:57:09 but the challenge is there’s just not a willingness
00:57:11 to fund it.
00:57:12 There’s not a willingness to, you know,
00:57:14 like I was literally walking into a field
00:57:17 saying I’m going to do this
00:57:18 and here I am, like, you know,
00:57:20 I have five kids at home now.
00:57:23 Pressure builds.
00:57:24 Sometimes my wife hears these stories
00:57:26 and she’s like, you did what?
00:57:29 I thought we were going to,
00:57:29 I thought you were actually on a path
00:57:31 to make sure we had resources and money, but,
00:57:34 but again, there’s a, there’s an aspect,
00:57:36 I’m a very hopeful person.
00:57:37 I’m an optimistic person by nature.
00:57:39 I love people.
00:57:41 I learned that about myself later on.
00:57:43 And part of my, my religious beliefs
00:57:47 actually lead to that.
00:57:48 And it’s why I hold them dear
00:57:49 because it’s actually how I feel about,
00:57:51 that’s what leads me to these attitudes,
00:57:53 sort of this hopefulness and this sense of,
00:57:55 yeah, it may not work out for me financially
00:57:58 or maybe, but that’s not the ultimate gain.
00:58:00 Like that’s a thing, but it’s not,
00:58:02 that’s not the scorecard for me.
00:58:05 And so I just wanted to be helpful
00:58:07 and I knew, and partly because these SciPy conferences,
00:58:09 because of the mailing list conversations,
00:58:10 I knew there was a lot of need for this, right?
00:58:13 And so I had this, it wasn’t like I was alone
00:58:15 in terms of no feedback.
00:58:16 I had these people who knew, but it was crazy.
00:58:19 Like people who at the time said,
00:58:20 yeah, we didn’t think you’d be able to do it.
00:58:22 We thought it was crazy.
00:58:23 And also instructive, like practically speaking,
00:58:26 that you had a cool feature
00:58:28 that you were chasing the morphology, like the.
00:58:30 Yes.
00:58:31 Like it’s not just like.
00:58:32 There’s an end result.
00:58:33 It’s not some visionary thing.
00:58:35 I’m going to unite the community.
00:58:36 You were like. Correct.
00:58:38 You were actually practically,
00:58:39 this is what one person actually could do
00:58:42 and actually build.
00:58:43 Cause that is important.
00:58:44 Cause you can get over your skis.
00:58:47 You can definitely get over your skis.
00:58:49 And I had, in fact, this almost got me over my skis, right?
00:58:52 I would say, well, in retrospect, I hate looking back.
00:58:56 I can tell you all the flaws with NumPy, right?
00:58:58 When I go into it, there’s lots of stuff that I’m like,
00:59:00 oh man, that’s embarrassing.
00:59:01 That was wrong.
00:59:02 I wish I had somebody stop me with a wet fish there.
00:59:04 Like I needed, like what I’d wished I’d had
00:59:07 was somebody with more experience in library
00:59:10 writing and array libraries.
00:59:11 There’s like, I wish I had me.
00:59:12 I could go back in time and go do this, do that.
00:59:14 There’s a more important thing.
00:59:15 Cause there’s things we did that are still there
00:59:18 that are problematic, that created challenges for later.
00:59:20 And I didn’t know it at the time.
00:59:22 Didn’t understand how important that was.
00:59:24 And in many cases, didn’t know what to do.
00:59:26 Like there was pieces of the design of NumPy.
00:59:29 I didn’t know what to do until five years ago.
00:59:31 Now I know what they should have been.
00:59:32 But I didn’t know at the time and nobody,
00:59:33 and I couldn’t get the help.
00:59:35 Anyway, so I wrote it.
00:59:36 It took about, it took four months to write
00:59:38 the first version, then about 14 months to make it usable.
00:59:43 But it was, it wasn’t, it was that first four months
00:59:45 of intense writing, coding, getting something out the door
00:59:49 that worked that was, it was, it was definitely challenging.
00:59:52 And then the big thing I did was create a new type object
00:59:54 called dtype.
00:59:56 That was probably the contribution.
00:59:58 And then the fact that I added broad, not just broadcasting,
01:00:01 but advanced indexing so that you could do masked indexing
01:00:06 and indirect indexing instead of just slicing.
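The three kinds of indexing mentioned here can be sketched in a few lines of NumPy. The array values are arbitrary and chosen only for illustration.

```python
import numpy as np

a = np.array([10, -3, 7, -8, 5])

sliced = a[1:4]        # plain slicing: a contiguous range
masked = a[a > 0]      # masked indexing: a boolean array picks elements
picked = a[[4, 0, 0]]  # indirect indexing: an integer array picks positions
```

Slicing returns a view onto the same memory, while masked and indirect indexing return new arrays.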
01:00:09 So for people who don’t know, and maybe you can elaborate,
01:00:13 NumPy, I guess the vision in the narrowest sense
01:00:17 is to have this object that represents
01:00:21 n dimensional arrays.
01:00:23 And like at any level of abstraction you want,
01:00:26 but basically it could be a black box
01:00:28 that you can investigate in ways that you would naturally
01:00:30 want to investigate such objects.
01:00:33 Yes, exactly.
01:00:34 So you could do math on it easily.
01:00:35 Math on it easily, yeah.
01:00:37 So it had an associated library of math operations
01:00:39 and effectively SciPy became an even larger set
01:00:43 of math operations.
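As a rough illustration of "doing math on it easily", the sketch below applies whole-array operations and broadcasting to a small two-dimensional array. The data is made up for the example.

```python
import numpy as np

grid = np.arange(6.0).reshape(2, 3)   # a 2 x 3 array of floats

col_means = grid.mean(axis=0)         # one mean per column, shape (3,)

# Broadcasting: the (3,) vector is subtracted from every row of the (2, 3) array.
centered = grid - col_means

scaled = np.sqrt(grid) * 2.0          # elementwise math without explicit loops
```

No Python loop appears anywhere; the looping happens inside the library's compiled code.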
01:00:44 So the key for me was I was going to write NumPy
01:00:48 and then move SciPy to depend on NumPy.
01:00:50 In fact, early on, one of the initial proposals
01:00:52 was that we would just write SciPy
01:00:54 and it would have the Numeric object inside of it.
01:00:56 And it’d be scipy.array or something.
01:00:59 That turned out to be problematic because Numeric
01:01:02 already had a little mini library of linear algebra
01:01:04 and some functions, and it had enough momentum,
01:01:08 enough users that nobody wanted to,
01:01:10 they wanted backward compatibility.
01:01:12 One of the big challenges of NumPy
01:01:13 was I had to be backward compatible
01:01:14 with both Numeric and NumArray
01:01:16 in order to allow both of those communities to come together.
01:01:19 There was a ton of work in creating
01:01:21 that backward compatibility
01:01:22 that also created echoes in today’s object.
01:01:25 Like some of the complexity in today’s object
01:01:27 is actually from that goal of backward compatibility
01:01:30 to these other communities,
01:01:31 which if you didn’t have that, you’d do something different,
01:01:34 which is instructive because a lot of things are there.
01:01:37 You think, what is that there for?
01:01:38 It’s like, well, it’s a remnant.
01:01:41 It’s an artifact of its historical existence.
01:01:45 By the way, I love the empathy
01:01:46 and the lack of ego behind that
01:01:48 because I feel, you see that in the split
01:01:51 in the JavaScript framework, for example,
01:01:53 the arbitrary branching.
01:01:54 Right.
01:01:56 I think in order to unite people,
01:01:59 you have to kind of put your ego aside
01:02:00 and truly listen to others.
01:02:02 You do.
01:02:03 What do you love about NumArray?
01:02:04 What do you love about Numeric?
01:02:06 Like actually get a sense,
01:02:07 we were talking about languages earlier,
01:02:08 sort of empathize to the culture,
01:02:11 the people that love something about this particular API,
01:02:14 some of the naming style
01:02:18 or the actual usage patterns
01:02:21 and truly understand them
01:02:22 and so that you can create that same draw
01:02:26 in the united thing. I completely agree.
01:02:28 I completely agree.
01:02:29 And you have to also have enough passion
01:02:31 that you’ll do it.
01:02:32 It can’t be just like a perfunctory,
01:02:34 oh yes, I’ll listen to you
01:02:36 and then I’m not really that excited about it.
01:02:38 So it really is an aspect,
01:02:39 it’s a philosophical, like there’s a philia,
01:02:42 there’s a love of esteeming of others.
01:02:44 It’s actually at the heart of what,
01:02:47 it’s sort of a life philosophy for me, right?
01:02:49 That I’m constantly pursuing and that helped,
01:02:51 absolutely helped.
01:02:52 Makes me wonder in a philosophical,
01:02:54 like looking at human civilization as one object,
01:02:57 it makes me wonder how we can copy and paste Travises
01:02:59 in this world.
01:03:00 Well, some aspects, maybe.
01:03:03 Some aspects, right, right, exactly.
01:03:05 Well, it’s a good question.
01:03:07 How do we teach this?
01:03:08 How do we encourage it?
01:03:09 How do we lift it?
01:03:10 Because so much of the software world,
01:03:12 it’s giant communities, right?
01:03:15 But it seems like so much is moved by,
01:03:16 like little individuals.
01:03:18 You talk about like Linus Torvalds.
01:03:21 It’s like, could you have not,
01:03:23 could you have had Linux without him?
01:03:25 Could you?
01:03:26 Yeah, Guido and Python.
01:03:28 Guido and Python.
01:03:28 Guido and Python.
01:03:29 Well, the SciPy community particularly,
01:03:30 it’s like I said, we wanted to build this big thing,
01:03:32 but ultimately we didn’t.
01:03:33 What happened is we had mavericks and champions
01:03:36 like John Hunter who created Matplotlib.
01:03:37 We had Fernando Perez who created IPython.
01:03:39 And so we sort of inspired each other,
01:03:42 but then it kind of, there’s sort of a culture
01:03:43 of this selfless giving, the stewardship mentality,
01:03:47 as opposed to ownership mentality,
01:03:49 but stewardship and community focused,
01:03:54 community focused, but intentional work.
01:03:56 Like not waiting for everybody else to do the work,
01:03:58 but you’re doing it for the benefit of others
01:04:00 and not worried about what you’re gonna get.
01:04:04 You’re not worried about the credit.
01:04:04 You’re not worried about what you’re gonna get.
01:04:05 You’re worried about, I later realized
01:04:07 that I have to worry a little about credit,
01:04:09 not because I want the credit,
01:04:10 because I want people to understand
01:04:11 what led to the results.
01:04:13 Like, I don’t, it’s not about me.
01:04:15 It’s I want to understand this is what led to the result.
01:04:17 So let’s like, I think doing,
01:04:18 and this is what had no impact on the result.
01:04:21 Like let’s promote, just like you said,
01:04:23 I want to promote the attributes
01:04:25 that help make us better off.
01:04:26 How do we make more of Wes McKinney?
01:04:28 Like Wes McKinney was critical to the success of Python
01:04:31 because of his creation of pandas,
01:04:33 the roots of which were all the way back
01:04:36 in Numeric and NumArray and NumPy,
01:04:40 where NumPy created an array of records.
01:04:43 Wes started to use that almost like a data frame,
01:04:45 except it’s an array of records.
01:04:47 And as a data frame, the challenge is,
01:04:49 okay, if you want to augment it with another column,
01:04:52 you have to insert, you have to do all this memory movement
01:04:54 to insert a column.
01:04:55 Whereas data frames became,
01:04:57 oh, I’m going to have a loose collection of arrays.
01:05:00 So it’s a record of arrays that is a part of a data frame.
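The difference between an array of records and a record of arrays can be sketched as follows. The field names are hypothetical, and the plain dict of columns is only a stand-in for the idea; it is not how pandas is actually implemented.

```python
import numpy as np

# Array of records: every row's fields are interleaved in one buffer.
records = np.array([(1, 2.5), (2, 3.5)], dtype=[("id", "i4"), ("price", "f8")])

# Adding a column means building a wider dtype and copying every existing row.
wider = np.empty(records.shape, dtype=records.dtype.descr + [("qty", "i4")])
for name in records.dtype.names:
    wider[name] = records[name]
wider["qty"] = [10, 20]

# Record of arrays: a loose collection of independent columns.
columns = {"id": records["id"].copy(), "price": records["price"].copy()}
columns["qty"] = np.array([10, 20])  # new column, existing data untouched
```

In the second layout, adding a column never moves the memory of the columns that already exist.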
01:05:03 And we thought about that back in the NumArray days,
01:05:05 but Wes ended up doing the work to build it.
01:05:08 And then also the operations that were relevant
01:05:11 for data processing.
01:05:12 What I noticed is just that each of these little things
01:05:15 creates just another tick, another up.
01:05:17 So NumPy ultimately took a little while,
01:05:19 about six months in, people started to join me,
01:05:22 Francesc Alted, Robert Kern, Charles Harris.
01:05:27 And these people are many of the unsung heroes, I would say.
01:05:30 People who are, you know,
01:05:31 they sometimes don’t get the credit they deserve
01:05:34 because they were critical both to support,
01:05:36 like, you know, it’s hard and you want,
01:05:38 you need some support, people need support.
01:05:40 And I needed just encouragement.
01:05:41 And they were helping and encouraged by contributing.
01:05:43 And once, the big thing for me was when John Hunter,
01:05:48 he had previously done kind of a simple thing
01:05:50 called numerix to kind of, you know, between Numeric
01:05:52 and NumArray, he had a little high level tool
01:05:55 that would just select each one for Matplotlib.
01:05:57 In 2006, he finally said,
01:06:00 we’re gonna just make NumPy the dependency of Matplotlib.
01:06:03 As soon as he did that,
01:06:04 and I remember specifically when he did that,
01:06:06 I said, okay, we’ve done it.
01:06:07 Like, that was when I knew we had seen success.
01:06:11 Before then it was still unsure,
01:06:13 but that kind of started a roller coaster.
01:06:15 And then 2006 to 2009.
01:06:17 And then I’ve been floored by what it’s done.
01:06:20 Like, I knew it would help.
01:06:22 I had no idea how much it would help.
01:06:25 Right, so.
01:06:26 And it has to do with, again, the language thing.
01:06:28 It just, people started to think in terms of NumPy.
01:06:31 Yes.
01:06:32 And that opened up a whole new way of thinking.
01:06:36 And part of the story that you kind of mentioned,
01:06:39 but maybe you can elaborate,
01:06:42 is it seems like at some point in the story,
01:06:46 Python took over science and data science.
01:06:50 Yes.
01:06:51 And bigger than that,
01:06:54 the scientific community started to think like programmers
01:07:00 or started to utilize the tools of computers to do,
01:07:04 like at a scale that wasn’t done with Fortran.
01:07:06 Like at this gigantic scale,
01:07:09 they started to open in their heart.
01:07:10 And then Python was the thing.
01:07:12 I mean, there’s a few other competitors, I guess,
01:07:14 but Python, I think, really, really took over.
01:07:16 I agree.
01:07:17 There’s a lot of stories here
01:07:18 that are kind of during this journey,
01:07:19 because this is sort of the start of this journey in 2005, 2006.
01:07:23 So my tenure committee, I applied for tenure in 2006, 2007.
01:07:28 It came back, I split the department.
01:07:29 I was very polarizing.
01:07:31 I had some huge fans
01:07:32 and then some people that said no way, right?
01:07:34 So it was very, I was a polarizing figure in the department.
01:07:36 It went all the way up to the university president.
01:07:39 Ultimately, my department chair had the sway
01:07:42 and they didn’t say no.
01:07:43 They said, come back in two years and do it again.
01:07:46 And I went, eh, at that point, I was like,
01:07:49 I mean, I had this interest in entrepreneurship,
01:07:52 this interest in not the academic circles,
01:07:56 not the, like, how do we make industry work?
01:07:59 So I do have to give credit to that exploration of economics
01:08:03 because that led me, oh, I had a lot of opinions.
01:08:06 I was actually very libertarian at the time.
01:08:09 And I still have some libertarian trends,
01:08:11 but I’m more of a, I’m more of a collectivist libertarian.
01:08:15 So you value broadly, philosophically freedom.
01:08:18 I value broadly, philosophically freedom,
01:08:20 but I also understand the power of communities,
01:08:23 like the power of collective behavior.
01:08:26 And so what’s that balance, right?
01:08:27 That makes sense.
01:08:29 So by the time I was just,
01:08:31 I gotta go out and explore this entrepreneur world.
01:08:33 So I left academia.
01:08:34 I said, no thanks, called my friend, Eric, here,
01:08:37 who had, his company was going.
01:08:39 I said, hey, could I join you and start this trend?
01:08:43 And he, at that time they were using SciPy a lot.
01:08:45 They were trying to get clients.
01:08:47 And so I came down to Texas.
01:08:48 And in Texas is where I sort of,
01:08:51 it’s my entrepreneur world, right?
01:08:53 I left academia and went to entrepreneur world in 2007.
01:08:57 So I moved here in 2007, kind of took a leap,
01:08:59 knew nothing really about business,
01:09:01 knew nothing about a lot of stuff there.
01:09:05 There’s, you know, for a long time,
01:09:06 I’ve kept some connections to a lot of academics
01:09:08 because I still value it.
01:09:10 I still love the scientific tradition.
01:09:12 I still value the essence and the soul and the heart
01:09:15 of what is possible.
01:09:17 Don’t like a lot of the administration
01:09:21 and the kind of, we can go into detail about why
01:09:24 and where and how this happens,
01:09:25 what are some of the challenges.
01:09:26 I don’t know, but I’m with you.
01:09:28 So I’m still affiliated with MIT.
01:09:31 I still love MIT because there’s magic there.
01:09:35 There’s people I talk to, like researchers, faculty,
01:09:40 in those conversations and the whiteboard
01:09:43 and just the conversation, there’s magic there.
01:09:46 All the other stuff, the administration,
01:09:48 all that kind of stuff seems to,
01:09:52 you don’t wanna say too harshly criticize
01:09:54 sort of bureaucracies, but there’s a lag
01:09:57 that seems to get in the way of the magic.
01:10:00 And I still have a lot of hope
01:10:03 that that can change because I don’t often see
01:10:08 that particular type of magic elsewhere in the industry.
01:10:12 So like we need that and we need that flame going.
01:10:15 And it’s the same thing as exactly as you said,
01:10:19 it has the same kind of elements
01:10:20 like the open source community does.
01:10:23 And, but then if you, like the reason I stepped away,
01:10:27 the reason I’m here, just like you did in Austin is like,
01:10:30 if I wanna build one robot, I’ll stay at MIT.
01:10:33 But if I wanna build millions and make money enough
01:10:37 to where I can explore the magic of that, then you can’t.
01:10:41 And I think that dance is…
01:10:44 That translational dance has been lost a bit, right?
01:10:47 And there’s a lot of reasons for that.
01:10:48 I’m not, I’m certainly not an expert on this stuff.
01:10:50 I can opine like anybody else,
01:10:51 but I realized that I wanted to explore entrepreneurship,
01:10:55 which I, and really figure out,
01:10:57 and it’s been a driving passion for 20 years, 25 years.
01:11:01 How do we connect capital markets and company?
01:11:06 Cause again, I fell in love with the notion of,
01:11:07 oh, profit seeking on its own is not a bad thing.
01:11:11 It’s actually a coordination mechanism
01:11:13 for allocating resources that, you know,
01:11:16 in an emergent way, right?
01:11:18 That respects everybody’s opinions, right?
01:11:20 So this is actually powerful.
01:11:21 So I say all the time, when I make a company
01:11:25 and we do something that makes profit,
01:11:27 what we’re saying is, hey,
01:11:28 we’re collecting some of the world’s resources
01:11:29 and voluntarily people are asking us
01:11:31 to do something that they like.
01:11:33 And that’s a huge deal.
01:11:34 And so I really liked that energy.
01:11:36 So that’s what I came to do and to learn
01:11:37 and to try to figure out.
01:11:38 And that’s what I’ve been kind of stumbling through
01:11:40 since for the past 14 years.
01:11:40 And that’s 2007.
01:11:42 2007, yeah.
01:11:43 And so you were still working on NumPy.
01:11:44 So NumPy was just emerging.
01:11:46 Just emerging.
01:11:47 One of the things I’ve done,
01:11:49 it’s worth mentioning because it emphasizes
01:11:51 the exploratory nature of my thinking at the time.
01:11:53 I said, well, I don’t know how to fund this thing.
01:11:55 I’ve got a graduate student I’m paying for
01:11:56 and I’ve got no funding for him.
01:11:57 And I had done some fundraising from the public
01:12:00 to try to get public fundraisers in my lab.
01:12:02 I didn’t really wanna go out
01:12:03 and just do the fundraising circuit
01:12:05 the way it’s traditionally done.
01:12:06 So I wrote a book and I said, I’m gonna write a book
01:12:09 and I’m gonna charge for it.
01:12:11 It was called Guide to NumPy.
01:12:12 And so ultimately NumPy became
01:12:14 documentation driven development
01:12:15 because I basically wrote the book
01:12:17 and made sure the stuff worked or the book would work.
01:12:19 So it really helped actually make NumPy become a thing.
01:12:23 So writing that book,
01:12:25 and it’s not a page turner.
01:12:28 Guide to NumPy is not a book you pick up
01:12:29 and go, oh, this is great, over the fire.
01:12:31 But it’s where you could find the details,
01:12:33 like how’d all this work.
01:12:34 And a lot of people love that book.
01:12:36 And so a lot of people ended up,
01:12:38 so I said, look, I need to, so I’m gonna charge for it.
01:12:41 And I got some flack for that.
01:12:42 Not that much, just probably five angry messages,
01:12:45 people yelling at me saying I was a bad guy
01:12:49 for charging for this book.
01:12:51 Was one of them Richard Stallman?
01:12:53 No. Just kidding.
01:12:54 No, I haven’t really had any interaction with him personally,
01:12:56 like I said, but there were a few,
01:12:59 but actually surprisingly not.
01:13:01 There was actually a lot of people like,
01:13:02 no, it’s fine, you can charge for a book.
01:13:04 That’s no big deal.
01:13:05 We know that’s a way you can try to make money
01:13:07 around open source.
01:13:07 So what I did, I did it in an interesting way.
01:13:10 I said, well, kind of my ideas around IP law and stuff.
01:13:14 I love the idea you can share something, you can spread it.
01:13:16 Like once it’s, the fact that you have a thing
01:13:18 and copying is free, but the creation is not free.
01:13:21 So how do you fund the creation and allow the copying?
01:13:25 And in software, it’s a little more complicated than that
01:13:27 because creation is actually a continuous thing.
01:13:29 It’s not like you build a widget and it’s done.
01:13:31 It’s sort of a process of emerging
01:13:32 and continuing to create.
01:13:34 But I wrote the book
01:13:35 and had this market determined price thing.
01:13:37 I said, look, I need, I think I said 250,000.
01:13:40 If I make 250,000 from this book, I’ll make it free.
01:13:44 So as soon as I get that much money,
01:13:45 or I said five years, so there’s a time limit.
01:13:48 Like it’s not forever.
01:13:49 That’s really cool.
01:13:50 It’s amazing.
01:13:51 I released it on this.
01:13:53 And it’s actually interesting
01:13:54 because one of the people
01:13:55 who also thought that was interesting
01:13:57 ended up being Chris White,
01:13:58 who was the director of the DARPA project
01:14:01 that we got funding through at Anaconda.
01:14:02 And the reason he even called us back
01:14:04 is because he remembered my name from this book
01:14:06 and he thought that was interesting.
01:14:08 And so even though we hadn’t gone to the demo days,
01:14:10 we applied and the people said, yeah,
01:14:12 nobody ever gets this without coming to the demo day first.
01:14:15 This is the first time I’ve seen it.
01:14:16 But it’s because I knew, you know,
01:14:18 Chris had done this and had this interaction.
01:14:19 So it did have impact.
01:14:21 I was actually really, really pleased by the result.
01:14:23 I mean, I ended up in three years, I made 90,000.
01:14:27 So sold 30,000 copies by myself.
01:14:29 I just put it up on, you know, use PayPal and sold it.
01:14:33 And that was my first taste of kind of, okay,
01:14:36 this can work to some degree.
01:14:37 And I, you know, all over the world, right?
01:14:40 From Germany to Japan to, it was actually, it did work.
01:14:44 And so I appreciated the fact that PayPal existed
01:14:47 and I had a way to get the money, the distribution was simple.
01:14:51 This is pre Amazon book stuff.
01:14:53 So it was just publishing a website.
01:14:55 It was the popularity of SciPy emerging
01:14:57 and getting company usage.
01:14:58 I ended up not letting it go the five years
01:15:00 and not trying to make the full amount
01:15:01 because, you know, a year and a half later,
01:15:04 I was at Enthought.
01:15:05 I had left academia, was at Enthought,
01:15:06 and I kind of had a full time job.
01:15:07 And then actually what happened is the documentation people,
01:15:10 there’s a group that said, hey,
01:15:10 we want to do documentation for SciPy as a collective.
01:15:14 And they’re essentially needing the stuff in the book, right?
01:15:18 And so they kind of ask,
01:15:20 hey, could we just use the stuff in your book?
01:15:21 And at that point I said, yeah, I’ll just open it up.
01:15:24 So that’s, but it has served its purpose.
01:15:27 And the money that I made actually funded my grad student.
01:15:31 Like it was actually, you know,
01:15:32 I paid him 25,000 a year out of that money.
01:15:35 So the funny thing is if you do a very similar
01:15:37 kind of experiment now with NumPy or something like it,
01:15:40 you could probably make a lot more.
01:15:42 It’s probably true.
01:15:43 Because of the tooling and the community building.
01:15:46 Yeah, I agree.
01:15:47 Like the, and social media,
01:15:48 that there’s just a virality to that kind of idea.
01:15:51 I agree.
01:15:52 There’d be things to do.
01:15:53 I’ve thought about that.
01:15:54 And really I thought about a couple of books
01:15:56 or a couple of things that could be done there.
01:15:57 And I just haven’t, right?
01:15:58 Even, I tried to hire a ghostwriter this year too
01:16:01 to see if that could help, but it didn’t.
01:16:04 But part of my problem is this,
01:16:06 I’ve been so excited by a number of things
01:16:08 that have stemmed from that.
01:16:09 Like, so I came here, worked at Enthought for four years,
01:16:13 graciously, Eric made me president.
01:16:14 Then we started to work closely together.
01:16:16 We actually helped him buy out his partner.
01:16:19 It didn’t end great.
01:16:20 Like unfortunately Eric and I aren’t real,
01:16:22 aren’t friends now.
01:16:24 I still respect him.
01:16:25 I have a lot, I wish we were,
01:16:26 but he didn’t like the fact that Peter and I
01:16:30 started Anaconda, right?
01:16:31 That was not, I mean, so there’s two sides to that story.
01:16:36 So I’m not gonna go into it, right?
01:16:37 Sure.
01:16:38 But you, as human beings
01:16:40 and you wish you still could be friends.
01:16:42 I do, I do.
01:16:43 It saddens me.
01:16:45 I mean, that’s a story of great minds
01:16:49 building great companies.
01:16:51 Somehow it’s sad that when there’s that kind of.
01:16:55 And I hold him in esteem.
01:16:57 I’m grateful for him.
01:16:58 I think Enthought still exists.
01:17:00 They’re doing great work helping scientists.
01:17:02 They still run the SciPy conference.
01:17:05 They have an R&D platform they’re selling now
01:17:07 that’s a tool that you can go get today, right?
01:17:10 So Enthought has played a role in the SciPy
01:17:14 in supporting the community around SciPy, I would say.
01:17:18 They ended up not being able to,
01:17:20 they ended up building a tool suite
01:17:22 to write GUI applications.
01:17:24 Like that’s where they could actually make
01:17:25 that the business could work.
01:17:26 And so supporting SciPy and NumPy itself
01:17:29 wasn’t as possible.
01:17:30 Like they didn’t, they tried.
01:17:31 I mean, it was not just because,
01:17:33 it was just because of the business aspect.
01:17:34 So, and I wanted to build a company that could do,
01:17:36 that could get venture funding, right?
01:17:39 For better or worse.
01:17:39 I mean, that’s a longer story.
01:17:41 We could talk a lot about that, but.
01:17:42 And that’s where Anaconda came to be.
01:17:44 That’s where Anaconda came to be.
01:17:45 So let me ask you, it’s a little bit for fun
01:17:48 because you built this amazing thing.
01:17:50 And so let’s talk about like an old warrior
01:17:54 looking over old battles.
01:17:57 You’ve, you know, there’s a sad letter in 2012
01:18:01 that you wrote to the NumPy mailing list
01:18:04 announcing that you’re leaving NumPy.
01:18:06 And some of the things you’ve listed
01:18:08 as some of the things you regret
01:18:10 or not regret necessarily, but some things to think about.
01:18:14 If you could go back and you could fix stuff about NumPy
01:18:17 or both sort of in a personal level,
01:18:20 but also like looking forward,
01:18:21 what kind of things would you like to see changed?
01:18:24 Good question.
01:18:25 So I think there’s technical questions
01:18:26 and social questions right there.
01:18:29 First of all, you know, I wrote NumPy as a service
01:18:33 and I spent a lot of time doing it.
01:18:35 And then other people came to help make it happen.
01:18:36 NumPy succeeded because of the work of a lot of people, right?
01:18:39 So it’s important to understand that.
01:18:42 I’m grateful for the opportunity,
01:18:43 the role I had, I could play
01:18:45 and grateful that things I did had an impact,
01:18:47 but they only had the impact they had
01:18:49 because of the other people that came to the story.
01:18:52 And so they were essential,
01:18:53 but the way data types were handled,
01:18:55 the way data types, we had array scalars, for example,
01:18:59 that are really just a substitute for a type concept, right?
01:19:04 So we had array scalars or actual Python objects
01:19:06 so that there’s for every, for a 32 bit float
01:19:09 or a 16 bit float or a 16 bit integer,
01:19:13 Python doesn’t have a natural,
01:19:14 it’s just one integer, there’s one float.
01:19:17 Well, what about these lower precision types,
01:19:19 these larger precision types?
01:19:21 So we had them in NumPy
01:19:23 so that you could have a collection of them,
01:19:25 but then have an object in Python that was one of them.
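The gap described here can be seen with the standard library alone. The sketch below is not NumPy code: it uses the stdlib `array` module to show that a collection can store 32-bit floats or 16-bit integers, yet an element read back out becomes a generic Python `float` or `int`. Array scalars such as NumPy's `float32` exist so that the element keeps a type of its own. The variable names are illustrative.

```python
from array import array

# 'f' stores each item as a C float (4 bytes); 'h' stores a C short (2 bytes).
samples = array("f", [0.1, 0.2])
counts = array("h", [7, 8])

print(samples.itemsize)   # bytes per stored element
print(counts.itemsize)

# Reading an element back yields Python's single float / int type,
# so the "32-bit" or "16-bit" information is lost on the scalar itself.
x = samples[0]
n = counts[0]
print(type(x), type(n))

# The value still carries 32-bit rounding, but nothing in its type says so.
print(x == 0.1)
```

With array scalars, the object pulled out of the collection would instead report a fixed-width type, which is the role they play in NumPy.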
01:19:28 And there’s questions about like in retrospect,
01:19:31 I wouldn’t have created those
01:19:32 if I’d improved the type system.
01:19:34 And like made the type system actually a Python type system
01:19:38 as opposed to currently,
01:19:39 it’s a Python one level type system.
01:19:41 I don’t know if you know the difference
01:19:42 between Python one, Python two,
01:19:43 it’s kind of technical, kind of depth,
01:19:44 but Python two, one of its big things that Guido did,
01:19:47 it was really brilliant.
01:19:48 It was the actually Python one,
01:19:51 all classes, new objects were one.
01:19:55 If you as a user wrote a class,
01:19:56 it was an instance of a single Python type
01:19:59 called the class type, right?
01:20:02 In Python two, he used a meta typing hook
01:20:06 to actually go, oh, we can extend this
01:20:07 and have users write classes that are new types.
01:20:10 So he was able to have your user classes be actual types
01:20:13 and the Python type system got a lot more rich.
01:20:16 I barely understood that at the time that NumPy was written.
01:20:19 And so I essentially in NumPy created a type system
01:20:22 that was Python one era.
01:20:24 It was every dtype is an instance of the same type
01:20:29 as opposed to having new dtypes be really just Python types
01:20:33 with additional metadata.
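The two designs contrasted here can be sketched in plain Python. This is a hypothetical illustration, not NumPy's implementation: `DType`, `Scalar`, `Float32`, `Int16`, `Quaternion`, and the `itemsize` attribute are names made up for the example. Design A makes every data type an instance of one class; Design B makes each data type a real Python type that carries its metadata.

```python
# Design A ("Python 1 era"): one class, and every dtype is an instance of it.
class DType:
    def __init__(self, name, itemsize):
        self.name = name
        self.itemsize = itemsize


float32_a = DType("float32", 4)
int16_a = DType("int16", 2)
# Both descriptors share a single Python type.
print(type(float32_a) is type(int16_a))


# Design B: each dtype is its own Python type with additional metadata.
class Scalar:
    itemsize = 0

    def __init_subclass__(cls, itemsize, **kwargs):
        # Record the metadata on the new type when the subclass is defined.
        super().__init_subclass__(**kwargs)
        cls.itemsize = itemsize


class Float32(Scalar, itemsize=4):
    pass


class Int16(Scalar, itemsize=2):
    pass


# Adding a new type is just writing another class.
class Quaternion(Scalar, itemsize=32):
    pass


print(isinstance(Float32, type), Float32.itemsize, Quaternion.itemsize)
print(type(Float32()) is Float32)
```

In Design B, tools that already understand Python types (type checkers, compilers) can work with the new types directly, which is the usability benefit being described.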
01:20:34 What’s the cost of that?
01:20:35 Is it efficiency, is it usability?
01:20:37 It’s usability primarily.
01:20:38 The cost isn’t really efficiency.
01:20:40 It’s the fact that it’s clumsy to create new types.
01:20:45 It’s hard.
01:20:45 And then one of the challenges,
01:20:47 you wanna create new types.
01:20:48 You want a quaternion type or you wanna add a new posit type
01:20:52 or you wanna, so it’s hard.
01:20:55 And now, if we had done that well,
01:20:59 when Numba came on the scene
01:21:00 where we could actually compile Python code,
01:21:02 it would integrate with that type system much cleaner.
01:21:05 And now all of a sudden you could do gradual typing
01:21:08 more easily.
01:21:08 You could actually have Python when you add Numba
01:21:10 plus better typing, could actually be a,
01:21:14 you’d smooth out a lot of rough edges.
01:21:16 But there’s already, there’s like,
01:21:18 but are you talking about from the perspective
01:21:20 of developers within NumPy or users of NumPy?
01:21:23 Developers of NumPy, not really users of NumPy so much.
01:21:27 It’s the development of NumPy.
01:21:28 So you’re thinking about like how to design NumPy
01:21:32 so that it’s contributors.
01:21:33 Yeah, the contributors, it’s easier.
01:21:35 It’s easier.
01:21:36 It’s less work to make it better and to keep it maintained.
01:21:39 And where that’s impacted things, for example,
01:21:41 is the GPU.
01:21:43 Like all of a sudden GPUs start getting added
01:21:45 and we don’t have them in NumPy.
01:21:48 Like NumPy should just work on GPUs.
01:21:50 The fact that we’d have to download a whole other object
01:21:52 called CuPy to have arrays on GPUs
01:21:54 is just an artifact of history.
01:21:57 Like there’s no fundamental reason for it.
01:21:59 Well, that’s really interesting.
01:22:00 If we could sort of go on that tangent briefly
01:22:02 is you have PyTorch and other libraries like TensorFlow
01:22:07 that basically tried to mimic NumPy.
01:22:11 Like you’ve created a sort of platonic form
01:22:15 of multi dimension. Basically, yeah.
01:22:16 Yeah, exactly.
01:22:17 Well, and the problem was I didn’t realize that.
01:22:19 Platonic form has a lot of edges.
01:22:21 They’re like, well, we should cut those out
01:22:23 before we present it.
01:22:24 So I wonder if you can comment,
01:22:26 is there like a difference between their implementations?
01:22:29 Do you wish that they were all using NumPy
01:22:31 or like in this abstraction of GPU?
01:22:34 And sorry to interrupt that there’s GPUs, ASICs.
01:22:38 There might be other neuromorphic computing.
01:22:40 There might be other kind of,
01:22:41 or the aliens will come with a new kind of computer.
01:22:43 Like an abstraction that NumPy should just operate nicely
01:22:47 over the things that are more and more
01:22:50 and smarter and smarter with this multi dimensional arrays.
01:22:54 Yeah, yeah.
01:22:55 There’s several comments there.
01:22:56 We are working on something now called data-apis.org.
01:23:00 Data-apis.org, you can go there today.
01:23:02 And it’s our answer.
01:23:04 It’s my answer.
01:23:05 It’s not just me.
01:23:06 It’s me and Ralf and Athan and Aaron
01:23:09 and a lot of companies are helping us at Quansight Labs.
01:23:13 It’s not unifying all the arrays.
01:23:14 It’s creating an API that is unified.
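The idea of a unified API over separate array implementations can be sketched with toy classes. The array API standard defines an `__array_namespace__` method that an array object uses to hand back the module of functions that operate on it; everything else below (`ListArray`, `TupleArray`, `_make_namespace`, `euclidean_norm`, and the simplified `multiply` and `sum` functions) is hypothetical and much smaller than any real backend.

```python
import math
from types import SimpleNamespace


class ListArray:
    """Toy 1-D array stored as a Python list (hypothetical)."""

    def __init__(self, values):
        self.values = list(values)

    def __array_namespace__(self, api_version=None):
        return list_xp


class TupleArray:
    """A second toy array with different storage (hypothetical)."""

    def __init__(self, values):
        self.values = tuple(values)

    def __array_namespace__(self, api_version=None):
        return tuple_xp


def _make_namespace(array_type):
    # Expose the same function names for one particular array type.
    return SimpleNamespace(
        multiply=lambda a, b: array_type(x * y for x, y in zip(a.values, b.values)),
        sum=lambda a: sum(a.values),
    )


list_xp = _make_namespace(ListArray)
tuple_xp = _make_namespace(TupleArray)


def euclidean_norm(x):
    # Library code asks the array for its namespace instead of importing a backend.
    xp = x.__array_namespace__()
    return math.sqrt(xp.sum(xp.multiply(x, x)))


print(euclidean_norm(ListArray([3.0, 4.0])))
print(euclidean_norm(TupleArray([6.0, 8.0])))
```

The arrays stay separate; only the function names and calling conventions are shared, so `euclidean_norm` is written once and works with either.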
01:23:17 So we do care about this
01:23:19 and we’re trying to work through it.
01:23:21 I actually had the chance to go and meet
01:23:22 with the TensorFlow team and the PyTorch team
01:23:25 and talk to them after exiting Anaconda.
01:23:29 Just talking about,
01:23:29 because the first year after leaving Anaconda in 2018,
01:23:33 I became deeply aware of this and realized that,
01:23:36 oh, this split in the array community that exists today
01:23:38 makes what I was concerned about in 2005 pretty parochial.
01:23:44 It’s a lot worse, right?
01:23:45 Now there’s a lot more people.
01:23:47 So perhaps the industry can sustain more stacks, right?
01:23:51 There’s a lot of money,
01:23:52 but it makes it a lot less efficient.
01:23:54 I mean, but I’ve also learned to appreciate,
01:23:56 it’s okay to have some competition.
01:23:58 It’s okay to have different implementations,
01:24:00 but it’s better if you can at least refactor some parts.
01:24:03 I mean, you’re gonna be more efficient
01:24:04 if you can refactor parts.
01:24:07 It’s nice to have competition over things,
01:24:09 over what is nice to have competition.
01:24:11 They’re innovative.
01:24:12 Yeah, innovative.
01:24:13 And then maybe on the infrastructure,
01:24:15 whatever, however you define infrastructure,
01:24:18 that maybe it’s nice to have come together.
01:24:21 Exactly, I agree.
01:24:22 And I think, but it was interesting to hear the stories.
01:24:24 I mean, TensorFlow came out of a C++ library,
01:24:29 Jeff Dean wrote, I think,
01:24:30 that was basically how they were doing inference, right?
01:24:33 And then they realized, oh,
01:24:34 we could do this TensorFlow thing.
01:24:36 That C++ library, then what was interesting to me
01:24:38 was the fact that both Google and Facebook did not,
01:24:42 it’s not like they supported Python or NumPy initially.
01:24:44 They just realized they had to.
01:24:47 They came to this world and then all the users were like,
01:24:48 hey, where’s the NumPy interface?
01:24:50 Oh, and then they kind of came late to it
01:24:52 and then they had these bolt ons.
01:24:54 TensorFlow’s bolt on, I don’t mean to offend,
01:24:57 but it was so bad.
01:24:58 Yeah, it was bad.
01:24:59 It’s the first time that I’m usually,
01:25:01 I mean, one of the challenges I have
01:25:04 is I don’t criticize enough in the sense
01:25:07 that I don’t give people input enough, you know, if.
01:25:09 I think it’s universally agreed upon
01:25:11 that the bolt ons on TensorFlow were.
01:25:13 But I went to, it was a talk given at Mallorca in Spain
01:25:17 and a great guy came and gave a talk and I said,
01:25:19 you should never show that API again
01:25:21 at a PyData conference.
01:25:23 Like that was, that’s terrible.
01:25:24 Like you’re taking this beautiful system we’ve created
01:25:27 and like you’re corrupting all these poor Python people,
01:25:29 forcing them to write code like that
01:25:30 or thinking they should.
01:25:32 Fortunately, you know, they adopted Keras as their,
01:25:35 and Keras is better.
01:25:36 And so Keras, TensorFlow is fine, is reasonable,
01:25:40 but they bolted it on.
01:25:42 Facebook did too.
01:25:43 Like Facebook had their own C++ library for doing inference
01:25:48 and they also had the same reaction, they had to do this.
01:25:51 One big difference is Facebook,
01:25:52 maybe because of the way it’s situated in part of FAIR,
01:25:55 part of the research library,
01:25:56 TensorFlow is definitely used and, you know,
01:25:58 they have to make, they couldn’t just open it up
01:26:00 and let the community, you know, change what that is.
01:26:03 Cause I guess they were worried
01:26:04 about disrupting their operations.
01:26:06 Facebook’s been much more open to having community input
01:26:10 on the structure itself.
01:26:12 Whereas Google and TensorFlow,
01:26:14 they’re really eager to have community users,
01:26:16 people use it and build the infrastructure,
01:26:17 but it’s much more walled.
01:26:18 Like it’s harder to become a contributor to TensorFlow.
01:26:21 And it’s also, this is very difficult question to answer
01:26:24 and don’t mean to be throwing shade at anybody,
01:26:27 but you have to wonder, it’s the Microsoft question
01:26:30 of when you have a tool like PyTorch or TensorFlow,
01:26:33 how much are you tending to the hackers
01:26:36 and how much are you tending to the big corporate clients?
01:26:39 Correct.
01:26:40 So like the ones that,
01:26:42 do you tend to the millions of people
01:26:44 that are giving you almost no money,
01:26:46 or do you tend to the few
01:26:48 that are giving you a ton of money?
01:26:50 I tend to stand with the people.
01:26:54 Right.
01:26:54 Cause I feel like if you nurture the hackers,
01:26:57 you will make the right decisions in the long term
01:27:00 that will make the companies happy.
01:27:02 I lean that way too.
01:27:03 I totally agree.
01:27:04 But then you have to find the right dance.
01:27:05 But it’s a balance.
01:27:07 Cause you can lean to the hackers and run out of money.
01:27:08 Yeah, exactly.
01:27:10 Exactly.
01:27:11 Which has been some of the challenge I’ve faced
01:27:13 in the sense that,
01:27:14 like I would look at some of the experiments,
01:27:17 like NumPy, the fact that we have this split
01:27:19 is a factor of I wasn’t able to collect more money
01:27:21 towards NumPy development.
01:27:22 Yeah.
01:27:23 Right?
01:27:24 I mean, I didn’t succeed in the early days
01:27:26 of getting enough financial contribution to NumPy
01:27:29 so that they could work on it.
01:27:31 Right?
01:27:31 I couldn’t work on it full time.
01:27:32 I had to just catch an hour here, an hour there.
01:27:35 And I’ve basically not liked that.
01:27:37 Like I’ve wanted to be able to do something about that
01:27:39 for a long time and try to figure out how,
01:27:41 well, there’s lots of ways.
01:27:42 I mean, possibly one could say,
01:27:44 we had an offer from Microsoft
01:27:46 at early days of Anaconda.
01:27:48 2014, they offered to come buy us, right?
01:27:51 The problem was the right people at Microsoft
01:27:52 didn’t offer to buy us.
01:27:53 And they were still,
01:27:54 they were, it was really a,
01:27:56 we were like a second,
01:27:58 they had really bought, they just bought R,
01:27:59 the R company called,
01:28:01 it was not R studio,
01:28:02 but it was another R company that was emergent.
01:28:05 And it was kind of a,
01:28:07 well, we should also get a Python play,
01:28:09 but they were really doubling down on R.
01:28:11 Right?
01:28:12 And so it was like,
01:28:13 it was where you would go to die.
01:28:14 So it’s not, it wasn’t,
01:28:15 it was before Satya was there.
01:28:17 Satya had just started.
01:28:18 Just started.
01:28:19 Right?
01:28:20 And the offer was coming from someone
01:28:21 two levels down from him.
01:28:23 Got you.
01:28:23 Right?
01:28:24 And if it had come from Scott Guthrie,
01:28:26 so I got a chance to meet Scott Guthrie,
01:28:28 great guy, I like him.
01:28:29 If an offer had come from him,
01:28:31 probably would be at Microsoft right now.
01:28:33 That’d be fascinating.
01:28:34 That would be really nice actually,
01:28:36 especially given what Microsoft has since done
01:28:38 for the open source community and all those things.
01:28:40 Yes, I think they’re doing well.
01:28:41 I really like some of the stuff they’ve been doing.
01:28:43 They’re still working,
01:28:45 and they’ve, you know,
01:28:46 they’ve hired Guido now,
01:28:46 and they’ve hired a lot of Python developers.
01:28:47 Wait, Guido’s now at Microsoft?
01:28:49 Yeah, he works at Microsoft.
01:28:50 I need to.
01:28:52 Which, he retired,
01:28:53 then he came out of retirement,
01:28:54 and he’s working now.
01:28:55 I was just talking to him,
01:28:56 and he didn’t mention this person.
01:28:57 Well.
01:28:58 I should investigate this further.
01:29:01 Well.
01:29:02 Because I know he loved Dropbox,
01:29:02 but I wasn’t sure what he was doing,
01:29:04 what he was up to.
01:29:05 Well, he was kind of saying he’d retire,
01:29:06 but, and it’s literally been five years
01:29:09 since I last sat down and really talked to Guido.
01:29:12 Right?
01:29:13 Guido’s a technology expert, right?
01:29:16 He’s a, so I came,
01:29:17 I was excited because I’d finally figured out
01:29:18 the type system for NumPy.
01:29:20 I wanted to kind of talk about that with him,
01:29:22 and I kind of overwhelmed him.
01:29:23 Could you stay in that,
01:29:25 just for a brief moment,
01:29:26 because he’s a fascinating person
01:29:28 in the history of programming.
01:29:29 He is a fascinating person.
01:29:31 What have you learned from Guido
01:29:34 about programming, about life?
01:29:37 Yeah, yeah.
01:29:38 A lot, actually.
01:29:39 I’ve been a fan of Guido’s.
01:29:40 You know, we have a chance to talk.
01:29:42 Some, I wouldn’t say, you know,
01:29:43 we talk all the time.
01:29:44 Not at all.
01:29:45 He may, but we talk enough to,
01:29:47 I respect his,
01:29:48 in fact, when I first started NumPy,
01:29:49 one of the first things I did was I had a,
01:29:51 I asked Guido for a meeting
01:29:53 with him and Paul Dubois in San Mateo.
01:29:55 And I went and met him for lunch.
01:29:56 And basically, to say,
01:29:58 maybe we can actually,
01:29:59 part of the strategy for NumPy
01:30:00 was to get it into Python 3,
01:30:02 and maybe be part of Python.
01:30:04 And so we talked about that.
01:30:05 That’s a cool conversation.
01:30:06 And about that approach, right?
01:30:06 I would have loved to be a fly on the wall.
01:30:09 That was good.
01:30:10 And over the years for Guido,
01:30:12 I learned,
01:30:13 so he was open.
01:30:14 Like, he was willing to listen to people’s ideas.
01:30:18 Right?
01:30:19 And over the years,
01:30:19 now generally, you know,
01:30:20 I’m not saying universally that’s been true,
01:30:22 but generally that’s been true.
01:30:24 So he’s willing to listen.
01:30:25 He’s willing to defer.
01:30:27 Like on the scientific side,
01:30:28 he would just kind of defer.
01:30:29 He didn’t really always understand
01:30:30 what we were doing.
01:30:31 Yeah.
01:30:31 And he’d defer.
01:30:32 One place where he didn’t enough
01:30:35 was we missed a matrix multiply operator.
01:30:37 Like that finally got added to Python,
01:30:39 but about 10 years later than it should have.
01:30:42 But the reason was because nobody,
01:30:44 it takes a lot of effort.
01:30:46 And I learned this while I was writing NumPy.
01:30:48 I also wrote tools to Python.
01:30:49 I began with Python Dev,
01:30:50 and I added some pieces to Python.
01:30:52 Like the memoryview object.
01:30:53 I wanted the structure of NumPy into Python.
01:30:55 So we didn’t get NumPy into Python,
01:30:56 but we got the basic structure of it into Python.
01:30:59 Like, so you could build on it.
01:31:01 Nobody did for a while,
01:31:01 but eventually database authors started to.
01:31:04 And it’s a lot better.
01:31:05 They did.
01:31:06 And also Antoine Pitrou and Stefan Krah
01:31:08 actually fixed the memoryview object.
01:31:10 Cause I wrote the underlying infrastructure in C,
01:31:13 but the Python exposure was terrible
01:31:15 until they came in and fixed it.
01:31:16 Partly because I was writing NumPy,
01:31:18 and NumPy was the Python exposure.
01:31:19 I didn’t really care about
01:31:21 if you didn’t have NumPy installed.
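The "basic structure" that did make it into Python is visible through the built-in `memoryview`, which works without NumPy installed. The sketch below is only an illustration with made-up variable names: it views one flat buffer as a 3 by 4 grid, with shape and strides, and writes through the view without copying.

```python
# A flat, mutable buffer of 12 bytes.
buf = bytearray(12)

# View the same memory as a 3 x 4 grid of unsigned bytes; no data is copied.
grid = memoryview(buf).cast("B", shape=[3, 4])

print(grid.ndim, grid.shape, grid.strides)

# Writing through the view changes the underlying buffer.
grid[1, 2] = 7
print(buf[1 * 4 + 2])

# The view can be converted to nested lists for inspection.
print(grid.tolist())
```

Shape, strides, and item format are the same pieces of information a NumPy array carries, which is why other libraries can build on this protocol.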
01:31:22 Anyway, Guido, open to ideas,
01:31:25 technologically brilliant.
01:31:27 Like really, I really got a lot of respect for him
01:31:29 when I saw what he did
01:31:30 with this type class merger thing.
01:31:33 It was actually tricky, right?
01:31:35 And then willing to share, willing to share his ideas.
01:31:38 So the other thing early on in 1998,
01:31:40 I said, I wrote my first extension module.
01:31:42 The reason I could is because he’d written this blog post
01:31:44 on how to do reference counting, right?
01:31:47 And without it, I would have been lost, right?
01:31:50 But he was willing to at least try to write this post.
01:31:53 And so he’s been motivated early on with Python.
01:31:56 There’s a computer science for everybody.
01:31:58 You kind of have this early on desire to,
01:31:59 oh, maybe we should be pushing programming to more people.
01:32:02 So he had this populist notion, I guess,
01:32:04 or populist sense to learn that there’s a certain skill,
01:32:08 and I’ve seen it in other people too,
01:32:10 of engaging with contributors sufficiently to,
01:32:13 because when somebody engages with you
01:32:15 and wants to contribute to you,
01:32:16 if you ignore them, they go away.
01:32:18 So building that early contributor base
01:32:19 requires real engagement with other people.
01:32:23 And he would do that.
01:32:24 Can you also comment on this tragic stepping down
01:32:29 from his position as the benevolent dictator for life
01:32:32 over the wars, you know?
01:32:35 The Walrus operator?
01:32:36 The Walrus operator was the last battle.
01:32:39 I don’t know if that’s the cause of it,
01:32:40 but there’s this, for people who don’t know,
01:32:43 you can look up, there’s the Walrus operator,
01:32:45 which looks like a colon and equal sign.
01:32:49 Yeah, colon, equal sign.
01:32:50 And it actually does maybe the thing
01:32:54 that an equal sign should be doing.
01:32:57 Yeah, maybe, right, exactly.
01:33:00 But it’s just historically,
01:33:02 equal sign means something else.
01:33:03 It just means assignment.
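For readers who have not seen it, the walrus operator `:=` (assignment expressions, added in Python 3.8) assigns a value and also yields it, so it can appear inside a larger expression, whereas `=` is only allowed as a statement. A small sketch, with an illustrative input line and variable names:

```python
import re

line = "01:32:36 The Walrus operator was the last battle."

# ':=' binds `match` and lets the same expression be tested in the `if`.
if (match := re.match(r"(\d\d):(\d\d):(\d\d) ", line)) is not None:
    hours, minutes, seconds = (int(g) for g in match.groups())
    total_seconds = hours * 3600 + minutes * 60 + seconds
else:
    total_seconds = None

print(total_seconds)

# It also avoids computing a value twice inside a comprehension.
lengths = [n for word in line.split() if (n := len(word)) > 6]
print(lengths)
```

Without `:=`, the match would need its own assignment statement on the line before the `if`.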
01:33:05 So he stepped down over this.
01:33:07 What do you think about the pressure of leadership?
01:33:10 It’s something that, you mentioned the letter I wrote
01:33:12 in NumPy at the time.
01:33:13 That was a hard time, actually.
01:33:15 I mean, there’s been really hard times.
01:33:17 It was hard.
01:33:19 You get criticized, right?
01:33:20 And you get pushed, and you get,
01:33:22 not everybody loves what you do.
01:33:23 Like anytime you do anything that has impact at all,
01:33:26 you’re not universally loved, right?
01:33:28 You get some real critics.
01:33:29 And that’s an important energy,
01:33:31 because it’s impossible for you to do everything right.
01:33:35 You need people to be pushing.
01:33:37 But sometimes people can get mean, right?
01:33:39 People can, I prefer to give people the benefit of the doubt.
01:33:43 I don’t immediately assume they have bad intentions.
01:33:45 And maybe for other, maybe that doesn’t happen for everybody.
01:33:49 For whatever reason, their past,
01:33:50 their experiences with people, they sometimes have bad,
01:33:53 so they immediately attribute to you bad intentions.
01:33:54 So you’re like, where did this come from?
01:33:56 I mean, I’m definitely open to criticism,
01:33:57 but I think you’re misinterpreting the whole point.
01:34:00 Because I would get that, certainly when I started Anaconda.
01:34:05 Sometimes I say to people,
01:34:08 I care enough about entrepreneurship
01:34:09 to make some open source people uncomfortable.
01:34:12 And I care enough about open source
01:34:13 to make investors uncomfortable.
01:34:15 So I sort of, you create kind of doubters on both sides.
01:34:19 So when you have, and this is just a plea
01:34:23 to the listener and the public, I’ve noticed this too,
01:34:27 that there’s a tendency, and social media makes this worse,
01:34:32 when you don’t have perfect information about the situation,
01:34:35 you tend to fill the gaps with the worst possible,
01:34:39 or at least a bad story that fills those gaps.
01:34:43 And I think it’s good to live life,
01:34:46 maybe not fully naively, but filling in the gaps
01:34:49 with the good, with the best, with the positive,
01:34:54 with the hopeful explanation of why you see this.
01:34:57 So if you see somebody like you trying to make money
01:35:00 on a book about NumPy,
01:35:01 there’s a million stories around that that are positive.
01:35:04 And those are good to think about,
01:35:07 to project positive intent on the people.
01:35:10 Because for many reasons, usually because people are good
01:35:13 and they do have good intent.
01:35:15 And also when you project that positive intent,
01:35:17 people will step up to that too.
01:35:19 Yes.
01:35:20 It’s a great point.
01:35:21 It has this kind of viral nature to it.
01:35:24 And of course with Twitter, early on figured out,
01:35:27 and Facebook is that they can make a lot of money
01:35:30 and engagement from the negative.
01:35:32 Yes.
01:35:33 So there’s this, we’re fighting this mechanism.
01:35:35 I agree.
01:35:36 Which is challenging.
01:35:37 It’s easier.
01:35:37 It’s just easier to be.
01:35:38 To be negative.
01:35:39 And then for some reason, something in our minds
01:35:41 really enjoys sharing that and getting all excited
01:35:45 about the negativity.
01:35:46 We do, yeah.
01:35:47 Some protective mechanism perhaps that we’re gonna get eaten
01:35:50 if we don’t, yeah.
01:35:51 Exactly.
01:35:52 For us to be effective as a group of people
01:35:53 in a software engineering project,
01:35:54 you have to project positive intent, I think.
01:35:56 I totally agree.
01:35:57 Totally agree.
01:35:58 And I think that’s very,
01:35:59 and so that happens in this space.
01:36:01 But Python has done a reasonable job in the past,
01:36:03 but here is a situation where I think it started
01:36:05 to get this pressure where it didn’t.
01:36:07 I really didn’t, I didn’t know enough about what happened.
01:36:10 I’ve talked to several people about it.
01:36:12 And I know most of the steering committee members today,
01:36:15 one person nominated me for that role,
01:36:17 but it’s the wrong role for me right now, right?
01:36:20 I have a lot of respect for the Python developer space
01:36:24 and the Python developers.
01:36:25 I also understand the gap between computer science
01:36:27 Python developers and array programming developers
01:36:30 or science developers.
01:36:31 And in fact, Python succeeds in the array space
01:36:34 the more it has people in that boundary.
01:36:36 And there’s often very few.
01:36:37 Like I was playing a role in that boundary
01:36:39 and working like everything to try to keep up
01:36:42 with even what Guido was saying, like I’m a C programmer,
01:36:47 but not a computer scientist.
01:36:49 Like I was an engineer and physicist and mathematician,
01:36:52 and I didn’t always understand
01:36:54 what they were talking about
01:36:56 and why they would have opinions the way they did.
01:36:58 So, you know, you have to listen and try to understand.
01:37:00 Then you also have to explain your point of view
01:37:02 in a way they can understand.
01:37:03 And that takes a lot of work.
01:37:04 And that communication is always the challenge.
01:37:07 And it’s just what we’re describing here
01:37:09 about the negativity is just another form of that.
01:37:11 Like how do we come together?
01:37:12 And it does appear we’re wired anyway
01:37:14 to at least have a, there’s a part of us
01:37:16 that will enemy, you know, friend, enemy.
01:37:18 And we see, yeah, it’s like,
01:37:21 why are we wiring on the enemy front?
01:37:23 So why are we pushing that?
01:37:24 Why are we promoting that so deeply?
01:37:26 Assume friend until proven otherwise.
01:37:28 Yes, yes.
01:37:30 So, cause you have such a fascinating mind in all of this.
01:37:32 Let me just ask you these questions.
01:37:34 So one interesting side on the Python history
01:37:38 is the move from Python two to Python three.
01:37:41 You mentioned move from Python one to Python two,
01:37:43 but the move from Python two to Python three
01:37:46 is a little bit interesting
01:37:47 because it took a very long time.
01:37:50 It broke, you know, quite a small way
01:37:53 backward compatibility, but even that small way
01:37:56 seemed to have been very painful for people.
01:37:58 Is there lessons you draw?
01:38:00 Oh man, tons of lessons.
01:38:01 From how long it took and how painful it seemed to be?
01:38:05 Yeah, tons of lessons.
01:38:07 Well, I mentioned here earlier
01:38:08 that NumPy was written in 2005.
01:38:11 It was in 2005 that I actually went to Guido
01:38:15 to talk about getting NumPy into Python three.
01:38:17 Like my strategy was to,
01:38:18 oh, we were moving to Python three.
01:38:19 Let’s have that be, and it seems funny in retrospect
01:38:22 because like, wait, Python three,
01:38:23 that was in 2020, right?
01:38:25 When we finally ended the support for Python two
01:38:27 or at least 2017.
01:38:29 The reason it took a long time,
01:38:30 a lot of time, I think it was because one of the things is
01:38:33 there wasn’t much to like about Python three.
01:38:36 3.0, 3.1, it really wasn’t until 3.3.
01:38:40 Like I consider Python 3.3 to be Python 3.0.
01:38:43 But it wasn’t until Python 3.3
01:38:44 that I felt there’s enough stuff in it
01:38:47 to make it worth anybody using it, right?
01:38:49 And then 3.4 started to be, oh yeah, I want that.
01:38:52 And then 3.5 has the matrix multiply operator,
01:38:54 and now it’s like, okay, we gotta use that.
01:38:56 Plus the libraries that started leveraging
01:38:58 some of the features of Python three.
01:38:59 Exactly.
01:39:00 So it really, the challenge was it was,
01:39:03 but it also illustrated a truism that, you know,
01:39:07 when you have inertia,
01:39:08 when you have a group of people using something,
01:39:10 it’s really hard to move them away from it.
01:39:11 You can’t just change the world on them.
01:39:13 And Python three, you know, made some,
01:39:15 I think it fixed some things Guido had always hated.
01:39:17 I don’t think he didn’t like the fact
01:39:18 that print was a statement.
01:39:19 He wanted to make it a function.
01:39:20 But in some sense, that’s a bit of gratuitous change
01:39:23 to the language.
01:39:24 And you could argue, and people have,
01:39:27 but one of the challenges was there wasn’t enough features
01:39:31 and too many just changes without features.
01:39:34 And so the empathy for the end user
01:39:37 as to why they would switch wasn’t there.
01:39:40 I think also it illustrated just the funding realities.
01:39:42 Like Python wasn’t funded.
01:39:45 Like it was also a project
01:39:46 with a bunch of volunteer labor, right?
01:39:48 It had more people, so more volunteer labor,
01:39:50 but it was still, it was funded in the sense
01:39:52 that at least Guido had a job.
01:39:53 And I’ve learned some of the behind the scenes on that now
01:39:55 since talking to people who have lived through it
01:39:57 and maybe not on air, we can talk about some of that.
01:40:00 But it’s interesting to see, but Guido had a job,
01:40:03 but his full time job wasn’t just work on Python.
01:40:07 Like he had other things to do.
01:40:08 Just wild.
01:40:09 It is wild, isn’t it?
01:40:10 It’s wild how few people are funded.
01:40:13 Yes.
01:40:14 And how much impact they have.
01:40:15 Yes.
01:40:16 Maybe that’s a feature not a bug, I don’t know.
01:40:17 Maybe, yes, exactly.
01:40:19 At least early on, like it’s sort of, I know, yeah.
01:40:21 It’s like Olympic athletes are often severely underfunded,
01:40:25 but maybe that’s what brings out the greatness.
01:40:27 Perhaps, yes, correct.
01:40:28 No, exactly.
01:40:29 Maybe this is the essential part of it.
01:40:31 Because I do think about that in terms of,
01:40:33 I currently have an incubator for open source startups.
01:40:36 Like what I’m trying to do right now
01:40:37 is create the environment I wished had existed
01:40:40 when I was leaving academia with NumPy
01:40:42 and trying to figure out what to do.
01:40:44 I’m trying to create those opportunities and environments.
01:40:46 So, and that’s what drives me still,
01:40:49 is how do I make the world easier
01:40:50 for the open source entrepreneur?
01:40:52 So let me stay, I mean, I could probably stay on NumPy
01:40:55 for a long time, but this is fun question.
01:41:00 So Andrej Karpathy leads the Tesla Autopilot team,
01:41:04 and he’s also one of the most like legit programmers I know.
01:41:10 It’s like he builds stuff from scratch a lot,
01:41:13 and that’s how he builds intuition about how a problem works.
01:41:16 He just builds it from scratch, and I always love that.
01:41:18 And the primary language he uses is Python
01:41:21 for the intuition building.
01:41:23 But he posted something on Twitter saying
01:41:27 that they got a significant improvement
01:41:31 on some aspect of their like data loading, I think,
01:41:35 by switching away from np.sqrt,
01:41:39 so NumPy’s implementation of square root,
01:41:42 to math.sqrt, and then somebody else commented
01:41:44 that you can get even a much greater improvement
01:41:48 by using the vanilla Python square root, which is like.
01:41:52 Power 0.5.
01:41:53 Power 0.5.
01:41:55 And it’s fascinating to me, I just wanted to.
01:41:58 So that was some shade throwing at some.
01:42:02 No, no, and yes, we’re talking about.
01:42:04 It’s a good way to ask the trade off
01:42:08 between usability and efficiency broadly in NumPy,
01:42:12 but also on these specific weird quirks
01:42:14 of like a single function.
01:42:16 Yep, so on that point, if you use a NumPy math function
01:42:21 on a scalar, it’s gonna be slower
01:42:25 than using a Python function on that scalar.
01:42:27 But because the math object in NumPy is more complicated,
01:42:33 because you can also call that math object on an array.
01:42:36 And so effectively, it goes through a similar machine.
01:42:39 There aren’t enough of the, which you would do
01:42:41 and you could do like checks and fast paths.
01:42:45 So yeah, if you’re basically doing a list,
01:42:48 if you run over a list, in fact,
01:42:50 for problems that are less than 1,000,
01:42:53 even maybe 10,000 is probably the,
01:42:55 if you’re going more than 10,000,
01:42:56 that’s where you definitely need to be using arrays.
01:42:59 But if you’re less than that, and for reading,
01:43:01 if you’re doing a reading process
01:43:02 and essentially it’s not compute bound, it’s IO bound.
01:43:05 And so you’re really taking lists of 1,000 at a time
01:43:08 and doing work on it.
01:43:09 Yeah, you could be faster just using Python,
01:43:11 straight up Python.
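The three spellings of a scalar square root mentioned here can be put side by side. This is a minimal sketch, assuming NumPy is installed; the variable names and the repeat count are illustrative choices, and no timings are quoted because they depend on the machine and library versions.

```python
import math
import timeit

import numpy as np

x = 2.0  # a plain Python float, not an array

# Three ways to take the square root of one scalar.
candidates = {
    "np.sqrt(x)": lambda: np.sqrt(x),
    "math.sqrt(x)": lambda: math.sqrt(x),
    "x ** 0.5": lambda: x ** 0.5,
}

# All three should agree on the value.
results = {name: fn() for name, fn in candidates.items()}

# Time each call; the numbers vary from machine to machine.
timings = {
    name: timeit.timeit(fn, number=20_000)
    for name, fn in candidates.items()
}

for name in candidates:
    print(f"{name:14s} value={float(results[name]):.6f} time={timings[name]:.4f}s")
```

Running the same comparison on an array of many thousands of elements, rather than one scalar, is the case where the array function is expected to pay off.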
01:43:12 See, but also, and this is the side to the top,
01:43:16 there’s the fundamental questions
01:43:18 when you look at the long arc of history,
01:43:21 it’s very possible that np.sqrt is much faster.
01:43:25 It could be.
01:43:26 So like in terms of like, don’t worry about it,
01:43:29 it’s the evils of over optimization or whatever,
01:43:32 all the different quotes around that,
01:43:34 is sometimes obsessing about this particular little quirk
01:43:39 is not sufficient.
01:43:41 For somebody like, if you’re trying to optimize your path,
01:43:45 I mean, I agree, premature optimization
01:43:47 creates all kinds of challenges, right?
01:43:49 Because now, but you may have to do it.
01:43:51 I believe the quote is, it’s the root of all evil.
01:43:53 It’s the root of all evil, right?
01:43:55 Let’s give Donald Knuth, I think,
01:43:57 or is he more than somebody else?
01:43:59 Well, Don Knuth is kind of like Mark Twain,
01:44:00 people just attribute stuff to him, I don’t know.
01:44:02 And it’s fine because he’s brilliant.
01:44:04 So, no, I was a LaTeX user myself,
01:44:07 and so I have a lot of respect,
01:44:09 and he did more than that, of course,
01:44:10 but yeah, someone I really appreciate
01:44:14 in the computer science space.
01:44:15 Yeah, I don’t, I think that’s appropriate.
01:44:17 There’s a lot of little things like that,
01:44:18 where people actually, if you understood it,
01:44:20 you go, yeah, of course, that’s the case.
01:44:22 And the other part, the other part I didn’t mention,
01:44:25 and Numba was a thing we wrote early on,
01:44:27 and I was really excited by Numba
01:44:29 because it’s something we wanted,
01:44:30 it was a compiler for Python syntax,
01:44:32 and I wanted it from the beginning of writing NumPy
01:44:35 because of this function question,
01:44:38 like taking, the power of arrays
01:44:41 is really that you can write functions using all of it.
01:44:45 It has implicit looping, right?
01:44:47 So you don’t worry about,
01:44:47 I write this n dimensional for loop
01:44:49 with four loops, four for statements.
01:44:51 You just say, oh, big four dimensional array,
01:44:53 I’m gonna do this operation, this plus, this minus,
01:44:55 this reduction, and you get this,
01:44:57 it’s called vectorization in other areas,
01:44:59 but you can basically think at a high level
01:45:01 and get massive amounts of computation done
01:45:03 with the added benefit of,
01:45:06 oh, it can be parallelized easily.
01:45:08 It can be put in parallel.
01:45:09 You don’t have to think about that.
01:45:10 In fact, it’s worse to go decompose your,
01:45:12 you write the for loops
01:45:14 and then try to infer parallelism from for loops.
01:45:16 That’s actually a harder problem
01:45:17 than to take the array problem
01:45:19 and just automatically parallelize that problem.
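The implicit looping described here can be sketched as follows, assuming NumPy is installed. The array shape and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((2, 3, 4, 5))
b = rng.random((2, 3, 4, 5))

# Explicit version: four nested for statements, one per dimension.
looped = np.empty_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        for k in range(a.shape[2]):
            for m in range(a.shape[3]):
                looped[i, j, k, m] = a[i, j, k, m] + b[i, j, k, m]

# Array version: the loops are implicit in the expression.
vectorized = a + b

# A reduction over the last axis, also with no explicit loop.
row_sums = vectorized.sum(axis=-1)

print(np.allclose(looped, vectorized))
print(row_sums.shape)
```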
01:45:22 That’s what, and so functions in NumPy
01:45:25 are called universal functions, ufuncs.
01:45:27 So square root is an example of a ufunc.
01:45:29 There are others, sine, cosine, add, subtract.
01:45:32 In fact, one of the first libraries to SciPy
01:45:34 was something called Special
01:45:35 where I added Bessel functions
01:45:36 and all these special functions that come up in physics
01:45:40 and I added them as ufuncs so they could work on arrays.
01:45:43 So I understood ufuncs very, very well
01:45:44 from day one inside of Numeric.
01:45:45 That was one of the things we tried to make better
01:45:47 in NumPy was how do they work?
01:45:49 Can they do broadcasting?
01:45:50 What does broadcasting mean?
01:45:51 But one of the problems is, okay,
01:45:54 what do I do with a Python scalar?
01:45:57 So what happens, the Python scalar gets broadcast
01:45:59 to a zero dimensional array
01:46:01 and then it goes through the whole same machinery
01:46:02 as if it were a 10,000 dimensional array.
01:46:05 And then it kind of unpacks the element
01:46:07 and then does the addition.
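What NumPy does with a bare Python number can be observed from the outside. This small sketch, assuming NumPy is installed, shows only the visible behavior (a zero dimensional array and a NumPy scalar result), not the internal machinery being described.

```python
import numpy as np

x = 2.0  # Python float

as_array = np.asarray(x)   # how NumPy views a bare scalar
print(as_array.ndim)       # number of dimensions
print(as_array.shape)      # empty tuple for zero dimensions

result = np.sqrt(x)
print(type(result))        # a NumPy array scalar type (np.float64)
```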
01:46:09 That’s not to mention the function it calls
01:46:12 in the case of square root
01:46:13 is just the C lib square root, right?
01:46:15 In some cases, like Python’s power,
01:46:18 there’s some optimizations they’re doing
01:46:20 that could be faster
01:46:21 than just calling the C lib square root.
01:46:23 In the interpreter or in the?
01:46:25 No, in the C code, in the Python runtime.
01:46:27 In the Python runtime, so they really optimize it
01:46:30 and they have the freedom to do that
01:46:32 because they don’t have to worry about.
01:46:32 It’s just a scalar.
01:46:34 It’s just a scalar.
01:46:34 Right, they don’t have to worry about the fact
01:46:36 that, oh, this could be an object with many pieces.
01:46:39 The ufunc machine is also generic
01:46:41 in the sense that typecasting and broadcasting,
01:46:44 broadcasting’s idea of I’m gonna go,
01:46:46 I have a zero dimensional array,
01:46:47 I have a scalar with a four dimensional array
01:46:49 and I add them.
01:46:50 Oh, I have to kind of coerce the shape of this guy
01:46:54 to make it work against the whole four dimensional array.
01:46:56 So it’s the idea of I can do a one dimensional array
01:46:59 against a two dimensional array and have it make sense.
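Both broadcasting cases mentioned here, a scalar against a four dimensional array and a one dimensional array against a two dimensional one, can be sketched like this, assuming NumPy is installed. The shapes are illustrative.

```python
import numpy as np

big = np.ones((2, 3, 4, 5))      # four dimensional array
shifted = big + 10.0             # the scalar is stretched to every element
print(shifted.shape)

row = np.array([1.0, 2.0, 3.0])  # shape (3,)
table = np.zeros((2, 3))         # shape (2, 3)
combined = table + row           # row is reused for each of the 2 rows
print(combined)

# np.broadcast reports the combined shape without doing the arithmetic.
print(np.broadcast(row, table).shape)
```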
01:47:02 Well, that’s what NumPy does is it challenges you
01:47:04 to reformulate, rethink your problem
01:47:07 as a multi dimensional array problem
01:47:09 versus move away from scalars completely.
01:47:12 Right, exactly, exactly.
01:47:14 In fact, that’s where some of the edge cases boundaries are
01:47:16 is that, well, they’re still there
01:47:18 and this is where array scalars are particular.
01:47:21 So array scalars are particularly bad
01:47:23 in the sense that they were written
01:47:24 so that you could optimize the math on them,
01:47:26 but that hasn’t happened.
01:47:29 And so their default is to coerce the array scalar
01:47:32 to a zero dimensional array
01:47:33 and then use the NumPy machinery.
01:47:36 That’s what, and you could specialize,
01:47:38 but it doesn’t happen all the time.
01:47:39 So in fact, when we first wrote Numba,
01:47:41 we do comparisons and say, look, it’s 1000X speed up.
01:47:45 We were lying a little bit in the sense that,
01:47:47 well, first do the 40X slowdown
01:47:50 of using the array scalars inside of a loop.
01:47:52 Cause if you were to use Python scalars,
01:47:53 you’d already be 10 times faster.
01:47:56 But then we would get a hundred times faster
01:47:58 over that using just compilation.
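The difference between array scalars and Python scalars inside a loop can be seen from the element types. This is a minimal sketch, assuming NumPy is installed; it shows only that the two loops handle different types and agree on the answer, and makes no timing claim of its own.

```python
import numpy as np

data = np.arange(5, dtype=np.float64)

# Iterating an array yields NumPy array scalars.
first_from_array = next(iter(data))
print(type(first_from_array))

# tolist() converts the elements to plain Python floats first.
as_list = data.tolist()
print(type(as_list[0]))

total_array_scalars = 0.0
for v in data:          # each v is np.float64
    total_array_scalars += v * v

total_python_floats = 0.0
for v in as_list:       # each v is a Python float
    total_python_floats += v * v

print(float(total_array_scalars), total_python_floats)
```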
01:48:00 But what we do is compile the loop
01:48:01 from out of the interpreter to machine code.
01:48:04 And then that’s always been the power of Python
01:48:06 is this extensibility so that you can,
01:48:08 cause people say, oh, Python’s so slow.
01:48:09 Well, sure, if you do all your logic
01:48:11 in the runtime of the Python interpreter, yeah.
01:48:13 But the power is that you don’t have to.
01:48:15 You write all the logic,
01:48:17 what you do in the high level is just high level logic.
01:48:19 And the actual calls you’re making
01:48:21 could be on gigabyte arrays of data.
01:48:24 And that’s all done at compiled speeds.
01:48:26 And the fact that integration is one can happen,
01:48:30 but two is separable.
01:48:32 That’s one of the, the language like Julia says,
01:48:35 we’re going to be all in one.
01:48:36 You can do all of it together.
01:48:37 And then there’s, the jury’s out, is that possible?
01:48:39 I tend to think that you’re going to,
01:48:41 there’s separate concerns there.
01:48:43 You want to precompile.
01:48:44 In fact, generally you will want to precompile your,
01:48:47 some of your loops.
01:48:48 Like SciPy has a compilation step.
01:48:50 To install SciPy, it takes about two hours.
01:48:53 If you have many machines,
01:48:54 maybe you can get it down to one hour.
01:48:55 But to compile those libraries takes about, takes a while.
01:48:57 You don’t want to do that at runtime.
01:48:59 You don’t want to do that all the time.
01:49:00 You want to have this precompiled binary available
01:49:02 that you’re then just linking into.
01:49:04 So there’s real questions about the whole source code.
01:49:09 Code is, running binary code is more than source code.
01:49:11 It’s creating object code, it’s the linker, it’s the loader,
01:49:14 it’s the how does that interpret it
01:49:15 inside of virtual memory space.
01:49:17 There’s a lot of details there that actually
01:49:19 I didn’t understand for a long time
01:49:20 until I read books on the topic.
01:49:23 And it led to, the more you know, the better off you are
01:49:27 and you can do more details,
01:49:28 but sometimes it helps with abstractions too.
01:49:31 Well, the problem, as we mentioned earlier
01:49:33 with abstractions is you kind of sometimes assume
01:49:37 that whoever implemented this thing
01:49:41 had your case in mind and found the optimal solution.
01:49:45 Yes.
01:49:45 Or like you assume certain things.
01:49:47 I mean, there’s a lot of,
01:49:48 Correct.
01:49:49 One of the really powerful things to me early on,
01:49:52 I mean, it sounds silly to say, but with Python,
01:49:55 probably one of the reasons I fell in love with it
01:49:58 is dictionaries.
01:49:59 Yes.
01:50:00 So obviously probably most languages
01:50:03 have some mapping concept,
01:50:06 but it felt like it was a first class citizen
01:50:09 and it was just my brain was able to think in dictionaries.
01:50:12 But then there’s the thing that I guess I still use
01:50:14 to this day is ordered dictionaries
01:50:16 because that seems like a more natural way
01:50:20 to construct dictionaries.
01:50:21 Yeah.
01:50:22 And from a computer science perspective,
01:50:23 the running time cost is not that significant,
01:50:26 but there’s a lot of things to understand about dictionaries
01:50:30 that the abstraction kind of
01:50:33 doesn’t necessarily incentivize you to understand.
01:50:37 Right, do you really understand the notion of a hash map
01:50:39 and how the dictionary is implemented?
01:50:41 But you’re right.
01:50:42 Dictionaries are a good example
01:50:43 of an abstraction that’s powerful.
01:50:44 And I agree with you.
01:50:46 I agree, I love dictionaries too.
01:50:47 Took me a while to understand, but once you do,
01:50:49 you realize, oh, they’re everywhere.
01:50:50 And Python uses them everywhere too.
01:50:52 Like it’s actually constructed,
01:50:54 one of the foundational things is dictionaries
01:50:55 and it does everything with dictionaries.
01:50:57 So it is, it’s powerful.
01:50:58 Ordered dictionaries came later,
01:51:00 but it is very, very powerful.
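A few places where Python itself relies on dictionaries, plus the ordering behavior discussed here, can be sketched with the standard library alone. The `Point` class is a made-up example.

```python
from collections import OrderedDict


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)

# Instance attributes live in a dictionary.
print(p.__dict__)

# Module-level names are looked up in a dictionary too.
print(isinstance(globals(), dict))

# Since Python 3.7 the built-in dict preserves insertion order.
plain = {}
plain["b"] = 1
plain["a"] = 2
print(list(plain))

# OrderedDict adds order-aware operations such as move_to_end.
ordered = OrderedDict(plain)
ordered.move_to_end("b")
print(list(ordered))
```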
01:51:02 It took me a little while coming
01:51:03 from just the array programming entirely
01:51:05 to understand these other objects,
01:51:07 like dictionaries and lists and tuples and binary trees.
01:51:11 Like I said, I wasn’t a computer scientist,
01:51:13 I studied arrays first.
01:51:15 And so I was very array centric.
01:51:16 And you realize, oh, these others
01:51:17 do have purposes and value actually.
01:51:21 I agree.
01:51:22 There’s a friendliness about,
01:51:24 like one way to think about arrays
01:51:26 is arrays are just like full of numbers,
01:51:31 but to make them accessible to humans
01:51:35 and make them less error prone to human users,
01:51:38 sometimes you want to attach names,
01:51:41 human interpretable names
01:51:43 that are sticky to those arrays.
01:51:44 So that’s how you start to think about dictionaries
01:51:47 is you start to convert numbers
01:51:50 into something that’s human interpretable.
01:51:52 And that’s actually the tension I’ve had with NumPy
01:51:55 because I’ve built so much tooling
01:51:58 around human interpretability
01:52:02 and also protecting me from a year later
01:52:05 not making the mistakes by being,
01:52:07 I wanted to force myself to use English versus numbers.
01:52:12 Yes, so there’s a project called Labeled Arrays.
01:52:15 Like very early it was recognized that,
01:52:18 oh, we’re indexing NumPy with just numbers,
01:52:21 all the columns and particularly the dimensions.
01:52:23 I mean, if you have an image,
01:52:25 you don’t necessarily need to label each column or row,
01:52:27 but if you have a lot of images
01:52:29 or you have another dimension,
01:52:30 you’d at least like to label the dimension
01:52:31 as this is X, this is Y, this is Z,
01:52:33 or this is give us some human meaning
01:52:34 or some domain specific meaning.
01:52:36 That was one of the impetuses for Pandas actually
01:52:39 was just, oh, we do need to label these things.
01:52:43 And Labeled Array was an attempt to add
01:52:45 that like a lighter weight version of that.
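The idea of labeling dimensions instead of remembering axis numbers can be sketched in a few lines, assuming NumPy is installed. The `dims` mapping and the `sum_over` helper are hypothetical illustrations, not the API of the Labeled Arrays project or of Pandas.

```python
import numpy as np

# A stack of 10 images, each 4 pixels high and 6 pixels wide.
images = np.ones((10, 4, 6))

# Hypothetical labeling scheme: map a human-readable name to an axis number.
dims = {"image": 0, "y": 1, "x": 2}


def sum_over(array, name):
    """Sum along the axis with the given label."""
    return array.sum(axis=dims[name])


per_pixel_total = sum_over(images, "image")  # instead of images.sum(axis=0)
print(per_pixel_total.shape)
```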
01:52:47 And there’s been, like, that’s an example of something
01:52:49 I think NumPy could add, could be added to NumPy,
01:52:53 but one of the challenges again, how do you fund this?
01:52:55 Like I said, one of the tragedies I think is that,
01:52:58 so I never had the chance to,
01:53:00 I was never paid to work on NumPy, right?
01:53:02 So I’ve always just done it in my spare time,
01:53:04 always taken from one thing,
01:53:05 taken from another thing to do it.
01:53:07 And at the time, I mean, today,
01:53:09 it would be the wrong day and today,
01:53:11 like paying me to work on NumPy now
01:53:12 would not be a good use of effort,
01:53:13 but we are finally at Quansight Labs,
01:53:16 I’m actually paying people to work on NumPy and SciPy,
01:53:19 which is I’m thrilled with, I’m excited by.
01:53:22 I’ve wanted to do that.
01:53:22 That’s what I always wanted to do from day one.
01:53:24 It just took me a while to figure out a mechanism to do that.
01:53:27 Even like in the university setting,
01:53:29 respecting that, like pushing students,
01:53:33 young minds and young graduate students to contribute
01:53:38 and then figuring out financial mechanisms
01:53:41 that enable them to contribute
01:53:43 and then sort of reward them
01:53:45 for their innovative scientific journey,
01:53:48 that would be nice.
01:53:49 But then also just a better allocation of resources.
01:53:53 It’s the 20 year anniversary since 9/11
01:53:55 and I was just looking, we spent over $6 trillion
01:53:59 in the Middle East after 9/11 in the various efforts there.
01:54:04 And sort of to put politics and all that aside,
01:54:08 it’s just, you think about the education system,
01:54:10 all the other ways we could have
01:54:11 possibly allocated that money.
01:54:14 To me, to take it back,
01:54:16 the amount of impact you would have
01:54:21 by allocating a little bit of money to the programmers
01:54:26 that build the tools that run the world is fascinating.
01:54:30 It is.
01:54:32 I don’t know, I think, again,
01:54:34 there is some aspect to being broke
01:54:38 as somewhat of a feature, not a bug,
01:54:40 that you make sure that you’re valued.
01:54:42 But you can still manage that.
01:54:43 Right, no, I know.
01:54:45 But I don’t think that’s a big part.
01:54:47 So it’s like, I think you can have enough money
01:54:50 and actually be wealthy while maintaining your values.
01:54:53 Agreed, agreed.
01:54:55 There’s an old adage that nations that trade together
01:54:57 don’t go to war together.
01:54:59 I’ve often thought about nations that code together.
01:55:01 Yeah, code together.
01:55:02 Right?
01:55:03 I love that.
01:55:04 Because one of the things I love about open source
01:55:05 is it’s global, it’s multinational.
01:55:07 Like there aren’t national boundaries.
01:55:09 One of the challenges with business and open source
01:55:10 is the fact that, well, business is national.
01:55:12 Like businesses are entities
01:55:13 that are recognized in legal jurisdictions, right?
01:55:16 And have laws that are respected in those jurisdictions
01:55:18 and hiring, and yet the open source ecosystem
01:55:21 is not, it’s not there.
01:55:23 Like currently, one of the problems we’re solving
01:55:25 is hiring people all over the world, right?
01:55:27 Because we, it’s a global effort.
01:55:29 And I’ve had the chance to work, and I’ve loved the chance.
01:55:31 I’ve never been to like Iran,
01:55:35 but I once had a conference
01:55:36 where I was able to talk to people there, right?
01:55:38 And talk to folks in Pakistan.
01:55:40 I’ve never been there, but we had a call
01:55:44 where there were people there,
01:55:45 like just scientists and normal people.
01:55:47 And there’s a certain amount of humanizing, right?
01:55:52 That gets away from the,
01:55:54 like we often get the memes of society
01:55:56 that bubble up and get discussed,
01:55:58 but the memes are not even an accurate reflection
01:56:00 of the reality of what people are.
01:56:02 Well, if you look at the major power centers
01:56:05 that are leading to something like cyber war
01:56:08 in the next few decades,
01:56:10 it’s the United States, it’s Russia, and China.
01:56:13 And those three countries in particular
01:56:16 have incredible developers.
01:56:18 So if they work together, I think that’s one way,
01:56:21 the politicians can do their stupid bickering,
01:56:23 but like there’s a layer of infrastructure, of humanity.
01:56:27 If they collaborate together,
01:56:29 that I think can prevent major military conflict,
01:56:34 which would, I think most likely happen at the cyber level
01:56:37 versus the actual hot war level.
01:56:39 You’re right.
01:56:40 You know, I think that’s a good prediction.
01:56:43 Nations that code together don’t go to war together.
01:56:46 Don’t go to war together.
01:56:47 That’s a hope, right?
01:56:48 That’s one of the philosophical hopes, but yeah.
01:56:52 So you mentioned the project of Numba,
01:56:55 which is fascinating.
01:56:58 So from the early days,
01:56:59 there was kind of a pushback on Python that it’s not fast.
01:57:04 You know, C, C++,
01:57:05 if you wanna write something that’s fast,
01:57:06 you use C++.
01:57:08 If you wanna write something that’s usable and friendly,
01:57:11 but slow, you use Python.
01:57:13 And so what is Numba?
01:57:15 What is its goal?
01:57:16 How does it work?
01:57:17 Great, yeah.
01:57:18 Yes, that’s what the argument.
01:57:19 And the reality was people would write high level coding
01:57:22 and use compiled code,
01:57:23 but there’s still user stories, use cases,
01:57:25 where you want to write Python,
01:57:27 but then have it still be fast.
01:57:28 You still need to write a for loop.
01:57:30 Like before Numba, it was always don’t write a for loop.
01:57:33 You know, write it in a vectorized way,
01:57:35 you know, put it in an array.
01:57:37 And often that can make a memory trade off.
01:57:39 Like quite often you can do it,
01:57:41 but then you make maybe use more memory
01:57:42 because you have to build this array of data
01:57:44 that you don’t necessarily need all the time.
01:57:46 So Numba was, it started from a desire to have
01:57:50 kind of a vectorize that worked.
01:57:52 Vectorize was a tool in NumPy, it was released.
01:57:56 You give it a Python function
01:57:57 and it gave you a universal function,
01:57:59 a ufunc that would work on arrays.
01:58:01 So you give it a function that just worked on a scalar.
01:58:03 Like you could make a,
01:58:04 like the classic case was a simple function
01:58:07 that had an if-then statement in it.
01:58:08 So the sine X over X function, the sinc function.
01:58:12 If X equals zero, return one, otherwise do sine X over X.
01:58:16 The challenge is you don’t want that loop
01:58:17 going on in Python.
01:58:18 So you want a compiled version of that,
01:58:21 but the ufunc, the vectorize in NumPy
01:58:23 would just give you a Python function.
01:58:24 So it would take the array of numbers
01:58:26 and at every call do a loop back into Python.
01:58:29 So it was very slow.
01:58:30 It gave you the appearance of a ufunc,
01:58:31 but it was very slow.
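The sinc case described here can be written out with `np.vectorize`, assuming NumPy is installed. The function name `sinc_scalar` is illustrative, and it computes the unnormalized sin(x)/x from the conversation rather than NumPy's own normalized `np.sinc`.

```python
import math

import numpy as np


def sinc_scalar(x):
    """Unnormalized sinc for one number: sin(x)/x, with the value 1 at x == 0."""
    if x == 0:
        return 1.0
    return math.sin(x) / x


# np.vectorize wraps the scalar function so it accepts arrays,
# but it still calls sinc_scalar once per element from Python.
sinc_vec = np.vectorize(sinc_scalar)

xs = np.array([0.0, 0.5, 1.0, 2.0])
looped = sinc_vec(xs)

# The same result written directly with array operations.
# The substituted denominator avoids dividing by zero at x == 0.
safe = np.where(xs == 0, 1.0, xs)
direct = np.where(xs == 0, 1.0, np.sin(xs) / safe)

print(np.allclose(looped, direct))
```

The wrapped function looks like a ufunc from the outside, which is the convenience, while the per-element Python call is the cost being described.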
01:58:32 So I always wanted a vectorize
01:58:34 that would take that Python scalar function
01:58:36 and produce a ufunc working on binary native code.
01:58:39 So in fact, I had somebody work on that with PyPy
01:58:42 and see if PyPy could be used to produce a ufunc like that
01:58:45 early on in 2009 or something like that, 2010.
01:58:50 They didn’t work that well.
01:58:51 It was kind of pretty bulky.
01:58:52 But in 2012, Peter and I had just started Anaconda.
01:58:57 We had, I just, I’d learned to raise money.
01:59:00 That’s a different topic,
01:59:01 but I’d learned to raise money from friends, family,
01:59:04 and fools, as they say.
01:59:05 And.
01:59:06 That’s a good line.
01:59:09 Oh, that’s a good line.
01:59:11 But, so we were trying to do something.
01:59:13 We were trying to change the world.
01:59:14 Peter and I are super ambitious.
01:59:15 We wanted to make array computing
01:59:17 and we had ideas for really what’s still,
01:59:19 it’s still the energy right now.
01:59:20 How do you do at scale data science?
01:59:23 And we had a bunch of ideas there, but one of them,
01:59:25 I had just talked to people about LLVM
01:59:27 and I was like, there’s a way to do this.
01:59:30 I just, I went, I heard about, my friend Dave Beazley
01:59:32 had a compiler course.
01:59:33 So I was looking at compilers like,
01:59:35 and I realized, oh, this is what you do.
01:59:37 And so I wrote a version of Numba
01:59:40 that just basically mapped Python bytecode to LLVM.
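The bytecode that such a translator starts from can be inspected with the standard library `dis` module. This is only a sketch of what the input looks like, with a made-up function; the exact opcode names differ between Python versions, so no output is shown.

```python
import dis


def add_scaled(a, b):
    return a + 2 * b


# Each instruction is one step for the interpreter's stack machine.
ops = [ins.opname for ins in dis.get_instructions(add_scaled)]
print(ops)
```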
01:59:45 Nice.
01:59:46 Right, so, and the first version is like, this works
01:59:49 and it produces code that’s fast.
01:59:50 This is cool for, you know,
01:59:51 obviously a reduced subset of Python.
01:59:53 I didn’t support all the Python language.
01:59:55 There had been efforts to speed up Python in the past,
01:59:57 but those efforts were, I would say,
01:59:59 not from the array computing perspective,
02:00:00 not from the perspective of wanting to produce
02:00:02 a vectorized improvement.
02:00:03 They were from the perspective of speeding up
02:00:05 the runtime of Python, which is fundamentally hard
02:00:07 because Python allows for some constructs
02:00:10 that aren’t, you can’t speed up.
02:00:12 Like it’s this generic, you know, when it does this variable.
02:00:15 So I, from the start, did not try to replicate
02:00:17 Python’s semantics entirely.
02:00:20 I said, I’m gonna take a subset of the Python syntax
02:00:23 and let people write syntax in Python,
02:00:25 but it’s kind of a new language really.
02:00:27 So it’s almost like for loops, like focusing on for loops.
02:00:30 For loops, scalar arithmetic, you know, typed,
02:00:34 you know, really typed language, a typed subset.
02:00:38 That was the key.
02:00:39 So, but we wanted to add inference of types.
02:00:41 So you didn’t have to spell all the types out
02:00:43 because when you call a function,
02:00:45 so Python is typed, it’s just dynamically typed.
02:00:48 So you don’t tell it what the types are,
02:00:49 but when it runs, every time an object runs,
02:00:52 there’s a type for the variables.
02:00:53 You know what it is.
02:00:54 And so that was the design goals of Numba
02:00:56 were to make it possible to write functions
02:00:59 that could be compiled and have them used for NumPy arrays.
02:01:03 Like they needed to support NumPy arrays.
02:01:05 And so how does it work?
02:01:07 Do you add a comment within Python that tells it to do,
02:01:10 like how do you help out the compiler?
02:01:11 Yeah, so there isn’t much actually.
02:01:15 You don’t, it’s kind of magical in the sense
02:01:17 that it just looks at the type of the objects
02:01:19 and then it uses type inference to determine
02:01:21 any other variables it needs.
02:01:23 And then it was also, because we had a use case
02:01:26 that could work early.
02:01:28 Like one of the challenges of any kind of new development
02:01:30 is if you have something that to make it work,
02:01:32 it was gonna take you a long time,
02:01:34 it’s really hard to get it off the ground.
02:01:35 If you have a project where there’s some incremental story,
02:01:39 it can start working today and solve a problem,
02:01:42 then you can start getting it out there, getting feedback.
02:01:44 Because Numba today, now Numba is nine years old today,
02:01:48 the first two, three versions were not great, right?
02:01:52 But they solved a problem and some people could try it
02:01:54 and we could get some feedback on it.
02:01:55 Not great in that it was very focused.
02:01:57 Very fragile, the subset it would actually compile
02:02:02 was small and so if you wrote Python code
02:02:04 and said, so the way it worked is you write a function
02:02:06 and you say @jit, using decorators.
02:02:09 So decorators, just these little constructs
02:02:11 let you decorate code with an at and then a name.
02:02:15 The @jit would take your Python function
02:02:17 and actually just compile it and replace the Python function
02:02:20 with another function that interacts
02:02:23 with this compiled function.
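The decorator usage described here looks roughly like the sketch below. It assumes NumPy is installed and uses Numba's `jit` decorator when Numba is available; the fallback decorator that does nothing is an addition for this sketch so the code still runs without Numba, and the function name `sinc_sum` is illustrative. No type annotations are given, since the types are inferred from the argument on the first call.

```python
import math

import numpy as np

try:
    from numba import jit      # real decorator when Numba is installed
except ImportError:
    def jit(*args, **kwargs):  # fallback for this sketch: leaves the function as-is
        def wrap(func):
            return func
        return wrap


@jit(nopython=True)
def sinc_sum(values):
    """Sum of sin(x)/x over a 1-D array, written as a plain for loop."""
    total = 0.0
    for x in values:
        if x == 0.0:
            total += 1.0
        else:
            total += math.sin(x) / x
    return total


data = np.array([0.0, 1.0, 2.0])
result = sinc_sum(data)
expected = 1.0 + math.sin(1.0) / 1.0 + math.sin(2.0) / 2.0
print(abs(result - expected) < 1e-12)
```

This is the kind of loop with an if statement that the earlier advice said to avoid in plain Python, which is the use case the conversation gives for compiling it.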
02:02:24 And it would just do that and we went from Python bytecode
02:02:28 then we went to AST.
02:02:29 I mean, writing compilers actually,
02:02:31 I learned a lot about why computer science
02:02:32 is taught the way it is because compilers
02:02:35 can be hard to write.
02:02:36 They use tree structures, they use all the concepts
02:02:39 of computer science that are needed.
02:02:40 It’s actually hard to, it’s easy to write a compiler
02:02:44 and then have it be spaghetti code.
02:02:46 Like the passes become challenging
02:02:47 and we ended up with three versions of Numba, right?
02:02:49 Numba got written three times.
02:02:51 What programming language is Numba written in?
02:02:55 Python.
02:02:56 Wait, okay.
02:02:57 Yeah, Python.
02:02:58 So.
02:03:00 Really?
02:03:00 That’s fascinating.
02:03:01 Yeah, so Python, but then the whole goal of Numba
02:03:03 is to translate Python bytecode to LLVM.
02:03:07 And so LLVM actually does the code generation.
02:03:09 In fact, a lot of times they’d say,
02:03:10 yeah, it’s super easy to write a compiler
02:03:12 if you’re not writing the parser nor the code generator.
02:03:15 Right?
02:03:16 So for people who don’t know, LLVM is a compiler itself.
02:03:19 So your compiler.
02:03:20 Yeah, it’s really badly named: Low Level Virtual Machine,
02:03:22 and that part of it is not used.
02:03:24 It’s really low level.
02:03:25 Chris, he doesn’t mean that.
02:03:26 Yeah, love Chris.
02:03:29 But the name implies that the virtual machine
02:03:31 is what it’s all about.
02:03:32 It’s actually the IR and the library,
02:03:34 the code generation.
02:03:36 That’s the real beauty of it.
02:03:37 The fact that, what I love about LLVM
02:03:39 was the fact that it was a platform you could collaborate on.
02:03:43 Right?
02:03:44 Instead of the internals of GCC
02:03:45 or the internals of the Intel compiler,
02:03:47 or like how do I extend that?
02:03:49 And it was a place we could collaborate.
02:03:51 And we were early.
02:03:52 I mean, people had started before.
02:03:54 It’s a slow compiler.
02:03:55 Like it’s not a fast compiler.
02:03:56 So for some kind of JITs,
02:03:59 like JITs are common in language
02:04:01 because one, every browser has a JavaScript JIT.
02:04:04 It does real time compilation
02:04:06 of the JavaScript to machine code.
02:04:09 For people who don’t know, JIT is just in time compilation.
02:04:11 Thank you.
02:04:12 Yeah, just in time compilation.
02:04:13 They’re actually really sophisticated.
02:04:14 In fact, I got jealous of how much effort
02:04:17 was put into the JavaScript JITs.
02:04:18 Yes, well, it’s kind of incredible what they’ve done.
02:04:20 Yes, I completely agree.
02:04:22 I’m very impressed.
02:04:24 But you know, Numba was an effort
02:04:26 to make that happen with Python.
02:04:29 And so we used some of the money
02:04:30 we raised from Anaconda to do it.
02:04:32 And then we also applied for this DARPA grant
02:04:34 and used some of that money to continue the development.
02:04:36 And then we used proceeds from service projects we would do.
02:04:40 We get consulting projects
02:04:41 that we would then use some of the profits
02:04:44 to invest in Numba.
02:04:45 So we ended up with a team of two or three people
02:04:47 working on Numba.
02:04:48 It was fits and starts, right?
02:04:50 And ultimately, the fact that we had a commercial version
02:04:53 of it also we were writing.
02:04:54 So part of the way I was trying to fund Numba,
02:04:56 say, well, let’s do the free Numba
02:04:58 and then we’ll have a commercial version of Numba
02:04:59 called Numba Pro.
02:05:00 And what Numba Pro did is it targeted GPUs.
02:05:03 So we had the very first CUDA JIT
02:05:05 and the very first @jit compiler where, in 2012 or 2013,
02:05:10 you could run not just a ufunc on CPU,
02:05:14 but a ufunc on GPUs.
02:05:15 And it would automatically parallelize it
02:05:17 and get a 1000x speedup on it.
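For readers unfamiliar with the term: a ufunc (NumPy's "universal function") is a function written for single values that is applied element by element across arrays. The toy sketch below shows only that idea in pure Python; `elementwise` and `scaled_sum` are hypothetical names, and this is not Numba's or Numba Pro's implementation. Numba exposes the same idea through its `vectorize` decorator, which the sketch avoids so that it runs without NumPy, Numba, or a GPU.

```python
def elementwise(scalar_func):
    """Hypothetical helper: lift a scalar function to work on sequences."""
    def lifted(*sequences):
        # Each output element depends only on the matching input elements,
        # which is why this kind of function can be spread across CPU cores
        # or GPU threads.
        return [scalar_func(*args) for args in zip(*sequences)]
    return lifted


@elementwise
def scaled_sum(x, y):
    # Written for single numbers; the decorator applies it to whole sequences.
    return 2.0 * x + y


print(scaled_sum([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # [12.0, 24.0, 36.0]
```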
02:05:18 And that’s an interesting funding mechanism
02:05:21 because large companies or larger companies
02:05:26 care about speed in just this way.
02:05:30 So it’s exactly a really good way.
02:05:33 Yeah, there’s been a couple of things
02:05:34 you know people will pay for.
02:05:35 One, they’ll pay for really good user interfaces, right?
02:05:37 And so I’m always looking for what are the things
02:05:40 people will pay for that you could actually adapt
02:05:41 to the open source infrastructure?
02:05:43 One is definitely user interfaces.
02:05:45 The second is speed, like a better runtime, faster runtime.
02:05:49 And then when you say people,
02:05:50 you mean like a small number of people pay a lot of money,
02:05:52 but then there’s also this other mechanism that.
02:05:54 That’s true.
02:05:55 A ton of people pay.
02:05:56 That’s true.
02:05:57 A little bit.
02:05:58 First, I gotta, we mentioned Anaconda,
02:06:00 we mentioned friends, family, and fools.
02:06:04 So Anaconda is yet another.
02:06:06 So there’s a company, but there’s also a project.
02:06:09 Correct.
02:06:09 That is exceptionally impactful in terms of,
02:06:14 for many reasons, but one of which is bringing
02:06:16 a lot more people into the community
02:06:21 of folks who use Python.
02:06:23 So what is Anaconda?
02:06:26 What is its goals?
02:06:28 Maybe what is Conda versus Anaconda?
02:06:31 Yeah, I’ll tell you a little bit of the history of that.
02:06:33 Cause Anaconda, we wanted to do,
02:06:35 we wanted to scale Python.
02:06:37 Cause we, you know, that was the goal.
02:06:38 Peter and I had the goal of when we started Anaconda,
02:06:40 we actually started as Continuum Analytics
02:06:42 was the name of the company that started.
02:06:44 It got renamed Anaconda in 2015.
02:06:47 But we said, we want to scale analytics.
02:06:49 NumPy is great, Pandas is emerging,
02:06:52 but these need to run at scale with lots of machines.
02:06:55 The other thing we wanted to do was make user interfaces
02:06:57 that were web.
02:06:59 We wanted to make sure the web did not pass
02:07:01 by the Python community.
02:07:02 That we had ways to translate your data science to the web.
02:07:06 So those are the two kind of technical areas.
02:07:07 We thought, oh, we’ll build products in this space.
02:07:09 And that was the idea.
02:07:12 Very quickly in, but of course,
02:07:13 the thing I knew how to do was to do consulting
02:07:15 to make money and to make sure my family and friends
02:07:18 and fools that had invested didn’t lose their money.
02:07:21 So it’s a little different
02:07:22 than if you take money from a venture fund.
02:07:24 If you take money from a venture fund,
02:07:25 the venture fund, they want you to go big or go home.
02:07:27 And they’re kind of like expecting nine out of 10 to fail
02:07:30 or 99 out of 100 to fail.
02:07:33 It’s different.
02:07:33 I was, I had a barbell strategy.
02:07:35 I was like, I can’t fail.
02:07:37 I mean, I may not do super well,
02:07:38 but I cannot lose their money.
02:07:40 So I’m going to do something I know can return a profit,
02:07:43 but I want to have exposure to an upside.
02:07:46 So that’s what happened at Anaconda.
02:07:47 We didn’t, there were lots of things we did not do well
02:07:50 in terms of that structure.
02:07:51 And I’ve learned since how to do it better.
02:07:53 But we’ve, we did a really good job
02:07:56 of kind of attracting the interest around the area
02:07:59 to get good people working
02:08:00 and then funnel some money
02:08:01 on some interesting projects.
02:08:03 Super excited about what came out of our energy there.
02:08:05 Like a lot did.
02:08:06 So what are some of the interesting projects?
02:08:08 So Dask, Numba, Bokeh, Conda.
02:08:12 There was Datashader, Panel, HoloViz.
02:08:16 These are all tools that are extremely relevant
02:08:19 in terms of helping you build applications,
02:08:21 build tools, build, you know, faster code.
02:08:25 There’s a couple I’m forgetting.
02:08:25 Oh, JupyterLab, JupyterLab came out of this too.
02:08:28 And yeah.
02:08:30 Okay, so Bokeh does plotting?
02:08:32 Is that?
02:08:33 Bokeh does plotting.
02:08:34 So Bokeh was one of the foundational things to say,
02:08:35 I want to do plotting in Python,
02:08:37 but have the things show up on the web.
02:08:39 Right, that’s right.
02:08:40 That’s right, that’s right.
02:08:40 And plotting to me still,
02:08:43 with all due respect to Matplotlib and Bokeh,
02:08:46 it feels like still an unsolved problem,
02:08:48 not a solved problem.
02:08:50 It is, it’s a big problem.
02:08:52 Right, because you’re, I mean, I don’t know,
02:08:55 it’s visualization broadly, right?
02:08:58 I think we’ve got a pretty good API story
02:09:00 around certain use cases of plotting.
02:09:03 But there’s a difference between static plots
02:09:04 versus interactive plots versus I’m an end user,
02:09:07 I just want to write a simple,
02:09:09 Pandas started the idea of, here’s a data frame
02:09:12 and a .plot, I’m just going to attach plot
02:09:14 as a method to my object,
02:09:16 which was a little bit controversial, right?
02:09:18 But works pretty well, actually,
02:09:20 because there’s a lot less you have to pass in, right?
02:09:23 You can just say, here’s my object, you know what you are,
02:09:26 you tell the visualization what to do.
02:09:29 So that, and there’s things like that
02:09:31 that have not been super well developed entirely,
02:09:33 but Bokeh was focused on interactive plotting.
02:09:36 So you could, it’s a short path
02:09:38 between interactive plotting and application,
02:09:41 dashboard application.
02:09:42 And there’s some incredible work that got done there, right?
02:09:44 And it was a hard project,
02:09:45 because then you’re basically doing JavaScript and Python.
02:09:49 So we wanted to tackle some of these hard problems
02:09:51 and try to just go after them.
02:09:53 We got some DARPA funding to help,
02:09:54 and it was super helpful, funny story there,
02:09:56 we actually did two DARPA proposals,
02:09:58 but one we were five minutes late for.
02:10:00 And DARPA has a very strict cutoff window.
02:10:03 And so I, we had two proposals,
02:10:04 one for the Bokeh and one for actually Numba
02:10:06 and the other work.
02:10:09 Which one were you late for?
02:10:10 The foundational numerical work.
02:10:12 So Bokeh got funded. Oh no.
02:10:14 Fortunately, Chris let us use some of the money to fund
02:10:17 still some of the other foundational work,
02:10:19 but it wasn’t as, yeah, his hands were tied,
02:10:22 he couldn’t do anything about it.
02:10:23 That was a whole interesting story.
02:10:25 So one of the incredible projects
02:10:27 that you worked on is Conda.
02:10:29 Yes.
02:10:30 So what is Conda? So how that came about,
02:10:31 yeah, Conda, it was early on, like I said, with SciPy.
02:10:35 SciPy was a distribution masquerading as a library.
02:10:37 And he said, he heard me talking about compiler issues
02:10:40 and trying to get the stuff shipped
02:10:41 and the fact that people can use your libraries
02:10:43 if they have it.
02:10:44 So for a long time,
02:10:45 we’d understood the packaging problem in Python.
02:10:47 And one of the first things we did at Continuum Analytics,
02:10:50 which became Anaconda, was organize the PyData ecosystem
02:10:54 in conjunction with NumFOCUS.
02:10:56 We actually started NumFOCUS
02:10:58 with some other folks in the community
02:11:00 the same year we started Anaconda.
02:11:02 I said, we’re gonna build a corporation,
02:11:04 but we’re also gonna reify the community aspect
02:11:07 and build a nonprofit.
02:11:08 So we did both of those.
02:11:09 Can we pause real quick and can you say what is PyPI,
02:11:13 the Python Package Index,
02:11:14 like this whole story of packaging in Python?
02:11:19 Yeah, that’s what I’m gonna get to actually.
02:11:20 This is exactly the journey I’m on.
02:11:22 It’s to sort of explain packaging in Python.
02:11:24 I think it’s best expressed through the conversation
02:11:26 I had with Guido at a conference,
02:11:27 where I said, so packaging is kind of a problem.
02:11:31 And Guido said, I don’t ever care about packaging.
02:11:34 I don’t use it.
02:11:34 I don’t install new libraries.
02:11:36 I’m like, I guess if you’re the language creator
02:11:38 and if you need something, you just put it in the distribution
02:11:40 maybe you don’t worry about packaging.
02:11:42 But Guido has never really cared about packaging, right?
02:11:45 And never really cared about the problem of distribution.
02:11:47 It’s somebody else’s problem.
02:11:48 And that’s a fair position to take, I think,
02:11:50 as a language creator.
02:11:51 In fact, there’s a philosophical question about
02:11:54 should you have different development packaging managers?
02:11:56 Should you have a package manager per language?
02:11:58 Is that really the right approach?
02:11:59 I think there are some answers of
02:12:01 it is appropriate to have development tools.
02:12:04 And there’s an aspect of a development tool
02:12:06 that is related to packaging.
02:12:07 And every language should have some story there
02:12:10 to help their developers create.
02:12:12 So you should have language specific development tools.
02:12:14 Development tools that relate to package managers.
02:12:17 But then there’s a very specific user story
02:12:19 around package management
02:12:20 that those language specific package managers
02:12:22 have to interact with.
02:12:23 And currently aren’t doing a good job of that.
02:12:25 That was one of the challenges
02:12:27 that not seeing that difference,
02:12:29 and it still exists in the difference today.
02:12:31 Conda always was a user.
02:12:34 I’m gonna use Python to do data science.
02:12:36 I’m gonna use Python to do something.
02:12:38 How do I get this installed?
02:12:39 It was always focused on that.
02:12:41 So it didn’t have a develop.
02:12:43 Classic example is pip has a pip develop.
02:12:45 It’s like, I wanna install this
02:12:47 into my current development environment today.
02:12:50 Conda doesn’t have that concept
02:12:51 because it’s not part of the story.
02:12:52 For people who don’t know,
02:12:54 pip is a Python specific package manager.
02:12:59 That’s exceptionally popular.
02:13:04 That’s probably like the default thing you’ve learned.
02:13:06 It’s the default user.
02:13:07 And so the story there emerged
02:13:08 because what happened is in 2012,
02:13:11 we had this meeting at the Googleplex
02:13:13 and Guido was there to come talk about what we’re gonna do,
02:13:15 how we’re gonna make things work better.
02:13:17 And Wes McKinney, me, Peter,
02:13:19 Peter has a great photo of me talking to Guido
02:13:21 and he pretends we’re talking about this story.
02:13:23 Maybe we were, maybe we weren’t.
02:13:24 But we did at that meeting talk about it
02:13:26 and asked Guido, we need to fix packaging in Python.
02:13:29 People can’t get the stuff.
02:13:31 And he said, go fix it yourself.
02:13:32 I don’t think we’re gonna do it.
02:13:33 All right.
02:13:35 The origin story right there.
02:13:36 All right, you said, okay, you said to do this ourselves.
02:13:39 So at the same time,
02:13:41 people did start to work on the packaging story in Python.
02:13:44 It just took a little longer.
02:13:45 So in 2012, kind of motivated
02:13:48 by our training courses we were teaching,
02:13:49 like very similar to what you just mentioned
02:13:51 about your mother.
02:13:52 Like it was motivated by the same purpose.
02:13:54 Like how do we get this into people’s hands?
02:13:56 It’s this big, long process.
02:13:57 It’s too expensive.
02:13:58 It was actually hurting NumPy development
02:14:00 because I would hear people were saying,
02:14:02 don’t make that change to NumPy
02:14:03 because I just spent a week getting my Python environment.
02:14:05 And if you change NumPy, I have to reinstall everything.
02:14:09 And reinstalling is such a pain, don’t do it.
02:14:10 I’m like, wait, okay.
02:14:12 So now we’re not making changes to a library
02:14:14 because of the installation problem
02:14:16 that it’ll cause for end users.
02:14:17 Okay, there’s a problem with installation.
02:14:19 We gotta fix this.
02:14:20 So we said, we’re gonna make a distribution in Python.
02:14:23 And we’d previously done that.
02:14:24 I’d previously done that at Enthought.
02:14:26 I wanted to make one that we would give away for free,
02:14:28 that everyone could just get.
02:14:29 Like that was critical that we could just get it.
02:14:32 It wasn’t tied to a product.
02:14:33 It was just you could get it.
02:14:35 And then we had constantly thought about,
02:14:36 well, do we just leverage RPM?
02:14:39 But the challenge had always been,
02:14:40 we want a package manager that works on Windows,
02:14:42 Mac OS X, and Linux the same, right?
02:14:45 And it wasn’t there.
02:14:46 Like you don’t have anything like that.
02:14:47 You have…
02:14:48 And for people who don’t know,
02:14:49 RPM is an operating system specific package manager.
02:14:54 Correct, it’s operating system specific.
02:14:55 Yes, exactly.
02:14:56 So do you create the design questions,
02:15:00 do you create an umbrella package manager
02:15:02 that works across operating systems?
02:15:03 Yes, that was the decision.
02:15:05 And in neighboring design questions,
02:15:08 do you also create a package manager
02:15:09 that spans multiple programming languages?
02:15:11 Correct, exactly.
02:15:12 That was the world we faced.
02:15:14 And we decided to go multiple operating systems,
02:15:17 and programming language independent.
02:15:19 Because even Python, and particularly what was important
02:15:21 was SciPy has a bunch of Fortran in it, right?
02:15:24 And scikit-learn has links to a bunch of C++.
02:15:27 There’s a lot of compiled code.
02:15:29 And the Python package managers, especially early on,
02:15:32 didn’t even support that.
02:15:34 So in 2000, so we released Anaconda,
02:15:38 which was just a distribution of libraries,
02:15:39 but we started to work on Conda in 2012.
02:15:42 First version of Conda came out in early 2013,
02:15:44 summer of 2013, and it was a package manager.
02:15:47 So you could say, conda install scikit-learn.
02:15:49 In fact, scikit-learn was a fantastic project that emerged.
02:15:54 It was the classic example of the scikits.
02:15:57 I talked to you earlier about SciPy being too big
02:15:59 to be a single library.
02:16:01 Well, what the community had done is said,
02:16:02 let’s make scikits.
02:16:04 And there’s scikit image, there’s scikit learn,
02:16:05 there’s a lot of scikits.
02:16:07 And it was a fantastic move that the community did.
02:16:10 I didn’t do it.
02:16:11 I was like, okay, that’s a good idea.
02:16:12 I didn’t like the name.
02:16:13 I didn’t like the fact you typed scikit image.
02:16:15 I was like, that’s gotta be simpler.
02:16:17 That’s scikit learn, we gotta make that smaller.
02:16:19 I don’t like typing all this stuff from imports.
02:16:21 So I was kind of a pressure that way,
02:16:23 but I love the energy and love the fact
02:16:25 that they went out and they did it,
02:16:26 and those people, Jarrod Millman, and then of course, Gaël,
02:16:29 and there’s people I’m not even naming.
02:16:31 Scikit learn really emerged as a fantastic project.
02:16:34 And the documentation around that is also incredible.
02:16:36 And the documentation was incredible, exactly.
02:16:37 I don’t know who did that, but they did a great job.
02:16:40 A lot of people in Inria, a lot of European contributors.
02:16:45 There’s some Andreas in the US.
02:16:47 There’s a lot of just people I just adore,
02:16:48 I think are amazing people.
02:16:51 Awesome use of SciPy, right?
02:16:52 I love the fact that they were using SciPy effectively
02:16:54 to do something I love, which is machine learning,
02:16:57 but couldn’t install it.
02:16:58 Because there’s so many pieces involved.
02:17:00 So many dependencies, right?
02:17:02 So our use case for Conda was conda install scikit-learn.
02:17:06 Right, and it was the best way to install scikit-learn
02:17:09 from 2013 to really 2017, 2018, when pip finally caught up.
02:17:14 I still think you should conda install scikit-learn
02:17:16 over pip install scikit-learn,
02:17:17 but you can pip install scikit-learn.
02:17:19 The issue is the package they created was wheels
02:17:21 and PIP does not handle the multi vendor approach.
02:17:24 They don’t handle the fact you have C++ libraries
02:17:26 you’re depending on.
02:17:27 They just stop at the Python boundary.
02:17:29 And so what you have to do in the wheel world
02:17:31 is you have to vendor.
02:17:33 You have to take all of the binary and vendor it.
02:17:35 Now, if a change happens in an underlying dependency,
02:17:38 you have to redo the whole wheel.
02:17:40 So TensorFlow, as you know,
02:17:42 you should not PIP install TensorFlow.
02:17:44 It’s a terrible idea.
02:17:45 People do it because of the popularity of PIP,
02:17:48 many people think, oh, of course,
02:17:49 that’s how I install everything in Python.
02:17:51 Yeah, this is one of the big challenges.
02:17:53 You take a GitHub repository or just a basic blog post.
02:17:57 The number of times PIP is mentioned over Conda
02:18:00 is like 100 X to one.
02:18:02 Correct, correct.
02:18:03 So it just has to do with the.
02:18:04 And that was increasing.
02:18:05 It wasn’t true early because PIP didn’t exist.
02:18:07 Like Conda came first.
02:18:08 So but that’s the problem.
02:18:10 Like Conda came first, but that’s like the long tail
02:18:13 of the internet documentation user generated.
02:18:15 So that like you think, how do I install Google?
02:18:19 How do I install TensorFlow?
02:18:20 You’re just not gonna see Conda in that first page.
02:18:23 Correct, exactly.
02:18:24 And that.
02:18:24 Not today, you would have in 2016, 2017.
02:18:29 And it’s sad because Conda solves
02:18:32 a lot of usability issues.
02:18:34 Correct.
02:18:35 Like for especially super challenging thing.
02:18:36 I don’t know.
02:18:37 One of the big pain points for me was
02:18:39 just on the computer vision side, OpenCV installation.
02:18:43 Perfect example.
02:18:44 Yes.
02:18:45 I think Conda, I don’t know if Conda solved that one.
02:18:47 Conda has an OpenCV package.
02:18:49 I don’t know.
02:18:49 I certainly know PIP has not solved.
02:18:53 I mean, there’s complexities there because.
02:18:55 Right.
02:18:56 I actually don’t know.
02:18:57 I should probably know a good answer for this,
02:18:59 but if you compile OpenCV with certain dependencies,
02:19:05 you’ll be able to do certain things.
02:19:07 So there’s this kind of flexibility of what you,
02:19:09 like what options you compile with.
02:19:12 Yes.
02:19:13 And I don’t think it’s trivial to do that with Conda or.
02:19:17 So Conda has a notion of variants of a package.
02:19:20 You can actually have different compilation versions
02:19:23 of a package.
02:19:23 So not just the version is different,
02:19:24 but oh, this is compiled with these optimizations on.
02:19:26 So Conda does have an answer.
02:19:28 Has those flavors.
02:19:28 Has flavors, basically.
02:19:30 Well, PIP, as far as I know, does not have flavors.
02:19:32 No, no.
02:19:33 PIP generally hasn’t thought deeply
02:19:36 about the binary dependency problem, right?
02:19:38 And that’s why fundamentally it doesn’t work
02:19:41 for the SciPy ecosystem.
02:19:43 It barely, you can sort of paper over it and duct tape
02:19:46 and it kind of works until it doesn’t
02:19:48 and it falls apart entirely.
02:19:49 So it’s been a mixed bag.
02:19:51 Like, and I’ve been having lots of conversations
02:19:54 with people over the years because again,
02:19:56 it’s an area where if you understand some things,
02:19:58 but not all the things,
02:19:59 but they’ve done a great job of community appeal.
02:20:02 This is an area where I think Anaconda as a company
02:20:05 needed to do some things
02:20:07 in order to make Conda more community centric, right?
02:20:10 And this is a, I talk about this all the time.
02:20:13 There’s a balance between you have every project starts
02:20:16 with what I called company backed open source.
02:20:18 Even if the company is yourself, it’s just one person,
02:20:20 just doing business as.
02:20:23 But ultimately for products to succeed virally
02:20:26 and become massive influencers,
02:20:28 they have to create,
02:20:29 they have to get community people on board.
02:20:30 They have to get other people on board.
02:20:32 So it has to become community driven.
02:20:33 And a big part of that is engagement with those people.
02:20:35 Empowering people, governance around it.
02:20:38 And what happened with Conda in the early days,
02:20:41 PIP emerged and we did do some good things.
02:20:43 Conda Forge, Conda Forge community
02:20:46 is sort of the community recipe creation community.
02:20:49 But Conda itself, I still believe,
02:20:52 and Peter is CEO of Anaconda, he’s my co founder.
02:20:55 I ran Anaconda until 2017, 2018.
02:20:58 Is Peter still Anaconda?
02:20:59 Peter’s still Anaconda, right?
02:21:00 And we’re still great friends.
02:21:01 We talk all the time.
02:21:02 I love him to death.
02:21:03 There’s a long story there about like why and how
02:21:06 and we can cover in some other podcast perhaps.
02:21:08 Yeah.
02:21:09 It’s sort of a more, maybe a more business focused one.
02:21:11 But this is one area where I think Conda
02:21:15 should be more community driven.
02:21:17 Like he should be pushing more
02:21:18 to get more community contributors to Conda
02:21:21 and let the, Anaconda shouldn’t be fighting this battle.
02:21:26 Yeah.
02:21:26 Right?
02:21:27 It’s actually, it’s really a developers.
02:21:28 Like you said, like help the developers
02:21:30 and then they’ll actually move us the right direction.
02:21:32 Well, that was the problem I have is many
02:21:34 of the cool kids I know don’t use Conda.
02:21:36 And that to me is confusing.
02:21:38 It is confusing.
02:21:39 It’s really a matter of, Conda has some challenges.
02:21:42 First of all, Conda still needs to be improved.
02:21:44 There’s lots of improvements to be made.
02:21:45 And it’s that aspect of wait, who’s doing this?
02:21:47 And the fact that then the PyPA really stepped up.
02:21:50 Like they were not solving the problem at all.
02:21:53 And now they kind of got to where they’re solving it
02:21:55 for the most part.
02:21:56 And then effectively you could get,
02:21:58 like Conda solved a problem that was there.
02:22:00 And it still does.
02:22:01 It’s still, you know, there’s still great things it can do.
02:22:03 But, and we still use it all the time at Quansight
02:22:06 and with other clients, but with,
02:22:08 but you can kind of do similar things with PIP and Docker.
02:22:12 Right?
02:22:13 So especially with the web development community,
02:22:15 that part of it, again, is this is the,
02:22:17 there’s a lot of different kinds of developers
02:22:19 in the Python ecosystem.
02:22:20 And there’s still a lack of some clear understanding.
02:22:23 I go to the Python conference all the time
02:22:25 and then there’s only a few people in the PyPA who get it.
02:22:28 And then others who are just massively trumpeting
02:22:30 the power of PIP, but just do not understand the problem.
02:22:32 Yeah.
02:22:33 So one of the obvious things to me from a mom,
02:22:36 from a non programmer perspective,
02:22:37 is the across operating system usability.
02:22:41 That’s much more natural.
02:22:42 So there’s people that use Windows and just,
02:22:45 it seems much easier to recommend Conda there,
02:22:49 but then it, you should also recommend it across the board.
02:22:51 So I’ll definitely sort of.
02:22:53 But what I recommend now is a hybrid.
02:22:55 I do.
02:22:56 I mean, I have no problem.
02:22:57 Is it possible to use?
02:22:57 Oh, it is.
02:22:58 It is.
02:22:59 But like build the environment with PIP, with Conda,
02:23:01 build an environment with Conda
02:23:03 and then PIP install on top of that.
02:23:04 That’s fine.
02:23:05 Be careful about PIP installing OpenCV or TensorFlow
02:23:09 or because if somebody’s allowed that,
02:23:11 it’s gonna be most surely done in a way
02:23:13 that can’t be updated that easily.
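The hybrid workflow described here (Conda for the environment and the heavy compiled packages, pip on top for the rest) can be sketched as two commands. The snippet below only composes and prints them and does not run anything; the environment name `demo-env` and the package name `example-small-package` are placeholders, not anything named in the conversation.

```python
# Sketch: build (but do not execute) the two commands of the hybrid workflow.
env_name = "demo-env"  # hypothetical environment name

# Step 1: heavy packages with compiled dependencies come from Conda.
conda_packages = ["python", "scikit-learn", "pip"]
create_cmd = ["conda", "create", "--yes", "--name", env_name, *conda_packages]

# Step 2: small pure-Python extras are pip-installed inside that environment.
pip_packages = ["example-small-package"]  # placeholder package name
pip_cmd = ["conda", "run", "--name", env_name,
           "python", "-m", "pip", "install", *pip_packages]

for cmd in (create_cmd, pip_cmd):
    print(" ".join(cmd))
```

Installing pip itself through Conda in the first step keeps the later pip installs inside the Conda environment rather than in a system-wide Python.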
02:23:15 So install like the big packages,
02:23:17 the infrastructure with Conda and then the weirdos.
02:23:21 Yeah.
02:23:21 That like the weird like implementation for some.
02:23:24 I had a, there’s a cool library I used
02:23:28 that based on your location and time of day and date
02:23:33 tells you the exact position of the sun
02:23:35 relative to the earth.
02:23:38 And it’s just like a simple library,
02:23:39 but it’s very precise.
02:23:41 And I was like, all right.
02:23:42 But that was, that was, and it’s like PIP.
02:23:45 Well, the thing they did really well is Python developers
02:23:48 who wanna get their stuff published,
02:23:50 you have to have a PIP recipe.
02:23:51 Yeah.
02:23:52 Right?
02:23:53 I mean, even if it’s, you know, the challenge is,
02:23:56 and there’s a key thing that needs to be added to PIP,
02:23:58 just simply add to PIP the ability to defer
02:24:01 to a system package manager.
02:24:03 Like, cause it’s, you know,
02:24:04 recognize you’re not gonna solve all the dependency problem.
02:24:07 So let like give up and allow the system package to work.
02:24:12 That way, if Anaconda is installed and it has PIP,
02:24:15 it would defer to Conda to install stuff,
02:24:16 but on Red Hat it would defer to RPM
02:24:19 to install some more things.
02:24:20 Like that’s the, that’s a key, not difficult,
02:24:23 but somewhat work, some work feature needs to be added.
02:24:25 That’s an example of something like,
02:24:27 I’ve known we need to do it.
02:24:28 I mean, it’s where I wish I had more money.
02:24:30 I wish I was more successful in the business side,
02:24:33 trying to get there, but I wish my, you know,
02:24:35 my family, friends, and fools community that I know.
02:24:37 Was larger.
02:24:38 Was larger and had more money.
02:24:39 Cause I know tons of things to do effectively
02:24:42 with more resources, but you know,
02:24:46 I have not yet been successful at channel.
02:24:48 Tons of, you know, some, you know,
02:24:49 I’m happy with what we’ve done.
02:24:51 We created again at Quansight,
02:24:54 what we created to get Anaconda started.
02:24:56 We created community to get Anaconda started.
02:24:58 Done it again with Quansight.
02:24:59 Super excited by that.
02:25:00 But it took three years to do it.
02:25:02 What is Quansight?
02:25:03 What is its mission?
02:25:04 We’ve talked a few times about different fascinating
02:25:06 aspects of it, but let’s like big picture,
02:25:08 what is Quansight?
02:25:09 Big picture Quansight.
02:25:10 Quansight is, its mission is to connect data
02:25:13 to an open economy.
02:25:14 So it’s basically consulting for the PyData ecosystem,
02:25:17 right?
02:25:18 It’s a consulting company.
02:25:19 And what I’ve said when I started it was we’re trying
02:25:21 to create products, people, and technology.
02:25:24 So it’s divided into two groups.
02:25:26 And a third one as well.
02:25:28 The two groups are a consulting services company
02:25:30 that just helps people do data science
02:25:31 and data engineering and data management better
02:25:35 and more efficiently.
02:25:35 Like full stack, like full thing.
02:25:36 Full stack data science, full thing.
02:25:38 We’ll help you build an infrastructure.
02:25:40 If you’re using Jupyter, we need,
02:25:41 we do staff augmentation, need more programmers,
02:25:43 help you use Dask more effectively,
02:25:44 help you use GPUs more effectively.
02:25:46 Just basically a lot of people need help.
02:25:48 So we do training as well to help people, you know,
02:25:50 both immediate help and then get, learn from somebody.
02:25:55 We’ve added a bunch of stuff too.
02:25:57 We’ve kind of separated some of these other things
02:25:58 into another company called Open Teams
02:26:00 that we recently started.
02:26:01 One of the things I loved about what we did at Anaconda
02:26:03 was creating a community innovation team.
02:26:05 And so I wanted to replicate that.
02:26:06 This time we did a lot of innovation at Anaconda.
02:26:09 I wanted to do innovation,
02:26:10 but also contribute to the projects that existed,
02:26:13 like create a place where maintainers,
02:26:16 so the SciPy and NumPy and Numba
02:26:18 and all these projects we already started
02:26:20 can pay people to work on them and keep them going.
02:26:22 So that’s Labs.
02:26:23 Quansight Labs is a separate organization.
02:26:25 It’s a nonprofit mission.
02:26:28 The profits of Quansight help fund it.
02:26:29 And in fact, every project that we have at Quansight,
02:26:33 a portion of the money goes directly to Quansight Labs
02:26:36 to help keep it funded.
02:26:37 So we’ve gotten several mechanisms
02:26:38 that we keep Quansight Labs funded.
02:26:40 And currently, so I’m really excited about Labs
02:26:41 because it’s been a mission for a long time.
02:26:43 What kind of projects are within Labs?
02:26:45 So Labs is working to make the software better,
02:26:47 like make NumPy better, make SciPy better.
02:26:49 It only works on open source.
02:26:52 So if somebody wants to, so companies do,
02:26:55 we have a thing called a community work order, we call it.
02:26:57 If a company says, I wanna make Spyder better.
02:27:00 Okay, cool.
02:27:01 You can pay for a month of a developer of Spyder
02:27:05 or a developer of NumPy or a developer of SciPy.
02:27:08 You can’t tell them what you want them to do.
02:27:09 You can give them your priorities and things you wish existed
02:27:12 and they’ll work on those priorities with the community
02:27:16 to get what the community wants
02:27:17 and what emerges of what the community wants.
02:27:18 Is there some aspect on the consulting side
02:27:21 that is helping, as we were talking about morphology
02:27:24 and so on, is there specific application
02:27:26 that are particularly like driving,
02:27:29 sort of inspiring the need for updates to SciPy?
02:27:32 Correct, absolutely, absolutely.
02:27:33 GPUs are absolutely one of them.
02:27:34 And new hardware beyond GPUs.
02:27:36 I mean, Tesla’s Dojo chip, I’m hoping we’ll have a chance
02:27:39 to work on that perhaps.
02:27:42 Things like that are definitely driving it.
02:27:43 The other thing that’s driving it is scalable,
02:27:45 like speed and scale.
02:27:47 How do I write NumPy code or NumPy-like code
02:27:50 if I want it to run across a cluster?
02:27:52 That’s Dask or maybe it’s Ray.
02:27:54 I mean, there’s sort of ways to do that now.
02:27:56 Or there’s Modin and there’s, so Pandas code,
02:27:59 NumPy code, SciPy code, scikit-learn code
02:28:02 that I want to scale.
02:28:03 So that’s one big area.
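The idea behind running NumPy-like code across a cluster can be illustrated with a minimal sketch: split the data into chunks, compute a partial result on each chunk independently, then combine the partials. This is not the Dask, Ray, or Modin API. The function names (`split_into_chunks`, `partial_sum_and_count`, `chunked_mean`) are hypothetical, and a thread pool on one machine stands in for cluster workers.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def partial_sum_and_count(chunk):
    """Work done independently on one chunk (the job of one 'worker')."""
    return sum(chunk), len(chunk)


def chunked_mean(values, chunk_size=4, max_workers=2):
    """Compute a mean from per-chunk partial results.

    Assumes values is non-empty; an empty input would divide by zero.
    """
    chunks = split_into_chunks(values, chunk_size)
    # The thread pool is a stand-in for workers on a cluster (an assumption
    # made for illustration, not how any particular library schedules work).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_sum_and_count, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


data = list(range(10))
result = chunked_mean(data)
```

The design point is that the caller writes the same "mean of an array" request either way; only the per-chunk work and the final combine step change when the data no longer fits on one machine.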
02:28:04 Have you gotten a chance to chat with Andrej and Elon
02:28:08 about particular, because like.
02:28:09 No, I would love to, by the way.
02:28:11 I have not, but I’d love to.
02:28:12 I just saw their Tesla AI Day video.
02:28:15 Super excited.
02:28:16 That’s one of the, you know, I love great engineering,
02:28:18 software engineering teams and engineering teams in general.
02:28:21 And they’re doing a lot of incredible stuff with Python.
02:28:23 They’re like revolutionary.
02:28:25 So many aspects of the machine learning pipeline.
02:28:28 I agree.
02:28:29 That’s operating in the real world.
02:28:30 And so much of that is Python.
02:28:31 Like you said, the guy running, you know, Andrej Karpathy,
02:28:35 running Autopilot is tweeting about optimization
02:28:38 of NumPy versus.
02:28:41 I would love to talk to him.
02:28:42 In fact, we have at Quansight, we’ve been fortunate enough
02:28:45 to work with Facebook on PyTorch directly.
02:28:47 So we have about 13 developers at Quansight.
02:28:49 Some of them are in Labs working directly on PyTorch.
02:28:52 On PyTorch.
02:28:53 On PyTorch, right.
02:28:54 So I basically started Quansight.
02:28:55 I went to both TensorFlow and PyTorch and said,
02:28:57 hey, I want to help connect what you’re doing
02:29:00 to the broader SciPy ecosystem.
02:29:01 Because I see what you’re doing.
02:29:03 We have this bigger mission that we want to make sure
02:29:04 we don’t, you know, lose energy here.
02:29:06 So, and Facebook responded really positively
02:29:09 and I didn’t get the same reaction.
02:29:12 Not yet, not yet.
02:29:13 Not yet.
02:29:14 So I really love the folks at TensorFlow, too.
02:29:17 They’re fantastic.
02:29:18 I think it’s the, just how it integrates
02:29:21 with their business.
02:29:21 I mean, like I said, there’s a lot of reasons.
02:29:23 Just the timing, the integration with their business,
02:29:25 what they’re looking for.
02:29:27 They’re probably looking for more users.
02:29:28 And I was looking to kind of cut up some development effort
02:29:31 and they couldn’t receive that as easily, I think.
02:29:33 So I’m hoping, I’m really hopeful
02:29:36 and love the people there.
02:29:37 What’s the idea behind OpenTeams?
02:29:39 So OpenTeams, I’m super excited about OpenTeams
02:29:41 because it’s one of the,
02:29:43 I mentioned my idea for investing directly in open source.
02:29:46 So that’s a concept called FairOSS.
02:29:48 But one of the things we, when we started Quansight,
02:29:51 we knew we would do is we develop products and ideas
02:29:53 and new companies might come out.
02:29:55 At Anaconda, this was clear, right?
02:29:57 Anaconda, we did so much innovation
02:30:00 that like five or six companies could have come out of that.
02:30:02 And we just didn’t structure it so they could.
02:30:05 But in fact, they have, you look at Dask,
02:30:07 there’s two companies coming out of Dask.
02:30:08 You know, Bokeh could be a company.
02:30:10 There’s like lots of companies that could exist
02:30:11 off the work we did there.
02:30:13 And so I thought, oh, here’s a recipe for an incubation,
02:30:16 a concept that we could actually spawn new companies
02:30:19 and new innovations.
02:30:20 And then the idea has always been,
02:30:22 well, money they earn should come back
02:30:24 to fund the open source projects.
02:30:26 So Labs is, you know, I think there should be
02:30:29 a lot of things like Quansight Labs.
02:30:30 I think this concept is one that scales.
02:30:32 You could have a lot of open source research labs.
02:30:35 Along the way, so in 2018, when the bigger idea came,
02:30:37 how to make open source investable, I said,
02:30:38 oh, I need to write, I need to create a venture fund.
02:30:41 So we created a venture fund called Quansight Initiate
02:30:43 at the same time.
02:30:44 It’s an angel fund, really.
02:30:45 It’s, you know, we started to learn that process.
02:30:47 How do we actually do this?
02:30:48 How do we get LPs?
02:30:49 How do we actually go in this direction and build a fund?
02:30:52 And I’m like, every venture fund should have
02:30:54 an associated open source research lab,
02:30:55 which is no reason.
02:30:56 Like our venture fund, the carried interest,
02:30:59 a portion of it goes to the lab.
02:31:01 It directly will fund the lab.
02:31:03 That’s fascinating, brother.
02:31:04 So you use the power of the organic formation of teams
02:31:06 in the open source community, and then like naturally,
02:31:10 that leads to a business that can make money.
02:31:13 Yeah, correct.
02:31:14 And then it always maintains and loops back
02:31:16 to the open source.
02:31:17 Loops back to open source, exactly.
02:31:18 I mean, to me, it’s a natural fit.
02:31:19 There’s something, there’s absolutely
02:31:20 a repeatable pattern there, and it’s also beneficial
02:31:23 because, oh, I have, I have natural connections
02:31:26 to the open source if I have an open source research lab.
02:31:29 Like, they’ll always, they’ll be out there
02:31:31 talking to people, and so we’ve had a chance
02:31:34 to talk to a lot of early stage companies.
02:31:35 And we, and our fund focuses on the early stage.
02:31:37 So Quansight has the services, the lab, the fund, right?
02:31:41 In that process, a lot of stuff started to happen.
02:31:44 They’re like, oh, you know, we started to do recruiting
02:31:46 and support and training, and I was starting
02:31:48 to build a bigger sales team and marketing team
02:31:50 and people besides just developers.
02:31:52 And one of the challenges with that
02:31:54 is you end up with different cultural aspects.
02:31:55 You know, developers, you know, there’s a,
02:31:58 in any company you go to, you kind of go look,
02:32:00 is this a business led company, a developer led company?
02:32:03 Do they kind of coexist?
02:32:04 Are they, what’s the interface between them?
02:32:06 There’s always a bit of a tension there.
02:32:07 Like we were talking about before.
02:32:08 You know, what is the tension there?
02:32:10 With OpenTeams, I thought, wait a minute,
02:32:11 we can actually just create,
02:32:13 like this concept of Quansight plus labs,
02:32:15 it’s, well, it’s specific to the PyData ecosystem.
02:32:18 The concept is general for all open source.
02:32:20 So OpenTeams emerged as a, oh,
02:32:22 we can create a business development company
02:32:24 for many, many Quansights, like thousands of Quansights.
02:32:28 And it can be a marketplace to connect,
02:32:30 essentially be the enterprise software company
02:32:33 of the future.
02:32:34 If you look at what enterprise software wants
02:32:36 from the customer side, and during this journey,
02:32:38 I’ve had the chance to work and sell to lots of companies,
02:32:42 Exxon and Shell and J.P. Morgan, Bank of America,
02:32:45 like the Fortune 100,
02:32:46 and talk to a lot of people in procurement
02:32:48 and see what are they buying and why are they buying?
02:32:50 So, you know, I don’t know everything,
02:32:51 but I’ve learned a lot about,
02:32:52 oh, what are they really looking for?
02:32:54 And they’re looking for solutions.
02:32:56 They’re constantly given products
02:32:58 from enterprise software.
02:33:01 Here’s open source, leave the enterprise software,
02:33:02 now I buy it.
02:33:03 And then they have to stitch it together into a solution.
02:33:05 Open source is fantastic for gluing
02:33:07 those solutions together.
02:33:08 So, whereas they keep getting new platforms
02:33:11 they’re trying to buy,
02:33:12 but most open source, what most enterprises want
02:33:15 is tools that they can customize
02:33:16 that are as inexpensive as they can.
02:33:18 Yeah, and so you always want to maintain
02:33:20 the connection to the open source
02:33:21 because that’s going to be the tools.
02:33:22 Yes, so OpenTeams is about solving
02:33:24 enterprise software problems.
02:33:26 Brilliant, brilliant idea, by the way.
02:33:28 With a connect, but we do it honoring the topology.
02:33:30 We don’t hire all the people.
02:33:32 We are a network connecting the sales energy
02:33:35 and the procurement energy,
02:33:36 and we work on the business side,
02:33:37 get the deals closed,
02:33:39 and then have a network of partners
02:33:40 like Quansight and others who we hand the deals to,
02:33:44 to actually do the work.
02:33:44 And then we have to maintain,
02:33:46 I feel like we have to maintain
02:33:47 some level of quality control
02:33:48 so that the client can rely on OpenTeams
02:33:50 to ensure the delivery.
02:33:52 It’s not just, here’s a lead, go figure that out.
02:33:54 But no, we’re going to make sure you get what you need.
02:33:57 By the way, it’s such a skill,
02:33:58 and I don’t know if I have the patience.
02:34:00 I will have the patience to talk to the business people
02:34:04 or more specific, I mean,
02:34:05 there’s all kinds of flavors of business people
02:34:07 or like marketing people.
02:34:11 There’s a challenge.
02:34:12 I hear what you’re saying
02:34:13 because I’ve had the same challenge.
02:34:14 And it’s true.
02:34:15 There’s sometimes you think, okay, this is way overwrought.
02:34:18 Yeah, but you have to become an adult
02:34:20 and you have to, because the companies have needs.
02:34:22 They have ways to make money
02:34:24 and they also want to learn and grow,
02:34:26 and it’s your job to kind of educate them on the best way,
02:34:28 like the value of open source, for example.
02:34:31 Right, and I’m really grateful for all my experiences
02:34:32 over the past 14 years, understanding that side of it
02:34:35 and still learning for sure,
02:34:37 but not just understanding from companies,
02:34:38 but also dealing with marketing professionals
02:34:40 and sales professionals
02:34:41 and people that make a career out of that
02:34:43 and understanding what they’re thinking about
02:34:44 and also understanding, well, let’s make this better.
02:34:46 We can really make a place.
02:34:48 OpenTeams I see as the transmission layer
02:34:50 between companies and open source communities
02:34:53 producing enterprise software solutions.
02:34:55 Eventually we want to,
02:34:56 today we’re taking on SAS and MATLAB
02:34:59 and tools that we know we can replace for folks.
02:35:01 Really, anytime you have a software tool at an organization
02:35:04 where you have to do a lot of customization
02:35:06 to make it work for you.
02:35:07 It’s not you’re just buying this thing off the shelf
02:35:09 and it works.
02:35:09 It’s like, okay, you buy this system
02:35:11 and then you customize it a lot,
02:35:12 usually with expensive consultants
02:35:15 to actually make it work for you.
02:35:17 All of those should be replaced by open source foundations
02:35:19 with the same customization.
02:35:20 You’re doing such important work,
02:35:22 such important work in these giant organizations
02:35:25 that do exactly that,
02:35:26 taking some proprietary software
02:35:28 and hiring a huge team of consultants
02:35:30 that customize it and then that whole thing
02:35:32 gets outdated quick.
02:35:33 Correct.
02:35:34 And so, I mean, that’s brilliant.
02:35:36 So the one solution to that
02:35:39 is kind of what Tesla’s doing a little bit of,
02:35:43 which is basically build up a software engineering team.
02:35:46 Like build a team from scratch.
02:35:48 Build a team from scratch.
02:35:49 And companies are doing it well,
02:35:50 that’s what they’re doing right now.
02:35:50 Yeah, exactly.
02:35:51 And that’s okay.
02:35:52 And you’re creating a topology for some of that.
02:35:54 You’re right.
02:35:55 You just don’t have to do it.
02:35:56 That’s not the only answer, right?
02:35:57 And so other companies can access this,
02:35:58 be more accessible.
02:35:59 We literally say,
02:36:01 OpenTeams is the future of enterprise software.
02:36:03 We’re still early.
02:36:04 Like this idea just percolated over the past year
02:36:07 as we’ve kind of grown Quansight
02:36:08 and realized the extensibility of it.
02:36:10 We just finished our seed round
02:36:13 to help get more sales people
02:36:15 and then push the messaging correctly.
02:36:17 And there’s lots of tools we’re building
02:36:19 to make this easier.
02:36:20 Like we wanna automate the processes.
02:36:21 We feel like a lot of the power
02:36:23 is the efficiency of the sales process.
02:36:25 There’s a lot of wasted energy in small teams
02:36:29 and the sales energy to get into large companies
02:36:31 and make a deal.
02:36:32 There’s a lot of money spent on that process.
02:36:34 Creating the tools and processes for that sales.
02:36:36 So make that super seamless.
02:36:38 So a single company can go,
02:36:39 oh, I’ve got my contract with OpenTeams.
02:36:41 We’ve got a subscription they can get.
02:36:43 They can make that procurement seamless.
02:36:45 And then the fact they have access
02:36:46 to the entire open source ecosystem.
02:36:48 And we have a part of our work
02:36:51 that’s embracing open source ecosystems
02:36:53 and making sure we’re doing things useful for them
02:36:55 or serving them.
02:36:56 And then companies making sure
02:36:57 they’re getting solutions they care about.
02:36:59 And then figuring out which targets we have.
02:37:02 We’re not taking on all of open source,
02:37:04 all of enterprise software yet.
02:37:06 But we’re step by step.
02:37:07 Well this feels like the future.
02:37:08 The idea and the vision is brilliant.
02:37:10 Can I ask you, why do you think Microsoft bought GitHub
02:37:14 and what do you think is the future of GitHub?
02:37:16 Great point.
02:37:17 I thought it was a brilliant move.
02:37:18 I think they did because Microsoft has always
02:37:20 had a developer centric culture.
02:37:22 Like they always have.
02:37:23 Like one of the things Microsoft’s always done well
02:37:25 is understand that their power is the developers.
02:37:27 It’s been, Ballmer didn’t necessarily make a good meme
02:37:31 about how he approached that.
02:37:32 But they’re broadening that.
02:37:34 I think that’s why.
02:37:35 Because they recognize GitHub is where developers are at.
02:37:38 Right?
02:37:38 And so.
02:37:39 But do they have a vision like OpenTeams
02:37:41 type of situation, right?
02:37:41 I don’t think so yet.
02:37:43 Are they just basically throwing money at developers
02:37:46 to show their support?
02:37:47 I think so.
02:37:48 Without a topology like you put it.
02:37:50 Like a way to leverage that.
02:37:53 Like to give developers actual money.
02:37:55 Right.
02:37:56 I don’t think so.
02:37:57 They’re still, it’s an enterprise software company.
02:37:59 And they make a bunch of money.
02:38:00 They make a bunch of games.
02:38:01 They’re a big company.
02:38:02 They sell products.
02:38:03 I think part of it is they know there’s opportunity
02:38:06 to make money from GitHub.
02:38:07 Right?
02:38:08 There’s definitely a business there.
02:38:09 You know, to sell to developers.
02:38:11 Or to sell to people using development.
02:38:13 I think there’s part of that.
02:38:14 I think part of it is also there’s,
02:38:15 they had definitely wanted to recognize
02:38:18 that you need to value open source
02:38:20 to get great developers.
02:38:21 Which is an important concept that was emerging
02:38:24 over the past 10 years.
02:38:25 That, you know, at PyData,
02:38:28 we were able to convince J.P. Morgan
02:38:29 to support PyData because of that fact.
02:38:31 Right?
02:38:32 That was where the money for them putting
02:38:33 a couple hundred thousand into supporting PyData
02:38:35 for several conferences was they want developers.
02:38:37 And they realized that developers want
02:38:39 to participate in open source.
02:38:40 So enterprise software folks don’t always understand
02:38:43 how their software gets used.
02:38:44 Having spent a lot of time on the floors
02:38:46 at J.P. Morgan, at Shell, at ExxonMobil,
02:38:49 you see, oh, these companies have large development teams.
02:38:52 And then they’re kind of dealing with
02:38:55 what’s being delivered to them.
02:38:56 So I really feel kind of a privilege
02:38:58 that I had a chance to learn from some of these people
02:39:00 and see what they’re doing.
02:39:01 And even work alongside them, you know,
02:39:04 as a consultant, using open source and trying to figure,
02:39:07 how do we make this work inside of our large organization?
02:39:09 Some of it is actually, for a large organization,
02:39:13 some of it is messaging to the world
02:39:14 that you care about developers
02:39:16 and you’re the cool, you care.
02:39:18 Like, for example, like if Ford,
02:39:21 cause I talked to them, like car companies, right?
02:39:23 They want to attract, you know,
02:39:26 you want to take on Tesla and Autopilot.
02:39:28 You want to take on, right?
02:39:29 And so what do you do there?
02:39:31 You show that you’re cool.
02:39:32 Like you try to show off that you care about developers
02:39:36 and they have a lot of trouble doing that.
02:39:39 And like one way, I think like Ford should have bought GitHub.
02:39:42 They just to show off, like these old school companies
02:39:46 and it’s in a lot of different industries.
02:39:49 There’s probably different ways.
02:39:51 It’s probably an art to show that you care about developers.
02:39:54 And the developers, it’s exactly what you, like,
02:39:57 for example, just spit balling here,
02:40:00 but like Ford or somebody like that
02:40:02 could give a hundred million dollars
02:40:05 to the development of NumPy.
02:40:07 And like literally look at like the top most popular projects
02:40:13 in Python and just say, we’re just going to give money.
02:40:17 Like that’s going to immediately make you cool.
02:40:20 They could actually, yeah.
02:40:21 And in fact, they set up NumFOCUS to make it easy.
02:40:24 But the challenge was,
02:40:26 is also you have to have some business development.
02:40:28 Like it’s a bit of a seeding problem, right?
02:40:31 And you look at how,
02:40:32 I’ve talked to the folks at Linux Foundation,
02:40:33 know how they’re doing it.
02:40:34 I know how, and starting NumFOCUS,
02:40:36 because we had two babies in 2012.
02:40:39 One was Anaconda, one was NumFOCUS, right?
02:40:41 And they were both important efforts.
02:40:42 They had distinct journeys
02:40:44 and super grateful that both existed
02:40:46 and still grateful both exist.
02:40:48 But there’s different energies in getting donations
02:40:51 as there is getting, this is important to my business.
02:40:55 Like I’m selling you something that this is a,
02:40:58 I’m going to make money this way.
02:41:00 Like if you can tie it,
02:41:01 if you can tie the message to an ROI for the company,
02:41:04 it becomes a no-brainer.
02:41:04 That’s more effective.
02:41:05 It’s much more effective, right?
02:41:06 So, and there are rational arguments to make.
02:41:09 I’ve tried to have conversations with marketing,
02:41:11 especially marketing departments.
02:41:12 Like very early on, it was clear to me that,
02:41:14 oh, you could just take a fraction of your marketing budget
02:41:18 and just spend it on open source development.
02:41:20 And you get better results from your marketing.
02:41:23 Like, because.
02:41:24 How did those, can I, sorry,
02:41:26 I’m going to try not to go on rants here.
02:41:27 What have you learned from the interaction
02:41:29 with the marketing folks on that kind of,
02:41:31 because you gave a great example
02:41:34 of something that will obviously be much better investment
02:41:37 in terms of marketing is supporting open source projects.
02:41:40 The challenge is not dissimilar
02:41:41 from the challenge you have in academia
02:41:44 or the different colleges, right?
02:41:46 Knowledge gets very specific and very channeled, right?
02:41:50 And so people get,
02:41:51 they get a lot of learning in the thing they know about.
02:41:53 And it’s hard then to bridge that
02:41:56 and to get them to think differently enough
02:41:58 to have a sense that you might have something to offer
02:42:02 because it’s different.
02:42:03 It’s like, well, how do I implement that?
02:42:04 How do I, what do I do with that?
02:42:05 Like, do I, which budget do I take from?
02:42:07 Do I slow down my spend on Google ads
02:42:10 or my spend on Facebook ads?
02:42:11 Or do I not hire a content creator and say like,
02:42:14 there’s an operational aspect to that,
02:42:16 that you have to be the CMO, right?
02:42:19 Or the CEO, you have to get the right level.
02:42:21 So you’ll have to hire at a high position level
02:42:24 where they care about this and this.
02:42:25 Right, or they won’t know how, right?
02:42:27 And because you can also do it very clumsily, right?
02:42:30 And I’ve seen it, cause you can,
02:42:32 you absolutely have to honor and recognize
02:42:33 the people you’re going to and the fact
02:42:36 that if you just throw money at them,
02:42:37 it could actually create more problems.
02:42:39 Can I just say, this is not you saying, can I just,
02:42:41 cause I just need, I need to say this.
02:42:44 I’ve been very surprised how often marketing people
02:42:49 are terrible at marketing.
02:42:51 I feel like the best marketing is doing something novel
02:42:55 and unique that anticipates the future.
02:42:58 It feels like so much of the marketing practice
02:43:01 is like what they took in school,
02:43:04 or maybe they’re studying for what was the best thing
02:43:06 that was done in the past decade,
02:43:08 and they’re just repeating that over and over,
02:43:10 as opposed to innovating, like taking the risk.
02:43:13 To me, marketing.
02:43:14 That’s a great point.
02:43:15 Is taking the big risk.
02:43:17 That’s a great point.
02:43:17 And being the first one to risk.
02:43:18 Yeah, there’s an aspect of data observation
02:43:21 from that risk, right?
02:43:22 That’s, I think, shared what they’re doing already.
02:43:25 But it absolutely, it’s about, I think it’s content.
02:43:27 Like there’s this whole world on content marketing
02:43:30 that you could almost say, well, yeah, it can get over,
02:43:33 you can get inundated with stuff
02:43:35 that’s not relevant to you.
02:43:36 Whereas what you’re saying would be highly relevant
02:43:39 and highly useful and highly beneficial.
02:43:41 Yeah, but it’s risk.
02:43:42 I mean, that’s why I sort of,
02:43:44 there’s a lot of innovative ways of doing that.
02:43:46 Tesla’s an example of people
02:43:48 that basically don’t do marketing.
02:43:49 They do marketing in a very, like,
02:43:52 let’s say Elon hired a person who’s just good at Twitter
02:43:55 for running Tesla’s Twitter account.
02:43:57 No, right, right.
02:43:59 I mean, that’s exactly what you wanna be doing.
02:44:00 You want it to be constantly innovating in the.
02:44:03 Right, there’s an aspect of telling.
02:44:04 I mean, I’ve definitely seen people doing great work
02:44:06 where you’re not talking about it.
02:44:08 Like, I would say that’s actually a problem
02:44:09 I have right now with Quansight Labs.
02:44:11 Quansight Labs has been doing amazing work,
02:44:12 really excited about it,
02:44:13 but we have not been talking about it enough.
02:44:15 We haven’t been.
02:44:16 And there’s different ways to talk about it.
02:44:17 There’s different ways to,
02:44:18 there’s different channels to which to communicate.
02:44:20 There’s also, like, I’ll just throw some shade
02:44:25 at companies I love.
02:44:27 So for example, iRobot,
02:44:29 I just had a conversation with them.
02:44:30 They make Roombas.
02:44:31 Sure.
02:44:32 And I think I love, they’re incredible robots,
02:44:35 but like every time they do like advertisement,
02:44:38 not advertisement, but like marketing type stuff,
02:44:41 it just looks so corporate.
02:44:44 And to me, the incredible,
02:44:47 maybe wrong in the case of iRobot, I don’t know.
02:44:50 But to me, when you’re talking about engineering systems,
02:44:54 it’s really nice to show off the magic of the engineering
02:44:57 and the software and all the geniuses behind this product
02:45:02 and the tinkering and like the raw authenticity
02:45:05 of what it takes to build that system
02:45:06 versus the marketing people who want to have like
02:45:09 pretty people, like standing there all pretty
02:45:12 with the robots, like moving perfectly.
02:45:14 So to me, there’s some aspect,
02:45:16 it’s like speaking to the hackers,
02:45:18 you have to throw some bones,
02:45:21 some care towards the engineers, the developers,
02:45:25 because there’s some aspect, one, for the hiring,
02:45:28 but two, there’s an authenticity to that,
02:45:31 authenticity to that kind of communication
02:45:33 that’s really inspiring to the end user as well.
02:45:36 Like if they know that brilliant people,
02:45:38 the best in the world are working at your company,
02:45:40 they start to believe that that product
02:45:42 that you’re creating is really good.
02:45:43 It’s interesting, because your initial reaction would be,
02:45:45 wait, there’s different users here.
02:45:46 Why would you do that to, you know,
02:45:48 my wife bought a Roomba, and she loves developers,
02:45:52 she loves me, but she doesn’t care about that culture.
02:45:56 So essentially what you said is actually the authenticity,
02:45:59 because everyone has a friend, everyone knows people,
02:46:01 there’s word of mouth, I mean, if you.
02:46:02 Word of mouth is so, so powerful.
02:46:04 Yeah, exactly, that’s interesting.
02:46:05 Because I think it’s the lack of that realization,
02:46:07 there’s this halo effect that influences
02:46:09 your general marketing, interesting.
02:46:11 For some stupid reason, I do have a platform,
02:46:14 and it seems that the reason I have a platform,
02:46:16 many others like me, millions of others,
02:46:19 is like the authenticity,
02:46:21 and like we get excited naturally about stuff.
02:46:23 And like, I don’t want to get excited
02:46:25 about that iRobot video,
02:46:27 because it’s boring, it’s marketing, it’s corporate,
02:46:30 as opposed to, I wanted to do some fun,
02:46:33 this is me, like a shout out to iRobot,
02:46:36 is they’re not letting me get into the robot.
02:46:39 Yeah, well there’s an aspect of,
02:46:40 that could be benefiting from a culture of modularity,
02:46:44 like add ons, and that could actually dramatically help.
02:46:47 You’ve seen that over history,
02:46:49 I mean, Apple is an example of a company like that,
02:46:51 or the, like, I can see what your point is,
02:46:54 is that you have something that needs to be,
02:46:56 it needs to be adopted broadly,
02:46:58 the concept needs to be adopted broadly.
02:47:00 And if you want to go beyond this one device,
02:47:01 you need to engage this community.
02:47:04 Yeah, and connecting to the open source that you said.
02:47:07 I gotta ask you,
02:47:09 you’re a programmer,
02:47:11 one of the most impactful programmers ever.
02:47:14 You’ve led many programmers, you lead many programmers.
02:47:18 What are some, from a programmer perspective,
02:47:21 what makes a good programmer?
02:47:23 What makes a productive programmer?
02:47:25 Is there advice you can give
02:47:27 to be a great programmer in this world?
02:47:28 That’s a great, great question.
02:47:30 And there are times in my life
02:47:31 I’d probably answer this even better
02:47:32 than I hope maybe give an answer today.
02:47:35 Because I thought about this numerous times,
02:47:36 like right now I’ve spent so much time
02:47:38 recently hiring salespeople that,
02:47:41 That your mind is a little bit on something else.
02:47:43 On something else.
02:47:44 But I reflected on the past,
02:47:46 and also, you know, I have some really,
02:47:48 the only way I can do this,
02:47:49 is I have some really great programmers that I work with,
02:47:51 who lead the teams that they lead.
02:47:53 And my goal is to inspire them and hopefully help them,
02:47:56 encourage them, and be,
02:47:57 help them encourage with their teams.
02:47:59 I would say there’s a number of things, couple things.
02:48:01 One is curiosity.
02:48:03 Like you, I think a programmer without curiosity
02:48:07 is mundane.
02:48:09 Like you’ll lose interest, you won’t do your best work.
02:48:12 So it’s sort of, it’s an affect.
02:48:13 It’s sort of, are you,
02:48:14 you have some curiosity about things.
02:48:16 I think two, don’t try to do everything at once.
02:48:19 Recognize that you’re, you know, we’re limited as humans.
02:48:21 You’re limited as a human.
02:48:23 And each one of us are limited in different ways.
02:48:24 You know, we all have our different strengths and skills.
02:48:26 So it’s adapting the art of programming to your skills.
02:48:29 One of the things that always works,
02:48:31 is to limit what you’re trying to solve.
02:48:33 Right, so, if you’re part of a team,
02:48:36 usually maybe somebody else has put the architecture together
02:48:38 and they’ve given a portion to you if you’re young.
02:48:41 If you’re not part of a team,
02:48:43 it’s sort of breaking down the problem into smaller parts,
02:48:46 is essential for you to make progress.
02:48:48 It’s very easy to take on a big project
02:48:50 and try to do it all at once, and you get lost.
02:48:52 And then you do it badly.
02:48:53 And so thinking about, you know,
02:48:57 very concretely what you’re doing,
02:48:59 defining the inputs and outputs,
02:49:01 defining what you want to get done.
02:49:03 Even just talking about that and like writing down
02:49:07 before you write code, just what are you trying to accomplish?
02:49:09 I mean, being very specific about it really, really helps.
02:49:12 I think using other people’s work, right?
02:49:17 Don’t be afraid that somehow you’re,
02:49:20 like you should do it all.
02:49:21 Like, nobody does.
02:49:23 Stand on the shoulders of giants.
02:49:25 And copy and paste from Stack Overflow.
02:49:26 Copy and paste from Stack Overflow.
02:49:28 But don’t just copy and paste,
02:49:30 this is particularly relevant in the era of Codex
02:49:31 and the auto generated code, which is essentially,
02:49:34 I see as an indexing of Stack Overflow.
02:49:36 Right, exactly.
02:49:37 Secondly, it’s like.
02:49:38 It’s a search engine.
02:49:39 It’s a search engine over Stack Overflow, basically.
02:49:41 So it’s not, I mean, we’ve had this for a while.
02:49:43 But really, you want to cut and paste, but not blindly.
02:49:47 Like, absolutely I’ve cut and paste to understand,
02:49:51 but then you understand.
02:49:52 Oh, this is what this means.
02:49:53 Oh, this is what it’s doing.
02:49:54 And understand as much as you can.
02:49:56 So it’s critical, that’s where the curiosity comes in.
02:49:59 If you’re just blindly cutting and pasting,
02:50:01 you’re not gonna understand.
02:50:02 So understand, and then be sensitive to hype cycles.
02:50:08 Right, every so often there’s always a,
02:50:10 oh, test driven development is the answer.
02:50:12 Oh, object oriented is the answer.
02:50:13 Oh, there’s always an answer.
02:50:16 Agile is the answer.
02:50:18 Be cautious of jumping onto a hype cycle.
02:50:20 Like, likely there’s signal.
02:50:22 Like, there’s a thing there
02:50:23 that’s actually valuable, you can learn from.
02:50:25 But it’s almost certainly not the answer
02:50:27 to everything you need.
02:50:28 What lessons do you draw
02:50:30 from you having created NumPy and SciPy?
02:50:34 Like, in service of sort of answering the question
02:50:37 of what it takes to be a great programmer
02:50:38 and giving advice to people.
02:50:40 How can you be the next person to create a SciPy?
02:50:42 Yeah, so one is listen.
02:50:45 To?
02:50:46 Listen.
02:50:47 To who?
02:50:48 To people that have a problem, right?
02:50:51 Which is everybody, right?
02:50:52 But listen, and listen to many.
02:50:54 And then try to, and then do.
02:50:57 Like, you’re gonna have to do an experiment, you know?
02:50:59 Do, fall down, don’t be afraid to fall down.
02:51:01 Don’t be afraid, the first thing you do
02:51:04 is probably gonna suck, and that’s okay, right?
02:51:07 It’s honestly, I think iteration is the key to innovation.
02:51:11 And it’s almost that psychological hesitation we have
02:51:16 to just iterate.
02:51:18 Like, yeah, we know it’s not great,
02:51:20 but next time it’ll be better.
02:51:22 I mean, just keep learning and keep improving.
02:51:25 So it’s an attitude.
02:51:27 And then it does take intense concentration, right?
02:51:32 Good things don’t happen just,
02:51:34 it’s not quite like TikTok or like Facebook, you know?
02:51:38 You can’t scroll your way to good programming, right?
02:51:40 There are sincere hours of deep,
02:51:44 don’t be afraid of the deep problem.
02:51:46 Like, often people will run away from something
02:51:47 because, oh, I can’t solve this.
02:51:49 And you might be right, but give it an hour.
02:51:51 Give it a couple of hours and see.
02:51:53 And just five minutes, not gonna give you that.
02:51:56 Was it lonely when you were building SciPy and NumPy?
02:52:00 Hugely, yeah, absolutely lonely,
02:52:02 in the sense of you had to have an inner drive,
02:52:05 and that inner drive for me always comes from,
02:52:08 I have to see that this is right in some angle.
02:52:11 I have to believe it, that this is the right approach,
02:52:13 the right thing to do.
02:52:14 With SciPy, it was like, oh yeah,
02:52:16 the world needs libraries in Python.
02:52:19 Clearly Python’s popular enough
02:52:20 with enough influential people to start,
02:52:22 and it needs more libraries.
02:52:24 So that is a good in and of itself.
02:52:26 So I’m gonna go do that good.
02:52:28 So find a good, find a thing that you know is good
02:52:30 and just work on it.
02:52:33 So that has to happen, and it is.
02:52:34 And you kind of have to have enough realization
02:52:37 of your mission to be okay with the naysayer
02:52:40 or the fact that not everybody joins you up front.
02:52:42 In fact, one thing I’ve talked to people a lot,
02:52:43 I’ve seen a lot of projects come, and some fail.
02:52:45 Not everything I’ve done has actually worked perfectly.
02:52:47 I’ve tried a bunch of stuff that, okay,
02:52:49 that didn’t really work, or this isn’t working, and why.
02:52:51 But you see the patterns, and one of the key things is
02:52:55 you can’t even know for six months.
02:52:59 I say 18 months right now.
02:53:00 If you’re starting a new project,
02:53:01 you gotta give it a good 18 month run
02:53:03 before you even know if the feedback’s there.
02:53:05 You’re not gonna know in six months.
02:53:07 You might have the perfect thing,
02:53:08 but six months from now, it’s kind of still emerging.
02:53:11 So give it time, because you’re dealing with humans,
02:53:13 and humans have an inertial energy
02:53:15 that just doesn’t change that quickly, so.
02:53:18 Let me ask a silly question, but like you said,
02:53:23 you’re focused on the sales side of things currently,
02:53:26 but back when you were actively programming,
02:53:28 maybe in the 90s, you talked about IDEs.
02:53:31 What’s a setup that you have that brings you joy?
02:53:36 Keyboard, number of screens, Linux.
02:53:39 I do still like to program some.
02:53:40 It’s not as much as I used to.
02:53:42 I have two projects I’m super interested in,
02:53:44 trying to find funding for them,
02:53:45 trying to figure out teams for them,
02:53:47 but I could talk about those.
02:53:49 But what I, yeah, I’m an Emacs guy.
02:53:51 Great, thank the superior editor, everybody.
02:53:56 I’ve got, I don’t often delete tweets,
02:53:59 but one of the tweets I deleted
02:54:00 was when I said Emacs was better than Vim,
02:54:02 and then the hate I got from it.
02:54:04 It is.
02:54:05 I was like, I’m walking away from this.
02:54:07 I do too, I don’t push it.
02:54:09 I mean, I’m not.
02:54:10 I’m just joking, of course.
02:54:11 Yeah, exactly, it’s kind of like,
02:54:12 but people do take the editor seriously, right?
02:54:14 I did it as a joke.
02:54:15 That’s your life.
02:54:16 It is, but there’s something beautiful to me about Emacs,
02:54:20 but for people that love Vim,
02:54:22 there’s something beautiful to them about that.
02:54:23 There is.
02:54:24 I mean, I do use Vim for quick editing.
02:54:26 Like on the command line, if I need quick editing,
02:54:27 I will still sometimes use it, but not much.
02:54:30 Like if it’s simple, correcting a single character.
02:54:32 So when you were developing SciPy, you were using Emacs?
02:54:34 Emacs, yeah.
02:54:35 SciPy and NumPy are all written on Emacs on a Linux box.
02:54:39 And CVS and then SVN, version control.
02:54:43 Git came later.
02:54:44 Like Git has, I love distributed branch stuff.
02:54:48 I think Git is pretty complicated, but I love the concept.
02:54:51 And also, of course, GitHub and then GitLab
02:54:55 make Git definitely consumable, but that came later.
02:54:59 Did you ever touch Lisp at all?
02:55:00 Like what were your emotional feelings
02:55:03 about all the parentheses?
02:55:04 Yeah, so great question.
02:55:05 So I find myself appreciating Lisp today
02:55:08 much more than I did early.
02:55:09 Because when I came to programming, I knew programming,
02:55:11 but I was a domain expert, right?
02:55:13 And to me, the parentheses were in the way.
02:55:15 It’s like, wow, there’s just all this,
02:55:17 like it just gets in the way of my thinking
02:55:19 about what I’m doing.
02:55:20 So why would I have all these, right?
02:55:22 That was my initial reaction to it.
02:55:24 And now as I appreciate kind of the structure
02:55:27 that kind of naturally maps to a logical thinking
02:55:30 about a program, I can appreciate them, right?
02:55:33 And why it’s actually, you could create editors
02:55:35 that make it not so problematic, right, honestly.
02:55:40 So I actually have a much greater appreciation of Lisp
02:55:43 and things like Clojure and there’s Hy,
02:55:44 which is a Python Lisp that compiles to Python bytecode.
02:55:48 I think it’s challenging.
02:55:50 Like typically these languages are,
02:55:53 I even saw the whole data science programming system
02:55:55 in Lisp that somebody created, which is cool.
02:55:58 But again, I think it’s the lack of recognition
02:56:00 of the fact that there exists
02:56:02 what I call occasional programmers.
02:56:04 People that are never gonna be programmers for a living.
02:56:05 They don’t want to have all this cuteness in their head.
02:56:08 They want just, it’s why BASIC, you know,
02:56:11 Microsoft had the right idea with BASIC
02:56:14 in terms of having that be the language of Visual Basic,
02:56:17 the language of Excel and SQL Server.
02:56:21 They should have converted that to Python 10 years ago.
02:56:23 Like the world would be a better place if they had, but.
02:56:27 There’s also, there’s a beauty and a magic
02:56:29 to the history behind a language in Lisp.
02:56:31 You know, some of the most interesting people
02:56:34 in the history of computer science
02:56:35 and artificial intelligence have used Lisp.
02:56:37 So you feel.
02:56:40 Well, especially that language,
02:56:41 when you have a language, you can think in it.
02:56:43 And it helps you think better.
02:56:44 And it attracts a certain kind of people
02:56:45 that think in a certain kind of way.
02:56:46 And then that’s there.
02:56:48 Okay, so what about like small laptop with a tiny keyboard,
02:56:52 or is there like three screens?
02:56:55 You know, good question.
02:56:55 I’ve never gotten into the big, many screens to be honest.
02:56:58 I mean, and maybe it’s because in my head,
02:57:00 I kind of just, I just swap between windows.
02:57:03 Like, partly because I guess I really can’t process
02:57:07 three screens at once anyway.
02:57:09 Like, I just am looking at one and I just flip.
02:57:12 You know, I flip an application open.
02:57:14 So where it’s really helpful is actually
02:57:17 when I’m trying to do, you know,
02:57:18 here’s data and I want to input it from here.
02:57:20 Like this is the only time I really need another screen.
02:57:22 So now, because you’re both a developer, you lead developers,
02:57:25 but then there’s also these businesses
02:57:27 and there’s salespeople and you’re working
02:57:30 with large companies.
02:57:30 Operations people, hiring people, yeah.
02:57:32 The whole thing.
02:57:33 Which operating system is your favorite at this point?
02:57:37 So Linux was the early days.
02:57:38 So yeah, I love Linux on the server side.
02:57:41 And in the early days I had my own Linux desktop.
02:57:44 I’ve been on Mac laptops for 10 years now.
02:57:47 Yeah, this is what leadership looks like:
02:57:50 you switch to Mac.
02:57:52 Okay, great.
02:57:53 Pretty much, I mean, just the fact that I had
02:57:56 to do PowerPoints, I had to do presentations
02:57:58 and you know, plug in, I just couldn’t mess
02:58:01 with plugging in laptops, it wouldn’t project and yeah.
02:58:04 So you mentioned also Quansight Labs and things like that.
02:58:09 Can you give advice on how to hire great programmers
02:58:13 and great people?
02:58:14 Yeah, I would say, produce an open source project,
02:58:19 get people contributing to it and hire those people.
02:58:21 Yeah, I mean, you’re doing it sort of,
02:58:25 you may be perhaps a little biased,
02:58:27 but that’s probably 100% really good advice.
02:58:30 I find it hard to hire.
02:58:31 I still find it hard to hire, like in terms of,
02:58:34 I think it’s not hard to hire
02:58:36 if I’ve worked with somebody for a couple of weeks,
02:58:39 but an hour or two of interviews, I have no idea.
02:58:43 So that instinct, that radar of knowing if you’re good
02:58:47 or not, that you’ve found that you’re still not able to.
02:58:50 It’s really hard, I mean, the resume can help,
02:58:53 but again, the resume is like a presentation
02:58:55 of the things they want you to see, not the reality of,
02:58:58 and there’s also, you have to understand
02:59:02 what you’re hiring for.
02:59:03 There are different stages and different kinds of skills.
02:59:06 And so it isn’t just, one of the things I talk a lot about
02:59:10 internally at my company is just that the whole idea
02:59:14 of measuring ourselves against a single axis is flawed
02:59:18 because we’re not, it’s a multidimensional space
02:59:20 and how do you order a multidimensional space?
02:59:22 There isn’t one ordering.
02:59:23 So this whole idea, you immediately get projected
02:59:26 into a thing when you’re talking about hiring
02:59:28 or best or worst or better or not better.
02:59:30 So what is the thing you’re actually needing?
02:59:33 And you can hire for that.
02:59:35 There is such a thing, generally, I really value people
02:59:39 who have the affect, that care about open source.
02:59:42 Like so in some cases, their affinity to open source
02:59:45 is simply kind of a filter of an affect.
02:59:49 However, I have found this interesting dichotomy
02:59:52 between open source contributors and product creation.
02:59:58 There’s, I don’t know if it’s fully true,
03:00:00 but there does seem to be the more experienced,
03:00:04 the more affect somebody has in an open source community,
03:00:08 the less ability to actually produce product that they have.
03:00:11 And the opposite is kind of true too.
03:00:13 The more product focused are, I find a lot of people,
03:00:16 I’ve talked to a lot of people who produce
03:00:17 really great products and they have a,
03:00:19 they’re looking over the open source communities,
03:00:21 kind of wanting to participate and play,
03:00:23 but they’ve played here and they do a great job here
03:00:26 and then they don’t necessarily have some of the same.
03:00:29 Now I don’t think that’s entirely necessary.
03:00:32 I think part of it is cultural, how they’ve emerged.
03:00:34 Because one of the things that open source communities
03:00:36 often lack is great product management,
03:00:39 like some product management energy.
03:00:41 That’s brilliant, but you want both of those energies
03:00:43 in the same place together.
03:00:44 Yes, you really do.
03:00:45 And so a lot of it’s creating these teams of people
03:00:48 that have these needed skills and attributes
03:00:50 that are hard.
03:00:51 And so one of the big things I look for is somebody
03:00:55 that fundamentally recognizes their need to learn.
03:00:57 Like one of the values that we have
03:00:59 in all of the things we do is learning.
03:01:01 Like if somebody thinks they know it all,
03:01:04 they’re gonna struggle.
03:01:06 And some of that is just, there’s more basic things
03:01:09 like humility, just being humble in the face
03:01:12 of all the things you don’t know.
03:01:14 And that’s step one of learning.
03:01:15 That’s step one of learning, right?
03:01:16 And I’ve spent a lot of time learning, right?
03:01:20 Other people spend a lot more time,
03:01:21 but I’ve spent a lot of time learning.
03:01:23 My whole goal was to get a PhD because I love school
03:01:26 and I wanted to be a scientist.
03:01:28 And then what I found is what’s been written about
03:01:31 elsewhere as well is the more I learned,
03:01:32 the more I didn’t know.
03:01:33 The more I realized, man, I know about this,
03:01:37 but this is such a tiny thing in the global scope
03:01:40 of what I might wanna know about.
03:01:41 So I need to be listening a whole lot better
03:01:43 than I am just talking.
03:01:47 That’s changed a little bit actually.
03:01:48 My wife says that I used to be a better listener.
03:01:50 Now that I’m so full of all these ideas I wanna do,
03:01:52 she kind of says, you gotta give people time to talk.
03:01:55 So you’ve succeeded on multiple dimensions.
03:01:58 So one is the tenure track faculty.
03:02:01 The other is just creating all these products
03:02:03 and building up the businesses,
03:02:04 then working with businesses.
03:02:06 Do you have advice for young people today
03:02:09 in high school and college of how to live a life
03:02:13 as nonlinear and as successful as yours,
03:02:18 a life that they could be proud of?
03:02:21 Well, that’s a super compliment.
03:02:22 I’m humbled by that actually.
03:02:24 I would say a life they can be proud of.
03:02:27 Honestly, one thing that I’ve said to people is first,
03:02:31 find people you love and care about them.
03:02:34 Like family matters to me a lot.
03:02:36 And family means people you love and have committed to.
03:02:39 So it can be whatever you mean by that,
03:02:42 but you need to have a foundation.
03:02:45 So find people you love and wanna commit to and do that.
03:02:48 Cause it anchors you in a way that nothing else can.
03:02:52 And then you find other things.
03:02:55 And then kind of from out there,
03:02:56 you find other kinds of things you can commit to,
03:02:58 whether it’s ideas or people or groups of people.
03:03:03 So, especially in high school,
03:03:06 I would say don’t settle on what you think you know.
03:03:09 Like give yourself 10 years to think about the world.
03:03:13 Like I see a lot of high school students
03:03:15 who seem to know everything already.
03:03:17 I think I did too.
03:03:18 I think it’s maybe natural,
03:03:20 but recognize that the things you care about,
03:03:23 you might change your perspective over time.
03:03:26 I certainly have over time.
03:03:28 I was really passionate about one specific thing
03:03:30 and I’ve kind of softened.
03:03:32 I was a big, I didn’t like the Federal Reserve, right?
03:03:35 And there’s still, we could have a longer conversation
03:03:38 about monetary policy and finances,
03:03:40 but I’m a little more nuanced in my perspective
03:03:46 at this point.
03:03:48 But that’s one area where you learn about something,
03:03:50 go, ah, I wanna attack it.
03:03:52 Build, don’t destroy.
03:03:55 Build, like so often the tendency is to not like something
03:03:58 and wanna go attack it.
03:04:00 Build something, build something to replace it.
03:04:02 Yeah.
03:04:03 Build up, attract people to your new thing.
03:04:05 You’ll be far better, right?
03:04:08 You don’t need to destroy something to build something else.
03:04:12 So that’s, I guess, generally.
03:04:14 And then definitely like curiosity,
03:04:19 follow your curiosity and let it,
03:04:22 don’t just follow the money.
03:04:24 And all of that, like you said,
03:04:25 is grounded in family, friendship, and ultimately love.
03:04:30 Yes.
03:04:31 Which is a great way to end it.
03:04:34 Travis, you’re one of the most impactful people
03:04:37 in the engineering and the computer science
03:04:38 in the human world.
03:04:39 So I truly appreciate everything you’ve done.
03:04:43 And I really appreciate that you would spend
03:04:45 your valuable time with me.
03:04:46 It was an honor.
03:04:47 It was a real pleasure for me.
03:04:48 I appreciate that.
03:04:50 Thanks for listening to this conversation
03:04:52 with Travis Oliphant.
03:04:54 To support this podcast,
03:04:55 please check out our sponsors in the description.
03:04:57 And now, let me leave you with something
03:05:00 that in the programming world is called Hodgson’s Law.
03:05:04 Every sufficiently advanced Lisp application
03:05:08 will eventually be reimplemented in Python.
03:05:12 Thank you for listening and hope to see you next time.