Travis Oliphant: NumPy, SciPy, Anaconda, Python & Scientific Programming #224

Transcript

00:00:00 The following is a conversation with Travis Oliphant,

00:00:03 one of the most impactful programmers

00:00:05 and data scientists ever.

00:00:07 He created NumPy, SciPy, and Anaconda.

00:00:12 NumPy formed the foundation

00:00:14 of tensor based machine learning in Python,

00:00:17 SciPy formed the foundation

00:00:18 of scientific programming in Python,

00:00:20 and Anaconda, specifically with Conda,

00:00:23 made Python more accessible to a much larger audience.

00:00:27 Travis’s life work across a large number of programming

00:00:31 and entrepreneurial efforts has and will continue

00:00:34 to have immeasurable impact on millions of lives

00:00:38 by empowering scientists and engineers

00:00:41 in big companies, small companies,

00:00:43 and open source communities to take on difficult problems

00:00:47 and solve them with the power of programming.

00:00:50 Plus, he’s a truly kind human being,

00:00:53 which is something that when combined with vision

00:00:56 and ambition makes for a great leader

00:00:58 and a great person to chat with.

00:01:01 To support this podcast,

00:01:02 please check out our sponsors in the description.

00:01:04 This is the Lex Fridman Podcast,

00:01:06 and here is my conversation with Travis Oliphant.

00:01:11 What was the first computer program you’ve ever written?

00:01:14 Do you remember?

00:01:15 Whoa, that’s a good question.

00:01:16 I think it was in fourth grade.

00:01:18 Just a simple loop in BASIC.

00:01:20 BASIC. BASIC, yeah, on an Atari 800,

00:01:23 Atari 400, I think, or maybe it was an Atari 800.

00:01:26 It was a part of a class,

00:01:28 and we were just writing BASIC loops to print things out.

00:01:32 Did you use GOTO statements?

00:01:34 Yes, yes, we used GOTO statements.

00:01:38 I remember in the early days,

00:01:39 that’s when I first realized

00:01:41 there’s like principles to programming,

00:01:43 when I was told, don’t use GOTO statements.

00:01:45 Those are bad software engineering principles,

00:01:48 like it goes against what great, beautiful code is.

00:01:52 I was like, oh, okay, there’s rules to this game.

00:01:54 I didn’t see that until high school

00:01:56 when I took an AP computer science course.

00:01:58 I did a lot of other kinds of just programming in TI,

00:02:02 but finally, when I took an AP computer science course

00:02:04 in Pascal.

00:02:05 Wow.

00:02:06 That’s, yeah, it was Pascal.

00:02:07 That’s when I, oh, there are these principles.

00:02:09 Not C or C++?

00:02:11 No, I didn’t take C until the next year in college.

00:02:14 I had a course in C, but I haven’t done much in Pascal,

00:02:18 just that AP computer science course.

00:02:20 Now, sorry for the romanticized question,

00:02:23 but when did you first fall in love with programming?

00:02:26 Oh, man, good question.

00:02:27 I think actually when I was 10,

00:02:30 my dad got us a TI Timex Sinclair,

00:02:33 and he was excited about the spreadsheet capability,

00:02:37 and then, but I made him get the BASIC,

00:02:39 the add-ons, so we could actually program in BASIC,

00:02:41 and just being able to write instructions

00:02:44 and have the computer do something.

00:02:45 Then we got a TI-99/4A when I was about 12,

00:02:50 and I would just, it had sprites and graphics and music.

00:02:52 You could actually program it to do music.

00:02:55 That’s when I really sort of fell in love with programming.

00:02:58 So this is a full, like a real computer

00:03:01 with like, with memory and storage,

00:03:04 processors and whatnot,

00:03:05 because you say TI. Yeah, the Timex Sinclair

00:03:07 was one of the very first, it was a cheap, cheap,

00:03:09 like, I think it was, well, it was still expensive,

00:03:12 but it was 2K of memory.

00:03:14 We got the 16K add on pack,

00:03:16 but yeah, it had memory, and you could program it.

00:03:19 You had the, in order to store your programs,

00:03:20 you had to attach a tape drive.

00:03:22 Remember that old, the sound that would play

00:03:24 when you converted, the modems would convert digital bits

00:03:29 to audio signals sent to a tape drive.

00:03:31 Still remember that sound, but that was the storage.

00:03:34 And what was the programming language, do you remember?

00:03:36 It was BASIC. It was BASIC.

00:03:37 And then they had a VisiCalc,

00:03:38 and so a little bit of spreadsheet programming

00:03:40 in VisiCalc, but mostly just some BASIC.

00:03:42 Do you remember what kind of things drew you to programming?

00:03:46 Was it working with data, was it video games?

00:03:50 Games, math, mathy stuff?

00:03:52 Yeah, I’ve always loved math,

00:03:54 and a lot of people think they don’t like math

00:03:58 because I think when they’re exposed to it early,

00:04:00 it’s about memory.

00:04:02 When you’re exposed to math early,

00:04:03 you have a good short term memory,

00:04:04 you can remember your times tables.

00:04:05 And I do have a reasonably, I mean, not perfect,

00:04:08 but a reasonably long little short term memory buffer.

00:04:12 And so I did great at times tables.

00:04:14 I said, oh, I’m good at math.

00:04:15 But I started to really like math,

00:04:17 just the problem solving aspect.

00:04:20 And so computing was problem solving applied.

00:04:25 And so that’s always kind of been the draw,

00:04:28 kind of coupled with the mathematics.

00:04:30 Did you ever see the computer as like an extension

00:04:33 of your mind, like something able to achieve?

00:04:36 Not till later.

00:04:37 Okay.

00:04:38 Yeah, not then.

00:04:39 It’s just like a little set of puzzles

00:04:40 that you can play with and you can play with math puzzles.

00:04:43 Yeah, it was too rudimentary early on.

00:04:46 Like it was sort of, yeah, it was a lot of work

00:04:49 to actually take a thought you’d have

00:04:51 and actually get it implemented.

00:04:53 And that’s still work, but it’s getting easier.

00:04:56 And so yeah, I would say that’s definitely

00:04:58 what’s attracting me to Python

00:04:59 is that that was more real, right?

00:05:02 I could think in Python.

00:05:04 Speaking of foreign language,

00:05:05 I only speak one other language fluently besides English,

00:05:08 which is Spanish.

00:05:09 And I remember the day when I would dream in Spanish

00:05:11 and you start to think in that language.

00:05:13 And then you actually, I do definitely believe

00:05:15 that language limits or expands your thinking.

00:05:19 There’s some languages that actually lead you

00:05:21 to certain thought processes.

00:05:23 Yeah, like, so I speak Russian fluently

00:05:27 and that’s certainly a language that leads you

00:05:30 down certain thought processes.

00:05:33 Well, yeah, I mean, there’s a history

00:05:36 of the two world wars of millions of people starving

00:05:41 to death or near to death throughout its history

00:05:44 of suffering, of injustice, like this promise sold

00:05:48 to the people and then the carpet

00:05:50 or whatever is swept from under them.

00:05:53 And it’s like broken promises.

00:05:54 And all of that pain and melancholy is in the language,

00:05:58 the sad songs, the sad hopeful songs,

00:06:01 the over romanticized, like, I love you, I hate you,

00:06:05 the sort of the swings between all the various spectrums

00:06:09 of emotion, so that’s all within the language.

00:06:13 The way it’s twisted, there’s a strong culture

00:06:18 of rhyming poetry, so like the bards,

00:06:20 like the sync, there’s a musicality to the language too.

00:06:24 Did Dostoevsky write in Russian?

00:06:27 Yeah, so like Dostoevsky, Tolstoy, all the,

00:06:32 all the.

00:06:32 The ones that I know about, which are translated

00:06:34 and I’m curious how the translations.

00:06:36 So Dostoevsky did not use the musicality

00:06:40 of the language too much.

00:06:42 So it actually translates pretty well

00:06:44 because it’s so philosophically dense

00:06:46 that the story does a lot of the work,

00:06:48 but there’s a bunch of things that are untranslatable.

00:06:51 Certainly the poetry is not translatable.

00:06:53 I actually have a few conversations coming up offline

00:06:57 and also in this podcast with people

00:06:59 who’ve translated Dostoevsky.

00:07:01 And that’s for people who worked, who work in this field,

00:07:06 know how difficult that is.

00:07:07 Sometimes you can spend months thinking

00:07:10 about a single sentence, right?

00:07:12 In context, like, cause there’s just the magic

00:07:15 captured by that sentence and how do you translate

00:07:17 just in the right way?

00:07:18 Because those words can be really powerful.

00:07:22 There’s a famous line,

00:07:24 beauty will save the world from Dostoevsky.

00:07:27 You know, there’s so many ways to translate that.

00:07:29 And you’re right, the language gives you the tools

00:07:32 with which to tell the story,

00:07:34 but it also leads your mind down certain trajectories

00:07:37 and paths to where over time,

00:07:39 as you think in that language,

00:07:41 you become a different human being.

00:07:42 Yes. Yeah.

00:07:43 Yeah, that’s a fascinating reality, I think.

00:07:45 I know people have explored that,

00:07:47 but it’s just rediscovered.

00:07:49 Well, we don’t, we live in our own like little pockets.

00:07:52 Like this is the sad thing is I feel like unfortunately,

00:07:56 given time and given getting older,

00:07:59 I’ll never know China, the Chinese world,

00:08:03 because I don’t truly know the language.

00:08:05 Same with Japanese, I don’t truly know Japanese

00:08:08 and Portuguese and Brazil,

00:08:10 that whole South American continent.

00:08:12 Like, yeah, I’ll go to Brazil and Argentina,

00:08:14 but will I truly understand the people

00:08:17 if I don’t understand the language?

00:08:18 It’s sad because I wonder how much,

00:08:23 how many geniuses we’re missing

00:08:25 because so much of the scientific world,

00:08:28 so much of the technical world is in English,

00:08:31 and so much of it might be lost

00:08:33 because it’s just we don’t have the common language.

00:08:36 I completely agree.

00:08:36 I’m very much in that vein of there’s a lot of genius

00:08:40 out there that we miss,

00:08:41 and it’s sort of fortunate when it bubbles up

00:08:45 into something that we can understand or process,

00:08:48 there’s a lot we miss.

00:08:50 So I tend to lean towards really loving democratization

00:08:54 or things that empower people

00:08:55 and I’m very resistant to sort of authoritarian structures.

00:09:00 Fundamentally for that reason,

00:09:01 well, several reasons, but it just hurts us.

00:09:04 We’re soft.

00:09:06 So speaking of languages that empower you,

00:09:09 so Python was the first language for me

00:09:11 that I really enjoyed thinking in, as you said.

00:09:16 Sounds like you shared my experience too.

00:09:18 So when did you first,

00:09:19 do you remember when you first kind of connected with Python,

00:09:21 maybe even fell in love with Python?

00:09:23 It’s a good question.

00:09:24 It was a process.

00:09:25 It took about a year.

00:09:26 I first encountered Python in 1997.

00:09:29 I was a graduate student studying biomedical engineering

00:09:31 at the Mayo Clinic.

00:09:32 And I had previously,

00:09:34 I’d been involved in taking information from satellites.

00:09:39 I was an electrical engineering student

00:09:41 used to taking information

00:09:42 and trying to get something out of it,

00:09:44 doing some data processing, getting information out of it.

00:09:46 And I’d done that in MATLAB.

00:09:47 I’d done that in Perl.

00:09:49 I’d done that in scripting on a VMS.

00:09:52 There’s actually a VAX VMS system,

00:09:54 they had their own little scripting tools around Fortran.

00:09:57 Done a lot of that.

00:09:58 And then as a graduate student,

00:10:00 I was looking for something and encountered Python.

00:10:04 And because Python had an array,

00:10:06 had two things that made me not filter it away.

00:10:09 Because I was filtering a bunch of stuff,

00:10:10 as Yorick, I looked at Yorick,

00:10:11 I looked at a few other languages that are out there

00:10:14 at the time in 1997, but it had arrays.

00:10:17 There’s a library called Numeric

00:10:19 that had just been written in 95,

00:10:20 like not very, not too much earlier.

00:10:23 By an MIT alum, Jim Hugunin.

00:10:26 You know, and I went back and read the mailing list

00:10:29 to see the history of how it grew.

00:10:30 And there was a very interesting,

00:10:31 it’s fascinating to do that actually,

00:10:32 to see how this emergent cooperation,

00:10:36 unstructured cooperation happens in the open source world

00:10:39 that led to a lot of this collective programming,

00:10:43 which is something maybe we might get into a little later,

00:10:45 but what that looks like.

00:10:46 What gap did Numeric fill?

00:10:48 Numeric filled the gap of having an array object.

00:10:50 There was no array object.

00:10:51 There was no array.

00:10:52 There was a one dimensional byte concept,

00:10:55 but there was no n dimensional,

00:10:57 two, three, four dimensional tensor they call it now.

00:11:00 I’m still in the category that a tensor is another thing

00:11:03 and it’s just an ndarray we should call it,

00:11:05 but kind of lost that battle.

00:11:08 There’s many battles in this world,

00:11:10 some of which we win, some we lose.

00:11:12 That’s exactly right.

00:11:13 So, but it had no math to it.

00:11:17 So Numeric had math and a basic way to think in arrays.

00:11:20 So I was looking for that,

00:11:21 and it had complex numbers,

00:11:24 which a lot of programming languages don’t have.

00:11:26 And you can see it because,

00:11:28 if you’re just a computer scientist,

00:11:29 you think, ah, complex numbers are just two floats.

00:11:32 So you can, people can build that on.

00:11:34 But in practice, a complex number

00:11:36 is one of the significant algebras

00:11:38 that helps connect a lot of physical

00:11:40 and mathematical ideas,

00:11:42 particularly FFT for an electrical engineer.

00:11:45 And it’s a really important concept

00:11:48 and not having it means you have to develop it

00:11:50 several times and those times may not share an approach.
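
As a small illustration of why a shared, built-in complex type matters for Fourier work, here is a sketch in present-day Python using only the standard library. The name `naive_dft` is made up for this example; it is a slow textbook discrete Fourier transform, not the FFT algorithm and not code from Numeric.

```python
import cmath


def naive_dft(samples):
    """Textbook O(n^2) discrete Fourier transform of a sequence of numbers."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


# Complex values are ordinary numbers: one literal, one shared set of operations.
z = 3 + 4j
magnitude = abs(z)

# A signal that alternates 1, 0, -1, 0 puts its energy in bins 1 and 3.
spectrum = naive_dft([1.0, 0.0, -1.0, 0.0])
```

Because every library agrees on the same complex type, the transform can hand its result to any other code without a conversion step.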

00:11:54 One of the common things in programming,

00:11:55 one of the things programming enables is abstractions.

00:11:59 But when you have shared abstractions, it’s even better.

00:12:01 It sort of gets to the level of language

00:12:02 of actually we all think of this the same way,

00:12:05 which is both powerful and dangerous, right?

00:12:07 Because powerful in that we now can quickly

00:12:11 make bigger and higher level things

00:12:13 on top of those abstractions; dangerous

00:12:14 because it also limits us as to the things

00:12:17 we maybe left behind in producing that abstraction,

00:12:20 which is at the heart of programming today

00:12:21 and actually building around the programming world.

00:12:24 I think it’s a fascinating philosophical topic.

00:12:26 Yeah, they will continue for many years, I think.

00:12:28 They’ll continue for many years.

00:12:29 As we build more and more and more abstractions.

00:12:31 Yes, I often think about, you know,

00:12:32 we have a world that’s built on these abstractions

00:12:35 that were they the only ones possible?

00:12:37 Certainly not, but they led to,

00:12:39 you know, it’s very hard to do it differently.

00:12:42 Like there’s an inertia that’s very hard to,

00:12:44 you know, push out, push away from.

00:12:47 That has implications for things like,

00:12:49 you know, the Julia language,

00:12:50 which you have heard of, I’m sure.

00:12:52 And I’ve met the creators and I liked Julia.

00:12:55 It’s a really cool language,

00:12:56 but they struggled to kind of against the,

00:12:59 just the tide of like this inertia of people using Python.

00:13:03 And, you know, there’s strategies to approach that,

00:13:05 but nonetheless, it’s a phenomenon.

00:13:07 And sometimes, so I love complex numbers

00:13:09 and I loved arrays, so I looked at Python.

00:13:12 And then I had the experience, I did some stuff in Python

00:13:15 and I was just doing my PhD.

00:13:16 So I was out, my focus was on,

00:13:19 I was actually doing a combination of MRI and ultrasound

00:13:22 and looking at a phenomenon called elastography,

00:13:24 which is you push waves into the body

00:13:27 and observe those waves, like you can actually measure them.

00:13:30 And then you do mathematical inversion

00:13:32 to see what the elasticity is.

00:13:35 And so that’s the problem I was solving

00:13:36 is how to do that with both ultrasound and MRI.

00:13:39 I needed some tool to do that with.

00:13:41 So I was starting to use Python in 97.

00:13:44 In 98, I went back, looked at what I’d written

00:13:47 and realized I could still understand it,

00:13:49 which is not the experience I’d had

00:13:50 when doing Perl in 95, right?

00:13:53 I’d done the same thing and then I looked back

00:13:55 and I’d forgotten what I was even saying.

00:13:58 Now, you know, I’m not saying, so that made me think,

00:14:00 hey, this may work, I like this.

00:14:02 This is something I can retain

00:14:04 without becoming an expert per se.

00:14:07 And so that led me to go, I’m gonna push more into this.

00:14:10 And then that 98 was kind of when I started

00:14:14 to fall in love with Python, I would say.

00:14:18 A few peculiar things about Python.

00:14:20 So maybe compare it to Perl,

00:14:22 compare it to some of the other languages.

00:14:24 So there’s no braces.

00:14:26 Yeah.

00:14:27 So space is used, indentation, I should say,

00:14:31 is used as part of the language.

00:14:33 Yeah, right.

00:14:35 So did you, I mean, that’s quite a leap.

00:14:39 Were you comfortable with that leap

00:14:41 or were you just very open minded?

00:14:42 It’s a good question.

00:14:43 I was open minded, so I was cognizant of the concern.

00:14:48 And it definitely has, it has specific challenges.

00:14:52 You know, cut and pasting.

00:14:53 For example, when you’re cut and pasting code,

00:14:55 and if your editors aren’t supportive of that,

00:14:57 if you’re putting it into a terminal,

00:14:58 and particularly in the past when terminals

00:15:01 didn’t necessarily have the intelligence to manage it.

00:15:03 Now, IPython and Jupyter Notebooks

00:15:05 handle that just fine, so there’s really no problem.

00:15:06 But in the past, it created some challenges,

00:15:08 formatting challenges, also mixed tabs and spaces.

00:15:12 If editors weren’t, you weren’t clear

00:15:14 on what was happening, you would have these issues.
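
The mixed tabs and spaces problem can be reproduced in a few lines. This sketch assumes present-day Python 3, which refuses indentation whose meaning depends on how wide a tab is; the source string and the variable names are invented for the example.

```python
# The second line is indented with one tab, the third with eight spaces.
# In an editor with 8-column tabs they look aligned, but they are not the same.
source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(source, "<example>", "exec")
    outcome = "compiled"
except TabError:
    # TabError is a subclass of IndentationError.
    outcome = "TabError"
```

Using one indentation style throughout a file avoids the error entirely.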

00:15:16 So there were really concrete reasons about it

00:15:19 that I heard and understood.

00:15:20 I never really encountered a problem with it personally.

00:15:23 Like, it was occasional annoyances,

00:15:26 but I really liked the fact

00:15:28 that it didn’t have all these extra characters, right?

00:15:31 That these extra characters didn’t show up

00:15:33 in my visual field when I was just trying

00:15:35 to process understanding a snippet of code.

00:15:38 Yeah, there’s a cleanness to it.

00:15:39 But, I mean, the idea is supposed to be

00:15:41 that Perl also has a cleanness to it

00:15:43 because of the minimalism of how many characters

00:15:46 it takes to express a certain thing.

00:15:48 So it’s very compact.

00:15:49 But what you realize with that compactness comes,

00:15:53 there’s a culture that prizes compactness,

00:15:57 and so the code gets more and more compact

00:15:58 and less and less readable to a point where it’s like,

00:16:03 like, to be a good programmer in Perl,

00:16:05 you write code that’s basically unreadable.

00:16:07 There’s a culture, like.

00:16:09 Correct, and you’re proud of it.

00:16:10 Yeah, you’re proud of it.

00:16:12 Right, exactly, and it’s like, feels good.

00:16:14 And it’s really selective.

00:16:16 It means you have to be an expert in Perl to understand it.

00:16:20 Whereas Python allowed you not to have to be an expert.

00:16:22 You didn’t have to take all this brain energy.

00:16:24 You could leverage, what I say,

00:16:25 you could leverage your English language center,

00:16:28 which you’re using all the time.

00:16:29 I’ve wondered about other languages,

00:16:31 particularly non Latin based languages.

00:16:34 With Latin based languages, the characters are at least similar.

00:16:37 I think people have an easier time,

00:16:38 but I don’t know what it’s like to be a Japanese

00:16:41 or a Chinese person trying to learn different syntax.

00:16:46 Like, what would computer programming look like in that?

00:16:49 I haven’t looked at that at all,

00:16:50 but it certainly doesn’t,

00:16:52 you know, leveraging your Chinese language center,

00:16:54 I’m not sure Python or any programming does that.

00:16:57 But that was a big deal.

00:16:58 The fact that it was accessible, I could be a scientist.

00:17:00 What I really liked is many programming languages

00:17:02 really demand a lot of you, and you can get a lot,

00:17:04 you know, you do a lot if you learn it.

00:17:07 But Python enables you to do a lot

00:17:08 without demanding a lot of you.

00:17:11 There’s nuance to that statement,

00:17:13 but it certainly was, it’s more accessible.

00:17:15 So more people could actually, as a scientist,

00:17:18 as somebody who, or an engineer,

00:17:19 who was trying to solve another problem

00:17:21 besides pure programming,

00:17:23 I could still use this language and get things done

00:17:26 and be happy about it.

00:17:27 And I was also comfortable in C at that time.

00:17:30 And MATLAB, you did a little bit of that.

00:17:31 And MATLAB, I did a lot before that, exactly.

00:17:33 So I was comfortable in,

00:17:34 those three languages were really the tools I used

00:17:37 during my studies and schooling.

00:17:40 But to your point about language helping you think,

00:17:42 one of the big things about MATLAB was it was,

00:17:44 and APL before it, I don’t know if you remember APL.

00:17:48 APL is actually the predecessor of array based programming,

00:17:51 which I think is really an underappreciated,

00:17:54 if I talk to people who are just steeped

00:17:55 in computer programming, computer science,

00:17:57 like most of the people that Microsoft has hired

00:17:59 in the past, for example,

00:18:01 Microsoft as a company generally did not understand

00:18:03 array based programming.

00:18:05 Like culturally, they didn’t understand it.

00:18:06 So they kept missing the boat,

00:18:08 kept missing the understanding of what this was.

00:18:11 They’ve gotten better,

00:18:12 but there’s still a whole culture of folks

00:18:14 that doesn’t, programming, that’s systems programming

00:18:17 or web programming or lists and maps.

00:18:20 And what about an n dimensional array?

00:18:22 Oh yeah, that’s just an implementation detail.

00:18:24 Well, you can think that,

00:18:26 but then actually if you have that as a construct,

00:18:28 you actually think differently.

00:18:29 APL was the first language to understand that.

00:18:31 And it was in the sixties, right?

00:18:33 The challenge of APL is APL had very dense,

00:18:36 not only glyphs, like new characters, new glyphs,

00:18:39 but they even had a new keyboard

00:18:40 because to produce those glyphs,

00:18:42 this is back in the early days in computing

00:18:43 when the QWERTY keyboard maybe wasn’t as established,

00:18:47 like, well, we can have a new keyboard, no big deal.

00:18:50 But it was a big deal and it didn’t catch on.

00:18:52 And the language APL, very much like Perl,

00:18:56 as people would pride themselves on how much,

00:18:58 could they write the game of life

00:18:59 in 30 characters of APL.

00:19:03 APL has characters that mean summation

00:19:06 and they have adverbs,

00:19:08 they would have adjectives and these things called adverbs,

00:19:10 which are like methods, like reduction,

00:19:12 reduction would be an adverb on an add operator, right?

00:19:15 So, but doing, using these tools you could construct

00:19:18 and then you start to think at that level,

00:19:20 you think in n dimensions is something I like to say,

00:19:22 and you start to think differently about data at that point.

00:19:25 Now you’re, it really helps.
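
The idea of reduction as an "adverb" that modifies an operator can be sketched in plain Python. The helper name `reduce_with` is hypothetical and only illustrates the concept; NumPy expresses the same idea as `np.add.reduce`.

```python
from functools import reduce
import operator


def reduce_with(op):
    """Turn a two-argument operator into an operation over a whole sequence."""
    def reduced(values):
        return reduce(op, values)
    return reduced


# The same "adverb" applied to two different operators.
sum_all = reduce_with(operator.add)
product_all = reduce_with(operator.mul)

total = sum_all([1, 2, 3, 4])
product = product_all([1, 2, 3, 4])
```

The point is that summation is not a special case: it is the add operator with reduction applied to it, and any other binary operator can be treated the same way.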

00:19:27 Yeah, I mean, outside of programming,

00:19:30 if you really internalize linear algebra as a course,

00:19:33 I mean, it’s philosophically allows you

00:19:35 to think of the world differently.

00:19:37 It’s almost like liberating, you don’t have to,

00:19:39 you don’t have to think about the individual numbers

00:19:42 in the n dimensional array.

00:19:44 You could think of it as an object in itself

00:19:46 and all of a sudden this world can open up.

00:19:48 You’re saying MATLAB and APL were like the early C,

00:19:52 I don’t know if many languages got that right ever.

00:19:54 No, no, no they didn’t.

00:19:56 Even still.

00:19:57 Even still, I would say.

00:19:58 I mean, NumPy is an inheritor of the traditions

00:20:02 of, I would say, APL. J was another version that was,

00:20:06 what it did is not have the glyphs,

00:20:08 just have short characters,

00:20:09 but still a Latin keyboard could type them.

00:20:11 And then Numeric inherited from that

00:20:14 in terms of let’s add arrays plus broadcasting

00:20:17 plus methods, reduction,

00:20:19 even some of the language like rank is a concept

00:20:21 that was in Python and is still in Python

00:20:24 for the number of dimensions, right?

00:20:27 That’s different than say the rank of a matrix

00:20:29 which people think of as well.

00:20:31 So it came from that tradition,

00:20:33 but NumPy is a very pragmatic, practical tool.

00:20:37 NumPy inherited from Numeric

00:20:39 and we can get to where NumPy came from

00:20:40 which is the current array,

00:20:43 at least current as of 2015, 2017.

00:20:46 Now there’s a ton of them over the past two or three years.

00:20:49 We can get into that too.

00:20:50 So if we just linger on the early days

00:20:52 of what was your favorite feature of Python?

00:20:56 Do you remember like what?

00:20:58 So it’s so interesting to linger on like the,

00:21:02 what really makes you connect with a language?

00:21:06 I’m not sure it’s obvious to introspect that.

00:21:09 No, it isn’t.

00:21:10 And I’ve thought about that at some length.

00:21:12 I think definitely the fact that I could read it later,

00:21:16 that I could use it productively

00:21:18 without becoming an expert.

00:21:19 Other languages I had to put more effort into.

00:21:22 That’s like an empirical observation.

00:21:23 Like you’re not analyzing any one aspect of the language.

00:21:26 It just seems time after time when you look back,

00:21:29 it’s somehow readable.

00:21:30 It’s somehow readable.

00:21:31 Then it was sort of, I could take executable English

00:21:35 and translate it to Python more easily.

00:21:36 Like I didn’t have to go, there was no translation layer.

00:21:39 As an engineer or as a scientist,

00:21:41 I could think about what I wanted to do.

00:21:43 And then the syntax wasn’t that far behind it, right?

00:21:46 Now there are some warts there still.

00:21:49 It wasn’t perfect.

00:21:50 Like there’s some areas where I’m like,

00:21:51 ah, it’d be better if this were different

00:21:52 or if this were different.

00:21:54 Some of those things got added to the language too.

00:21:56 I was really grateful for some of the early pioneers

00:21:58 in the Python ecosystem back,

00:22:00 because Python got written in 91.

00:22:01 That’s when the first version came out.

00:22:03 But Guido was very open to users.

00:22:06 And one of the sets of users were people like Jim Hugunin

00:22:08 and David Ascher and Paul Dubois and Konrad Hinsen.

00:22:13 These were people that were on the mailing list.

00:22:15 And they were just asking for things like,

00:22:16 hey, we really should have complex numbers in this language.

00:22:19 So let’s, you know, there’s a j, there’s a 1j, right?

00:22:22 And the fact that they went the engineering route of j

00:22:24 is interesting.

00:22:26 I don’t think that’s entirely favoring engineers.

00:22:28 I think it’s because i is so often used

00:22:30 as the index of a for loop.

00:22:32 So I think that’s actually why.

00:22:34 Probably, I mean, there’s a pragmatic aspect.

00:22:36 But the fact that complex numbers were there, I love that.

00:22:39 The fact that I could write in the array constructs

00:22:41 and that reduction was there,

00:22:42 very simple to write summations and broadcasting was there.

00:22:46 I could do addition of whole arrays.

00:22:49 So that was cool.

00:22:50 Those are some things I loved about it.
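
The features listed here (whole-array addition, broadcasting, reduction, complex numbers) look like this in present-day NumPy, the successor to Numeric. It is a sketch with made-up data that assumes NumPy is installed; it does not show Numeric's original API.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
offsets = np.array([10, 20, 30])    # one value per column

# Addition of whole arrays, with no explicit loop.
doubled = grid + grid

# Broadcasting: the one-dimensional offsets are applied to every row.
shifted = grid + offsets

# Reduction: collapse the row axis by adding, giving one sum per column.
column_sums = np.add.reduce(grid, axis=0)

# Complex numbers use the engineering-style j suffix.
rotated = np.array([1 + 0j, 0 + 1j]) * 1j
```

Each line states the operation on the array as a whole, which is the "thinking in arrays" style the conversation describes.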

00:22:52 I don’t know what to start talking to you about

00:22:54 because you’ve created so many incredible projects

00:22:57 that basically changed the whole landscape of programming.

00:23:00 But okay, let’s start with,

00:23:02 let’s go chronologically with SciPy.

00:23:06 You created SciPy over two decades ago now?

00:23:09 Yes, yes, I love to talk about SciPy.

00:23:11 SciPy was really my baby.

00:23:12 What is it?

00:23:14 What was its goal?

00:23:15 What is its goal?

00:23:16 How does it work?

00:23:17 Yeah, fantastic.

00:23:18 So SciPy was effectively, here I am using Python

00:23:21 to do stuff that I previously used MATLAB to do.

00:23:24 And I was using Numeric, which is an array library

00:23:26 that made a lot of it possible.

00:23:28 But there’s things that were missing.

00:23:29 Like I didn’t have an ordinary differential equation solver

00:23:32 I could just call, right?

00:23:33 I didn’t have integration.

00:23:35 Hey, I wanted to integrate this function.

00:23:37 Okay, well, I don’t have just a function

00:23:38 I can call to do that.

00:23:40 These are things I remember being critical things

00:23:42 that I was missing.

00:23:43 Optimization.

00:23:44 I just wanna pass a function to an optimizer

00:23:46 and have it tell me what the optimal value is.
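
The calling style being asked for, handing a plain function to a library routine, can be sketched in pure Python. The helpers `integrate` and `find_minimum` are hypothetical stand-ins written for this example (Simpson's rule and golden-section search); they are not SciPy's implementation, which lives in `scipy.integrate` and `scipy.optimize`.

```python
import math


def integrate(func, lower, upper, intervals=100):
    """Approximate the integral of func on [lower, upper] with Simpson's rule."""
    if intervals % 2:
        intervals += 1  # Simpson's rule needs an even number of intervals
    step = (upper - lower) / intervals
    total = func(lower) + func(upper)
    for i in range(1, intervals):
        weight = 4 if i % 2 else 2
        total += weight * func(lower + i * step)
    return total * step / 3


def find_minimum(func, lower, upper, tolerance=1e-8):
    """Golden-section search for the minimum of a unimodal func on [lower, upper]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lower, upper
    while b - a > tolerance:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) < func(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2


area = integrate(lambda x: x ** 2, 0.0, 3.0)
best_x = find_minimum(lambda x: (x - 2.0) ** 2, 0.0, 5.0)
```

In both cases the caller supplies only the function and the interval, and the routine handles the numerical method.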

00:23:50 Those are things I’m like, well,

00:23:51 why don’t we just write a library that adds these tools?

00:23:54 And I started to post on the mailing list

00:23:55 and there’d previously been, people have discussed,

00:23:58 I remember Konrad Hinsen saying,

00:23:59 wouldn’t it be great if we had this optimizer library

00:24:00 or David Ascher would say this stuff.

00:24:02 And I’m an ambitious, ambitious is the wrong word,

00:24:06 an eager person, and I probably had more time than sense.

00:24:11 I was a poor graduate student.

00:24:13 My wife thinks I’m working on my PhD and I am,

00:24:15 but part of the PhD that I loved

00:24:17 was the fact that it’s exploratory.

00:24:19 You’re not just taking orders,

00:24:21 fulfilling a list of things to do,

00:24:23 you’re trying to figure out what to do.

00:24:25 And so I thought, well, I’m writing tools

00:24:27 for my own use in a PhD,

00:24:29 so I’ll just start this project.

00:24:32 And so in 99, 98 was when I first started

00:24:34 to write libraries for Python.

00:24:36 Definitely when I fell in love with Python in 98,

00:24:38 I thought, oh, well, there’s just a few things missing.

00:24:39 Like, oh, I need a reader to read DICOM files.

00:24:42 I was in medical imaging and DICOM was a format

00:24:44 that I want to be able to load that into Python.

00:24:46 Okay, how do I write a reader for that?

00:24:48 So I wrote something called, it was an IO package, right?

00:24:51 And that was my very first extension module, which is C.

00:24:55 So I wrote C code to extend Python

00:24:57 so that in Python I could write things more easily.

00:24:59 That combination kind of hooked me.

00:25:02 It was the idea that I could,

00:25:03 here’s this powerful tool I can use as a scripting language

00:25:05 and a high level language to think about,

00:25:07 but that I can extend easily, easily in C,

00:25:11 easily for me because I knew enough C.

00:25:13 And then Guido had written a link.

00:25:15 I mean, the only, the hard part of extending Python

00:25:17 was the way memory management works,

00:25:19 and you have to do reference counting.

00:25:21 And so there’s a tracking of reference counting

00:25:23 you have to do manually.

00:25:25 And if you don’t, you have memory leaks.
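
[Editor's sketch] The reference counts mentioned here can be observed from Python itself. This is a minimal illustration using the standard-library `sys.getrefcount`; it only shows the counter moving as references are created and dropped. In a C extension the same bookkeeping is done by hand with the `Py_INCREF` and `Py_DECREF` macros, and a missed decrement is what produces the leak. The variable names are illustrative.

```python
import sys

data = object()

# getrefcount reports one extra reference for its own argument,
# so only differences between readings are meaningful.
baseline = sys.getrefcount(data)

alias = data                         # a second name bound to the same object
after_alias = sys.getrefcount(data)  # one higher than baseline

del alias                            # dropping the name releases that reference
after_del = sys.getrefcount(data)    # back to baseline

print(baseline, after_alias, after_del)
```

Python adjusts the count automatically on every assignment and deletion; extension code written in C has to make each of those adjustments explicitly.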

00:25:27 And so that’s hard.

00:25:29 Plus then C, you know, it’s just much more,

00:25:31 you have to put more effort into it.

00:25:32 It’s not just, I have to now think about pointers

00:25:34 and I have to think about stuff that is different.

00:25:37 I have to kind of,

00:25:38 you’re like putting a new cartridge in your brain.

00:25:40 Like, okay, I’m thinking about MRI.

00:25:42 Now I’m thinking about programming.

00:25:43 And there are distinct modules

00:25:45 you end up having to think about.

00:25:46 So it’s harder.

00:25:47 And when I was just in Python,

00:25:48 I could just think about MRI and high level writing,

00:25:51 but I could do that.

00:25:52 And that kind of, I liked it.

00:25:54 I found that to be enjoyable and fun.

00:25:55 And so I ended up, oh,

00:25:57 well, let me just add a bunch of stuff to Python

00:25:59 to do integration.

00:26:00 Well, and the cool thing is,

00:26:01 is that the power of the internet,

00:26:03 just looking around and I found,

00:26:04 oh, there’s this Netlib,

00:26:06 which has hundreds of Fortran routines

00:26:08 that people have written in the 60s and the 70s and the 80s

00:26:12 in Fortran 77, fortunately, it wasn’t Fortran 66.

00:26:14 So it had been ported to Fortran 77.

00:26:18 And Fortran 77 is actually a really great language.

00:26:21 Fortran 90 probably is my favorite Fortran

00:26:24 because it’s also, it’s got complex numbers,

00:26:26 got arrays and it’s pretty high level.

00:26:27 Now, the problem with it

00:26:28 is you’d never want to write a program in Fortran 90

00:26:31 or Fortran 77,

00:26:32 but it’s totally fine to write a subroutine in, right?

00:26:34 And so, and then Fortran kind of got a little off course

00:26:37 when they tried to compete with C++.

00:26:39 But at the time,

00:26:40 I just want libraries to do something like,

00:26:42 oh, here’s an ordinary differential equation.

00:26:43 Here’s integration.

00:26:44 Here’s Runge-Kutta integration.

00:26:46 Already done.

00:26:47 I don’t have to think about that algorithm.

00:26:48 I mean, you could,

00:26:49 but it’s nice to have somebody who’s already done one

00:26:51 and tested it.

00:26:51 And so I sort of started this journey in 98, really.

00:26:55 If you look back at the mailing list,

00:26:55 there’s sort of this productive era of me

00:26:59 writing an extension module

00:27:01 to connect Runge-Kutta integration to Python

00:27:04 and making an ordinary differential equation solver.

00:27:06 And then releasing that as a package.

00:27:09 So we could call ODEPACK, I think I called it then.

00:27:11 QUADPACK.

00:27:12 And then I just made these packages.

00:27:14 Eventually that became multipack

00:27:16 because they’re originally modular.

00:27:17 You can install them separately.

00:27:19 But a massive problem in Python

00:27:20 was actually just getting your stuff installed.

00:27:23 At the time, releasing software for me,

00:27:25 like today it’s people think, what does that mean?

00:27:27 Well, then it meant some poorly written webpage.

00:27:30 I had some bad webpage up and I put a tarball,

00:27:33 just a GZIP tarball of source code.

00:27:35 That was the release.

00:27:37 But okay, can we just stay on that?

00:27:39 Because the community aspect

00:27:43 of creating the package and sharing that, that’s rare.

00:27:47 That, to have, to both have the, at that time,

00:27:50 so like the raw.

00:27:51 Yeah, it was pretty early, yeah.

00:27:52 Oh, well, not rare.

00:27:54 Maybe you can correct me on this,

00:27:57 but it seems like in the scientific community,

00:27:59 so many people, you were basically solving the problems

00:28:02 you needed to solve to process the particular application,

00:28:07 the data that you need.

00:28:08 And to also have the mind

00:28:10 that I’m going to make this usable for others, that’s.

00:28:15 I would say I was inspired.

00:28:16 I’d been inspired by Linux,

00:28:18 been inspired by Linus and him making his code available.

00:28:21 And I was starting to use Linux at the time.

00:28:23 And I went, this is cool.

00:28:24 So I’d kind of been previously primed that way.

00:28:27 And generally I was into science

00:28:29 because I liked the sharing notion.

00:28:30 I liked the idea of, hey, let’s,

00:28:32 if collectively we build knowledge and share it,

00:28:34 we can all be better off.

00:28:35 Okay, so you were energized by that idea.

00:28:37 So I was energized by that idea already, right?

00:28:39 And I can’t deny that I was.

00:28:40 I sort of had this very,

00:28:42 I liked that part of science, that part of sharing.

00:28:45 And then all of a sudden, oh, wait, here’s something.

00:28:47 And here’s something I could do.

00:28:49 And then I slowly over years learned how to share better

00:28:52 so that you could actually engage more people faster.

00:28:55 One of the key things was actually giving people a binary

00:28:57 they could install, right?

00:28:58 So that it wasn’t just your source code, good luck.

00:29:01 Compile this and then.

00:29:02 It’s compiled, ready to install, just, you know.

00:29:05 So in fact, a lot of the journey from 98,

00:29:07 even through 2012 when I started Anaconda was about that.

00:29:10 Like it’s why, you know, it’s really the key

00:29:13 as to why a scientist with dreams of doing MRI research

00:29:17 ended up starting a software company

00:29:19 that installs software.

00:29:22 I work with a few folks now that don’t program

00:29:26 like on the creative side and the video side,

00:29:28 the audio side.

00:29:29 And because my whole life is running on scripts,

00:29:32 I have to try to get them,

00:29:34 I have the task of teaching them

00:29:35 how to do Python enough to run the scripts.

00:29:39 And so I’ve been actually facing this,

00:29:40 whether it’s Anaconda or something, with the task of

00:29:44 how do I minimally explain basically to my mom

00:29:46 how to write a Python script.

00:29:48 And it’s an interesting challenge.

00:29:50 I have to, it’s a to do item for me to figure out like,

00:29:53 what is the minimal amount of information I have to teach?

00:29:56 What are the tools you use that one, you enjoy it,

00:29:59 two, you’re effective at it.

00:30:00 And they’re related, those are two related questions.

00:30:02 And then the debugging, like the iterative process

00:30:05 of running the script to figure out what the error is,

00:30:07 maybe even for some people to do the fix yourself.

00:30:11 So do you compile it?

00:30:12 Do you, like how do you distribute that code to them?

00:30:15 And it’s interesting because I think

00:30:18 it’s exactly what you’re talking about.

00:30:20 If you increase the circle of empathy,

00:30:24 the circle of people that are able to use your programs,

00:30:28 you increase its, like, effectiveness and its power.

00:30:32 And so you have to think, can I write scripts?

00:30:37 Can I write programs that can be used by medical engineers,

00:30:40 by all kinds of people that don’t know programming

00:30:43 and actually maybe plant a seed,

00:30:46 have them catch the bug of programming

00:30:48 so that they start on a journey.

00:30:50 That’s a huge responsibility.

00:30:51 And ultimately it has to do with the Amazon one click buy.

00:30:55 Like how frictionless can you make the early steps?

00:30:58 Frictionless is actually really key.

00:31:00 To grow any community, any friction point,

00:31:03 you’re just gonna lose some people, right?

00:31:05 Now sometimes you may wanna intentionally do that.

00:31:09 If you’re early enough on, you need a lot of help.

00:31:11 You need people who have the skills.

00:31:13 You might actually, it’s helpful.

00:31:14 You don’t necessarily have too many users

00:31:16 as opposed to contributors if you’re early on.

00:31:20 Anyway, there’s, SciPy started in 98,

00:31:23 but it really emerged as this collection of modules

00:31:25 that I was just putting on the net.

00:31:27 People were downloading and I think I got 100 users, right?

00:31:31 By the end of that year.

00:31:32 But the fact that I got 100 users and more than that,

00:31:35 people started to email me with fixes.

00:31:39 And that was actually intoxicating, right?

00:31:41 That was the, here I’m writing papers

00:31:44 and I’m giving conferences and I get people to say hello,

00:31:46 but yeah, good job.

00:31:47 But mostly it was, you’re viewed with,

00:31:49 it’s competitive, right?

00:31:51 You publish a paper and people are like,

00:31:52 oh, it wasn’t my paper.

00:31:55 I was starting to see that sense of academic life

00:31:59 where it was so much,

00:32:00 I thought there was this cooperative effort,

00:32:01 but it sounds like we’re here just to one up each other.

00:32:04 And it’s not true across the board,

00:32:07 but a lot of that’s there.

00:32:08 But here in this world,

00:32:09 I was getting responses from people all over the world.

00:32:13 I remember Pearu Peterson in Estonia, right?

00:32:16 Was one of the first people.

00:32:17 And he sent me back this makefile,

00:32:18 cause the first thing it is, yeah, your build thing stinks

00:32:21 and here’s a better makefile.

00:32:23 Now it was a complex makefile.

00:32:24 I don’t think I ever understood that makefile actually,

00:32:26 but it worked and it did a lot more.

00:32:29 And so I said, thanks, this is cool.

00:32:30 And that was my first kind of engagement

00:32:32 with community development.

00:32:35 But the process was, he sent me a patch file.

00:32:37 I had to upload a new tarball.

00:32:39 And I just found, I really love that.

00:32:41 And the style back then was here’s a mailing list.

00:32:43 It’s very, it wasn’t as,

00:32:45 there certainly weren’t the tools that are available today.

00:32:47 It was very early on, but I really started to,

00:32:49 that’s the whole year.

00:32:50 I think I did about seven packages that year, right?

00:32:54 And then by the end of the year,

00:32:55 I collected them into a thing called multipack.

00:32:57 So in 99, there was this thing called multipack.

00:32:59 And that’s when a high school student,

00:33:01 no, he was a high school student at the time,

00:33:03 guy named Robert Kern,

00:33:04 took that package and made a Windows installer, right?

00:33:09 And then of course, a massive increase of usage.

00:33:12 So by the way, most of this development was under Linux.

00:33:15 Yes, yes, it was on Linux.

00:33:17 I was a Linux developer doing it on a Unix box.

00:33:20 I mean, at the time I was actually getting into,

00:33:23 I had a new hard drive,

00:33:24 did some kernel programming to make the hard drive work.

00:33:26 I mean, not programming, but modification to the kernel

00:33:28 so I could actually get a hard drive working.

00:33:31 I love that aspect of it.

00:33:32 I was also in, at school, I was building a cluster.

00:33:36 I took Mac computers and you put Yellow Dog Linux on them.

00:33:40 At the Mayo Clinic, they were just,

00:33:42 they had all these Macs that were older,

00:33:43 they were just getting rid of.

00:33:44 And so I kind of got permission to go grab them together.

00:33:46 I put about 24 of them together in a cluster, in a cabinet,

00:33:50 and put Yellow Dog Linux on them all.

00:33:51 And I wrote a C++ program to do MRI simulation.

00:33:56 That was what I was doing at the same time

00:33:58 for my day job, so to speak.

00:34:01 So I was loving the whole process.

00:34:03 And the same time I was,

00:34:04 oh, I need an ordinary differential equation.

00:34:06 That’s why ordinary differential equations were key

00:34:08 was because that’s the heart of the Bloch equation

00:34:09 for simulating MRI, is an ODE solver.
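
[Editor's sketch] To make the link between MRI simulation and an ODE solver concrete, here is a minimal illustration of integrating the Bloch equations with present-day SciPy. It is not the C++ simulator described in the conversation. It assumes a rotating frame with a constant off-resonance frequency, free precession after an ideal 90-degree pulse, one common sign convention for the precession terms, and illustrative values for `T1`, `T2`, and `DELTA_OMEGA`.

```python
import numpy as np
from scipy.integrate import solve_ivp

T1, T2 = 1.0, 0.1                  # relaxation times in seconds (illustrative)
M0 = 1.0                           # equilibrium magnetization (arbitrary units)
DELTA_OMEGA = 2.0 * np.pi * 10.0   # off-resonance frequency in rad/s (illustrative)


def bloch_rhs(t, m):
    """Right-hand side of the Bloch equations in the rotating frame."""
    mx, my, mz = m
    return [
        DELTA_OMEGA * my - mx / T2,    # precession plus transverse decay
        -DELTA_OMEGA * mx - my / T2,
        (M0 - mz) / T1,                # longitudinal recovery toward M0
    ]


# Start with the magnetization tipped fully into the transverse plane.
t_end = 0.2
sol = solve_ivp(bloch_rhs, (0.0, t_end), [M0, 0.0, 0.0],
                rtol=1e-9, atol=1e-12)

mx, my, mz = sol.y[:, -1]
transverse = np.hypot(mx, my)
print(transverse, mz)
```

Under these assumptions the transverse magnitude should follow `M0 * exp(-t / T2)` and the longitudinal component `M0 * (1 - exp(-t / T1))`, which gives a quick check on the solver output.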

00:34:12 And so that’s, but I actually did that,

00:34:15 it just happened at the same time.

00:34:16 That’s why it was kind of what you’re working on

00:34:18 and what you’re interested in, they’re coinciding.

00:34:20 I was definitely scratching my own itch

00:34:22 in terms of building stuff.

00:34:24 And which helped in the sense that I was using it for me,

00:34:27 so at least I had one user.

00:34:28 I had one person who was like, well, no, this is better.

00:34:30 I like this interface better.

00:34:31 And I had the experience of MATLAB

00:34:33 to guide some of what those APIs might look like.

00:34:36 But you’re just doing yourself,

00:34:37 you’re building all this stuff.

00:34:39 But with the Windows installer,

00:34:40 it was the first time I realized, oh yeah,

00:34:41 the binary installer really helps people.

00:34:43 And so that led to spending more time

00:34:46 on that side of things.

00:34:49 So around 2000, so I graduated my PhD in 2000,

00:34:52 end of year, end of 2000.

00:34:53 So 99 doing a lot of work there,

00:34:56 98 doing a lot of work there,

00:34:57 99 kind of spending more time on my PhD,

00:35:00 helping people use the tools,

00:35:02 thinking about what do I want to go from here.

00:35:04 There was a company, there was a guy actually,

00:35:05 Eric Jones and Travis Vaught.

00:35:07 They were two friends who founded a company called Enthought.

00:35:11 It’s here in Austin, still here.

00:35:13 And they, Eric contacted me at the time

00:35:16 when I was a graduate student still.

00:35:19 And he said, hey, why don’t you come down?

00:35:20 We want to build a company.

00:35:22 We’re thinking of a scientific company

00:35:25 and we want to take what you’re doing

00:35:27 and kind of add it to some stuff that he’d done.

00:35:29 He’d written some tools.

00:35:31 And then Pearu Peterson had done f2py.

00:35:32 Let’s come together and build,

00:35:34 pull this all together and call it SciPy.

00:35:36 So that’s the origin of the SciPy brand.

00:35:39 It came from multipack

00:35:41 and a whole bunch of modules I’d written,

00:35:42 plus a few things from some other folks

00:35:44 and then pulled together in a single installer.

00:35:47 SciPy was really a distribution of Python

00:35:49 masquerading as a library.

00:35:51 How did you think about SciPy in context of Python,

00:35:54 in context of Numeric, like what?

00:35:56 So we saw SciPy as a way to make an R&D environment

00:35:59 for Python, like use Python, depended on Numeric.

00:36:03 So Numeric was the array library we depended on.

00:36:05 And then from there, extend it with a bunch of modules

00:36:08 that allowed for, and at the time,

00:36:10 the original vision of SciPy was to have plotting,

00:36:13 was to have the REPL environment

00:36:16 and kind of really a whole data environment

00:36:19 that you could then install and get going with.

00:36:21 And that was kind of the thinking.

00:36:23 It didn’t really evolve that way, right?

00:36:25 It sort of had a, for one,

00:36:27 it’s really hard to do massive scale projects

00:36:31 with open source collectives.

00:36:34 Actually, there’s sort of an intrinsic cooperation limit

00:36:38 as to which, too many cooks in the kitchen,

00:36:40 you can do amazing infrastructure work.

00:36:42 When it comes down to bringing it all together

00:36:44 into a single deliverable,

00:36:45 that actually requires a little more product management

00:36:49 that is not, that doesn’t really emerge

00:36:52 from the same dynamic.

00:36:53 So it struggled, struggled to get almost too many voices.

00:36:57 It’s hard to have everybody agree.

00:36:59 Consensus doesn’t really work at that scale.

00:37:02 You end up with politics,

00:37:03 with the same kind of things that’s happened

00:37:05 in large organizations trying to decide

00:37:07 what to do together.

00:37:09 So consensus building was challenging at scale

00:37:12 as more people came in, right?

00:37:13 Early on, it’s fine, because there’s nobody there.

00:37:15 So it works, but then as you get more successful

00:37:17 and more people use it, all of a sudden,

00:37:18 oh, there’s this scale at which this doesn’t work anymore

00:37:22 and we have to come up with different approaches.

00:37:23 So SciPy came out officially in 2001,

00:37:26 was the first release, most of the time.

00:37:28 I remember the days of getting that release ready.

00:37:31 It was a Windows installer and there were bugs

00:37:33 on how the Windows compiler handled complex numbers

00:37:36 and you were chasing segmentation faults.

00:37:38 And it was, it’s a lot of work.

00:37:40 There was a lot of effort that had nothing to do

00:37:43 with my area of study.

00:37:45 And at the same time, I had just gotten an offer.

00:37:47 So he wondered if I wanted to come down

00:37:48 and help him start that company with his friend.

00:37:51 And at the time I was like, I was intrigued,

00:37:53 but I was pursuing a path, an academic path.

00:37:56 And I had just got an offer to go and teach at my alma mater.

00:37:59 So I took that tenure track position.

00:38:02 And SciPy, and kind of, then I started to work on SciPy

00:38:05 as a professor too.

00:38:07 So that’s, I left the Mayo Clinic,

00:38:09 graduated, wrote my thesis using SciPy,

00:38:11 wrote, you know, there’s images that were created.

00:38:15 Now the plotting tool I used was something

00:38:17 from Yorick actually.

00:38:18 It was a plotting, a PLT kind of a plotting language

00:38:21 that I used.

00:38:22 Yorick is a programming language?

00:38:23 It was a programming language, had a plotting tool,

00:38:26 DISLIN, it had integration to DISLIN.

00:38:28 I ended up using DISLIN plus some of the plotting

00:38:31 from Yorick linked to from Python.

00:38:33 Anyway, it was, people don’t plot that way now,

00:38:37 but this is before, and SciPy was trying to add plotting.

00:38:40 Yeah. Right?

00:38:41 It didn’t have much success.

00:38:42 Really the success of plotting came from John Hunter,

00:38:45 who had a similar experience to my experience,

00:38:47 my kind of maverick experience as a person

00:38:49 just trying to get stuff done and kind of having more time

00:38:51 than money maybe, right?

00:38:53 And John Hunter created what?

00:38:55 Matplotlib.

00:38:56 He’s the creator of Matplotlib.

00:38:57 Yeah, so John Hunter was, you know,

00:38:59 he wasn’t a student at the time, but he was an,

00:39:00 he was working in the quant field and he said,

00:39:02 we need better plotting.

00:39:03 So he just went out and said, cool, I’ll make a new project

00:39:05 and we’ll call it Matplotlib.

00:39:06 And he released in 2001,

00:39:08 about the same time that SciPy came out

00:39:09 and it was a separate library, separate install,

00:39:12 used Numeric, SciPy used Numeric.

00:39:15 And so SciPy, you know, in 2001, we released SciPy

00:39:18 and then Enthought created a conference called SciPy,

00:39:22 which brought people together to talk about the space.

00:39:25 And that conference is still ongoing.

00:39:26 It’s one of the favorite conferences of a lot of people

00:39:28 because it’s, you know, it’s changed over the years,

00:39:30 but early on it was, you know, a collection of 50 people

00:39:33 who care about, scientists mostly, you know,

00:39:36 practicing scientists who want, who care about coding

00:39:39 and doing it well and not using MATLAB.

00:39:42 And I remember being driven by, you know, I liked MATLAB,

00:39:44 but I didn’t like the fact that,

00:39:46 so I’m not opposed to proprietary software.

00:39:48 I’m actually not an open source zealot.

00:39:50 I love open source for the, what it brings,

00:39:52 but I also see the role for proprietary software.

00:39:54 But what I didn’t like was the fact that I would develop

00:39:56 code and publish it and then effectively telling somebody

00:39:59 here to run my code, you have to have

00:40:01 this proprietary software.

00:40:02 Right, and there’s also culture around MATLAB as much,

00:40:05 because I’ve talked to a few folks in,

00:40:08 MathWorks creates MATLAB?

00:40:09 Yeah.

00:40:10 I mean, there’s just a culture, they try really hard,

00:40:13 but it just, there’s this corporate IBM style culture

00:40:16 that’s like, or whatever.

00:40:18 I don’t want to say negative things about IBM or whatever,

00:40:20 but there’s a…

00:40:22 No, it’s really that connection.

00:40:23 It’s something I’m in the middle of right now

00:40:24 is the business of open source.

00:40:27 And how do you connect the ethos of cooperative development

00:40:30 with the necessity of creating profits, right?

00:40:34 And like right now today, I’m still in the middle of that.

00:40:38 That’s actually the early days of me exploring this question.

00:40:42 Cause I was writing SciPy, I mean, as an aside,

00:40:44 I also had, so I had three kids at the time.

00:40:46 I have six kids now.

00:40:47 I got married early, wanted a family.

00:40:50 I had three kids and I remember reading,

00:40:52 I read Richard Stallman’s post and I was a fan of Stallman.

00:40:55 I would read his work, I liked this collective ideas

00:40:58 he would have.

00:40:58 Certainly the ideas on IP law, I read a lot of his stuff.

00:41:01 But then he said, okay, well,

00:41:04 how do I make money with this?

00:41:05 How do I make a living?

00:41:06 How do I pay for my kids?

00:41:07 All this stuff was in my mind,

00:41:09 young graduate student making no money,

00:41:10 thinking I got to get a job.

00:41:12 And he said, well, I think just be like me

00:41:14 and don’t have kids, right?

00:41:15 That’s just, don’t, don’t.

00:41:17 That’s his take on it.

00:41:18 That was what he said in that moment, right?

00:41:20 That’s the thing I read and I went,

00:41:22 okay, this is a train I can’t get on.

00:41:24 There has to be a way to preserve the culture

00:41:26 of open source and still be able to make sufficient money

00:41:29 to feed your kids.

00:41:30 Yes, exactly, there’s gotta be.

00:41:31 Well, so that actually led me to a study of economics.

00:41:34 Because at the time I was ignorant and I really was.

00:41:36 I’m actually, I’m embarrassed for educational system

00:41:39 that they could let me and I was valedictorian

00:41:41 in my high school class and I did super well in college.

00:41:43 And like academically I did great, right?

00:41:47 But the fact that I could do that and then be clueless

00:41:49 about this key part of life,

00:41:52 it led me to go, there’s a problem.

00:41:54 Like I should have learned this in fifth grade.

00:41:56 I should have learned this in eighth grade.

00:41:58 Like everybody should come out

00:41:59 with a basic knowledge of economics.

00:42:01 You’re an interesting example because you’ve created tools

00:42:04 that change the lives of probably millions of people

00:42:07 and the fact that you don’t understand at the time

00:42:10 of the creation of those tools, the basic economics

00:42:12 of how like to build up a giant system is the problem.

00:42:15 Yeah, it’s a problem.

00:42:16 And so during my PhD at the same time,

00:42:18 this is back in 98, 99 at the same time,

00:42:20 I was in a library, I was reading books on capitalism,

00:42:23 I was reading books on Marxism,

00:42:24 I was reading books on what is this thing?

00:42:27 What does it mean?

00:42:29 And I encountered, basically I encountered a set of writings

00:42:33 from people that said they were the inheritors of Adam Smith.

00:42:35 Read Adam Smith for the first time, right?

00:42:37 Which is The Wealth of Nations

00:42:38 and kind of this notion of emergent societies

00:42:42 and realized, oh, there’s this whole world out here

00:42:45 of people and the challenge of economics is also political.

00:42:49 Like, cause economics, people, different parties

00:42:53 running for office, they want their economic friends.

00:42:58 They want their economists to back them up, right?

00:43:00 Or to be their magicians, like the magicians

00:43:03 in Pharaoh’s court, right?

00:43:04 The people that are kind of say, hey, this is,

00:43:06 you should listen to me because I’ve got the expert

00:43:08 who says this.

00:43:09 And so it gets really muddled, right?

00:43:11 But I was looking at it from as a scientist going,

00:43:14 what is this space?

00:43:14 What does this mean?

00:43:15 How does Paris get fed?

00:43:16 How does, what is money?

00:43:18 How does it work?

00:43:19 And I found a lot of writings that I really loved.

00:43:21 I found some things that I really loved

00:43:22 and I learned from that.

00:43:23 It was writings from people like von Mises.

00:43:26 He wrote a paper in 1920 that still should be read

00:43:29 more than it is.

00:43:29 It was the economic calculation problem

00:43:33 of the socialist commonwealth.

00:43:34 It was basically in response

00:43:35 to the Bolshevik revolution in 1917.

00:43:37 And his basic argument was it’s not gonna work

00:43:40 to not have private property.

00:43:41 You’re not gonna be able to come up with prices.

00:43:43 The bureaucrats aren’t gonna be able to determine

00:43:45 how to allocate resources without a price system.

00:43:47 And a price system emerges from people making trades.

00:43:51 And they can only make trades if they have authority

00:43:53 over the thing they’re trading.

00:43:55 And that creates information flow

00:43:58 that you just don’t have if you try to top down it.

00:44:01 Right.

00:44:02 And it’s like, huh, that’s a really good point.

00:44:04 Yeah, the prices have a signal that’s used.

00:44:06 And it’s important to have that signal

00:44:09 when you’re trying to build a community

00:44:11 of productive people like you would

00:44:12 in the software engineering space.

00:44:13 Yeah, the prices are actually

00:44:14 an important signaling mechanism.

00:44:17 Right, and that money is just a bartering tool.

00:44:20 Right, so this is the first time I’ve encountered

00:44:22 any of this concept, right, and the fact that,

00:44:24 oh, this is actually really critical.

00:44:26 Like it’s so critical to our prosperity

00:44:29 and that we’re dangerously not learning about this,

00:44:34 not teaching our children about this.

00:44:36 So you had the three kids,

00:44:37 you had to make some hard decisions.

00:44:38 I had to make some money, right, had to figure it out.

00:44:39 But I didn’t really care.

00:44:40 I mean, I’ve never been driven by money, just need it.

00:44:43 Yeah, right, need to eat.

00:44:45 So how did that resolve itself in terms of SciPy?

00:44:49 So I would say it didn’t really resolve itself.

00:44:51 It sort of started a journey that I’m continuing on.

00:44:53 I’m still on, I would say.

00:44:54 I don’t think it resolved itself.

00:44:55 But I will say I went in eyes wide open.

00:44:59 Like I knew that there were problems

00:45:00 with giving stuff away and creating the market externalities

00:45:07 that the fact that, yeah, people might use it

00:45:09 and I might not get paid for it

00:45:10 and I’ll have to figure something else out to get paid.

00:45:13 Like at least I can say I’m not bitter

00:45:14 that a lot of people have used stuff that I’ve written

00:45:17 and I haven’t necessarily benefited economically from it.

00:45:20 I’ve heard other people be bitter about that

00:45:22 when they write or they talk.

00:45:23 Like, oh, I should’ve got more value out of this.

00:45:24 And I’m also, I want to create systems

00:45:27 that let people like me who might have these desires

00:45:31 to do things, let them benefit.

00:45:32 So it actually creates more of the same.

00:45:34 Not to turn on your bitterness module,

00:45:36 but there’s some aspect, I wish there was mechanisms for me

00:45:40 to reward whoever created SciPy and NumPy

00:45:43 because it brought so much joy to my life.

00:45:45 I appreciate that.

00:45:46 You know what I mean?

00:45:46 The tip jar notion was there.

00:45:48 I appreciate that.

00:45:49 But there should be a very frictionless mechanism.

00:45:51 There should be a frictionless mechanism.

00:45:52 I totally agree.

00:45:53 I would love to talk about some of the ideas I have

00:45:55 because I actually came across,

00:45:56 I think I’ve come up with some interesting notions

00:45:58 that could work, but they’ll require anything that will work

00:46:01 takes time to emerge, right?

00:46:03 Like things don’t just turn overnight.

00:46:04 That’s definitely one thing I’ve also understood

00:46:06 and learned is any fixes, that’s why it’s kind of funny.

00:46:10 We often give credit to, oh, this president gets elected

00:46:12 and oh, look how great things have done.

00:46:14 And I saw that when I had a transition at Anaconda

00:46:18 when a new CEO came in, right?

00:46:19 And it’s like the success that’s happening,

00:46:22 there’s an inertia there.

00:46:23 Yeah, and sometimes the decision you made

00:46:25 like 10 years before is the reason why the success is the.

00:46:28 Right, exactly.

00:46:29 So we’re sort of just running around taking credit

00:46:31 for stuff.

00:46:32 The credit assignment has like a delay to it

00:46:35 that makes the credit assignment basically wrong

00:46:38 more than right.

00:46:39 Wrong more than right, exactly.

00:46:40 And so I’m like, oh, this is, you know,

00:46:42 that’s the stuff I would read a ton about, you know,

00:46:44 early on.

00:46:45 So I don’t, I feel like I’m with you.

00:46:47 Like I want the same thing.

00:46:48 I want to be able to, and honestly, not for personally,

00:46:50 I’ve been happy.

00:46:51 I’ve been happy.

00:46:52 I feel like I don’t have any, I mean,

00:46:53 we’ve done reasonably okay, but I’ve had to pursue it.

00:46:56 Like that’s really what started my trajectory from academia

00:47:01 is reading that stuff led me to say,

00:47:02 oh, entrepreneurship matters.

00:47:05 So I love software, but we need more entrepreneurs

00:47:09 and I wanna understand that better.

00:47:10 So once I kind of had that virus infect my brain,

00:47:16 even though I was on a trajectory

00:47:17 to go to a tenure track position at a university

00:47:20 and I was there for six years,

00:47:22 I was kind of already out the door when I started.

00:47:26 And we can get into that, but.

00:47:27 Well, can I just ask you a quick question on,

00:47:30 is there some design principles

00:47:32 that were in your mind around SciPy?

00:47:34 Like, is there some key ideas

00:47:36 that were just like sticking to you

00:47:38 that this is the fundamental ideas?

00:47:40 Yeah, I would say so.

00:47:41 I would think it’s basically accessibility to scientists,

00:47:43 like give them, give scientists and engineers tools

00:47:46 that they don’t have to think a lot about programming.

00:47:48 So give them really good building blocks,

00:47:50 give them functions that they wanna call

00:47:51 and sort of just the right length of spelling.

00:47:55 There’s one tradition in programming where it’s like,

00:47:59 make very, very long names, right?

00:48:01 And you can see it in some programming languages

00:48:03 where the names get, take half the screen.

00:48:06 And in the Fortran world, characters had to be six letters

00:48:11 early on, right?

00:48:12 And that’s way too much, too little.

00:48:14 But I was like, I liked to have names

00:48:16 that were informative but short.

00:48:18 So even though Python, well this is a different conversation,

00:48:22 but documentation is doing some work there.

00:48:25 So when you look at great scientific libraries

00:48:29 and functions, there’s a richness of documentation

00:48:32 that helps you get into the details.

00:48:34 The first glance at a function gives you the intuition

00:48:37 of all it needs to do by looking at the headers and so on.

00:48:40 But to get the depths of all the complexities involved,

00:48:43 all the options involved,

00:48:44 documentation does some of the work.

00:48:45 Documentation is essential, yeah.

00:48:47 So that was actually a, so we thought about several things.

00:48:50 One is we wanted plotting.

00:48:51 We wanted interactive environment.

00:48:53 We wanted good documentation.

00:48:54 These are things we knew we wanted.

00:48:56 The reality is those took about 10 years to evolve, right?

00:49:00 Given the fact that we didn’t have a big budget,

00:49:02 it was all volunteer labor.

00:49:03 It was sort of, when Enthought got created

00:49:06 and they started to try to find projects,

00:49:10 people would pay for pieces

00:49:11 and they were able to fund some of it.

00:49:13 Not nearly enough to keep up with what was necessary.

00:49:15 And no criticism, just simply the reality.

00:49:18 I mean, it’s hard to start a business

00:49:21 and then do consulting and then also

00:49:23 promote an open source project that’s still fairly new.

00:49:26 SciPy is fairly niche.

00:49:27 We stayed connected all while I was a student,

00:49:30 sorry, a professor.

00:49:30 I went to BYU and started to teach.

00:49:32 Electrical engineering, all the applied math courses.

00:49:35 I loved teaching signal processing,

00:49:36 probability theory, electromagnetism.

00:49:39 I was, if you look at Rate My Professors,

00:49:40 which my kids loved to do,

00:49:42 I wasn’t, I got some bad reviews because people.

00:49:46 What was the criticism?

00:49:48 I would speak at too high of a level.

00:49:50 Like I definitely had a calibration problem

00:49:52 coming out of graduate work

00:49:54 where I hate to be condescending to people.

00:49:56 Like I really have a ton of respect for people fundamentally.

00:49:59 Like my fundamental thing is I respect people.

00:50:02 Sometimes that can lead to a,

00:50:03 I was thinking they had more knowledge than they did.

00:50:07 And so I would just speak at a very high level,

00:50:10 assume they got it.

00:50:11 But they need to rise to the standard that you set.

00:50:14 I mean, that’s one of the,

00:50:15 some of the greatest teachers do that.

00:50:17 And I agree.

00:50:18 And that was kind of what was inspiring me.

00:50:19 But you also have to,

00:50:22 I cannot say I was as articulate

00:50:24 as some of the greatest teachers, right?

00:50:26 I was, like one classic example,

00:50:28 when I first taught at BYU,

00:50:30 my very first class, it was overheads,

00:50:31 transparencies, overheads.

00:50:34 Before projectors were really that common,

00:50:35 I taught with transparencies.

00:50:37 I’m writing my notes out.

00:50:38 I go in, room’s half dark.

00:50:40 I’m just blaring through these transparencies.

00:50:42 Here it is, here it is, here it is.

00:50:44 And I did give a quiz after two weeks.

00:50:47 No one knew anything.

00:50:48 Nothing I had taught had gotten anywhere.

00:50:50 And I realized, okay, I’m not, this is not working.

00:50:54 So I put away the transparencies

00:50:56 and I turned around and just started using the chalkboard.

00:50:58 And what it did is it slowed me down, right?

00:51:00 The chalkboard just slowed me down

00:51:02 and gave people time to process and to think.

00:51:04 And then that made me focus.

00:51:06 My writing wasn’t great on the chalkboard,

00:51:07 but I really love that part of like the teaching.

00:51:10 So that entered SciPy’s world in terms of,

00:51:12 we always understood that there’s a didactic aspect

00:51:14 of SciPy, kind of how do you take the knowledge

00:51:17 and then produce it?

00:51:18 The challenge we had was the scope.

00:51:21 Like ultimately SciPy was everything, right?

00:51:23 And so 2001, when it first came out,

00:51:25 people were starting to use it.

00:51:26 Oh, this is cool, this is a tool we actually use.

00:51:29 At the same time, 2001 timeframe,

00:51:31 there was a little bit of like the Hubble Space Telescope,

00:51:33 the folks at Hubble that started to say,

00:51:35 hey, Python, we’re gonna use Python

00:51:36 for processing images from Hubble.

00:51:38 And so Perry Greenfield was a good friend

00:51:40 in running that program.

00:51:42 And he had called me before I left Mayo and said,

00:51:45 you know, we wanna do this,

00:51:47 but Numeric actually has some challenges in terms of,

00:51:50 you know, it’s not, the array doesn’t have enough types.

00:51:52 We need more operations.

00:51:54 You know, broadcasting needs to be a little more settled.

00:51:56 They wanted record arrays.

00:51:57 They wanted, you know, record arrays are like a data frame,

00:52:00 but a little bit different,

00:52:02 but they wanted more structured data.
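
[Editor’s sketch, not part of the conversation: the "record array" idea mentioned here survives in modern NumPy as structured arrays. The example below is a minimal illustration under stated assumptions; the field names and observation values are hypothetical and are not taken from the Hubble pipeline.]

```python
import numpy as np

# Hypothetical observation table: each element is one record with named fields.
obs_dtype = np.dtype([("target", "U8"), ("exposure_s", "f8"), ("counts", "i4")])
observations = np.array(
    [("M31", 120.0, 5400), ("M42", 60.0, 9100), ("M51", 300.0, 2500)],
    dtype=obs_dtype,
)

# Indexing by field name gives one "column" across all records.
count_rate = observations["counts"] / observations["exposure_s"]

# Viewing the same data as a recarray exposes the fields as attributes.
rec = observations.view(np.recarray)
brightest = rec.target[np.argmax(count_rate)]
```

[Each record is stored contiguously, which is what makes this "like a data frame, but a little bit different."]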

00:52:03 So he had called me even early on then,

00:52:06 and he said, you know, what,

00:52:06 would you wanna work on something to make this work?

00:52:08 And I said, yeah, I’m interested, but I’m going here,

00:52:10 and I, you know, we’ll see if I have time.

00:52:12 So in the meantime, while I was teaching

00:52:13 and SciPy was emerging, and I had a student,

00:52:15 I was constantly, while I was teaching,

00:52:16 trying to figure a way to fund this stuff.

00:52:18 So I had a graduate student, my only graduate student,

00:52:21 a Chinese fellow, Liu Hongze is his name, great guy.

00:52:26 He wrote a bunch of stuff for iterative linear algebra,

00:52:29 like got into writing some of the iterative

00:52:31 linear algebra tools that are currently there in SciPy,

00:52:34 and they’ve gotten better since,

00:52:36 but this is in 2005, kept working on SciPy,

00:52:39 but Perry has started working on a replacement

00:52:43 to numeric called NumArray.

00:52:45 And in 2004, a package called ndimage,

00:52:49 it was an image processing library

00:52:50 that was written for NumArray,

00:52:53 and it had in it a morphology tool.

00:52:55 I don’t know if you know what morphology is.

00:52:56 It’s opening, dilation, closing, you know,

00:52:58 there was sort of this, as a medical imaging student,

00:53:01 I knew what it was,

00:53:02 because it was used in segmentation a lot.
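
[Editor’s sketch, not part of the conversation: the morphology operations named here are available today in `scipy.ndimage`. The mask below is a made-up toy input, and the 3x3 structuring element is an assumption chosen for illustration.]

```python
import numpy as np
from scipy import ndimage

# Hypothetical 7x7 binary mask: one 3x3 blob plus a single noise pixel.
mask = np.zeros((7, 7), dtype=bool)
mask[2:5, 2:5] = True
mask[0, 6] = True

# A full 3x3 structuring element (the library default is a smaller cross shape).
square = np.ones((3, 3), dtype=bool)

# Opening = erosion followed by dilation: removes specks smaller than the element.
opened = ndimage.binary_opening(mask, structure=square)

# Dilation grows every foreground region by one pixel in all directions.
dilated = ndimage.binary_dilation(mask, structure=square)

# Closing = dilation followed by erosion: fills small gaps and holes.
closed = ndimage.binary_closing(mask, structure=square)
```

[In segmentation work, opening is a common way to drop isolated noise pixels while keeping larger regions intact, which is what happens to the lone pixel in this toy mask.]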

00:53:04 And in fact, I’d wanted to do something like that

00:53:06 in Python, in SciPy, but just had never gotten around to it.

00:53:10 So when it came out, but it worked only on NumArray,

00:53:14 and SciPy needed numeric,

00:53:16 and so we effectively had the beginning of this split.

00:53:20 And numeric and NumArray didn’t share data,

00:53:22 they were just two, so you could have a gigabyte

00:53:24 of NumArray data and a gigabyte of Numeric data,

00:53:26 and they wouldn’t share it.

00:53:27 And so you had these,

00:53:28 then you had these scientific libraries written on top.

00:53:31 I got really bugged by that.

00:53:32 I got really like, oh man, this is not good,

00:53:35 we’re not cooperating now,

00:53:36 we’re sort of redoing each other’s work,

00:53:37 and we’re just this young community.

00:53:40 So that’s what led me, even though I knew it was risky,

00:53:43 because my, you know, I was on a tenure track position,

00:53:47 2004 I got reviewed.

00:53:48 They said, hey, things are going okay,

00:53:49 you’re doing well, papers coming out,

00:53:51 but you’re kind of spending a lot of time

00:53:52 doing this open source stuff, maybe do a little less of that,

00:53:54 and a little more of the paper writing and grant writing,

00:53:57 which was naive, but it was definitely the thinking.

00:54:00 It still goes on.

00:54:01 Still goes on.

00:54:03 You’re basically creating a thing

00:54:05 which enables science in the 21st century.

00:54:08 Right.

00:54:09 Maybe don’t emphasize that so much in your free year tenure.

00:54:11 Right.

00:54:13 It illustrates some of the challenges.

00:54:14 Yes.

00:54:15 It does, and it’s, people mean well.

00:54:18 Yes.

00:54:19 Like, but we’ve gotten broken in a bunch of ways.

00:54:22 Certain things, programming,

00:54:23 understanding the role of software engineering,

00:54:25 programming in society is a little bit lacking.

00:54:27 Exactly.

00:54:28 Now, I was in an electrical engineering position.

00:54:30 Right.

00:54:30 That’s even worse there.

00:54:33 Yeah, it was very, they were very focused,

00:54:34 and so, you know, good people, and I had a great time,

00:54:37 I loved my time, I loved my teaching,

00:54:38 I loved all the things I did there.

00:54:40 The problem was, the split was happening

00:54:42 in this community that I loved, right?

00:54:43 I saw people, and I went, oh my gosh,

00:54:45 this is gonna be, this is not great,

00:54:47 and so I happened, you know, fate,

00:54:50 I had a class I had signed up for,

00:54:52 it’s a, I was trying to build an MRI system,

00:54:54 so I had a kind of a radio, instead of a radio,

00:54:58 a digital radio class, it was a digital MRI class.

00:55:01 And I had people sign up, two people signed up,

00:55:04 then they dropped, and so I had nobody in this class.

00:55:06 So, and I didn’t have any other courses to teach,

00:55:08 and I thought, oh, I’ve got some time,

00:55:10 and I’ll just write, I’ll just write a replace,

00:55:13 a merger of Numeric and NumArray.

00:55:14 Like, I’ll basically take the Numeric code base,

00:55:16 add the features NumArray was adding,

00:55:19 and then kind of come up with a single array library

00:55:21 that everybody can use.

00:55:22 So that’s where NumPy came from,

00:55:24 was my thinking, hey, I can do this,

00:55:26 and who else is going to?

00:55:27 Because at that point, I’d been around the community

00:55:29 long enough, and I’d written enough C code,

00:55:30 I knew, I knew the structures, and I,

00:55:33 in fact, my first contribution to Numeric

00:55:35 had been writing the C API documentation

00:55:38 that went in the first documentation for NumPy,

00:55:41 for Numeric, sorry, this is Paul Dubois,

00:55:43 David Ascher, Konrad Hinsen, and myself.

00:55:45 I got credit because I wrote this chapter,

00:55:47 which is all the C API of Numeric, all the C stuff.

00:55:51 So I said, I’m probably the one to do it,

00:55:53 and nobody else is gonna do this.

00:55:54 So it was sort of, out of a sense of duty and passion,

00:55:58 knowing that, eh, I don’t think my academic,

00:56:01 I don’t think the department here is gonna appreciate this,

00:56:03 but it’s the right thing to do.

00:56:06 It was like.

00:56:06 Can we just linger on that moment?

00:56:08 Yeah, yeah.

00:56:09 Because the importance of the way you thought

00:56:11 and the action you took, I feel is understated

00:56:16 and is rare and I would love to see so much more of it

00:56:19 because what happens as the tools become more popular,

00:56:24 there’s a split that happens.

00:56:27 And it’s a truly heroic and impactful action

00:56:30 to in those early, in that early split,

00:56:33 to step up and it’s like great leaders throughout history,

00:56:37 like, what is it, Braveheart,

00:56:39 like get on a horse and rally the troops

00:56:42 because I think that can have, make a big difference.

00:56:46 We have TensorFlow versus PyTorch

00:56:48 in the machine learning community.

00:56:49 We have the same problem today.

00:56:50 Yeah, I wonder.

00:56:51 It’s actually bigger.

00:56:52 I wonder if it’s possible in the early days

00:56:56 to rally the troops.

00:56:58 It is possible, especially in the early days.

00:57:00 The longer it goes, the harder, right?

00:57:01 The more energy in the factions, the harder.

00:57:03 But in the early days, it is possible

00:57:05 and it’s extremely helpful

00:57:07 and there’s a willingness there,

00:57:09 but the challenge is there’s just not a willingness

00:57:11 to fund it.

00:57:12 There’s not a willingness to, you know,

00:57:14 like I was literally walking into a field

00:57:17 saying I’m going to do this

00:57:18 and here I am, like, you know,

00:57:20 I have five kids at home now.

00:57:23 Pressure builds.

00:57:24 Sometimes my wife hears these stories

00:57:26 and she’s like, you did what?

00:57:29 I thought we were going to,

00:57:29 I thought you were actually on a path

00:57:31 to make sure we had resources and money, but,

00:57:34 but again, there’s a, there’s an aspect,

00:57:36 I’m a very hopeful person.

00:57:37 I’m an optimistic person by nature.

00:57:39 I love people.

00:57:41 I learned that about myself later on.

00:57:43 And part of my, my religious beliefs

00:57:47 actually lead to that.

00:57:48 And it’s why I hold them dear

00:57:49 because it’s actually how I feel about,

00:57:51 that’s what leads me to these attitudes,

00:57:53 sort of this hopefulness and this sense of,

00:57:55 yeah, it may not work out for me financially

00:57:58 or maybe, but that’s not the ultimate gain.

00:58:00 Like that’s a thing, but it’s not,

00:58:02 that’s not the scorecard for me.

00:58:05 And so I just wanted to be helpful

00:58:07 and I knew, and partly because these SciPy conferences,

00:58:09 because of the mailing list conversations,

00:58:10 I knew there was a lot of need for this, right?

00:58:13 And so I had this, it wasn’t like I was alone

00:58:15 in terms of no feedback.

00:58:16 I had these people who knew, but it was crazy.

00:58:19 Like people who at the time said,

00:58:20 yeah, we didn’t think you’d be able to do it.

00:58:22 We thought it was crazy.

00:58:23 And also instructive, like practically speaking,

00:58:26 that you had a cool feature

00:58:28 that you were chasing the morphology, like the.

00:58:30 Yes.

00:58:31 Like it’s not just like.

00:58:32 There’s an end result.

00:58:33 It’s not some visionary thing.

00:58:35 I’m going to unite the community.

00:58:36 You were like. Correct.

00:58:38 You were actually practically,

00:58:39 this is what one person actually could do

00:58:42 and actually build.

00:58:43 Cause that is important.

00:58:44 Cause you can get over your skis.

00:58:47 You can definitely get over your skis.

00:58:49 And I had, in fact, this almost got me over my skis, right?

00:58:52 I would say, well, in retrospect, I hate looking back.

00:58:56 I can tell you all the flaws with NumPy, right?

00:58:58 When I go into it, there’s lots of stuff that I’m like,

00:59:00 oh man, that’s embarrassing.

00:59:01 That was wrong.

00:59:02 I wish I had somebody stop me with a wet fish there.

00:59:04 Like I needed, like what I’d wished I’d had

00:59:07 was somebody with more experience, and certainly in library

00:59:10 writing and array libraries.

00:59:11 There’s like, I wish I had me.

00:59:12 I could go back in time and go do this, do that.

00:59:14 There’s a more important thing.

00:59:15 Cause there’s things we did that are still there

00:59:18 that are problematic, that created challenges for later.

00:59:20 And I didn’t know it at the time.

00:59:22 Didn’t understand how important that was.

00:59:24 And in many cases, didn’t know what to do.

00:59:26 Like there was pieces of the design of NumPy.

00:59:29 I didn’t know what to do until five years ago.

00:59:31 Now I know what they should have been.

00:59:32 But I didn’t know at the time and nobody,

00:59:33 and I couldn’t get the help.

00:59:35 Anyway, so I wrote it.

00:59:36 It took about, it took four months to write

00:59:38 the first version, then about 14 months to make it usable.

00:59:43 But it was, it wasn’t, it was that first four months

00:59:45 of intense writing, coding, getting something out the door

00:59:49 that worked that was, it was, it was definitely challenging.

00:59:52 And then the big thing I did was create a new type object

00:59:54 called dtype.

00:59:56 That was probably the contribution.

00:59:58 And then the fact that I added broad, not just broadcasting,

01:00:01 but advanced indexing so that you could do masked indexing

01:00:06 and indirect indexing instead of just slicing.
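
[Editor’s sketch, not part of the conversation: the four features named here (the dtype object, broadcasting, masked indexing, and indirect indexing) can be seen together in a few lines of present-day NumPy. The array values are arbitrary and chosen only for illustration.]

```python
import numpy as np

# The dtype object describes how each element is laid out in memory.
a = np.arange(12, dtype=np.float32).reshape(3, 4)
element_type = a.dtype

# Broadcasting: a (4,) array of column means is stretched across a (3, 4) array.
col_means = a.mean(axis=0)
centered = a - col_means

# Masked (boolean) indexing: keep only the elements where the mask is True.
positives = centered[centered > 0]

# Indirect (integer-array) indexing: pick rows in any order by index.
rows = np.array([2, 0])
picked = a[rows]

# Plain slicing returns a view of the same memory; advanced indexing copies.
slice_is_view = np.shares_memory(a[0:2], a)
picked_is_view = np.shares_memory(picked, a)
```

[The view-versus-copy distinction on the last two lines is the practical difference between slicing and the advanced indexing described above.]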

01:00:09 So for people who don’t know, and maybe you can elaborate,

01:00:13 NumPy, I guess the vision in the narrowest sense

01:00:17 is to have this object that represents

01:00:21 n-dimensional arrays.

01:00:23 And like at any level of abstraction you want,

01:00:26 but basically it could be a black box

01:00:28 that you can investigate in ways that you would naturally

01:00:30 want to investigate such objects.

01:00:33 Yes, exactly.

01:00:34 So you could do math on it easily.

01:00:35 Math on it easily, yeah.

01:00:37 So it had an associated library of math operations

01:00:39 and effectively SciPy became an even larger set

01:00:43 of math operations.
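
[Editor’s sketch, not part of the conversation: one way to picture the split described here is that NumPy supplies the array object plus elementwise math, while SciPy layers larger numerical routines on top of the same arrays. The small linear system below is a made-up example.]

```python
import numpy as np
from scipy import linalg

# A hypothetical 2x2 linear system A @ x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

# NumPy: elementwise arithmetic and reductions on the array object itself.
scaled = 2.0 * A + 1.0
row_norms = np.sqrt((A ** 2).sum(axis=1))

# SciPy: a higher-level routine that consumes the same NumPy arrays.
x = linalg.solve(A, b)
residual = A @ x - b
```

[Because both libraries share one array type, no data conversion happens between the NumPy and SciPy steps.]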

01:00:44 So the key for me was I was going to write NumPy

01:00:48 and then move SciPy to depend on NumPy.

01:00:50 In fact, early on, one of the initial proposals

01:00:52 was that we would just write SciPy

01:00:54 and it would have the numeric object inside of it.

01:00:56 And it’d be scipy.array or something.

01:00:59 That turned out to be problematic because numeric

01:01:02 already had a little mini library of linear algebra

01:01:04 and some functions, and it had enough momentum,

01:01:08 enough users that nobody wanted to,

01:01:10 they wanted backward compatibility.

01:01:12 One of the big challenges of NumPy

01:01:13 was I had to be backward compatible

01:01:14 with both numeric and NumArray

01:01:16 in order to allow both of those communities to come together.

01:01:19 There was a ton of work in creating

01:01:21 that backward compatibility

01:01:22 that also created echoes in today’s object.

01:01:25 Like some of the complexity in today’s object

01:01:27 is actually from that goal of backward compatibility

01:01:30 to these other communities,

01:01:31 which if you didn’t have that, you’d do something different,

01:01:34 which is instructive because a lot of things are there.

01:01:37 You think, what is that there for?

01:01:38 It’s like, well, it’s a remnant.

01:01:41 It’s an artifact of its historical existence.

01:01:45 By the way, I love the empathy

01:01:46 and the lack of ego behind that

01:01:48 because I feel, you see that in the split

01:01:51 in the JavaScript framework, for example,

01:01:53 the arbitrary branching.

01:01:54 Right.

01:01:56 I think in order to unite people,

01:01:59 you have to kind of put your ego aside

01:02:00 and truly listen to others.

01:02:02 You do.

01:02:03 What do you love about NumArray?

01:02:04 What do you love about Numeric?

01:02:06 Like actually get a sense,

01:02:07 we were talking about languages earlier,

01:02:08 sort of empathize with the culture,

01:02:11 the people that love something about this particular API,

01:02:14 some of the naming style

01:02:18 or the actual usage patterns

01:02:21 and truly understand them

01:02:22 and so that you can create that same draw

01:02:26 in the united thing. I completely agree.

01:02:28 I completely agree.

01:02:29 And you have to also have enough passion

01:02:31 that you’ll do it.

01:02:32 It can’t be just like a perfunctory,

01:02:34 oh yes, I’ll listen to you

01:02:36 and then I’m not really that excited about it.

01:02:38 So it really is an aspect,

01:02:39 it’s a philosophical, like there’s a philia,

01:02:42 there’s a love of esteeming of others.

01:02:44 It’s actually at the heart of what,

01:02:47 it’s sort of a life philosophy for me, right?

01:02:49 That I’m constantly pursuing and that helped,

01:02:51 absolutely helped.

01:02:52 Makes me wonder in a philosophical,

01:02:54 like looking at human civilization as one object,

01:02:57 it makes me wonder how we can copy and paste Travises

01:02:59 in this book.

01:03:00 Well, some aspects, maybe.

01:03:03 Some aspects, right, right, exactly.

01:03:05 Well, it’s a good question.

01:03:07 How do we teach this?

01:03:08 How do we encourage it?

01:03:09 How do we lift it?

01:03:10 Because so much of the software world,

01:03:12 it’s giant communities, right?

01:03:15 But it seems like so much is moved by,

01:03:16 like little individuals.

01:03:18 You talk about like Linus Torvalds.

01:03:21 It’s like, could you have not,

01:03:23 could you have had Linux without him?

01:03:25 Could you?

01:03:26 Yeah, Guido and Python.

01:03:28 Guido and Python.

01:03:28 Guido and Python.

01:03:29 Well, the SciPy community particularly,

01:03:30 it’s like I said, we wanted to build this big thing,

01:03:32 but ultimately we didn’t.

01:03:33 What happened is we had mavericks and champions

01:03:36 like John Hunter who created Matplotlib.

01:03:37 We had Fernando Perez who created IPython.

01:03:39 And so we sort of inspired each other,

01:03:42 but then it kind of, there’s sort of a culture

01:03:43 of this selfless giving, the stewardship mentality,

01:03:47 as opposed to ownership mentality,

01:03:49 but stewardship and community focused,

01:03:54 community focused, but intentional work.

01:03:56 Like not waiting for everybody else to do the work,

01:03:58 but you’re doing it for the benefit of others

01:04:00 and not worried about what you’re gonna get.

01:04:04 You’re not worried about the credit.

01:04:04 You’re not worried about what you’re gonna get.

01:04:05 You’re worried about, I later realized

01:04:07 that I have to worry a little about credit,

01:04:09 not because I want the credit,

01:04:10 because I want people to understand

01:04:11 what led to the results.

01:04:13 Like, I don’t, it’s not about me.

01:04:15 It’s I want to understand this is what led to the result.

01:04:17 So let’s like, I think doing,

01:04:18 and this is what had no impact on the result.

01:04:21 Like let’s promote, just like you said,

01:04:23 I want to promote the attributes

01:04:25 that help make us better off.

01:04:26 How do we make more of Wes McKinney?

01:04:28 Like Wes McKinney was critical to the success of Python

01:04:31 because of his creation of pandas,

01:04:33 the roots of which were all the way back

01:04:36 in Numeric and NumArray and NumPy,

01:04:40 where NumPy created an array of records.

01:04:43 Wes started to use that almost like a data frame,

01:04:45 except it’s an array of records.

01:04:47 And data frame, the challenge is,

01:04:49 okay, if you want to augment it with another column,

01:04:52 you have to insert, you have to do all this memory movement

01:04:54 to insert a column.

01:04:55 Whereas data frames became,

01:04:57 oh, I’m going to have a loose collection of arrays.

01:05:00 So it’s a record of arrays that is a part of a data frame.
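
[Editor’s sketch, not part of the conversation: the contrast drawn here between an "array of records" and a "record of arrays" can be illustrated with plain NumPy. The dict of arrays below is only a stand-in for the columnar idea; it is not how pandas is implemented, and the field names and values are hypothetical.]

```python
import numpy as np

# Array of records: every row stores all of its fields side by side.
records = np.array([(1, 2.0), (2, 3.5)], dtype=[("id", "i4"), ("price", "f8")])

# Adding a column means a wider dtype, a new buffer, and a copy of every field.
wider = np.empty(records.shape, dtype=records.dtype.descr + [("qty", "i4")])
for name in records.dtype.names:
    wider[name] = records[name]
wider["qty"] = [10, 20]

# Record of arrays: a loose collection of independent column arrays.
columns = {
    "id": np.array([1, 2], dtype="i4"),
    "price": np.array([2.0, 3.5]),
}
price_before = columns["price"]

# Adding a column is just one more entry; existing columns are not moved.
columns["qty"] = np.array([10, 20], dtype="i4")
```

[The second layout trades the tight per-row packing for cheap column insertion, which is the memory-movement point made above.]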

01:05:03 And we thought about that back in the memory days,

01:05:05 but West ended up doing the work to build it.

01:05:08 And then also the operations that were relevant

01:05:11 for data processing.

01:05:12 What I noticed is just that each of these little things

01:05:15 creates just another tick, another up.

01:05:17 So NumPy ultimately took a little while,

01:05:19 about six months in, people started to join me,

01:05:22 Francesc Alted, Robert Kern, Charles Harris.

01:05:27 And these people are many of the unsung heroes, I would say.

01:05:30 People who are, you know,

01:05:31 they sometimes don’t get the credit they deserve

01:05:34 because they were critical both to support,

01:05:36 like, you know, it’s hard and you want,

01:05:38 you need some support, people need support.

01:05:40 And I needed just encouragement.

01:05:41 And they were helping and encouraged by contributing.

01:05:43 And once, the big thing for me was when John Hunter,

01:05:48 he had previously done kind of a simple thing

01:05:50 called numerix to kind of, you know, between Numeric

01:05:52 and NumArray, he had a little high-level tool

01:05:55 that would just select each one for Matplotlib.

01:05:57 In 2006, he finally said,

01:06:00 we’re gonna just make NumPy the dependency of Matplotlib.

01:06:03 As soon as he did that,

01:06:04 and I remember specifically when he did that,

01:06:06 I said, okay, we’ve done it.

01:06:07 Like, that was when I knew we had seen success.

01:06:11 Before then it was still unsure,

01:06:13 but that kind of started a roller coaster.

01:06:15 And then 2006 to 2009.

01:06:17 And then I’ve been floored by what it’s done.

01:06:20 Like, I knew it would help.

01:06:22 I had no idea how much it would help.

01:06:25 Right, so.

01:06:26 And it has to do with, again, the language thing.

01:06:28 It just, people started to think in terms of NumPy.

01:06:31 Yes.

01:06:32 And that opened up a whole new way of thinking.

01:06:36 And part of the story that you kind of mentioned,

01:06:39 but maybe you can elaborate,

01:06:42 is it seems like at some point in the story,

01:06:46 Python took over science and data science.

01:06:50 Yes.

01:06:51 And bigger than that,

01:06:54 the scientific community started to think like programmers

01:07:00 or started to utilize the tools of computers to do,

01:07:04 like at a scale that wasn’t done with Fortran.

01:07:06 Like at this gigantic scale,

01:07:09 they started to open in their heart.

01:07:10 And then Python was the thing.

01:07:12 I mean, there’s a few other competitors, I guess,

01:07:14 but Python, I think, really, really took over.

01:07:16 I agree.

01:07:17 There’s a lot of stories here

01:07:18 that are kind of during this journey,

01:07:19 because this is sort of the start of this journey in 2005, 2006.

01:07:23 So my tenure committee, I applied for tenure in 2006, 2007.

01:07:28 It came back, I split the department.

01:07:29 I was very polarizing.

01:07:31 I had some huge fans

01:07:32 and then some people that said no way, right?

01:07:34 So it was very, I was a polarizing figure in the department.

01:07:36 It went all the way up to the university president.

01:07:39 Ultimately, my department chair had the sway

01:07:42 and they didn’t say no.

01:07:43 They said, come back in two years and do it again.

01:07:46 And I went, eh, at that point, I was like,

01:07:49 I mean, I had this interest in entrepreneurship,

01:07:52 this interest in not the academic circles,

01:07:56 not the, like, how do we make industry work?

01:07:59 So I do have to give credit to that exploration of economics

01:08:03 because that led me, oh, I had a lot of opinions.

01:08:06 I was actually very libertarian at the time.

01:08:09 And I still have some libertarian trends,

01:08:11 but I’m more of a, I’m more of a collectivist libertarian.

01:08:15 So you value broadly, philosophically freedom.

01:08:18 I value broadly, philosophically freedom,

01:08:20 but I also understand the power of communities,

01:08:23 like the power of collective behavior.

01:08:26 And so what’s that balance, right?

01:08:27 That makes sense.

01:08:29 So by the time I was just,

01:08:31 I gotta go out and explore this entrepreneur world.

01:08:33 So I left academia.

01:08:34 I said, no thanks, called my friend, Eric, here,

01:08:37 who had, his company was going.

01:08:39 I said, hey, could I join you and start this trend?

01:08:43 And he, at that time they were using SciPy a lot.

01:08:45 They were trying to get clients.

01:08:47 And so I came down to Texas.

01:08:48 And in Texas is where I sort of,

01:08:51 it’s my entrepreneur world, right?

01:08:53 I left academia and went to entrepreneur world in 2007.

01:08:57 So I moved here in 2007, kind of took a leap,

01:08:59 knew nothing really about business,

01:09:01 knew nothing about a lot of stuff there.

01:09:05 There’s, you know, for a long time,

01:09:06 I’ve kept some connections to a lot of academics

01:09:08 because I still value it.

01:09:10 I still love the scientific tradition.

01:09:12 I still value the essence and the soul and the heart

01:09:15 of what is possible.

01:09:17 Don’t like a lot of the administration

01:09:21 and the kind of, we can go into detail about why

01:09:24 and where and how this happens,

01:09:25 what are some of the challenges.

01:09:26 I don’t know, but I’m with you.

01:09:28 So I’m still affiliated with MIT.

01:09:31 I still love MIT because there’s magic there.

01:09:35 There’s people I talk to, like researchers, faculty,

01:09:40 in those conversations and the whiteboard

01:09:43 and just the conversation, that’s magic there.

01:09:46 All the other stuff, the administration,

01:09:48 all that kind of stuff seems to,

01:09:52 you don’t wanna say too harshly criticize

01:09:54 sort of bureaucracies, but there’s a lag

01:09:57 that seems to get in the way of the magic.

01:10:00 And I still have a lot of hope

01:10:03 that that can change because I don’t often see

01:10:08 that particular type of magic elsewhere in the industry.

01:10:12 So like we need that and we need that flame going.

01:10:15 And it’s the same thing as exactly as you said,

01:10:19 it has the same kind of elements

01:10:20 like the open source community does.

01:10:23 And, but then if you, like the reason I stepped away,

01:10:27 the reason I’m here, just like you did in Austin is like,

01:10:30 if I wanna build one robot, I’ll stay at MIT.

01:10:33 But if I wanna build millions and make money enough

01:10:37 to where I can explore the magic of that, then you can’t.

01:10:41 And I think that dance is…

01:10:44 That translational dance has been lost a bit, right?

01:10:47 And there’s a lot of reasons for that.

01:10:48 I’m not, I’m certainly not an expert on this stuff.

01:10:50 I can opine like anybody else,

01:10:51 but I realized that I wanted to explore entrepreneurship,

01:10:55 which I, and really figure out,

01:10:57 and it’s been a driving passion for 20 years, 25 years.

01:11:01 How do we connect capital markets and company?

01:11:06 Cause again, I fell in love with the notion of,

01:11:07 oh, profit seeking on its own is not a bad thing.

01:11:11 It’s actually a coordination mechanism

01:11:13 for allocating resources that, you know,

01:11:16 in an emergent way, right?

01:11:18 That respects everybody’s opinions, right?

01:11:20 So this is actually powerful.

01:11:21 So I say all the time, when I make a company

01:11:25 and we do something that makes profit,

01:11:27 what we’re saying is, hey,

01:11:28 we’re collecting of the world’s resources

01:11:29 and voluntarily people are asking us

01:11:31 to do something that they like.

01:11:33 And that’s a huge deal.

01:11:34 And so I really liked that energy.

01:11:36 So that’s what I came to do and to learn

01:11:37 and to try to figure out.

01:11:38 And that’s what I’ve been kind of stumbling through

01:11:40 for the past 14 years.

01:11:40 And that’s 2007.

01:11:42 2007, yeah.

01:11:43 And so you were still working on NumPy.

01:11:44 So NumPy was just emerging.

01:11:46 Just emerging.

01:11:47 One of the things I’ve done,

01:11:49 it’s worth mentioning because it emphasizes

01:11:51 the exploratory nature of my thinking at the time.

01:11:53 I said, well, I don’t know how to fund this thing.

01:11:55 I’ve got a graduate student I’m paying for

01:11:56 and I’ve got no funding for him.

01:11:57 And I had done some fundraising from the public

01:12:00 to try to get public fundraisers in my lab.

01:12:02 I didn’t really wanna go out

01:12:03 and just do the fundraising circuit

01:12:05 the way it’s traditionally done.

01:12:06 So I wrote a book and I said, I’m gonna write a book

01:12:09 and I’m gonna charge for it.

01:12:11 It was called Guide to NumPy.

01:12:12 And so ultimately NumPy became

01:12:14 documentation driven development

01:12:15 because I basically wrote the book

01:12:17 and made sure the stuff worked or the book wouldn’t work.

01:12:19 So it really helped actually make NumPy become a thing.

01:12:23 So writing that book,

01:12:25 and it’s not a page turner.

01:12:28 Guide to NumPy is not a book you pick up

01:12:29 and go, oh, this is great, over the fire.

01:12:31 But it’s where you could find the details,

01:12:33 like how’d all this work.

01:12:34 And a lot of people love that book.

01:12:36 And so a lot of people ended up,

01:12:38 so I said, look, I need to, so I’m gonna charge for it.

01:12:41 And I got some flack for that.

01:12:42 Not that much, just probably five angry messages,

01:12:45 people yelling at me saying I was a bad guy

01:12:49 for charging for this book.

01:12:51 Was one of them Richard Stallman?

01:12:53 No. Just kidding.

01:12:54 No, I haven’t really had any interaction with him personally,

01:12:56 like I said, but there were a few,

01:12:59 but actually surprisingly not.

01:13:01 There was actually a lot of people like,

01:13:02 no, it’s fine, you can charge for a book.

01:13:04 That’s no big deal.

01:13:05 We know that’s a way you can try to make money

01:13:07 around open source.

01:13:07 So what I did, I did it in an interesting way.

01:13:10 I said, well, kind of my ideas around IP law and stuff.

01:13:14 I love the idea you can share something, you can spread it.

01:13:16 Like once it’s, the fact that you have a thing

01:13:18 and copying is free, but the creation is not free.

01:13:21 So how do you fund the creation and allow the copying?

01:13:25 And in software, it’s a little more complicated than that

01:13:27 because creation is actually a continuous thing.

01:13:29 It’s not like you build a widget and it’s done.

01:13:31 It’s sort of a process of emerging

01:13:32 and continuing to create.

01:13:34 But I wrote the book

01:13:35 and had this market determined price thing.

01:13:37 I said, look, I need, I think I said 250,000.

01:13:40 If I make 250,000 from this book, I’ll make it free.

01:13:44 So as soon as I get that much money,

01:13:45 or I said five years, so there’s a time limit.

01:13:48 Like it’s not forever.

01:13:49 That’s really cool.

01:13:50 It’s amazing.

01:13:51 I released it on this.

01:13:53 And it’s actually interesting

01:13:54 because one of the people

01:13:55 who also thought that was interesting

01:13:57 ended up being Chris White,

01:13:58 who was the director of DARPA project

01:14:01 that we got funding through at Anaconda.

01:14:02 And the reason he even called us back

01:14:04 is because he remembered my name from this book

01:14:06 and he thought that was interesting.

01:14:08 And so even though we hadn’t gone to the demo days,

01:14:10 we applied and the people said, yeah,

01:14:12 nobody ever gets this without coming to the demo day first.

01:14:15 This is the first time I’ve seen it.

01:14:16 But it’s because I knew, you know,

01:14:18 Chris had done this and had this interaction.

01:14:19 So it did have impact.

01:14:21 I was actually really, really pleased by the result.

01:14:23 I mean, I ended up in three years, I made 90,000.

01:14:27 So sold 30,000 copies by myself.

01:14:29 I just put it up on, you know, use PayPal and sold it.

01:14:33 And that was my first taste of kind of, okay,

01:14:36 this can work to some degree.

01:14:37 And I, you know, all over the world, right?

01:14:40 From Germany to Japan to, it was actually, it did work.

01:14:44 And so I appreciated the fact that PayPal existed

01:14:47 and I had a way to get the money, the distribution was simple.

01:14:51 This is pre Amazon book stuff.

01:14:53 So it was just publishing a website.

01:14:55 It was the popularity of SciPy emerging

01:14:57 and getting company usage.

01:14:58 I ended up not letting it go the five years

01:15:00 and not trying to make the full amount

01:15:01 because, you know, a year and a half later,

01:15:04 I was at Enthought.

01:15:05 I had left academia, I was at Enthought

01:15:06 and I kind of had a full time job.

01:15:07 And then actually what happened is the documentation people,

01:15:10 there’s a group that said, hey,

01:15:10 we want to do documentation for SciPy as a collective.

01:15:14 And they’re essentially needing the stuff in the book, right?

01:15:18 And so they kind of ask,

01:15:20 hey, could we just use the stuff in your book?

01:15:21 And at that point I said, yeah, I’ll just open it up.

01:15:24 So that’s, but it has served its purpose.

01:15:27 And the money that I made actually funded my grad student.

01:15:31 Like it was actually, you know,

01:15:32 I paid him 25,000 a year out of that money.

01:15:35 So the funny thing is if you do a very similar

01:15:37 kind of experiment now with NumPy or something like it,

01:15:40 you could probably make a lot more.

01:15:42 It’s probably true.

01:15:43 Because of the tooling and the community building.

01:15:46 Yeah, I agree.

01:15:47 Like the, and social media,

01:15:48 that there’s just a virality to that kind of idea.

01:15:51 I agree.

01:15:52 There’d be things to do.

01:15:53 I’ve thought about that.

01:15:54 And really I thought about a couple of books

01:15:56 or a couple of things that could be done there.

01:15:57 And I just haven’t, right?

01:15:58 Even, I tried to hire a ghostwriter this year too

01:16:01 to see if that could help, but it didn’t.

01:16:04 But part of my problem is this,

01:16:06 I’ve been so excited by a number of things

01:16:08 that have stemmed from that.

01:16:09 Like, so I came here, worked at Enthought for four years,

01:16:13 graciously, Eric made me president.

01:16:14 Then we started to work closely together.

01:16:16 We actually helped him buy out his partner.

01:16:19 It didn’t end great.

01:16:20 Like unfortunately Eric and I aren’t real,

01:16:22 aren’t friends now.

01:16:24 I still respect him.

01:16:25 I have a lot, I wish we were,

01:16:26 but he didn’t like the fact that Peter and I

01:16:30 started Anaconda, right?

01:16:31 That was not, I mean, so there’s two sides to that story.

01:16:36 So I’m not gonna go into it, right?

01:16:37 Sure.

01:16:38 But you, as human beings

01:16:40 and you wish you still could be friends.

01:16:42 I do, I do.

01:16:43 It saddens me.

01:16:45 I mean, that’s a story of great minds

01:16:49 building great companies.

01:16:51 Somehow it’s sad that when there’s that kind of.

01:16:55 And I hold him in esteem.

01:16:57 I’m grateful for him.

01:16:58 I think Enthought still exists.

01:17:00 They’re doing great work helping scientists.

01:17:02 They still run the SciPy conference.

01:17:05 They have an R&D platform they’re selling now

01:17:07 that’s a tool that you can go get today, right?

01:17:10 So Enthought has played a role in the SciPy

01:17:14 in supporting the community around SciPy, I would say.

01:17:18 They ended up not being able to,

01:17:20 they ended up building a tool suite

01:17:22 to write GUI applications.

01:17:24 Like that’s where they could actually make

01:17:25 that the business could work.

01:17:26 And so supporting SciPy and NumPy itself

01:17:29 wasn’t as possible.

01:17:30 Like they didn’t, they tried.

01:17:31 I mean, it was not just because,

01:17:33 it was just because of the business aspect.

01:17:34 So, and I wanted to build a company that could do,

01:17:36 that could get venture funding, right?

01:17:39 For better or worse.

01:17:39 I mean, that’s a longer story.

01:17:41 We could talk a lot about that, but.

01:17:42 And that’s where Anaconda came to be.

01:17:44 That’s where Anaconda came to be.

01:17:45 So let me ask you, it’s a little bit for fun

01:17:48 because you built this amazing thing.

01:17:50 And so let’s talk about like an old warrior

01:17:54 looking over old battles.

01:17:57 You’ve, you know, there’s a sad letter in 2012

01:18:01 that you wrote to the NumPy mailing list

01:18:04 announcing that you’re leaving NumPy.

01:18:06 And some of the things you’ve listed

01:18:08 as some of the things you regret

01:18:10 or not regret necessarily, but some things to think about.

01:18:14 If you could go back and you could fix stuff about NumPy

01:18:17 or both sort of in a personal level,

01:18:20 but also like looking forward,

01:18:21 what kind of things would you like to see changed?

01:18:24 Good question.

01:18:25 So I think there’s technical questions

01:18:26 and social questions right there.

01:18:29 First of all, you know, I wrote NumPy as a service

01:18:33 and I spent a lot of time doing it.

01:18:35 And then other people came to help make it happen.

01:18:36 NumPy succeeded because of the work of a lot of people, right?

01:18:39 So it’s important to understand that.

01:18:42 I’m grateful for the opportunity,

01:18:43 the role I had, I could play

01:18:45 and grateful that things I did had an impact,

01:18:47 but they only had the impact they had

01:18:49 because of the other people that came to the story.

01:18:52 And so they were essential,

01:18:53 but the way data types were handled,

01:18:55 the way data types, we had array scalars, for example,

01:18:59 that are really just a substitute for a type concept, right?

01:19:04 So we had array scalars that are actual Python objects

01:19:06 so that there’s for every, for a 32 bit float

01:19:09 or a 16 bit float or a 16 bit integer,

01:19:13 Python doesn’t have a natural,

01:19:14 it’s just one integer, there’s one float.

01:19:17 Well, what about these lower precision types,

01:19:19 these larger precision types?

01:19:21 So we had them in NumPy

01:19:23 so that you could have a collection of them,

01:19:25 but then have an object in Python that was one of them.
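To make the array scalar idea concrete, here is a minimal sketch, assuming NumPy is installed. The variable names are illustrative only. It shows that pulling one element out of a fixed-width array yields a standalone Python object that still carries its 16-bit type, something plain Python has no built-in equivalent for.

```python
import numpy as np

# Plain Python has exactly one built-in float type (and one int type).
py_float = 1.5

# A NumPy array can hold fixed-width elements, for example 16-bit integers.
arr = np.array([1, 2, 3], dtype=np.int16)

# Indexing a single element returns an "array scalar": a standalone
# Python object that still remembers its fixed-width type.
elem = arr[0]

print(type(py_float))  # the built-in float class
print(type(elem))      # NumPy's int16 scalar class
print(elem.dtype)      # int16
print(elem.itemsize)   # 2 (bytes per element)
```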

01:19:28 And there’s questions about like in retrospect,

01:19:31 I wouldn’t have created those

01:19:32 if I’d improved the type system.

01:19:34 And like made the type system actually a Python type system

01:19:38 as opposed to currently,

01:19:39 it’s a Python 1 level type system.

01:19:41 I don’t know if you know the difference

01:19:42 between Python 1 and Python 2,

01:19:43 it’s kind of technical, kind of in depth,

01:19:44 but Python 2, one of its big things that Guido did,

01:19:47 it was really brilliant.

01:19:48 It was, actually, in Python 1,

01:19:51 all classes, new objects were one.

01:19:55 If you as a user wrote a class,

01:19:56 it was an instance of a single Python type

01:19:59 called the class type, right?

01:20:02 In Python 2, he used a metatype hook

01:20:06 to actually go, oh, we can extend this

01:20:07 and have users write classes that are new types.

01:20:10 So he was able to have your user classes be actual types

01:20:13 and the Python type system got a lot more rich.

01:20:16 I barely understood that at the time that NumPy was written.

01:20:19 And so I essentially in NumPy created a type system

01:20:22 that was Python 1 era.

01:20:24 It was every dtype is an instance of the same type

01:20:29 as opposed to having new dtypes be really just Python types

01:20:33 with additional metadata.
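The distinction described here can be sketched in a few lines, assuming NumPy is installed. The `Quaternion` class is a hypothetical stand-in for a user-defined type. In the Python 2 era design, a user-written class is itself a type object, whereas NumPy dtype descriptors are instances of `np.dtype` rather than Python types in their own right. Recent NumPy releases have been reworking dtype internals, so finer details may differ by version.

```python
import numpy as np

# After the type/class unification, a user-written class is itself a type.
class Quaternion:  # hypothetical user-defined class, for illustration only
    pass

q = Quaternion()
print(type(q) is Quaternion)         # True: the class is the instance's type
print(isinstance(Quaternion, type))  # True: the class is a type object

# NumPy dtypes are descriptor *instances*, not Python classes.
f32 = np.dtype("float32")
i16 = np.dtype("int16")
print(isinstance(f32, np.dtype))     # True
print(isinstance(i16, np.dtype))     # True
print(isinstance(f32, type))         # False: a dtype object is not a class
```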

01:20:34 What’s the cost of that?

01:20:35 Is it efficiency, is it usability?

01:20:37 It’s usability primarily.

01:20:38 The cost isn’t really efficiency.

01:20:40 It’s the fact that it’s clumsy to create new types.

01:20:45 It’s hard.

01:20:45 And then one of the challenges,

01:20:47 you wanna create new types.

01:20:48 You want a quaternion type or you wanna add a new posit type

01:20:52 or you wanna, so it’s hard.

01:20:55 And now, if we had done that well,

01:20:59 when Numba came on the scene

01:21:00 where we could actually compile Python code,

01:21:02 it would integrate with that type system much cleaner.

01:21:05 And now all of a sudden you could do gradual typing

01:21:08 more easily.

01:21:08 You could actually have Python when you add Numba

01:21:10 plus better typing, could actually be a,

01:21:14 you’d smooth out a lot of rough edges.

01:21:16 But there’s already, there’s like,

01:21:18 but are you talking about from the perspective

01:21:20 of developers within NumPy or users of NumPy?

01:21:23 Developers of NumPy, not really users of NumPy so much.

01:21:27 It’s the development of NumPy.

01:21:28 So you’re thinking about like how to design NumPy

01:21:32 so that it’s contributors.

01:21:33 Yeah, the contributors, it’s easier.

01:21:35 It’s easier.

01:21:36 It’s less work to make it better and to keep it maintained.

01:21:39 And where that’s impacted things, for example,

01:21:41 is the GPU.

01:21:43 Like all of a sudden GPUs start getting added

01:21:45 and we don’t have them in NumPy.

01:21:48 Like NumPy should just work on GPUs.

01:21:50 The fact that we’d have to download a whole other object

01:21:52 called CuPy to have arrays on GPUs

01:21:54 is just an artifact of history.

01:21:57 Like there’s no fundamental reason for it.

01:21:59 Well, that’s really interesting.

01:22:00 If we could sort of go on that tangent briefly

01:22:02 is you have PyTorch and other libraries like TensorFlow

01:22:07 that basically tried to mimic NumPy.

01:22:11 Like you’ve created a sort of platonic form

01:22:15 of multi dimensional arrays. Basically, yeah.

01:22:16 Yeah, exactly.

01:22:17 Well, and the problem was I didn’t realize that.

01:22:19 Platonic form has a lot of edges.

01:22:21 They’re like, well, we should cut those out

01:22:23 before we present it.

01:22:24 So I wonder if you can comment,

01:22:26 is there like a difference between their implementations?

01:22:29 Do you wish that they were all using NumPy

01:22:31 or like in this abstraction of GPU?

01:22:34 And sorry to interrupt that there’s GPUs, ASICs.

01:22:38 There might be other neuromorphic computing.

01:22:40 There might be other kind of,

01:22:41 or the aliens will come with a new kind of computer.

01:22:43 Like an abstraction that NumPy should just operate nicely

01:22:47 over the things that are more and more

01:22:50 and smarter and smarter with these multi dimensional arrays.

01:22:54 Yeah, yeah.

01:22:55 There’s several comments there.

01:22:56 We are working on something now called data-apis.org.

01:23:00 data-apis.org, you can go there today.

01:23:02 And it’s our answer.

01:23:04 It’s my answer.

01:23:05 It’s not just me.

01:23:06 It’s me and Ralf and Athan and Aaron

01:23:09 and a lot of companies are helping us at Quansight Labs.

01:23:13 It’s not unifying all the arrays.

01:23:14 It’s creating an API that is unified.
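As a rough illustration of what a unified API buys you, here is a small sketch, assuming NumPy is installed. The function `standardize` and the explicit `xp` parameter are hypothetical simplifications and not part of any library. The function only calls operations looked up on the array namespace it is handed, so in principle the same code could run against another array library that offers the same functions.

```python
import numpy as np

def standardize(x, xp):
    """Shift x to zero mean and scale it to unit standard deviation.

    Only functions looked up on the namespace `xp` are used, so the body
    never names a specific array library. (`xp` is an illustrative
    convention here, passed in by hand.)
    """
    return (x - xp.mean(x)) / xp.std(x)

data = np.asarray([1.0, 2.0, 3.0, 4.0])
result = standardize(data, np)  # NumPy plays the role of the namespace

print(result.mean())  # approximately 0
print(result.std())   # approximately 1
```

Passing the namespace explicitly keeps the sketch short; the point is only that the function body contains no library-specific calls.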

01:23:17 So we do care about this

01:23:19 and we’re trying to work through it.

01:23:21 I actually had the chance to go and meet

01:23:22 with the TensorFlow team and the PyTorch team

01:23:25 and talk to them after exiting Anaconda.

01:23:29 Just talking about,

01:23:29 because the first year after leaving Anaconda in 2018,

01:23:33 I became deeply aware of this and realized that,

01:23:36 oh, this split in the array community that exists today

01:23:38 makes what I was concerned about in 2005 pretty parochial.

01:23:44 It’s a lot worse, right?

01:23:45 Now there’s a lot more people.

01:23:47 So perhaps the industry can sustain more stacks, right?

01:23:51 There’s a lot of money,

01:23:52 but it makes it a lot less efficient.

01:23:54 I mean, but I’ve also learned to appreciate,

01:23:56 it’s okay to have some competition.

01:23:58 It’s okay to have different implementations,

01:24:00 but it’s better if you can at least refactor some parts.

01:24:03 I mean, you’re gonna be more efficient

01:24:04 if you can refactor parts.

01:24:07 It’s nice to have competition over things,

01:24:09 over what is nice to have competition.

01:24:11 They’re innovative.

01:24:12 Yeah, innovative.

01:24:13 And then maybe on the infrastructure,

01:24:15 whatever, however you define infrastructure,

01:24:18 that maybe it’s nice to have come together.

01:24:21 Exactly, I agree.

01:24:22 And I think, but it was interesting to hear the stories.

01:24:24 I mean, TensorFlow came out of a C++ library,

01:24:29 Jeff Dean wrote, I think,

01:24:30 that was basically how they were doing inference, right?

01:24:33 And then they realized, oh,

01:24:34 we could do this TensorFlow thing.

01:24:36 That C++ library, then what was interesting to me

01:24:38 was the fact that both Google and Facebook did not,

01:24:42 it’s not like they supported Python or NumPy initially.

01:24:44 They just realized they had to.

01:24:47 They came to this world and then all the users were like,

01:24:48 hey, where’s the NumPy interface?

01:24:50 Oh, and then they kind of came late to it

01:24:52 and then they had these bolt ons.

01:24:54 TensorFlow’s bolt on, I don’t mean to offend,

01:24:57 but it was so bad.

01:24:58 Yeah, it was bad.

01:24:59 It’s the first time that I’m usually,

01:25:01 I mean, one of the challenges I have

01:25:04 is I don’t criticize enough in the sense

01:25:07 that I don’t give people input enough, you know, if.

01:25:09 I think it’s universally agreed upon

01:25:11 that the bolt ons on TensorFlow were.

01:25:13 But I went to, it was a talk given at Mallorca in Spain

01:25:17 and a great guy came and gave a talk and I said,

01:25:19 you should never show that API again

01:25:21 at a PyData conference.

01:25:23 Like that was, that’s terrible.

01:25:24 Like you’re taking this beautiful system we’ve created

01:25:27 and like you’re corrupting all these poor Python people,

01:25:29 forcing them to write code like that

01:25:30 or thinking they should.

01:25:32 Fortunately, you know, they adopted Keras as their,

01:25:35 and Keras is better.

01:25:36 And so Keras, TensorFlow is fine, is reasonable,

01:25:40 but they bolted it on.

01:25:42 Facebook did too.

01:25:43 Like Facebook had their own C++ library for doing inference

01:25:48 and they also had the same reaction, they had to do this.

01:25:51 One big difference is Facebook,

01:25:52 maybe because of the way it’s situated in part of FAIR,

01:25:55 part of the research library,

01:25:56 TensorFlow is definitely used and, you know,

01:25:58 they have to make, they couldn’t just open it up

01:26:00 and let the community, you know, change what that is.

01:26:03 Cause I guess they were worried

01:26:04 about disrupting their operations.

01:26:06 Facebook’s been much more open to having community input

01:26:10 on the structure itself.

01:26:12 Whereas Google and TensorFlow,

01:26:14 they’re really eager to have community users,

01:26:16 people use it and build the infrastructure,

01:26:17 but it’s much more walled.

01:26:18 Like it’s harder to become a contributor to TensorFlow.

01:26:21 And it’s also, this is very difficult question to answer

01:26:24 and don’t mean to be throwing shade at anybody,

01:26:27 but you have to wonder, it’s the Microsoft question

01:26:30 of when you have a tool like PyTorch or TensorFlow,

01:26:33 how much are you tending to the hackers

01:26:36 and how much are you tending to the big corporate clients?

01:26:39 Correct.

01:26:40 So like the ones that,

01:26:42 do you tend to the millions of people

01:26:44 that are giving you almost no money,

01:26:46 or do you tend to the few

01:26:48 that are giving you a ton of money?

01:26:50 I tend to stand with the people.

01:26:54 Right.

01:26:54 Cause I feel like if you nurture the hackers,

01:26:57 you will make the right decisions in the longterm

01:27:00 that will make the companies happy.

01:27:02 I lean that way too.

01:27:03 I totally agree.

01:27:04 But then you have to find the right dance.

01:27:05 But it’s a balance.

01:27:07 Cause you can lean to the hackers and run out of money.

01:27:08 Yeah, exactly.

01:27:10 Exactly.

01:27:11 Which has been some of the challenge I’ve faced

01:27:13 in the sense that,

01:27:14 like I would look at some of the experiments,

01:27:17 like NumPy, the fact that we have this split

01:27:19 is a factor of I wasn’t able to collect more money

01:27:21 towards NumPy development.

01:27:22 Yeah.

01:27:23 Right?

01:27:24 I mean, I didn’t succeed in the early days

01:27:26 of getting enough financial contribution to NumPy

01:27:29 so that they could work on it.

01:27:31 Right?

01:27:31 I couldn’t work on it full time.

01:27:32 I had to just catch an hour here, an hour there.

01:27:35 And I basically have not liked that.

01:27:37 Like I’ve wanted to be able to do something about that

01:27:39 for a long time and try to figure out how,

01:27:41 well, there’s lots of ways.

01:27:42 I mean, possibly one could say,

01:27:44 we had an offer from Microsoft

01:27:46 at early days of Anaconda.

01:27:48 2014, they offered to come buy us, right?

01:27:51 The problem was the right people at Microsoft

01:27:52 didn’t offer to buy us.

01:27:53 And they were still,

01:27:54 they were, it was really a,

01:27:56 we were like a second,

01:27:58 they had really bought, they just bought R,

01:27:59 the R company called,

01:28:01 it was not R studio,

01:28:02 but it was another R company that was emergent.

01:28:05 And it was kind of a,

01:28:07 well, we should also get a Python play,

01:28:09 but they were really doubling down on R.

01:28:11 Right?

01:28:12 And so it was like,

01:28:13 it was where you would go to die.

01:28:14 So it’s not, it wasn’t,

01:28:15 it was before Satya was there.

01:28:17 Satya had just started.

01:28:18 Just started.

01:28:19 Right?

01:28:20 And the offer was coming from someone

01:28:21 two levels down from him.

01:28:23 Got you.

01:28:23 Right?

01:28:24 And if it had come from Scott Guthrie,

01:28:26 so I got a chance to meet Scott Guthrie,

01:28:28 great guy, I like him.

01:28:29 If an offer had come from him,

01:28:31 probably would be at Microsoft right now.

01:28:33 That’d be fascinating.

01:28:34 That would be really nice actually,

01:28:36 especially given what Microsoft has since done

01:28:38 for the open source community and all those things.

01:28:40 Yes, I think they’re doing well.

01:28:41 I really like some of the stuff they’ve been doing.

01:28:43 They’re still working,

01:28:45 and they’ve, you know,

01:28:46 they’ve hired Guido now,

01:28:46 and they’ve hired a lot of Python developers.

01:28:47 Wait, Guido’s now at Microsoft?

01:28:49 Yeah, he works at Microsoft.

01:28:50 I need to.

01:28:52 Which, he retired,

01:28:53 then he came out of retirement,

01:28:54 and he’s working now.

01:28:55 I was just talking to him,

01:28:56 and he didn’t mention this person.

01:28:57 Well.

01:28:58 I should investigate this further.

01:29:01 Well.

01:29:02 Because I know he loved Dropbox,

01:29:02 but I wasn’t sure what he was doing,

01:29:04 what he was up to.

01:29:05 Well, he was kind of saying he’d retire,

01:29:06 but, and it’s literally been five years

01:29:09 since I last sat down and really talked to Guido.

01:29:12 Right?

01:29:13 Guido’s a technology expert, right?

01:29:16 He’s a, so I came,

01:29:17 I was excited because I’d finally figured out

01:29:18 the type system for NumPy.

01:29:20 I wanted to kind of talk about that with him,

01:29:22 and I kind of overwhelmed him.

01:29:23 Could you stay in that,

01:29:25 just for a brief moment,

01:29:26 because you’re a fascinating person

01:29:28 in the history of programming.

01:29:29 He is a fascinating person.

01:29:31 What have you learned from Guido

01:29:34 about programming, about life?

01:29:37 Yeah, yeah.

01:29:38 A lot, actually.

01:29:39 I’ve been a fan of Guido’s.

01:29:40 You know, we have a chance to talk.

01:29:42 Some, I wouldn’t say, you know,

01:29:43 we talk all the time.

01:29:44 Not at all.

01:29:45 He may, but we talk enough to,

01:29:47 I respect his,

01:29:48 in fact, when I first started NumPy,

01:29:49 one of the first things I did was I had a,

01:29:51 I asked Guido for a meeting

01:29:53 with him and Paul Dubois in San Mateo.

01:29:55 And I went and met him for lunch.

01:29:56 And basically, to say,

01:29:58 maybe we can actually,

01:29:59 part of the strategy for NumPy

01:30:00 was to get it into Python 3,

01:30:02 and maybe be part of Python.

01:30:04 And so we talked about that.

01:30:05 That’s a cool conversation.

01:30:06 And about that approach, right?

01:30:06 I would have loved to be a fly on the wall.

01:30:09 That was good.

01:30:10 And over the years for Guido,

01:30:12 I learned,

01:30:13 so he was open.

01:30:14 Like, he was willing to listen to people’s ideas.

01:30:18 Right?

01:30:19 And over the years,

01:30:19 now generally, you know,

01:30:20 I’m not saying universally that’s been true,

01:30:22 but generally that’s been true.

01:30:24 So he’s willing to listen.

01:30:25 He’s willing to defer.

01:30:27 Like on the scientific side,

01:30:28 he would just kind of defer.

01:30:29 He didn’t really always understand

01:30:30 what we were doing.

01:30:31 Yeah.

01:30:31 And he’d defer.

01:30:32 One place where he didn’t enough

01:30:35 was we missed a matrix multiply operator.

01:30:37 Like that finally got added to Python,

01:30:39 but about 10 years later than it should have.
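For readers who have not met it, the matrix multiply operator is `@`, available since Python 3.5 (PEP 465). Below is a minimal pure-Python sketch of how a class opts into it through `__matmul__`. The `Matrix` class is hypothetical and deliberately tiny; NumPy arrays implement the same hook.

```python
class Matrix:
    """A tiny illustrative matrix wrapper around nested lists."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __matmul__(self, other):
        # Python calls this method when it evaluates `self @ other`.
        cols = list(zip(*other.rows))  # columns of the right-hand matrix
        return Matrix(
            [[sum(a * b for a, b in zip(row, col)) for col in cols]
             for row in self.rows]
        )

a = Matrix([[1, 2], [3, 4]])
b = Matrix([[5, 6], [7, 8]])
c = a @ b
print(c.rows)  # [[19, 22], [43, 50]]
```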

01:30:42 But the reason was because nobody,

01:30:44 it takes a lot of effort.

01:30:46 And I learned this while I was writing NumPy.

01:30:48 I also wrote tools to Python.

01:30:49 I began with python-dev,

01:30:50 and I added some pieces to Python.

01:30:52 Like the memoryview object.

01:30:53 I wanted the structure of NumPy into Python.

01:30:55 So we didn’t get NumPy into Python,

01:30:56 but we got the basic structure of it into Python.

01:30:59 Like, so you could build on it.

01:31:01 Nobody did for a while,

01:31:01 but eventually database authors started to.

01:31:04 And it’s a lot better.

01:31:05 They did.

01:31:06 And also Antoine Pitrou and Stefan Krah

01:31:08 actually fixed the memoryview object.

01:31:10 Cause I wrote the underlying infrastructure in C,

01:31:13 but the Python exposure was terrible

01:31:15 until they came in and fixed it.

01:31:16 Partly because I was writing NumPy,

01:31:18 and NumPy was the Python exposure.

01:31:19 I didn’t really care about

01:31:21 if you didn’t have NumPy installed.
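The piece of NumPy's structure that did land in core Python is visible through the built-in `memoryview`. Here is a short sketch using only the standard library, with illustrative variable names. It shows a view sharing memory with its underlying buffer, and a flat byte buffer being reinterpreted as a 2 by 3 grid without copying.

```python
import array

# A typed buffer of C ints from the standard library.
buf = array.array("i", [0, 1, 2, 3, 4, 5])

# memoryview exposes the buffer's layout without copying the data.
view = memoryview(buf)
print(view.format, view.shape)  # i (6,)

# Writing through the view changes the original buffer: memory is shared.
view[0] = 42
print(buf[0])  # 42

# A flat run of bytes can be viewed as a 2 x 3 grid.
grid = memoryview(bytes(range(6))).cast("B", shape=[2, 3])
print(grid.tolist())  # [[0, 1, 2], [3, 4, 5]]
print(grid[1, 2])     # 5
```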

01:31:22 Anyway, Guido: open to ideas,

01:31:25 technologically brilliant.

01:31:27 Like really, I really got a lot of respect for him

01:31:29 when I saw what he did

01:31:30 with this type class merger thing.

01:31:33 It was actually tricky, right?

01:31:35 And then willing to share, willing to share his ideas.

01:31:38 So the other thing early on in 1998,

01:31:40 I said, I wrote my first extension module.

01:31:42 The reason I could is because he’d written this blog post

01:31:44 on how to do reference counting, right?

01:31:47 And without it, I would have been lost, right?

01:31:50 But he was willing to at least try to write this post.

01:31:53 And so he’s been motivated early on with Python.

01:31:56 There’s a computer science for everybody.

01:31:58 You kind of have this early on desire to,

01:31:59 oh, maybe we should be pushing programming to more people.

01:32:02 So he had this populist notion, I guess,

01:32:04 or populist sense to learn that there’s a certain skill,

01:32:08 and I’ve seen it in other people too,

01:32:10 of engaging with contributors sufficiently to,

01:32:13 because when somebody engaged with you

01:32:15 and wants to contribute to you,

01:32:16 if you ignore them, they go away.

01:32:18 So building that early contributor base

01:32:19 requires real engagement with other people.

01:32:23 And he would do that.

01:32:24 Can you also comment on this tragic stepping down

01:32:29 from his position as the benevolent dictator for life

01:32:32 over the wars, you know?

01:32:35 The Walrus operator?

01:32:36 The Walrus operator was the last battle.

01:32:39 I don’t know if that’s the cause of it,

01:32:40 but there’s this, for people who don’t know,

01:32:43 you can look up, there’s the Walrus operator,

01:32:45 which looks like a colon and equal sign.

01:32:49 Yeah, colon, equal sign.

01:32:50 And it actually does maybe the thing

01:32:54 that an equal sign should be doing.

01:32:57 Yeah, maybe, right, exactly.

01:33:00 But it’s just historically,

01:33:02 equal sign means something else.

01:33:03 It just means assignment.
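For anyone who has not seen it, the walrus operator `:=` (Python 3.8, PEP 572) assigns a value as part of an expression, which a plain `=` statement cannot do. A small sketch with made-up data:

```python
readings = [3, 8, 1, 9, 4]

# Assign and test in a single expression.
if (count := len(readings)) > 3:
    print(f"{count} readings")  # 5 readings

# Reuse a computed value inside a comprehension without computing it twice.
large = [sq for r in readings if (sq := r * r) > 20]
print(large)  # [64, 81]
```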

01:33:05 So he stepped down over this.

01:33:07 What do you think about the pressure of leadership?

01:33:10 It’s something that, you mentioned the letter I wrote

01:33:12 in NumPy at the time.

01:33:13 That was a hard time, actually.

01:33:15 I mean, there’s been really hard times.

01:33:17 It was hard.

01:33:19 You get criticized, right?

01:33:20 And you get pushed, and you get,

01:33:22 not everybody loves what you do.

01:33:23 Like anytime you do anything that has impact at all,

01:33:26 you’re not universally loved, right?

01:33:28 You get some real critics.

01:33:29 And that’s an important energy,

01:33:31 because it’s impossible for you to do everything right.

01:33:35 You need people to be pushing.

01:33:37 But sometimes people can get mean, right?

01:33:39 People can, I prefer to give people the benefit of the doubt.

01:33:43 I don’t immediately assume they have bad intentions.

01:33:45 And maybe for other, maybe that doesn’t happen for everybody.

01:33:49 For whatever reason, their past,

01:33:50 their experiences with people, they sometimes have bad,

01:33:53 so they immediately attribute to you bad intentions.

01:33:54 So you’re like, where did this come from?

01:33:56 I mean, I’m definitely open to criticism,

01:33:57 but I think you’re misinterpreting the whole point.

01:34:00 Because I would get that, certainly when I started Anaconda.

01:34:05 Sometimes I say to people,

01:34:08 I care enough about entrepreneurship

01:34:09 to make some open source people uncomfortable.

01:34:12 And I care enough about open source

01:34:13 to make investors uncomfortable.

01:34:15 So I sort of, you create kind of doubters on both sides.

01:34:19 So when you have, and this is just a plea

01:34:23 to the listener and the public, I’ve noticed this too,

01:34:27 that there’s a tendency, and social media makes this worse,

01:34:32 when you don’t have perfect information about the situation,

01:34:35 you tend to fill the gaps with the worst possible,

01:34:39 or at least a bad story that fills those gaps.

01:34:43 And I think it’s good to live life,

01:34:46 maybe not fully naively, but filling in the gaps

01:34:49 with the good, with the best, with the positive,

01:34:54 with the hopeful explanation of why you see this.

01:34:57 So if you see somebody like you trying to make money

01:35:00 on a book about NumPy,

01:35:01 there’s a million stories around that that are positive.

01:35:04 And those are good to think about,

01:35:07 to project positive intent on the people.

01:35:10 Because for many reasons, usually because people are good

01:35:13 and they do have good intent.

01:35:15 And also when you project that positive intent,

01:35:17 people will step up to that too.

01:35:19 Yes.

01:35:20 It’s a great point.

01:35:21 It has this kind of viral nature to it.

01:35:24 And of course with Twitter, early on figured out,

01:35:27 and Facebook is that they can make a lot of money

01:35:30 and engagement from the negative.

01:35:32 Yes.

01:35:33 So there’s this, we’re fighting this mechanism.

01:35:35 I agree.

01:35:36 Which is challenging.

01:35:37 It’s easier.

01:35:37 It’s just easier to be.

01:35:38 To be negative.

01:35:39 And then for some reason, something in our minds

01:35:41 really enjoys sharing that and getting all excited

01:35:45 about the negativity.

01:35:46 We do, yeah.

01:35:47 Some protective mechanism perhaps that we’re gonna get eaten

01:35:50 if we don’t, yeah.

01:35:51 Exactly.

01:35:52 For us to be effective as a group of people

01:35:53 in a software engineering project,

01:35:54 you have to project positive intent, I think.

01:35:56 I totally agree.

01:35:57 Totally agree.

01:35:58 And I think that’s very,

01:35:59 and so that happens in this space.

01:36:01 But Python has done a reasonable job in the past,

01:36:03 but here is a situation where I think it started

01:36:05 to get this pressure where it didn’t.

01:36:07 I really didn’t, I didn’t know enough about what happened.

01:36:10 I’ve talked to several people about it.

01:36:12 And I know most of the steering committee members today,

01:36:15 one person nominated me for that role,

01:36:17 but it’s the wrong role for me right now, right?

01:36:20 I have a lot of respect for the Python developer space

01:36:24 and the Python developers.

01:36:25 I also understand the gap between computer science

01:36:27 Python developers and array programming developers

01:36:30 or science developers.

01:36:31 And in fact, Python succeeds in the array space

01:36:34 the more it has people in that boundary.

01:36:36 And there’s often very few.

01:36:37 Like I was playing a role in that boundary

01:36:39 and working like everything to try to keep up

01:36:42 with even what Guido was saying, like I’m a C programmer,

01:36:47 but not a computer scientist.

01:36:49 Like I was an engineer and physicist and mathematician,

01:36:52 and I didn’t always understand

01:36:54 what they were talking about

01:36:56 and why they would have opinions the way they did.

01:36:58 So, you know, you have to listen and try to understand.

01:37:00 Then you also have to explain your point of view

01:37:02 in a way they can understand.

01:37:03 And that takes a lot of work.

01:37:04 And that communication is always the challenge.

01:37:07 And it’s just what we’re describing here

01:37:09 about the negativity is just another form of that.

01:37:11 Like how do we come together?

01:37:12 And it does appear we’re wired anyway

01:37:14 to at least have a, there’s a part of us

01:37:16 that will enemy, you know, friend, enemy.

01:37:18 And we see, yeah, it’s like,

01:37:21 why are we wiring on the enemy front?

01:37:23 So why are we pushing that?

01:37:24 Why are we promoting that so deeply?

01:37:26 Assume friend until proven otherwise.

01:37:28 Yes, yes.

01:37:30 So, cause you have such a fascinating mind in all of this.

01:37:32 Let me just ask you these questions.

01:37:34 So one interesting side on the Python history

01:37:38 is the move from Python two to Python three.

01:37:41 You mentioned move from Python one to Python two,

01:37:43 but the move from Python two to Python three

01:37:46 is a little bit interesting

01:37:47 because it took a very long time.

01:37:50 It broke, you know, in quite a small way

01:37:53 backward compatibility, but even that small way

01:37:56 seemed to have been very painful for people.

01:37:58 Is there lessons you draw?

01:38:00 Oh man, tons of lessons.

01:38:01 From how long it took and how painful it seemed to be?

01:38:05 Yeah, tons of lessons.

01:38:07 Well, I mentioned here earlier

01:38:08 that NumPy was written in 2005.

01:38:11 It was in 2005 that I actually went to Guido

01:38:15 to talk about getting NumPy into Python three.

01:38:17 Like my strategy was to,

01:38:18 oh, we were moving to Python three.

01:38:19 Let’s have that be, and it seems funny in retrospect

01:38:22 because like, wait, Python three,

01:38:23 that was in 2020, right?

01:38:25 When we finally ended the support for Python two

01:38:27 or at least 2017.

01:38:29 The reason it took a long time,

01:38:30 a lot of time, I think it was because one of the things is

01:38:33 there wasn’t much to like about Python three.

01:38:36 3.0, 3.1, it really wasn’t until 3.3.

01:38:40 Like I consider Python 3.3 to be Python 3.0.

01:38:43 But it wasn’t until Python 3.3

01:38:44 that I felt there’s enough stuff in it

01:38:47 to make it worth anybody using it, right?

01:38:49 And then 3.4 started to be, oh yeah, I want that.

01:38:52 And then 3.5 has the matrix multiply operator,

01:38:54 and now it’s like, okay, we gotta use that.

01:38:56 Plus the libraries that started leveraging

01:38:58 some of the features of Python three.

01:38:59 Exactly.
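
The matrix multiply operator mentioned here is `@`, which arrived in Python 3.5. A minimal sketch of what it replaced, assuming NumPy is installed; the array values are made up for illustration:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Before Python 3.5, a matrix product needed a function call.
product_call = np.dot(a, b)

# From Python 3.5 on, the @ operator expresses the same product.
product_op = a @ b

print(product_op)
```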

01:39:00 So it really, the challenge was it was,

01:39:03 but it also illustrated a truism that, you know,

01:39:07 when you have inertia,

01:39:08 when you have a group of people using something,

01:39:10 it’s really hard to move them away from it.

01:39:11 You can’t just change the world on them.

01:39:13 And Python three, you know, made some,

01:39:15 I think it fixed some things Guido had always hated.

01:39:17 I think he didn’t like the fact

01:39:18 that print was a statement.

01:39:19 He wanted to make it a function.

01:39:20 But in some sense, that’s a bit of gratuitous change

01:39:23 to the language.

01:39:24 And you could argue, and people have,

01:39:27 but one of the challenges was there wasn’t enough features

01:39:31 and too many just changes without features.

01:39:34 And so the empathy for the end user

01:39:37 as to why they would switch wasn’t there.

01:39:40 I think also it illustrated just the funding realities.

01:39:42 Like Python wasn’t funded.

01:39:45 Like it was also a project

01:39:46 with a bunch of volunteer labor, right?

01:39:48 It had more people, so more volunteer labor,

01:39:50 but it was still, it was funded in the sense

01:39:52 that at least Guido had a job.

01:39:53 And I’ve learned some of the behind the scenes on that now

01:39:55 since talking to people who have lived through it

01:39:57 and maybe not on air, we can talk about some of that.

01:40:00 But it’s interesting to see, but Guido had a job,

01:40:03 but his full time job wasn’t just work on Python.

01:40:07 Like he had other things to do.

01:40:08 Just wild.

01:40:09 It is wild, isn’t it?

01:40:10 It’s wild how few people are funded.

01:40:13 Yes.

01:40:14 And how much impact they have.

01:40:15 Yes.

01:40:16 Maybe that’s a feature not a bug, I don’t know.

01:40:17 Maybe, yes, exactly.

01:40:19 At least early on, like it’s sort of, I know, yeah.

01:40:21 It’s like Olympic athletes are often severely underfunded,

01:40:25 but maybe that’s what brings out the greatness.

01:40:27 Perhaps, yes, correct.

01:40:28 No, exactly.

01:40:29 Maybe this is the essential part of it.

01:40:31 Because I do think about that in terms of,

01:40:33 I currently have an incubator for open source startups.

01:40:36 Like what I’m trying to do right now

01:40:37 is create the environment I wished had existed

01:40:40 when I was leaving academia with NumPy

01:40:42 and trying to figure out what to do.

01:40:44 I’m trying to create those opportunities and environments.

01:40:46 So, and that’s what drives me still,

01:40:49 is how do I make the world easier

01:40:50 for the open source entrepreneur?

01:40:52 So let me stay, I mean, I could probably stay on NumPy

01:40:55 for a long time, but this is a fun question.

01:41:00 So Andrej Karpathy leads the Tesla Autopilot team,

01:41:04 and he’s also one of the most like legit programmers I know.

01:41:10 It’s like he builds stuff from scratch a lot,

01:41:13 and that’s how he builds intuition about how a problem works.

01:41:16 He just builds it from scratch, and I always love that.

01:41:18 And the primary language he uses is Python

01:41:21 for the intuition building.

01:41:23 But he posted something on Twitter saying

01:41:27 that they got a significant improvement

01:41:31 on some aspect of their like data loading, I think,

01:41:35 by switching away from np.sqrt,

01:41:39 so the NumPy’s implementation of square root,

01:41:42 to math.sqrt, and then somebody else commented

01:41:44 that you can get even a much greater improvement

01:41:48 by using the vanilla Python square root, which is like.

01:41:52 Power 0.5.

01:41:53 Power 0.5.

01:41:55 And it’s fascinating to me, I just wanted to.

01:41:58 So that was some shade throwing at some.

01:42:02 No, no, and yes, we’re talking about.

01:42:04 It’s a good way to ask the trade off

01:42:08 between usability and efficiency broadly in NumPy,

01:42:12 but also on these specific weird quirks

01:42:14 of like a single function.

01:42:16 Yep, so on that point, if you use a NumPy math function

01:42:21 on a scalar, it’s gonna be slower

01:42:25 than using a Python function on that scalar.

01:42:27 But because the math object in NumPy is more complicated,

01:42:33 because you can also call that math object on an array.

01:42:36 And so effectively, it goes through a similar machine.

01:42:39 There aren’t enough of the, which you would do

01:42:41 and you could do like checks and fast paths.

01:42:45 So yeah, if you’re basically doing a list,

01:42:48 if you run over a list, in fact,

01:42:50 for problems that are less than 1,000,

01:42:53 even maybe 10,000 is probably the,

01:42:55 if you’re going more than 10,000,

01:42:56 that’s where you definitely need to be using arrays.

01:42:59 But if you’re less than that, and for reading,

01:43:01 if you’re doing a reading process

01:43:02 and essentially it’s not compute bound, it’s IO bound.

01:43:05 And so you’re really taking lists of 1,000 at a time

01:43:08 and doing work on it.

01:43:09 Yeah, you could be faster just using Python,

01:43:11 straight up Python.
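
The scalar case discussed above can be checked directly. This is a rough sketch, assuming NumPy is installed; it compares `np.sqrt`, `math.sqrt`, and `** 0.5` on a single Python float. The call count of 20,000 is an arbitrary choice, and no timings are quoted because they vary by machine and library version:

```python
import math
import timeit

import numpy as np

x = 2.0

# Three ways to take the square root of one Python float.
results = {
    "np.sqrt(x)": np.sqrt(x),
    "math.sqrt(x)": math.sqrt(x),
    "x ** 0.5": x ** 0.5,
}

# Time each expression on the scalar. Absolute numbers depend on the
# machine and on the NumPy and Python versions in use.
namespace = {"np": np, "math": math, "x": x}
timings = {
    expr: timeit.timeit(expr, globals=namespace, number=20_000)
    for expr in results
}

for expr, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{expr:14s} {seconds:.4f} s for 20,000 calls")
```

On an array of many elements the comparison changes, because a single `np.sqrt` call then covers the whole array.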

01:43:12 See, but also, and this is the side to the top,

01:43:16 there’s the fundamental questions

01:43:18 when you look at the long arc of history,

01:43:21 it’s very possible that np.sqrt is much faster.

01:43:25 It could be.

01:43:26 So like in terms of like, don’t worry about it,

01:43:29 it’s the evils of over optimization or whatever,

01:43:32 all the different quotes around that,

01:43:34 is sometimes obsessing about this particular little quirk

01:43:39 is not sufficient.

01:43:41 For somebody like, if you’re trying to optimize your path,

01:43:45 I mean, I agree, premature optimization

01:43:47 creates all kinds of challenges, right?

01:43:49 Because now, but you may have to do it.

01:43:51 I believe the quote is, it’s the root of all evil.

01:43:53 It’s the root of all evil, right?

01:43:55 Let’s give Donald Knuth, I think,

01:43:57 or is he more than somebody else?

01:43:59 Well, Don Knuth is kind of like Mark Twain,

01:44:00 people just attribute stuff to him, I don’t know.

01:44:02 And it’s fine because he’s brilliant.

01:44:04 So, no, I was a LaTeX user myself,

01:44:07 and so I have a lot of respect,

01:44:09 and he did more than that, of course,

01:44:10 but yeah, someone I really appreciate

01:44:14 in the computer science space.

01:44:15 Yeah, I don’t, I think that’s appropriate.

01:44:17 There’s a lot of little things like that,

01:44:18 where people actually, if you understood it,

01:44:20 you go, yeah, of course, that’s the case.

01:44:22 And the other part, the other part I didn’t mention,

01:44:25 and Numba was a thing we wrote early on,

01:44:27 and I was really excited by Numba

01:44:29 because it’s something we wanted,

01:44:30 it was a compiler for Python syntax,

01:44:32 and I wanted it from the beginning of writing NumPy

01:44:35 because of this function question,

01:44:38 like taking, the power of arrays

01:44:41 is really that you can write functions using all of it.

01:44:45 It has implicit looping, right?

01:44:47 So you don’t worry about,

01:44:47 I write this n dimensional for loop

01:44:49 with four loops, four for statements.

01:44:51 You just say, oh, big four dimensional array,

01:44:53 I’m gonna do this operation, this plus, this minus,

01:44:55 this reduction, and you get this,

01:44:57 it’s called vectorization in other areas,

01:44:59 but you can basically think at a high level

01:45:01 and get massive amounts of computation done

01:45:03 with the added benefit of,

01:45:06 oh, it can be parallelized easily.

01:45:08 It can be put in parallel.

01:45:09 You don’t have to think about that.

01:45:10 In fact, it’s worse to go decompose your,

01:45:12 you write the for loops

01:45:14 and then try to infer parallelism from for loops.

01:45:16 That’s actually a harder problem

01:45:17 than to take the array problem

01:45:19 and just automatically parallelize that problem.
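
The implicit looping described above can be sketched as follows, assuming NumPy is installed. The array shapes are arbitrary and kept small so the explicit loops finish quickly:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((3, 4, 5, 6))  # a small four dimensional array
b = rng.random((3, 4, 5, 6))

# Explicit version: four nested for statements, one per dimension.
looped = np.empty_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        for k in range(a.shape[2]):
            for m in range(a.shape[3]):
                looped[i, j, k, m] = a[i, j, k, m] + b[i, j, k, m]
looped_total = looped.sum()

# Array version: the loops are implicit in the addition and the reduction.
vectorized_total = (a + b).sum()

print(looped_total, vectorized_total)
```

The array version states the operation without fixing an iteration order, which is what leaves room for the parallel execution mentioned in the conversation.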

01:45:22 That’s what, and so functions in NumPy

01:45:25 are called universal functions, ufuncs.

01:45:27 So square root is an example of a ufunc.

01:45:29 There are others, sine, cosine, add, subtract.

01:45:32 In fact, one of the first libraries to SciPy

01:45:34 was something called Special

01:45:35 where I added Bessel functions

01:45:36 and all these special functions that come up in physics

01:45:40 and I added them as ufuncs so they could work on arrays.

01:45:43 So I understood ufuncs very, very well

01:45:44 from day one inside of Numeric.

01:45:45 That was one of the things we tried to make better

01:45:47 in NumPy was how do they work?

01:45:49 Can they do broadcasting?

01:45:50 What does broadcasting mean?

01:45:51 But one of the problems is, okay,

01:45:54 what do I do with a Python scalar?

01:45:57 So what happens, the Python scalar gets broadcast

01:45:59 to a zero dimensional array

01:46:01 and then it goes through the whole same machinery

01:46:02 as if it were a 10,000 dimensional array.

01:46:05 And then it kind of unpacks the element

01:46:07 and then does the addition.

01:46:09 That’s not to mention the function it calls

01:46:12 in the case of square root

01:46:13 is just the C library square root, right?

01:46:15 In some cases, like Python’s power,

01:46:18 there’s some optimizations they’re doing

01:46:20 that could be faster

01:46:21 than just calling the C library square root.

01:46:23 In the interpreter or in the?

01:46:25 No, in the C code, in the Python runtime.

01:46:27 In the Python runtime, so they really optimize it

01:46:30 and they have the freedom to do that

01:46:32 because they don’t have to worry about.

01:46:32 It’s just a scalar.

01:46:34 It’s just a scalar.

01:46:34 Right, they don’t have to worry about the fact

01:46:36 that, oh, this could be an object with many pieces.

01:46:39 The ufunc machine is also generic

01:46:41 in the sense that typecasting and broadcasting,

01:46:44 broadcasting’s idea of I’m gonna go,

01:46:46 I have a zero dimensional array,

01:46:47 I have a scalar with a four dimensional array

01:46:49 and I add them.

01:46:50 Oh, I have to kind of coerce the shape of this guy

01:46:54 to make it work against the whole four dimensional array.

01:46:56 So it’s the idea of I can do a one dimensional array

01:46:59 against a two dimensional array and have it make sense.
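
A small sketch of the broadcasting just described, assuming NumPy is installed. The shapes and values are chosen only for illustration:

```python
import numpy as np

row = np.array([10.0, 20.0, 30.0])  # one dimensional, shape (3,)
table = np.zeros((2, 3))            # two dimensional, shape (2, 3)

# The one dimensional array is stretched across each row of the table.
summed = table + row

# A Python scalar is treated like a zero dimensional array.
scalar_as_array = np.asarray(5.0)
shifted = table + 5.0

print(summed)
print(scalar_as_array.ndim, shifted.shape)
```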

01:47:02 Well, that’s what NumPy does is it challenges you

01:47:04 to reformulate, rethink your problem

01:47:07 as a multi dimensional array problem

01:47:09 versus move away from scalars completely.

01:47:12 Right, exactly, exactly.

01:47:14 In fact, that’s where some of the edge cases boundaries are

01:47:16 is that, well, they’re still there

01:47:18 and this is where array scalars are particular.

01:47:21 So array scalars are particularly bad

01:47:23 in the sense that they were written

01:47:24 so that you could optimize the math on them,

01:47:26 but that hasn’t happened.

01:47:29 And so their default is to coerce the array scalar

01:47:32 to a zero dimensional array

01:47:33 and then use the NumPy machinery.

01:47:36 That’s what, and you could specialize,

01:47:38 but it doesn’t happen all the time.

01:47:39 So in fact, when we first wrote Numba,

01:47:41 we’d do comparisons and say, look, it’s a 1000X speedup.

01:47:45 We were lying a little bit in the sense that,

01:47:47 well, first do the 40X slowdown

01:47:50 of using the array scalars inside of a loop.

01:47:52 Cause if you just used Python scalars,

01:47:53 you’d already be 10 times faster.

01:47:56 But then we would get a hundred times faster

01:47:58 over that using just compilation.

01:48:00 But what we do is compile the loop

01:48:01 from out of the interpreter to machine code.
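
The difference between array scalars and Python scalars inside a loop can be seen with a sketch like this one, assuming NumPy is installed. The function names are hypothetical, and the measured ratio will vary, so no speedup figure is claimed here:

```python
import timeit

import numpy as np

values = np.arange(1_000, dtype=np.float64)

def total_array_scalars(arr):
    # Iterating over an array yields NumPy array scalars (np.float64 here).
    total = 0.0
    for v in arr:
        total += v * v
    return total

def total_python_scalars(arr):
    # tolist() converts every element to a plain Python float first.
    total = 0.0
    for v in arr.tolist():
        total += v * v
    return total

element_type = type(values[0])          # a NumPy array scalar type
python_type = type(values.tolist()[0])  # a plain Python float

array_time = timeit.timeit(lambda: total_array_scalars(values), number=200)
python_time = timeit.timeit(lambda: total_python_scalars(values), number=200)
print(f"array scalars:  {array_time:.4f} s")
print(f"Python scalars: {python_time:.4f} s")
```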

01:48:04 And then that’s always been the power of Python

01:48:06 is this extensibility so that you can,

01:48:08 cause people say, oh, Python’s so slow.

01:48:09 Well, sure, if you do all your logic

01:48:11 in the runtime of the Python interpreter, yeah.

01:48:13 But the power is that you don’t have to.

01:48:15 You write all the logic,

01:48:17 what you do in the high level is just high level logic.

01:48:19 And the actual calls you’re making

01:48:21 could be on gigabyte arrays of data.

01:48:24 And that’s all done at compiled speeds.

01:48:26 And the fact that integration is one can happen,

01:48:30 but two is separable.

01:48:32 That’s one of the, the language like Julia says,

01:48:35 we’re going to be all in one.

01:48:36 You can do all of it together.

01:48:37 And then there’s, the jury’s out, is that possible?

01:48:39 I tend to think that you’re going to,

01:48:41 there’s separate concerns there.

01:48:43 You want to precompile.

01:48:44 In fact, generally you will want to precompile your,

01:48:47 some of your loops.

01:48:48 Like SciPy has a compilation step.

01:48:50 To install SciPy, it takes about two hours.

01:48:53 If you have many machines,

01:48:54 maybe you can get it down to one hour.

01:48:55 But to compile those libraries takes about, takes a while.

01:48:57 You don’t want to do that at runtime.

01:48:59 You don’t want to do that all the time.

01:49:00 You want to have this precompiled binary available

01:49:02 that you’re then just linking into.

01:49:04 So there’s real questions about the whole source code.

01:49:09 Code is, running binary code is more than source code.

01:49:11 It’s creating object code, it’s the linker, it’s the loader,

01:49:14 it’s the how does that interpret it

01:49:15 inside of virtual memory space.

01:49:17 There’s a lot of details there that actually

01:49:19 I didn’t understand for a long time

01:49:20 until I read books on the topic.

01:49:23 And it led to, the more you know, the better off you are

01:49:27 and you can do more details,

01:49:28 but sometimes it helps with abstractions too.

01:49:31 Well, the problem, as we mentioned earlier

01:49:33 with abstractions is you kind of sometimes assume

01:49:37 that whoever implemented this thing

01:49:41 had your case in mind and found the optimal solution.

01:49:45 Yes.

01:49:45 Or like you assume certain things.

01:49:47 I mean, there’s a lot of,

01:49:48 Correct.

01:49:49 One of the really powerful things to me early on,

01:49:52 I mean, it sounds silly to say, but with Python,

01:49:55 probably one of the reasons I fell in love with it

01:49:58 is dictionaries.

01:49:59 Yes.

01:50:00 So obviously probably most languages

01:50:03 have some mapping concept,

01:50:06 but it felt like it was a first class citizen

01:50:09 and it was just my brain was able to think in dictionaries.

01:50:12 But then there’s the thing that I guess I still use

01:50:14 to this day is ordered dictionaries

01:50:16 because that seems like a more natural way

01:50:20 to construct dictionaries.

01:50:21 Yeah.

01:50:22 And from a computer science perspective,

01:50:23 the running time cost is not that significant,

01:50:26 but there’s a lot of things to understand about dictionaries

01:50:30 that the abstraction kind of

01:50:33 doesn’t necessarily incentivize you to understand.

01:50:37 Right, do you really understand the notion of a hash map

01:50:39 and how the dictionary is implemented?

01:50:41 But you’re right.

01:50:42 Dictionaries are a good example

01:50:43 of an abstraction that’s powerful.

01:50:44 And I agree with you.

01:50:46 I agree, I love dictionaries too.

01:50:47 Took me a while to understand, but once you do,

01:50:49 you realize, oh, they’re everywhere.

01:50:50 And Python uses them everywhere too.

01:50:52 Like it’s actually constructed,

01:50:54 one of the foundational things is dictionaries

01:50:55 and it does everything with dictionaries.

01:50:57 So it is, it’s powerful.

01:50:58 Ordered dictionaries came later,

01:51:00 but it is very, very powerful.
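
For readers less familiar with the two types, here is a short standard-library sketch. The keys and values are made up. Plain `dict` has preserved insertion order as a language guarantee since Python 3.7, while `collections.OrderedDict` is older and offers order-specific methods:

```python
from collections import OrderedDict

# Plain dicts are hash maps that also keep insertion order.
plain = {}
plain["width"] = 640
plain["height"] = 480
plain["channels"] = 3

# OrderedDict adds order-specific methods such as move_to_end.
ordered = OrderedDict(plain)
ordered.move_to_end("width")

# Keys must be hashable: lookup goes through hash(key), not a linear scan.
key_hash_is_int = isinstance(hash("width"), int)

print(list(plain))
print(list(ordered))
```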

01:51:02 It took me a little while coming

01:51:03 from just the array programming entirely

01:51:05 to understand these other objects,

01:51:07 like dictionaries and lists and tuples and binary trees.

01:51:11 Like I said, I wasn’t a computer scientist,

01:51:13 I studied arrays first.

01:51:15 And so I was very array centric.

01:51:16 And you realize, oh, these others

01:51:17 do have purposes and value actually.

01:51:21 I agree.

01:51:22 There’s a friendliness about,

01:51:24 like one way to think about arrays

01:51:26 is arrays are just like full of numbers,

01:51:31 but to make them accessible to humans

01:51:35 and make them less error prone to human users,

01:51:38 sometimes you want to attach names,

01:51:41 human interpretable names

01:51:43 that are sticky to those arrays.

01:51:44 So that’s how you start to think about dictionaries

01:51:47 is you start to convert numbers

01:51:50 into something that’s human interpretable.

01:51:52 And that’s actually the tension I’ve had with NumPy

01:51:55 because I’ve built so much tooling

01:51:58 around human interpretability

01:52:02 and also protecting me from a year later

01:52:05 not making the mistakes by being,

01:52:07 I wanted to force myself to use English versus numbers.

01:52:12 Yes, so there’s a project called Labeled Arrays.

01:52:15 Like very early it was recognized that,

01:52:18 oh, we’re indexing NumPy with just numbers,

01:52:21 all the columns and particularly the dimensions.

01:52:23 I mean, if you have an image,

01:52:25 you don’t necessarily need to label each column or row,

01:52:27 but if you have a lot of images

01:52:29 or you have another dimension,

01:52:30 you’d at least like to label the dimension

01:52:31 as this is X, this is Y, this is Z,

01:52:33 or give it some human meaning

01:52:34 or some domain specific meaning.

01:52:36 That was one of the impetuses for Pandas actually

01:52:39 was just, oh, we do need to label these things.

01:52:43 And Label Array was an attempt to add

01:52:45 that like a lighter weight version of that.
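
The idea of labeling dimensions can be illustrated with a toy wrapper. This is only a sketch, assuming NumPy is installed: `LabeledDims` is a hypothetical class written for this example, and it is not the API of the labeled array project or of Pandas:

```python
import numpy as np

class LabeledDims:
    """Hypothetical helper: pair an array with one name per dimension."""

    def __init__(self, data, dims):
        data = np.asarray(data)
        if data.ndim != len(dims):
            raise ValueError("need exactly one name per dimension")
        self.data = data
        self.dims = tuple(dims)

    def mean(self, dim):
        # Translate the readable name into the integer axis NumPy expects.
        axis = self.dims.index(dim)
        remaining = self.dims[:axis] + self.dims[axis + 1:]
        return LabeledDims(self.data.mean(axis=axis), remaining)

# Ten images of 32 rows by 48 columns, with named dimensions.
images = LabeledDims(np.ones((10, 32, 48)), dims=("image", "y", "x"))
per_image = images.mean("x").mean("y")

print(per_image.dims, per_image.data.shape)
```

Writing `mean("x")` instead of `mean(axis=2)` is the kind of protection against later mistakes raised in the conversation.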

01:52:47 And there’s been, like, that’s an example of something

01:52:49 I think NumPy could add, could be added to NumPy,

01:52:53 but one of the challenges again, how do you fund this?

01:52:55 Like I said, one of the tragedies I think is that,

01:52:58 so I never had the chance to,

01:53:00 I was never paid to work on NumPy, right?

01:53:02 So I’ve always just done it in my spare time,

01:53:04 always taken from one thing,

01:53:05 taken from another thing to do it.

01:53:07 And at the time, I mean, today,

01:53:09 it would be the wrong day and today,

01:53:11 like paying me to work on NumPy now

01:53:12 would not be a good use of effort,

01:53:13 but we are finally at Quansight Labs,

01:53:16 I’m actually paying people to work on NumPy and SciPy,

01:53:19 which is I’m thrilled with, I’m excited by.

01:53:22 I’ve wanted to do that.

01:53:22 That’s what I always wanted to do from day one.

01:53:24 It just took me a while to figure out a mechanism to do that.

01:53:27 Even like in the university setting,

01:53:29 respecting that, like pushing students,

01:53:33 young minds and young graduate students to contribute

01:53:38 and then figuring out financial mechanisms

01:53:41 that enable them to contribute

01:53:43 and then sort of reward them

01:53:45 for their innovative scientific journey,

01:53:48 that would be nice.

01:53:49 But then also just a better allocation of resources.

01:53:53 It’s the 20 year anniversary of 9/11

01:53:55 and I was just looking, we spent over $6 trillion

01:53:59 in the Middle East after 9/11 in the various efforts there.

01:54:04 And sort of to put politics and all that aside,

01:54:08 it’s just, you think about the education system,

01:54:10 all the other ways we could have

01:54:11 possibly allocated that money.

01:54:14 To me, to take it back,

01:54:16 the amount of impact you would have

01:54:21 by allocating a little bit of money to the programmers

01:54:26 that build the tools that run the world is fascinating.

01:54:30 It is.

01:54:32 I don’t know, I think, again,

01:54:34 there is some aspect to being broke

01:54:38 as somewhat of a feature, not a bug,

01:54:40 that you make sure that you’re valued.

01:54:42 But you can still manage that.

01:54:43 Right, no, I know.

01:54:45 But I don’t think that’s a big part.

01:54:47 So it’s like, I think you can have enough money

01:54:50 and actually be wealthy while maintaining your values.

01:54:53 Agreed, agreed.

01:54:55 There’s an old adage that nations that trade together

01:54:57 don’t go to war together.

01:54:59 I’ve often thought about nations that code together.

01:55:01 Yeah, code together.

01:55:02 Right?

01:55:03 I love that.

01:55:04 Because one of the things I love about open source

01:55:05 is it’s global, it’s multinational.

01:55:07 Like there aren’t national boundaries.

01:55:09 One of the challenges with business and open source

01:55:10 is the fact that, well, business is national.

01:55:12 Like businesses are entities

01:55:13 that are recognized in legal jurisdictions, right?

01:55:16 And have laws that are respected in those jurisdictions

01:55:18 and hiring, and yet the open source ecosystem

01:55:21 is not, it’s not there.

01:55:23 Like currently, one of the problems we’re solving

01:55:25 is hiring people all over the world, right?

01:55:27 Because we, it’s a global effort.

01:55:29 And I’ve had the chance to work, and I’ve loved the chance.

01:55:31 I’ve never been to like Iran,

01:55:35 but I once had a conference

01:55:36 where I was able to talk to people there, right?

01:55:38 And talk to folks in Pakistan.

01:55:40 I’ve never been there, but we had a call

01:55:44 where there were people there,

01:55:45 like just scientists and normal people.

01:55:47 And there’s a certain amount of humanizing, right?

01:55:52 That gets away from the,

01:55:54 like we often get the memes of society

01:55:56 that bubble up and get discussed,

01:55:58 but the memes are not even an accurate reflection

01:56:00 of the reality of what people are.

01:56:02 Well, if you look at the major power centers

01:56:05 that are leading to something like cyber war

01:56:08 in the next few decades,

01:56:10 it’s the United States, it’s Russia, and China.

01:56:13 And those three countries in particular

01:56:16 have incredible developers.

01:56:18 So if they work together, I think that’s one way,

01:56:21 the politicians can do their stupid bickering,

01:56:23 but like there’s a layer of infrastructure, of humanity.

01:56:27 If they collaborate together,

01:56:29 that I think can prevent major military conflict,

01:56:34 which would, I think most likely happen at the cyber level

01:56:37 versus the actual hot war level.

01:56:39 You’re right.

01:56:40 You know, I think that’s a good prediction.

01:56:43 Nations that code together don’t go to war together.

01:56:46 Don’t go to war together.

01:56:47 That’s a hope, right?

01:56:48 That’s one of the philosophical hopes, but yeah.

01:56:52 So you mentioned the project of Numba,

01:56:55 which is fascinating.

01:56:58 So from the early days,

01:56:59 there was kind of a pushback on Python that it’s not fast.

01:57:04 You know, you see C plus,

01:57:05 if you wanna write something that’s fast,

01:57:06 you use C plus plus.

01:57:08 If you wanna write something that’s usable and friendly,

01:57:11 but slow, you use Python.

01:57:13 And so what is Numba?

01:57:15 What is its goal?

01:57:16 How does it work?

01:57:17 Great, yeah.

01:57:18 Yes, that was the argument.

01:57:19 And the reality was people would write high level coding

01:57:22 and use compiled code,

01:57:23 but there’s still user stories, use cases,

01:57:25 where you want to write Python,

01:57:27 but then have it still be fast.

01:57:28 You still need to write a for loop.

01:57:30 Like before Numba, it was always don’t write a for loop.

01:57:33 You know, write it in a vectorized way,

01:57:35 you know, put it in an array.

01:57:37 And often that can make a memory trade off.

01:57:39 Like quite often you can do it,

01:57:41 but then you make maybe use more memory

01:57:42 because you have to build this array of data

01:57:44 that you don’t necessarily need all the time.

01:57:46 So Numba was, it started from a desire to have

01:57:50 kind of a vectorize that worked.

01:57:52 Vectorize was a tool in NumPy, it was released.

01:57:56 You give it a Python function

01:57:57 and it gave you a universal function,

01:57:59 a ufunc that would work on arrays.

01:58:01 So you get the function that just worked on a scalar.

01:58:03 Like you could make a,

01:58:04 like the classic case was a simple function

01:58:07 that had an if-then statement in it.

01:58:08 So sine X over X function, sinc function.

01:58:12 If X equals zero, return one, otherwise do sine X over X.

01:58:16 The challenge is you don’t want that loop

01:58:17 going on in Python.

01:58:18 So you want a compiled version of that,

01:58:21 but the ufunc, the vectorize in NumPy

01:58:23 would just give you a Python function.

01:58:24 So it would take the array of numbers

01:58:26 and at every call do a loop back into Python.

01:58:29 So it was very slow.

01:58:30 It gave you the appearance of a ufunc,

01:58:31 but it was very slow.
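
The sinc case described above can be sketched with `np.vectorize`, assuming NumPy is installed. The name `sinc_scalar` is made up for this example. Note that `np.vectorize` still calls the Python function once per element, which is the slowness being described:

```python
import math

import numpy as np

def sinc_scalar(x):
    # Scalar function with a branch: sin(x)/x, defined as 1 at x == 0.
    if x == 0:
        return 1.0
    return math.sin(x) / x

# np.vectorize makes the scalar function accept arrays, but each element
# still goes through a Python-level call.
sinc_vectorized = np.vectorize(sinc_scalar)

xs = np.array([0.0, 0.5, 1.0, 2.0])
ys = sinc_vectorized(xs)

# A branch-free array formulation for comparison, using np.where.
safe_xs = np.where(xs == 0, 1.0, xs)
ys_where = np.where(xs == 0, 1.0, np.sin(safe_xs) / safe_xs)

print(ys)
```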

01:58:32 So I always wanted a vectorize

01:58:34 that would take that Python scalar function

01:58:36 and produce a ufunc working on binary native code.

01:58:39 So in fact, I had somebody work on that with PyPy

01:58:42 and see if PyPy could be used to produce a ufunc like that

01:58:45 early on in 2009 or something like that, 2010.

01:58:50 They didn’t work that well.

01:58:51 It was kind of pretty bulky.

01:58:52 But in 2012, Peter and I had just started Anaconda.

01:58:57 We had, I just, I’d learned to raise money.

01:59:00 That’s a different topic,

01:59:01 but I’d learned to raise money from friends, family,

01:59:04 and fools, as they say.

01:59:05 And.

01:59:06 That’s a good line.

01:59:09 Oh, that’s a good line.

01:59:11 But, so we were trying to do something.

01:59:13 We were trying to change the world.

01:59:14 Peter and I are super ambitious.

01:59:15 We wanted to make array computing

01:59:17 and we had ideas for really what’s still,

01:59:19 it’s still the energy right now.

01:59:20 How do you do at scale data science?

01:59:23 And we had a bunch of ideas there, but one of them,

01:59:25 I had just talked to people about LLVM

01:59:27 and I was like, there’s a way to do this.

01:59:30 I just, I went, I heard my friend Dave Beazley

01:59:32 had a compiler course.

01:59:33 So I was looking at compilers like,

01:59:35 and I realized, oh, this is what you do.

01:59:37 And so I wrote a version of Numba

01:59:40 that just basically mapped Python bytecode to LLVM.

01:59:45 Nice.

01:59:46 Right, so, and the first version is like, this works

01:59:49 and it produces code that’s fast.

01:59:50 This is cool for, you know,

01:59:51 obviously a reduced subset of Python.

01:59:53 I didn’t support all the Python language.

01:59:55 There had been efforts to speed up Python in the past,

01:59:57 but those efforts were, I would say,

01:59:59 not from the array computing perspective,

02:00:00 not from the perspective of wanting to produce

02:00:02 a vectorized improvement.

02:00:03 They were from the perspective of speeding up

02:00:05 the runtime of Python, which is fundamentally hard

02:00:07 because Python allows for some constructs

02:00:10 that aren’t, you can’t speed up.

02:00:12 Like it’s this generic, you know, when it does this variable.

02:00:15 So I, from the start, did not try to replicate

02:00:17 Python’s semantics entirely.

02:00:20 I said, I’m gonna take a subset of the Python syntax

02:00:23 and let people write syntax in Python,

02:00:25 but it’s kind of a new language really.

02:00:27 So it’s almost like for loops, like focusing on for loops.

02:00:30 For loops, scalar arithmetic, you know, typed,

02:00:34 you know, really typed language, a typed subset.

02:00:38 That was the key.

02:00:39 So, but we wanted to add inference of types.

02:00:41 So you didn’t have to spell all the types out

02:00:43 because when you call a function,

02:00:45 so Python is typed, it’s just dynamically typed.

02:00:48 So you don’t tell it what the types are,

02:00:49 but when it runs, every time an object runs,

02:00:52 there’s a type for the variables.

02:00:53 You know what it is.

02:00:54 And so that was the design goals of Numba

02:00:56 were to make it possible to write functions

02:00:59 that could be compiled and have them used for NumPy arrays.

02:01:03 Like they needed to support NumPy arrays.

02:01:05 And so how does it work?

02:01:07 Do you add a comment within Python that tells it to do,

02:01:10 like how do you help out the compiler?

02:01:11 Yeah, so there isn’t much actually.

02:01:15 You don’t, it’s kind of magical in the sense

02:01:17 that it just looks at the type of the objects

02:01:19 and then it does type inference to determine

02:01:21 any other variables it needs.
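
A toy illustration of inferring the types of local variables from the argument types, using the standard `ast` module. This is a sketch under strong assumptions (only `int` and `float`, only simple assignments and binary arithmetic) and is not Numba's actual algorithm; `infer`, `infer_locals`, and the `axpy` source are hypothetical.

```python
import ast

def infer(node, env):
    """Return the inferred type (int or float) of an expression node."""
    if isinstance(node, ast.Constant):
        return type(node.value)
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp):
        left = infer(node.left, env)
        right = infer(node.right, env)
        if isinstance(node.op, ast.Div):
            return float  # true division always yields a float
        return float if float in (left, right) else int
    raise TypeError(f"unsupported construct: {type(node).__name__}")

def infer_locals(source, arg_types):
    """Infer a type for every assigned local, given the argument types."""
    env = dict(arg_types)
    func = ast.parse(source).body[0]
    for stmt in func.body:
        if isinstance(stmt, ast.Assign):
            env[stmt.targets[0].id] = infer(stmt.value, env)
    return env

SOURCE = """
def axpy(a, x, y):
    scaled = a * x
    total = scaled + y
    half = total / 2
    return half
"""

types_for_ints = infer_locals(SOURCE, {"a": int, "x": int, "y": int})
types_for_floats = infer_locals(SOURCE, {"a": int, "x": float, "y": int})
print(types_for_ints["half"], types_for_floats["scaled"])
```

The caller never annotates `scaled`, `total`, or `half`; their types follow from the types of the arguments seen at call time, which is the sense in which nothing has to be spelled out.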

02:01:23 And then it was also, because we had a use case

02:01:26 that could work early.

02:01:28 Like one of the challenges of any kind of new development

02:01:30 is if you have something that to make it work,

02:01:32 it was gonna take you a long time,

02:01:34 it’s really hard to get it off the ground.

02:01:35 If you have a project where there’s some incremental story,

02:01:39 it can start working today and solve a problem,

02:01:42 then you can start getting it out there, getting feedback.

02:01:44 Because Numba today, now Numba is nine years old today,

02:01:48 the first two, three versions were not great, right?

02:01:52 But they solved a problem and some people could try it

02:01:54 and we could get some feedback on it.

02:01:55 Not great in that it was very focused.

02:01:57 Very fragile, the subset it would actually compile

02:02:02 was small and so if you wrote Python code

02:02:04 and said, so the way it worked is you write a function

02:02:06 and you say @jit, use decorators.

02:02:09 So decorators, just these little constructs

02:02:11 let you decorate code with an @ and then a name.

02:02:15 The @jit would take your Python function

02:02:17 and actually just compile it and replace the Python function

02:02:20 with another function that interacts

02:02:23 with this compiled function.
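
The replace-the-function mechanism can be sketched in plain Python. Here `toy_jit` is a hypothetical stand-in that compiles nothing: it only shows how a decorator swaps the original function for a dispatcher that keeps one specialization per argument-type signature. With Numba itself the decorator would be imported from the `numba` package instead.

```python
import functools

def toy_jit(func):
    """Hypothetical stand-in for a JIT decorator; it compiles nothing."""
    compiled = {}  # argument-type signature -> specialized callable

    def compile_for(signature):
        # A real JIT would emit machine code here; this sketch reuses func.
        return func

    @functools.wraps(func)
    def dispatcher(*args):
        signature = tuple(type(a).__name__ for a in args)
        if signature not in compiled:
            compiled[signature] = compile_for(signature)
        return compiled[signature](*args)

    dispatcher.signatures = compiled  # exposed for inspection
    return dispatcher

@toy_jit
def sum_of_squares(n, scale):
    total = 0 * scale
    for i in range(n):
        total += scale * i * i
    return total

result_int = sum_of_squares(4, 1)
result_float = sum_of_squares(4, 0.5)
print(result_int, result_float, sorted(sum_of_squares.signatures))
```

After decoration, the name `sum_of_squares` refers to the dispatcher, so callers keep using the same name while the work is routed to whichever specialization matches the argument types.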

02:02:24 And it would just do that and we went from Python bytecode

02:02:28 then we went to AST.

02:02:29 I mean, writing compilers actually,

02:02:31 I learned a lot about why computer science

02:02:32 is taught the way it is because compilers

02:02:35 can be hard to write.

02:02:36 They use tree structures, they use all the concepts

02:02:39 of computer science that are needed.

02:02:40 It’s actually hard to, it’s easy to write a compiler

02:02:44 and then have it be spaghetti code.

02:02:46 Like the passes become challenging

02:02:47 and we ended up with three versions of Numba, right?

02:02:49 Numba got written three times.

02:02:51 What programming language is Numba written in?

02:02:55 Python.

02:02:56 Wait, okay.

02:02:57 Yeah, Python.

02:02:58 So.

02:03:00 Really?

02:03:00 That’s fascinating.

02:03:01 Yeah, so Python, but then the whole goal of Numba

02:03:03 is to translate Python bytecode to LLVM.

02:03:07 And so LLVM actually does the code generation.

02:03:09 In fact, a lot of times they’d say,

02:03:10 yeah, it’s super easy to write a compiler

02:03:12 if you’re not writing the parser nor the code generator.

02:03:15 Right?

02:03:16 So for people who don’t know, LLVM is a compiler itself.

02:03:19 So your compiler.

02:03:20 Yeah, it’s really badly named low level virtual machine,

02:03:22 which that part of it is not used.

02:03:24 It’s really low level.

02:03:25 Chris, he doesn’t mean that.

02:03:26 Yeah, love Chris.

02:03:29 But the name makes you imply that the virtual machine

02:03:31 is what it’s all about.

02:03:32 It’s actually the IR and the library,

02:03:34 the code generation.

02:03:36 That’s the real beauty of it.

02:03:37 The fact that, what I love about LLVM

02:03:39 was the fact that it was a plateau you could collaborate on.

02:03:43 Right?

02:03:44 Instead of the internals of GCC

02:03:45 or the internals of the Intel compiler,

02:03:47 or like how do I extend that?

02:03:49 And it was a place we could collaborate.

02:03:51 And we were early.

02:03:52 I mean, people had started before.

02:03:54 It’s a slow compiler.

02:03:55 Like it’s not a fast compiler.

02:03:56 So for some kind of JITs,

02:03:59 like JITs are common in language

02:04:01 because one, every browser has a JavaScript JIT.

02:04:04 It does real time compilation

02:04:06 of the JavaScript to machine code.

02:04:09 For people who don’t know, JIT is just in time compilation.

02:04:11 Thank you.

02:04:12 Yeah, just in time compilation.

02:04:13 They’re actually really sophisticated.

02:04:14 In fact, I got jealous of how much effort

02:04:17 was put into the JavaScript JITs.

02:04:18 Yes, well, it’s kind of incredible what they’ve done.

02:04:20 Yes, I completely agree.

02:04:22 I’m very impressed.

02:04:24 But you know, Numba was an effort

02:04:26 to make that happen with Python.

02:04:29 And so we used some of the money

02:04:30 we raised from Anaconda to do it.

02:04:32 And then we also applied for this DARPA grant

02:04:34 and used some of that money to continue the development.

02:04:36 And then we used proceeds from service projects we would do.

02:04:40 We get consulting projects

02:04:41 that we would then use some of the profits

02:04:44 to invest in Numba.

02:04:45 So we ended up with a team of two or three people

02:04:47 working on Numba.

02:04:48 It was a fits and starts, right?

02:04:50 And ultimately, the fact that we had a commercial version

02:04:53 of it also we were writing.

02:04:54 So part of the way I was trying to fund Numba,

02:04:56 say, well, let’s do the free Numba

02:04:58 and then we’ll have a commercial version of Numba

02:04:59 called Numba Pro.

02:05:00 And what Numba Pro did is it targeted GPUs.

02:05:03 So we had the very first CUDA JIT

02:05:05 and the very first @jit compiler that in 2012 or ’13,

02:05:10 you could run not just a ufunc on CPU,

02:05:14 but a ufunc on GPUs.

02:05:15 And it would automatically parallelize it

02:05:17 and get a 1000x speedup on it.

02:05:18 And that’s an interesting funding mechanism

02:05:21 because large companies or larger companies

02:05:26 care about speed in just this way.

02:05:30 So it’s exactly a really good way.

02:05:33 Yeah, there’s been a couple of things

02:05:34 you know people will pay for.

02:05:35 One, they’ll pay for really good user interfaces, right?

02:05:37 And so I’m always looking for what are the things

02:05:40 people will pay for that you could actually adapt

02:05:41 to the open source infrastructure?

02:05:43 One is definitely user interfaces.

02:05:45 The second is speed, like a better runtime, faster runtime.

02:05:49 And then when you say people,

02:05:50 you mean like a small number of people pay a lot of money,

02:05:52 but then there’s also this other mechanism that.

02:05:54 That’s true.

02:05:55 A ton of people pay.

02:05:56 That’s true.

02:05:57 A little bit.

02:05:58 First, I gotta, we mentioned Anaconda,

02:06:00 we mentioned friends, family, and fools.

02:06:04 So Anaconda is yet another.

02:06:06 So there’s a company, but there’s also a project.

02:06:09 Correct.

02:06:09 That is exceptionally impactful in terms of,

02:06:14 for many reasons, but one of which is bringing

02:06:16 a lot more people into the community

02:06:21 of folks who use Python.

02:06:23 So what is Anaconda?

02:06:26 What is its goals?

02:06:28 Maybe what is Conda versus Anaconda?

02:06:31 Yeah, I’ll tell you a little bit of the history of that.

02:06:33 Cause Anaconda, we wanted to do,

02:06:35 we wanted to scale Python.

02:06:37 Cause we, you know, that was the goal.

02:06:38 Peter and I had the goal of when we started Anaconda,

02:06:40 we actually started as Continuum Analytics

02:06:42 was the name of the company that started.

02:06:44 It got renamed Anaconda in 2015.

02:06:47 But we said, we want to scale analytics.

02:06:49 NumPy is great, Pandas is emerging,

02:06:52 but these need to run at scale with lots of machines.

02:06:55 The other thing we wanted to do was make user interfaces

02:06:57 that were web.

02:06:59 We wanted to make sure the web did not pass

02:07:01 by the Python community.

02:07:02 That we had ways to translate your data science to the web.

02:07:06 So those are the two kind of technical areas.

02:07:07 We thought, oh, we’ll build products in this space.

02:07:09 And that was the idea.

02:07:12 Very quickly in, but of course,

02:07:13 the thing I knew how to do was to do consulting

02:07:15 to make money and to make sure my family and friends

02:07:18 and fools that had invested didn’t lose their money.

02:07:21 So it’s a little different

02:07:22 than if you take money from a venture fund.

02:07:24 If you take money from a venture fund,

02:07:25 the venture fund, they want you to go big or go home.

02:07:27 And they’re kind of like expecting nine out of 10 to fail

02:07:30 or 99 out of 100 to fail.

02:07:33 It’s different.

02:07:33 I was, I was on a barbell strategy.

02:07:35 I was like, I can’t fail.

02:07:37 I mean, I may not do super well,

02:07:38 but I cannot lose their money.

02:07:40 So I’m going to do something I know can return a profit,

02:07:43 but I want to have exposure to an upside.

02:07:46 So that’s what happened at Anaconda.

02:07:47 We didn’t, there was lots of things we did not well

02:07:50 in terms of that structure.

02:07:51 And I’ve learned from since and how to do it better.

02:07:53 But we’ve, we did a really good job

02:07:56 of kind of attracting the interest around the area

02:07:59 to get good people working

02:08:00 and then get funnel some money

02:08:01 on some interesting projects.

02:08:03 Super excited about what came out of our energy there.

02:08:05 Like a lot did.

02:08:06 So what are some of the interesting projects?

02:08:08 So Dask, Numba, Bokeh, Conda.

02:08:12 There was Datashader, Panel, HoloViz.

02:08:16 These are all tools that are extremely relevant

02:08:19 in terms of helping you build applications,

02:08:21 build tools, build, you know, faster code.

02:08:25 There’s a couple I’m forgetting.

02:08:25 Oh, JupyterLab, JupyterLab came out of this too.

02:08:28 And yeah.

02:08:30 Okay, so Bokeh does plotting?

02:08:32 Is that?

02:08:33 Bokeh does plotting.

02:08:34 So Bokeh was one of the foundational things to say,

02:08:35 I want to do plot in Python,

02:08:37 but have the things show up in a web.

02:08:39 Right, that’s right.

02:08:40 That’s right, that’s right.

02:08:40 And plotting to me still,

02:08:43 with all due respect to Matplotlib and Bokeh,

02:08:46 it feels like still an unsolved problem,

02:08:48 not a solved problem.

02:08:50 It is, it’s a big problem.

02:08:52 Right, because you’re, I mean, I don’t know,

02:08:55 it’s visualization broadly, right?

02:08:58 I think we’ve got a pretty good API story

02:09:00 around certain use cases of plotting.

02:09:03 But there’s a difference between static plots

02:09:04 versus interactive plots versus I’m an end user,

02:09:07 I just want to write a simple,

02:09:09 Pandas started the idea of here’s a data frame

02:09:12 and a .plot, I’m just going to attach plot

02:09:14 as a method to my object,

02:09:16 which was a little bit controversial, right?

02:09:18 But works pretty well, actually,

02:09:20 because there’s a lot less you have to pass in, right?

02:09:23 You can just say, here’s my object, you know what you are,

02:09:26 you tell the visualization what to do.

02:09:29 So that, and there’s things like that

02:09:31 that have not been super well developed entirely,

02:09:33 but Bokeh was focused on interactive plotting.

02:09:36 So you could, it’s a short path

02:09:38 between interactive plotting and application,

02:09:41 dashboard application.

02:09:42 And there’s some incredible work that got done there, right?

02:09:44 And it was a hard project,

02:09:45 because then you’re basically doing JavaScript and Python.

02:09:49 So we wanted to tackle some of these hard problems

02:09:51 and try to just go after them.

02:09:53 We got some DARPA funding to help,

02:09:54 and it was super helpful, funny story there,

02:09:56 we actually did two DARPA proposals,

02:09:58 but one we were five minutes late for.

02:10:00 And DARPA has a very strict cutoff window.

02:10:03 And so I, we had two proposals,

02:10:04 one for the Bokeh and one for actually Numba

02:10:06 and the other work.

02:10:09 Which one were you late for?

02:10:10 The foundational numerical work.

02:10:12 So Bokeh got funded. Oh no.

02:10:14 Fortunately, Chris let us use some of the money to fund

02:10:17 still some of the other foundational work,

02:10:19 but it wasn’t as, yeah, his hands were tied,

02:10:22 he couldn’t do anything about it.

02:10:23 That was a whole interesting story.

02:10:25 So one of the incredible projects

02:10:27 that you worked on is Conda.

02:10:29 Yes.

02:10:30 So what is Conda? So how that came about,

02:10:31 yeah, Conda, it was early on, like I said, with SciPy.

02:10:35 SciPy was a distribution masquerading as a library.

02:10:37 And he said, he heard me talking about compiler issues

02:10:40 and trying to get the stuff shipped

02:10:41 and the fact that people can use your libraries

02:10:43 if they have it.

02:10:44 So for a long time,

02:10:45 we’d understood the packaging problem in Python.

02:10:47 And one of the first things we did at Continuum Analytics,

02:10:50 which became Anaconda, was organize the PyData ecosystem

02:10:54 in conjunction with NumFOCUS.

02:10:56 We actually started NumFOCUS

02:10:58 with some other folks in the community

02:11:00 the same year we started Anaconda.

02:11:02 I said, we’re gonna build a corporation,

02:11:04 but we’re also gonna reify the community aspect

02:11:07 and build a nonprofit.

02:11:08 So we did both of those.

02:11:09 Can we pause real quick and can you say what is PyPI,

02:11:13 the Python package index,

02:11:14 like this whole story of packaging in Python?

02:11:19 Yeah, that’s what I’m gonna get to actually.

02:11:20 This is exactly the journey I’m on.

02:11:22 It’s to sort of explain packaging in Python.

02:11:24 I think it’s best expressed by the conversation

02:11:26 I had with Guido at a conference,

02:11:27 where I said, so packaging is kind of a problem.

02:11:31 And Guido said, I don’t ever care about packaging.

02:11:34 I don’t use it.

02:11:34 I don’t install new libraries.

02:11:36 I’m like, I guess if you’re the language creator

02:11:38 and if you need something, you just put it in the distribution

02:11:40 maybe you don’t worry about packaging.

02:11:42 But Guido has never really cared about packaging, right?

02:11:45 And never really cared about the problem of distribution.

02:11:47 It’s somebody else’s problem.

02:11:48 And that’s a fair position to take, I think,

02:11:50 as a language creator.

02:11:51 In fact, there’s a philosophical question about

02:11:54 should you have different development packaging managers?

02:11:56 Should you have a package manager per language?

02:11:58 Is that really the right approach?

02:11:59 I think there are some answers of

02:12:01 it is appropriate to have development tools.

02:12:04 And there’s an aspect of a development tool

02:12:06 that is related to packaging.

02:12:07 And every language should have some story there

02:12:10 to help their developers create.

02:12:12 So you should have language specific development tools.

02:12:14 Development tools that relate to package managers.

02:12:17 But then there’s a very specific user story

02:12:19 around package management

02:12:20 that those language specific package managers

02:12:22 have to interact with.

02:12:23 And currently aren’t doing a good job of that.

02:12:25 That was one of the challenges

02:12:27 that not seeing that difference,

02:12:29 and it still exists in the difference today.

02:12:31 Conda always was a user.

02:12:34 I’m gonna use Python to do data science.

02:12:36 I’m gonna use Python to do something.

02:12:38 How do I get this installed?

02:12:39 It was always focused on that.

02:12:41 So it didn’t have a develop.

02:12:43 Classic example is pip has a pip develop.

02:12:45 It’s like, I wanna install this

02:12:47 into my current development environment today.

02:12:50 Conda doesn’t have that concept

02:12:51 because it’s not part of the story.

02:12:52 For people who don’t know,

02:12:54 pip is a Python specific package manager.

02:12:59 That’s exceptionally popular.

02:13:04 That’s probably like the default thing you’ve learned.

02:13:06 It’s the default user.

02:13:07 And so the story there emerged

02:13:08 because what happened is in 2012,

02:13:11 we had this meeting at the Googleplex

02:13:13 and Guido was there to come talk about what we’re gonna do,

02:13:15 how we’re gonna make things work better.

02:13:17 And Wes McKinney, me, Peter,

02:13:19 Peter has a great photo of me talking to Guido

02:13:21 and he pretends we’re talking about this story.

02:13:23 Maybe we were, maybe we weren’t.

02:13:24 But we did at that meeting talk about it

02:13:26 and asked Guido, we need to fix packaging in Python.

02:13:29 People can’t get the stuff.

02:13:31 And he said, go fix it yourself.

02:13:32 I don’t think we’re gonna do it.

02:13:33 All right.

02:13:35 The origin story right there.

02:13:36 All right, you said, okay, you said to do this ourselves.

02:13:39 So at the same time,

02:13:41 people did start to work on the packaging story in Python.

02:13:44 It just took a little longer.

02:13:45 So in 2012, kind of motivated

02:13:48 by our training courses we were teaching,

02:13:49 like very similar to what you just mentioned

02:13:51 about your mother.

02:13:52 Like it was motivated by the same purpose.

02:13:54 Like how do we get this into people’s hands?

02:13:56 It’s this big, long process.

02:13:57 It’s too expensive.

02:13:58 It was actually hurting NumPy development

02:14:00 because I would hear people were saying,

02:14:02 don’t make that change to NumPy

02:14:03 because I just spent a week getting my Python environment.

02:14:05 And if you change NumPy, I have to reinstall everything.

02:14:09 And reinstalling is such a pain, don’t do it.

02:14:10 I’m like, wait, okay.

02:14:12 So now we’re not making changes to a library

02:14:14 because of the installation problem

02:14:16 that it’ll cause for end users.

02:14:17 Okay, there’s a problem with installation.

02:14:19 We gotta fix this.

02:14:20 So we said, we’re gonna make a distribution in Python.

02:14:23 And we’d previously done that.

02:14:24 I’d previously done that at Enthought.

02:14:26 I wanted to make one that would give away for free,

02:14:28 that everyone could just get.

02:14:29 Like that was critical that we could just get it.

02:14:32 It wasn’t tied to a product.

02:14:33 It was just you could get it.

02:14:35 And then we had constantly thought about,

02:14:36 well, do we just leverage RPM?

02:14:39 But the challenge had always been,

02:14:40 we want a package manager that works on Windows,

02:14:42 Mac OS X, and Linux the same, right?

02:14:45 And it wasn’t there.

02:14:46 Like you don’t have anything like that.

02:14:47 You have…

02:14:48 And for people who don’t know,

02:14:49 RPM is an operating system specific package manager.

02:14:54 Correct, it’s operating system specific.

02:14:55 Yes, exactly.

02:14:56 So the design question is,

02:15:00 do you create an umbrella package manager

02:15:02 that works across operating systems?

02:15:03 Yes, that was the decision.

02:15:05 And in neighboring design questions,

02:15:08 do you also create a package manager

02:15:09 that spans multiple programming languages?

02:15:11 Correct, exactly.

02:15:12 That was the world we faced.

02:15:14 And we decided to go multiple operating systems,

02:15:17 multiple and programming language independent.

02:15:19 Because even Python, and particularly what was important

02:15:21 was SciPy has a bunch of Fortran in it, right?

02:15:24 And scikit-learn has links to a bunch of C++.

02:15:27 There’s a lot of compiled code.

02:15:29 And the Python package managers, especially early on,

02:15:32 didn’t even support that.

02:15:34 So in 2000, so we released Anaconda,

02:15:38 which was just a distribution of libraries,

02:15:39 but we started to work on Conda in 2012.

02:15:42 First version of Conda came out in early 2013,

02:15:44 summer of 2013, and it was a package manager.

02:15:47 So you could say, conda install scikit-learn.

02:15:49 In fact, scikit-learn was a fantastic project that emerged.

02:15:54 It was the classic example of the scikits.

02:15:57 I talked to you earlier about SciPy being too big

02:15:59 to be a single library.

02:16:01 Well, what the community had done is said,

02:16:02 let’s make scikits.

02:16:04 And there’s scikit-image, there’s scikit-learn,

02:16:05 there’s a lot of scikits.

02:16:07 And it was a fantastic move that the community did.

02:16:10 I didn’t do it.

02:16:11 I was like, okay, that’s a good idea.

02:16:12 I didn’t like the name.

02:16:13 I didn’t like the fact you typed scikit image.

02:16:15 I was like, that’s gotta be simpler.

02:16:17 That’s scikit learn, we gotta make that smaller.

02:16:19 I don’t like typing all this stuff from imports.

02:16:21 So I was kind of a pressure that way,

02:16:23 but I love the energy and love the fact

02:16:25 that they went out and they did it,

02:16:26 and those people, Jarrod Millman, and then of course, Gaël,

02:16:29 and there’s people I’m not even naming.

02:16:31 Scikit-learn really emerged as a fantastic project.

02:16:34 And the documentation around that is also incredible.

02:16:36 And the documentation was incredible, exactly.

02:16:37 I don’t know who did that, but they did a great job.

02:16:40 A lot of people in Inria, a lot of European contributors.

02:16:45 There’s some Andreas in the US.

02:16:47 There’s a lot of just people I just adore,

02:16:48 I think are amazing people.

02:16:51 Awesome use of SciPy, right?

02:16:52 I love the fact that they were using SciPy effectively

02:16:54 to do something I love, which is machine learning,

02:16:57 but couldn’t install it.

02:16:58 Because there’s so many pieces involved.

02:17:00 So many dependencies, right?

02:17:02 So our use case of Conda was conda install scikit-learn.

02:17:06 Right, and it was the best way to install scikit-learn

02:17:09 in 2013 to really 2018, ’17, ’18, pip finally caught up.

02:17:14 I still think you should conda install scikit-learn

02:17:16 over the pip install scikit-learn,

02:17:17 but you can pip install scikit-learn.

02:17:19 The issue is the package they created was wheels

02:17:21 and pip does not handle the multi vendor approach.

02:17:24 They don’t handle the fact you have C++ libraries

02:17:26 you’re depending on.

02:17:27 They just stop at the Python boundary.

02:17:29 And so what you have to do in the wheel world

02:17:31 is you have to vendor.

02:17:33 You have to take all of the binary and vendor it.

02:17:35 Now, if a change happens in an underlying dependency,

02:17:38 you have to redo the whole wheel.

02:17:40 So TensorFlow, as you know,

02:17:42 you should not pip install TensorFlow.

02:17:44 It’s a terrible idea.

02:17:45 People do it because of the popularity of pip,

02:17:48 many people think, oh, of course,

02:17:49 that’s how I install everything in Python.

02:17:51 Yeah, this is one of the big challenges.

02:17:53 You take a GitHub repository or just a basic blog post.

02:17:57 The number of times pip is mentioned over Conda

02:18:00 is like 100x to one.

02:18:02 Correct, correct.

02:18:03 So it just has to do with the.

02:18:04 And that was increasing.

02:18:05 It wasn’t true early because pip didn’t exist.

02:18:07 Like Conda came first.

02:18:08 So but that’s the problem.

02:18:10 Like Conda came first, but that’s like the long tail

02:18:13 of the internet documentation user generated.

02:18:15 So that like you think, how do I install Google?

02:18:19 How do I install TensorFlow?

02:18:20 You’re just not gonna see Conda in that first page.

02:18:23 Correct, exactly.

02:18:24 And that.

02:18:24 Not today, you would have in 2016, 2017.

02:18:29 And it’s sad because Conda solves

02:18:32 a lot of usability issues.

02:18:34 Correct.

02:18:35 Like for especially super challenging thing.

02:18:36 I don’t know.

02:18:37 One of the big pain points for me was

02:18:39 just on the computer vision side, OpenCV installation.

02:18:43 Perfect example.

02:18:44 Yes.

02:18:45 I think Conda, I don’t know if Conda solved that one.

02:18:47 Conda has an OpenCV package.

02:18:49 I don’t know.

02:18:49 I certainly know pip has not solved it.

02:18:53 I mean, there’s complexities there because.

02:18:55 Right.

02:18:56 I actually don’t know.

02:18:57 I should probably know a good answer for this,

02:18:59 but if you compile OpenCV with certain dependencies,

02:19:05 you’ll be able to do certain things.

02:19:07 So there’s this kind of flexibility of what you,

02:19:09 like what options you compile with.

02:19:12 Yes.

02:19:13 And I don’t think it’s trivial to do that with Conda or.

02:19:17 So Conda has a notion of variants of a package.

02:19:20 You can actually have different compilation versions

02:19:23 of a package.

02:19:23 So not just the version is different,

02:19:24 but oh, this is compiled with these optimizations on.

02:19:26 So Conda does have an answer.

02:19:28 Has those flavors.

02:19:28 Has flavors, basically.

02:19:30 Well, pip, as far as I know, does not have flavors.

02:19:32 No, no.

02:19:33 pip generally hasn’t thought deeply

02:19:36 about the binary dependency problem, right?

02:19:38 And that’s why fundamentally it doesn’t work

02:19:41 for the SciPy ecosystem.

02:19:43 It barely, you can sort of paper over it and duct tape

02:19:46 and it kind of works until it doesn’t

02:19:48 and it falls apart entirely.

02:19:49 So it’s been a mixed bag.

02:19:51 Like, and I’ve been having lots of conversations

02:19:54 with people over the years because again,

02:19:56 it’s an area where if you understand some things,

02:19:58 but not all the things,

02:19:59 but they’ve done a great job of community appeal.

02:20:02 This is an area where I think Anaconda as a company

02:20:05 needed to do some things

02:20:07 in order to make Conda more community centric, right?

02:20:10 And this is a, I talk about this all the time.

02:20:13 There’s a balance between you have every project starts

02:20:16 with what I called company backed open source.

02:20:18 Even if the company is yourself, it’s just one person,

02:20:20 just doing business as.

02:20:23 But ultimately for products to succeed virally

02:20:26 and become massive influencers,

02:20:28 they have to create,

02:20:29 they have to get community people on board.

02:20:30 They have to get other people on board.

02:20:32 So it has to become community driven.

02:20:33 And a big part of that is engagement with those people.

02:20:35 Empowering people, governance around it.

02:20:38 And what happened with Conda in the early days,

02:20:41 pip emerged and we did do some good things.

02:20:43 Conda Forge, Conda Forge community

02:20:46 is sort of the community recipe creation community.

02:20:49 But Conda itself, I still believe,

02:20:52 and Peter is CEO of Anaconda, he’s my co founder.

02:20:55 I ran Anaconda until 2017, 2018.

02:20:58 Is Peter still Anaconda?

02:20:59 Peter’s still Anaconda, right?

02:21:00 And we’re still great friends.

02:21:01 We talk all the time.

02:21:02 I love him to death.

02:21:03 There’s a long story there about like why and how

02:21:06 and we can cover in some other podcast perhaps.

02:21:08 Yeah.

02:21:09 It’s sort of a more, maybe a more business focused one.

02:21:11 But this is one area where I think Conda

02:21:15 should be more community driven.

02:21:17 Like he should be pushing more

02:21:18 to get more community contributors to Conda

02:21:21 and let the, Anaconda shouldn’t be fighting this battle.

02:21:26 Yeah.

02:21:26 Right?

02:21:27 It’s actually, it’s really a developers.

02:21:28 Like you said, like help the developers

02:21:30 and then they’ll actually move us the right direction.

02:21:32 Well, that was the problem I have is many

02:21:34 of the cool kids I know don’t use Conda.

02:21:36 And that to me is confusing.

02:21:38 It is confusing.

02:21:39 It’s really a matter of, Conda has some challenges.

02:21:42 First of all, Conda still needs to be improved.

02:21:44 There’s lots of improvements to be made.

02:21:45 And it’s that aspect of wait, who’s doing this?

02:21:47 And the fact that then the PyPA really stepped up.

02:21:50 Like they were not solving the problem at all.

02:21:53 And now they kind of got to where they’re solving it

02:21:55 for the most part.

02:21:56 And then effectively you could get,

02:21:58 like Conda solved a problem that was there.

02:22:00 And it still does.

02:22:01 It’s still, you know, there’s still great things it can do.

02:22:03 But, and we still use it all the time at Quansight

02:22:06 and with other clients, but with,

02:22:08 but you can kind of do similar things with pip and Docker.

02:22:12 Right?

02:22:13 So especially with the web development community,

02:22:15 that part of it, again, is this is the,

02:22:17 there’s a lot of different kinds of developers

02:22:19 in the Python ecosystem.

02:22:20 And there’s still a lack of some clear understanding.

02:22:23 I go to the Python conference all the time

02:22:25 and then there’s only a few people in the PyPA who get it.

02:22:28 And then others who are just massively trumpeting

02:22:30 the power of pip, but just do not understand the problem.

02:22:32 Yeah.

02:22:33 So one of the obvious things to me from a mom,

02:22:36 from a non programmer perspective,

02:22:37 is the across operating system usability.

02:22:41 That’s much more natural.

02:22:42 So there’s people that use Windows and just,

02:22:45 it seems much easier to recommend Conda there,

02:22:49 but then it, you should also recommend it across the board.

02:22:51 So I’ll definitely sort of.

02:22:53 But what I recommend now is a hybrid.

02:22:55 I do.

02:22:56 I mean, I have no problem.

02:22:57 Is it possible to use?

02:22:57 Oh, it is.

02:22:58 It is.

02:22:59 But like build the environment with pip, with Conda,

02:23:01 build an environment with Conda

02:23:03 and then pip install on top of that.

02:23:04 That’s fine.

02:23:05 Be careful about pip installing OpenCV or TensorFlow

02:23:09 or, because if somebody’s uploaded that,

02:23:11 it’s gonna be most surely done in a way

02:23:13 that can’t be updated that easily.
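
The hybrid workflow described here (Conda for the large compiled packages, pip for the small niche ones layered on top) can be written down as a single Conda environment file. The following is a minimal sketch rather than a recommended configuration: the package names, the `conda-forge` channel, and the helper `render_environment_yml` are illustrative assumptions, and `some-small-pure-python-lib` is a placeholder, not a real package.

```python
# Hypothetical package choices, for illustration only.
CONDA_PACKAGES = ["python=3.11", "numpy", "scipy", "pip"]  # heavy, compiled infrastructure
PIP_PACKAGES = ["some-small-pure-python-lib"]              # niche packages layered on top


def render_environment_yml(name, conda_packages, pip_packages, channel="conda-forge"):
    """Render the text of a Conda environment file with a nested pip section."""
    lines = [f"name: {name}", "channels:", f"  - {channel}", "dependencies:"]
    # Conda-managed packages come first in the dependency list.
    lines += [f"  - {pkg}" for pkg in conda_packages]
    if pip_packages:
        # The nested "pip:" entry lists packages that pip installs on top.
        lines.append("  - pip:")
        lines += [f"      - {pkg}" for pkg in pip_packages]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_environment_yml("hybrid-env", CONDA_PACKAGES, PIP_PACKAGES))
```

Saving the printed text as `environment.yml` and running `conda env create -f environment.yml` would build the Conda environment and then install the pip entries on top of it.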

02:23:15 So install like the big packages,

02:23:17 the infrastructure with Conda and then the weirdos.

02:23:21 Yeah.

02:23:21 That like the weird like implementation for some.

02:23:24 I had a, there’s a cool library I used

02:23:28 that based on your location and time of day and date

02:23:33 tells you the exact position of the sun

02:23:35 relative to the earth.

02:23:38 And it’s just like a simple library,

02:23:39 but it’s very precise.

02:23:41 And I was like, all right.

02:23:42 But that was, that was, and it’s like pip.

02:23:45 Well, the thing they did really well is Python developers

02:23:48 who wanna get their stuff published,

02:23:50 you have to have a pip recipe.

02:23:51 Yeah.

02:23:52 Right?

02:23:53 I mean, even if it’s, you know, the challenge is,

02:23:56 and there’s a key thing that needs to be added to pip,

02:23:58 just simply add to pip the ability to defer

02:24:01 to a system package manager.

02:24:03 Like, cause it’s, you know,

02:24:04 recognize you’re not gonna solve all the dependency problems.

02:24:07 So, like, give up and allow the system package manager to work.

02:24:12 That way Anaconda is installed and it has pip.

02:24:15 It would default to Conda to install stuff,

02:24:16 but Red Hat RPM would default to RPM

02:24:19 to install some more things.

02:24:20 Like that’s the, that’s a key, not difficult,

02:24:23 but it takes some work, feature that needs to be added.
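
The deferral idea described here can be sketched in a few lines. This is only an illustration of the proposed behavior, not an existing pip feature or API: the function `install_with_deferral`, the `Installer` type, and the toy `fake_conda` and `fake_pip` callables are all hypothetical names made up for this sketch.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical: an installer is any callable that returns True if it installed the package.
Installer = Callable[[str], bool]


def install_with_deferral(
    packages: List[str],
    system_installer: Optional[Tuple[str, Installer]],
    pip_installer: Installer,
) -> Dict[str, str]:
    """Offer each package to the system package manager first, then fall back to pip."""
    handled_by: Dict[str, str] = {}
    for pkg in packages:
        if system_installer is not None:
            name, install = system_installer
            if install(pkg):
                # The system manager (e.g. Conda or RPM) owns this package.
                handled_by[pkg] = name
                continue
        # No system manager, or it could not provide the package.
        handled_by[pkg] = "pip" if pip_installer(pkg) else "unresolved"
    return handled_by


# Toy stand-ins for real installers (assumptions for the demo).
def fake_conda(pkg: str) -> bool:
    return pkg in {"numpy", "opencv"}


def fake_pip(pkg: str) -> bool:
    return True


if __name__ == "__main__":
    print(install_with_deferral(["numpy", "tiny-lib"], ("conda", fake_conda), fake_pip))
```

In an Anaconda install the system installer would be Conda; on a Red Hat machine it would be RPM. The point of the design is that pip stops trying to resolve dependencies the host environment already manages.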

02:24:25 That’s an example of something like,

02:24:27 I’ve known we need to do it.

02:24:28 I mean, it’s where I wish I had more money.

02:24:30 I wish I was more successful in the business side,

02:24:33 trying to get there, but I wish my, you know,

02:24:35 my family, friends and fools community that I know.

02:24:37 Was larger.

02:24:38 Was larger and had more money.

02:24:39 Cause I know tons of things to do effectively

02:24:42 with more resources, but you know,

02:24:46 I have not yet been successful at channel.

02:24:48 Tons of, you know, some, you know,

02:24:49 I’m happy with what we’ve done.

02:24:51 We created again at Quansight,

02:24:54 what we created to get Anaconda started.

02:24:56 We created community to get Anaconda started.

02:24:58 Done it again with Quansight.

02:24:59 Super excited by that.

02:25:00 But it took three years to do it.

02:25:02 What is Quansight?

02:25:03 What is its mission?

02:25:04 We’ve talked a few times about different fascinating

02:25:06 aspects of it, but let’s like big picture,

02:25:08 what is Quansight?

02:25:09 Big picture Quansight.

02:25:10 Quansight is, its mission is to connect data

02:25:13 to an open economy.

02:25:14 So it’s basically consulting for the PyData ecosystem,

02:25:17 right?

02:25:18 It’s a consulting company.

02:25:19 And what I’ve said when I started it was we’re trying

02:25:21 to create products, people, and technology.

02:25:24 So it’s divided into two groups.

02:25:26 And a third one as well.

02:25:28 The two groups are a consulting services company

02:25:30 that just helps people do data science

02:25:31 and data engineering and data management better

02:25:35 and more efficiently.

02:25:35 Like full stack, like full thing.

02:25:36 Full stack data science, full thing.

02:25:38 We’ll help you build an infrastructure.

02:25:40 If you’re using Jupyter, we need,

02:25:41 we do staff augmentation, need more programmers,

02:25:43 help you use Dask more effectively,

02:25:44 help you use GPUs more effectively.

02:25:46 Just basically a lot of people need help.

02:25:48 So we do training as well to help people, you know,

02:25:50 both immediate help and then get, learn from somebody.

02:25:55 We’ve added a bunch of stuff too.

02:25:57 We’ve kind of separated some of these other things

02:25:58 into another company called OpenTeams

02:26:00 that we currently started.

02:26:01 One of the things I loved about what we did at Anaconda

02:26:03 was creating a community innovation team.

02:26:05 And so I wanted to replicate that.

02:26:06 This time we did a lot of innovation at Anaconda.

02:26:09 I wanted to do innovation,

02:26:10 but also contribute to the projects that existed,

02:26:13 like create a place where maintainers,

02:26:16 so the SciPy and NumPy and Numba

02:26:18 and all these projects we already started

02:26:20 can pay people to work on them and keep them going.

02:26:22 So that’s Labs.

02:26:23 Quansight Labs is a separate organization.

02:26:25 It’s a nonprofit mission.

02:26:28 The profits of Quansight help fund it.

02:26:29 And in fact, every project that we have at Quansight,

02:26:33 a portion of the money goes directly to Quansight Labs

02:26:36 to help keep it funded.

02:26:37 So we’ve gotten several mechanisms

02:26:38 that we keep Quansight Labs funded.

02:26:40 And currently, so I’m really excited about Labs

02:26:41 because it’s been a mission for a long time.

02:26:43 What kind of projects are within Labs?

02:26:45 So Labs is working to make the software better,

02:26:47 like make NumPy better, make SciPy better.

02:26:49 It only works on open source.

02:26:52 So if somebody wants to, so companies do,

02:26:55 we have a thing called a community work order, we call it.

02:26:57 If a company says, I wanna make Spyder better.

02:27:00 Okay, cool.

02:27:01 You can pay for a month of a developer of Spyder

02:27:05 or a developer of NumPy or a developer of SciPy.

02:27:08 You can’t tell them what you want them to do.

02:27:09 You can give them your priorities and things you wish existed

02:27:12 and they’ll work on those priorities with the community

02:27:16 to get what the community wants

02:27:17 and what emerges of what the community wants.

02:27:18 Is there some aspect on the consulting side

02:27:21 that is helping, as we were talking about morphology

02:27:24 and so on, is there specific application

02:27:26 that are particularly like driving,

02:27:29 sort of inspiring the need for updates to SciPy?

02:27:32 Correct, absolutely, absolutely.

02:27:33 GPUs are absolutely one of them.

02:27:34 And new hardware beyond GPUs.

02:27:36 I mean, Tesla’s Dojo chip, I’m hoping we’ll have a chance

02:27:39 to work on that perhaps.

02:27:42 Things like that are definitely driving it.

02:27:43 The other thing that’s driving it is scalable,

02:27:45 like speed and scale.

02:27:47 How do I write NumPy code or NumPy-like code

02:27:50 if I want it to run across a cluster?

02:27:52 That’s Dask or maybe it’s Ray.

02:27:54 I mean, there’s sort of ways to do that now.

02:27:56 Or there’s Modin and there’s, so Pandas code,

02:27:59 NumPy code, SciPy code, scikit-learn code

02:28:02 that I want to scale.

02:28:03 So that’s one big area.
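
The scaling problem described here usually comes down to splitting an array into chunks, computing partial results in parallel, and combining them. The sketch below shows that pattern using only the Python standard library; it is not the Dask, Ray, or Modin API, and the names `chunked`, `partial_stats`, and `chunked_mean` are assumptions made up for this illustration. Those tools apply the same idea across processes and machines, with scheduling handled for you.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, chunk_size):
    """Yield consecutive slices of at most chunk_size items."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def partial_stats(chunk):
    """Per-chunk work: return (sum, count) so results can be combined later."""
    return sum(chunk), len(chunk)


def chunked_mean(values, chunk_size=1000, workers=4):
    """Compute a mean by mapping over chunks and reducing the partial results."""
    if not values:
        raise ValueError("values must not be empty")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, chunked(values, chunk_size)))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


if __name__ == "__main__":
    data = list(range(10_000))
    print(chunked_mean(data))  # 4999.5
```

Returning a (sum, count) pair per chunk, rather than a per-chunk mean, keeps the final combination exact even when the last chunk is shorter than the others.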

02:28:04 Have you gotten a chance to chat with Andrej and Elon

02:28:08 about particular, because like.

02:28:09 No, I would love to, by the way.

02:28:11 I have not, but I’d love to.

02:28:12 I just saw their Tesla AI Day video.

02:28:15 Super excited.

02:28:16 That’s one of the, you know, I love great engineering,

02:28:18 software engineering teams and engineering teams in general.

02:28:21 And they’re doing a lot of incredible stuff with Python.

02:28:23 They’re like revolutionary.

02:28:25 So many aspects of the machine learning pipeline.

02:28:28 I agree.

02:28:29 That’s operating in the real world.

02:28:30 And so much of that is Python.

02:28:31 Like you said, the guy running, you know, Andrej Karpathy,

02:28:35 running Autopilot is tweeting about optimization

02:28:38 of NumPy versus.

02:28:41 I would love to talk to him.

02:28:42 In fact, we have at Quansight, we’ve been fortunate enough

02:28:45 to work with Facebook on PyTorch directly.

02:28:47 So we have about 13 developers at Quansight.

02:28:49 Some of them are in Labs working directly on PyTorch.

02:28:52 On PyTorch.

02:28:53 On PyTorch, right.

02:28:54 So I basically started Quansight.

02:28:55 I went to both TensorFlow and PyTorch and said,

02:28:57 hey, I want to help connect what you’re doing

02:29:00 to the broader SciPy ecosystem.

02:29:01 Because I see what you’re doing.

02:29:03 We have this bigger mission that we want to make sure

02:29:04 we don’t, you know, lose energy here.

02:29:06 So, and Facebook responded really positively

02:29:09 and I didn’t get the same reaction.

02:29:12 Not yet, not yet.

02:29:13 Not yet.

02:29:14 So I really love the folks at TensorFlow, too.

02:29:17 They’re fantastic.

02:29:18 I think it’s the, just how it integrates

02:29:21 with their business.

02:29:21 I mean, like I said, there’s a lot of reasons.

02:29:23 Just the timing, the integration with their business,

02:29:25 what they’re looking for.

02:29:27 They’re probably looking for more users.

02:29:28 And I was looking to kind of cut up some development effort

02:29:31 and they couldn’t receive that as easily, I think.

02:29:33 So I’m hoping, I’m really hopeful

02:29:36 and love the people there.

02:29:37 What’s the idea behind OpenTeams?

02:29:39 So OpenTeams, I’m super excited about OpenTeams

02:29:41 because it’s one of the,

02:29:43 I mentioned my idea for investing directly in open source.

02:29:46 So that’s a concept called FairOSS.

02:29:48 But one of the things we, when we started Quansight,

02:29:51 we knew we would do is we develop products and ideas

02:29:53 and new companies might come out.

02:29:55 At Anaconda, this was clear, right?

02:29:57 Anaconda, we did so much innovation

02:30:00 that like five or six companies could have come out of that.

02:30:02 And we just didn’t structure it so they could.

02:30:05 But in fact, they have, you look at Dask,

02:30:07 there’s two companies coming out of Dask.

02:30:08 You know, Bokeh could be a company.

02:30:10 There’s like lots of companies that could exist

02:30:11 off the work we did there.

02:30:13 And so I thought, oh, here’s a recipe for an incubation,

02:30:16 a concept that we could actually spawn new companies

02:30:19 and new innovations.

02:30:20 And then the idea has always been,

02:30:22 well, money they earn should come back

02:30:24 to fund the open source projects.

02:30:26 So labs is, you know, I think there should be

02:30:29 a lot of things like Quansight Labs.

02:30:30 I think this concept is one that scales.

02:30:32 You could have a lot of open source research labs.

02:30:35 Along the way, so in 2018, when the bigger idea came,

02:30:37 how to make open source investable, I said,

02:30:38 oh, I need to write, I need to create a venture fund.

02:30:41 So we created a venture fund called Quansight Initiate

02:30:43 at the same time.

02:30:44 It’s an angel fund, really.

02:30:45 It’s, you know, we started to learn that process.

02:30:47 How do we actually do this?

02:30:48 How do we get LPs?

02:30:49 How do we actually go in this direction and build a fund?

02:30:52 And I’m like, every venture fund should have

02:30:54 an associated open source research lab,

02:30:55 which is no reason.

02:30:56 Like our venture fund, the carried interest,

02:30:59 a portion of it goes to the lab.

02:31:01 It directly will fund the lab.

02:31:03 That’s fascinating, brother.

02:31:04 So you use the power of the organic formation of teams

02:31:06 in the open source community, and then like naturally,

02:31:10 that leads to a business that can make money.

02:31:13 Yeah, correct.

02:31:14 And then it always maintains and loops back

02:31:16 to the open source.

02:31:17 Loops back to open source, exactly.

02:31:18 I mean, to me, it’s a natural fit.

02:31:19 There’s something, there’s absolutely

02:31:20 a repeatable pattern there, and it’s also beneficial

02:31:23 because, oh, I have, I have natural connections

02:31:26 to the open source if I have an open source research lab.

02:31:29 Like, they’ll always, they’ll be out there

02:31:31 talking to people, and so we’ve had a chance

02:31:34 to talk to a lot of early stage companies.

02:31:35 And we, and our fund focuses on the early stage.

02:31:37 So Quansight has the services, the lab, the fund, right?

02:31:41 In that process, a lot of stuff started to happen.

02:31:44 They’re like, oh, you know, we started to do recruiting

02:31:46 and support and training, and I was starting

02:31:48 to build a bigger sales team and marketing team

02:31:50 and people besides just developers.

02:31:52 And one of the challenges with that

02:31:54 is you end up with different cultural aspects.

02:31:55 You know, developers, you know, there’s a,

02:31:58 in any company you go to, you kind of go look,

02:32:00 is this a business led company, a developer led company?

02:32:03 Do they kind of coexist?

02:32:04 Are they, what’s the interface between them?

02:32:06 There’s always a bit of a tension there.

02:32:07 Like we were talking about before.

02:32:08 You know, what is the tension there?

02:32:10 With OpenTeams, I thought, wait a minute,

02:32:11 we can actually just create,

02:32:13 like this concept of Quansight plus Labs,

02:32:15 it’s, well, it’s specific to the PyData ecosystem.

02:32:18 The concept is general for all open source.

02:32:20 So OpenTeams emerged as a, oh,

02:32:22 we can create a business development company

02:32:24 for many, many Quansights, like thousands of Quansights.

02:32:28 And it can be a marketplace to connect,

02:32:30 essentially be the enterprise software company

02:32:33 of the future.

02:32:34 If you look at what enterprise software wants

02:32:36 from the customer side, and during this journey,

02:32:38 I’ve had the chance to work and sell to lots of companies,

02:32:42 Exxon and Shell and J.P. Morgan, Bank of America,

02:32:45 like the Fortune 100,

02:32:46 and talk to a lot of people in procurement

02:32:48 and see what are they buying and why are they buying?

02:32:50 So, you know, I don’t know everything,

02:32:51 but I’ve learned a lot about,

02:32:52 oh, what are they really looking for?

02:32:54 And they’re looking for solutions.

02:32:56 They’re constantly given products

02:32:58 from enterprise software.

02:33:01 Here’s open source, leave the enterprise software,

02:33:02 now I buy it.

02:33:03 And then they have to stitch it together into a solution.

02:33:05 Open source is fantastic for gluing

02:33:07 those solutions together.

02:33:08 So, whereas they keep getting new platforms

02:33:11 they’re trying to buy,

02:33:12 but most open source, what most enterprises want

02:33:15 is tools that they can customize

02:33:16 that are as inexpensive as they can.

02:33:18 Yeah, and so you always want to maintain

02:33:20 the connection to the open source

02:33:21 because that’s going to be the tools.

02:33:22 Yes, so OpenTeams is about solving

02:33:24 enterprise software problems.

02:33:26 Brilliant, brilliant idea, by the way.

02:33:28 With a connect, but we do it honoring the topology.

02:33:30 We don’t hire all the people.

02:33:32 We are a network connecting the sales energy

02:33:35 and the procurement energy,

02:33:36 and we work on the business side,

02:33:37 get the deals closed,

02:33:39 and then have a network of partners

02:33:40 like Quansight and others who we hand the deals to,

02:33:44 to actually do the work.

02:33:44 And then we have to maintain,

02:33:46 I feel like we have to maintain

02:33:47 some level of quality control

02:33:48 so that the client can rely on OpenTeams

02:33:50 to ensure the delivery.

02:33:52 It’s not just, here’s a lead, go figure that out.

02:33:54 But no, we’re going to make sure you get what you need.

02:33:57 By the way, it’s such a skill,

02:33:58 and I don’t know if I have the patience.

02:34:00 I will have the patience to talk to the business people

02:34:04 or more specific, I mean,

02:34:05 there’s all kinds of flavors of business people

02:34:07 or like marketing people.

02:34:11 There’s a challenge.

02:34:12 I hear what you’re saying

02:34:13 because I’ve had the same challenge.

02:34:14 And it’s true.

02:34:15 There’s sometimes you think, okay, this is way overwrought.

02:34:18 Yeah, but you have to become an adult

02:34:20 and you have to, because the companies have needs.

02:34:22 They have ways to make money

02:34:24 and they also want to learn and grow,

02:34:26 and it’s your job to kind of educate them on the best way,

02:34:28 like the value of open source, for example.

02:34:31 Right, and I’m really grateful for all my experiences

02:34:32 over the past 14 years, understanding that side of it

02:34:35 and still learning for sure,

02:34:37 but not just understanding from companies,

02:34:38 but also dealing with marketing professionals

02:34:40 and sales professionals

02:34:41 and people that make a career out of that

02:34:43 and understanding what they’re thinking about

02:34:44 and also understanding, well, let’s make this better.

02:34:46 We can really make a place.

02:34:48 OpenTeams I see as the transmission layer

02:34:50 between companies and open source communities

02:34:53 producing enterprise software solutions.

02:34:55 Eventually we want to,

02:34:56 today we’re taking on SAS and MATLAB

02:34:59 and tools that we know we can replace for folks.

02:35:01 Really, anytime you have a software tool at an organization

02:35:04 where you have to do a lot of customization

02:35:06 to make it work for you.

02:35:07 It’s not you’re just buying this thing off the shelf

02:35:09 and it works.

02:35:09 It’s like, okay, you buy this system

02:35:11 and then you customize it a lot,

02:35:12 usually with expensive consultants

02:35:15 to actually make it work for you.

02:35:17 All of those should be replaced by open source foundations

02:35:19 with the same customization.

02:35:20 You’re doing such important work,

02:35:22 such important work in these giant organizations

02:35:25 that do exactly that,

02:35:26 taking some proprietary software

02:35:28 and hiring a huge team of consultants

02:35:30 that customize it and then that whole thing

02:35:32 gets outdated quick.

02:35:33 Correct.

02:35:34 And so, I mean, that’s brilliant.

02:35:36 So the one solution to that

02:35:39 is kind of what Tesla’s doing a little bit of,

02:35:43 which is basically build up a software engineering team.

02:35:46 Like build a team from scratch.

02:35:48 Build a team from scratch.

02:35:49 And companies are doing it well,

02:35:50 that’s what they’re doing right now.

02:35:50 Yeah, exactly.

02:35:51 And that’s okay.

02:35:52 And you’re creating a topology for some of that.

02:35:54 You’re right.

02:35:55 You just don’t have to do it.

02:35:56 That’s not the only answer, right?

02:35:57 And so other companies can access this,

02:35:58 be more accessible.

02:35:59 We literally say,

02:36:01 OpenTeams is the future of enterprise software.

02:36:03 We’re still early.

02:36:04 Like this idea just percolated over the past year

02:36:07 as we’ve kind of grown Quansight

02:36:08 and realized the extensibility of it.

02:36:10 We just finished our seed round

02:36:13 to help get more sales people

02:36:15 and then push the messaging correctly.

02:36:17 And there’s lots of tools we’re building

02:36:19 to make this easier.

02:36:20 Like we wanna automate the processes.

02:36:21 We feel like a lot of the power

02:36:23 is the efficiency of the sales process.

02:36:25 There’s a lot of wasted energy in small teams

02:36:29 and the sales energy to get into large companies

02:36:31 and make a deal.

02:36:32 There’s a lot of money spent on that process.

02:36:34 Creating the tools and processes for that sales.

02:36:36 So make that super seamless.

02:36:38 So a single company can go,

02:36:39 oh, I’ve got my contract with OpenTeams.

02:36:41 We’ve got a subscription they can get.

02:36:43 They can make that procurement seamless.

02:36:45 And then the fact they have access

02:36:46 to the entire open source ecosystem.

02:36:48 And we have a part of our work

02:36:51 that’s embracing open source ecosystems

02:36:53 and making sure we’re doing things useful for them

02:36:55 or serving them.

02:36:56 And then companies making sure

02:36:57 they’re getting solutions they care about.

02:36:59 And then figuring out which targets we have.

02:37:02 We’re not taking on all of open source,

02:37:04 all of enterprise software yet.

02:37:06 But we’re step by step.

02:37:07 Well this feels like the future.

02:37:08 The idea and the vision is brilliant.

02:37:10 Can I ask you, why do you think Microsoft bought GitHub

02:37:14 and what do you think is the future of GitHub?

02:37:16 Great point.

02:37:17 I thought it was a brilliant move.

02:37:18 I think they did because Microsoft has always

02:37:20 had a developer centric culture.

02:37:22 Like they always have.

02:37:23 Like one of the things Microsoft’s always done well

02:37:25 is understand that their power is the developers.

02:37:27 It’s been, Ballmer didn’t necessarily make a good meme

02:37:31 about how he approached that.

02:37:32 But they’re broadening that.

02:37:34 I think that’s why.

02:37:35 Because they recognize GitHub is where developers are at.

02:37:38 Right?

02:37:38 And so.

02:37:39 But do they have a vision like an OpenTeams

02:37:41 type of situation, right?

02:37:41 I don’t think so yet.

02:37:43 Are they just basically throwing money at developers

02:37:46 to show their support?

02:37:47 I think so.

02:37:48 Without a topology like you put it.

02:37:50 Like a way to leverage that.

02:37:53 Like to give developers actual money.

02:37:55 Right.

02:37:56 I don’t think so.

02:37:57 They’re still, it’s an enterprise software company.

02:37:59 And they make a bunch of money.

02:38:00 They make a bunch of games.

02:38:01 They’re a big company.

02:38:02 They sell products.

02:38:03 I think part of it is they know there’s opportunity

02:38:06 to make money from GitHub.

02:38:07 Right?

02:38:08 There’s definitely a business there.

02:38:09 You know, to sell to developers.

02:38:11 Or to sell to people using development.

02:38:13 I think there’s part of that.

02:38:14 I think part of it is also there’s,

02:38:15 they had definitely wanted to recognize

02:38:18 that you need to value open source

02:38:20 to get great developers.

02:38:21 Which is an important concept that was emerging

02:38:24 over the past 10 years.

02:38:25 That, you know, at PyData,

02:38:28 we were able to convince J.P. Morgan

02:38:29 to support PyData because of that fact.

02:38:31 Right?

02:38:32 That was where the money for them putting

02:38:33 a couple hundred thousand into supporting PyData

02:38:35 for several conferences was they want developers.

02:38:37 And they realized that developers want

02:38:39 to participate in open source.

02:38:40 So enterprise software folks don’t always understand

02:38:43 how their software gets used.

02:38:44 Having spent a lot of time on the floors

02:38:46 at J.P. Morgan, at Shell, at ExxonMobil,

02:38:49 you see, oh, these companies have large development teams.

02:38:52 And then they’re kind of dealing with

02:38:55 what’s being delivered to them.

02:38:56 So I really feel kind of a privilege

02:38:58 that I had a chance to learn from some of these people

02:39:00 and see what they’re doing.

02:39:01 And even work alongside them, you know,

02:39:04 as a consultant, using open source and trying to figure,

02:39:07 how do we make this work inside of our large organization?

02:39:09 Some of it is actually, for a large organization,

02:39:13 some of it is messaging to the world

02:39:14 that you care about developers

02:39:16 and you’re the cool, you care.

02:39:18 Like, for example, like if Ford,

02:39:21 cause I talked to them, like car companies, right?

02:39:23 They want to attract, you know,

02:39:26 you want to take on Tesla and autopilot.

02:39:28 You want to take on, right?

02:39:29 And so what do you do there?

02:39:31 You show that you’re cool.

02:39:32 Like you try to show off that you care about developers

02:39:36 and they have a lot of trouble doing that.

02:39:39 And like one way, I think like Ford should have bought GitHub.

02:39:42 They just to show off, like these old school companies

02:39:46 and it’s in a lot of different industries.

02:39:49 There’s probably different ways.

02:39:51 It’s probably an art to show that you care about developers.

02:39:54 And the developers, it’s exactly what you, like,

02:39:57 for example, just spit balling here,

02:40:00 but like Ford or somebody like that

02:40:02 could give a hundred million dollars

02:40:05 to the development of NumPy.

02:40:07 And like literally look at like the top most popular projects

02:40:13 in Python and just say, we’re just going to give money.

02:40:17 Like that’s going to immediately make you cool.

02:40:20 They could actually, yeah.

02:40:21 And in fact, they set up NumFOCUS to make it easy.

02:40:24 But the challenge was,

02:40:26 is also you have to have some business development.

02:40:28 Like it’s a bit of a seeding problem, right?

02:40:31 And you look at how,

02:40:32 I’ve talked to the folks at Linux Foundation,

02:40:33 know how they’re doing it.

02:40:34 I know how, and starting NumFOCUS,

02:40:36 because we had two babies in 2012.

02:40:39 One was Anaconda, one was NumFOCUS, right?

02:40:41 And they were both important efforts.

02:40:42 They had distinct journeys

02:40:44 and super grateful that both existed

02:40:46 and still grateful both exist.

02:40:48 But there’s different energies in getting donations

02:40:51 as there is getting, this is important to my business.

02:40:55 Like I’m selling you something that this is a,

02:40:58 I’m going to make money this way.

02:41:00 Like if you can tie it,

02:41:01 if you can tie the message to an ROI for the company,

02:41:04 it becomes a no-brainer.

02:41:04 That’s more effective.

02:41:05 It’s much more effective, right?

02:41:06 So, and there are rational arguments to make.

02:41:09 I’ve tried to have conversations with marketing,

02:41:11 especially marketing departments.

02:41:12 Like very early on, it was clear to me that,

02:41:14 oh, you could just take a fraction of your marketing budget

02:41:18 and just spend it on open source development.

02:41:20 And you get better results from your marketing.

02:41:23 Like, because.

02:41:24 How did those, can I, sorry,

02:41:26 I’m going to try not to go on rants here.

02:41:27 What have you learned from the interaction

02:41:29 with the marketing folks on that kind of,

02:41:31 because you gave a great example

02:41:34 of something that will obviously be much better investment

02:41:37 in terms of marketing is supporting open source projects.

02:41:40 The challenge is not dissimilar

02:41:41 from the challenge you have in academia

02:41:44 or the different colleges, right?

02:41:46 Knowledge gets very specific and very channeled, right?

02:41:50 And so people get,

02:41:51 they get a lot of learning in the thing they know about.

02:41:53 And it’s hard then to bridge that

02:41:56 and to get them to think differently enough

02:41:58 to have a sense that you might have something to offer

02:42:02 because it’s different.

02:42:03 It’s like, well, how do I implement that?

02:42:04 How do I, what do I do with that?

02:42:05 Like, do I, which budget do I take from?

02:42:07 Do I slow down my spend on Google ads

02:42:10 or my spend on Facebook ads?

02:42:11 Or do I not hire a content creator and say like,

02:42:14 there’s an operational aspect to that,

02:42:16 that you have to be the CMO, right?

02:42:19 Or the CEO, you have to get the right level.

02:42:21 So you’ll have to hire at a high position level

02:42:24 where they care about this and this.

02:42:25 Right, or they won’t know how, right?

02:42:27 And because you can also do it very clumsily, right?

02:42:30 And I’ve seen it, cause you can,

02:42:32 you absolutely have to honor and recognize

02:42:33 the people you’re going to and the fact

02:42:36 that if you just throw money at them,

02:42:37 it could actually create more problems.

02:42:39 Can I just say, this is not you saying, can I just,

02:42:41 cause I just need, I need to say this.

02:42:44 I’ve been very surprised how often marketing people

02:42:49 are terrible at marketing.

02:42:51 I feel like the best marketing is doing something novel

02:42:55 and unique that anticipates the future.

02:42:58 It feels like so much of the marketing practice

02:43:01 is like what they took in school,

02:43:04 or maybe they’re studying for what was the best thing

02:43:06 that was done in the past decade,

02:43:08 and they’re just repeating that over and over,

02:43:10 as opposed to innovating, like taking the risk.

02:43:13 To me, marketing.

02:43:14 That’s a great point.

02:43:15 Is taking the big risk.

02:43:17 That’s a great point.

02:43:17 And being the first one to risk.

02:43:18 Yeah, there’s an aspect of data observation

02:43:21 from that risk, right?

02:43:22 That’s, I think, shared what they’re doing already.

02:43:25 But it absolutely, it’s about, I think it’s content.

02:43:27 Like there’s this whole world on content marketing

02:43:30 that you could almost say, well, yeah, it can get over,

02:43:33 you can get inundated with stuff

02:43:35 that’s not relevant to you.

02:43:36 Whereas what you’re saying would be highly relevant

02:43:39 and highly useful and highly beneficial.

02:43:41 Yeah, but it’s risk.

02:43:42 I mean, that’s why I sort of,

02:43:44 there’s a lot of innovative ways of doing that.

02:43:46 Tesla’s an example of people

02:43:48 that basically don’t do marketing.

02:43:49 They do marketing in a very, like,

02:43:52 let’s say Elon hired a person who’s just good at Twitter

02:43:55 for running Tesla’s Twitter account.

02:43:57 No, right, right.

02:43:59 I mean, that’s exactly what you wanna be doing.

02:44:00 You want it to be constantly innovating in the.

02:44:03 Right, there’s an aspect of telling.

02:44:04 I mean, I’ve definitely seen people doing great work

02:44:06 where you’re not talking about it.

02:44:08 Like, I would say that’s actually a problem

02:44:09 I have right now with Quansight Labs.

02:44:11 Quansight Labs has been doing amazing work,

02:44:12 really excited about it,

02:44:13 but we have not been talking about it enough.

02:44:15 We haven’t been.

02:44:16 And there’s different ways to talk about it.

02:44:17 There’s different ways to,

02:44:18 there’s different channels to which to communicate.

02:44:20 There’s also, like, I’ll just throw some shade

02:44:25 at companies I love.

02:44:27 So for example, iRobot,

02:44:29 I just had a conversation with them.

02:44:30 They make Roombas.

02:44:31 Sure.

02:44:32 And I think I love, they’re incredible robots,

02:44:35 but like every time they do like advertisement,

02:44:38 not advertisement, but like marketing type stuff,

02:44:41 it just looks so corporate.

02:44:44 And to me, the incredible,

02:44:47 maybe wrong in the case of iRobot, I don’t know.

02:44:50 But to me, when you’re talking about engineering systems,

02:44:54 it’s really nice to show off the magic of the engineering

02:44:57 and the software and all the geniuses behind this product

02:45:02 and the tinkering and like the raw authenticity

02:45:05 of what it takes to build that system

02:45:06 versus the marketing people who want to have like

02:45:09 pretty people, like standing there all pretty

02:45:12 with the robots, like moving perfectly.

02:45:14 So to me, there’s some aspect,

02:45:16 it’s like speaking to the hackers,

02:45:18 you have to throw some bones,

02:45:21 some care towards the engineers, the developers,

02:45:25 because there’s some aspect, one, for the hiring,

02:45:28 but two, there’s an authenticity to that,

02:45:31 authenticity to that kind of communication

02:45:33 that’s really inspiring to the end user as well.

02:45:36 Like if they know that brilliant people,

02:45:38 the best in the world are working at your company,

02:45:40 they start to believe that that product

02:45:42 that you’re creating is really good.

02:45:43 It’s interesting, because your initial reaction would be,

02:45:45 wait, there’s different users here.

02:45:46 Why would you do that to, you know,

02:45:48 my wife bought a Roomba, and she loves developers,

02:45:52 she loves me, but she doesn’t care about that culture.

02:45:56 So essentially what you said is actually the authenticity,

02:45:59 because everyone has a friend, everyone knows people,

02:46:01 there’s word of mouth, I mean, if you.

02:46:02 Word of mouth is so, so powerful.

02:46:04 Yeah, exactly, that’s interesting.

02:46:05 Because I think it’s the lack of that realization,

02:46:07 there’s this halo effect that influences

02:46:09 your general marketing, interesting.

02:46:11 For some stupid reason, I do have a platform,

02:46:14 and it seems that the reason I have a platform,

02:46:16 many others like me, millions of others,

02:46:19 is like the authenticity,

02:46:21 and like we get excited naturally about stuff.

02:46:23 And like, I don’t want to get excited

02:46:25 about that iRobot video,

02:46:27 because it’s boring, it’s marketing, it’s corporate,

02:46:30 as opposed to, I wanted to do some fun,

02:46:33 this is me, like a shout out to iRobot,

02:46:36 is they’re not letting me get into the robot.

02:46:39 Yeah, well there’s an aspect of,

02:46:40 that could be benefiting from a culture of modularity,

02:46:44 like add-ons, and that could actually dramatically help.

02:46:47 You’ve seen that over history,

02:46:49 I mean, Apple is an example of a company like that,

02:46:51 or the, like, I can see what your point is,

02:46:54 is that you have something that needs to be,

02:46:56 it needs to be adopted broadly,

02:46:58 the concept needs to be adopted broadly.

02:47:00 And if you want to go beyond this one device,

02:47:01 you need to engage this community.

02:47:04 Yeah, and connecting to the open source that you said.

02:47:07 I gotta ask you,

02:47:09 you’re a programmer,

02:47:11 one of the most impactful programmers ever.

02:47:14 You’ve led many programmers, you lead many programmers.

02:47:18 What are some, from a programmer perspective,

02:47:21 what makes a good programmer?

02:47:23 What makes a productive programmer?

02:47:25 Is there advice you can give

02:47:27 to be a great programmer in this world?

02:47:28 That’s a great, great question.

02:47:30 And there are times in my life

02:47:31 I’d probably answer this even better

02:47:32 than I hope maybe give an answer today.

02:47:35 Because I thought about this numerous times,

02:47:36 like right now I’ve spent so much time

02:47:38 recently hiring salespeople that,

02:47:41 That your mind is a little bit on something else.

02:47:43 On something else.

02:47:44 But I reflected on the past,

02:47:46 and also, you know, I have some really,

02:47:48 the only way I can do this,

02:47:49 is I have some really great programmers that I work with,

02:47:51 who lead the teams that they lead.

02:47:53 And my goal is to inspire them and hopefully help them,

02:47:56 encourage them, and be,

02:47:57 help them encourage with their teams.

02:47:59 I would say there’s a number of things, couple things.

02:48:01 One is curiosity.

02:48:03 Like you, I think a programmer without curiosity

02:48:07 is mundane.

02:48:09 Like you’ll lose interest, you won’t do your best work.

02:48:12 So it’s sort of, it’s an affect.

02:48:13 It’s sort of, are you,

02:48:14 you have some curiosity about things.

02:48:16 I think two, don’t try to do everything at once.

02:48:19 Recognize that you’re, you know, we’re limited as humans.

02:48:21 You’re limited as a human.

02:48:23 And each one of us are limited in different ways.

02:48:24 You know, we all have our different strengths and skills.

02:48:26 So it’s adapting the art of programming to your skills.

02:48:29 One of the things that always works,

02:48:31 is to limit what you’re trying to solve.

02:48:33 Right, so, if you’re part of a team,

02:48:36 usually maybe somebody else has put the architecture together

02:48:38 and they’ve given a portion to you if you’re young.

02:48:41 If you’re not part of a team,

02:48:43 it’s sort of breaking down the problem into smaller parts,

02:48:46 is essential for you to make progress.

02:48:48 It’s very easy to take on a big project

02:48:50 and try to do it all at once, and you get lost.

02:48:52 And then you do it badly.

02:48:53 And so thinking about, you know,

02:48:57 very concretely what you’re doing,

02:48:59 defining the inputs and outputs,

02:49:01 defining what you want to get done.

02:49:03 Even just talking about that and like writing down

02:49:07 before you write code, just what are you trying to accomplish?

02:49:09 I mean, very specific about it, really, really helps.
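
One way to picture "defining the inputs and outputs" before writing code is to write the signature, the docstring, and one or two expected results first, and only then fill in the body. The sketch below is an editorial illustration, not something from the conversation; the function name `moving_average` and the sample values are hypothetical.

```python
from statistics import mean


def moving_average(values: list[float], window: int) -> list[float]:
    """Return the mean of each consecutive `window`-sized slice of `values`.

    Input:  a list of numbers and a positive window size.
    Output: a list of len(values) - window + 1 averages (empty if too short).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


# The written-down expectations, checked before building anything larger on top.
assert moving_average([1.0, 2.0, 3.0, 4.0], 2) == [1.5, 2.5, 3.5]
assert moving_average([1.0], 2) == []
```

Writing the docstring and the two assertions first keeps the problem small and specific, which is the habit being described.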

02:49:12 I think using other people’s work, right?

02:49:17 Don’t be afraid that somehow you’re,

02:49:20 like you should do it all.

02:49:21 Like, nobody does.

02:49:23 Stand on the shoulders of giants.

02:49:25 And copy and paste from Stack Overflow.

02:49:26 Copy and paste from Stack Overflow.

02:49:28 But don’t just copy and paste,

02:49:30 this is particularly relevant in the era of Codex

02:49:31 and the auto generated code, which is essentially,

02:49:34 I see as an indexing of Stack Overflow.

02:49:36 Right, exactly.

02:49:37 Secondly, it’s like.

02:49:38 It’s a search engine.

02:49:39 It’s a search engine over Stack Overflow, basically.

02:49:41 So it’s not, I mean, we’ve had this for a while.

02:49:43 But really, you want to cut and paste, but not blindly.

02:49:47 Like, absolutely I’ve cut and paste to understand,

02:49:51 but then you understand.

02:49:52 Oh, this is what this means.

02:49:53 Oh, this is what it’s doing.

02:49:54 And understand as much as you can.

02:49:56 So it’s critical, that’s where the curiosity comes in.

02:49:59 If you’re just blindly cutting and pasting,

02:50:01 you’re not gonna understand.

02:50:02 So understand, and then be sensitive to hype cycles.

02:50:08 Right, every so often there’s always a,

02:50:10 oh, test-driven development is the answer.

02:50:12 Oh, object-oriented is the answer.

02:50:13 Oh, there’s always an answer.

02:50:16 Agile is the answer.

02:50:18 Be cautious of jumping onto a hype cycle.

02:50:20 Like, likely there’s signal.

02:50:22 Like, there’s a thing there

02:50:23 that’s actually valuable, you can learn from.

02:50:25 But it’s almost certainly not the answer

02:50:27 to everything you need.

02:50:28 What lessons do you draw

02:50:30 from you having created NumPy and SciPy?

02:50:34 Like, in service of sort of answering the question

02:50:37 of what it takes to be a great programmer

02:50:38 and giving advice to people.

02:50:40 How can you be the next person to create a SciPy?

02:50:42 Yeah, so one is listen.

02:50:45 To?

02:50:46 Listen.

02:50:47 To who?

02:50:48 To people that have a problem, right?

02:50:51 Which is everybody, right?

02:50:52 But listen, and listen to many.

02:50:54 And then try to, and then do.

02:50:57 Like, you’re gonna have to do an experiment, you know?

02:50:59 Do, fall down, don’t be afraid to fall down.

02:51:01 Don’t be afraid, the first thing you do

02:51:04 is probably gonna suck, and that’s okay, right?

02:51:07 It’s honestly, I think iteration is the key to innovation.

02:51:11 And it’s almost that psychological hesitation we have

02:51:16 to just iterate.

02:51:18 Like, yeah, we know it’s not great,

02:51:20 but next time it’ll be better.

02:51:22 I mean, just keep learning and keep improving.

02:51:25 So it’s an attitude.

02:51:27 And then it does take intense concentration, right?

02:51:32 Good things don’t happen just,

02:51:34 it’s not quite like TikTok or like Facebook, you know?

02:51:38 You can’t scroll your way to good programming, right?

02:51:40 There are sincere hours of deep,

02:51:44 don’t be afraid of the deep problem.

02:51:46 Like, often people will run away from something

02:51:47 because, oh, I can’t solve this.

02:51:49 And you might be right, but give it an hour.

02:51:51 Give it a couple of hours and see.

02:51:53 And just five minutes, not gonna give you that.

02:51:56 Was it lonely when you were building SciPy and NumPy?

02:52:00 Hugely, yeah, absolutely lonely,

02:52:02 in the sense of you had to have an inner drive,

02:52:05 and that inner drive for me always comes from,

02:52:08 I have to see that this is right in some angle.

02:52:11 I have to believe it, that this is the right approach,

02:52:13 the right thing to do.

02:52:14 With SciPy, it was like, oh yeah,

02:52:16 the world needs libraries in Python.

02:52:19 Clearly Python’s popular enough

02:52:20 with enough influential people to start,

02:52:22 and it needs more libraries.

02:52:24 So that is a good in and of itself.

02:52:26 So I’m gonna go do that good.

02:52:28 So find a good, find a thing that you know is good

02:52:30 and just work on it.

02:52:33 So that has to happen, and it is.

02:52:34 And you kind of have to have enough realization

02:52:37 of your mission to be okay with the naysayer

02:52:40 or the fact that not everybody joins you up front.

02:52:42 In fact, one thing I’ve talked to people a lot,

02:52:43 I’ve seen a lot of projects come, and some fail.

02:52:45 Not everything I’ve done has actually worked perfectly.

02:52:47 I’ve tried a bunch of stuff that, okay,

02:52:49 that didn’t really work, or this isn’t working, and why.

02:52:51 But you see the patterns, and one of the key things is

02:52:55 you can’t even know for six months.

02:52:59 I say 18 months right now.

02:53:00 If you’re starting a new project,

02:53:01 you gotta give it a good 18-month run

02:53:03 before you even know if the feedback’s there.

02:53:05 You’re not gonna know in six months.

02:53:07 You might have the perfect thing,

02:53:08 but six months from now, it’s still kind of still emerging.

02:53:11 So give it time, because you’re dealing with humans,

02:53:13 and humans have an inertial energy

02:53:15 that just doesn’t change that quickly, so.

02:53:18 Let me ask a silly question, but like you said,

02:53:23 you’re focused on the sales side of things currently,

02:53:26 but back when you were actively programming,

02:53:28 maybe in the 90s, you talked about IDEs.

02:53:31 What’s a setup that you have that brings you joy?

02:53:36 Keyboard, number of screens, Linux.

02:53:39 I do still like to program some.

02:53:40 It’s not as much as I used to.

02:53:42 I have two projects I’m super interested in,

02:53:44 trying to find funding for them,

02:53:45 trying to figure out teams for them,

02:53:47 but I could talk about those.

02:53:49 But what I, yeah, I’m an Emacs guy.

02:53:51 Great, thank the superior editor, everybody.

02:53:56 I’ve got, I don’t often delete tweets,

02:53:59 but one of the tweets I deleted

02:54:00 when I said Emacs was better than Vim,

02:54:02 and then the hate I got from it.

02:54:04 It is.

02:54:05 I was like, I’m walking away from this.

02:54:07 I do too, I don’t push it.

02:54:09 I mean, I’m not.

02:54:10 I’m just joking, of course.

02:54:11 Yeah, exactly, it’s kind of like,

02:54:12 but people do take the editor seriously, right?

02:54:14 I did it as a joke.

02:54:15 That’s your life.

02:54:16 It is, but there’s something beautiful to me about Emacs,

02:54:20 but for people that love Vim,

02:54:22 there’s something beautiful to them about that.

02:54:23 There is.

02:54:24 I mean, I do use Vim for quick editing.

02:54:26 Like at the command line, if I need quick editing,

02:54:27 I will still sometimes use it, but not much.

02:54:30 Like it’s a simple, correct-a-single-character edit.

02:54:32 So when you were developing SciPy, you were using Emacs?

02:54:34 Emacs, yeah.

02:54:35 SciPy and NumPy are all written in Emacs on a Linux box.

02:54:39 And CVS and then SVN, version control.

02:54:43 Git came later.

02:54:44 Like Git has, I love distributed branch stuff.

02:54:48 I think Git is pretty complicated, but I love the concept.

02:54:51 And also, of course, GitHub and then GitLab

02:54:55 make Git definitely consumable, but that came later.

02:54:59 Did you ever touch Lisp at all?

02:55:00 Like what were your emotional feelings

02:55:03 about all the parentheses?

02:55:04 Yeah, so great question.

02:55:05 So I find myself appreciating Lisp today

02:55:08 much more than I did early.

02:55:09 Because when I came to programming, I knew programming,

02:55:11 but I was a domain expert, right?

02:55:13 And to me, the parentheses were in the way.

02:55:15 It’s like, wow, there’s just all this,

02:55:17 like it just gets in the way of my thinking

02:55:19 about what I’m doing.

02:55:20 So why would I have all these, right?

02:55:22 That was my initial reaction to it.

02:55:24 And now as I appreciate kind of the structure

02:55:27 that kind of naturally maps to a logical thinking

02:55:30 about a program, I can appreciate them, right?

02:55:33 And why it’s actually, you could create editors

02:55:35 that make it not so problematic, right, honestly.

02:55:40 So I actually have a much more appreciation of Lisp

02:55:43 and things like Clojure and there’s Hy,

02:55:44 which is a Python Lisp that compiles to Python bytecode.
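
The remark that Lisp's parentheses map naturally onto the logical structure of a program can be sketched in a few lines of Python. This is an editorial illustration only: the toy reader and evaluator below are hypothetical, handle just four arithmetic operators, and are not how Hy or any real Lisp is implemented.

```python
import operator

# Hypothetical mini-Lisp for illustration: only four arithmetic operators.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def tokenize(source):
    """Split source text into parentheses and atoms."""
    return source.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Turn a flat token list into nested Python lists (the program tree)."""
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # discard the closing ")"
        return node
    try:
        return float(token)
    except ValueError:
        return token  # an operator symbol such as "+"


def evaluate(node):
    """Evaluate a parsed tree: the first element of a list names the operation."""
    if isinstance(node, list):
        op = OPS[node[0]]
        args = [evaluate(child) for child in node[1:]]
        result = args[0]
        for arg in args[1:]:
            result = op(result, arg)
        return result
    return node


tree = parse(tokenize("(+ 1 (* 2 3))"))
print(tree)            # ['+', 1.0, ['*', 2.0, 3.0]]
print(evaluate(tree))  # 7.0
```

Each pair of parentheses becomes exactly one nested list, so the written form and the tree the evaluator walks are the same shape.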

02:55:48 I think it’s challenging.

02:55:50 Like typically these languages are,

02:55:53 I even saw the whole data science programming system

02:55:55 in Lisp that somebody created, which is cool.

02:55:58 But again, I think it’s the lack of recognition

02:56:00 of the fact that there exists

02:56:02 what I call occasional programmers.

02:56:04 People that are never gonna be programmers for a living.

02:56:05 They don’t want to have all this cuteness in their head.

02:56:08 They want just, it’s why BASIC, you know,

02:56:11 Microsoft had the right idea with BASIC

02:56:14 in terms of having that be the language of Visual Basic,

02:56:17 the language of Excel and SQL Server.

02:56:21 They should have converted that to Python 10 years ago.

02:56:23 Like the world would be a better place if they had, but.

02:56:27 There’s also, there’s a beauty and a magic

02:56:29 to the history behind a language in Lisp.

02:56:31 You know, some of the most interesting people

02:56:34 in the history of computer science

02:56:35 and artificial intelligence have used Lisp.

02:56:37 So you feel.

02:56:40 Well, especially that language,

02:56:41 when you have a language, you can think in it.

02:56:43 And it helps you think better.

02:56:44 And it attracts a certain kind of people

02:56:45 that think in a certain kind of way.

02:56:46 And then that’s there.

02:56:48 Okay, so what about like small laptop with a tiny keyboard,

02:56:52 or is there like three screens?

02:56:55 You know, good question.

02:56:55 I’ve never gotten into the big, many screens to be honest.

02:56:58 I mean, and maybe it’s because in my head,

02:57:00 I kind of just, I just swap between windows.

02:57:03 Like, partly because I guess I really can’t process

02:57:07 three screens at once anyway.

02:57:09 Like, I just am looking at one and I just flip.

02:57:12 You know, I flip an application open.

02:57:14 So where it’s really helpful is actually

02:57:17 when I’m trying to do, you know,

02:57:18 here’s data and I want to input it from here.

02:57:20 Like this is the only time I really need another screen.

02:57:22 So now, because you’re both a developer, you lead developers,

02:57:25 but then there’s also these businesses

02:57:27 and there’s salespeople and you’re working

02:57:30 with large companies.

02:57:30 Operations people, hiring people, yeah.

02:57:32 The whole thing.

02:57:33 Which operating system is your favorite at this point?

02:57:37 So Linux was the early days.

02:57:38 So yeah, I love Linux as a server side.

02:57:41 And it was early days I had my own Linux desktop.

02:57:44 I’ve been on Mac laptops for 10 years now.

02:57:47 Yeah, this is what leadership looks like.

02:57:50 As you switch to Mac.

02:57:52 Okay, great.

02:57:53 Pretty much, I mean, just the fact that I had

02:57:56 to do PowerPoints, I had to do presentations

02:57:58 and you know, plug in, I just couldn’t mess

02:58:01 with plugging in laptops, it wouldn’t project and yeah.

02:58:04 So you mentioned also Quansight Labs and things like that.

02:58:09 Can you give advice on how to hire great programmers

02:58:13 and great people?

02:58:14 Yeah, I would say, produce an open source project,

02:58:19 get people contributing to it and hire those people.

02:58:21 Yeah, I mean, you’re doing it sort of,

02:58:25 you may be perhaps a little biased,

02:58:27 but that’s probably 100% really good advice.

02:58:30 I find it hard to hire.

02:58:31 I still find it hard to hire, like in terms of,

02:58:34 I don’t think that it’s hard to hire

02:58:36 if I’ve worked with somebody for a couple of weeks,

02:58:39 but an hour or two of interviews, I have no idea.

02:58:43 So that instinct, that radar of knowing if you’re good

02:58:47 or not, that you’ve found that you’re still not able to.

02:58:50 It’s really hard, I mean, the resume can help,

02:58:53 but again, the resume is like a presentation

02:58:55 of the things they want you to see, not the reality of,

02:58:58 and there’s also, you have to understand

02:59:02 what you’re hiring for.

02:59:03 There are different stages and different kinds of skills.

02:59:06 And so it isn’t just, one of the things I talk a lot about

02:59:10 internally at my company is just that the whole idea

02:59:14 of measuring ourselves against a single axis is flawed

02:59:18 because we’re not, it’s a multidimensional space

02:59:20 and how do you order a multidimensional space?

02:59:22 There isn’t one ordering.
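
The claim that a multidimensional space has no single ordering can be made concrete with a small sketch. This is an editorial illustration under stated assumptions: the two axes and the scores for candidates A, B, and C are invented, and the comparison used (one point "dominates" another only if it is at least as good on every axis and strictly better on one) is one common way to compare such points, not something described in the conversation.

```python
def dominates(a, b):
    """True if a is at least as good as b on every axis and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


# Hypothetical scores on two axes: (open source experience, product experience).
candidates = {
    "A": (9, 3),
    "B": (4, 8),
    "C": (3, 2),
}


def comparison(name1, name2):
    """Describe how two candidates compare, if they can be compared at all."""
    a, b = candidates[name1], candidates[name2]
    if dominates(a, b):
        return f"{name1} > {name2}"
    if dominates(b, a):
        return f"{name2} > {name1}"
    return f"{name1} and {name2} are incomparable"


print(comparison("A", "C"))  # A > C
print(comparison("A", "B"))  # A and B are incomparable
```

A and B only become rankable once the scores are projected onto a single axis or a weighted sum, and that choice of projection depends on what the role actually needs.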

02:59:23 So this whole idea, you immediately get projected

02:59:26 into a thing when you’re talking about hiring

02:59:28 or best or worst or better or not better.

02:59:30 So what is the thing you’re actually needing?

02:59:33 And you can hire for that.

02:59:35 There is such a thing, generally, I really value people

02:59:39 who have the affect, that care about open source.

02:59:42 Like so in some cases, their affinity to open source

02:59:45 is simply kind of a filter of an affect.

02:59:49 However, I have found this interesting dichotomy

02:59:52 between open source contributors and product creation.

02:59:58 There’s, I don’t know if it’s fully true,

03:00:00 but there does seem to be the more experienced,

03:00:04 the more affect somebody has in an open source community,

03:00:08 the less ability to actually produce product that they have.

03:00:11 And the opposite is kind of true too.

03:00:13 The more product focused they are, I find a lot of people,

03:00:16 I’ve talked to a lot of people who produce

03:00:17 really great products and they have a,

03:00:19 they’re looking over the open source communities,

03:00:21 kind of wanting to participate and play,

03:00:23 but they’ve played here and they do a great job here

03:00:26 and then they don’t necessarily have some of the same.

03:00:29 Now I don’t think that’s entirely necessary.

03:00:32 I think part of it is cultural, how they’ve emerged.

03:00:34 Because one of the things that open source communities

03:00:36 often lack is great product management,

03:00:39 like some product management energy.

03:00:41 That’s brilliant, but you want both of those energies

03:00:43 in the same place together.

03:00:44 Yes, you really do.

03:00:45 And so a lot of it’s creating these teams of people

03:00:48 that have these needed skills and attributes

03:00:50 that are hard.

03:00:51 And so one of the big things I look for is somebody

03:00:55 that fundamentally recognizes their need to learn.

03:00:57 Like one of the values that we have

03:00:59 in all of the things we do is learning.

03:01:01 Like if somebody thinks they know it all,

03:01:04 they’re gonna struggle.

03:01:06 And some of that is just, there’s more basic things

03:01:09 like humility, just being humble in the face

03:01:12 of all the things you don’t know.

03:01:14 And that’s step one of learning.

03:01:15 That’s step one of learning, right?

03:01:16 And I’ve spent a lot of time learning, right?

03:01:20 Other people spend a lot more time,

03:01:21 but I’ve spent a lot of time learning.

03:01:23 My whole goal was to get a PhD because I love school

03:01:26 and I wanted to be a scientist.

03:01:28 And then what I found is what’s been written about

03:01:31 elsewhere as well is the more I learned,

03:01:32 the more I didn’t know.

03:01:33 The more I realized, man, I know about this,

03:01:37 but this is such a tiny thing in the global scope

03:01:40 of what I might wanna know about.

03:01:41 So I need to be listening a whole lot better

03:01:43 than I am just talking.

03:01:47 That’s changed a little bit actually.

03:01:48 My wife says that I used to be a better listener.

03:01:50 Now that I’m so full of all these ideas I wanna do,

03:01:52 she kind of says, you gotta give people time to talk.

03:01:55 So you’ve succeeded on multiple dimensions.

03:01:58 So one is the tenure track faculty.

03:02:01 The other is just creating all these products

03:02:03 and building up the businesses,

03:02:04 then working with businesses.

03:02:06 Do you have advice for young people today

03:02:09 in high school and college of how to live a life

03:02:13 as nonlinear and as successful as yours,

03:02:18 a life that they could be proud of?

03:02:21 Well, that’s a super compliment.

03:02:22 I’m humbled by that actually.

03:02:24 I would say a life they can be proud of.

03:02:27 Honestly, one thing that I’ve said to people is first,

03:02:31 find people you love and care about them.

03:02:34 Like family matters to me a lot.

03:02:36 And family means people you love and have committed to.

03:02:39 So it can be whatever you mean by that,

03:02:42 but you need to have a foundation.

03:02:45 So find people you love and wanna commit to and do that.

03:02:48 Cause it anchors you in a way that nothing else can.

03:02:52 And then you find other things.

03:02:55 And then kind of from out there,

03:02:56 you find other kinds of things you can commit to,

03:02:58 whether it’s ideas or people or groups of people.

03:03:03 So, especially in high school,

03:03:06 I would say don’t settle on what you think you know.

03:03:09 Like give yourself 10 years to think about the world.

03:03:13 Like I see a lot of high school students

03:03:15 who seem to know everything already.

03:03:17 I think I did too.

03:03:18 I think it’s maybe natural,

03:03:20 but recognize that the things you care about,

03:03:23 you might change your perspective over time.

03:03:26 I certainly have over time.

03:03:28 I was really passionate about one specific thing

03:03:30 and I was kind of softened.

03:03:32 I was a big, I didn’t like the Federal Reserve, right?

03:03:35 And there’s still, we could have a longer conversation

03:03:38 about monetary policy and finances,

03:03:40 but I’m a little more nuanced in my perspective

03:03:46 at this point.

03:03:48 But that’s one area where you learn about something,

03:03:50 go, ah, I wanna attack it.

03:03:52 Build, don’t destroy.

03:03:55 Build, like so often the tendency is to not like something

03:03:58 and wanna go attack it.

03:04:00 Build something, build something to replace it.

03:04:02 Yeah.

03:04:03 Build up, attract people to your new thing.

03:04:05 You’ll be far better, right?

03:04:08 You don’t need to destroy something to build something else.

03:04:12 So that’s, I guess, generally.

03:04:14 And then definitely like curiosity,

03:04:19 follow your curiosity and let it,

03:04:22 don’t just follow the money.

03:04:24 And all of that, like you said,

03:04:25 is grounded in family, friendship, and ultimately love.

03:04:30 Yes.

03:04:31 Which is a great way to end it.

03:04:34 Travis, you’re one of the most impactful people

03:04:37 in the engineering and the computer science

03:04:38 in the human world.

03:04:39 So I truly appreciate everything you’ve done.

03:04:43 And I really appreciate that you would spend

03:04:45 your valuable time with me.

03:04:46 It was an honor.

03:04:47 It was a real pleasure for me.

03:04:48 I appreciate that.

03:04:50 Thanks for listening to this conversation

03:04:52 with Travis Oliphant.

03:04:54 To support this podcast,

03:04:55 please check out our sponsors in the description.

03:04:57 And now, let me leave you with something

03:05:00 that in the programming world is called Hodgson’s Law.

03:05:04 Every sufficiently advanced Lisp application

03:05:08 will eventually be reimplemented in Python.

03:05:12 Thank you for listening and hope to see you next time.