Jeremy Howard: fast.ai Deep Learning Courses and Research #35

Transcript

00:00:00 The following is a conversation with Jeremy Howard.

00:00:03 He’s the founder of fast.ai, a research institute dedicated

00:00:07 to making deep learning more accessible.

00:00:09 He’s also a distinguished research scientist

00:00:12 at the University of San Francisco,

00:00:14 a former president of Kaggle,

00:00:16 as well as a top ranking competitor there.

00:00:18 And in general, he’s a successful entrepreneur,

00:00:21 educator, researcher, and an inspiring personality

00:00:25 in the AI community.

00:00:27 When someone asks me, how do I get started with deep learning?

00:00:30 fast.ai is one of the top places that I point them to.

00:00:33 It’s free, it’s easy to get started,

00:00:35 it’s insightful and accessible,

00:00:37 and if I may say so, it has very little BS

00:00:40 that can sometimes dilute the value of educational content

00:00:44 on popular topics like deep learning.

00:00:46 fast.ai has a focus on practical application of deep learning

00:00:50 and hands on exploration of the cutting edge

00:00:52 that is both incredibly accessible to beginners

00:00:56 and useful to experts.

00:00:57 This is the Artificial Intelligence Podcast.

00:01:01 If you enjoy it, subscribe on YouTube,

00:01:03 give it five stars on iTunes,

00:01:05 support it on Patreon,

00:01:06 or simply connect with me on Twitter

00:01:09 at Lex Fridman, spelled F R I D M A N.

00:01:13 And now, here’s my conversation with Jeremy Howard.

00:01:18 What’s the first program you ever wrote?

00:01:21 First program I wrote that I remember

00:01:24 would be at high school.

00:01:29 I did an assignment where I decided

00:01:31 to try to find out if there were some better musical scales

00:01:36 than the normal 12 tone, 12 interval scale.

00:01:40 So I wrote a program on my Commodore 64 in BASIC

00:01:43 that searched through other scale sizes

00:01:46 to see if it could find one

00:01:47 where there were more accurate harmonies.

00:01:51 Like meantone?

00:01:53 Like you want an actual exactly three to two ratio

00:01:56 or else with a 12 interval scale,

00:01:59 it’s not exactly three to two, for example.

00:02:01 So that’s well-tempered, as they say.
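
The scale search Jeremy describes might look something like this in modern Python (his original was Commodore 64 BASIC; this is an illustrative reconstruction, not his actual program):

```python
# Illustrative reconstruction: for each equal-temperament scale size,
# how closely can some interval approximate a perfect fifth, i.e. an
# exact 3:2 frequency ratio? (The usual 12-tone scale only gets close.)

def fifth_error(n):
    """Smallest deviation from 3/2 achievable with n equal divisions of the octave."""
    return min(abs(2 ** (k / n) - 3 / 2) for k in range(1, n))

# Search a range of scale sizes and rank them by harmonic accuracy.
errors = {n: fifth_error(n) for n in range(5, 60)}
best = sorted(errors, key=errors.get)[:5]

print(best)        # scale sizes whose fifths are most accurate
print(errors[12])  # the familiar 12-tone scale: close, but not exact
```

In 12-tone equal temperament the fifth is 2^(7/12) ≈ 1.4983 rather than exactly 1.5, which is precisely the imperfection the program hunts for alternatives to.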

00:02:05 And BASIC on a Commodore 64.

00:02:07 Where was the interest in music from?

00:02:09 Or is it just technical?

00:02:10 I did music all my life.

00:02:12 So I played saxophone and clarinet and piano

00:02:15 and guitar and drums and whatever.

00:02:18 How does that thread go through your life?

00:02:22 Where’s music today?

00:02:24 It’s not where I wish it was.

00:02:28 For various reasons, couldn’t really keep it going,

00:02:30 particularly because I had a lot of problems

00:02:31 with RSI with my fingers.

00:02:33 And so I had to kind of like cut back anything

00:02:35 that used hands and fingers.

00:02:39 I hope one day I’ll be able to get back to it health wise.

00:02:43 So there’s a love for music underlying it all.

00:02:46 Yeah.

00:02:47 What’s your favorite instrument?

00:02:49 Saxophone.

00:02:50 Sax.

00:02:51 Or baritone saxophone.

00:02:52 Well, probably bass saxophone, but they’re awkward.

00:02:57 Well, I always love it when music

00:03:00 is coupled with programming.

00:03:01 There’s something about a brain that utilizes those

00:03:04 that emerges with creative ideas.

00:03:07 So you’ve used and studied quite a few programming languages.

00:03:11 Can you give an overview of what you’ve used?

00:03:15 What are the pros and cons of each?

00:03:17 Well, my favorite programming environment,

00:03:20 well, most certainly was Microsoft Access

00:03:24 back in like the earliest days.

00:03:26 So that was Visual Basic for applications,

00:03:28 which is not a good programming language,

00:03:30 but the programming environment was fantastic.

00:03:33 It’s like the ability to create, you know,

00:03:38 user interfaces and tie data and actions to them

00:03:41 and create reports and all that.

00:03:43 I’ve never seen anything as good.

00:03:46 There’s things nowadays like Airtable,

00:03:48 which are like small subsets of that,

00:03:54 which people love for good reason,

00:03:56 but unfortunately, nobody’s ever achieved

00:04:00 anything like that.

00:04:01 What is that?

00:04:01 If you could pause on that for a second.

00:04:03 Oh, Access?

00:04:04 Is it a database?

00:04:06 It was a database program that Microsoft produced,

00:04:09 part of Office, and they kind of let it wither, you know,

00:04:13 but basically it lets you in a totally graphical way

00:04:16 create tables and relationships and queries

00:04:18 and tie them to forms and set up, you know,

00:04:21 event handlers and calculations.

00:04:24 And it was a very complete powerful system

00:04:28 designed for not massive scalable things,

00:04:31 but for like useful little applications that I loved.

00:04:36 So what’s the connection between Excel and Access?

00:04:40 So very close.

00:04:42 So Access kind of was the relational database equivalent,

00:04:47 if you like.

00:04:47 So people still do a lot of that stuff

00:04:50 that should be in Access in Excel, because that’s what they know.

00:04:53 Excel’s great as well.

00:04:54 So, but it’s just not as rich a programming model

00:04:59 as VBA combined with a relational database.

00:05:04 And so I’ve always loved relational databases,

00:05:06 but today programming on top of relational database

00:05:10 is just a lot more of a headache.

00:05:13 You know, you generally need something

00:05:18 that runs some kind of database server,

00:05:19 unless you use SQLite, which has its own issues.

00:05:25 Then, if you want a nice programming model,

00:05:27 you’ll often need to add an ORM on top.

00:05:30 And then, I don’t know,

00:05:31 there’s all these pieces to tie together

00:05:34 and it’s just a lot more awkward than it should be.
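
Even the "simple" path has several pieces to tie together. A minimal sketch with Python's built-in sqlite3 module (an illustrative example, not from the conversation) shows the boilerplate involved before an ORM even enters the picture:

```python
import sqlite3

# The pieces involved: a connection, a hand-written schema, and raw SQL.
# An ORM would add yet another layer on top of all of this.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO invoices (total) VALUES (?)",
                 [(19.99,), (5.00,), (42.50,)])

# Querying returns plain tuples, not objects tied to forms or reports.
(grand_total,) = conn.execute("SELECT SUM(total) FROM invoices").fetchone()
print(grand_total)  # roughly 67.49
conn.close()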

00:05:37 There are people that are trying to make it easier.

00:05:39 So in particular, I think of F sharp, you know, Don Syme,

00:05:42 who, with his team, has done a great job

00:05:45 of making something like a database appear

00:05:50 in the type system.

00:05:51 So you actually get like tab completion for fields

00:05:54 and tables and stuff like that.

00:05:57 Anyway, so that whole VBA Office thing, I guess,

00:06:01 was a starting point, which I still miss.

00:06:04 And I got into standard Visual Basic, which…

00:06:07 That’s interesting, just to pause on that for a second.

00:06:09 It’s interesting that you’re connecting programming languages

00:06:13 to the ease of management of data.

00:06:17 Yeah.

00:06:18 So in your use of programming languages,

00:06:20 you always had a love and a connection with data.

00:06:24 I’ve always been interested in doing useful things

00:06:28 for myself and for others,

00:06:29 which generally means getting some data

00:06:31 and doing something with it and putting it out there again.

00:06:34 So that’s been my interest throughout.

00:06:38 So I also did a lot of stuff with AppleScript

00:06:41 back in the early days.

00:06:43 So it’s kind of nice being able to get computers

00:06:48 to talk to each other

00:06:50 and to do things for you.

00:06:52 And then I think that one,

00:06:54 the programming language I most loved then

00:06:58 would have been Delphi, which was Object Pascal,

00:07:02 created by Anders Hejlsberg,

00:07:04 who previously did Turbo Pascal

00:07:07 and then went on to create .NET

00:07:08 and then went on to create TypeScript.

00:07:11 Delphi was amazing because it was like a compiled,

00:07:14 fast language that was as easy to use as Visual Basic.

00:07:20 Delphi, what is it similar to in more modern languages?

00:07:27 Visual Basic.

00:07:28 Visual Basic.

00:07:29 Yeah, but a compiled, fast version.

00:07:32 So I’m not sure there’s anything quite like it anymore.

00:07:37 If you took like C Sharp or Java

00:07:40 and got rid of the virtual machine

00:07:42 and replaced it with something,

00:07:43 you could compile to a small, tight binary.

00:07:46 I feel like it’s where Swift could get to

00:07:50 with the new SwiftUI

00:07:52 and the cross platform development going on.

00:07:56 Like that’s one of my dreams

00:07:59 is that we’ll hopefully get back to where Delphi was.

00:08:02 There is actually a Free Pascal project nowadays

00:08:08 called Lazarus,

00:08:09 which is also attempting to kind of recreate Delphi.

00:08:13 So they’re making good progress.

00:08:16 So, okay, Delphi,

00:08:18 that’s one of your favorite programming languages.

00:08:20 Well, it’s programming environments.

00:08:22 Again, I’d say Pascal’s not a nice language.

00:08:26 If you wanted to know specifically

00:08:27 about what languages I like,

00:08:29 I would definitely pick J as being an amazingly wonderful

00:08:33 language.

00:08:35 What’s J?

00:08:37 J, are you aware of APL?

00:08:39 I am not, except from doing a little research

00:08:42 on the work you’ve done.

00:08:44 Okay, so not at all surprising you’re not familiar with it

00:08:48 because it’s not well known,

00:08:49 but it’s actually one of the main families

00:08:54 of programming languages going back to the late 50s,

00:08:57 early 60s.

00:08:57 So there was a couple of major directions.

00:09:01 One was the kind of Lambda Calculus Alonzo Church direction,

00:09:06 which I guess kind of lisp and scheme and whatever,

00:09:09 which has a history going back

00:09:12 to the early days of computing.

00:09:13 The second was the kind of imperative slash OO,

00:09:18 ALGOL, Simula, going on to C, C++ and so forth.

00:09:23 There was a third,

00:09:24 which are called array oriented languages,

00:09:26 which started with a paper by a guy called Ken Iverson,

00:09:31 which was actually a math theory paper,

00:09:35 not a programming paper.

00:09:37 It was called Notation as a Tool for Thought.

00:09:41 And it was the development of a new way,

00:09:43 a new type of math notation.

00:09:45 And the idea is that this math notation

00:09:47 was much more flexible, expressive,

00:09:51 and also well defined than traditional math notation,

00:09:55 which is none of those things.

00:09:56 Math notation is awful.

00:09:59 And so he actually turned that into a programming language

00:10:02 and because this was the late 50s,

00:10:05 all the names were available.

00:10:06 So he called his language a programming language or APL.

00:10:10 APL.

00:10:11 So APL is an implementation of Notation

00:10:15 as a Tool for Thought, by which he means math notation.

00:10:18 And Ken and his son went on to do many things,

00:10:22 but eventually they actually produced a new language

00:10:26 that was built on top of all the learnings of APL.

00:10:28 And that was called J.

00:10:30 And J is the most expressive, composable,

00:10:39 beautifully designed language I’ve ever seen.

00:10:42 Does it have object oriented components?

00:10:44 Does it have that kind of thing?

00:10:45 Not really, it’s an array oriented language.

00:10:47 It’s the third path.

00:10:51 Are you saying array?

00:10:52 Array oriented, yeah.

00:10:53 What does it mean to be array oriented?

00:10:55 So array oriented means that you generally

00:10:57 don’t use any loops,

00:10:59 but the whole thing is done with kind of

00:11:02 an extreme version of broadcasting,

00:11:06 if you’re familiar with that NumPy slash Python concept.

00:11:09 So you do a lot with one line of code.

00:11:14 It looks a lot like math notation, highly compact.

00:11:19 And the idea is that you can kind of,

00:11:22 because you can do so much with one line of code,

00:11:24 you very rarely need more than

00:11:27 a single screen of code

00:11:29 to express your program.

00:11:31 And so you can kind of keep it all in your head

00:11:33 and you can kind of clearly communicate it.
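
The NumPy broadcasting mentioned here gives a small taste of the array-oriented style; a brief illustrative example (not from the conversation):

```python
import numpy as np

# Array-oriented style: whole-array operations, no explicit loops.
prices = np.array([10.0, 20.0, 30.0])          # shape (3,)
discounts = np.array([[0.0], [0.1], [0.25]])   # shape (3, 1)

# Broadcasting pairs every price with every discount in one expression,
# yielding a (3, 3) table -- the kind of thing APL and J do pervasively.
table = prices * (1 - discounts)

print(table.shape)  # (3, 3)
print(table[2])     # every price at a 25% discount
```

One line replaces the doubly nested loop an imperative version would need, which is exactly the compression of expression being described.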

00:11:36 It’s interesting that APL created two main branches,

00:11:40 K and J.

00:11:41 J is this kind of like open source,

00:11:44 niche community of crazy enthusiasts like me.

00:11:49 And then the other path, K, was fascinating.

00:11:52 It’s an astonishingly expensive programming language,

00:11:56 which many of the world’s

00:11:58 most ludicrously rich hedge funds use.

00:12:02 So the entire K machine is so small

00:12:06 it sits inside level three cache on your CPU.

00:12:09 And it easily wins every benchmark I’ve ever seen

00:12:14 in terms of data processing speed.

00:12:16 But you don’t come across it very much

00:12:17 because it’s like $100,000 per CPU to run it.

00:12:22 It’s like this path of programming languages

00:12:26 is just so much, I don’t know,

00:12:28 so much more powerful in every way

00:12:30 than the ones that almost anybody uses every day.

00:12:33 So it’s all about computation.

00:12:37 It’s really focused on computation.

00:12:38 It’s pretty heavily focused on computation.

00:12:40 I mean, so much of programming

00:12:43 is data processing by definition.

00:12:45 So there’s a lot of things you can do with it.

00:12:48 But yeah, there’s not much work being done

00:12:51 on making like user interface toolkits or whatever.

00:12:57 I mean, there’s some, but they’re not great.

00:12:59 At the same time, you’ve done a lot of stuff

00:13:00 with Perl and Python.

00:13:03 So where does that fit into the picture of J and K and APL?

00:13:08 Well, it’s just much more pragmatic.

00:13:11 Like in the end, you kind of have to end up

00:13:13 where the libraries are, you know?

00:13:17 Like, cause to me, my focus is on productivity.

00:13:21 I just want to get stuff done and solve problems.

00:13:23 So Perl was great.

00:13:27 I created an email company called FastMail

00:13:29 and Perl was great cause back in the late nineties,

00:13:32 early two thousands, it just had a lot of stuff it could do.

00:13:38 I still had to write my own monitoring system

00:13:41 and my own web framework, my own whatever,

00:13:43 cause like none of that stuff existed.

00:13:45 But it was a super flexible language to do that in.

00:13:50 And you used Perl for FastMail, you used it as a backend?

00:13:54 Like so everything was written in Perl?

00:13:55 Yeah, yeah, everything, everything was Perl.

00:13:58 Why do you think Perl hasn’t succeeded

00:14:02 or hasn’t dominated the market where Python

00:14:05 really takes over a lot of the tasks?

00:14:07 Well, I mean, Perl did dominate.

00:14:09 It was everything, everywhere,

00:14:13 but then the guy that ran Perl, Larry Wall,

00:14:17 kind of just didn’t put the time in anymore.

00:14:22 And no project can be successful,

00:14:28 you know, particularly one that started with a strong leader,

00:14:31 if it loses that strong leadership.

00:14:35 So then Python has kind of replaced it.

00:14:37 You know, Python is a lot less elegant language

00:14:43 in nearly every way,

00:14:45 but it has the data science libraries

00:14:48 and a lot of them are pretty great.

00:14:51 So I kind of use it

00:14:56 cause it’s the best we have,

00:14:58 but it’s definitely not good enough.

00:15:01 But what do you think the future of programming looks like?

00:15:04 What do you hope the future of programming looks like

00:15:06 if we zoom in on the computational fields,

00:15:08 on data science, on machine learning?

00:15:11 I hope Swift is successful

00:15:15 because the goal of Swift,

00:15:19 the way Chris Lattner describes it,

00:15:21 is to be infinitely hackable.

00:15:22 And that’s what I want.

00:15:23 I want something where me and the people I do research with

00:15:26 and my students can look at

00:15:29 and change everything from top to bottom.

00:15:32 There’s nothing mysterious and magical and inaccessible.

00:15:36 Unfortunately with Python, it’s the opposite of that

00:15:38 because Python is so slow.

00:15:40 It’s extremely unhackable.

00:15:42 You get to a point where it’s like,

00:15:43 okay, from here on down, it’s C.

00:15:45 So your debugger doesn’t work in the same way.

00:15:47 Your profiler doesn’t work in the same way.

00:15:48 Your build system doesn’t work in the same way.

00:15:50 It’s really not very hackable at all.

00:15:53 What’s the part you like to be hackable?

00:15:55 Is it for the objective of optimizing training

00:16:00 of neural networks, inference of neural networks?

00:16:02 Is it performance of the system

00:16:04 or is there some non performance related, just?

00:16:07 It’s everything.

00:16:09 I mean, in the end, I want to be productive

00:16:11 as a practitioner.

00:16:13 So that means that, so like at the moment,

00:16:16 our understanding of deep learning is incredibly primitive.

00:16:20 There’s very little we understand.

00:16:21 Most things don’t work very well,

00:16:23 even though it works better than anything else out there.

00:16:26 There’s so many opportunities to make it better.

00:16:28 So you look at any domain area,

00:16:31 like, I don’t know, speech recognition with deep learning

00:16:35 or natural language processing classification

00:16:38 with deep learning or whatever.

00:16:39 Every time I look at an area with deep learning,

00:16:41 I always see like, oh, it’s terrible.

00:16:44 There’s lots and lots of obviously stupid ways

00:16:47 to do things that need to be fixed.

00:16:50 So then I want to be able to jump in there

00:16:51 and quickly experiment and make them better.

00:16:54 You think the programming language has a role in that?

00:16:59 Huge role, yeah.

00:17:00 So currently, Python has a big gap

00:17:05 in terms of our ability to innovate,

00:17:09 particularly around recurrent neural networks

00:17:11 and natural language processing.

00:17:14 Because it’s so slow, the actual loop

00:17:18 where we actually loop through words,

00:17:20 we have to do that whole thing in CUDA C.

00:17:23 So we actually can’t innovate with the kernel,

00:17:27 the heart of that most important algorithm.

00:17:31 And it’s just a huge problem.

00:17:33 And this happens all over the place.

00:17:36 So we hit research limitations.

00:17:40 Another example, convolutional neural networks,

00:17:42 which are actually the most popular architecture

00:17:44 for lots of things, maybe most things in deep learning.

00:17:48 We almost certainly should be using

00:17:50 sparse convolutional neural networks,

00:17:52 but only like two people are,

00:17:55 because to do it, you have to rewrite

00:17:57 all of that CUDA C level stuff.

00:17:59 And yeah, just researchers and practitioners don’t.

00:18:04 So there’s just big gaps in what people actually research on,

00:18:09 what people actually implement

00:18:10 because of the programming language problem.

00:18:13 So you think it’s just too difficult to write in CUDA C

00:18:20 that a higher level programming language like Swift

00:18:24 should enable easier,

00:18:30 creative fooling around with RNNs

00:18:33 or with sparse convolutional neural networks?

00:18:34 Kind of.

00:18:35 Who’s at fault?

00:18:37 Who’s in charge of making it easy

00:18:41 for a researcher to play around?

00:18:42 I mean, no one’s at fault,

00:18:43 just nobody’s got around to it yet,

00:18:45 or it’s just, it’s hard, right?

00:18:46 And I mean, part of the fault is that we ignored

00:18:49 that whole APL kind of direction.

00:18:53 Nearly everybody did for 60 years, 50 years.

00:18:57 But recently people have been starting to

00:19:01 reinvent pieces of that

00:19:03 and kind of create some interesting new directions

00:19:05 in the compiler technology.

00:19:07 So the place where that’s particularly happening right now

00:19:11 is something called MLIR,

00:19:13 which is something that, again,

00:19:14 Chris Lattner, the Swift guy, is leading.

00:19:18 And yeah, because it’s actually not gonna be Swift

00:19:20 on its own that solves this problem,

00:19:22 because the problem is that currently writing

00:19:24 an acceptably fast, you know, GPU program

00:19:30 is too complicated regardless of what language you use.

00:19:33 Right.

00:19:36 And that’s just because if you have to deal with the fact

00:19:38 that I’ve got, you know, 10,000 threads

00:19:41 and I have to synchronize between them all

00:19:43 and I have to put my thing into grid blocks

00:19:45 and think about warps and all this stuff,

00:19:47 it’s just so much boilerplate that to do that well,

00:19:50 you have to be a specialist at that

00:19:52 and it’s gonna be a year’s work to, you know,

00:19:56 optimize that algorithm in that way.

00:19:59 But with things like tensor comprehensions

00:20:03 and TILE and MLIR and TVM,

00:20:07 there’s all these various projects

00:20:08 which are all about saying,

00:20:10 let’s let people create like domain specific languages

00:20:14 for tensor computations.

00:20:16 These are the kinds of things we do generally

00:20:19 on the GPU for deep learning and then have a compiler

00:20:22 which can optimize that tensor computation.

00:20:28 A lot of this work is actually sitting

00:20:29 on top of a project called Halide,

00:20:32 which is a mind blowing project where they came up

00:20:37 with such a domain specific language.

00:20:38 In fact, two: one domain specific language for expressing

00:20:41 this is what my tensor computation is

00:20:43 and another domain specific language for expressing

00:20:46 this is the kind of the way I want you to structure

00:20:50 the compilation of that and like do it block by block

00:20:53 and do these bits in parallel.

00:20:54 And they were able to show how you can compress

00:20:57 the amount of code by 10X compared to optimized GPU code

00:21:03 and get the same performance.
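
Halide's algorithm/schedule separation can be illustrated in plain Python (a conceptual sketch only; Halide's real DSLs look nothing like this): the same 1-D box blur computed under two different loop structures.

```python
# Conceptual sketch of Halide's core idea: separate *what* is computed
# (the algorithm) from *how* the loops are executed (the schedule).
# Both functions compute the same 1-D box blur; only the loop nest differs.

def blur_naive(xs):
    """Algorithm: out[i] = mean of xs[i-1:i+2], clamped at the edges."""
    n = len(xs)
    out = []
    for i in range(n):
        window = xs[max(0, i - 1):min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out

def blur_tiled(xs, tile=4):
    """Same algorithm under a different 'schedule': processed tile by tile,
    the way a compiler might restructure it for locality or parallelism."""
    n = len(xs)
    out = [0.0] * n
    for start in range(0, n, tile):
        for i in range(start, min(start + tile, n)):
            window = xs[max(0, i - 1):min(n, i + 2)]
            out[i] = sum(window) / len(window)
    return out

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
assert blur_naive(data) == blur_tiled(data)  # same answer, different loop nest
```

In Halide the schedule is declared separately from the algorithm, so restructurings like the tiling above never risk changing the computed result.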

00:21:05 So that’s like, so these other things are kind of sitting

00:21:08 on top of that kind of research and MLIR is pulling a lot

00:21:12 of those best practices together.

00:21:15 And now we’re starting to see work done on making all

00:21:18 of that directly accessible through Swift

00:21:21 so that I could use Swift to kind of write those

00:21:23 domain specific languages and hopefully we’ll get

00:21:27 then Swift CUDA kernels written in a very expressive

00:21:30 and concise way that looks a bit like J and APL

00:21:34 and then Swift layers on top of that

00:21:36 and then a Swift UI on top of that.

00:21:38 And it’ll be so nice if we can get to that point.

00:21:42 Now does it all eventually boil down to CUDA

00:21:46 and NVIDIA GPUs?

00:21:48 Unfortunately at the moment it does,

00:21:50 but one of the nice things about MLIR if AMD ever

00:21:54 gets their act together which they probably won’t

00:21:56 is that they or others could write MLIR backends

00:22:02 for other GPUs or rather tensor computation devices

00:22:09 of which today there are increasing number

00:22:11 like Graphcore or Vertex AI or whatever.

00:22:18 So yeah, being able to target lots of backends

00:22:22 would be another benefit of this

00:22:23 and the market really needs competition

00:22:26 because at the moment NVIDIA is massively overcharging

00:22:29 for their kind of enterprise class cards

00:22:33 because there is no serious competition

00:22:36 because nobody else is doing the software properly.

00:22:39 In the cloud there is some competition, right?

00:22:41 But…

00:22:42 Not really, other than TPUs perhaps,

00:22:45 but TPUs are almost unprogrammable at the moment.

00:22:48 So TPUs have the same problem that you can’t?

00:22:51 It’s even worse.

00:22:52 So TPUs, Google actually made an explicit decision

00:22:54 to make them almost entirely unprogrammable

00:22:57 because they felt that there was too much IP in there

00:22:59 and if they gave people direct access to program them,

00:23:02 people would learn their secrets.

00:23:04 So you can’t actually directly program the memory

00:23:09 in a TPU.

00:23:11 You can’t even directly create code that runs on,

00:23:15 or that you can look at on, the machine that has the TPU;

00:23:18 it all goes through a virtual machine.

00:23:19 So all you can really do is this kind of cookie cutter thing

00:23:22 of like plug in high level stuff together,

00:23:26 which is just super tedious and annoying

00:23:30 and totally unnecessary.

00:23:32 So tell me, if you could,

00:23:36 the origin story of fast.ai.

00:23:38 What is the motivation, its mission, its dream?

00:23:43 So I guess the founding story is heavily tied

00:23:48 to my previous startup, which is a company called Enlitic,

00:23:51 which was the first company to focus on deep learning

00:23:54 for medicine, and I created that because I saw

00:23:58 there was a huge opportunity:

00:24:02 there’s about a 10x shortage of doctors

00:24:05 in the developing world relative to what we need.

00:24:08 I expected it would take about 300 years

00:24:11 to train enough doctors to meet that gap.

00:24:13 But I guessed that maybe if we used deep learning

00:24:19 for some of the analytics, we could make it

00:24:22 so you don’t need such highly trained doctors.

00:24:25 For diagnosis.

00:24:26 For diagnosis and treatment planning.

00:24:27 Where’s the biggest benefit just before we get to fast AI,

00:24:31 where’s the biggest benefit of AI

00:24:33 in medicine that you see today?

00:24:36 And maybe in the near future.

00:24:37 Not much happening today in terms of like stuff

00:24:39 that’s actually out there, it’s very early.

00:24:41 But in terms of the opportunity,

00:24:42 it’s to take markets like India and China and Indonesia,

00:24:48 which have big populations, Africa,

00:24:52 small numbers of doctors,

00:24:55 and provide diagnostic, particularly treatment planning

00:25:00 and triage kind of on device so that if you do a test

00:25:05 for malaria or tuberculosis or whatever,

00:25:09 you immediately get something such that even a healthcare worker

00:25:12 who’s had a month of training can make

00:25:16 a very high quality assessment of whether the patient

00:25:20 might be at risk and say, okay,

00:25:22 we’ll send them off to a hospital.

00:25:25 So for example, in Africa, outside of South Africa,

00:25:29 there’s only five pediatric radiologists

00:25:31 for the entire continent.

00:25:32 So most countries don’t have any.

00:25:34 So if your kid is sick and they need something diagnosed

00:25:37 through medical imaging, the person,

00:25:39 even if you’re able to get medical imaging done,

00:25:41 the person that looks at it will be a nurse at best.

00:25:46 But actually in India, for example, and China,

00:25:50 almost no x rays are read by anybody,

00:25:52 by any trained professional because they don’t have enough.

00:25:57 So if instead we had an algorithm that could take

00:26:02 the most likely high-risk 5% and triage them,

00:26:08 basically say, okay, someone needs to look at this,

00:26:11 it would massively change

00:26:14 what’s possible with medicine in the developing world.
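
The triage policy described here is simple to state in code: rank cases by a model's risk score and route only the top 5% to a human reader. A minimal illustration with made-up scores (the function and numbers below are hypothetical, not from any real system):

```python
# Minimal sketch of score-based triage: flag the highest-risk fraction
# of cases for human review. Scores here are invented; in practice they
# would come from a trained model.

def triage(scores, frac=0.05):
    """Return the indices of the top `frac` of cases by risk score."""
    k = max(1, int(len(scores) * frac))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])

scores = [0.01, 0.93, 0.05, 0.40, 0.02, 0.88, 0.03, 0.07, 0.99, 0.04,
          0.06, 0.02, 0.11, 0.01, 0.35, 0.09, 0.02, 0.08, 0.03, 0.05]
flagged = triage(scores)  # with 20 cases, the top 5% is a single case
print(flagged)            # {8}: the case scored 0.99
```

Everything outside the flagged set goes home, and scarce expert attention is concentrated on the handful of studies most likely to need it.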

00:26:18 And remember, they have, increasingly they have money.

00:26:21 They’re the developing world, they’re not the poor world,

00:26:23 they’re the developing world.

00:26:24 So they have the money.

00:26:25 So they’re building the hospitals,

00:26:27 they’re getting the diagnostic equipment,

00:26:30 but there’s no way for a very long time

00:26:33 will they be able to have the expertise.

00:26:37 Shortage of expertise, okay.

00:26:38 And that’s where the deep learning systems can step in

00:26:41 and magnify the expertise they do have.

00:26:44 Exactly, yeah.

00:26:46 So you do see, just to linger a little bit longer,

00:26:51 the interaction, do you still see the human experts

00:26:55 still at the core of these systems?

00:26:57 Yeah, absolutely.

00:26:58 Is there something in medicine

00:26:59 that could be automated almost completely?

00:27:01 I don’t see the point of even thinking about that

00:27:03 because we have such a shortage of people.

00:27:06 Why would we want to find a way not to use them?

00:27:09 We have people, so the idea of like,

00:27:13 even from an economic point of view,

00:27:14 if you can make them 10X more productive,

00:27:17 getting rid of the person,

00:27:18 doesn’t impact your unit economics at all.

00:27:21 And it totally ignores the fact

00:27:23 that there are things people do better than machines.

00:27:26 So it’s just to me,

00:27:27 that’s not a useful way of framing the problem.

00:27:32 I guess, just to clarify,

00:27:33 I guess I meant there may be some problems

00:27:36 where you can avoid even going to the expert ever,

00:27:40 sort of maybe preventative care or some basic stuff,

00:27:44 allowing the expert to focus on the things

00:27:46 that really need them, you know.

00:27:49 Well, that’s what the triage would do, right?

00:27:50 So the triage would say,

00:27:52 okay, there’s 99% sure there’s nothing here.

00:27:58 So that can be done on device

00:28:01 and they can just say, okay, go home.

00:28:03 So the experts are being used to look at the stuff

00:28:07 which has some chance it’s worth looking at,

00:28:10 which most things it’s not, it’s fine.

00:28:14 Why do you think that is?

00:28:16 Why do you think we haven’t quite made progress on that yet

00:28:19 in terms of the scale of how much AI is applied

00:28:27 in the medical field?

00:28:27 Oh, there’s a lot of reasons.

00:28:28 I mean, one is it’s pretty new.

00:28:29 I only started Enlitic in like 2014.

00:28:32 And before that, it’s hard to express

00:28:36 to what degree the medical world

00:28:37 was not aware of the opportunities here.

00:28:40 So I went to RSNA,

00:28:42 which is the world’s largest radiology conference.

00:28:46 And I told everybody I could, you know,

00:28:49 like I’m doing this thing with deep learning,

00:28:51 please come and check it out.

00:28:53 And no one had any idea what I was talking about

00:28:56 and no one had any interest in it.

00:28:59 So like we’ve come from absolute zero, which is hard.

00:29:05 And then the whole regulatory framework, education system,

00:29:09 everything is just set up to think of doctoring

00:29:13 in a very different way.

00:29:14 So today there is a small number of people

00:29:17 who are deep learning practitioners

00:29:20 and doctors at the same time.

00:29:23 And we’re starting to see the first ones

00:29:24 come out of their PhD programs.

00:29:26 So Zak Kohane over in Boston, Cambridge

00:29:31 has a number of students now who are data science experts,

00:29:37 deep learning experts, and actual medical doctors.

00:29:43 Quite a few doctors have completed our fast.ai course now

00:29:47 and are publishing papers and creating journal reading groups

00:29:52 in the American College of Radiology.

00:29:55 And like, it’s just starting to happen,

00:29:57 but it’s gonna be a long time coming.

00:29:59 It’s gonna happen, but it’s gonna be a long process.

00:30:02 The regulators have to learn how to regulate this.

00:30:04 They have to build guidelines.

00:30:08 And then the lawyers at hospitals

00:30:12 have to develop a new way of understanding

00:30:15 that sometimes it makes sense for data to be looked at

00:30:22 in raw form in large quantities

00:30:24 in order to create world-changing results.

00:30:27 Yeah, so the regulation around data, all that,

00:30:30 it sounds probably the hardest problem,

00:30:33 but sounds reminiscent of autonomous vehicles as well.

00:30:36 Many of the same regulatory challenges,

00:30:38 many of the same data challenges.

00:30:40 Yeah, I mean, funnily enough,

00:30:41 the problem is less the regulation

00:30:43 and more the interpretation of that regulation

00:30:45 by lawyers in hospitals.

00:30:48 So HIPAA, the P in HIPAA actually,

00:30:52 it does not stand for privacy.

00:30:56 It stands for portability.

00:30:57 It’s actually meant to be a way that data can be used.

00:31:01 And it was created with lots of gray areas

00:31:04 because the idea is that would be more practical

00:31:06 and it would help people to use this legislation

00:31:10 to actually share data in a more thoughtful way.

00:31:13 Unfortunately, it’s done the opposite

00:31:15 because when a lawyer sees a gray area,

00:31:17 they say, oh, if we don't know we won't get sued,

00:31:20 then we can’t do it.

00:31:22 So HIPAA is not exactly the problem.

00:31:26 The problem is more that there’s,

00:31:29 hospital lawyers are not incented

00:31:31 to make bold decisions about data portability.

00:31:36 Or even to embrace technology that saves lives.

00:31:40 They more want to not get in trouble

00:31:42 for embracing that technology.

00:31:44 It also saves lives in a very abstract way,

00:31:47 which is like, oh, we’ve been able to release

00:31:49 these 100,000 anonymized records.

00:31:52 I can’t point to the specific person

00:31:54 whose life that saved.

00:31:55 I can say like, oh, we ended up with this paper

00:31:57 which found this result,

00:31:58 which diagnosed a thousand more people

00:32:02 than we would have otherwise,

00:32:03 but it’s like, which ones were helped?

00:32:05 It’s very abstract.

00:32:07 And on the counter side of that,

00:32:09 you may be able to point to a life that was taken

00:32:13 because of something that was.

00:32:14 Yeah, or a person whose privacy was violated.

00:32:18 It’s like, oh, this specific person was deidentified.

00:32:24 So, reidentified.

00:32:25 Just a fascinating topic.

00:32:27 We’re jumping around.

00:32:28 We’ll get back to fast AI,

00:32:29 but on the question of privacy,

00:32:32 data is the fuel for so much innovation in deep learning.

00:32:38 What’s your sense on privacy?

00:32:39 Whether we’re talking about Twitter, Facebook, YouTube,

00:32:44 just the technologies like in the medical field

00:32:48 that rely on people’s data in order to create impact.

00:32:53 How do we get that right,

00:32:56 respecting people’s privacy and yet creating technology

00:33:01 that is learning from data?

00:33:03 One of my areas of focus is on doing more with less data.

00:33:08 More with less data, which,

00:33:11 so most vendors, unfortunately,

00:33:14 are strongly incented to find ways

00:33:17 to require more data and more computation.

00:33:20 So, Google and IBM being the most obvious.

00:33:24 IBM.

00:33:25 Yeah, so Watson.

00:33:27 So, Google and IBM both strongly push the idea

00:33:31 that you have to be,

00:33:33 that they have more data and more computation

00:33:35 and more intelligent people than anybody else.

00:33:37 And so you have to trust them to do things

00:33:39 because nobody else can do it.

00:33:42 And Google’s very upfront about this,

00:33:45 like Jeff Dean has gone out there and given talks

00:33:48 and said, our goal is to require

00:33:50 a thousand times more computation, but less people.

00:33:55 Our goal is to use the people that you have better

00:34:00 and the data you have better

00:34:01 and the computation you have better.

00:34:03 So, one of the things that we’ve discovered is,

00:34:06 or at least highlighted,

00:34:08 is that you very, very, very often

00:34:11 don’t need much data at all.

00:34:13 And so the data you already have in your organization

00:34:16 will be enough to get state of the art results.

00:34:19 So, like my starting point would be to kind of say

00:34:21 around privacy is a lot of people are looking for ways

00:34:25 to share data and aggregate data,

00:34:28 but I think often that’s unnecessary.

00:34:29 They assume that they need more data than they do

00:34:32 because they’re not familiar with the basics

00:34:34 of transfer learning, which is this critical technique

00:34:38 for needing orders of magnitude less data.
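The transfer learning idea here can be sketched in a toy form. In this illustrative sketch (not Jeremy's actual code), a frozen random feature extractor stands in for a pretrained backbone such as a ResNet body, and only a small logistic head is trained on a handful of labeled examples, which is why so little data is needed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a pretrained backbone: a frozen nonlinear feature
# extractor. In real transfer learning this would be, e.g., a ResNet
# body whose weights are left untouched.
W_backbone = rng.normal(size=(20, 64))  # 20 raw inputs -> 64 features

def features(x):
    # Frozen: never updated during head training below.
    return np.tanh(x @ W_backbone / np.sqrt(20))

def sigmoid(z):
    return 1 / (1 + np.exp(-np.clip(z, -30, 30)))

# A tiny labeled dataset: far fewer examples than training
# the whole network from scratch would need.
x = rng.normal(size=(40, 20))
y = (x[:, 0] + x[:, 1] > 0).astype(float)

# Train only the new "head" (logistic regression) on frozen features.
w, b = np.zeros(64), 0.0
for _ in range(2000):
    p = sigmoid(features(x) @ w + b)
    grad = p - y                          # dLoss/dlogit for log loss
    w -= 0.5 * features(x).T @ grad / len(y)
    b -= 0.5 * grad.mean()

acc = ((sigmoid(features(x) @ w + b) > 0.5) == y).mean()
print(f"training accuracy with only 40 examples: {acc:.2f}")
```

The design point is the one made above: because the backbone is reused rather than learned, the number of parameters actually being fit is tiny, so tens of examples can suffice.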

00:34:42 Is your sense, one reason you might wanna collect data

00:34:44 from everyone is like in the recommender system context,

00:34:50 where your individual, Jeremy Howard’s individual data

00:34:54 is the most useful for providing a product

00:34:58 that’s impactful for you.

00:34:59 So, for giving you advertisements,

00:35:02 for recommending to you movies,

00:35:04 for doing medical diagnosis,

00:35:07 is your sense we can build with a small amount of data,

00:35:11 general models that will have a huge impact

00:35:15 for most people that we don’t need to have data

00:35:18 from each individual?

00:35:19 On the whole, I’d say yes.

00:35:20 I mean, there are things like,

00:35:25 you know, recommender systems have this cold start problem

00:35:28 where, you know, Jeremy is a new customer,

00:35:30 we haven’t seen him before, so we can’t recommend him things

00:35:33 based on what else he’s bought and liked with us.

00:35:36 And there’s various workarounds to that.

00:35:38 Like in a lot of music programs,

00:35:40 we’ll start out by saying, which of these artists do you like?

00:35:44 Which of these albums do you like?

00:35:46 Which of these songs do you like?

00:35:49 Netflix used to do that, nowadays they tend not to.

00:35:53 People kind of don’t like that

00:35:54 because they think, oh, we don’t wanna bother the user.

00:35:57 So, you could work around that

00:35:58 by having some kind of data sharing

00:36:00 where you get my marketing record from Acxiom or whatever,

00:36:04 and try to guess from that.

00:36:06 To me, the benefit to me and to society

00:36:12 of saving me five minutes on answering some questions

00:36:16 versus the negative externalities of the privacy issue

00:36:23 doesn’t add up.

00:36:24 So, I think like a lot of the time,

00:36:26 the places where people are invading our privacy

00:36:30 in order to provide convenience

00:36:32 is really about just trying to make them more money

00:36:36 and they move these negative externalities

00:36:40 to places that they don’t have to pay for them.

00:36:44 So, when you actually see regulations appear

00:36:48 that actually cause the companies

00:36:50 that create these negative externalities

00:36:52 to have to pay for it themselves,

00:36:53 they say, well, we can’t do it anymore.

00:36:56 So, the cost is actually too high.

00:36:58 But for something like medicine,

00:37:00 yeah, I mean, the hospital has my medical imaging,

00:37:05 my pathology studies, my medical records,

00:37:08 and also I own my medical data.

00:37:11 So, you can, so I help a startup called Doc.ai.

00:37:16 One of the things Doc.ai does is that it has an app.

00:37:19 You can connect to, you know, Sutter Health

00:37:23 and LabCorp and Walgreens

00:37:26 and download your medical data to your phone

00:37:29 and then upload it again at your discretion

00:37:33 to share it as you wish.

00:37:35 So, with that kind of approach,

00:37:38 we can share our medical information

00:37:41 with the people we want to.

00:37:44 Yeah, so control.

00:37:45 I mean, really being able to control

00:37:47 who you share it with and so on.

00:37:48 Yeah.

00:37:49 So, that has a beautiful, interesting tangent

00:37:53 to return back to the origin story of Fast.ai.

00:37:59 Right, so before I started Fast.ai,

00:38:02 I spent a year researching

00:38:06 where are the biggest opportunities for deep learning?

00:38:10 Because I knew from my time at Kaggle in particular

00:38:14 that deep learning had kind of hit this threshold point

00:38:16 where it was rapidly becoming the state of the art approach

00:38:19 in every area that looked at it.

00:38:21 And I’d been working with neural nets for over 20 years.

00:38:25 I knew that from a theoretical point of view,

00:38:27 once it hit that point,

00:38:28 it would do that in kind of just about every domain.

00:38:31 And so I kind of spent a year researching

00:38:34 what are the domains that’s gonna have

00:38:36 the biggest low hanging fruit

00:38:37 in the shortest time period.

00:38:39 I picked medicine, but there were so many

00:38:42 I could have picked.

00:38:43 And so there was a kind of level of frustration for me

00:38:46 of like, okay, I’m really glad we’ve opened up

00:38:49 the medical deep learning world.

00:38:51 And today it’s huge, as you know,

00:38:53 but we can’t do, I can’t do everything.

00:38:58 I don’t even know, like in medicine,

00:39:00 it took me a really long time to even get a sense

00:39:02 of like what kind of problems do medical practitioners solve?

00:39:05 What kind of data do they have?

00:39:06 Who has that data?

00:39:08 So I kind of felt like I need to approach this differently

00:39:12 if I wanna maximize the positive impact of deep learning.

00:39:16 Rather than me picking an area

00:39:19 and trying to become good at it and building something,

00:39:21 I should let people who are already domain experts

00:39:24 in those areas and who already have the data

00:39:27 do it themselves.

00:39:29 So that was the reason for Fast.ai

00:39:33 is to basically try and figure out

00:39:36 how to get deep learning into the hands of people

00:39:40 who could benefit from it and help them to do so

00:39:43 in as quick and easy and effective a way as possible.

00:39:47 Got it, so sort of empower the domain experts.

00:39:50 Yeah, and like partly it’s because like,

00:39:54 unlike most people in this field,

00:39:56 my background is very applied and industrial.

00:39:59 Like my first job was at McKinsey & Company.

00:40:02 I spent 10 years in management consulting.

00:40:04 I spend a lot of time with domain experts.

00:40:10 So I kind of respect them and appreciate them.

00:40:12 And I know that’s where the value generation in society is.

00:40:16 And so I also know how most of them can’t code

00:40:21 and most of them don’t have the time to invest

00:40:26 three years in a graduate degree or whatever.

00:40:29 So I was like, how do I upskill those domain experts?

00:40:33 I think that would be a super powerful thing,

00:40:36 the biggest societal impact I could have.

00:40:40 So yeah, that was the thinking.

00:40:41 So much of Fast.ai students and researchers

00:40:45 and the things you teach are pragmatically minded,

00:40:50 practically minded,

00:40:52 figuring out ways how to solve real problems and fast.

00:40:55 So from your experience,

00:40:57 what’s the difference between theory

00:40:59 and practice of deep learning?

00:41:03 Well, most of the research in the deep learning world

00:41:07 is a total waste of time.

00:41:09 Right, that’s what I was getting at.

00:41:11 Yeah.

00:41:12 It’s a problem in science in general.

00:41:16 Scientists need to be published,

00:41:19 which means they need to work on things

00:41:21 that their peers are extremely familiar with

00:41:24 and can recognize in advance in that area.

00:41:26 So that means that they all need to work on the same thing.

00:41:30 And so it really, and the thing they work on,

00:41:33 there’s nothing to encourage them to work on things

00:41:35 that are practically useful.

00:41:38 So you get just a whole lot of research,

00:41:41 which is minor advances and stuff

00:41:43 that’s been very highly studied

00:41:44 and has no significant practical impact.

00:41:49 Whereas the things that really make a difference,

00:41:50 like I mentioned transfer learning,

00:41:52 like if we can do better at transfer learning,

00:41:55 then it’s this like world changing thing

00:41:58 where suddenly like lots more people

00:41:59 can do world class work with less resources and less data.

00:42:06 But almost nobody works on that.

00:42:08 Or another example, active learning,

00:42:10 which is the study of like,

00:42:11 how do we get more out of the human beings in the loop?

00:42:15 That’s my favorite topic.

00:42:17 Yeah, so active learning is great,

00:42:18 but it’s almost nobody working on it

00:42:21 because it’s just not a trendy thing right now.

00:42:23 You know what somebody, sorry to interrupt,

00:42:27 you’re saying that nobody is publishing on active learning,

00:42:31 but there’s people inside companies,

00:42:33 anybody who actually has to solve a problem,

00:42:36 they’re going to innovate on active learning.

00:42:39 Yeah, everybody kind of reinvents active learning

00:42:42 when they actually have to work in practice

00:42:43 because they start labeling things and they think,

00:42:46 gosh, this is taking a long time and it’s very expensive.

00:42:49 And then they start thinking,

00:42:51 well, why am I labeling everything?

00:42:52 I’m only, the machine’s only making mistakes

00:42:54 on those two classes.

00:42:56 They’re the hard ones.

00:42:56 Maybe I’ll just start labeling those two classes.

00:42:58 And then you start thinking,

00:43:00 well, why did I do that manually?

00:43:01 Why can’t I just get the system to tell me

00:43:03 which things are going to be hardest?
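The loop described here is essentially uncertainty sampling, the simplest form of active learning. A minimal sketch, with random probabilities standing in for a real classifier's outputs over an unlabeled pool:

```python
import numpy as np

rng = np.random.default_rng(1)

# Suppose a model has produced class probabilities for a pool of
# unlabeled examples (faked here with random softmax-like outputs).
probs = rng.dirichlet(np.ones(3), size=100)  # 100 examples, 3 classes

# Margin-based uncertainty: a small gap between the top two class
# probabilities means the model finds the example hard.
sorted_p = np.sort(probs, axis=1)
margin = sorted_p[:, -1] - sorted_p[:, -2]

# Send only the k hardest examples to a human for labeling.
k = 10
to_label = np.argsort(margin)[:k]
print("indices to send for labeling:", to_label)
```

This is exactly the "why am I labeling everything?" insight: label where the model is least sure, rather than uniformly.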

00:43:05 It’s an obvious thing to do, but yeah,

00:43:08 it’s just like transfer learning.

00:43:11 It’s understudied and the academic world

00:43:14 just has no reason to care about practical results.

00:43:17 The funny thing is,

00:43:18 like I’ve only really ever written one paper.

00:43:19 I hate writing papers.

00:43:21 And I didn’t even write it.

00:43:22 It was my colleague, Sebastian Ruder,

00:43:24 who actually wrote it.

00:43:25 I just did the research for it,

00:43:28 but it was basically introducing transfer learning,

00:43:30 successful transfer learning to NLP for the first time.

00:43:34 The algorithm is called ULMFiT.

00:43:36 And it actually, I actually wrote it for the course,

00:43:42 for the Fast AI course.

00:43:43 I wanted to teach people NLP and I thought,

00:43:45 I only want to teach people practical stuff.

00:43:47 And I think the only practical stuff is transfer learning.

00:43:50 And I couldn’t find any examples of transfer learning in NLP.

00:43:53 So I just did it.

00:43:54 And I was shocked to find that as soon as I did it,

00:43:57 which, you know, the basic prototype took a couple of days,

00:44:01 smashed the state of the art

00:44:02 on one of the most important data sets

00:44:04 in a field that I knew nothing about.

00:44:06 And I just thought, well, this is ridiculous.

00:44:10 And so I spoke to Sebastian about it

00:44:13 and he kindly offered to write it up, the results.

00:44:17 And so it ended up being published in ACL,

00:44:21 which is the top computational linguistics conference.

00:44:25 So like people do actually care once you do it,

00:44:28 but I guess it’s difficult for maybe like junior researchers

00:44:32 or like, I don’t care whether I get citations

00:44:36 or papers or whatever.

00:44:37 There’s nothing in my life that makes that important,

00:44:39 which is why I’ve never actually bothered

00:44:41 to write a paper myself.

00:44:43 But for people who do,

00:44:43 I guess they have to pick the kind of safe option,

00:44:49 which is like, yeah, make a slight improvement

00:44:52 on something that everybody’s already working on.

00:44:54 Yeah, nobody does anything interesting

00:44:58 or succeeds in life with the safe option.

00:45:01 Although, I mean, the nice thing is,

00:45:02 nowadays everybody is now working on NLP transfer learning

00:45:05 because since that time we’ve had GPT and GPT2 and BERT,

00:45:09 and, you know, it’s like, it’s, so yeah,

00:45:12 once you show that something’s possible,

00:45:15 everybody jumps in, I guess, so.

00:45:17 I hope to be a part of,

00:45:19 and I hope to see more innovation

00:45:20 and active learning in the same way.

00:45:22 I think transfer learning and active learning

00:45:24 are fascinating, public, open work.

00:45:27 I actually helped start a startup called Platform.ai,

00:45:29 which is really all about active learning.

00:45:31 And yeah, it’s been interesting trying to kind of see

00:45:35 what research is out there and make the most of it.

00:45:37 And there’s basically none.

00:45:39 So we’ve had to do all our own research.

00:45:41 Once again, and just as you described.

00:45:44 Can you tell the story of the Stanford competition,

00:45:47 DAWNBench, and FastAI's achievement on it?

00:45:51 Sure, so something which I really enjoy

00:45:54 is that I basically teach two courses a year,

00:45:57 the Practical Deep Learning for Coders,

00:45:59 which is kind of the introductory course,

00:46:02 and then Cutting Edge Deep Learning for Coders,

00:46:04 which is the kind of research level course.

00:46:08 And while I teach those courses,

00:46:10 I basically have a big office

00:46:16 at the University of San Francisco,

00:46:18 big enough for like 30 people.

00:46:19 And I invite anybody, any student who wants to come

00:46:22 and hang out with me while I build the course.

00:46:25 And so generally it’s full.

00:46:26 And so we have 20 or 30 people in a big office

00:46:30 with nothing to do but study deep learning.

00:46:33 So it was during one of these times

00:46:35 that somebody in the group said,

00:46:37 oh, there’s a thing called Dawn Bench

00:46:40 that looks interesting.

00:46:41 And I was like, what the hell is that?

00:46:42 And they set out some competition

00:46:44 to see how quickly you can train a model.

00:46:46 Seems kind of, not exactly relevant to what we’re doing,

00:46:50 but it sounds like the kind of thing

00:46:51 which you might be interested in.

00:46:52 And I checked it out and I was like,

00:46:53 oh crap, there’s only 10 days till it’s over.

00:46:55 It’s too late.

00:46:58 And we’re kind of busy trying to teach this course.

00:47:00 But we’re like, oh, it would make an interesting case study

00:47:05 for the course.

00:47:06 It’s like, it’s all the stuff we’re already doing.

00:47:08 Why don’t we just put together

00:47:09 our current best practices and ideas?

00:47:12 So me and I guess about four students

00:47:16 just decided to give it a go.

00:47:17 And we focused on this small one called CIFAR 10,

00:47:20 which is little 32 by 32 pixel images.

00:47:24 Can you say what DAWNBench is?

00:47:26 Yeah, so it’s a competition to train a model

00:47:28 as fast as possible.

00:47:29 It was run by Stanford.

00:47:30 And it’s cheap as possible too.

00:47:32 That’s also another one for as cheap as possible.

00:47:34 And there was a couple of categories,

00:47:36 ImageNet and CIFAR 10.

00:47:38 So ImageNet is this big 1.3 million image thing

00:47:42 that took a couple of days to train.

00:47:45 Remember a friend of mine, Pete Warden,

00:47:47 who’s now at Google.

00:47:51 I remember he told me how he trained ImageNet

00:47:53 a few years ago when he basically like had this

00:47:58 little granny flat out the back

00:47:59 that he turned into his ImageNet training center.

00:48:01 And he figured, you know, after like a year of work,

00:48:03 he figured out how to train it in like 10 days or something.

00:48:07 It’s like, that was a big job.

00:48:08 Whereas CIFAR 10, at that time,

00:48:10 you could train in a few hours.

00:48:12 You know, it’s much smaller and easier.

00:48:14 So we thought we'd try CIFAR 10.

00:48:18 And yeah, I’ve really never done that before.

00:48:23 Like I’d never really,

00:48:24 like things like using more than one GPU at a time

00:48:27 was something I tried to avoid.

00:48:29 Cause to me, it’s like very against the whole idea

00:48:32 of accessibility, which is you should be able to do things with one GPU.

00:48:35 I mean, have you asked in the past before,

00:48:38 after having accomplished something,

00:48:39 how do I do this faster, much faster?

00:48:42 Oh, always, but it’s always, for me,

00:48:44 it’s always how do I make it much faster on a single GPU

00:48:47 that a normal person could afford in their day to day life.

00:48:50 It’s not how could I do it faster by, you know,

00:48:53 having a huge data center.

00:48:55 Cause to me, it’s all about like,

00:48:57 as many people as possible should be able to use something

00:48:59 without fussing around with infrastructure.

00:49:04 So anyways, in this case it’s like, well,

00:49:06 we can use eight GPUs just by renting a AWS machine.

00:49:10 So we thought we’d try that.

00:49:11 And yeah, basically using the stuff we were already doing,

00:49:16 we were able to get, you know, the speed,

00:49:20 you know, within a few days we had the speed down to,

00:49:23 I don’t know, a very small number of minutes.

00:49:26 I can’t remember exactly how many minutes it was,

00:49:28 but it might’ve been like 10 minutes or something.

00:49:31 And so, yeah, we found ourselves

00:49:32 at the top of the leaderboard easily

00:49:34 for both time and money, which really shocked me

00:49:39 cause the other people competing in this

00:49:40 were like Google and Intel and stuff

00:49:41 who I like know a lot more about this stuff

00:49:43 than I think we do.

00:49:45 So then we were emboldened.

00:49:46 We thought let’s try the ImageNet one too.

00:49:50 I mean, it seemed way out of our league,

00:49:53 but our goal was to get under 12 hours.

00:49:55 And we did, which was really exciting.

00:49:59 But we didn’t put anything up on the leaderboard,

00:50:01 but we were down to like 10 hours.

00:50:03 But then Google put in like five hours or something

00:50:09 and we’re just like, oh, we’re so screwed.

00:50:13 But we kind of thought, we’ll keep trying.

00:50:16 You know, if Google can do it in five,

00:50:17 I mean, Google did it in five hours on something

00:50:19 on like a TPU pod or something, like a lot of hardware.

00:50:23 But we kind of like had a bunch of ideas to try.

00:50:26 Like a really simple thing was

00:50:28 why are we using these big images?

00:50:30 They’re like 224 or 256 by 256 pixels.

00:50:35 You know, why don’t we try smaller ones?

00:50:37 And just to elaborate, there’s a constraint

00:50:40 on the accuracy that your trained model

00:50:42 is supposed to achieve, right?

00:50:43 Yeah, you gotta achieve 93%, I think it was,

00:50:46 for ImageNet, exactly.

00:50:49 Which is very tough, so you have to.

00:50:51 Yeah, 93%, like they picked a good threshold.

00:50:54 It was a little bit higher

00:50:56 than what the most commonly used ResNet 50 model

00:51:00 could achieve at that time.

00:51:03 So yeah, so it’s quite a difficult problem to solve.

00:51:08 But yeah, we realized if we actually

00:51:09 just use 64 by 64 images,

00:51:14 it trained a pretty good model.

00:51:16 And then we could take that same model

00:51:18 and just give it a couple of epochs to learn 224 by 224 images.

00:51:21 And it was basically already trained.

00:51:24 It makes a lot of sense.

00:51:25 Like if you teach somebody,

00:51:26 like here’s what a dog looks like

00:51:28 and you show them low res versions,

00:51:30 and then you say, here’s a really clear picture of a dog,

00:51:33 they already know what a dog looks like.
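This progressive resizing trick can be sketched with a toy model. The sketch below is illustrative, not the actual DAWNBench code: the synthetic "images" and the resolution-independent features (global pooling, playing the role adaptive pooling plays in a real CNN, which is what lets the same weights apply at any input size) are assumptions for the demo:

```python
import numpy as np

rng = np.random.default_rng(2)

def make_images(n, size):
    """Toy 2-class data: class 1 has a bright center patch."""
    imgs = rng.normal(size=(n, size, size))
    y = rng.integers(0, 2, n)
    c = size // 4
    imgs[y == 1, c:-c, c:-c] += 2.0
    return imgs, y.astype(float)

def feats(imgs):
    """Resolution-independent features (overall mean, center mean),
    so weights learned at one image size carry over to another."""
    c = imgs.shape[1] // 4
    return np.stack([imgs.mean(axis=(1, 2)),
                     imgs[:, c:-c, c:-c].mean(axis=(1, 2))], axis=1)

def fit(w, b, x, y, epochs, lr=0.5):
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-(x @ w + b)))
        w -= lr * x.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return w, b

w, b = np.zeros(2), 0.0
# Stage 1: do most of the training on small, cheap images.
x_small, y_small = make_images(200, 8)
w, b = fit(w, b, feats(x_small), y_small, epochs=500)
# Stage 2: just a couple of passes at full resolution to finish.
x_big, y_big = make_images(200, 32)
w, b = fit(w, b, feats(x_big), y_big, epochs=2)

acc = ((1 / (1 + np.exp(-(feats(x_big) @ w + b))) > 0.5) == y_big).mean()
print(f"accuracy after brief high-res fine-tuning: {acc:.2f}")
```

As in the dog analogy: the model learns what the classes look like cheaply at low resolution, and the high-resolution pass only has to refine, not relearn.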

00:51:35 So that like just, we jumped to the front

00:51:39 and we ended up winning parts of that competition.

00:51:43 We actually ended up doing a distributed version

00:51:47 over multiple machines a couple of months later

00:51:49 and ended up at the top of the leaderboard.

00:51:51 We had 18 minutes.

00:51:53 ImageNet.

00:51:53 Yeah, and it was,

00:51:55 and people have just kept on blasting through

00:51:57 again and again since then, so.

00:52:00 So what’s your view on multi GPU

00:52:03 or multiple machine training in general

00:52:06 as a way to speed code up?

00:52:09 I think it’s largely a waste of time.

00:52:11 Both of them.

00:52:12 I think it’s largely a waste of time.

00:52:13 Both multi GPU on a single machine and.

00:52:15 Yeah, particularly multi machines,

00:52:17 cause it’s just clunky.

00:52:21 Multi GPUs is less clunky than it used to be,

00:52:25 but to me anything that slows down your iteration speed

00:52:28 is a waste of time.

00:52:31 So you could maybe do your very last,

00:52:34 you know, perfecting of the model on multi GPUs

00:52:38 if you need to, but.

00:52:40 So for example, I think doing stuff on ImageNet

00:52:44 is generally a waste of time.

00:52:46 Why test things on 1.3 million images?

00:52:48 Most of us don’t use 1.3 million images.

00:52:51 And we’ve also done research that shows that

00:52:53 doing things on a smaller subset of images

00:52:56 gives you the same relative answers anyway.

00:52:59 So from a research point of view, why waste that time?

00:53:02 So actually I released a couple of new data sets recently.

00:53:06 One is called Imagenette,

00:53:07 the French ImageNet, which is a small subset of ImageNet,

00:53:12 which is designed to be easy to classify.

00:53:15 What’s, how do you spell ImageNet?

00:53:17 It’s got an extra T and E at the end,

00:53:19 cause it’s very French.

00:53:20 And then another one called Imagewoof,

00:53:24 which is a subset of ImageNet that only contains dog breeds.

00:53:29 And that’s a hard one, right?

00:53:31 That’s a hard one.

00:53:31 And I’ve discovered that if you just look at these

00:53:34 two subsets, you can train things on a single GPU

00:53:37 in 10 minutes.

00:53:39 And the results you get are directly transferable

00:53:42 to ImageNet nearly all the time.

00:53:44 And so now I’m starting to see some researchers

00:53:46 start to use these much smaller data sets.

00:53:48 I so deeply love the way you think,

00:53:51 because I think you might’ve written a blog post

00:53:55 saying that sort of going these big data sets

00:54:00 is encouraging people to not think creatively.

00:54:03 Absolutely.

00:54:04 So you’re too, it sort of constrains you to train

00:54:08 on large resources.

00:54:09 And because you have these resources,

00:54:11 you think more research will be better.

00:54:13 And then you start, so like somehow you kill the creativity.

00:54:17 Yeah, and even worse than that, Lex,

00:54:19 I keep hearing from people who say,

00:54:21 I decided not to get into deep learning

00:54:23 because I don’t believe it’s accessible to people

00:54:26 outside of Google to do useful work.

00:54:28 So like I see a lot of people make an explicit decision

00:54:31 to not learn this incredibly valuable tool

00:54:35 because they’ve drunk the Google Koolaid,

00:54:39 which is that only Google’s big enough

00:54:40 and smart enough to do it.

00:54:42 And I just find that so disappointing and it’s so wrong.

00:54:45 And I think all of the major breakthroughs in AI

00:54:49 in the next 20 years will be doable on a single GPU.

00:54:53 Like I would say, my sense is all the big sort of.

00:54:57 Well, let’s put it this way.

00:54:58 None of the big breakthroughs of the last 20 years

00:55:00 have required multiple GPUs.

00:55:01 So like batch norm, ReLU, Dropout.

00:55:05 To demonstrate that there’s something to them.

00:55:08 Every one of them, none of them has required multiple GPUs.

00:55:11 GANs, the original GANs didn’t require multiple GPUs.

00:55:15 Well, and we’ve actually recently shown

00:55:18 that you don’t even need GANs.

00:55:19 So we’ve developed GAN level outcomes without needing GANs.

00:55:24 And we can now do it with, again,

00:55:26 by using transfer learning,

00:55:27 we can do it in a couple of hours on a single GPU.

00:55:30 You’re just using a generator model

00:55:31 without the adversarial part?

00:55:32 Yeah, so we’ve found loss functions

00:55:35 that work super well without the adversarial part.

00:55:38 And then one of our students, a guy called Jason Antic,

00:55:41 has created a system called DeOldify,

00:55:44 which uses this technique to colorize

00:55:47 old black and white movies.

00:55:48 You can do it on a single GPU,

00:55:50 colorize a whole movie in a couple of hours.

00:55:52 And one of the things that Jason and I did together

00:55:56 was we figured out how to add a little bit of GAN

00:56:00 at the very end, which it turns out for colorization

00:56:02 makes it just a bit brighter and nicer.

00:56:05 And then Jason did masses of experiments

00:56:07 to figure out exactly how much to do,

00:56:09 but it’s still all done on his home machine

00:56:12 on a single GPU in his lounge room.

00:56:15 And if you think about colorizing Hollywood movies,

00:56:19 that sounds like something a huge studio would have to do,

00:56:21 but he has the world’s best results on this.

00:56:25 There’s this problem of microphones.

00:56:27 We’re just talking to microphones now.

00:56:29 It’s such a pain in the ass to have these microphones

00:56:32 to get good quality audio.

00:56:34 And I tried to see if it’s possible to plop down

00:56:36 a bunch of cheap sensors and reconstruct

00:56:39 higher quality audio from multiple sources.

00:56:41 Because right now I haven’t seen the work from,

00:56:45 okay, we can say even expensive mics

00:56:47 automatically combining audio from multiple sources

00:56:50 to improve the combined audio.

00:56:52 People haven’t done that.

00:56:53 And that feels like a learning problem.

00:56:55 So hopefully somebody can.

00:56:56 Well, I mean, it’s evidently doable

00:56:58 and it should have been done by now.

00:57:01 I felt the same way about computational photography

00:57:03 four years ago.

00:57:05 Why are we investing in big lenses

00:57:07 when three cheap lenses plus actually

00:57:10 a little bit of intentional movement,

00:57:13 so like take a few frames,

00:57:16 gives you enough information

00:57:18 to get excellent subpixel resolution,

00:57:20 which particularly with deep learning,

00:57:22 you would know exactly what you're meant to be looking at.

00:57:25 We can totally do the same thing with audio.

00:57:28 I think it’s madness that it hasn’t been done yet.
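The multi-sensor idea can be illustrated in an idealized 1D form. Real systems must estimate the shifts and solve a noisy inverse problem (increasingly with learned priors), but this sketch shows why shifted cheap samplings jointly carry high-resolution information:

```python
import numpy as np

# A high-resolution 1D "scene" that is never observed directly.
hr = np.sin(np.linspace(0, 4 * np.pi, 16))

# Two cheap sensors (or two frames with a half-pixel shift) each
# capture every other sample: individually low-res, jointly complete.
frame_a = hr[0::2]
frame_b = hr[1::2]

# Interleaving the shifted frames recovers full resolution,
# which neither frame alone contains.
recon = np.empty_like(hr)
recon[0::2] = frame_a
recon[1::2] = frame_b

print("max reconstruction error:", np.abs(recon - hr).max())
```

The same principle underlies both the three-cheap-lenses argument and the multiple-microphone idea: several shifted low-quality measurements can jointly exceed any one of them.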

00:57:30 Is there progress on the photography company?

00:57:33 Yeah, photography is basically standard now.

00:57:36 So the Google Pixel Night Light,

00:57:40 I don’t know if you’ve ever tried it,

00:57:42 but it’s astonishing.

00:57:43 You take a picture in almost pitch black

00:57:45 and you get back a very high quality image.

00:57:49 And it’s not because of the lens.

00:57:51 Same stuff with like adding the bokeh

00:57:53 to the background blurring,

00:57:55 it’s done computationally.

00:57:57 This is the pixel right here.

00:57:58 Yeah, basically everybody now

00:58:01 is doing most of the fanciest stuff

00:58:05 on their phones with computational photography

00:58:07 and also increasingly people are putting

00:58:08 more than one lens on the back of the camera.

00:58:11 So the same will happen for audio for sure.

00:58:14 And there’s applications in the audio side.

00:58:16 If you look at an Alexa type device,

00:58:19 most people I’ve seen,

00:58:20 especially I worked at Google before,

00:58:22 when you look at noise background removal,

00:58:25 you don’t think of multiple sources of audio.

00:58:29 You don’t play with that as much

00:58:31 as I would hope people would.

00:58:31 But I mean, you can still do it even with one.

00:58:33 Like again, not much work’s been done in this area.

00:58:36 So we’re actually gonna be releasing an audio library soon,

00:58:39 which hopefully will encourage development of this

00:58:41 because it’s so underused.

00:58:43 The basic approach we used for our super resolution

00:58:46 and which Jason uses for DeOldify

00:58:48 of generating high quality images,

00:58:50 the exact same approach would work for audio.

00:58:53 No one’s done it yet,

00:58:54 but it would be a couple of months work.

00:58:57 Okay, also learning rate in terms of DawnBench.

00:59:01 There’s some magic on learning rate

00:59:03 that you played around with that’s kind of interesting.

00:59:05 Yeah, so this is all work that came

00:59:06 from a guy called Leslie Smith.

00:59:09 Leslie’s a researcher who, like us,

00:59:12 cares a lot about just the practicalities

00:59:15 of training neural networks quickly and accurately,

00:59:20 which I think is what everybody should care about,

00:59:22 but almost nobody does.

00:59:24 And he discovered something very interesting,

00:59:28 which he calls super convergence,

00:59:29 which is there are certain networks

00:59:31 that with certain settings of hyperparameters

00:59:33 could suddenly be trained 10 times faster

00:59:37 by using a 10 times higher learning rate.

00:59:39 Now, no one would publish that paper

00:59:43 because it’s not an area of kind of active research

00:59:49 in the academic world.

00:59:50 No academics recognize that this is important.

00:59:52 And also deep learning in academia

00:59:56 is not considered an experimental science.

00:59:59 So unlike in physics where you could say like,

01:00:02 I just saw a subatomic particle do something

01:00:05 which the theory doesn’t explain,

01:00:07 you could publish that without an explanation.

01:00:10 And then in the next 60 years,

01:00:11 people can try to work out how to explain it.

01:00:14 We don’t allow this in the deep learning world.

01:00:16 So it’s literally impossible for Leslie

01:00:19 to publish a paper that says,

01:00:21 I’ve just seen something amazing happen.

01:00:23 This thing trained 10 times faster than it should have.

01:00:25 I don’t know why.

01:00:27 And so the reviewers were like,

01:00:28 well, you can’t publish that because you don’t know why.

01:00:30 So anyway.

01:00:31 That’s important to pause on

01:00:32 because there’s so many discoveries

01:00:34 that would need to start like that.

01:00:36 Every other scientific field I know of works that way.

01:00:39 I don’t know why ours is uniquely disinterested

01:00:43 in publishing unexplained experimental results,

01:00:47 but there it is.

01:00:48 So it wasn’t published.

01:00:51 Having said that,

01:00:52 I read a lot more unpublished papers than published papers

01:00:56 because that’s where you find the interesting insights.

01:01:00 So I absolutely read this paper.

01:01:02 And I was just like,

01:01:04 this is astonishingly mind blowing and weird

01:01:08 and awesome.

01:01:09 And like, why isn’t everybody only talking about this?

01:01:12 Because like, if you can train these things 10 times faster,

01:01:15 they also generalize better

01:01:16 because you’re doing less epochs,

01:01:18 which means you look at the data less,

01:01:20 you get better accuracy.

01:01:22 So I’ve been kind of studying that ever since.

01:01:24 And eventually Leslie kind of figured out

01:01:28 a lot of how to get this done.

01:01:30 And we added minor tweaks.

01:01:32 And a big part of the trick

01:01:33 is starting at a very low learning rate,

01:01:36 very gradually increasing it.

01:01:37 So as you’re training your model,

01:01:39 you would take very small steps at the start

01:01:42 and you gradually make them bigger and bigger

01:01:44 until eventually you’re taking much bigger steps

01:01:46 than anybody thought was possible.

01:01:49 There’s a few other little tricks to make it work,

01:01:51 but basically we can reliably get super convergence.
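[Editor's note: the warmup-then-decay learning rate schedule Jeremy describes here is what fast.ai later packaged as the "one-cycle policy". As a rough illustration only — the linear warmup, cosine decay, warmup fraction, and divisor below are assumptions for the sketch, not fast.ai's exact implementation — such a schedule can be written in plain Python:]

```python
import math

def one_cycle_lr(step, total_steps, max_lr, warmup_frac=0.3, div=25.0):
    """Learning rate at `step`: linear warmup from max_lr/div up to max_lr,
    then cosine decay back down. Shapes and constants are illustrative."""
    warmup_steps = int(total_steps * warmup_frac)
    min_lr = max_lr / div
    if step < warmup_steps:
        # very small steps at the start, gradually growing bigger
        frac = step / max(1, warmup_steps)
        return min_lr + (max_lr - min_lr) * frac
    # after the peak, anneal back toward a tiny rate
    frac = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + (max_lr - min_lr) * 0.5 * (1 + math.cos(math.pi * frac))

lrs = [one_cycle_lr(s, 100, max_lr=1.0) for s in range(100)]
```

At each training step you would set the optimizer's learning rate to this value before the gradient update; PyTorch ships a built-in version as `torch.optim.lr_scheduler.OneCycleLR`.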

01:01:55 And so for the DawnBench thing,

01:01:56 we were using just much higher learning rates

01:01:59 than people expected to work.

01:02:02 What do you think the future of,

01:02:03 I mean, it makes so much sense

01:02:04 for learning rate to be a critical hyperparameter

01:02:07 that you vary.

01:02:08 What do you think the future

01:02:09 of learning rate magic looks like?

01:02:13 Well, there’s been a lot of great work

01:02:14 in the last 12 months in this area.

01:02:17 And people are increasingly realizing that optimizers,

01:02:20 like we just have no idea really how optimizers work.

01:02:23 And the combination of weight decay,

01:02:25 which is how we regularize optimizers,

01:02:27 and the learning rate,

01:02:29 and then other things like the epsilon we use

01:02:31 in the Adam optimizer,

01:02:32 they all work together in weird ways.
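[Editor's note: to make that interaction concrete, here is a single-parameter AdamW-style update in plain Python, showing where the learning rate, weight decay, and Adam's epsilon all enter the same step. The constants are the commonly used defaults, included purely for illustration.]

```python
import math

def adamw_step(p, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, wd=1e-2):
    """One AdamW update for a scalar parameter p; returns (p, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared grads
    m_hat = m / (1 - beta1 ** t)                # bias correction at step t
    v_hat = v / (1 - beta2 ** t)
    # eps keeps the denominator away from zero; wd decays p directly,
    # so lr, eps, and wd all interact inside this one expression
    p = p - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * p)
    return p, m, v

p, m, v = 1.0, 0.0, 0.0
p, m, v = adamw_step(p, grad=0.5, m=m, v=v, t=1)
```

Changing any one of lr, eps, or wd rescales that final update, which is one way to see why the three cannot really be tuned independently.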

01:02:36 And different parts of the model,

01:02:38 this is another thing we’ve done a lot of work on

01:02:40 is research into how different parts of the model

01:02:43 should be trained at different rates in different ways.

01:02:46 So we do something we call discriminative learning rates,

01:02:49 which is really important,

01:02:50 particularly for transfer learning.
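[Editor's note: discriminative learning rates give the early layers, which carry general pretrained features, smaller learning rates than the later task-specific layers. A toy sketch of the idea — the geometric spacing, the factor of 2.6, and the group count are illustrative assumptions, not fast.ai's exact scheme:]

```python
def discriminative_lrs(base_lr, n_groups, factor=2.6):
    """Per-layer-group learning rates, smallest for the earliest group.
    Each later group's rate is `factor` times the previous one."""
    # earliest group trains slowest: base_lr / factor**(n_groups - 1)
    return [base_lr / factor ** (n_groups - 1 - i) for i in range(n_groups)]

lrs = discriminative_lrs(1e-3, 3)
```

In PyTorch these rates would typically be passed to the optimizer as separate parameter groups, one per slice of the network.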

01:02:53 So really, I think in the last 12 months,

01:02:54 a lot of people have realized

01:02:55 that all this stuff is important.

01:02:57 There’s been a lot of great work coming out

01:03:00 and we’re starting to see algorithms appear,

01:03:03 which have very, very few dials, if any,

01:03:06 that you have to touch.

01:03:07 So I think what’s gonna happen

01:03:09 is the idea of a learning rate,

01:03:10 well, it almost already has disappeared

01:03:12 in the latest research.

01:03:14 And instead, it’s just like we know enough

01:03:18 about how to interpret the gradients

01:03:22 and the change of gradients we see

01:03:23 to know how to set every parameter

01:03:25 in an optimal way.

01:03:26 So you see the future of deep learning

01:03:30 where really, where’s the input of a human expert needed?

01:03:34 Well, hopefully the input of a human expert

01:03:36 will be almost entirely unneeded

01:03:38 from the deep learning point of view.

01:03:40 So again, like Google’s approach to this

01:03:43 is to try and use thousands of times more compute

01:03:46 to run lots and lots of models at the same time

01:03:49 and hope that one of them is good.

01:03:51 AutoML kind of thing?

01:03:51 Yeah, AutoML kind of stuff, which I think is insane.

01:03:56 When you better understand the mechanics

01:03:59 of how models learn,

01:04:01 you don’t have to try a thousand different models

01:04:03 to find which one happens to work the best.

01:04:05 You can just jump straight to the best one,

01:04:08 which means that it’s more accessible

01:04:09 in terms of compute, cheaper,

01:04:12 and also with less hyperparameters to set,

01:04:14 it means you don’t need deep learning experts

01:04:16 to train your deep learning model for you,

01:04:19 which means that domain experts can do more of the work,

01:04:22 which means that now you can focus the human time

01:04:24 on the kind of interpretation, the data gathering,

01:04:28 identifying model errors and stuff like that.

01:04:31 Yeah, the data side.

01:04:32 How often do you work with data these days

01:04:34 in terms of the cleaning, looking at it?

01:04:37 Like Darwin looked at different species

01:04:41 while traveling about.

01:04:42 Do you look at data?

01:04:45 Have you in your roots in Kaggle?

01:04:48 Always, yeah.

01:04:48 Look at data.

01:04:49 Yeah, I mean, it’s a key part of our course.

01:04:51 It’s like before we train a model in the course,

01:04:53 we see how to look at the data.

01:04:55 And then the first thing we do

01:04:56 after we train our first model,

01:04:57 which is fine tuning an ImageNet model for five minutes.

01:05:00 And then the thing we immediately do after that

01:05:02 is we learn how to analyze the results of the model

01:05:05 by looking at examples of misclassified images

01:05:08 and looking at a confusion matrix,

01:05:10 and then doing research on Google

01:05:15 to learn about the kinds of things that it’s misclassifying.
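[Editor's note: the misclassification analysis described here rests on a confusion matrix, which is simple to compute by hand. A minimal plain-Python version — the class names below are hypothetical:]

```python
def confusion_matrix(y_true, y_pred, classes):
    """matrix[i][j] = number of examples of true class i predicted as class j."""
    idx = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[idx[t]][idx[p]] += 1
    return matrix

classes = ["cat", "dog"]
m = confusion_matrix(["cat", "cat", "dog", "dog"],
                     ["cat", "dog", "dog", "dog"], classes)
# off-diagonal cells are the misclassifications worth inspecting
```

The off-diagonal counts tell you which pairs of classes the model confuses, which is exactly what points you at the examples worth researching.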

01:05:18 So to me, one of the really cool things

01:05:19 about machine learning models in general

01:05:21 is that when you interpret them,

01:05:24 they tell you about things like

01:05:25 what are the most important features,

01:05:27 which groups are you misclassifying,

01:05:29 and they help you become a domain expert more quickly

01:05:32 because you can focus your time on the bits

01:05:34 that the model is telling you is important.

01:05:38 So it lets you deal with things like data leakage,

01:05:40 for example, if it says,

01:05:41 oh, the main feature I’m looking at is customer ID.

01:05:45 And you’re like, oh, customer ID should be predictive.

01:05:47 And then you can talk to the people

01:05:50 that manage customer IDs and they’ll tell you like,

01:05:53 oh yes, as soon as a customer’s application is accepted,

01:05:57 we add a one on the end of their customer ID or something.

01:06:01 So yeah, looking at data,

01:06:03 particularly from the lens of which parts of the data

01:06:06 the model says is important is super important.

01:06:09 Yeah, and using the model to almost debug the data

01:06:12 to learn more about the data.

01:06:14 Exactly.

01:06:16 What are the different cloud options

01:06:18 for training your own networks?

01:06:20 Last question related to DawnBench.

01:06:21 Well, it’s part of a lot of the work you do,

01:06:24 but from a perspective of performance,

01:06:27 I think you’ve written this in a blog post.

01:06:29 There’s AWS, there’s TPU from Google.

01:06:32 What’s your sense?

01:06:33 What the future holds?

01:06:34 What would you recommend now in terms of training?

01:06:37 So from a hardware point of view,

01:06:40 Google’s TPUs and the best Nvidia GPUs are similar.

01:06:45 I mean, maybe the TPUs are like 30% faster,

01:06:47 but they’re also much harder to program.

01:06:49 There isn’t a clear leader in terms of hardware right now,

01:06:54 although much more importantly,

01:06:56 the Nvidia GPUs are much more programmable.

01:06:59 They’ve got much more written for all of them.

01:07:00 So like that’s the clear leader for me

01:07:03 and where I would spend my time

01:07:04 as a researcher and practitioner.

01:07:08 But then in terms of the platform,

01:07:12 I mean, we’re super lucky now with stuff like Google GCP,

01:07:16 Google Cloud, and AWS that you can access a GPU

01:07:21 pretty quickly and easily.

01:07:25 But I mean, for AWS, it’s still too hard.

01:07:28 Like you have to find an AMI and get the instance running

01:07:33 and then install the software you want and blah, blah, blah.

01:07:37 GCP is currently the best way to get started

01:07:40 on a full server environment

01:07:42 because they have a fantastic fast.ai and PyTorch

01:07:46 ready-to-go instance, which has all the courses preinstalled.

01:07:51 It has Jupyter Notebook pre running.

01:07:53 Jupyter Notebook is this wonderful

01:07:55 interactive computing system,

01:07:57 which everybody basically should be using

01:08:00 for any kind of data driven research.

01:08:02 But then even better than that,

01:08:05 there are platforms like Salamander, which we own

01:08:09 and Paperspace, where literally you click a single button

01:08:13 and it pops up a Jupyter Notebook straight away

01:08:17 without any kind of installation or anything.

01:08:22 And all the course notebooks are all preinstalled.

01:08:25 So like for me, this is one of the things

01:08:28 we spent a lot of time kind of curating and working on.

01:08:34 Because when we first started our courses,

01:08:35 the biggest problem was people dropped out of lesson one

01:08:39 because they couldn’t get an AWS instance running.

01:08:42 So things are so much better now.

01:08:44 And like we actually have, if you go to course.fast.ai,

01:08:47 the first thing it says is here’s how to get started

01:08:49 with your GPU.

01:08:50 And there’s like, you just click on the link

01:08:52 and you click start and you’re going.

01:08:55 You’ll go GCP.

01:08:56 I have to confess, I’ve never used the Google GCP.

01:08:58 Yeah, GCP gives you $300 of compute for free,

01:09:01 which is really nice.

01:09:03 But as I say, Salamander and Paperspace

01:09:07 are even easier still.

01:09:09 Okay.

01:09:10 So from the perspective of deep learning frameworks,

01:09:15 you work with fast.ai, if we call it a framework,

01:09:18 and PyTorch and TensorFlow.

01:09:21 What are the strengths of each platform in your perspective?

01:09:25 So in terms of what we’ve done our research on

01:09:28 and taught in our course,

01:09:30 we started with Theano and Keras,

01:09:34 and then we switched to TensorFlow and Keras,

01:09:38 and then we switched to PyTorch,

01:09:40 and then we switched to PyTorch and fast.ai.

01:09:42 And that kind of reflects a growth and development

01:09:47 of the ecosystem of deep learning libraries.

01:09:52 Theano and TensorFlow were great,

01:09:57 but were much harder to teach and to do research

01:10:00 and development on because they define

01:10:02 what’s called a computational graph upfront,

01:10:05 a static graph, where you basically have to say,

01:10:07 here are all the things that I’m gonna eventually do

01:10:10 in my model, and then later on you say,

01:10:13 okay, do those things with this data.

01:10:15 And you can’t like debug them,

01:10:17 you can’t do them step by step,

01:10:18 you can’t program them interactively

01:10:20 in a Jupyter notebook and so forth.

01:10:22 PyTorch was not the first,

01:10:23 but PyTorch was certainly the strongest entrant

01:10:26 to come along and say, let’s not do it that way,

01:10:28 let’s just use normal Python.

01:10:31 And everything you know about in Python

01:10:32 is just gonna work, and we’ll figure out

01:10:35 how to make that run on the GPU as and when necessary.

01:10:40 That turned out to be a huge leap

01:10:44 in terms of what we could do with our research

01:10:46 and what we could do with our teaching.

01:10:49 Because it wasn’t limiting.

01:10:51 Yeah, I mean, it was critical for us

01:10:52 for something like DawnBench

01:10:53 to be able to rapidly try things.

01:10:55 It’s just so much harder to be a researcher

01:10:57 and practitioner when you have to do everything upfront

01:11:00 and you can’t inspect it.

01:11:03 Problem with PyTorch is it’s not at all accessible

01:11:07 to newcomers because you have to like

01:11:10 write your own training loop and manage the gradients

01:11:12 and all this stuff.

01:11:15 And it’s also like not great for researchers

01:11:17 because you’re spending your time dealing

01:11:19 with all this boilerplate and overhead

01:11:21 rather than thinking about your algorithm.

01:11:23 So we ended up writing this very multi layered API

01:11:27 that at the top level, you can train

01:11:29 a state of the art neural network

01:11:31 in three lines of code.

01:11:33 And which kind of talks to an API,

01:11:35 which talks to an API, which talks to an API,

01:11:36 which like you can dive into at any level

01:11:38 and get progressively closer to the machine

01:11:42 kind of levels of control.

01:11:45 And this is the fast AI library.

01:11:47 That’s been critical for us and for our students

01:11:51 and for lots of people that have won deep learning

01:11:54 competitions with it and written academic papers with it.

01:11:58 It’s made a big difference.

01:12:00 We’re still limited though by Python.

01:12:03 And particularly this problem with things like

01:12:06 recurrent neural nets say where you just can’t change things

01:12:11 unless you accept it going so slowly that it’s impractical.

01:12:15 So in the latest incarnation of the course

01:12:18 and with some of the research we’re now starting to do,

01:12:20 we’re starting to do stuff, some stuff in Swift.

01:12:24 I think we’re three years away from that

01:12:28 being super practical, but I’m in no hurry.

01:12:31 I’m very happy to invest the time to get there.

01:12:35 But with that, we actually already have a nascent version

01:12:39 of the fast AI library for vision running

01:12:42 on Swift and TensorFlow.

01:12:44 Because Python for TensorFlow is not gonna cut it.

01:12:48 It’s just a disaster.

01:12:49 What they did was they tried to replicate

01:12:53 the bits that people were saying they like about PyTorch,

01:12:57 this kind of interactive computation,

01:12:59 but they didn’t actually change

01:13:00 their foundational runtime components.

01:13:03 So they kind of added this like syntax sugar

01:13:06 they call TF Eager, TensorFlow Eager,

01:13:08 which makes it look a lot like PyTorch,

01:13:10 but it’s 10 times slower than PyTorch

01:13:12 to actually do a step.

01:13:16 So because they didn’t invest the time in like retooling

01:13:20 the foundations, cause their code base is so horribly

01:13:23 complex.

01:13:24 Yeah, I think it’s probably very difficult

01:13:25 to do that kind of retooling.

01:13:26 Yeah, well, particularly the way TensorFlow was written,

01:13:28 it was written by a lot of people very quickly

01:13:31 in a very disorganized way.

01:13:33 So like when you actually look in the code,

01:13:35 as I do often, I’m always just like,

01:13:37 Oh God, what were they thinking?

01:13:38 It’s just, it’s pretty awful.

01:13:41 So I’m really extremely negative

01:13:45 about the potential future for Python for TensorFlow.

01:13:50 But Swift for TensorFlow can be a different beast altogether.

01:13:53 It can be like, it can basically be a layer on top of MLIR

01:13:57 that takes advantage of, you know,

01:14:00 all the great compiler stuff that Swift builds on with LLVM

01:14:04 and yeah, I think it will be absolutely fantastic.

01:14:10 Well, you’re inspiring me to try.

01:14:11 I haven’t truly felt the pain of TensorFlow 2.0 Python.

01:14:17 It’s fine by me, but of…

01:14:21 Yeah, I mean, it does the job

01:14:22 if you’re using like predefined things

01:14:25 that somebody has already written.

01:14:27 But if you actually compare, you know,

01:14:29 like I’ve had to do,

01:14:31 cause I’ve been having to do a lot of stuff

01:14:32 with TensorFlow recently,

01:14:33 you actually compare like,

01:14:34 okay, I want to write something from scratch

01:14:37 and you’re like, I just keep finding it’s like,

01:14:38 Oh, it’s running 10 times slower than PyTorch.

01:14:41 So is the biggest cost,

01:14:43 let’s throw running time out the window.

01:14:47 How long it takes you to program?

01:14:49 That’s not too different now,

01:14:50 thanks to TensorFlow Eager, that’s not too different.

01:14:54 But because so many things take so long to run,

01:14:58 you wouldn’t run it at 10 times slower.

01:15:00 Like you just go like, Oh, this is taking too long.

01:15:03 And also there’s a lot of things

01:15:04 which are just less programmable,

01:15:05 like tf.data, which is the way data processing works

01:15:08 in TensorFlow is just this big mess.

01:15:11 It’s incredibly inefficient.

01:15:13 And they kind of had to write it that way

01:15:14 because of the TPU problems I described earlier.

01:15:19 So I just, you know,

01:15:22 I just feel like they’ve got this huge technical debt,

01:15:24 which they’re not going to solve

01:15:26 without starting from scratch.

01:15:27 So here’s an interesting question then,

01:15:29 if there’s a new student starting today,

01:15:34 what would you recommend they use?

01:15:37 Well, I mean, we obviously recommend fast.ai and PyTorch

01:15:40 because we teach new students and that’s what we teach with.

01:15:43 So we would very strongly recommend that

01:15:46 because it will let you get on top of the concepts

01:15:50 much more quickly.

01:15:51 So then you’ll become an actual,

01:15:53 and you’ll also learn the actual state

01:15:54 of the art techniques, you know,

01:15:56 so you actually get world class results.

01:15:59 Honestly, it doesn’t much matter what library you learn

01:16:03 because switching from, say, Chainer to MXNet

01:16:08 to TensorFlow to PyTorch is gonna be a couple of days work

01:16:12 as long as you understand the foundation as well.

01:16:15 But you think will Swift creep in there

01:16:19 as a thing that people start using?

01:16:22 Not for a few years,

01:16:24 particularly because like Swift has no data science

01:16:29 community, libraries, schooling.

01:16:33 And the Swift community has a total lack of appreciation

01:16:39 and understanding of numeric computing.

01:16:40 So like they keep on making stupid decisions, you know,

01:16:43 for years, they’ve just done dumb things

01:16:45 around performance and prioritization.

01:16:50 That’s clearly changing now

01:16:53 because the developer of Swift, Chris Lattner,

01:16:58 is working at Google on Swift for TensorFlow.

01:17:00 So like that’s a priority.

01:17:04 It’ll be interesting to see what happens with Apple

01:17:05 because like Apple hasn’t shown any sign of caring

01:17:10 about numeric programming in Swift.

01:17:13 So I mean, hopefully they’ll get off their ass

01:17:17 and start appreciating this

01:17:18 because currently all of their low level libraries

01:17:22 are not written in Swift.

01:17:25 They’re not particularly Swifty at all,

01:17:27 stuff like CoreML, they’re really pretty rubbish.

01:17:30 So yeah, so there’s a long way to go.

01:17:33 But at least one nice thing is that Swift for TensorFlow

01:17:36 can actually directly use Python code and Python libraries

01:17:40 and literally the entire lesson one notebook of fast.ai

01:17:45 runs in Swift right now in Python mode.

01:17:48 So that’s a nice intermediate thing.

01:17:51 How long does it take?

01:17:53 If you look at the two fast AI courses,

01:17:57 how long does it take to get from point zero

01:18:00 to completing both courses?

01:18:03 It varies a lot.

01:18:05 Somewhere between two months and two years generally.

01:18:13 So for two months, how many hours a day on average?

01:18:16 So like somebody who is a very competent coder

01:18:20 can do 70 hours per course and pick it up.

01:18:27 70, seven zero, that’s it, okay.

01:18:30 But a lot of people I know take a year off

01:18:35 to study fast AI full time and say at the end of the year,

01:18:40 they feel pretty competent

01:18:43 because generally there’s a lot of other things you do

01:18:45 like generally they’ll be entering Kaggle competitions,

01:18:48 they might be reading Ian Goodfellow’s book,

01:18:51 they might, they’ll be doing a bunch of stuff

01:18:54 and often particularly if they are a domain expert,

01:18:57 their coding skills might be a little

01:19:00 on the pedestrian side.

01:19:01 So part of it’s just like doing a lot more writing.

01:19:04 What do you find is the bottleneck for people usually

01:19:07 except getting started and setting stuff up?

01:19:11 I would say coding.

01:19:13 Yeah, I would say the best,

01:19:14 the people who are strong coders pick it up the best.

01:19:18 Although another bottleneck is people who have a lot

01:19:21 of experience of classic statistics can really struggle

01:19:27 because the intuition is so the opposite

01:19:30 of what they’re used to.

01:19:30 They’re very used to like trying to reduce the number

01:19:33 of parameters in their model

01:19:34 and looking at individual coefficients and stuff like that.

01:19:39 So I find people who have a lot of coding background

01:19:42 and know nothing about statistics

01:19:44 are generally gonna be the best off.

01:19:48 So you taught several courses on deep learning

01:19:51 and as Feynman says,

01:19:52 best way to understand something is to teach it.

01:19:55 What have you learned about deep learning from teaching it?

01:19:59 A lot.

01:20:00 That’s a key reason for me to teach the courses.

01:20:03 I mean, obviously it’s gonna be necessary

01:20:04 to achieve our goal of getting domain experts

01:20:07 to be familiar with deep learning,

01:20:09 but it was also necessary for me to achieve my goal

01:20:12 of being really familiar with deep learning.

01:20:18 I mean, to see so many domain experts

01:20:24 from so many different backgrounds,

01:20:25 it’s definitely, I wouldn’t say taught me,

01:20:28 but convinced me something that I liked to believe was true,

01:20:32 which was anyone can do it.

01:20:34 So there’s a lot of kind of snobbishness out there

01:20:37 about only certain people can learn to code.

01:20:40 Only certain people are gonna be smart enough

01:20:42 to do AI, that's definitely bullshit.

01:20:45 I’ve seen so many people from so many different backgrounds

01:20:48 get state of the art results in their domain areas now.

01:20:53 It’s definitely taught me that the key differentiator

01:20:57 between people that succeed

01:20:58 and people that fail is tenacity.

01:21:00 That seems to be basically the only thing that matters.

01:21:05 A lot of people give up.

01:21:06 But of the ones who don’t give up,

01:21:09 pretty much everybody succeeds.

01:21:12 Even if at first I’m just kind of like thinking like,

01:21:15 wow, they really aren’t quite getting it yet, are they?

01:21:18 But eventually people get it and they succeed.

01:21:22 So I think that’s been,

01:21:24 I think they’re both things I liked to believe was true,

01:21:26 but I don’t feel like I really had strong evidence

01:21:28 for them to be true,

01:21:29 but now I can say I’ve seen it again and again.

01:21:32 I’ve seen it again and again. So what advice do you have

01:21:37 for someone who wants to get started in deep learning?

01:21:42 Train lots of models.

01:21:44 That’s how you learn it.

01:21:47 So I think, it’s not just me,

01:21:51 I think our course is very good,

01:21:53 but also lots of people independently

01:21:54 have said it’s very good.

01:21:55 It recently won the CogX award for AI courses

01:21:58 as being the best in the world.

01:21:59 So I’d say come to our course, course.fast.ai.

01:22:02 And the thing I keep on hopping on in my lessons

01:22:05 is train models, print out the inputs to the models,

01:22:09 print out the outputs of the models,

01:22:11 like study, change the inputs a bit,

01:22:15 look at how the outputs vary,

01:22:17 just run lots of experiments

01:22:18 to get an intuitive understanding of what’s going on.

01:22:25 To get hooked, do you think, you mentioned training,

01:22:29 do you think just running the models inference,

01:22:32 like if we talk about getting started?

01:22:35 No, you’ve got to fine tune the models.

01:22:37 So that’s the critical thing,

01:22:39 because at that point you now have a model

01:22:41 that’s in your domain area.

01:22:43 So there’s no point running somebody else’s model

01:22:46 because it’s not your model.

01:22:48 So it only takes five minutes to fine tune a model

01:22:50 for the data you care about.

01:22:52 And in lesson two of the course,

01:22:53 we teach you how to create your own data set from scratch

01:22:56 by scripting Google image search.

01:22:58 So, and we show you how to actually create

01:23:01 a web application running online.

01:23:02 So I create one in the course that differentiates

01:23:05 between a teddy bear, a grizzly bear and a brown bear.

01:23:08 And it does it with basically 100% accuracy,

01:23:11 took me about four minutes to scrape the images

01:23:13 from Google search in the script.

01:23:15 There’s a little graphical widgets we have in the notebook

01:23:18 that help you clean up the data set.

01:23:21 There’s other widgets that help you study the results

01:23:24 to see where the errors are happening.

01:23:26 And so now we’ve got over a thousand replies

01:23:29 in our share your work here thread

01:23:31 of students saying, here’s the thing I built.

01:23:34 And so there’s people who like,

01:23:35 and a lot of them are state of the art.

01:23:37 Like somebody said, oh, I tried looking

01:23:39 at Devanagari characters and I couldn't believe it.

01:23:41 The thing that came out was more accurate

01:23:43 than the best academic paper after lesson one.

01:23:46 And then there’s others which are just more kind of fun,

01:23:48 like somebody who’s doing Trinidad and Tobago hummingbirds.

01:23:53 She said that’s kind of their national bird

01:23:54 and she’s got something that can now classify Trinidad

01:23:57 and Tobago hummingbirds.

01:23:58 So yeah, train models, fine tune models with your data set

01:24:02 and then study their inputs and outputs.

01:24:05 How much are the fast.ai courses?

01:24:07 Free.

01:24:08 Everything we do is free.

01:24:10 We have no revenue sources of any kind.

01:24:12 It’s just a service to the community.

01:24:15 You’re a saint.

01:24:16 Okay, once a person understands the basics,

01:24:20 trains a bunch of models,

01:24:22 if we look at the scale of years,

01:24:25 what advice do you have for someone wanting

01:24:27 to eventually become an expert?

01:24:30 Train lots of models.

01:24:31 But specifically train lots of models in your domain area.

01:24:35 So an expert at what, right?

01:24:37 We don't need more experts

01:24:39 creating slightly evolutionary research in areas

01:24:45 that everybody’s studying.

01:24:46 We need experts at using deep learning

01:24:50 to diagnose malaria.

01:24:52 Or we need experts at using deep learning

01:24:55 to analyze language to study media bias.

01:25:01 So we need experts in analyzing fisheries

01:25:08 to identify problem areas in the ocean.

01:25:11 That’s what we need.

01:25:13 So become the expert in your passion area.

01:25:17 And this is a tool which you can use for just about anything

01:25:21 and you’ll be able to do that thing better

01:25:22 than other people, particularly by combining it

01:25:25 with your passion and domain expertise.

01:25:27 So that’s really interesting.

01:25:28 Even if you do wanna innovate on transfer learning

01:25:30 or active learning, your thought is,

01:25:34 I mean, it’s one I certainly share,

01:25:36 is you also need to find a domain or data set

01:25:40 that you actually really care for.

01:25:42 If you’re not working on a real problem that you understand,

01:25:45 how do you know if you’re doing it any good?

01:25:48 How do you know if your results are good?

01:25:49 How do you know if you’re getting bad results?

01:25:50 Why are you getting bad results?

01:25:52 Is it a problem with the data?

01:25:54 Like, how do you know you’re doing anything useful?

01:25:57 Yeah, to me, the only really interesting research is,

01:26:00 not the only, but the vast majority

01:26:02 of interesting research is like,

01:26:04 try and solve an actual problem and solve it really well.

01:26:06 So both understanding sufficient tools

01:26:09 on the deep learning side and becoming a domain expert

01:26:13 in a particular domain are really things

01:26:15 within reach for anybody.

01:26:18 Yeah, I mean, to me, I would compare it

01:26:20 to like studying self driving cars,

01:26:23 having never looked at a car or been in a car

01:26:26 or turned a car on, which is like the way it is

01:26:29 for a lot of people, they’ll study some academic data set

01:26:33 where they literally have no idea about that.

01:26:36 By the way, I’m not sure how familiar

01:26:37 with autonomous vehicles, but that is literally,

01:26:40 you describe a large percentage of robotics folks

01:26:43 working in self driving cars is they actually

01:26:45 haven’t considered driving.

01:26:48 They haven’t actually looked at what driving looks like.

01:26:50 They haven’t driven.

01:26:51 And it’s a problem because you know,

01:26:53 when you’ve actually driven, you know,

01:26:54 like these are the things that happened

01:26:55 to me when I was driving.

01:26:57 There’s nothing that beats the real world examples

01:26:59 of just experiencing them.

01:27:02 You’ve created many successful startups.

01:27:04 What does it take to create a successful startup?

01:27:08 Same thing as becoming a successful

01:27:11 deep learning practitioner, which is not giving up.

01:27:15 So you can run out of money or run out of time

01:27:23 or run out of something, you know,

01:27:24 but if you keep costs super low

01:27:28 and try and save up some money beforehand

01:27:29 so you can afford to have some time,

01:27:35 then just sticking with it is one important thing.

01:27:38 Doing something you understand and care about is important.

01:27:42 By something, I don’t mean,

01:27:44 the biggest problem I see with deep learning people

01:27:46 is they do a PhD in deep learning

01:27:50 and then they try and commercialize their PhD.

01:27:52 It is a waste of time

01:27:53 because that doesn’t solve an actual problem.

01:27:55 You picked your PhD topic

01:27:57 because it was an interesting kind of engineering

01:28:00 or math or research exercise.

01:28:02 But yeah, if you’ve actually spent time as a recruiter

01:28:06 and you know that most of your time was spent

01:28:09 sifting through resumes

01:28:10 and you know that most of the time

01:28:12 you’re just looking for certain kinds of things

01:28:14 and you can try doing that with a model for a few minutes

01:28:19 and see whether that’s something which a model

01:28:21 seems to be able to do as well as you could,

01:28:23 then you’re on the right track to creating a startup.

01:28:27 And then I think, yeah, just be pragmatic and

01:28:32 try and stay away from venture capital money

01:28:36 as long as possible, preferably forever.

01:28:39 So yeah, on that point of venture capital,

01:28:43 were you able to successfully run startups

01:28:47 self funded for quite a while?

01:28:48 Yeah, so my first two were self funded

01:28:50 and that was the right way to do it.

01:28:52 Is that scary?

01:28:54 No, VC startups are much more scary

01:28:57 because you have these people on your back

01:29:00 who do this all the time and who have done it for years

01:29:03 telling you grow, grow, grow, grow.

01:29:05 And they don’t care if you fail.

01:29:07 They only care if you don’t grow fast enough.

01:29:09 So that’s scary.

01:29:10 Whereas doing the ones myself, well, with partners

01:29:16 who were friends was nice

01:29:18 because like we just went along at a pace that made sense

01:29:22 and we were able to build it to something

01:29:23 which was big enough that we never had to work again

01:29:27 but was not big enough that any VC

01:29:29 would think it was impressive.

01:29:31 And that was enough for us to be excited, you know?

01:29:35 So I thought that’s a much better way

01:29:38 to do things than most people.

01:29:40 Generally speaking, not for yourself

01:29:41 but how do you make money during that process?

01:29:44 Do you cut into savings?

01:29:47 So yeah, I started FastMail

01:29:49 and Optimal Decisions at the same time in 1999

01:29:52 with two different friends.

01:29:54 And for FastMail, I guess I spent $70 a month

01:30:01 on the server.

01:30:04 And when the server ran out of space

01:30:06 I put a payments button on the front page

01:30:09 and said, if you want more than 10 megs of space

01:30:11 you have to pay $10 a year.

01:30:15 And.

01:30:16 So run lean, like keep your costs down.

01:30:18 Yeah, so I kept my costs down.

01:30:19 And once, you know, once I needed to spend more money

01:30:22 I asked people to spend the money for me.

01:30:25 And that, that was that.

01:30:28 Basically from then on, we were making money

01:30:30 and I was profitable from then.

01:30:35 For Optimal Decisions, it was a bit harder

01:30:37 because we were trying to sell something

01:30:40 that was more like a $1 million sale.

01:30:42 But what we did was we would sell scoping projects.

01:30:46 So kind of like prototypy projects

01:30:50 but rather than doing it for free

01:30:51 we would sell them for $50,000 to $100,000.

01:30:54 So again, we were covering our costs

01:30:56 and also making the client feel

01:30:58 like we were doing something valuable.

01:31:00 So in both cases, we were profitable from six months in.

01:31:06 Ah, nevertheless, it’s scary.

01:31:08 I mean, yeah, sure.

01:31:10 I mean, it’s, it’s scary before you jump in

01:31:13 and I just, I guess I was comparing it

01:31:15 to the scariness of VC.

01:31:18 I felt like with VC stuff, it was more scary.

01:31:20 Kind of much more in somebody else’s hands,

01:31:24 will they fund you or not?

01:31:26 And what do they think of what you’re doing?

01:31:27 I also found it very difficult with VC

01:31:29 backed startups to actually do the thing

01:31:32 which I thought was important for the company

01:31:34 rather than doing the thing

01:31:35 which I thought would make the VC happy.

01:31:38 And VCs always tell you not to do the thing

01:31:40 that makes them happy.

01:31:42 But then if you don’t do the thing that makes them happy

01:31:44 they get sad, so.

01:31:46 And do you think optimizing for the,

01:31:48 whatever they call it, the exit is a good thing

01:31:51 to optimize for?

01:31:53 I mean, it can be, but not at the VC level

01:31:54 because the VC exit needs to be, you know, a thousand X.

01:31:59 Whereas the lifestyle exit,

01:32:03 if you can sell something for $10 million,

01:32:05 then you’ve made it, right?

01:32:06 So I don’t, it depends.

01:32:09 If you want to build something that’s gonna,

01:32:11 you’re kind of happy to do forever, then fine.

01:32:13 If you want to build something you want to sell

01:32:16 in three years time, that’s fine too.

01:32:18 I mean, they’re both perfectly good outcomes.

01:32:21 So you’re learning Swift now, in a way.

01:32:24 I mean, you’ve already.

01:32:25 I’m trying to.

01:32:26 And I read that you use, at least in some cases,

01:32:31 spaced repetition as a mechanism for learning new things.

01:32:34 I use Anki quite a lot myself.

01:32:36 Me too.

01:32:38 I actually never talk to anybody about it.

01:32:41 Don’t know how many people do it,

01:32:44 but it works incredibly well for me.

01:32:46 Can you talk about your experience?

01:32:47 Like, how did you, what do you use it for?

01:32:51 First of all, okay, let’s back it up.

01:32:53 What is spaced repetition?

01:32:55 So spaced repetition is an idea created

01:33:00 by a psychologist named Ebbinghaus.

01:33:04 I don’t know, must be a couple of hundred years ago

01:33:06 or something, 150 years ago.

01:33:08 He did something which sounds pretty damn tedious.

01:33:10 He wrote down random sequences of letters on cards

01:33:15 and tested how well he would remember

01:33:18 those random sequences a day later, a week later, whatever.

01:33:23 He discovered that there was this kind of a curve

01:33:26 where his probability of remembering one of them

01:33:28 would be dramatically smaller the next day

01:33:30 and then a little bit smaller the next day

01:33:31 and a little bit smaller the next day.

01:33:33 What he discovered is that if he revised those cards

01:33:36 after a day, the probabilities would decrease

01:33:41 at a smaller rate.

01:33:42 And then if you revise them again a week later,

01:33:44 they would decrease at a smaller rate again.

01:33:47 And so he basically figured out a roughly optimal equation

01:33:51 for when you should revise something you wanna remember.
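The forgetting curve Ebbinghaus measured is commonly summarized as an exponential decay. This functional form is a standard modern idealization, not something stated in the conversation:

```latex
% Retention R after time t, given memory stability S:
R(t) = e^{-t/S}
% Each successful, well-timed review increases the stability S,
% so retention decays more slowly afterwards and the next
% review can safely wait longer.
```

This is why the optimal review intervals grow over time: a day, then several days, then weeks.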

01:33:56 So spaced repetition learning is using this simple algorithm,

01:34:00 just something like revise something after a day

01:34:03 and then three days and then a week and then three weeks

01:34:06 and so forth.

01:34:07 And so if you use a program like Anki, as you know,

01:34:10 it will just do that for you.

01:34:12 And it will say, did you remember this?

01:34:14 And if you say no, it will reschedule it back

01:34:17 to appear again like 10 times faster

01:34:20 than it otherwise would have.
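The scheduling behavior described here can be sketched in a few lines. The interval ladder, names, and reset rule below are illustrative assumptions, not Anki's actual algorithm:

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Illustrative interval ladder: revise after a day, then three days,
# then a week, then three weeks, and so forth.
INTERVALS = [1, 3, 7, 21, 60, 180]  # days until the next review

@dataclass
class Card:
    front: str
    back: str
    step: int = 0  # position on the interval ladder
    due: date = field(default_factory=date.today)

def review(card: Card, remembered: bool, today: date) -> Card:
    """Schedule the next review: climb the ladder on success,
    drop back to the shortest interval on failure."""
    if remembered:
        interval = INTERVALS[min(card.step, len(INTERVALS) - 1)]
        card.step += 1
    else:
        card.step = 0          # forgotten: it comes back much sooner
        interval = INTERVALS[0]
    card.due = today + timedelta(days=interval)
    return card
```

A failed card resets to the bottom of the ladder, which is the "rescheduled to be revised more quickly" behavior described above.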

01:34:23 It’s a kind of a way of being guaranteed to learn something

01:34:27 because by definition, if you’re not learning it,

01:34:30 it will be rescheduled to be revised more quickly.

01:34:33 Unfortunately though, it’s also like,

01:34:36 it doesn’t let you fool yourself.

01:34:37 If you’re not learning something,

01:34:40 you know, your revisions will just pile up more and more.

01:34:44 So you have to find ways to learn things productively

01:34:48 and effectively like treat your brain well.

01:34:50 So using like mnemonics and stories and context

01:34:54 and stuff like that.

01:34:57 So yeah, it’s a super great technique.

01:34:59 It’s like learning how to learn is something

01:35:01 which everybody should learn

01:35:03 before they actually learn anything.

01:35:05 But almost nobody does.

01:35:07 So what have you, so it certainly works well

01:35:10 for learning new languages for, I mean,

01:35:13 for learning like small projects almost.

01:35:16 But do you, you know, I started using it for,

01:35:19 I forget who wrote a blog post about this inspired me.

01:35:22 It might’ve been you, I’m not sure.

01:35:26 I started, when I read papers,

01:35:28 taking concepts and ideas and putting them in.

01:35:31 Was it Michael Nielsen?

01:35:32 It was Michael Nielsen.

01:35:33 So Michael started doing this recently

01:35:36 and has been writing about it.

01:35:41 So the kind of today’s Ebbinghaus

01:35:43 is a guy called Piotr Wozniak

01:35:45 who developed a system called SuperMemo.

01:35:47 And he’s been basically trying to become like

01:35:51 the world’s greatest Renaissance man

01:35:54 over the last few decades.

01:35:55 He’s basically lived his life

01:35:57 with spaced repetition learning for everything.

01:36:03 I, and sort of like,

01:36:05 Michael’s only very recently got into this,

01:36:07 but he started really getting excited

01:36:08 about doing it for a lot of different things.

01:36:11 For me personally, I actually don’t use it

01:36:14 for anything except Chinese.

01:36:16 And the reason for that is that

01:36:20 Chinese is specifically a thing I made a conscious decision

01:36:23 that I want to continue to remember,

01:36:27 even if I don’t get much of a chance to exercise it,

01:36:30 cause like I’m not often in China, so I don’t.

01:36:33 Or else something like programming languages or papers.

01:36:38 I have a very different approach,

01:36:39 which is I try not to learn anything from them,

01:36:43 but instead I try to identify the important concepts

01:36:47 and like actually ingest them.

01:36:48 So like really understand that concept deeply

01:36:53 and study it carefully.

01:36:54 I will decide if it really is important,

01:36:56 and if it is, get it incorporated into our library,

01:37:01 you know, incorporated into how I do things,

01:37:04 or decide it’s not worth it.

01:37:07 So I find, I find I then remember the things

01:37:12 that I care about because I’m using it all the time.

01:37:15 So I’ve, for the last 25 years,

01:37:20 I’ve committed to spending at least half of every day

01:37:23 learning or practicing something new,

01:37:25 which is all my colleagues have always hated

01:37:28 because it always looks like I’m not working on

01:37:31 what I’m meant to be working on,

01:37:32 but it always means I do everything faster

01:37:34 because I’ve been practicing a lot of stuff.

01:37:36 So I kind of give myself a lot of opportunity

01:37:39 to practice new things.

01:37:41 And so I find now I don’t,

01:37:43 yeah, I don’t often kind of find myself

01:37:47 wishing I could remember something

01:37:50 because if it’s something that’s useful,

01:37:51 then I’ve been using it a lot.

01:37:53 It’s easy enough to look it up on Google,

01:37:56 but speaking Chinese, you can’t look it up on Google.

01:37:59 Do you have advice for people learning new things?

01:38:01 So if you, what have you learned as a process as a,

01:38:04 I mean, it all starts with just making the hours

01:38:07 in the day available.

01:38:08 Yeah, you got to stick with it,

01:38:10 which is again, the number one thing

01:38:12 that 99% of people don’t do.

01:38:13 So the people I started learning Chinese with,

01:38:15 none of them were still doing it 12 months later.

01:38:18 I’m still doing it 10 years later.

01:38:20 I tried to stay in touch with them,

01:38:21 but they just, no one did it.

01:38:24 For something like Chinese,

01:38:26 like study how human learning works.

01:38:28 So every one of my Chinese flashcards

01:38:31 is associated with a story.

01:38:33 And that story is specifically designed to be memorable.

01:38:36 And we find things memorable,

01:38:37 which are like funny or disgusting or sexy

01:38:41 or related to people that we know or care about.

01:38:44 So I try to make sure all of the stories

01:38:46 that are in my head have those characteristics.

01:38:51 Yeah, so you have to, you know,

01:38:52 you won’t remember things well

01:38:53 if they don’t have some context.

01:38:56 And yeah, you won’t remember them well

01:38:57 if you don’t regularly practice them,

01:39:00 whether it be just part of your day to day life

01:39:02 or, as with Chinese for me, flashcards.

01:39:06 I mean, the other thing is,

01:39:07 to let yourself fail sometimes.

01:39:09 So like I’ve had various medical problems

01:39:11 over the last few years.

01:39:13 And basically my flashcards

01:39:16 just stopped for about three years.

01:39:18 And there’ve been other times I’ve stopped for a few months

01:39:22 and it’s so hard because you get back to it

01:39:24 and it’s like, you have 18,000 cards due.

01:39:27 It’s like, and so you just have to go, all right,

01:39:30 well, I can either stop and give up everything

01:39:34 or just decide to do this every day for the next two years

01:39:37 until I get back to it.

01:39:39 The amazing thing has been that even after three years,

01:39:41 I, you know, the Chinese was still in there.

01:39:45 Like it was so much faster to relearn

01:39:48 than it was to learn the first time.

01:39:50 Yeah, absolutely.

01:39:52 It’s in there.

01:39:53 I have the same with guitar, with music and so on.

01:39:56 It’s sad because the work sometimes takes away

01:39:59 and then you won’t play for a year.

01:40:01 But really, if you then just get back to it every day,

01:40:03 you’re right there again.

01:40:06 What do you think is the next big breakthrough

01:40:08 in artificial intelligence?

01:40:09 What are your hopes in deep learning or beyond

01:40:12 that people should be working on

01:40:14 or you hope there’ll be breakthroughs?

01:40:16 I don’t think it’s possible to predict.

01:40:17 I think what we already have

01:40:20 is an incredibly powerful platform

01:40:23 to solve lots of societally important problems

01:40:26 that are currently unsolved.

01:40:27 So I just hope that people will,

01:40:29 lots of people will learn this toolkit and try to use it.

01:40:33 I don’t think we need a lot of new technological breakthroughs

01:40:36 to do a lot of great work right now.

01:40:39 And when do you think we’re going to create

01:40:42 a human level intelligence system?

01:40:45 Do you think?

01:40:46 Don’t know.

01:40:46 How hard is it?

01:40:47 How far away are we?

01:40:48 Don’t know.

01:40:49 Don’t know.

01:40:50 I have no way to know.

01:40:51 I don’t know why people make predictions about this

01:40:53 because there’s no data and nothing to go on.

01:40:57 And it’s just like,

01:41:00 there’s so many societally important problems

01:41:03 to solve right now.

01:41:04 I just don’t find it a really interesting question

01:41:08 to even answer.

01:41:10 So in terms of societally important problems,

01:41:12 what’s the problem that is within reach?

01:41:16 Well, I mean, for example,

01:41:17 there are problems that AI creates, right?

01:41:19 So more specifically,

01:41:23 labor force displacement is going to be huge

01:41:26 and people keep making this

01:41:29 frivolous econometric argument of being like,

01:41:31 oh, there’s been other things that aren’t AI

01:41:33 that have come along before

01:41:34 and haven’t created massive labor force displacement,

01:41:37 therefore AI won’t.

01:41:39 So that’s a serious concern for you?

01:41:41 Oh yeah.

01:41:42 Andrew Yang is running on it.

01:41:43 Yeah, it’s, I’m desperately concerned.

01:41:47 And you see already that the changing workplace

01:41:53 has led to a hollowing out of the middle class.

01:41:55 You’re seeing that students coming out of school today

01:41:59 have a less rosy financial future ahead of them

01:42:03 than their parents did,

01:42:03 which has never happened in recent history,

01:42:06 in the last few hundred years.

01:42:08 You know, we’ve always had progress before.

01:42:11 And you see this turning into anxiety

01:42:15 and despair and even violence.

01:42:19 So I very much worry about that.

01:42:23 You’ve written quite a bit about ethics too.

01:42:25 I do think that every data scientist

01:42:29 working with deep learning needs to recognize

01:42:33 they have an incredibly high leverage tool

01:42:35 that they’re using that can influence society

01:42:37 in lots of ways.

01:42:39 And if they’re doing research,

01:42:40 that that research is gonna be used by people

01:42:42 doing this kind of work.

01:42:44 And they have a responsibility to consider the consequences

01:42:48 and to think about things like

01:42:51 how will humans be in the loop here?

01:42:53 How do we avoid runaway feedback loops?

01:42:56 How do we ensure an appeals process for humans

01:42:59 that are impacted by my algorithm?

01:43:01 How do I ensure that the constraints of my algorithm

01:43:04 are adequately explained to the people

01:43:06 that end up using them?

01:43:09 There’s all kinds of human issues

01:43:11 which only data scientists are actually

01:43:15 in the right place to educate people about,

01:43:17 but data scientists tend to think of themselves

01:43:20 as just engineers and that they don’t need

01:43:23 to be part of that process, which is wrong.

01:43:26 Well, you’re in the perfect position to educate them better,

01:43:30 to read literature, to read history, to learn from history.

01:43:35 Well, Jeremy, thank you so much for everything you do

01:43:39 for inspiring huge amount of people,

01:43:41 getting them into deep learning

01:43:42 and having the ripple effects,

01:43:45 the flap of a butterfly’s wings

01:43:47 that will probably change the world.

01:43:48 So thank you very much.

01:43:50 Thank you, thank you, thank you, thank you.