Transcript
00:00:00 The following is a conversation with Jeremy Howard.
00:00:03 He’s the founder of fast.ai, a research institute dedicated
00:00:07 to making deep learning more accessible.
00:00:09 He’s also a distinguished research scientist
00:00:12 at the University of San Francisco,
00:00:14 a former president of Kaggle,
00:00:16 as well as a top ranking competitor there.
00:00:18 And in general, he’s a successful entrepreneur,
00:00:21 educator, researcher, and an inspiring personality
00:00:25 in the AI community.
00:00:27 When someone asks me, how do I get started with deep learning?
00:00:30 fast.ai is one of the top places that I point them to.
00:00:33 It’s free, it’s easy to get started,
00:00:35 it’s insightful and accessible,
00:00:37 and if I may say so, it has very little BS
00:00:40 that can sometimes dilute the value of educational content
00:00:44 on popular topics like deep learning.
00:00:46 fast.ai has a focus on the practical application of deep learning
00:00:50 and hands on exploration of the cutting edge
00:00:52 that is incredibly both accessible to beginners
00:00:56 and useful to experts.
00:00:57 This is the Artificial Intelligence Podcast.
00:01:01 If you enjoy it, subscribe on YouTube,
00:01:03 give it five stars on iTunes,
00:01:05 support it on Patreon,
00:01:06 or simply connect with me on Twitter
00:01:09 at Lex Fridman, spelled F R I D M A N.
00:01:13 And now, here’s my conversation with Jeremy Howard.
00:01:18 What’s the first program you’ve ever written?
00:01:21 First program I wrote that I remember
00:01:24 would be at high school.
00:01:29 I did an assignment where I decided
00:01:31 to try to find out if there were some better musical scales
00:01:36 than the normal 12 tone, 12 interval scale.
00:01:40 So I wrote a program on my Commodore 64 in BASIC
00:01:43 that searched through other scale sizes
00:01:46 to see if it could find one
00:01:47 where there were more accurate harmonies.
00:01:51 Like mid tone?
00:01:53 Like you want an actual exactly three to two ratio
00:01:56 or else with a 12 interval scale,
00:01:59 it’s not exactly three to two, for example.
00:02:01 So that’s well tempered as they say in there.
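The scale search he describes can be sketched in a few lines of modern Python. This is an illustrative reconstruction, not his original BASIC program: the scoring rule here (distance of the closest equal-tempered step to a pure 3:2 fifth) and all names are assumptions.

```python
def fifth_error(divisions: int) -> float:
    """Distance between a pure 3:2 fifth and the closest note in an
    equal-tempered scale with `divisions` steps per octave."""
    target = 3 / 2
    return min(abs(2 ** (k / divisions) - target)
               for k in range(1, divisions + 1))

# The usual 12-interval scale: 2**(7/12) is close to 3/2, but not exact.
print(fifth_error(12))

# Rank other scale sizes by how accurately they approximate the pure fifth.
ranked = sorted(range(5, 60), key=fifth_error)
print(ranked[:3])  # larger scales such as 53 and 41 tones do better
```

The 53-tone equal-tempered scale, for instance, lands within about 0.00006 of a pure 3:2, versus roughly 0.0017 for 12-tone.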
00:02:05 And BASIC on a Commodore 64.
00:02:07 Where was the interest in music from?
00:02:09 Or is it just technical?
00:02:10 I did music all my life.
00:02:12 So I played saxophone and clarinet and piano
00:02:15 and guitar and drums and whatever.
00:02:18 How does that thread go through your life?
00:02:22 Where’s music today?
00:02:24 It’s not where I wish it was.
00:02:28 For various reasons, couldn’t really keep it going,
00:02:30 particularly because I had a lot of problems
00:02:31 with RSI with my fingers.
00:02:33 And so I had to kind of like cut back anything
00:02:35 that used hands and fingers.
00:02:39 I hope one day I’ll be able to get back to it health wise.
00:02:43 So there’s a love for music underlying it all.
00:02:46 Yeah.
00:02:47 What’s your favorite instrument?
00:02:49 Saxophone.
00:02:50 Sax.
00:02:51 Or baritone saxophone.
00:02:52 Well, probably bass saxophone, but they’re awkward.
00:02:57 Well, I always love it when music
00:03:00 is coupled with programming.
00:03:01 There’s something about a brain that utilizes those
00:03:04 that emerges with creative ideas.
00:03:07 So you’ve used and studied quite a few programming languages.
00:03:11 Can you give an overview of what you’ve used?
00:03:15 What are the pros and cons of each?
00:03:17 Well, my favorite programming environment,
00:03:20 well, most certainly was Microsoft Access
00:03:24 back in like the earliest days.
00:03:26 So that was Visual Basic for applications,
00:03:28 which is not a good programming language,
00:03:30 but the programming environment was fantastic.
00:03:33 It’s like the ability to create, you know,
00:03:38 user interfaces and tie data and actions to them
00:03:41 and create reports and all that.
00:03:43 I’ve never seen anything as good.
00:03:46 There’s things nowadays like Airtable,
00:03:48 which are like small subsets of that,
00:03:54 which people love for good reason,
00:03:56 but unfortunately, nobody’s ever achieved
00:04:00 anything like that.
00:04:01 What is that?
00:04:01 If you could pause on that for a second.
00:04:03 Oh, Access?
00:04:04 Is it a database?
00:04:06 It was a database program that Microsoft produced,
00:04:09 part of Office, and they kind of withered, you know,
00:04:13 but basically it lets you in a totally graphical way
00:04:16 create tables and relationships and queries
00:04:18 and tie them to forms and set up, you know,
00:04:21 event handlers and calculations.
00:04:24 And it was a very complete powerful system
00:04:28 designed for not massive scalable things,
00:04:31 but for like useful little applications that I loved.
00:04:36 So what’s the connection between Excel and Access?
00:04:40 So very close.
00:04:42 So Access kind of was the relational database equivalent,
00:04:47 if you like.
00:04:47 So people still do a lot of that stuff
00:04:50 that should be in Access in Excel as they know it.
00:04:53 Excel’s great as well.
00:04:54 So, but it’s just not as rich a programming model
00:04:59 as VBA combined with a relational database.
00:05:04 And so I’ve always loved relational databases,
00:05:06 but today programming on top of relational database
00:05:10 is just a lot more of a headache.
00:05:13 You know, you generally either need to kind of,
00:05:15 you know, you need something that connects,
00:05:18 that runs some kind of database server
00:05:19 unless you use SQLite, which has its own issues.
00:05:25 Then you kind of often,
00:05:25 if you want to get a nice programming model,
00:05:27 you’ll need to like add an ORM on top.
00:05:30 And then, I don’t know,
00:05:31 there’s all these pieces to tie together
00:05:34 and it’s just a lot more awkward than it should be.
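For reference, the SQLite route he mentions needs no server process at all in Python's standard library; the raw API below is the layer an ORM would wrap. The table and data are made up for illustration.

```python
import sqlite3

# SQLite: the whole database is a single file (or, here, in memory),
# so there is no database server to run. This raw API is what an ORM wraps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, email TEXT)")
conn.execute("INSERT INTO contacts VALUES (?, ?)",
             ("Ada", "ada@example.com"))
rows = conn.execute("SELECT name, email FROM contacts").fetchall()
print(rows)  # [('Ada', 'ada@example.com')]
conn.close()
```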
00:05:37 There are people that are trying to make it easier.
00:05:39 So in particular, I think of F sharp, you know, Don Syme,
00:05:42 who, him and his team have done a great job
00:05:45 of making something like a database appear
00:05:50 in the type system.
00:05:51 So you actually get like tab completion for fields
00:05:54 and tables and stuff like that.
00:05:57 Anyway, so that was kind of, anyway,
00:05:59 so like that whole VBA office thing, I guess,
00:06:01 was a starting point, which I still miss.
00:06:04 And I got into standard Visual Basic, which…
00:06:07 That’s interesting, just to pause on that for a second.
00:06:09 It’s interesting that you’re connecting programming languages
00:06:13 to the ease of management of data.
00:06:17 Yeah.
00:06:18 So in your use of programming languages,
00:06:20 you always had a love and a connection with data.
00:06:24 I’ve always been interested in doing useful things
00:06:28 for myself and for others,
00:06:29 which generally means getting some data
00:06:31 and doing something with it and putting it out there again.
00:06:34 So that’s been my interest throughout.
00:06:38 So I also did a lot of stuff with AppleScript
00:06:41 back in the early days.
00:06:43 So it’s kind of nice being able to get the computer
00:06:48 and computers to talk to each other
00:06:50 and to do things for you.
00:06:52 And then I think that one,
00:06:54 the programming language I most loved then
00:06:58 would have been Delphi, which was Object Pascal,
00:07:02 created by Anders Hejlsberg,
00:07:04 who previously did Turbo Pascal
00:07:07 and then went on to create .NET
00:07:08 and then went on to create TypeScript.
00:07:11 Delphi was amazing because it was like a compiled,
00:07:14 fast language that was as easy to use as Visual Basic.
00:07:20 Delphi, what is it similar to in more modern languages?
00:07:27 Visual Basic.
00:07:28 Visual Basic.
00:07:29 Yeah, but a compiled, fast version.
00:07:32 So I’m not sure there’s anything quite like it anymore.
00:07:37 If you took like C Sharp or Java
00:07:40 and got rid of the virtual machine
00:07:42 and replaced it with something,
00:07:43 you could compile to a small, tight binary.
00:07:46 I feel like it’s where Swift could get to
00:07:50 with the new Swift UI
00:07:52 and the cross platform development going on.
00:07:56 Like that’s one of my dreams
00:07:59 is that we’ll hopefully get back to where Delphi was.
00:08:02 There is actually a free Pascal project nowadays
00:08:08 called Lazarus,
00:08:09 which is also attempting to kind of recreate Delphi.
00:08:13 So they’re making good progress.
00:08:16 So, okay, Delphi,
00:08:18 that’s one of your favorite programming languages.
00:08:20 Well, it’s programming environments.
00:08:22 Again, I’d say Pascal’s not a nice language.
00:08:26 If you wanted to know specifically
00:08:27 about what languages I like,
00:08:29 I would definitely pick J as being an amazingly wonderful
00:08:33 language.
00:08:35 What’s J?
00:08:37 J, are you aware of APL?
00:08:39 I am not, except from doing a little research
00:08:42 on the work you’ve done.
00:08:44 Okay, so not at all surprising you’re not familiar with it
00:08:48 because it’s not well known,
00:08:49 but it’s actually one of the main families
00:08:54 of programming languages going back to the late 50s,
00:08:57 early 60s.
00:08:57 So there was a couple of major directions.
00:09:01 One was the kind of Lambda Calculus Alonzo Church direction,
00:09:06 which I guess kind of lisp and scheme and whatever,
00:09:09 which has a history going back
00:09:12 to the early days of computing.
00:09:13 The second was the kind of imperative slash OO,
00:09:18 Algol, Simula, going on to C, C++ and so forth.
00:09:23 There was a third,
00:09:24 which are called array oriented languages,
00:09:26 which started with a paper by a guy called Ken Iverson,
00:09:31 which was actually a math theory paper,
00:09:35 not a programming paper.
00:09:37 It was called Notation as a Tool for Thought.
00:09:41 And it was the development of a new way,
00:09:43 a new type of math notation.
00:09:45 And the idea is that this math notation
00:09:47 was much more flexible, expressive,
00:09:51 and also well defined than traditional math notation,
00:09:55 which is none of those things.
00:09:56 Math notation is awful.
00:09:59 And so he actually turned that into a programming language
00:10:02 and cause this was the early 50s or the sorry, late 50s,
00:10:05 all the names were available.
00:10:06 So he called his language a programming language or APL.
00:10:10 APL.
00:10:11 So APL is an implementation of notation
00:10:15 as a tool for thought by which he means math notation.
00:10:18 And Ken and his son went on to do many things,
00:10:22 but eventually they actually produced a new language
00:10:26 that was built on top of all the learnings of APL.
00:10:28 And that was called J.
00:10:30 And J is the most expressive, composable,
00:10:39 beautifully designed language I’ve ever seen.
00:10:42 Does it have object oriented components?
00:10:44 Does it have that kind of thing?
00:10:45 Not really, it’s an array oriented language.
00:10:47 It’s the third path.
00:10:51 Are you saying array?
00:10:52 Array oriented, yeah.
00:10:53 What does it mean to be array oriented?
00:10:55 So array oriented means that you generally
00:10:57 don’t use any loops,
00:10:59 but the whole thing is done with kind of
00:11:02 an extreme version of broadcasting,
00:11:06 if you’re familiar with that NumPy slash Python concept.
00:11:09 So you do a lot with one line of code.
00:11:14 It looks a lot like math notation, highly compact.
00:11:19 And the idea is that you can kind of,
00:11:22 because you can do so much with one line of code,
00:11:24 a single screen of code is very unlikely to,
00:11:27 you very rarely need more than that
00:11:29 to express your program.
00:11:31 And so you can kind of keep it all in your head
00:11:33 and you can kind of clearly communicate it.
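The broadcasting concept he references, in NumPy terms. This is a toy illustration of the loop-free, array-oriented style, not J syntax; the particular computation (standardizing rows) is arbitrary.

```python
import numpy as np

x = np.arange(12, dtype=float).reshape(3, 4)

# One broadcast expression standardizes every row, with no explicit loops:
# the (3, 1)-shaped mean and std "broadcast" across the four columns.
standardized = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

print(standardized.shape)         # (3, 4)
print(standardized.mean(axis=1))  # each row now has mean ~0
```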
00:11:36 It’s interesting that APL created two main branches,
00:11:40 K and J.
00:11:41 J is this kind of like open source,
00:11:44 niche community of crazy enthusiasts like me.
00:11:49 And then the other path, K, was fascinating.
00:11:52 It’s an astonishingly expensive programming language,
00:11:56 which many of the world’s
00:11:58 most ludicrously rich hedge funds use.
00:12:02 So the entire K machine is so small
00:12:06 it sits inside level three cache on your CPU.
00:12:09 And it easily wins every benchmark I’ve ever seen
00:12:14 in terms of data processing speed.
00:12:16 But you don’t come across it very much
00:12:17 because it’s like $100,000 per CPU to run it.
00:12:22 It’s like this path of programming languages
00:12:26 is just so much, I don’t know,
00:12:28 so much more powerful in every way
00:12:30 than the ones that almost anybody uses every day.
00:12:33 So it’s all about computation.
00:12:37 It’s really focused on computation.
00:12:38 It’s pretty heavily focused on computation.
00:12:40 I mean, so much of programming
00:12:43 is data processing by definition.
00:12:45 So there’s a lot of things you can do with it.
00:12:48 But yeah, there’s not much work being done
00:12:51 on making like user interface toolkits or whatever.
00:12:57 I mean, there’s some, but they’re not great.
00:12:59 At the same time, you’ve done a lot of stuff
00:13:00 with Perl and Python.
00:13:03 So where does that fit into the picture of J and K and APL?
00:13:08 Well, it’s just much more pragmatic.
00:13:11 Like in the end, you kind of have to end up
00:13:13 where the libraries are, you know?
00:13:17 Like, cause to me, my focus is on productivity.
00:13:21 I just want to get stuff done and solve problems.
00:13:23 So Perl was great.
00:13:27 I created an email company called FastMail
00:13:29 and Perl was great cause back in the late nineties,
00:13:32 early two thousands, it just had a lot of stuff it could do.
00:13:38 I still had to write my own monitoring system
00:13:41 and my own web framework, my own whatever,
00:13:43 cause like none of that stuff existed.
00:13:45 But it was a super flexible language to do that in.
00:13:50 And you used Perl for FastMail, you used it as a backend?
00:13:54 Like so everything was written in Perl?
00:13:55 Yeah, yeah, everything, everything was Perl.
00:13:58 Why do you think Perl hasn’t succeeded
00:14:02 or hasn’t dominated the market where Python
00:14:05 really takes over a lot of the tasks?
00:14:07 Well, I mean, Perl did dominate.
00:14:09 It was everything, everywhere,
00:14:13 but then the guy that ran Perl, Larry Wall,
00:14:17 kind of just didn’t put the time in anymore.
00:14:22 And no project can be successful if there isn’t,
00:14:28 you know, particularly one that started with a strong leader
00:14:31 that loses that strong leadership.
00:14:35 So then Python has kind of replaced it.
00:14:37 You know, Python is a lot less elegant language
00:14:43 in nearly every way,
00:14:45 but it has the data science libraries
00:14:48 and a lot of them are pretty great.
00:14:51 So I kind of use it
00:14:56 cause it’s the best we have,
00:14:58 but it’s definitely not good enough.
00:15:01 But what do you think the future of programming looks like?
00:15:04 What do you hope the future of programming looks like
00:15:06 if we zoom in on the computational fields,
00:15:08 on data science, on machine learning?
00:15:11 I hope Swift is successful
00:15:15 because the goal of Swift,
00:15:19 the way Chris Lattner describes it,
00:15:21 is to be infinitely hackable.
00:15:22 And that’s what I want.
00:15:23 I want something where me and the people I do research with
00:15:26 and my students can look at
00:15:29 and change everything from top to bottom.
00:15:32 There’s nothing mysterious and magical and inaccessible.
00:15:36 Unfortunately with Python, it’s the opposite of that
00:15:38 because Python is so slow.
00:15:40 It’s extremely unhackable.
00:15:42 You get to a point where it’s like,
00:15:43 okay, from here on down it’s C.
00:15:45 So your debugger doesn’t work in the same way.
00:15:47 Your profiler doesn’t work in the same way.
00:15:48 Your build system doesn’t work in the same way.
00:15:50 It’s really not very hackable at all.
00:15:53 What’s the part you like to be hackable?
00:15:55 Is it for the objective of optimizing training
00:16:00 of neural networks, inference of neural networks?
00:16:02 Is it performance of the system
00:16:04 or is there some non performance related, just?
00:16:07 It’s everything.
00:16:09 I mean, in the end, I want to be productive
00:16:11 as a practitioner.
00:16:13 So that means that, so like at the moment,
00:16:16 our understanding of deep learning is incredibly primitive.
00:16:20 There’s very little we understand.
00:16:21 Most things don’t work very well,
00:16:23 even though it works better than anything else out there.
00:16:26 There’s so many opportunities to make it better.
00:16:28 So you look at any domain area,
00:16:31 like, I don’t know, speech recognition with deep learning
00:16:35 or natural language processing classification
00:16:38 with deep learning or whatever.
00:16:39 Every time I look at an area with deep learning,
00:16:41 I always see like, oh, it’s terrible.
00:16:44 There’s lots and lots of obviously stupid ways
00:16:47 to do things that need to be fixed.
00:16:50 So then I want to be able to jump in there
00:16:51 and quickly experiment and make them better.
00:16:54 You think the programming language has a role in that?
00:16:59 Huge role, yeah.
00:17:00 So currently, Python has a big gap
00:17:05 in terms of our ability to innovate,
00:17:09 particularly around recurrent neural networks
00:17:11 and natural language processing.
00:17:14 Because it’s so slow, the actual loop
00:17:18 where we actually loop through words,
00:17:20 we have to do that whole thing in CUDA C.
00:17:23 So we actually can’t innovate with the kernel,
00:17:27 the heart of that most important algorithm.
00:17:31 And it’s just a huge problem.
00:17:33 And this happens all over the place.
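The sequential loop he is describing, sketched in plain Python with NumPy; the tanh cell, shapes, and names are illustrative assumptions. Each step depends on the previous hidden state, so the loop itself cannot be vectorized away, and in pure Python that loop is the bottleneck that forces the drop down to CUDA C.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, h0):
    """Step a minimal tanh RNN cell over a sequence, one word at a time."""
    h = h0
    for x in xs:  # this per-timestep loop is the slow part in Python
        h = np.tanh(x @ W_x + h @ W_h)
    return h

T, d_in, d_h = 50, 8, 16
rng = np.random.default_rng(0)
h_final = rnn_forward(
    rng.standard_normal((T, d_in)),
    rng.standard_normal((d_in, d_h)) * 0.1,
    rng.standard_normal((d_h, d_h)) * 0.1,
    np.zeros(d_h),
)
print(h_final.shape)  # (16,)
```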
00:17:36 So we hit research limitations.
00:17:40 Another example, convolutional neural networks,
00:17:42 which are actually the most popular architecture
00:17:44 for lots of things, maybe most things in deep learning.
00:17:48 We almost certainly should be using
00:17:50 sparse convolutional neural networks,
00:17:52 but only like two people are,
00:17:55 because to do it, you have to rewrite
00:17:57 all of that CUDA C level stuff.
00:17:59 And yeah, just researchers and practitioners don’t.
00:18:04 So there’s just big gaps in what people actually research on,
00:18:09 what people actually implement
00:18:10 because of the programming language problem.
00:18:13 So you think it’s just too difficult to write in CUDA C
00:18:20 that a higher level programming language like Swift
00:18:24 should enable the easier,
00:18:30 fooling around creative stuff with RNNs
00:18:33 or with sparse convolutional neural networks?
00:18:34 Kind of.
00:18:35 Who’s at fault?
00:18:37 Who’s at charge of making it easy
00:18:41 for a researcher to play around?
00:18:42 I mean, no one’s at fault,
00:18:43 just nobody’s got around to it yet,
00:18:45 or it’s just, it’s hard, right?
00:18:46 And I mean, part of the fault is that we ignored
00:18:49 that whole APL kind of direction.
00:18:53 Nearly everybody did for 60 years, 50 years.
00:18:57 But recently people have been starting to
00:19:01 reinvent pieces of that
00:19:03 and kind of create some interesting new directions
00:19:05 in the compiler technology.
00:19:07 So the place where that’s particularly happening right now
00:19:11 is something called MLIR,
00:19:13 which is something that, again,
00:17:14 Chris Lattner, the Swift guy, is leading.
00:19:18 And yeah, because it’s actually not gonna be Swift
00:19:20 on its own that solves this problem,
00:19:22 because the problem is that currently writing
00:19:24 an acceptably fast, you know, GPU program
00:19:30 is too complicated regardless of what language you use.
00:19:33 Right.
00:19:36 And that’s just because if you have to deal with the fact
00:19:38 that I’ve got, you know, 10,000 threads
00:19:41 and I have to synchronize between them all
00:19:43 and I have to put my thing into grid blocks
00:19:45 and think about warps and all this stuff,
00:19:47 it’s just so much boilerplate that to do that well,
00:19:50 you have to be a specialist at that
00:19:52 and it’s gonna be a year’s work to, you know,
00:19:56 optimize that algorithm in that way.
00:19:59 But with things like tensor comprehensions
00:20:03 and TILE and MLIR and TVM,
00:20:07 there’s all these various projects
00:20:08 which are all about saying,
00:20:10 let’s let people create like domain specific languages
00:20:14 for tensor computations.
00:20:16 These are the kinds of things we do generally
00:20:19 on the GPU for deep learning and then have a compiler
00:20:22 which can optimize that tensor computation.
00:20:28 A lot of this work is actually sitting
00:20:29 on top of a project called Halide,
00:20:32 which is a mind blowing project where they came up
00:20:37 with such a domain specific language.
00:20:38 In fact, two, one domain specific language for expressing
00:20:41 this is what my tensor computation is
00:20:43 and another domain specific language for expressing
00:20:46 this is the kind of the way I want you to structure
00:20:50 the compilation of that and like do it block by block
00:20:53 and do these bits in parallel.
00:20:54 And they were able to show how you can compress
00:20:57 the amount of code by 10X compared to optimized GPU code
00:21:03 and get the same performance.
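Halide expresses that split in its own C++ (and Python) embedded DSL; as a rough, language-agnostic sketch of the idea only, here is one computation (a 1-D blur, Halide's classic example) written once as an "algorithm" and executed under two different "schedules". All names here are illustrative.

```python
import numpy as np

# "Algorithm": what to compute, a three-tap 1-D blur, defined once.
def blur_at(x, i):
    return (x[i - 1] + x[i] + x[i + 1]) / 3.0

# "Schedule" A: compute it with a plain scalar loop.
def blur_loop(x):
    out = np.empty(len(x) - 2)
    for i in range(1, len(x) - 1):
        out[i - 1] = blur_at(x, i)
    return out

# "Schedule" B: the same algorithm, executed as whole-array operations.
def blur_vectorized(x):
    return (x[:-2] + x[1:-1] + x[2:]) / 3.0

x = np.random.default_rng(0).random(1_000)
assert np.allclose(blur_loop(x), blur_vectorized(x))  # same answer either way
```

In Halide the schedule also covers tiling, parallelism, and vectorization, all without touching the algorithm definition, which is what enables the 10x code reduction he mentions.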
00:21:05 So that’s like, so these other things are kind of sitting
00:21:08 on top of that kind of research and MLIR is pulling a lot
00:21:12 of those best practices together.
00:21:15 And now we’re starting to see work done on making all
00:21:18 of that directly accessible through Swift
00:21:21 so that I could use Swift to kind of write those
00:21:23 domain specific languages and hopefully we’ll get
00:21:27 then Swift CUDA kernels written in a very expressive
00:21:30 and concise way that looks a bit like J and APL
00:21:34 and then Swift layers on top of that
00:21:36 and then a Swift UI on top of that.
00:21:38 And it’ll be so nice if we can get to that point.
00:21:42 Now does it all eventually boil down to CUDA
00:21:46 and NVIDIA GPUs?
00:21:48 Unfortunately at the moment it does,
00:21:50 but one of the nice things about MLIR if AMD ever
00:21:54 gets their act together which they probably won’t
00:21:56 is that they or others could write MLIR backends
00:22:02 for other GPUs or rather tensor computation devices
00:22:09 of which today there are increasing number
00:22:11 like Graphcore or Vertex AI or whatever.
00:22:18 So yeah, being able to target lots of backends
00:22:22 would be another benefit of this
00:22:23 and the market really needs competition
00:22:26 because at the moment NVIDIA is massively overcharging
00:22:29 for their kind of enterprise class cards
00:22:33 because there is no serious competition
00:22:36 because nobody else is doing the software properly.
00:22:39 In the cloud there is some competition, right?
00:22:41 But…
00:22:42 Not really, other than TPUs perhaps,
00:22:45 but TPUs are almost unprogrammable at the moment.
00:22:48 So TPUs have the same problem that you can’t?
00:22:51 It’s even worse.
00:22:52 So TPUs, Google actually made an explicit decision
00:22:54 to make them almost entirely unprogrammable
00:22:57 because they felt that there was too much IP in there
00:22:59 and if they gave people direct access to program them,
00:23:02 people would learn their secrets.
00:23:04 So you can’t actually directly program the memory
00:23:09 in a TPU.
00:23:11 You can’t even directly create code that runs on
00:23:15 and that you look at on the machine that has the TPU,
00:23:18 it all goes through a virtual machine.
00:23:19 So all you can really do is this kind of cookie cutter thing
00:23:22 of like plug in high level stuff together,
00:23:26 which is just super tedious and annoying
00:23:30 and totally unnecessary.
00:23:32 So what was the, tell me if you could,
00:23:36 the origin story of fast.ai.
00:23:38 What is the motivation, its mission, its dream?
00:23:43 So I guess the founding story is heavily tied
00:23:48 to my previous startup, which is a company called Enlitic,
00:23:51 which was the first company to focus on deep learning
00:23:54 for medicine and I created that because I saw
00:23:58 that was a huge opportunity to,
00:24:02 there’s about a 10X shortage of the number of doctors
00:24:05 in the world, in the developing world that we need.
00:24:08 I expected it would take about 300 years
00:24:11 to train enough doctors to meet that gap.
00:24:13 But I guess that maybe if we used deep learning
00:24:19 for some of the analytics, we could maybe make it
00:24:22 so you don’t need as highly trained doctors.
00:24:25 For diagnosis.
00:24:26 For diagnosis and treatment planning.
00:24:27 Where’s the biggest benefit just before we get to fast AI,
00:24:31 where’s the biggest benefit of AI
00:24:33 and medicine that you see today?
00:24:36 And maybe next time.
00:24:37 Not much happening today in terms of like stuff
00:24:39 that’s actually out there, it’s very early.
00:24:41 But in terms of the opportunity,
00:24:42 it’s to take markets like India and China and Indonesia,
00:24:48 which have big populations, Africa,
00:24:52 small numbers of doctors,
00:24:55 and provide diagnostic, particularly treatment planning
00:25:00 and triage kind of on device so that if you do a test
00:25:05 for malaria or tuberculosis or whatever,
00:25:09 you immediately get something that even a healthcare worker
00:25:12 that’s had a month of training can get
00:25:16 a very high quality assessment of whether the patient
00:25:20 might be at risk and tell, okay,
00:25:22 we’ll send them off to a hospital.
00:25:25 So for example, in Africa, outside of South Africa,
00:25:29 there’s only five pediatric radiologists
00:25:31 for the entire continent.
00:25:32 So most countries don’t have any.
00:25:34 So if your kid is sick and they need something diagnosed
00:25:37 through medical imaging, the person,
00:25:39 even if you’re able to get medical imaging done,
00:25:41 the person that looks at it will be a nurse at best.
00:25:46 But actually in India, for example, and China,
00:25:50 almost no x rays are read by anybody,
00:25:52 by any trained professional because they don’t have enough.
00:25:57 So if instead we had an algorithm that could take
00:26:02 the most likely high risk 5% and say triage,
00:26:08 basically say, okay, someone needs to look at this,
00:26:11 it would massively change the kind of way
00:26:14 that what’s possible with medicine in the developing world.
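That triage rule is simple to sketch; the scores below are random stand-ins for a real model's outputs, and the 5% cutoff comes straight from the conversation.

```python
import numpy as np

# Stand-in model outputs: predicted probability each study is abnormal.
rng = np.random.default_rng(0)
scores = rng.random(10_000)

# Triage: route only the highest-risk 5% of studies to a human reader.
threshold = np.quantile(scores, 0.95)
needs_review = scores >= threshold

print(int(needs_review.sum()))  # about 5% of 10,000 studies
```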
00:26:18 And remember, they have, increasingly they have money.
00:26:21 They’re the developing world, they’re not the poor world,
00:26:23 they’re the developing world.
00:26:24 So they have the money.
00:26:25 So they’re building the hospitals,
00:26:27 they’re getting the diagnostic equipment,
00:26:30 but there’s no way for a very long time
00:26:33 will they be able to have the expertise.
00:26:37 Shortage of expertise, okay.
00:26:38 And that’s where the deep learning systems can step in
00:26:41 and magnify the expertise they do have.
00:26:44 Exactly, yeah.
00:26:46 So you do see, just to linger a little bit longer,
00:26:51 the interaction, do you still see the human experts
00:26:55 still at the core of these systems?
00:26:57 Yeah, absolutely.
00:26:58 Is there something in medicine
00:26:59 that could be automated almost completely?
00:27:01 I don’t see the point of even thinking about that
00:27:03 because we have such a shortage of people.
00:27:06 Why would we want to find a way not to use them?
00:27:09 We have people, so the idea of like,
00:27:13 even from an economic point of view,
00:27:14 if you can make them 10X more productive,
00:27:17 getting rid of the person,
00:27:18 doesn’t impact your unit economics at all.
00:27:21 And it totally ignores the fact
00:27:23 that there are things people do better than machines.
00:27:26 So it’s just to me,
00:27:27 that’s not a useful way of framing the problem.
00:27:32 I guess, just to clarify,
00:27:33 I guess I meant there may be some problems
00:27:36 where you can avoid even going to the expert ever,
00:27:40 sort of maybe preventative care or some basic stuff,
00:27:44 allowing the expert to focus on the things
00:27:46 that are really that, you know.
00:27:49 Well, that’s what the triage would do, right?
00:27:50 So the triage would say,
00:27:52 okay, there’s 99% sure there’s nothing here.
00:27:58 So that can be done on device
00:28:01 and they can just say, okay, go home.
00:28:03 So the experts are being used to look at the stuff
00:28:07 which has some chance it’s worth looking at,
00:28:10 which most things it’s not, it’s fine.
00:28:14 Why do you think that is?
00:28:15 You know, it’s fine.
00:28:16 Why do you think we haven’t quite made progress on that yet
00:28:19 in terms of the scale of how much AI is applied
00:28:27 in the medical field?
00:28:27 Oh, there’s a lot of reasons.
00:28:28 I mean, one is it’s pretty new.
00:29:29 I only started Enlitic in like 2014.
00:28:32 And before that, it’s hard to express
00:28:36 to what degree the medical world
00:28:37 was not aware of the opportunities here.
00:28:40 So I went to RSNA,
00:28:42 which is the world’s largest radiology conference.
00:28:46 And I told everybody I could, you know,
00:28:49 like I’m doing this thing with deep learning,
00:28:51 please come and check it out.
00:28:53 And no one had any idea what I was talking about
00:28:56 and no one had any interest in it.
00:28:59 So like we’ve come from absolute zero, which is hard.
00:29:05 And then the whole regulatory framework, education system,
00:29:09 everything is just set up to think of doctoring
00:29:13 in a very different way.
00:29:14 So today there is a small number of people
00:29:17 who are deep learning practitioners
00:29:20 and doctors at the same time.
00:29:23 And we’re starting to see the first ones
00:29:24 come out of their PhD programs.
00:29:26 So Zak Kohane over in Boston, Cambridge
00:29:31 has a number of students now who are data science experts,
00:29:37 deep learning experts, and actual medical doctors.
00:29:43 Quite a few doctors have completed our fast.ai course now
00:29:47 and are publishing papers and creating journal reading groups
00:29:52 in the American College of Radiology.
00:29:55 And like, it’s just starting to happen,
00:29:57 but it’s gonna be a long time coming.
00:29:59 It’s gonna happen, but it’s gonna be a long process.
00:30:02 The regulators have to learn how to regulate this.
00:30:04 They have to build guidelines.
00:30:08 And then the lawyers at hospitals
00:30:12 have to develop a new way of understanding
00:30:15 that sometimes it makes sense for data to be looked at
00:30:22 in raw form in large quantities
00:30:24 in order to create world changing results.
00:30:27 Yeah, so the regulation around data, all that,
00:30:30 it sounds probably the hardest problem,
00:30:33 but sounds reminiscent of autonomous vehicles as well.
00:30:36 Many of the same regulatory challenges,
00:30:38 many of the same data challenges.
00:30:40 Yeah, I mean, funnily enough,
00:30:41 the problem is less the regulation
00:30:43 and more the interpretation of that regulation
00:30:45 by lawyers in hospitals.
00:30:48 So HIPAA is actually, the P in HIPAA
00:30:52 does not stand for privacy.
00:30:56 It stands for portability.
00:30:57 It’s actually meant to be a way that data can be used.
00:31:01 And it was created with lots of gray areas
00:31:04 because the idea is that would be more practical
00:31:06 and it would help people to use this legislation
00:31:10 to actually share data in a more thoughtful way.
00:31:13 Unfortunately, it’s done the opposite
00:31:15 because when a lawyer sees a gray area,
00:31:17 they say, oh, if we don’t know we won’t get sued,
00:31:20 then we can’t do it.
00:31:22 So HIPAA is not exactly the problem.
00:31:26 The problem is more that there’s,
00:31:29 hospital lawyers are not incented
00:31:31 to make bold decisions about data portability.
00:31:36 Or even to embrace technology that saves lives.
00:31:40 They more want to not get in trouble
00:31:42 for embracing that technology.
00:31:44 It also saves lives in a very abstract way,
00:31:47 which is like, oh, we’ve been able to release
00:31:49 these 100,000 anonymized records.
00:31:52 I can’t point to the specific person
00:31:54 whose life that saved.
00:31:55 I can say like, oh, we ended up with this paper
00:31:57 which found this result,
00:31:58 which diagnosed a thousand more people
00:32:02 than we would have otherwise,
00:32:03 but it’s like, which ones were helped?
00:32:05 It’s very abstract.
00:32:07 And on the counter side of that,
00:32:09 you may be able to point to a life that was taken
00:32:13 because of something that was.
00:32:14 Yeah, or a person whose privacy was violated.
00:33:18 It’s like, oh, this specific person was deidentified,
00:33:24 so, reidentified.
00:32:25 Just a fascinating topic.
00:32:27 We’re jumping around.
00:32:28 We’ll get back to fast AI,
00:32:29 but on the question of privacy,
00:32:32 data is the fuel for so much innovation in deep learning.
00:32:38 What’s your sense on privacy?
00:32:39 Whether we’re talking about Twitter, Facebook, YouTube,
00:32:44 just the technologies like in the medical field
00:32:48 that rely on people’s data in order to create impact.
00:32:53 How do we get that right,
00:32:56 respecting people’s privacy and yet creating technology
00:33:01 that is learning from data?
00:33:03 One of my areas of focus is on doing more with less data.
00:33:08 More with less data, which,
00:33:11 so most vendors, unfortunately,
00:33:14 are strongly incented to find ways
00:33:17 to require more data and more computation.
00:33:20 So, Google and IBM being the most obvious.
00:33:24 IBM.
00:33:25 Yeah, so Watson.
00:33:27 So, Google and IBM both strongly push the idea
00:33:31 that you have to be,
00:33:33 that they have more data and more computation
00:33:35 and more intelligent people than anybody else.
00:33:37 And so you have to trust them to do things
00:33:39 because nobody else can do it.
00:33:42 And Google’s very upfront about this,
00:33:45 like Jeff Dean has gone out there and given talks
00:33:48 and said, our goal is to require
00:33:50 a thousand times more computation, but less people.
00:33:55 Whereas our goal is to use the people that you have better
00:34:00 and the data you have better
00:34:01 and the computation you have better.
00:34:03 So, one of the things that we’ve discovered is,
00:34:06 or at least highlighted,
00:34:08 is that you very, very, very often
00:34:11 don’t need much data at all.
00:34:13 And so the data you already have in your organization
00:34:16 will be enough to get state of the art results.
00:34:19 So, like my starting point would be to kind of say
00:34:21 around privacy is a lot of people are looking for ways
00:34:25 to share data and aggregate data,
00:34:28 but I think often that’s unnecessary.
00:34:29 They assume that they need more data than they do
00:34:32 because they’re not familiar with the basics
00:34:34 of transfer learning, which is this critical technique
00:34:38 for needing orders of magnitude less data.
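The basic recipe behind the transfer learning he's referring to, reusing a network trained on lots of data and training only a small new head on your own small dataset, might look like the following PyTorch sketch. The "backbone" here is a toy stand-in (random weights, fake data), not a real pretrained model:

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained backbone; in practice you'd load, say,
# a pretrained ResNet and treat its weights as already learned.
backbone = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 32),
)

# Freeze the backbone: its parameters won't be updated.
for p in backbone.parameters():
    p.requires_grad = False

# New task-specific head, trained from scratch on your (small) dataset.
head = nn.Linear(32, 5)
model = nn.Sequential(backbone, head)

# Only the head's parameters go to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)

x = torch.randn(8, 128)          # tiny fake batch of 8 examples
y = torch.randint(0, 5, (8,))    # fake labels for 5 classes
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()

print(len(trainable))  # just the head's weight and bias tensors
```

Because only the small head is learned, far fewer labeled examples are needed than training the whole network from scratch.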
00:34:42 Is your sense, one reason you might wanna collect data
00:34:44 from everyone is like in the recommender system context,
00:34:50 where your individual, Jeremy Howard’s individual data
00:34:54 is the most useful for providing a product
00:34:58 that’s impactful for you.
00:34:59 So, for giving you advertisements,
00:35:02 for recommending to you movies,
00:35:04 for doing medical diagnosis,
00:35:07 is your sense we can build with a small amount of data,
00:35:11 general models that will have a huge impact
00:35:15 for most people that we don’t need to have data
00:35:18 from each individual?
00:35:19 On the whole, I’d say yes.
00:35:20 I mean, there are things like,
00:35:25 you know, recommender systems have this cold start problem
00:35:28 where, you know, Jeremy is a new customer,
00:35:30 we haven’t seen him before, so we can’t recommend him things
00:35:33 based on what else he’s bought and liked with us.
00:35:36 And there’s various workarounds to that.
00:35:38 Like in a lot of music programs,
00:35:40 we’ll start out by saying, which of these artists do you like?
00:35:44 Which of these albums do you like?
00:35:46 Which of these songs do you like?
00:35:49 Netflix used to do that, nowadays they tend not to.
00:35:53 People kind of don’t like that
00:35:54 because they think, oh, we don’t wanna bother the user.
00:35:57 So, you could work around that
00:35:58 by having some kind of data sharing
00:36:00 where you get my marketing record from Acxiom or whatever,
00:36:04 and try to guess from that.
00:36:06 To me, the benefit to me and to society
00:36:12 of saving me five minutes on answering some questions
00:36:16 versus the negative externalities of the privacy issue
00:36:23 doesn’t add up.
00:36:24 So, I think like a lot of the time,
00:36:26 the places where people are invading our privacy
00:36:30 in order to provide convenience
00:36:32 is really about just trying to make them more money
00:36:36 and they move these negative externalities
00:36:40 to places that they don’t have to pay for them.
00:36:44 So, when you actually see regulations appear
00:36:48 that actually cause the companies
00:36:50 that create these negative externalities
00:36:52 to have to pay for it themselves,
00:36:53 they say, well, we can’t do it anymore.
00:36:56 So, the cost is actually too high.
00:36:58 But for something like medicine,
00:37:00 yeah, I mean, the hospital has my medical imaging,
00:37:05 my pathology studies, my medical records,
00:37:08 and also I own my medical data.
00:37:11 So, I help a startup called Doc.ai.
00:37:16 One of the things Doc.ai does is that it has an app.
00:37:19 You can connect to, you know, Sutter Health
00:37:23 and LabCorp and Walgreens
00:37:26 and download your medical data to your phone
00:37:29 and then upload it again at your discretion
00:37:33 to share it as you wish.
00:37:35 So, with that kind of approach,
00:37:38 we can share our medical information
00:37:41 with the people we want to.
00:37:44 Yeah, so control.
00:37:45 I mean, really being able to control
00:37:47 who you share it with and so on.
00:37:48 Yeah.
00:37:49 So, that has a beautiful, interesting tangent
00:37:53 to return back to the origin story of Fast.ai.
00:37:59 Right, so before I started Fast.ai,
00:38:02 I spent a year researching
00:38:06 where are the biggest opportunities for deep learning?
00:38:10 Because I knew from my time at Kaggle in particular
00:38:14 that deep learning had kind of hit this threshold point
00:38:16 where it was rapidly becoming the state of the art approach
00:38:19 in every area that looked at it.
00:38:21 And I’d been working with neural nets for over 20 years.
00:38:25 I knew that from a theoretical point of view,
00:38:27 once it hit that point,
00:38:28 it would do that in kind of just about every domain.
00:38:31 And so I kind of spent a year researching
00:38:34 what are the domains that’s gonna have
00:38:36 the biggest low hanging fruit
00:38:37 in the shortest time period.
00:38:39 I picked medicine, but there were so many
00:38:42 I could have picked.
00:38:43 And so there was a kind of level of frustration for me
00:38:46 of like, okay, I’m really glad we’ve opened up
00:38:49 the medical deep learning world.
00:38:51 And today it’s huge, as you know,
00:38:53 but we can’t do, I can’t do everything.
00:38:58 I don’t even know, like in medicine,
00:39:00 it took me a really long time to even get a sense
00:39:02 of like what kind of problems do medical practitioners solve?
00:39:05 What kind of data do they have?
00:39:06 Who has that data?
00:39:08 So I kind of felt like I need to approach this differently
00:39:12 if I wanna maximize the positive impact of deep learning.
00:39:16 Rather than me picking an area
00:39:19 and trying to become good at it and building something,
00:39:21 I should let people who are already domain experts
00:39:24 in those areas and who already have the data
00:39:27 do it themselves.
00:39:29 So that was the reason for Fast.ai
00:39:33 is to basically try and figure out
00:39:36 how to get deep learning into the hands of people
00:39:40 who could benefit from it and help them to do so
00:39:43 in as quick and easy and effective a way as possible.
00:39:47 Got it, so sort of empower the domain experts.
00:39:50 Yeah, and like partly it’s because like,
00:39:54 unlike most people in this field,
00:39:56 my background is very applied and industrial.
00:39:59 Like my first job was at McKinsey & Company.
00:40:02 I spent 10 years in management consulting.
00:40:04 I spend a lot of time with domain experts.
00:40:10 So I kind of respect them and appreciate them.
00:40:12 And I know that’s where the value generation in society is.
00:40:16 And so I also know how most of them can’t code
00:40:21 and most of them don’t have the time to invest
00:40:26 three years in a graduate degree or whatever.
00:40:29 So I was like, how do I upskill those domain experts?
00:40:33 I think that would be a super powerful thing,
00:40:36 the biggest societal impact I could have.
00:40:40 So yeah, that was the thinking.
00:40:41 So much of Fast.ai students and researchers
00:40:45 and the things you teach are pragmatically minded,
00:40:50 practically minded,
00:40:52 figuring out ways how to solve real problems and fast.
00:40:55 So from your experience,
00:40:57 what’s the difference between theory
00:40:59 and practice of deep learning?
00:41:03 Well, most of the research in the deep learning world
00:41:07 is a total waste of time.
00:41:09 Right, that’s what I was getting at.
00:41:11 Yeah.
00:41:12 It’s a problem in science in general.
00:41:16 Scientists need to be published,
00:41:19 which means they need to work on things
00:41:21 that their peers are extremely familiar with
00:41:24 and can recognize in advance in that area.
00:41:26 So that means that they all need to work on the same thing.
00:41:30 And so it really, and the thing they work on,
00:41:33 there’s nothing to encourage them to work on things
00:41:35 that are practically useful.
00:41:38 So you get just a whole lot of research,
00:41:41 which is minor advances and stuff
00:41:43 that’s been very highly studied
00:41:44 and has no significant practical impact.
00:41:49 Whereas the things that really make a difference,
00:41:50 like I mentioned transfer learning,
00:41:52 like if we can do better at transfer learning,
00:41:55 then it’s this like world changing thing
00:41:58 where suddenly like lots more people
00:41:59 can do world class work with less resources and less data.
00:42:06 But almost nobody works on that.
00:42:08 Or another example, active learning,
00:42:10 which is the study of like,
00:42:11 how do we get more out of the human beings in the loop?
00:42:15 That’s my favorite topic.
00:42:17 Yeah, so active learning is great,
00:42:18 but it’s almost nobody working on it
00:42:21 because it’s just not a trendy thing right now.
00:42:23 You know what somebody, sorry to interrupt,
00:42:27 you’re saying that nobody is publishing on active learning,
00:42:31 but there’s people inside companies,
00:42:33 anybody who actually has to solve a problem,
00:42:36 they’re going to innovate on active learning.
00:42:39 Yeah, everybody kind of reinvents active learning
00:42:42 when they actually have to work in practice
00:42:43 because they start labeling things and they think,
00:42:46 gosh, this is taking a long time and it’s very expensive.
00:42:49 And then they start thinking,
00:42:51 well, why am I labeling everything?
00:42:52 I’m only, the machine’s only making mistakes
00:42:54 on those two classes.
00:42:56 They’re the hard ones.
00:42:56 Maybe I’ll just start labeling those two classes.
00:42:58 And then you start thinking,
00:43:00 well, why did I do that manually?
00:43:01 Why can’t I just get the system to tell me
00:43:03 which things are going to be hardest?
00:43:05 It’s an obvious thing to do, but yeah,
00:43:08 it’s just like transfer learning.
00:43:11 It’s understudied and the academic world
00:43:14 just has no reason to care about practical results.
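The loop described above, noticing which examples the model is least sure about and labeling only those, is usually called uncertainty sampling. A minimal pure-Python illustration, with made-up probabilities standing in for a real model's outputs:

```python
def least_confident(predictions, k):
    """Pick the k examples whose top predicted probability is lowest,
    i.e. the ones the model is least sure about -- the ones worth
    sending to a human labeler next."""
    # predictions: list of (example_id, class_probabilities)
    scored = [(max(probs), ex_id) for ex_id, probs in predictions]
    scored.sort()  # lowest top-probability first
    return [ex_id for _, ex_id in scored[:k]]

# Fake model outputs over four unlabeled examples.
preds = [
    ("a", [0.98, 0.01, 0.01]),  # very confident
    ("b", [0.40, 0.35, 0.25]),  # very unsure
    ("c", [0.70, 0.20, 0.10]),
    ("d", [0.50, 0.45, 0.05]),  # torn between two classes
]
print(least_confident(preds, 2))  # -> ['b', 'd']
```

In practice this runs in a loop: label the selected examples, retrain, re-score the remaining unlabeled pool, and repeat.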
00:43:17 The funny thing is,
00:43:18 like I’ve only really ever written one paper.
00:43:19 I hate writing papers.
00:43:21 And I didn’t even write it.
00:43:22 It was my colleague, Sebastian Ruder,
00:43:24 who actually wrote it.
00:43:25 I just did the research for it,
00:43:28 but it was basically introducing transfer learning,
00:43:30 successful transfer learning to NLP for the first time.
00:43:34 The algorithm is called ULMFiT.
00:43:36 And it actually, I actually wrote it for the course,
00:43:42 for the Fast AI course.
00:43:43 I wanted to teach people NLP and I thought,
00:43:45 I only want to teach people practical stuff.
00:43:47 And I think the only practical stuff is transfer learning.
00:43:50 And I couldn’t find any examples of transfer learning in NLP.
00:43:53 So I just did it.
00:43:54 And I was shocked to find that as soon as I did it,
00:43:57 which, you know, the basic prototype took a couple of days,
00:44:01 smashed the state of the art
00:44:02 on one of the most important data sets
00:44:04 in a field that I knew nothing about.
00:44:06 And I just thought, well, this is ridiculous.
00:44:10 And so I spoke to Sebastian about it
00:44:13 and he kindly offered to write it up, the results.
00:44:17 And so it ended up being published in ACL,
00:44:21 which is the top computational linguistics conference.
00:44:25 So like people do actually care once you do it,
00:44:28 but I guess it’s difficult for maybe like junior researchers
00:44:32 or like, I don’t care whether I get citations
00:44:36 or papers or whatever.
00:44:37 There’s nothing in my life that makes that important,
00:44:39 which is why I’ve never actually bothered
00:44:41 to write a paper myself.
00:44:43 But for people who do,
00:44:43 I guess they have to pick the kind of safe option,
00:44:49 which is like, yeah, make a slight improvement
00:44:52 on something that everybody’s already working on.
00:44:54 Yeah, nobody does anything interesting
00:44:58 or succeeds in life with the safe option.
00:45:01 Although, I mean, the nice thing is,
00:45:02 nowadays everybody is now working on NLP transfer learning
00:45:05 because since that time we’ve had GPT and GPT2 and BERT,
00:45:09 and, you know, it’s like, it’s, so yeah,
00:45:12 once you show that something’s possible,
00:45:15 everybody jumps in, I guess, so.
00:45:17 I hope to be a part of,
00:45:19 and I hope to see more innovation
00:45:20 and active learning in the same way.
00:45:22 I think transfer learning and active learning
00:45:24 are fascinating, public, open work.
00:45:27 I actually helped start a startup called Platform AI,
00:45:29 which is really all about active learning.
00:45:31 And yeah, it’s been interesting trying to kind of see
00:45:35 what research is out there and make the most of it.
00:45:37 And there’s basically none.
00:45:39 So we’ve had to do all our own research.
00:45:41 Once again, and just as you described.
00:45:44 Can you tell the story of the Stanford competition,
00:45:47 DAWNBench, and FastAI’s achievement on it?
00:45:51 Sure, so something which I really enjoy
00:45:54 is that I basically teach two courses a year,
00:45:57 the Practical Deep Learning for Coders,
00:45:59 which is kind of the introductory course,
00:46:02 and then Cutting Edge Deep Learning for Coders,
00:46:04 which is the kind of research level course.
00:46:08 And while I teach those courses,
00:46:10 I basically have a big office
00:46:16 at the University of San Francisco,
00:46:18 big enough for like 30 people.
00:46:19 And I invite anybody, any student who wants to come
00:46:22 and hang out with me while I build the course.
00:46:25 And so generally it’s full.
00:46:26 And so we have 20 or 30 people in a big office
00:46:30 with nothing to do but study deep learning.
00:46:33 So it was during one of these times
00:46:35 that somebody in the group said,
00:46:37 oh, there’s a thing called DAWNBench
00:46:40 that looks interesting.
00:46:41 And I was like, what the hell is that?
00:46:42 And they’d set up some competition
00:46:44 to see how quickly you can train a model.
00:46:46 Seems kind of, not exactly relevant to what we’re doing,
00:46:50 but it sounds like the kind of thing
00:46:51 which you might be interested in.
00:46:52 And I checked it out and I was like,
00:46:53 oh crap, there’s only 10 days till it’s over.
00:46:55 It’s too late.
00:46:58 And we’re kind of busy trying to teach this course.
00:47:00 But we’re like, oh, it would make an interesting case study
00:47:05 for the course.
00:47:06 It’s like, it’s all the stuff we’re already doing.
00:47:08 Why don’t we just put together
00:47:09 our current best practices and ideas?
00:47:12 So me and I guess about four students
00:47:16 just decided to give it a go.
00:47:17 And we focused on this small one called CIFAR 10,
00:47:20 which is little 32 by 32 pixel images.
00:47:24 Can you say what DAWNBench is?
00:47:26 Yeah, so it’s a competition to train a model
00:47:28 as fast as possible.
00:47:29 It was run by Stanford.
00:47:30 And it’s cheap as possible too.
00:47:32 That’s also another one for as cheap as possible.
00:47:34 And there was a couple of categories,
00:47:36 ImageNet and CIFAR 10.
00:47:38 So ImageNet is this big 1.3 million image thing
00:47:42 that took a couple of days to train.
00:47:45 Remember a friend of mine, Pete Warden,
00:47:47 who’s now at Google.
00:47:51 I remember he told me how he trained ImageNet
00:47:53 a few years ago when he basically like had this
00:47:58 little granny flat out the back
00:47:59 that he turned into his ImageNet training center.
00:48:01 And he figured, you know, after like a year of work,
00:48:03 he figured out how to train it in like 10 days or something.
00:48:07 It’s like, that was a big job.
00:48:08 Whereas CIFAR 10, at that time,
00:48:10 you could train in a few hours.
00:48:12 You know, it’s much smaller and easier.
00:48:14 So we thought we’d try CIFAR 10.
00:48:18 And yeah, I’ve really never done that before.
00:48:23 Like I’d never really,
00:48:24 like things like using more than one GPU at a time
00:48:27 was something I tried to avoid.
00:48:29 Cause to me, it’s like very against the whole idea
00:48:32 of accessibility, which is you should be able to do things with one GPU.
00:48:35 I mean, have you asked in the past before,
00:48:38 after having accomplished something,
00:48:39 how do I do this faster, much faster?
00:48:42 Oh, always, but it’s always, for me,
00:48:44 it’s always how do I make it much faster on a single GPU
00:48:47 that a normal person could afford in their day to day life.
00:48:50 It’s not how could I do it faster by, you know,
00:48:53 having a huge data center.
00:48:55 Cause to me, it’s all about like,
00:48:57 as many people as possible should be able to use something
00:48:59 without fussing around with infrastructure.
00:49:04 So anyways, in this case it’s like, well,
00:49:06 we can use eight GPUs just by renting an AWS machine.
00:49:10 So we thought we’d try that.
00:49:11 And yeah, basically using the stuff we were already doing,
00:49:16 we were able to get, you know, the speed,
00:49:20 you know, within a few days we had the speed down to,
00:49:23 I don’t know, a very small number of minutes.
00:49:26 I can’t remember exactly how many minutes it was,
00:49:28 but it might’ve been like 10 minutes or something.
00:49:31 And so, yeah, we found ourselves
00:49:32 at the top of the leaderboard easily
00:49:34 for both time and money, which really shocked me
00:49:39 cause the other people competing in this
00:49:40 were like Google and Intel and stuff
00:49:41 who I like know a lot more about this stuff
00:49:43 than I think we do.
00:49:45 So then we were emboldened.
00:49:46 We thought let’s try the ImageNet one too.
00:49:50 I mean, it seemed way out of our league,
00:49:53 but our goal was to get under 12 hours.
00:49:55 And we did, which was really exciting.
00:49:59 But we didn’t put anything up on the leaderboard,
00:50:01 but we were down to like 10 hours.
00:50:03 But then Google put in like five hours or something
00:50:09 and we’re just like, oh, we’re so screwed.
00:50:13 But we kind of thought, we’ll keep trying.
00:50:16 You know, if Google can do it in five,
00:50:17 I mean, Google did on five hours on something
00:50:19 on like a TPU pod or something, like a lot of hardware.
00:50:23 But we kind of like had a bunch of ideas to try.
00:50:26 Like a really simple thing was
00:50:28 why are we using these big images?
00:50:30 They’re like 224 by 224 or 256 by 256 pixels.
00:50:35 You know, why don’t we try smaller ones?
00:50:37 And just to elaborate, there’s a constraint
00:50:40 on the accuracy that your trained model
00:50:42 is supposed to achieve, right?
00:50:43 Yeah, you gotta achieve 93%, I think it was,
00:50:46 for ImageNet, exactly.
00:50:49 Which is very tough, so you have to.
00:50:51 Yeah, 93%, like they picked a good threshold.
00:50:54 It was a little bit higher
00:50:56 than what the most commonly used ResNet 50 model
00:51:00 could achieve at that time.
00:51:03 So yeah, so it’s quite a difficult problem to solve.
00:51:08 But yeah, we realized if we actually
00:51:09 just use 64 by 64 images,
00:51:14 it trained a pretty good model.
00:51:16 And then we could take that same model
00:51:18 and just give it a couple of epochs to learn 224 by 224 images.
00:51:21 And it was basically already trained.
00:51:24 It makes a lot of sense.
00:51:25 Like if you teach somebody,
00:51:26 like here’s what a dog looks like
00:51:28 and you show them low res versions,
00:51:30 and then you say, here’s a really clear picture of a dog,
00:51:33 they already know what a dog looks like.
00:51:35 So that like just, we jumped to the front
00:51:39 and we ended up winning parts of that competition.
00:51:43 We actually ended up doing a distributed version
00:51:47 over multiple machines a couple of months later
00:51:49 and ended up at the top of the leaderboard.
00:51:51 We had 18 minutes.
00:51:53 ImageNet.
00:51:53 Yeah, and it was,
00:51:55 and people have just kept on blasting through
00:51:57 again and again since then, so.
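The progressive-resizing trick described above, training on small images first and then fine-tuning briefly at full size, works because a convolutional net with global pooling accepts any resolution. A toy PyTorch sketch, with random data standing in for ImageNet:

```python
import torch
import torch.nn as nn

# A tiny CNN; AdaptiveAvgPool2d is what makes it resolution-agnostic.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),  # collapses any HxW to 1x1
    nn.Flatten(),
    nn.Linear(16, 10),
)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(size):
    """One training step on a fake batch at the given image size."""
    x = torch.randn(4, 3, size, size)
    y = torch.randint(0, 10, (4,))
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Phase 1: many cheap steps on small 64x64 images.
for _ in range(3):
    train_step(64)

# Phase 2: a brief fine-tune at full 224x224 -- same model, same weights.
final_loss = train_step(224)
print(final_loss >= 0.0)
```

Most of the compute happens at the cheap small size; the full-resolution phase only has to refine a model that already knows what it's looking at.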
00:52:00 So what’s your view on multi GPU
00:52:03 or multiple machine training in general
00:52:06 as a way to speed code up?
00:52:09 I think it’s largely a waste of time.
00:52:11 Both of them.
00:52:12 I think it’s largely a waste of time.
00:52:13 Both multi GPU on a single machine and.
00:52:15 Yeah, particularly multi machines,
00:52:17 cause it’s just clunky.
00:52:21 Multi GPUs is less clunky than it used to be,
00:52:25 but to me anything that slows down your iteration speed
00:52:28 is a waste of time.
00:52:31 So you could maybe do your very last,
00:52:34 you know, perfecting of the model on multi GPUs
00:52:38 if you need to, but.
00:52:40 So for example, I think doing stuff on ImageNet
00:52:44 is generally a waste of time.
00:52:46 Why test things on 1.3 million images?
00:52:48 Most of us don’t use 1.3 million images.
00:52:51 And we’ve also done research that shows that
00:52:53 doing things on a smaller subset of images
00:52:56 gives you the same relative answers anyway.
00:52:59 So from a research point of view, why waste that time?
00:53:02 So actually I released a couple of new data sets recently.
00:53:06 One is called Imagenette,
00:53:07 the French ImageNet, which is a small subset of ImageNet,
00:53:12 which is designed to be easy to classify.
00:53:15 What’s, how do you spell Imagenette?
00:53:17 It’s got an extra T and E at the end,
00:53:19 cause it’s very French.
00:53:20 And then another one called Imagewoof,
00:53:24 which is a subset of ImageNet that only contains dog breeds.
00:53:29 And that’s a hard one, right?
00:53:31 That’s a hard one.
00:53:31 And I’ve discovered that if you just look at these
00:53:34 two subsets, you can train things on a single GPU
00:53:37 in 10 minutes.
00:53:39 And the results you get are directly transferable
00:53:42 to ImageNet nearly all the time.
00:53:44 And so now I’m starting to see some researchers
00:53:46 start to use these much smaller data sets.
00:53:48 I so deeply love the way you think,
00:53:51 because I think you might’ve written a blog post
00:53:55 saying that sort of going these big data sets
00:54:00 is encouraging people to not think creatively.
00:54:03 Absolutely.
00:54:04 So you’re too, it sort of constrains you to train
00:54:08 on large resources.
00:54:09 And because you have these resources,
00:54:11 you think more research will be better.
00:54:13 And then you start, so like somehow you kill the creativity.
00:54:17 Yeah, and even worse than that, Lex,
00:54:19 I keep hearing from people who say,
00:54:21 I decided not to get into deep learning
00:54:23 because I don’t believe it’s accessible to people
00:54:26 outside of Google to do useful work.
00:54:28 So like I see a lot of people make an explicit decision
00:54:31 to not learn this incredibly valuable tool
00:54:35 because they’ve drunk the Google Koolaid,
00:54:39 which is that only Google’s big enough
00:54:40 and smart enough to do it.
00:54:42 And I just find that so disappointing and it’s so wrong.
00:54:45 And I think all of the major breakthroughs in AI
00:54:49 in the next 20 years will be doable on a single GPU.
00:54:53 Like I would say, my sense is all the big sort of.
00:54:57 Well, let’s put it this way.
00:54:58 None of the big breakthroughs of the last 20 years
00:55:00 have required multiple GPUs.
00:55:01 So like batch norm, ReLU, Dropout.
00:55:05 To demonstrate that there’s something to them.
00:55:08 Every one of them, none of them has required multiple GPUs.
00:55:11 GANs, the original GANs didn’t require multiple GPUs.
00:55:15 Well, and we’ve actually recently shown
00:55:18 that you don’t even need GANs.
00:55:19 So we’ve developed GAN level outcomes without needing GANs.
00:55:24 And we can now do it with, again,
00:55:26 by using transfer learning,
00:55:27 we can do it in a couple of hours on a single GPU.
00:55:30 You’re just using a generator model
00:55:31 without the adversarial part?
00:55:32 Yeah, so we’ve found loss functions
00:55:35 that work super well without the adversarial part.
00:55:38 And then one of our students, a guy called Jason Antic,
00:55:41 has created a system called DeOldify,
00:55:44 which uses this technique to colorize
00:55:47 old black and white movies.
00:55:48 You can do it on a single GPU,
00:55:50 colorize a whole movie in a couple of hours.
00:55:52 And one of the things that Jason and I did together
00:55:56 was we figured out how to add a little bit of GAN
00:56:00 at the very end, which it turns out for colorization
00:56:02 makes it just a bit brighter and nicer.
00:56:05 And then Jason did masses of experiments
00:56:07 to figure out exactly how much to do,
00:56:09 but it’s still all done on his home machine
00:56:12 on a single GPU in his lounge room.
00:56:15 And if you think about colorizing Hollywood movies,
00:56:19 that sounds like something a huge studio would have to do,
00:56:21 but he has the world’s best results on this.
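The non-adversarial losses mentioned above are what the fast.ai work calls feature losses (elsewhere, perceptual losses): compare generated and target images in the feature space of a fixed pretrained network rather than playing an adversarial game. A toy sketch, with a small random conv net standing in for the pretrained VGG normally used:

```python
import torch
import torch.nn as nn

# Fixed feature extractor; in practice a pretrained VGG, here a
# random conv net so the example is self-contained.
features = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1),
)
for p in features.parameters():
    p.requires_grad = False  # the extractor itself is never trained

def feature_loss(generated, target):
    """MSE between feature maps rather than raw pixels: penalizes
    differences in content and texture, not exact pixel values."""
    return nn.functional.mse_loss(features(generated), features(target))

target = torch.rand(1, 3, 32, 32)  # e.g. the ground-truth color image
generated = torch.rand(1, 3, 32, 32, requires_grad=True)

loss = feature_loss(generated, target)
loss.backward()  # gradients flow back to the generator's output
print(loss.item() >= 0.0)
```

In a real pipeline, `generated` would be a generator network's output, so these gradients train the generator directly, with no discriminator needed.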
00:56:25 There’s this problem of microphones.
00:56:27 We’re just talking to microphones now.
00:56:29 It’s such a pain in the ass to have these microphones
00:56:32 to get good quality audio.
00:56:34 And I tried to see if it’s possible to plop down
00:56:36 a bunch of cheap sensors and reconstruct
00:56:39 higher quality audio from multiple sources.
00:56:41 Because right now I haven’t seen the work from,
00:56:45 okay, we can say even expensive mics
00:56:47 automatically combining audio from multiple sources
00:56:50 to improve the combined audio.
00:56:52 People haven’t done that.
00:56:53 And that feels like a learning problem.
00:56:55 So hopefully somebody can.
00:56:56 Well, I mean, it’s evidently doable
00:56:58 and it should have been done by now.
00:57:01 I felt the same way about computational photography
00:57:03 four years ago.
00:57:05 Why are we investing in big lenses
00:57:07 when three cheap lenses plus actually
00:57:10 a little bit of intentional movement,
00:57:13 so like take a few frames,
00:57:16 gives you enough information
00:57:18 to get excellent subpixel resolution,
00:57:20 which particularly with deep learning,
00:57:22 you would know exactly what you were meant to be looking at.
00:57:25 We can totally do the same thing with audio.
00:57:28 I think it’s madness that it hasn’t been done yet.
00:57:30 Is there progress on the photography company?
00:57:33 Yeah, photography is basically standard now.
00:57:36 So the Google Pixel Night Light,
00:57:40 I don’t know if you’ve ever tried it,
00:57:42 but it’s astonishing.
00:57:43 You take a picture in almost pitch black
00:57:45 and you get back a very high quality image.
00:57:49 And it’s not because of the lens.
00:57:51 Same stuff with like adding the bokeh
00:57:53 to the background blurring,
00:57:55 it’s done computationally.
00:57:57 This is the pixel right here.
00:57:58 Yeah, basically everybody now
00:58:01 is doing most of the fanciest stuff
00:58:05 on their phones with computational photography
00:58:07 and also increasingly people are putting
00:58:08 more than one lens on the back of the camera.
00:58:11 So the same will happen for audio for sure.
00:58:14 And there’s applications in the audio side.
00:58:16 If you look at an Alexa type device,
00:58:19 most people I’ve seen,
00:58:20 especially I worked at Google before,
00:58:22 when you look at noise background removal,
00:58:25 you don’t think of multiple sources of audio.
00:58:29 You don’t play with that as much
00:58:31 as I would hope people would.
00:58:31 But I mean, you can still do it even with one.
00:58:33 Like again, not much work’s been done in this area.
00:58:36 So we’re actually gonna be releasing an audio library soon,
00:58:39 which hopefully will encourage development of this
00:58:41 because it’s so underused.
00:58:43 The basic approach we used for our super resolution
00:58:46 and which Jason uses for DeOldify
00:58:48 of generating high quality images,
00:58:50 the exact same approach would work for audio.
00:58:53 No one’s done it yet,
00:58:54 but it would be a couple of months work.
00:58:57 Okay, also learning rate in terms of DAWNBench.
00:59:01 There’s some magic on learning rate
00:59:03 that you played around with that’s kind of interesting.
00:59:05 Yeah, so this is all work that came
00:59:06 from a guy called Leslie Smith.
00:59:09 Leslie’s a researcher who, like us,
00:59:12 cares a lot about just the practicalities
00:59:15 of training neural networks quickly and accurately,
00:59:20 which I think is what everybody should care about,
00:59:22 but almost nobody does.
00:59:24 And he discovered something very interesting,
00:59:28 which he calls super convergence,
00:59:29 which is there are certain networks
00:59:31 that with certain settings of hyperparameters
00:59:33 could suddenly be trained 10 times faster
00:59:37 by using a 10 times higher learning rate.
00:59:39 Now, no one would publish that paper
00:59:43 because it’s not an area of kind of active research
00:59:49 in the academic world.
00:59:50 No academics recognize that this is important.
00:59:52 And also deep learning in academia
00:59:56 is not considered an experimental science.
00:59:59 So unlike in physics where you could say like,
01:00:02 I just saw a subatomic particle do something
01:00:05 which the theory doesn’t explain,
01:00:07 you could publish that without an explanation.
01:00:10 And then in the next 60 years,
01:00:11 people can try to work out how to explain it.
01:00:14 We don’t allow this in the deep learning world.
01:00:16 So it’s literally impossible for Leslie
01:00:19 to publish a paper that says,
01:00:21 I’ve just seen something amazing happen.
01:00:23 This thing trained 10 times faster than it should have.
01:00:25 I don’t know why.
01:00:27 And so the reviewers were like,
01:00:28 well, you can’t publish that because you don’t know why.
01:00:30 So anyway.
01:00:31 That’s important to pause on
01:00:32 because there’s so many discoveries
01:00:34 that would need to start like that.
01:00:36 Every other scientific field I know of works that way.
01:00:39 I don’t know why ours is uniquely disinterested
01:00:43 in publishing unexplained experimental results,
01:00:47 but there it is.
01:00:48 So it wasn’t published.
01:00:51 Having said that,
01:00:52 I read a lot more unpublished papers than published papers
01:00:56 because that’s where you find the interesting insights.
01:01:00 So I absolutely read this paper.
01:01:02 And I was just like,
01:01:04 this is astonishingly mind blowing and weird
01:01:08 and awesome.
01:01:09 And like, why isn’t everybody only talking about this?
01:01:12 Because like, if you can train these things 10 times faster,
01:01:15 they also generalize better
01:01:16 because you’re doing less epochs,
01:01:18 which means you look at the data less,
01:01:20 you get better accuracy.
01:01:22 So I’ve been kind of studying that ever since.
01:01:24 And eventually Leslie kind of figured out
01:01:28 a lot of how to get this done.
01:01:30 And we added minor tweaks.
01:01:32 And a big part of the trick
01:01:33 is starting at a very low learning rate,
01:01:36 very gradually increasing it.
01:01:37 So as you’re training your model,
01:01:39 you would take very small steps at the start
01:01:42 and you gradually make them bigger and bigger
01:01:44 until eventually you’re taking much bigger steps
01:01:46 than anybody thought was possible.
01:01:49 There’s a few other little tricks to make it work,
01:01:51 but basically we can reliably get super convergence.
01:01:55 And so for the DawnBench thing,
01:01:56 we were using just much higher learning rates
01:01:59 than people expected to work.
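The schedule described here — take tiny steps at first, ramp up to very large steps, then come back down — can be sketched as a small schedule function. This is a hedged illustration, not fastai's actual implementation: the name `one_cycle_lr` and the default warmup fraction and divisors are assumptions.

```python
import math

def one_cycle_lr(step, total_steps, lr_max, pct_warmup=0.3,
                 div_start=25.0, div_end=1e4):
    """Illustrative 1cycle-style schedule: cosine warmup from
    lr_max/div_start up to lr_max, then cosine anneal down to
    lr_max/div_end.  All defaults are assumptions, not fastai's."""
    warmup_steps = int(total_steps * pct_warmup)
    if step < warmup_steps:
        # warmup phase: smoothly ramp the learning rate up
        p = step / max(1, warmup_steps)
        lo = lr_max / div_start
        return lo + (lr_max - lo) * (1 - math.cos(math.pi * p)) / 2
    # annealing phase: smoothly decay to a tiny final learning rate
    p = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    lo = lr_max / div_end
    return lo + (lr_max - lo) * (1 + math.cos(math.pi * p)) / 2

# learning rate at each of 100 training steps
lrs = [one_cycle_lr(s, 100, lr_max=0.1) for s in range(100)]
```

Feeding each step's value to the optimizer gives the "much bigger steps than anybody thought was possible" in the middle of training, with gentle steps at both ends.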
01:02:02 What do you think the future of,
01:02:03 I mean, it makes so much sense
01:02:04 for that to be a critical hyperparameter learning rate
01:02:07 that you vary.
01:02:08 What do you think the future
01:02:09 of learning rate magic looks like?
01:02:13 Well, there’s been a lot of great work
01:02:14 in the last 12 months in this area.
01:02:17 And people are increasingly realizing that optimize,
01:02:20 like we just have no idea really how optimizers work.
01:02:23 And the combination of weight decay,
01:02:25 which is how we regularize optimizers,
01:02:27 and the learning rate,
01:02:29 and then other things like the epsilon we use
01:02:31 in the Adam optimizer,
01:02:32 they all work together in weird ways.
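The interaction being described can be made concrete with a minimal single-weight AdamW-style update. This is an illustrative sketch, not PyTorch's or fastai's implementation, and `adamw_step` is a hypothetical helper: notice that the learning rate scales both the adaptive step and the decoupled weight decay, while epsilon bounds how aggressive the adaptive step can get.

```python
import math

def adamw_step(w, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, wd=1e-2):
    """One AdamW-style update on a single scalar weight (sketch)."""
    state['t'] += 1
    state['m'] = beta1 * state['m'] + (1 - beta1) * grad        # first moment
    state['v'] = beta2 * state['v'] + (1 - beta2) * grad ** 2   # second moment
    m_hat = state['m'] / (1 - beta1 ** state['t'])              # bias-corrected
    v_hat = state['v'] / (1 - beta2 ** state['t'])
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)  # eps caps the adaptive step
    w -= lr * wd * w   # decoupled weight decay, also scaled by lr
    return w

state = {'t': 0, 'm': 0.0, 'v': 0.0}
w = adamw_step(1.0, 0.5, state, lr=0.1)
```

A larger `eps` shrinks the adaptive step and effectively lowers the learning rate, which is one reason these settings can't be tuned in isolation.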
01:02:36 And different parts of the model,
01:02:38 this is another thing we’ve done a lot of work on
01:02:40 is research into how different parts of the model
01:02:43 should be trained at different rates in different ways.
01:02:46 So we do something we call discriminative learning rates,
01:02:49 which is really important,
01:02:50 particularly for transfer learning.
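One way discriminative learning rates might look in miniature: geometrically spaced rates per layer group, so early layers (generic features) move slowly and the head moves fastest. The helper name is illustrative, and the 2.6 spacing factor echoes the one used in the ULMFiT paper; treat the specific numbers as assumptions.

```python
def discriminative_lrs(n_groups, lr_max, factor=2.6):
    """Smallest learning rate for the earliest layer group,
    lr_max for the head, geometrically spaced in between."""
    return [lr_max / factor ** (n_groups - 1 - i) for i in range(n_groups)]

lrs = discriminative_lrs(3, 1e-3)  # one rate per layer group
```

In PyTorch terms, each rate would go to a separate optimizer parameter group.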
01:02:53 So really, I think in the last 12 months,
01:02:54 a lot of people have realized
01:02:55 that all this stuff is important.
01:02:57 There’s been a lot of great work coming out
01:03:00 and we’re starting to see algorithms appear,
01:03:03 which have very, very few dials, if any,
01:03:06 that you have to touch.
01:03:07 So I think what’s gonna happen
01:03:09 is the idea of a learning rate,
01:03:10 well, it almost already has disappeared
01:03:12 in the latest research.
01:03:14 And instead, it’s just like we know enough
01:03:18 about how to interpret the gradients
01:03:22 and the change of gradients we see
01:03:23 to know how to set every parameter
01:03:25 in an optimal way.
01:03:26 So you see the future of deep learning
01:03:30 where really, where’s the input of a human expert needed?
01:03:34 Well, hopefully the input of a human expert
01:03:36 will be almost entirely unneeded
01:03:38 from the deep learning point of view.
01:03:40 So again, like Google’s approach to this
01:03:43 is to try and use thousands of times more compute
01:03:46 to run lots and lots of models at the same time
01:03:49 and hope that one of them is good.
01:03:51 AutoML kind of thing?
01:03:51 Yeah, AutoML kind of stuff, which I think is insane.
01:03:56 When you better understand the mechanics
01:03:59 of how models learn,
01:04:01 you don’t have to try a thousand different models
01:04:03 to find which one happens to work the best.
01:04:05 You can just jump straight to the best one,
01:04:08 which means that it’s more accessible
01:04:09 in terms of compute, cheaper,
01:04:12 and also with less hyperparameters to set,
01:04:14 it means you don’t need deep learning experts
01:04:16 to train your deep learning model for you,
01:04:19 which means that domain experts can do more of the work,
01:04:22 which means that now you can focus the human time
01:04:24 on the kind of interpretation, the data gathering,
01:04:28 identifying model errors and stuff like that.
01:04:31 Yeah, the data side.
01:04:32 How often do you work with data these days
01:04:34 in terms of the cleaning, looking at it?
01:04:37 Like Darwin looked at different species
01:04:41 while traveling about.
01:04:42 Do you look at data?
01:04:45 Have you in your roots in Kaggle?
01:04:48 Always, yeah.
01:04:48 Look at data.
01:04:49 Yeah, I mean, it’s a key part of our course.
01:04:51 It’s like before we train a model in the course,
01:04:53 we see how to look at the data.
01:04:55 And then the first thing we do
01:04:56 after we train our first model,
01:04:57 which is fine tuning an ImageNet model for five minutes.
01:05:00 And then the thing we immediately do after that
01:05:02 is we learn how to analyze the results of the model
01:05:05 by looking at examples of misclassified images
01:05:08 and looking at a confusion matrix,
01:05:10 and then doing research on Google
01:05:15 to learn about the kinds of things that it’s misclassifying.
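The matrix being described — counting which true classes get predicted as which — can be built in a few lines of plain Python. This is an illustrative sketch with made-up labels; fastai wraps this kind of analysis in its own interpretation tooling.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

actual    = ['cat', 'cat', 'dog', 'dog', 'dog']
predicted = ['cat', 'dog', 'dog', 'dog', 'cat']
m = confusion_matrix(actual, predicted, ['cat', 'dog'])
# off-diagonal cells are the misclassifications worth investigating
```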
01:05:18 So to me, one of the really cool things
01:05:19 about machine learning models in general
01:05:21 is that when you interpret them,
01:05:24 they tell you about things like
01:05:25 what are the most important features,
01:05:27 which groups are you misclassifying,
01:05:29 and they help you become a domain expert more quickly
01:05:32 because you can focus your time on the bits
01:05:34 that the model is telling you is important.
01:05:38 So it lets you deal with things like data leakage,
01:05:40 for example, if it says,
01:05:41 oh, the main feature I’m looking at is customer ID.
01:05:45 And you’re like, oh, customer ID shouldn’t be predictive.
01:05:47 And then you can talk to the people
01:05:50 that manage customer IDs and they’ll tell you like,
01:05:53 oh yes, as soon as a customer’s application is accepted,
01:05:57 we add a one on the end of their customer ID or something.
01:06:01 So yeah, looking at data,
01:06:03 particularly from the lens of which parts of the data
01:06:06 the model says is important is super important.
01:06:09 Yeah, and using the model to almost debug the data
01:06:12 to learn more about the data.
01:06:14 Exactly.
01:06:16 What are the different cloud options
01:06:18 for training your own networks?
01:06:20 Last question related to DawnBench.
01:06:21 Well, it’s part of a lot of the work you do,
01:06:24 but from a perspective of performance,
01:06:27 I think you’ve written this in a blog post.
01:06:29 There’s AWS, there’s TPU from Google.
01:06:32 What’s your sense?
01:06:33 What the future holds?
01:06:34 What would you recommend now in terms of training?
01:06:37 So from a hardware point of view,
01:06:40 Google’s TPUs and the best Nvidia GPUs are similar.
01:06:45 I mean, maybe the TPUs are like 30% faster,
01:06:47 but they’re also much harder to program.
01:06:49 There isn’t a clear leader in terms of hardware right now,
01:06:54 although much more importantly,
01:06:56 the Nvidia GPUs are much more programmable.
01:06:59 They’ve got much more software written for them.
01:07:00 So like that’s the clear leader for me
01:07:03 and where I would spend my time
01:07:04 as a researcher and practitioner.
01:07:08 But then in terms of the platform,
01:07:12 I mean, we’re super lucky now with stuff like Google GCP,
01:07:16 Google Cloud, and AWS that you can access a GPU
01:07:21 pretty quickly and easily.
01:07:25 But I mean, for AWS, it’s still too hard.
01:07:28 Like you have to find an AMI and get the instance running
01:07:33 and then install the software you want and blah, blah, blah.
01:07:37 GCP is currently the best way to get started
01:07:40 on a full server environment
01:07:42 because they have a fantastic fast.ai and PyTorch
01:07:46 ready to go instance, which has all the courses preinstalled.
01:07:51 It has Jupyter Notebook already running.
01:07:53 Jupyter Notebook is this wonderful
01:07:55 interactive computing system,
01:07:57 which everybody basically should be using
01:08:00 for any kind of data driven research.
01:08:02 But then even better than that,
01:08:05 there are platforms like Salamander, which we own
01:08:09 and Paperspace, where literally you click a single button
01:08:13 and it pops up a Jupyter Notebook straight away
01:08:17 without any kind of installation or anything.
01:08:22 And all the course notebooks are all preinstalled.
01:08:25 So like for me, this is one of the things
01:08:28 we spent a lot of time kind of curating and working on.
01:08:34 Because when we first started our courses,
01:08:35 the biggest problem was people dropped out of lesson one
01:08:39 because they couldn’t get an AWS instance running.
01:08:42 So things are so much better now.
01:08:44 And like we actually have, if you go to course.fast.ai,
01:08:47 the first thing it says is here’s how to get started
01:08:49 with your GPU.
01:08:50 And there’s like, you just click on the link
01:08:52 and you click start and you’re going.
01:08:55 You’ll go GCP.
01:08:56 I have to confess, I’ve never used the Google GCP.
01:08:58 Yeah, GCP gives you $300 of compute for free,
01:09:01 which is really nice.
01:09:03 But as I say, Salamander and Paperspace
01:09:07 are even easier still.
01:09:09 Okay.
01:09:10 So from the perspective of deep learning frameworks,
01:09:15 you work with fast.ai, if you can call it a framework,
01:09:18 and PyTorch and TensorFlow.
01:09:21 What are the strengths of each platform in your perspective?
01:09:25 So in terms of what we’ve done our research on
01:09:28 and taught in our course,
01:09:30 we started with Theano and Keras,
01:09:34 and then we switched to TensorFlow and Keras,
01:09:38 and then we switched to PyTorch,
01:09:40 and then we switched to PyTorch and fast.ai.
01:09:42 And that kind of reflects a growth and development
01:09:47 of the ecosystem of deep learning libraries.
01:09:52 Theano and TensorFlow were great,
01:09:57 but were much harder to teach and to do research
01:10:00 and development on because they define
01:10:02 what’s called a computational graph upfront,
01:10:05 a static graph, where you basically have to say,
01:10:07 here are all the things that I’m gonna eventually do
01:10:10 in my model, and then later on you say,
01:10:13 okay, do those things with this data.
01:10:15 And you can’t like debug them,
01:10:17 you can’t do them step by step,
01:10:18 you can’t program them interactively
01:10:20 in a Jupyter notebook and so forth.
01:10:22 PyTorch was not the first,
01:10:23 but PyTorch was certainly the strongest entrant
01:10:26 to come along and say, let’s not do it that way,
01:10:28 let’s just use normal Python.
01:10:31 And everything you know about in Python
01:10:32 is just gonna work, and we’ll figure out
01:10:35 how to make that run on the GPU as and when necessary.
01:10:40 That turned out to be a huge leap
01:10:44 in terms of what we could do with our research
01:10:46 and what we could do with our teaching.
01:10:49 Because it wasn’t limiting.
01:10:51 Yeah, I mean, it was critical for us
01:10:52 for something like DawnBench
01:10:53 to be able to rapidly try things.
01:10:55 It’s just so much harder to be a researcher
01:10:57 and practitioner when you have to do everything upfront
01:11:00 and you can’t inspect it.
01:11:03 Problem with PyTorch is it’s not at all accessible
01:11:07 to newcomers because you have to like
01:11:10 write your own training loop and manage the gradients
01:11:12 and all this stuff.
01:11:15 And it’s also like not great for researchers
01:11:17 because you’re spending your time dealing
01:11:19 with all this boilerplate and overhead
01:11:21 rather than thinking about your algorithm.
01:11:23 So we ended up writing this very multi layered API
01:11:27 that at the top level, you can train
01:11:29 a state of the art neural network
01:11:31 in three lines of code.
01:11:33 And which kind of talks to an API,
01:11:35 which talks to an API, which talks to an API,
01:11:36 which like you can dive into at any level
01:11:38 and get progressively closer to the machine
01:11:42 kind of levels of control.
01:11:45 And this is the fast AI library.
01:11:47 That’s been critical for us and for our students
01:11:51 and for lots of people that have won deep learning
01:11:54 competitions with it and written academic papers with it.
01:11:58 It’s made a big difference.
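The layered-API idea can be sketched in miniature, in purely illustrative pure Python rather than actual fastai code: a top-level one-liner with sensible defaults, delegating to a mid-level training loop, delegating to a low-level update step, and you can drop down a level whenever you need more control.

```python
def sgd_step(w, grad, lr):
    """Lowest level: one parameter update."""
    return w - lr * grad

def fit(w, data, lr, epochs):
    """Mid level: the training loop, fitting y = w * x by squared error."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # d/dw of (w*x - y)**2
            w = sgd_step(w, grad, lr)
    return w

def train(data):
    """Top level: one call with sensible defaults."""
    return fit(w=0.0, data=data, lr=0.1, epochs=50)

w = train([(1.0, 2.0), (2.0, 4.0)])  # learns roughly y = 2x
```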
01:12:00 We’re still limited though by Python.
01:12:03 And particularly this problem with things like
01:12:06 recurrent neural nets say where you just can’t change things
01:12:11 unless you accept it going so slowly that it’s impractical.
01:12:15 So in the latest incarnation of the course
01:12:18 and with some of the research we’re now starting to do,
01:12:20 we’re starting to do stuff, some stuff in Swift.
01:12:24 I think we’re three years away from that
01:12:28 being super practical, but I’m in no hurry.
01:12:31 I’m very happy to invest the time to get there.
01:12:35 But with that, we actually already have a nascent version
01:12:39 of the fast AI library for vision running
01:12:42 on Swift and TensorFlow.
01:12:44 Because Python for TensorFlow is not gonna cut it.
01:12:48 It’s just a disaster.
01:12:49 What they did was they tried to replicate
01:12:53 the bits that people were saying they like about PyTorch,
01:12:57 this kind of interactive computation,
01:12:59 but they didn’t actually change
01:13:00 their foundational runtime components.
01:13:03 So they kind of added this like syntax sugar
01:13:06 they call TF Eager, TensorFlow Eager,
01:13:08 which makes it look a lot like PyTorch,
01:13:10 but it’s 10 times slower than PyTorch
01:13:12 to actually do a step.
01:13:16 So because they didn’t invest the time in retooling
01:13:20 the foundations, because their code base is so horribly
01:13:23 complex.
01:13:24 Yeah, I think it’s probably very difficult
01:13:25 to do that kind of retooling.
01:13:26 Yeah, well, particularly the way TensorFlow was written,
01:13:28 it was written by a lot of people very quickly
01:13:31 in a very disorganized way.
01:13:33 So like when you actually look in the code,
01:13:35 as I do often, I’m always just like,
01:13:37 Oh God, what were they thinking?
01:13:38 It’s just, it’s pretty awful.
01:13:41 So I’m really extremely negative
01:13:45 about the potential future for Python for TensorFlow.
01:13:50 But Swift for TensorFlow can be a different beast altogether.
01:13:53 It can be like, it can basically be a layer on top of MLIR
01:13:57 that takes advantage of, you know,
01:14:00 all the great compiler stuff that Swift builds on with LLVM
01:14:04 and yeah, I think it will be absolutely fantastic.
01:14:10 Well, you’re inspiring me to try.
01:14:11 I haven’t truly felt the pain of TensorFlow 2.0 Python.
01:14:17 It’s fine by me, but…
01:14:21 Yeah, I mean, it does the job
01:14:22 if you’re using like predefined things
01:14:25 that somebody has already written.
01:14:27 But if you actually compare, you know,
01:14:29 like I’ve had to do,
01:14:31 cause I’ve been having to do a lot of stuff
01:14:32 with TensorFlow recently,
01:14:33 you actually compare like,
01:14:34 okay, I want to write something from scratch
01:14:37 and you’re like, I just keep finding it’s like,
01:14:38 Oh, it’s running 10 times slower than PyTorch.
01:14:41 So is the biggest cost,
01:14:43 let’s throw running time out the window.
01:14:47 How long it takes you to program?
01:14:49 That’s not too different now,
01:14:50 thanks to TensorFlow Eager, that’s not too different.
01:14:54 But because so many things take so long to run,
01:14:58 you wouldn’t run it at 10 times slower.
01:15:00 Like you just go like, Oh, this is taking too long.
01:15:03 And also there’s a lot of things
01:15:04 which are just less programmable,
01:15:05 like tf.data, which is the way data processing works
01:15:08 in TensorFlow is just this big mess.
01:15:11 It’s incredibly inefficient.
01:15:13 And they kind of had to write it that way
01:15:14 because of the TPU problems I described earlier.
01:15:19 So I just, you know,
01:15:22 I just feel like they’ve got this huge technical debt,
01:15:24 which they’re not going to solve
01:15:26 without starting from scratch.
01:15:27 So here’s an interesting question then,
01:15:29 if there’s a new student starting today,
01:15:34 what would you recommend they use?
01:15:37 Well, I mean, we obviously recommend Fastai and PyTorch
01:15:40 because we teach new students and that’s what we teach with.
01:15:43 So we would very strongly recommend that
01:15:46 because it will let you get on top of the concepts
01:15:50 much more quickly.
01:15:51 So then you’ll become an actual,
01:15:53 and you’ll also learn the actual state
01:15:54 of the art techniques, you know,
01:15:56 so you actually get world class results.
01:15:59 Honestly, it doesn’t much matter what library you learn
01:16:03 because switching from Chainer to MXNet
01:16:08 to TensorFlow to PyTorch is gonna be a couple of days work
01:16:12 as long as you understand the foundation as well.
01:16:15 But you think will Swift creep in there
01:16:19 as a thing that people start using?
01:16:22 Not for a few years,
01:16:24 particularly because Swift has no data science
01:16:29 community, libraries, or tooling.
01:16:33 And the Swift community has a total lack of appreciation
01:16:39 and understanding of numeric computing.
01:16:40 So like they keep on making stupid decisions, you know,
01:16:43 for years, they’ve just done dumb things
01:16:45 around performance and prioritization.
01:16:50 That’s clearly changing now
01:16:53 because the developer of Swift, Chris Lattner,
01:16:58 is working at Google on Swift for TensorFlow.
01:17:00 So like that’s a priority.
01:17:04 It’ll be interesting to see what happens with Apple
01:17:05 because like Apple hasn’t shown any sign of caring
01:17:10 about numeric programming in Swift.
01:17:13 So I mean, hopefully they’ll get off their ass
01:17:17 and start appreciating this
01:17:18 because currently all of their low level libraries
01:17:22 are not written in Swift.
01:17:25 They’re not particularly Swifty at all,
01:17:27 stuff like CoreML, they’re really pretty rubbish.
01:17:30 So yeah, so there’s a long way to go.
01:17:33 But at least one nice thing is that Swift for TensorFlow
01:17:36 can actually directly use Python code and Python libraries,
01:17:40 and literally the entire lesson one notebook of fast AI
01:17:45 runs in Swift right now, in Python mode.
01:17:48 So that’s a nice intermediate thing.
01:17:51 How long does it take?
01:17:53 If you look at the two fast AI courses,
01:17:57 how long does it take to get from point zero
01:18:00 to completing both courses?
01:18:03 It varies a lot.
01:18:05 Somewhere between two months and two years generally.
01:18:13 So for two months, how many hours a day on average?
01:18:16 So like somebody who is a very competent coder
01:18:20 can do 70 hours per course and pick it up.
01:18:27 70, seven zero, that’s it, okay.
01:18:30 But a lot of people I know take a year off
01:18:35 to study fast AI full time and say at the end of the year,
01:18:40 they feel pretty competent
01:18:43 because generally there’s a lot of other things you do
01:18:45 like generally they’ll be entering Kaggle competitions,
01:18:48 they might be reading Ian Goodfellow’s book,
01:18:51 they might, they’ll be doing a bunch of stuff
01:18:54 and often particularly if they are a domain expert,
01:18:57 their coding skills might be a little
01:19:00 on the pedestrian side.
01:19:01 So part of it’s just like doing a lot more writing.
01:19:04 What do you find is the bottleneck for people usually
01:19:07 except getting started and setting stuff up?
01:19:11 I would say coding.
01:19:13 Yeah, I would say the best,
01:19:14 the people who are strong coders pick it up the best.
01:19:18 Although another bottleneck is people who have a lot
01:19:21 of experience of classic statistics can really struggle
01:19:27 because the intuition is so the opposite
01:19:30 of what they’re used to.
01:19:30 They’re very used to like trying to reduce the number
01:19:33 of parameters in their model
01:19:34 and looking at individual coefficients and stuff like that.
01:19:39 So I find people who have a lot of coding background
01:19:42 and know nothing about statistics
01:19:44 are generally gonna be the best off.
01:19:48 So you taught several courses on deep learning
01:19:51 and as Feynman says,
01:19:52 best way to understand something is to teach it.
01:19:55 What have you learned about deep learning from teaching it?
01:19:59 A lot.
01:20:00 That’s a key reason for me to teach the courses.
01:20:03 I mean, obviously it’s gonna be necessary
01:20:04 to achieve our goal of getting domain experts
01:20:07 to be familiar with deep learning,
01:20:09 but it was also necessary for me to achieve my goal
01:20:12 of being really familiar with deep learning.
01:20:18 I mean, to see so many domain experts
01:20:24 from so many different backgrounds,
01:20:25 it’s definitely, I wouldn’t say taught me,
01:20:28 but convinced me something that I liked to believe was true,
01:20:32 which was anyone can do it.
01:20:34 So there’s a lot of kind of snobbishness out there
01:20:37 about only certain people can learn to code.
01:20:40 Only certain people are gonna be smart enough
01:20:42 to do AI, that’s definitely bullshit.
01:20:45 I’ve seen so many people from so many different backgrounds
01:20:48 get state of the art results in their domain areas now.
01:20:53 It’s definitely taught me that the key differentiator
01:20:57 between people that succeed
01:20:58 and people that fail is tenacity.
01:21:00 That seems to be basically the only thing that matters.
01:21:05 A lot of people give up.
01:21:06 But of the ones who don’t give up,
01:21:09 pretty much everybody succeeds.
01:21:12 Even if at first I’m just kind of like thinking like,
01:21:15 wow, they really aren’t quite getting it yet, are they?
01:21:18 But eventually people get it and they succeed.
01:21:22 So I think that’s been,
01:21:24 I think they’re both things I liked to believe was true,
01:21:26 but I don’t feel like I really had strong evidence
01:21:28 for them to be true,
01:21:29 but now I can say I’ve seen it again and again.
01:21:32 I’ve seen it again and again. So what advice do you have
01:21:37 for someone who wants to get started in deep learning?
01:21:42 Train lots of models.
01:21:44 That’s how you learn it.
01:21:47 So I think, it’s not just me,
01:21:51 I think our course is very good,
01:21:53 but also lots of people independently
01:21:54 have said it’s very good.
01:21:55 It recently won the CogX award for AI courses
01:21:58 as being the best in the world.
01:21:59 So I’d say come to our course, course.fast.ai.
01:22:02 And the thing I keep on harping on in my lessons
01:22:05 is train models, print out the inputs to the models,
01:22:09 print out the outputs of the models,
01:22:11 like study, change the inputs a bit,
01:22:15 look at how the outputs vary,
01:22:17 just run lots of experiments
01:22:18 to get an intuitive understanding of what’s going on.
01:22:25 To get hooked, do you think, you mentioned training,
01:22:29 do you think just running the models inference,
01:22:32 like if we talk about getting started?
01:22:35 No, you’ve got to fine tune the models.
01:22:37 So that’s the critical thing,
01:22:39 because at that point you now have a model
01:22:41 that’s in your domain area.
01:22:43 So there’s no point running somebody else’s model
01:22:46 because it’s not your model.
01:22:48 So it only takes five minutes to fine tune a model
01:22:50 for the data you care about.
01:22:52 And in lesson two of the course,
01:22:53 we teach you how to create your own data set from scratch
01:22:56 by scripting Google image search.
01:22:58 So, and we show you how to actually create
01:23:01 a web application running online.
01:23:02 So I create one in the course that differentiates
01:23:05 between a teddy bear, a grizzly bear and a brown bear.
01:23:08 And it does it with basically 100% accuracy,
01:23:11 took me about four minutes to scrape the images
01:23:13 from Google search in the script.
01:23:15 There are little graphical widgets we have in the notebook
01:23:18 that help you clean up the data set.
01:23:21 There’s other widgets that help you study the results
01:23:24 to see where the errors are happening.
01:23:26 And so now we’ve got over a thousand replies
01:23:29 in our share your work here thread
01:23:31 of students saying, here’s the thing I built.
01:23:34 And so there’s people who like,
01:23:35 and a lot of them are state of the art.
01:23:37 Like somebody said, oh, I tried looking
01:23:39 at Devanagari characters and I couldn’t believe it.
01:23:41 The thing that came out was more accurate
01:23:43 than the best academic paper after lesson one.
01:23:46 And then there’s others which are just more kind of fun,
01:23:48 like somebody who’s doing Trinidad and Tobago hummingbirds.
01:23:53 She said that’s kind of their national bird
01:23:54 and she’s got something that can now classify Trinidad
01:23:57 and Tobago hummingbirds.
01:23:58 So yeah, train models, fine tune models with your data set
01:24:02 and then study their inputs and outputs.
01:24:05 How much is Fast.ai courses?
01:24:07 Free.
01:24:08 Everything we do is free.
01:24:10 We have no revenue sources of any kind.
01:24:12 It’s just a service to the community.
01:24:15 You’re a saint.
01:24:16 Okay, once a person understands the basics,
01:24:20 trains a bunch of models,
01:24:22 if we look at the scale of years,
01:24:25 what advice do you have for someone wanting
01:24:27 to eventually become an expert?
01:24:30 Train lots of models.
01:24:31 But specifically train lots of models in your domain area.
01:24:35 So an expert what, right?
01:24:37 We don’t need more experts
01:24:39 doing slightly evolutionary research in areas
01:24:45 that everybody’s studying.
01:24:46 We need experts at using deep learning
01:24:50 to diagnose malaria.
01:24:52 Or we need experts at using deep learning
01:24:55 to analyze language to study media bias.
01:25:01 So we need experts in analyzing fisheries
01:25:08 to identify problem areas in the ocean.
01:25:11 That’s what we need.
01:25:13 So become the expert in your passion area.
01:25:17 And this is a tool which you can use for just about anything
01:25:21 and you’ll be able to do that thing better
01:25:22 than other people, particularly by combining it
01:25:25 with your passion and domain expertise.
01:25:27 So that’s really interesting.
01:25:28 Even if you do wanna innovate on transfer learning
01:25:30 or active learning, your thought is,
01:25:34 I mean, it’s one I certainly share,
01:25:36 is you also need to find a domain or data set
01:25:40 that you actually really care for.
01:25:42 If you’re not working on a real problem that you understand,
01:25:45 how do you know if you’re doing it any good?
01:25:48 How do you know if your results are good?
01:25:49 How do you know if you’re getting bad results?
01:25:50 Why are you getting bad results?
01:25:52 Is it a problem with the data?
01:25:54 Like, how do you know you’re doing anything useful?
01:25:57 Yeah, to me, the only really interesting research is,
01:26:00 not the only, but the vast majority
01:26:02 of interesting research is like,
01:26:04 try and solve an actual problem and solve it really well.
01:26:06 So both understanding sufficient tools
01:26:09 on the deep learning side and becoming a domain expert
01:26:13 in a particular domain are really things
01:26:15 within reach for anybody.
01:26:18 Yeah, I mean, to me, I would compare it
01:26:20 to like studying self driving cars,
01:26:23 having never looked at a car or been in a car
01:26:26 or turned a car on, which is like the way it is
01:26:29 for a lot of people, they’ll study some academic data set
01:26:33 where they literally have no idea about that.
01:26:36 By the way, I’m not sure how familiar you are
01:26:37 with autonomous vehicles, but that is literally,
01:26:40 you describe a large percentage of robotics folks
01:26:43 working in self driving cars is they actually
01:26:45 haven’t considered driving.
01:26:48 They haven’t actually looked at what driving looks like.
01:26:50 They haven’t driven.
01:26:51 And it’s a problem because you know,
01:26:53 when you’ve actually driven, you know,
01:26:54 like these are the things that happened
01:26:55 to me when I was driving.
01:26:57 There’s nothing that beats the real world examples
01:26:59 of just experiencing them.
01:27:02 You’ve created many successful startups.
01:27:04 What does it take to create a successful startup?
01:27:08 Same thing as becoming a successful
01:27:11 deep learning practitioner, which is not giving up.
01:27:15 So you can run out of money or run out of time
01:27:23 or run out of something, you know,
01:27:24 but if you keep costs super low
01:27:28 and try and save up some money beforehand
01:27:29 so you can afford to have some time,
01:27:35 then just sticking with it is one important thing.
01:27:38 Doing something you understand and care about is important.
01:27:42 By something, I don’t mean,
01:27:44 the biggest problem I see with deep learning people
01:27:46 is they do a PhD in deep learning
01:27:50 and then they try and commercialize their PhD.
01:27:52 It is a waste of time
01:27:53 because that doesn’t solve an actual problem.
01:27:55 You picked your PhD topic
01:27:57 because it was an interesting kind of engineering
01:28:00 or math or research exercise.
01:28:02 But yeah, if you’ve actually spent time as a recruiter
01:28:06 and you know that most of your time was spent
01:28:09 sifting through resumes
01:28:10 and you know that most of the time
01:28:12 you’re just looking for certain kinds of things
01:28:14 and you can try doing that with a model for a few minutes
01:28:19 and see whether that’s something which a model
01:28:21 seems to be able to do as well as you could,
01:28:23 then you’re on the right track to creating a startup.
01:28:27 And then I think just, yeah, being, just be pragmatic and
01:28:32 try and stay away from venture capital money
01:28:36 as long as possible, preferably forever.
01:28:39 So yeah, on that point of venture capital,
01:28:43 were you able to successfully run startups
01:28:47 self funded for quite a while?
01:28:48 Yeah, so my first two were self funded
01:28:50 and that was the right way to do it.
01:28:52 Is that scary?
01:28:54 No, VC startups are much more scary
01:28:57 because you have these people on your back
01:29:00 who do this all the time and who have done it for years
01:29:03 telling you grow, grow, grow, grow.
01:29:05 And they don’t care if you fail.
01:29:07 They only care if you don’t grow fast enough.
01:29:09 So that’s scary.
01:29:10 Whereas doing the ones myself, well, with partners
01:29:16 who were friends was nice
01:29:18 because like we just went along at a pace that made sense
01:29:22 and we were able to build it to something
01:29:23 which was big enough that we never had to work again
01:29:27 but was not big enough that any VC
01:29:29 would think it was impressive.
01:29:31 And that was enough for us to be excited, you know?
01:29:35 So I thought that’s a much better way
01:29:38 to do things than most people.
01:29:40 Generally speaking, not for yourself,
01:29:41 but how do you make money during that process?
01:29:44 Do you cut into savings?
01:29:47 So yeah, so for, so I started Fast Mail
01:29:49 and Optimal Decisions at the same time in 1999
01:29:52 with two different friends.
01:29:54 And for Fast Mail, I guess I spent $70 a month
01:30:01 on the server.
01:30:04 And when the server ran out of space
01:30:06 I put a payments button on the front page
01:30:09 and said, if you want more than 10 megs of space
01:30:11 you have to pay $10 a year.
01:30:15 And.
01:30:16 So run lean, like keep your costs down.
01:30:18 Yeah, so I kept my costs down.
01:30:19 And once, you know, once I needed to spend more money
01:30:22 I asked people to spend the money for me.
01:30:25 And that, that was that.
01:30:28 Basically from then on, we were making money
01:30:30 and I was profitable from then.
01:30:35 For Optimal Decisions, it was a bit harder
01:30:37 because we were trying to sell something
01:30:40 that was more like a $1 million sale.
01:30:42 But what we did was we would sell scoping projects.
01:30:46 So kind of like prototypy projects
01:30:50 but rather than doing it for free
01:30:51 we would sell them 50 to $100,000.
01:30:54 So again, we were covering our costs
01:30:56 and also making the client feel
01:30:58 like we were doing something valuable.
01:31:00 So in both cases, we were profitable from six months in.
01:31:06 Ah, nevertheless, it’s scary.
01:31:08 I mean, yeah, sure.
01:31:10 I mean, it’s, it’s scary before you jump in
01:31:13 and I just, I guess I was comparing it
01:31:15 to the scariness of VC.
01:31:18 I felt like with VC stuff, it was more scary.
01:31:20 Kind of much more in somebody else’s hands,
01:31:24 will they fund you or not?
01:31:26 And what do they think of what you’re doing?
01:31:27 I also found it very difficult with VC
01:31:29 backed startups to actually do the thing
01:31:32 which I thought was important for the company
01:31:34 rather than doing the thing
01:31:35 which I thought would make the VC happy.
01:31:38 And VCs always tell you not to do the thing
01:31:40 that makes them happy.
01:31:42 But then if you don’t do the thing that makes them happy
01:31:44 they get sad, so.
01:31:46 And do you think optimizing for the,
01:31:48 whatever they call it, the exit is a good thing
01:31:51 to optimize for?
01:31:53 I mean, it can be, but not at the VC level
01:31:54 because the VC exit needs to be, you know, a thousand X.
01:31:59 Whereas the lifestyle exit,
01:32:03 if you can sell something for $10 million,
01:32:05 then you’ve made it, right?
01:32:06 So I don’t, it depends.
01:32:09 If you want to build something that’s gonna,
01:32:11 you’re kind of happy to do forever, then fine.
01:32:13 If you want to build something you want to sell
01:32:16 in three years time, that’s fine too.
01:32:18 I mean, they’re both perfectly good outcomes.
01:32:21 So you’re learning Swift now, in a way.
01:32:24 I mean, you’ve already.
01:32:25 I’m trying to.
01:32:26 And I read that you use, at least in some cases,
01:32:31 spaced repetition as a mechanism for learning new things.
01:32:34 I use Anki quite a lot myself.
01:32:36 Me too.
01:32:38 I actually never talk to anybody about it.
01:32:41 Don’t know how many people do it,
01:32:44 but it works incredibly well for me.
01:32:46 Can you talk to your experience?
01:32:47 Like how did you, what do you?
01:32:51 First of all, okay, let’s back it up.
01:32:53 What is spaced repetition?
01:32:55 So spaced repetition is an idea created
01:33:00 by a psychologist named Ebbinghaus.
01:33:04 I don’t know, must be a couple of hundred years ago
01:33:06 or something, 150 years ago.
01:33:08 He did something which sounds pretty damn tedious.
01:33:10 He wrote down random sequences of letters on cards
01:33:15 and tested how well he would remember
01:33:18 those random sequences a day later, a week later, whatever.
01:33:23 He discovered that there was this kind of a curve
01:33:26 where his probability of remembering one of them
01:33:28 would be dramatically smaller the next day
01:33:30 and then a little bit smaller the next day
01:33:31 and a little bit smaller the next day.
01:33:33 What he discovered is that if he revised those cards
01:33:36 after a day, the probabilities would decrease
01:33:41 at a smaller rate.
01:33:42 And then if you revise them again a week later,
01:33:44 they would decrease at a smaller rate again.
01:33:47 And so he basically figured out a roughly optimal equation
01:33:51 for when you should revise something you wanna remember.
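The curve Jeremy is describing is commonly modeled as exponential decay. This is the standard textbook formulation of the Ebbinghaus forgetting curve, with conventional symbols, not something stated in the conversation:

```latex
% Retention R after time t, for a memory of stability S:
R(t) = e^{-t/S}
% Each successful, well-timed review increases S, so R decays
% more slowly afterwards -- which is why the optimal revision
% intervals grow: a day, then a few days, a week, and so on.
```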
01:33:56 So spaced repetition learning is using this simple algorithm,
01:34:00 just something like revise something after a day
01:34:03 and then three days and then a week and then three weeks
01:34:06 and so forth.
01:34:07 And so if you use a program like Anki, as you know,
01:34:10 it will just do that for you.
01:34:12 And it will say, did you remember this?
01:34:14 And if you say no, it will reschedule it back
01:34:17 to appear again like 10 times faster
01:34:20 than it otherwise would have.
01:34:23 It’s a kind of a way of being guaranteed to learn something
01:34:27 because by definition, if you’re not learning it,
01:34:30 it will be rescheduled to be revised more quickly.
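The rescheduling behavior described here can be sketched as a toy scheduler. This is a deliberate simplification, not Anki's actual SM-2 algorithm; the multipliers (3x growth on success, roughly 10x sooner on a lapse) are illustrative assumptions:

```python
# Toy spaced-repetition scheduler, loosely in the spirit of Anki.
# The interval multipliers are illustrative, not Anki's real SM-2 values.

def next_interval(current_days: float, remembered: bool) -> float:
    """Return the number of days until the next review of a card."""
    if remembered:
        # Each successful review stretches the interval, e.g. 1 -> 3 -> 9 days.
        return current_days * 3
    # A lapse brings the card back roughly 10x sooner, as described above.
    return max(1.0, current_days / 10)

# Simulate a card remembered four times in a row, then forgotten once.
interval = 1.0
history = []
for remembered in [True, True, True, True, False]:
    interval = next_interval(interval, remembered)
    history.append(interval)

print(history)  # [3.0, 9.0, 27.0, 81.0, 8.1]
```

The key property is the one Jeremy names: a card you keep failing never drifts away, because every failure shrinks its interval until you actually learn it.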
01:34:33 Unfortunately though, it’s also like,
01:34:36 it doesn’t let you fool yourself.
01:34:37 If you’re not learning something,
01:34:40 you know, your revisions will just pile up more and more.
01:34:44 So you have to find ways to learn things productively
01:34:48 and effectively like treat your brain well.
01:34:50 So using like mnemonics and stories and context
01:34:54 and stuff like that.
01:34:57 So yeah, it’s a super great technique.
01:34:59 It’s like learning how to learn is something
01:35:01 which everybody should learn
01:35:03 before they actually learn anything.
01:35:05 But almost nobody does.
01:35:07 So what have you, so it certainly works well
01:35:10 for learning new languages for, I mean,
01:35:13 for learning like small projects almost.
01:35:16 But do you, you know, I started using it for,
01:35:19 I forget who wrote a blog post about this inspired me.
01:35:22 It might’ve been you, I’m not sure.
01:35:26 I started, when I read papers,
01:35:28 taking concepts and ideas and putting them in.
01:35:31 Was it Michael Nielsen?
01:35:32 It was Michael Nielsen.
01:35:33 So Michael started doing this recently
01:35:36 and has been writing about it.
01:35:41 So the kind of today’s Ebbinghaus
01:35:43 is a guy called Piotr Wozniak
01:35:45 who developed a system called SuperMemo.
01:35:47 And he’s been basically trying to become like
01:35:51 the world’s greatest Renaissance man
01:35:54 over the last few decades.
01:35:55 He’s basically lived his life
01:35:57 with spaced repetition learning for everything.
01:36:03 And sort of like,
01:36:05 Michael’s only very recently got into this,
01:36:07 but he started really getting excited
01:36:08 about doing it for a lot of different things.
01:36:11 For me personally, I actually don’t use it
01:36:14 for anything except Chinese.
01:36:16 And the reason for that is that
01:36:20 Chinese is specifically a thing I made a conscious decision
01:36:23 that I want to continue to remember,
01:36:27 even if I don’t get much of a chance to exercise it,
01:36:30 cause like I’m not often in China, so I don’t.
01:36:33 Or else something like programming languages or papers.
01:36:38 I have a very different approach,
01:36:39 which is I try not to learn anything from them,
01:36:43 but instead I try to identify the important concepts
01:36:47 and like actually ingest them.
01:36:48 So like really understand that concept deeply
01:36:53 and study it carefully.
01:36:54 I will decide if it really is important,
01:36:56 and if it is, like, incorporate it into our library,
01:37:01 you know, incorporate it into how I do things,
01:37:04 or decide it’s not worth it, say.
01:37:07 So I find, I find I then remember the things
01:37:12 that I care about because I’m using it all the time.
01:37:15 So I’ve, for the last 25 years,
01:37:20 I’ve committed to spending at least half of every day
01:37:23 learning or practicing something new,
01:37:25 which is all my colleagues have always hated
01:37:28 because it always looks like I’m not working on
01:37:31 what I’m meant to be working on,
01:37:32 but it always means I do everything faster
01:37:34 because I’ve been practicing a lot of stuff.
01:37:36 So I kind of give myself a lot of opportunity
01:37:39 to practice new things.
01:37:41 And so I find now I don’t,
01:37:43 yeah, I don’t often kind of find myself
01:37:47 wishing I could remember something
01:37:50 because if it’s something that’s useful,
01:37:51 then I’ve been using it a lot.
01:37:53 It’s easy enough to look it up on Google,
01:37:56 but speaking Chinese, you can’t look it up on Google.
01:37:59 Do you have advice for people learning new things?
01:38:01 So if you, what have you learned as a process as a,
01:38:04 I mean, it all starts with just making the hours
01:38:07 and the day available.
01:38:08 Yeah, you got to stick with it,
01:38:10 which is again, the number one thing
01:38:12 that 99% of people don’t do.
01:38:13 So the people I started learning Chinese with,
01:38:15 none of them were still doing it 12 months later.
01:38:18 I’m still doing it 10 years later.
01:38:20 I tried to stay in touch with them,
01:38:21 but they just, no one did it.
01:38:24 For something like Chinese,
01:38:26 like study how human learning works.
01:38:28 So every one of my Chinese flashcards
01:38:31 is associated with a story.
01:38:33 And that story is specifically designed to be memorable.
01:38:36 And we find things memorable,
01:38:37 which are like funny or disgusting or sexy
01:38:41 or related to people that we know or care about.
01:38:44 So I try to make sure all of the stories
01:38:46 that are in my head have those characteristics.
01:38:51 Yeah, so you have to, you know,
01:38:52 you won’t remember things well
01:38:53 if they don’t have some context.
01:38:56 And yeah, you won’t remember them well
01:38:57 if you don’t regularly practice them,
01:39:00 whether it be just part of your day to day life
01:39:02 or, like the Chinese for me, in flashcards.
01:39:06 I mean, the other thing is,
01:39:07 to let yourself fail sometimes.
01:39:09 So like I’ve had various medical problems
01:39:11 over the last few years.
01:39:13 And basically my flashcards
01:39:16 just stopped for about three years.
01:39:18 And there’ve been other times I’ve stopped for a few months
01:39:22 and it’s so hard because you get back to it
01:39:24 and it’s like, you have 18,000 cards due.
01:39:27 It’s like, and so you just have to go, all right,
01:39:30 well, I can either stop and give up everything
01:39:34 or just decide to do this every day for the next two years
01:39:37 until I get back to it.
01:39:39 The amazing thing has been that even after three years,
01:39:41 I, you know, the Chinese were still in there.
01:39:45 Like it was so much faster to relearn
01:39:48 than it was to learn the first time.
01:39:50 Yeah, absolutely.
01:39:52 It’s in there.
01:39:53 I have the same with guitar, with music and so on.
01:39:56 It’s sad because the work sometimes takes away
01:39:59 and then you won’t play for a year.
01:40:01 But really, if you then just get back to it every day,
01:40:03 you’re right there again.
01:40:06 What do you think is the next big breakthrough
01:40:08 in artificial intelligence?
01:40:09 What are your hopes in deep learning or beyond
01:40:12 that people should be working on
01:40:14 or you hope there’ll be breakthroughs?
01:40:16 I don’t think it’s possible to predict.
01:40:17 I think what we already have
01:40:20 is an incredibly powerful platform
01:40:23 to solve lots of societally important problems
01:40:26 that are currently unsolved.
01:40:27 So I just hope that people will,
01:40:29 lots of people will learn this toolkit and try to use it.
01:40:33 I don’t think we need a lot of new technological breakthroughs
01:40:36 to do a lot of great work right now.
01:40:39 And when do you think we’re going to create
01:40:42 a human level intelligence system?
01:40:45 Do you think?
01:40:46 Don’t know.
01:40:46 How hard is it?
01:40:47 How far away are we?
01:40:48 Don’t know.
01:40:49 Don’t know.
01:40:50 I have no way to know.
01:40:51 I don’t know why people make predictions about this
01:40:53 because there’s no data and nothing to go on.
01:40:57 And it’s just like,
01:41:00 there’s so many societally important problems
01:41:03 to solve right now.
01:41:04 I just don’t find it a really interesting question
01:41:08 to even answer.
01:41:10 So in terms of societally important problems,
01:41:12 what’s the problem that is within reach?
01:41:16 Well, I mean, for example,
01:41:17 there are problems that AI creates, right?
01:41:19 So more specifically,
01:41:23 labor force displacement is going to be huge
01:41:26 and people keep making this
01:41:29 frivolous econometric argument of being like,
01:41:31 oh, there’s been other things that aren’t AI
01:41:33 that have come along before
01:41:34 and haven’t created massive labor force displacement,
01:41:37 therefore AI won’t.
01:41:39 So that’s a serious concern for you?
01:41:41 Oh yeah.
01:41:42 Andrew Yang is running on it.
01:41:43 Yeah, it’s, I’m desperately concerned.
01:41:47 And you see already that the changing workplace
01:41:53 has led to a hollowing out of the middle class.
01:41:55 You’re seeing that students coming out of school today
01:41:59 have a less rosy financial future ahead of them
01:42:03 than their parents did,
01:42:03 which has never happened in recent,
01:42:06 in the last few hundred years.
01:42:08 You know, we’ve always had progress before.
01:42:11 And you see this turning into anxiety
01:42:15 and despair and even violence.
01:42:19 So I very much worry about that.
01:42:23 You’ve written quite a bit about ethics too.
01:42:25 I do think that every data scientist
01:42:29 working with deep learning needs to recognize
01:42:33 they have an incredibly high leverage tool
01:42:35 that they’re using that can influence society
01:42:37 in lots of ways.
01:42:39 And if they’re doing research,
01:42:40 that that research is gonna be used by people
01:42:42 doing this kind of work.
01:42:44 And they have a responsibility to consider the consequences
01:42:48 and to think about things like
01:42:51 how will humans be in the loop here?
01:42:53 How do we avoid runaway feedback loops?
01:42:56 How do we ensure an appeals process for humans
01:42:59 that are impacted by my algorithm?
01:43:01 How do I ensure that the constraints of my algorithm
01:43:04 are adequately explained to the people
01:43:06 that end up using them?
01:43:09 There’s all kinds of human issues
01:43:11 which only data scientists are actually
01:43:15 in the right place to educate people about,
01:43:17 but data scientists tend to think of themselves
01:43:20 as just engineers and that they don’t need
01:43:23 to be part of that process, which is wrong.
01:43:26 Well, you’re in the perfect position to educate them better,
01:43:30 to read literature, to read history, to learn from history.
01:43:35 Well, Jeremy, thank you so much for everything you do
01:43:39 for inspiring huge amount of people,
01:43:41 getting them into deep learning
01:43:42 and having the ripple effects,
01:43:45 the flap of a butterfly’s wings
01:43:47 that will probably change the world.
01:43:48 So thank you very much.
01:43:50 Thank you, thank you, thank you, thank you.