Transcript
00:00:00 The following is a conversation with Francois Chollet.
00:00:03 He’s the creator of Keras,
00:00:05 which is an open source deep learning library
00:00:08 that is designed to enable fast, user friendly experimentation
00:00:11 with deep neural networks.
00:00:13 It serves as an interface to several deep learning libraries,
00:00:16 the most popular of which is TensorFlow,
00:00:19 and it was integrated into the TensorFlow main code base
00:00:22 a while ago.
00:00:24 Meaning, if you want to create, train,
00:00:27 and use neural networks,
00:00:28 probably the easiest and most popular option
00:00:31 is to use Keras inside TensorFlow.
00:00:34 Aside from creating an exceptionally useful
00:00:37 and popular library,
00:00:38 Francois is also a world class AI researcher
00:00:41 and software engineer at Google.
00:00:44 And he’s definitely an outspoken,
00:00:46 if not controversial personality in the AI world,
00:00:50 especially in the realm of ideas
00:00:52 around the future of artificial intelligence.
00:00:55 This is the Artificial Intelligence Podcast.
00:00:58 If you enjoy it, subscribe on YouTube,
00:01:01 give it five stars on iTunes,
00:01:02 support it on Patreon,
00:01:04 or simply connect with me on Twitter
00:01:06 at Lex Fridman, spelled F R I D M A N.
00:01:09 And now, here’s my conversation with Francois Chollet.
00:01:14 You’re known for not sugarcoating your opinions
00:01:17 and speaking your mind about ideas in AI,
00:01:19 especially on Twitter.
00:01:21 It’s one of my favorite Twitter accounts.
00:01:22 So what’s one of the more controversial ideas
00:01:26 you’ve expressed online and gotten some heat for?
00:01:30 How do you pick?
00:01:33 How do I pick?
00:01:33 Yeah, no, I think if you go through the trouble
00:01:36 of maintaining a Twitter account,
00:01:39 you might as well speak your mind, you know?
00:01:41 Otherwise, what’s even the point of having a Twitter account?
00:01:44 It’s like having a nice car
00:01:45 and just leaving it in the garage.
00:01:48 Yeah, so what’s one thing for which I got
00:01:50 a lot of pushback?
00:01:53 Perhaps, you know, that time I wrote something
00:01:56 about the idea of intelligence explosion,
00:02:00 and I was questioning the idea
00:02:04 and the reasoning behind this idea.
00:02:06 And I got a lot of pushback on that.
00:02:09 I got a lot of flak for it.
00:02:11 So yeah, so intelligence explosion,
00:02:13 I’m sure you’re familiar with the idea,
00:02:14 but it’s the idea that if you were to build
00:02:18 general AI problem solving algorithms,
00:02:22 well, the problem of building such an AI,
00:02:27 that itself is a problem that could be solved by your AI,
00:02:30 and maybe it could be solved better
00:02:31 than what humans can do.
00:02:33 So your AI could start tweaking its own algorithm,
00:02:36 could start making a better version of itself,
00:02:39 and so on iteratively in a recursive fashion.
00:02:43 And so you would end up with an AI
00:02:47 with exponentially increasing intelligence.
00:02:50 That’s right.
00:02:50 And I was basically questioning this idea,
00:02:55 first of all, because the notion of intelligence explosion
00:02:59 uses an implicit definition of intelligence
00:03:02 that doesn’t sound quite right to me.
00:03:05 It considers intelligence as a property of a brain
00:03:11 that you can consider in isolation,
00:03:13 like the height of a building, for instance.
00:03:16 But that’s not really what intelligence is.
00:03:19 Intelligence emerges from the interaction
00:03:22 between a brain, a body,
00:03:25 like embodied intelligence, and an environment.
00:03:28 And if you’re missing one of these pieces,
00:03:30 then you cannot really define intelligence anymore.
00:03:33 So just tweaking a brain to make it smarter and smarter
00:03:36 doesn’t actually make any sense to me.
00:03:39 So first of all,
00:03:39 you’re crushing the dreams of many people, right?
00:03:43 So there’s a, let’s look at like Sam Harris.
00:03:46 Actually, a lot of physicists, Max Tegmark,
00:03:48 people who think the universe
00:03:52 is an information processing system,
00:03:54 our brain is kind of an information processing system.
00:03:57 So what’s the theoretical limit?
00:03:59 Like, it doesn’t make sense that there should be some,
00:04:04 it seems naive to think that our own brain
00:04:07 is somehow the limit of the capabilities
00:04:10 of this information system.
00:04:11 I’m playing devil’s advocate here.
00:04:13 This information processing system.
00:04:15 And then if you just scale it,
00:04:17 if you’re able to build something
00:04:19 that’s on par with the brain,
00:04:20 you just, the process that builds it just continues
00:04:24 and it’ll improve exponentially.
00:04:26 So that’s the logic that’s used actually
00:04:30 by almost everybody
00:04:32 that is worried about super human intelligence.
00:04:36 So you’re trying to make,
00:04:39 so most people who are skeptical of that
00:04:40 are kind of like, this doesn’t,
00:04:43 their thought process, this doesn’t feel right.
00:04:46 Like that’s for me as well.
00:04:47 So I’m more like, it doesn’t,
00:04:51 the whole thing is shrouded in mystery
00:04:52 where you can’t really say anything concrete,
00:04:55 but you could say this doesn’t feel right.
00:04:57 This doesn’t feel like that’s how the brain works.
00:05:00 And you’re trying to, with your blog posts
00:05:02 and now, make it a little more explicit.
00:05:05 So one idea is that the brain doesn’t exist alone.
00:05:10 It exists within the environment.
00:05:13 So you can’t exponentially,
00:05:15 you would have to somehow exponentially improve
00:05:18 the environment and the brain together almost.
00:05:20 Yeah, in order to create something that’s much smarter
00:05:25 in some kind of,
00:05:27 of course we don’t have a definition of intelligence.
00:05:29 That’s correct, that’s correct.
00:05:31 I don’t think so. If you look at very smart people today,
00:05:34 even humans, not even talking about AIs.
00:05:37 I don’t think their brain
00:05:38 and the performance of their brain is the bottleneck
00:05:41 to their expressed intelligence, to their achievements.
00:05:46 You cannot just tweak one part of this system,
00:05:49 like of this brain, body, environment system
00:05:52 and expect that capabilities like what emerges
00:05:55 out of this system to just explode exponentially.
00:06:00 Because anytime you improve one part of a system
00:06:04 with many interdependencies like this,
00:06:06 there’s a new bottleneck that arises, right?
00:06:09 And I don’t think even today for very smart people,
00:06:12 their brain is not the bottleneck
00:06:15 to the sort of problems they can solve, right?
00:06:17 In fact, many very smart people today,
00:06:20 you know, they are not actually solving
00:06:22 any big scientific problems, they’re not Einstein.
00:06:24 They’re like Einstein, but, you know, in the patent clerk days.
00:06:29 Like Einstein became Einstein
00:06:31 because this was a meeting of a genius
00:06:36 with a big problem at the right time, right?
00:06:39 But maybe this meeting could have never happened
00:06:42 and then Einstein would have just been a patent clerk, right?
00:06:44 And in fact, many people today are probably like
00:06:49 genius level smart, but you wouldn’t know
00:06:52 because they’re not really expressing any of that.
00:06:54 Wow, that’s brilliant.
00:06:55 So we can think of the world, Earth,
00:06:58 but also the universe as just as a space of problems.
00:07:02 So all these problems and tasks
00:07:05 of various difficulty are roaming it.
00:07:06 And there’s agents, creatures like ourselves
00:07:10 and animals and so on that are also roaming it.
00:07:13 And then you get coupled with a problem
00:07:16 and then you solve it.
00:07:17 But without that coupling,
00:07:19 you can’t demonstrate your quote unquote intelligence.
00:07:22 Exactly, intelligence is the meeting
00:07:24 of great problem solving capabilities
00:07:27 with a great problem.
00:07:28 And if you don’t have the problem,
00:07:30 you don’t really express any intelligence.
00:07:32 All you’re left with is potential intelligence,
00:07:34 like the performance of your brain
00:07:36 or how high your IQ is,
00:07:38 which in itself is just a number, right?
00:07:42 So you mentioned problem solving capacity.
00:07:46 Yeah.
00:07:47 What do you think of as problem solving capacity?
00:07:51 Can you try to define intelligence?
00:07:56 Like what does it mean to be more or less intelligent?
00:08:00 Is it completely coupled to a particular problem
00:08:03 or is there something a little bit more universal?
00:08:05 Yeah, I do believe all intelligence
00:08:07 is specialized intelligence.
00:08:09 Even human intelligence has some degree of generality.
00:08:12 Well, all intelligent systems have some degree of generality
00:08:15 but they’re always specialized in one category of problems.
00:08:19 So the human intelligence is specialized
00:08:21 in the human experience.
00:08:23 And that shows at various levels,
00:08:25 that shows in some prior knowledge that’s innate
00:08:30 that we have at birth.
00:08:32 Knowledge about things like agents,
00:08:35 goal driven behavior, visual priors
00:08:38 about what makes an object, priors about time and so on.
00:08:43 That shows also in the way we learn.
00:08:45 For instance, it’s very, very easy for us
00:08:47 to pick up language.
00:08:49 It’s very, very easy for us to learn certain things
00:08:52 because we are basically hard coded to learn them.
00:08:54 And we are specialized in solving certain kinds of problem
00:08:58 and we are quite useless
00:08:59 when it comes to other kinds of problems.
00:09:01 For instance, we are not really designed
00:09:06 to handle very long term problems.
00:09:08 We have no capability of seeing the very long term.
00:09:12 We don’t have very much working memory.
00:09:18 So how do you think about long term?
00:09:20 Do you think long term planning,
00:09:21 are we talking about scale of years, millennia?
00:09:24 What do you mean by long term?
00:09:26 We’re not very good.
00:09:28 Well, human intelligence is specialized
00:09:29 in the human experience.
00:09:30 And human experience is very short.
00:09:32 One lifetime is short.
00:09:34 Even within one lifetime,
00:09:35 we have a very hard time envisioning things
00:09:40 on a scale of years.
00:09:41 It’s very difficult to project yourself
00:09:43 at a scale of five years, at a scale of 10 years and so on.
00:09:46 We can solve only fairly narrowly scoped problems.
00:09:50 So when it comes to solving bigger problems,
00:09:52 larger scale problems,
00:09:53 we are not actually doing it on an individual level.
00:09:56 So it’s not actually our brain doing it.
00:09:59 We have this thing called civilization, right?
00:10:03 Which is itself a sort of problem solving system,
00:10:06 a sort of artificially intelligent system, right?
00:10:10 And it’s not running on one brain,
00:10:12 it’s running on a network of brains.
00:10:14 In fact, it’s running on much more
00:10:15 than a network of brains.
00:10:16 It’s running on a lot of infrastructure,
00:10:20 like books and computers and the internet
00:10:23 and human institutions and so on.
00:10:25 And that is capable of handling problems
00:10:30 on a much greater scale than any individual human.
00:10:33 If you look at computer science, for instance,
00:10:37 that’s an institution that solves problems
00:10:39 and it is superhuman, right?
00:10:42 It operates on a greater scale.
00:10:44 It can solve much bigger problems
00:10:46 than an individual human could.
00:10:49 And science itself, science as a system, as an institution,
00:10:52 is a kind of artificially intelligent problem solving
00:10:57 algorithm that is superhuman.
00:10:59 Yeah, it’s, at least computer science
00:11:02 is like a theorem prover at a scale of thousands,
00:11:07 maybe hundreds of thousands of human beings.
00:11:10 At that scale, what do you think is an intelligent agent?
00:11:14 So there’s us humans at the individual level,
00:11:18 there is millions, maybe billions of bacteria in our skin.
00:11:23 There is, that’s at the smaller scale.
00:11:26 You can even go to the particle level
00:11:29 as systems that behave,
00:11:31 you can say intelligently in some ways.
00:11:35 And then you can look at the earth as a single organism,
00:11:37 you can look at our galaxy
00:11:39 and even the universe as a single organism.
00:11:42 Do you think, how do you think about scale
00:11:44 in defining intelligent systems?
00:11:46 And we’re here at Google, there is millions of devices
00:11:50 doing computation just in a distributed way.
00:11:53 How do you think about intelligence versus scale?
00:11:55 You can always characterize anything as a system.
00:12:00 I think people who talk about things
00:12:03 like intelligence explosion,
00:12:05 tend to focus on one agent, which is basically one brain,
00:12:08 like one brain considered in isolation,
00:12:10 like a brain in a jar that’s controlling a body
00:12:13 in a very like top to bottom kind of fashion.
00:12:16 And that body is pursuing goals into an environment.
00:12:19 So it’s a very hierarchical view.
00:12:20 You have the brain at the top of the pyramid,
00:12:22 then you have the body just plainly receiving orders.
00:12:25 And then the body is manipulating objects
00:12:27 in the environment and so on.
00:12:28 So everything is subordinate to this one thing,
00:12:32 this epicenter, which is the brain.
00:12:34 But in real life, intelligent agents
00:12:37 don’t really work like this, right?
00:12:39 There is no strong delimitation
00:12:40 between the brain and the body to start with.
00:12:43 You have to look not just at the brain,
00:12:45 but at the nervous system.
00:12:46 But then the nervous system and the body
00:12:48 are not really two separate entities.
00:12:50 So you have to look at an entire animal as one agent.
00:12:53 But then you start realizing as you observe an animal
00:12:57 over any length of time,
00:13:00 that a lot of the intelligence of an animal
00:13:03 is actually externalized.
00:13:04 That’s especially true for humans.
00:13:06 A lot of our intelligence is externalized.
00:13:08 When you write down some notes,
00:13:10 that is externalized intelligence.
00:13:11 When you write a computer program,
00:13:14 you are externalizing cognition.
00:13:16 It’s externalized in books, it’s externalized in computers,
00:13:19 the internet, in other humans.
00:13:23 It’s externalized in language and so on.
00:13:25 So there is no hard delimitation
00:13:30 of what makes an intelligent agent.
00:13:32 It’s all about context.
00:13:34 Okay, but AlphaGo is better at Go
00:13:38 than the best human player.
00:13:42 There’s levels of skill here.
00:13:45 So do you think there’s such a ability,
00:13:48 such a concept as intelligence explosion
00:13:52 in a specific task?
00:13:54 And then, well, yeah.
00:13:57 Do you think it’s possible to have a category of tasks
00:14:00 on which you do have something
00:14:02 like an exponential growth of ability
00:14:05 to solve that particular problem?
00:14:07 I think if you consider a specific vertical,
00:14:10 it’s probably possible to some extent.
00:14:15 I also don’t think we have to speculate about it
00:14:18 because we have real world examples
00:14:22 of recursively self improving intelligent systems, right?
00:14:26 So for instance, science is a problem solving system,
00:14:30 a knowledge generation system,
00:14:32 like a system that experiences the world in some sense
00:14:36 and then gradually understands it and can act on it.
00:14:40 And that system is superhuman
00:14:42 and it is clearly recursively self improving
00:14:45 because science feeds into technology.
00:14:47 Technology can be used to build better tools,
00:14:50 better computers, better instrumentation and so on,
00:14:52 which in turn can make science faster, right?
00:14:56 So science is probably the closest thing we have today
00:15:00 to a recursively self improving superhuman AI.
00:15:04 And you can just observe: is science,
00:15:08 is scientific progress currently exploding?
00:15:10 Which is itself an interesting question.
00:15:12 You can use that as a basis to try to understand
00:15:15 what will happen with a superhuman AI
00:15:17 that has a science like behavior.
00:15:21 Let me linger on it a little bit more.
00:15:23 What is your intuition why an intelligence explosion
00:15:27 is not possible?
00:15:28 Like, taking the scientific,
00:15:30 all these scientific revolutions,
00:15:33 why can’t we slightly accelerate that process?
00:15:38 So you can absolutely accelerate
00:15:41 any problem solving process.
00:15:43 So recursive self improvement
00:15:46 is absolutely a real thing.
00:15:48 But what happens with a recursively self improving system
00:15:51 is typically not explosion
00:15:53 because no system exists in isolation.
00:15:56 And so tweaking one part of the system
00:15:58 means that suddenly another part of the system
00:16:00 becomes a bottleneck.
00:16:02 And if you look at science, for instance,
00:16:03 which is clearly a recursively self improving,
00:16:06 clearly a problem solving system,
00:16:09 scientific progress is not actually exploding.
00:16:12 If you look at science,
00:16:13 what you see is the picture of a system
00:16:16 that is consuming an exponentially increasing
00:16:19 amount of resources,
00:16:20 but it’s having a linear output
00:16:23 in terms of scientific progress.
00:16:26 And maybe that will seem like a very strong claim.
00:16:28 Many people are actually saying that,
00:16:31 scientific progress is exponential,
00:16:34 but when they’re claiming this,
00:16:36 they’re actually looking at indicators
00:16:38 of resource consumption by science.
00:16:43 For instance, the number of papers being published,
00:16:47 the number of patents being filed and so on,
00:16:49 which are just completely correlated
00:16:53 with how many people are working on science today.
00:16:58 So it’s actually an indicator of resource consumption,
00:17:00 but what you should look at is the output,
00:17:03 is progress in terms of the knowledge
00:17:06 that science generates,
00:17:08 in terms of the scope and significance
00:17:10 of the problems that we solve.
00:17:12 And some people have actually been trying to measure that.
00:17:16 Like Michael Nielsen, for instance,
00:17:20 he had a very nice paper,
00:17:21 I think that was last year about it.
00:17:25 So his approach to measure scientific progress
00:17:28 was to look at the timeline of scientific discoveries
00:17:33 over the past, you know, 100, 150 years.
00:17:37 And for each major discovery,
00:17:41 ask a panel of experts to rate
00:17:44 the significance of the discovery.
00:17:46 And if the output of science as an institution
00:17:49 were exponential,
00:17:50 you would expect the temporal density of significance
00:17:56 to go up exponentially.
00:17:58 Maybe because there’s a faster rate of discoveries,
00:18:00 maybe because the discoveries are, you know,
00:18:02 increasingly more important.
00:18:04 And what actually happens
00:18:06 if you plot this temporal density of significance
00:18:10 measured in this way,
00:18:11 is that you see very much a flat graph.
00:18:14 You see a flat graph across all disciplines,
00:18:16 across physics, biology, medicine, and so on.
00:18:19 And it actually makes a lot of sense
00:18:22 if you think about it,
00:18:23 because think about the progress of physics
00:18:26 110 years ago, right?
00:18:28 It was a time of crazy change.
00:18:30 Think about the progress of technology,
00:18:31 you know, 170 years ago,
00:18:34 when we started having, you know,
00:18:35 replacing horses with cars,
00:18:37 when we started having electricity and so on.
00:18:40 It was a time of incredible change.
00:18:41 And today is also a time of very, very fast change,
00:18:44 but it would be an unfair characterization
00:18:48 to say that today technology and science
00:18:50 are moving way faster than they did 50 years ago
00:18:52 or 100 years ago.
00:18:54 And if you do try to rigorously plot
00:18:59 the temporal density of the significance,
00:19:04 yeah, of significance, sorry,
00:19:07 you do see very flat curves.
00:19:09 And you can check out the paper
00:19:12 that Michael Nielsen had about this idea.
00:19:16 And so the way I interpret it is,
00:19:20 as you make progress in a given field,
00:19:24 or in a given subfield of science,
00:19:26 it becomes exponentially more difficult
00:19:28 to make further progress.
00:19:30 Like the very first person to work on information theory.
00:19:35 If you enter a new field,
00:19:36 and it’s still the very early years,
00:19:37 there’s a lot of low hanging fruit you can pick.
00:19:41 That’s right, yeah.
00:19:42 But the next generation of researchers
00:19:43 is gonna have to dig much harder, actually,
00:19:48 to make smaller discoveries,
00:19:50 probably larger number of smaller discoveries,
00:19:52 and to achieve the same amount of impact,
00:19:54 you’re gonna need a much greater head count.
00:19:57 And that’s exactly the picture you’re seeing with science,
00:20:00 that the number of scientists and engineers
00:20:03 is in fact increasing exponentially.
00:20:06 The amount of computational resources
00:20:08 that are available to science
00:20:10 is increasing exponentially and so on.
00:20:11 So the resource consumption of science is exponential,
00:20:15 but the output in terms of progress,
00:20:18 in terms of significance, is linear.
00:20:21 And the reason why is because,
00:20:23 and even though science is recursively self improving,
00:20:26 meaning that scientific progress
00:20:28 turns into technological progress,
00:20:30 which in turn helps science.
00:20:32 If you look at computers, for instance,
00:20:35 they are products of science, and computers
00:20:38 are tremendously useful in speeding up science.
00:20:41 The internet, same thing, the internet is a technology
00:20:43 that’s made possible by very recent scientific advances.
00:20:47 And itself, because it enables scientists to network,
00:20:52 to communicate, to exchange papers and ideas much faster,
00:20:55 it is a way to speed up scientific progress.
00:20:57 So even though you’re looking
00:20:58 at a recursively self improving system,
00:21:01 it is consuming exponentially more resources
00:21:04 to produce the same amount of problem solving, very much.
00:21:09 So that’s a fascinating way to paint it,
00:21:11 and certainly that holds for the deep learning community.
00:21:14 If you look at the temporal, what did you call it,
00:21:18 the temporal density of significant ideas,
00:21:21 if you look at in deep learning,
00:21:24 I think, I’d have to think about that,
00:21:26 but if you really look at significant ideas
00:21:29 in deep learning, they might even be decreasing.
00:21:32 So I do believe the per paper significance is decreasing,
00:21:39 but the amount of papers
00:21:41 is still today exponentially increasing.
00:21:43 So I think if you look at an aggregate,
00:21:45 my guess is that you would see a linear progress.
00:21:48 If you were to sum the significance of all papers,
00:21:56 you would see roughly linear progress.
00:21:58 And in my opinion, it is not a coincidence
00:22:03 that you’re seeing linear progress in science
00:22:05 despite exponential resource consumption.
00:22:07 I think the resource consumption
00:22:10 is dynamically adjusting itself to maintain linear progress
00:22:15 because we as a community expect linear progress,
00:22:18 meaning that if we start investing less
00:22:21 and seeing less progress, it means that suddenly
00:22:23 there are some lower hanging fruits that become available
00:22:26 and someone’s gonna step up and pick them, right?
00:22:31 So it’s very much like a market for discoveries and ideas.
00:22:36 But there’s another fundamental part
00:22:38 which you’re highlighting, which as a hypothesis
00:22:41 as science or like the space of ideas,
00:22:45 any one path you travel down,
00:22:48 it gets exponentially more difficult
00:22:51 to develop new ideas.
00:22:54 And your sense is that’s gonna hold
00:22:57 across our mysterious universe.
00:23:01 Yes, well, exponential progress
00:23:03 triggers exponential friction.
00:23:05 So that if you tweak one part of the system,
00:23:07 suddenly some other part becomes a bottleneck, right?
00:23:10 For instance, let’s say you develop some device
00:23:14 that measures its own acceleration
00:23:17 and then it has some engine
00:23:18 and it outputs even more acceleration
00:23:20 in proportion of its own acceleration
00:23:22 and you drop it somewhere,
00:23:23 it’s not gonna reach infinite speed
00:23:25 because it exists in a certain context.
00:23:29 So the air around it is gonna generate friction
00:23:31 and it’s gonna block it at some top speed.
00:23:34 And even if you were to consider the broader context
00:23:37 and lift the bottleneck there,
00:23:39 like the bottleneck of friction,
00:23:43 then some other part of the system
00:23:45 would start stepping in and creating exponential friction,
00:23:48 maybe the speed of light or whatever.
00:23:49 And this definitely holds true
00:23:51 when you look at the problem solving algorithm
00:23:54 that is being run by science as an institution,
00:23:58 science as a system.
00:23:59 As you make more and more progress,
00:24:01 despite having this recursive self improvement component,
00:24:06 you are encountering exponential friction.
00:24:09 The more researchers you have working on different ideas,
00:24:13 the more overhead you have
00:24:14 in terms of communication across researchers.
00:24:18 If you look at, you were mentioning quantum mechanics, right?
00:24:22 Well, if you want to start making significant discoveries
00:24:26 today, significant progress in quantum mechanics,
00:24:29 there is an amount of knowledge you have to ingest,
00:24:33 which is huge.
00:24:34 So there’s a very large overhead
00:24:36 to even start to contribute.
00:24:39 There’s a large amount of overhead
00:24:40 to synchronize across researchers and so on.
00:24:44 And of course, the significant practical experiments
00:24:48 are going to require exponentially expensive equipment
00:24:52 because the easier ones have already been run, right?
00:24:56 So in your senses, there’s no way escaping,
00:25:00 there’s no way of escaping this kind of friction
00:25:04 with artificial intelligence systems.
00:25:08 Yeah, no, I think science is a very good way
00:25:11 to model what would happen with a superhuman
00:25:14 recursively self improving AI.
00:25:16 That’s your sense, I mean, the…
00:25:18 That’s my intuition.
00:25:19 It’s not like a mathematical proof of anything.
00:25:23 That’s not my point.
00:25:24 Like, I’m not trying to prove anything.
00:25:26 I’m just trying to make an argument
00:25:27 to question the narrative of intelligence explosion,
00:25:31 which is quite a dominant narrative.
00:25:32 And you do get a lot of pushback if you go against it.
00:25:35 Because, so for many people, right,
00:25:39 AI is not just a subfield of computer science.
00:25:42 It’s more like a belief system.
00:25:44 Like this belief that the world is headed towards an event,
00:25:48 the singularity, past which, you know, AI will become…
00:25:55 will go exponential very much,
00:25:57 and the world will be transformed,
00:25:58 and humans will become obsolete.
00:26:00 And if you go against this narrative,
00:26:03 because it is not really a scientific argument,
00:26:06 but more of a belief system,
00:26:08 it is part of the identity of many people.
00:26:11 If you go against this narrative,
00:26:12 it’s like you’re attacking the identity
00:26:14 of people who believe in it.
00:26:15 It’s almost like saying God doesn’t exist,
00:26:17 or something.
00:26:19 So you do get a lot of pushback
00:26:21 if you try to question these ideas.
00:26:24 First of all, I believe most people,
00:26:26 they might not be as eloquent or explicit as you’re being,
00:26:29 but most people in computer science
00:26:30 or most people who actually have built
00:26:33 anything that you could call AI, quote, unquote,
00:26:36 would agree with you.
00:26:38 They might not be describing it in the same kind of way.
00:26:40 It’s more, so the pushback you’re getting
00:26:43 is from people who get attached to the narrative
00:26:48 from, not from a place of science,
00:26:51 but from a place of imagination.
00:26:53 That’s correct, that’s correct.
00:26:54 So why do you think that’s so appealing?
00:26:56 Because the usual dreams that people have
00:27:02 when you create a superintelligence system
00:27:03 past the singularity,
00:27:05 that what people imagine is somehow always destructive.
00:27:09 Do you have, if you were put on your psychology hat,
00:27:12 what’s, why is it so appealing to imagine
00:27:17 the ways that all of human civilization will be destroyed?
00:27:20 I think it’s a good story.
00:27:22 You know, it’s a good story.
00:27:23 And very interestingly, it mirrors religious stories,
00:27:28 right, religious mythology.
00:27:30 If you look at the mythology of most civilizations,
00:27:34 it’s about the world being headed towards some final events
00:27:38 in which the world will be destroyed
00:27:40 and some new world order will arise
00:27:42 that will be mostly spiritual,
00:27:44 like the apocalypse followed by a paradise probably, right?
00:27:49 It’s a very appealing story on a fundamental level.
00:27:52 And we all need stories.
00:27:54 We all need stories to structure the way we see the world,
00:27:58 especially at timescales
00:27:59 that are beyond our ability to make predictions, right?
00:28:04 So on a more serious, non exponential explosion
00:28:08 question, do you think there will be a time
00:28:15 when we’ll create something like human level intelligence
00:28:19 or intelligent systems that will make you sit back
00:28:23 and be just surprised at damn how smart this thing is?
00:28:28 That doesn’t require exponential growth
00:28:30 or an exponential improvement,
00:28:32 but what’s your sense of the timeline and so on
00:28:35 that you’ll be really surprised at certain capabilities?
00:28:41 And we’ll talk about limitations and deep learning.
00:28:42 So do you think in your lifetime,
00:28:44 you’ll be really damn surprised?
00:28:46 Around 2013, 2014, I was many times surprised
00:28:51 by the capabilities of deep learning actually.
00:28:53 That was before we had assessed exactly
00:28:55 what deep learning could do and could not do.
00:28:57 And it felt like a time of immense potential.
00:29:00 And then we started narrowing it down,
00:29:03 but I was very surprised.
00:29:04 I would say it has already happened.
00:29:07 Was there a moment, there must’ve been a day in there
00:29:10 where your surprise was almost bordering
00:29:14 on the belief of the narrative that we just discussed.
00:29:19 Was there a moment,
00:29:20 because you’ve written quite eloquently
00:29:22 about the limits of deep learning,
00:29:23 was there a moment that you thought
00:29:25 that maybe deep learning is limitless?
00:29:30 No, I don’t think I’ve ever believed this.
00:29:32 What was really shocking is that it worked.
00:29:35 It worked at all, yeah.
00:29:37 But there’s a big jump between being able
00:29:40 to do really good computer vision
00:29:43 and human level intelligence.
00:29:44 So I don’t think at any point I was under the impression
00:29:49 that the results we got in computer vision
00:29:51 meant that we were very close to human level intelligence.
00:29:54 I don’t think we’re very close to human level intelligence.
00:29:56 I do believe that there’s no reason
00:29:58 why we won’t achieve it at some point.
00:30:01 I also believe that the problem
00:30:06 with talking about human level intelligence
00:30:08 is that implicitly you’re considering
00:30:11 an axis of intelligence with different levels,
00:30:14 but that’s not really how intelligence works.
00:30:16 Intelligence is very multi dimensional.
00:30:19 And so there’s the question of capabilities,
00:30:22 but there’s also the question of being human like,
00:30:25 and it’s two very different things.
00:30:27 Like you can build potentially
00:30:28 very advanced intelligent agents
00:30:30 that are not human like at all.
00:30:32 And you can also build very human like agents.
00:30:35 And these are two very different things, right?
00:30:37 Right.
00:30:38 Let’s go from the philosophical to the practical.
00:30:42 Can you give me a history of Keras
00:30:44 and all the major deep learning frameworks
00:30:46 that you kind of remember in relation to Keras
00:30:48 and in general, TensorFlow, Theano, the old days.
00:30:52 Can you give a brief overview Wikipedia style history
00:30:55 and your role in it before we return to AGI discussions?
00:30:59 Yeah, that’s a broad topic.
00:31:00 So I started working on Keras.
00:31:04 It was the name Keras at the time.
00:31:06 I actually picked the name like
00:31:08 just the day I was going to release it.
00:31:10 So I started working on it in February, 2015.
00:31:14 And so at the time there weren’t too many people
00:31:17 working on deep learning, maybe like fewer than 10,000.
00:31:20 The software tooling was not really developed.
00:31:25 So the main deep learning library was Caffe,
00:31:28 which was mostly C++.
00:31:30 Why do you say Caffe was the main one?
00:31:32 Caffe was vastly more popular than Theano
00:31:36 in late 2014, early 2015.
00:31:38 Caffe was the one library that everyone was using
00:31:42 for computer vision.
00:31:43 And computer vision was the most popular problem
00:31:46 in deep learning at the time.
00:31:46 Absolutely.
00:31:47 Like ConvNets was like the subfield of deep learning
00:31:50 that everyone was working on.
00:31:53 So myself, so in late 2014,
00:31:57 I was actually interested in RNNs,
00:32:00 in recurrent neural networks,
00:32:01 which was a very niche topic at the time, right?
00:32:05 It really took off around 2016.
00:32:08 And so I was looking for good tools.
00:32:11 I had used Torch 7, I had used Theano,
00:32:14 used Theano a lot in Kaggle competitions.
00:32:19 I had used Caffe.
00:32:20 And there was no like good solution for RNNs at the time.
00:32:25 Like there was no reusable open source implementation
00:32:28 of an LSTM, for instance.
00:32:30 So I decided to build my own.
00:32:32 And at first, the pitch for that was,
00:32:35 it was gonna be mostly around LSTM recurrent neural networks.
00:32:39 It was gonna be in Python.
00:32:42 An important decision at the time
00:32:44 that was kind of not obvious
00:32:45 is that the models would be defined via Python code,
00:32:50 which was kind of like going against the mainstream
00:32:54 at the time because Caffe, PyLearn2, and so on,
00:32:58 like all the big libraries were actually going
00:33:00 with the approach of setting configuration files
00:33:03 in YAML to define models.
00:33:05 So some libraries were using code to define models,
00:33:08 like Torch 7, obviously, but that was not Python.
00:33:12 Lasagne was like a Theano based very early library
00:33:16 that was, I think, developed, I don’t remember exactly,
00:33:18 probably late 2014.
00:33:20 It’s Python as well.
00:33:21 It’s Python as well.
00:33:22 It was like on top of Theano.
00:33:24 And so I started working on something
00:33:29 and the value proposition at the time was that
00:33:32 not only what I think was the first
00:33:36 reusable open source implementation of LSTM,
00:33:40 you could combine RNNs and convnets
00:33:44 with the same library,
00:33:45 which was not really possible before,
00:33:46 like Caffe was only doing convnets.
00:33:50 And it was kind of easy to use
00:33:52 because, so before I was using Theano,
00:33:54 I was actually using scikit-learn
00:33:55 and I loved scikit-learn for its usability.
00:33:58 So I drew a lot of inspiration from scikit-learn
00:34:01 when I made Keras.
00:34:02 It’s almost like scikit-learn for neural networks.
00:34:05 The fit function.
00:34:06 Exactly, the fit function,
00:34:07 like reducing a complex training loop
00:34:10 to a single function call, right?
00:34:12 And of course, some people will say,
00:34:14 this is hiding a lot of details,
00:34:16 but that’s exactly the point, right?
00:34:18 The magic is the point.
00:34:20 So it’s magical, but in a good way.
00:34:22 It’s magical in the sense that it’s delightful.
00:34:24 Yeah, yeah.
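A minimal sketch (my own illustration, not from the conversation) of the scikit-learn-style workflow being described: a model defined directly in Python code, with the whole training loop reduced to a single fit() call. It assumes the modern tf.keras API and uses toy random data.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy data standing in for a real dataset.
x_train = np.random.random((256, 20)).astype("float32")
y_train = np.random.randint(0, 2, size=(256, 1))

# The model is defined in Python code, not in a configuration file.
model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# The "fit function": the complex training loop collapsed into one call.
model.fit(x_train, y_train, epochs=5, batch_size=32)
```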
00:34:26 I’m actually quite surprised.
00:34:27 I didn’t know that it was born out of desire
00:34:29 to implement RNNs and LSTMs.
00:34:32 It was.
00:34:33 That’s fascinating.
00:34:34 So you were actually one of the first people
00:34:36 to really try to attempt
00:34:37 to get the major architectures together.
00:34:41 And it’s also interesting.
00:34:42 You made me realize that that was a design decision at all
00:34:45 is defining the model and code.
00:34:47 Just, I’m putting myself in your shoes,
00:34:49 whether to do the YAML, especially since Caffe was the most popular.
00:34:53 It was the most popular by far.
00:34:54 If I was, if I were, yeah, I don’t,
00:34:58 I didn’t like the YAML thing,
00:34:59 but it makes more sense that you will put
00:35:02 in a configuration file, the definition of a model.
00:35:05 That’s an interesting gutsy move
00:35:07 to stick with defining it in code.
00:35:10 Just if you look back.
00:35:11 Other libraries were doing it as well,
00:35:13 but it was definitely the more niche option.
00:35:16 Yeah.
00:35:17 Okay, Keras and then.
00:35:18 So I released Keras in March, 2015,
00:35:21 and it got users pretty much from the start.
00:35:24 So the deep learning community was very, very small
00:35:25 at the time.
00:35:27 Lots of people were starting to be interested in LSTM.
00:35:30 So I released it at the right time
00:35:32 because it was offering an easy to use LSTM implementation.
00:35:35 Exactly at the time where lots of people started
00:35:37 to be intrigued by the capabilities of RNN, RNNs for NLP.
00:35:42 So it grew from there.
00:35:43 Then I joined Google about six months later,
00:35:51 and that was actually completely unrelated to Keras.
00:35:54 So I actually joined a research team
00:35:57 working on image classification,
00:35:59 mostly like computer vision.
00:36:00 So I was doing computer vision research
00:36:02 at Google initially.
00:36:03 And immediately when I joined Google,
00:36:05 I was exposed to the early internal version of TensorFlow.
00:36:10 And the way it appeared to me at the time,
00:36:13 and it was definitely the way it was at the time
00:36:15 is that this was an improved version of Theano.
00:36:20 So I immediately knew I had to port Keras
00:36:24 to this new TensorFlow thing.
00:36:26 And I was actually very busy as a Noogler,
00:36:29 as a new Googler.
00:36:31 So I had no time to work on that.
00:36:34 But then in November, I think it was November, 2015,
00:36:38 TensorFlow got released.
00:36:41 And it was kind of like my wake up call
00:36:44 that, hey, I had to actually go and make it happen.
00:36:47 So in December, I ported Keras to run on top of TensorFlow,
00:36:52 but it was not exactly a port.
00:36:53 It was more like a refactoring
00:36:55 where I was abstracting away
00:36:57 all the backend functionality into one module
00:37:00 so that the same code base
00:37:02 could run on top of multiple backends.
00:37:05 So on top of TensorFlow or Theano.
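A rough illustration (my own example, using the old multi-backend keras package and its backend module as I understand it, not code from the conversation) of the refactoring being described: user code goes through a backend-agnostic module, and the same lines run on whichever backend is configured.

```python
from keras import backend as K  # multi-backend Keras era; backend chosen via keras.json

# Backend-agnostic tensor ops: the same code runs on TensorFlow or Theano.
x = K.constant([[1.0, 2.0], [3.0, 4.0]])
y = K.relu(x - 2.0)

print(K.backend())  # e.g. "tensorflow" or "theano"
print(K.eval(y))    # evaluates the result on whichever backend is active
```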
00:37:07 And for the next year,
00:37:09 Theano stayed as the default option.
00:37:15 It was easier to use, somewhat less buggy.
00:37:20 It was much faster, especially when it came to RNNs.
00:37:23 But eventually, TensorFlow overtook it.
00:37:27 And TensorFlow, the early TensorFlow,
00:37:30 has similar architectural decisions as Theano, right?
00:37:33 So it was a natural transition.
00:37:37 Yeah, absolutely.
00:37:38 So what, I mean, that still Keras is a side,
00:37:42 almost fun project, right?
00:37:45 Yeah, so it was not my job assignment.
00:37:49 It was not.
00:37:50 I was doing it on the side.
00:37:52 And even though it grew to have a lot of users
00:37:55 for a deep learning library at the time, like throughout 2016,
00:37:59 but I wasn’t doing it as my main job.
00:38:02 So things started changing in,
00:38:04 I think it must have been maybe October, 2016.
00:38:10 So one year later.
00:38:12 So Rajat, who was the lead on TensorFlow,
00:38:15 basically showed up one day in our building
00:38:19 where I was doing like,
00:38:20 so I was doing research and things like,
00:38:21 so I did a lot of computer vision research,
00:38:24 also collaborations with Christian Szegedy
00:38:27 on deep learning for theorem proving.
00:38:29 It was a really interesting research topic.
00:38:34 And so Rajat was saying,
00:38:37 hey, we saw Keras, we like it.
00:38:41 We saw that you’re at Google.
00:38:42 Why don’t you come over for like a quarter
00:38:45 and work with us?
00:38:47 And I was like, yeah, that sounds like a great opportunity.
00:38:49 Let’s do it.
00:38:50 And so I started working on integrating the Keras API
00:38:55 into TensorFlow more tightly.
00:38:57 So what followed up is a sort of like temporary
00:39:02 TensorFlow only version of Keras
00:39:05 that was in tf.contrib for a while.
00:39:09 And finally moved to TensorFlow Core.
00:39:12 And I’ve never actually gotten back
00:39:15 to my old team doing research.
00:39:17 Well, it’s kind of funny that somebody like you
00:39:22 who dreams of, or at least sees the power of AI systems
00:39:28 that reason and theorem proving we’ll talk about
00:39:31 has also created a system that makes the most basic
00:39:36 kind of Lego building that is deep learning
00:39:40 super accessible, super easy.
00:39:42 So beautifully so.
00:39:43 It’s a funny irony that you’re both,
00:39:47 you’re responsible for both things,
00:39:49 but so TensorFlow 2.0 is kind of, there’s a sprint.
00:39:54 I don’t know how long it’ll take,
00:39:55 but there’s a sprint towards the finish.
00:39:56 What do you look, what are you working on these days?
00:40:01 What are you excited about?
00:40:02 What are you excited about in 2.0?
00:40:04 I mean, eager execution.
00:40:05 There’s so many things that just make it a lot easier
00:40:08 to work.
00:40:09 What are you excited about and what’s also really hard?
00:40:13 What are the problems you have to kind of solve?
00:40:15 So I’ve spent the past year and a half working on
00:40:19 TensorFlow 2.0 and it’s been a long journey.
00:40:22 I’m actually extremely excited about it.
00:40:25 I think it’s a great product.
00:40:26 It’s a delightful product compared to TensorFlow 1.0.
00:40:29 We’ve made huge progress.
00:40:32 So on the Keras side, what I’m really excited about is that,
00:40:37 so previously Keras has been this very easy to use
00:40:42 high level interface to do deep learning.
00:40:45 But if you wanted to,
00:40:50 if you wanted a lot of flexibility,
00:40:53 the Keras framework was probably not the optimal way
00:40:57 to do things compared to just writing everything
00:40:59 from scratch.
00:41:01 So in some way, the framework was getting in the way.
00:41:04 And in TensorFlow 2.0, you don’t have this at all, actually.
00:41:07 You have the usability of the high level interface,
00:41:11 but you have the flexibility of this lower level interface.
00:41:14 And you have this spectrum of workflows
00:41:16 where you can get more or less usability
00:41:21 and flexibility trade offs depending on your needs, right?
00:41:26 You can write everything from scratch
00:41:29 and you get a lot of help doing so
00:41:32 by subclassing models and writing your own training loops
00:41:36 using eager execution.
00:41:38 It’s very flexible, it’s very easy to debug,
00:41:40 it’s very powerful.
00:41:42 But all of this integrates seamlessly
00:41:45 with higher level features up to the classic Keras workflows,
00:41:49 which are very scikit learn like
00:41:51 and are ideal for a data scientist,
00:41:56 machine learning engineer type of profile.
00:41:58 So now you can have the same framework
00:42:00 offering the same set of APIs
00:42:02 that enable a spectrum of workflows
00:42:05 that are more or less low level, more or less high level
00:42:08 that are suitable for profiles ranging from researchers
00:42:13 to data scientists and everything in between.
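A hedged sketch of the lower-level end of the TensorFlow 2.0 spectrum being described, as I understand it: subclass keras.Model and write your own training step with eager execution and GradientTape. The class and variable names are illustrative, not from the conversation.

```python
import tensorflow as tf

class TinyClassifier(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(64, activation="relu")
        self.out = tf.keras.layers.Dense(10)

    def call(self, x):
        return self.out(self.hidden(x))

model = TinyClassifier()
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def train_step(x, y):
    # Eager execution: ordinary Python control flow, easy to debug.
    with tf.GradientTape() as tape:
        logits = model(x)
        loss = loss_fn(y, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# The same model still plugs into the high-level, scikit-learn-like workflow:
# model.compile(optimizer=optimizer, loss=loss_fn); model.fit(x_train, y_train)
```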
00:42:15 Yeah, so that’s super exciting.
00:42:16 I mean, it’s not just that,
00:42:18 it’s connected to all kinds of tooling.
00:42:21 You can go on mobile, you can go with TensorFlow Lite,
00:42:24 you can go in the cloud or serving and so on.
00:42:27 It all is connected together.
00:42:28 Now some of the best software ever written
00:42:31 is often done by one person, sometimes two.
00:42:36 So within Google, you’re now seeing sort of Keras
00:42:40 having to be integrated into TensorFlow,
00:42:42 which I’m sure has a ton of engineers working on it.
00:42:46 And there’s, I’m sure a lot of tricky design decisions
00:42:51 to be made.
00:42:52 How does that process usually happen
00:42:54 from at least your perspective?
00:42:56 What are the debates like?
00:43:00 Is there a lot of thinking,
00:43:04 considering different options and so on?
00:43:06 Yes.
00:43:08 So a lot of the time I spend at Google
00:43:12 is actually on design discussions, right?
00:43:17 Writing design docs, participating in design review meetings
00:43:20 and so on.
00:43:22 This is as important as actually writing the code.
00:43:25 Right.
00:43:26 So there’s a lot of thought, there’s a lot of thought
00:43:28 and a lot of care that is taken
00:43:32 in coming up with these decisions
00:43:34 and taking into account all of our users
00:43:37 because TensorFlow has this extremely diverse user base,
00:43:40 right?
00:43:41 It’s not like just one user segment
00:43:43 where everyone has the same needs.
00:43:45 We have small scale production users,
00:43:47 large scale production users.
00:43:49 We have startups, we have researchers,
00:43:53 you know, it’s all over the place.
00:43:55 And we have to cater to all of their needs.
00:43:57 If I just look at the standard debates
00:44:00 of C++ or Python, there’s some heated debates.
00:44:04 Do you have those at Google?
00:44:06 I mean, they’re not heated in terms of emotionally,
00:44:08 but there’s probably multiple ways to do it, right?
00:44:10 So how do you arrive through those design meetings
00:44:14 at the best way to do it?
00:44:15 Especially in deep learning where the field is evolving
00:44:19 as you’re doing it.
00:44:21 Is there some magic to it?
00:44:23 Is there some magic to the process?
00:44:26 I don’t know if there’s magic to the process,
00:44:28 but there definitely is a process.
00:44:30 So making design decisions
00:44:33 is about satisfying a set of constraints,
00:44:36 but also trying to do so in the simplest way possible,
00:44:39 because this is what can be maintained,
00:44:42 this is what can be expanded in the future.
00:44:44 So you don’t want to naively satisfy the constraints
00:44:49 by just, you know, for each capability you need available,
00:44:51 you’re gonna come up with one argument in your API
00:44:53 and so on.
00:44:54 You want to design APIs that are modular and hierarchical
00:45:00 so that they have an API surface
00:45:04 that is as small as possible, right?
00:45:07 And you want this modular hierarchical architecture
00:45:11 to reflect the way that domain experts
00:45:14 think about the problem.
00:45:16 Because as a domain expert,
00:45:17 when you are reading about a new API,
00:45:19 you’re reading a tutorial or some docs pages,
00:45:24 you already have a way that you’re thinking about the problem.
00:45:28 You already have like certain concepts in mind
00:45:32 and you’re thinking about how they relate together.
00:45:35 And when you’re reading docs,
00:45:37 you’re trying to build as quickly as possible
00:45:40 a mapping between the concepts featured in your API
00:45:45 and the concepts in your mind.
00:45:46 So you’re trying to map your mental model
00:45:48 as a domain expert to the way things work in the API.
00:45:53 So you need an API and an underlying implementation
00:45:57 that are reflecting the way people think about these things.
00:46:00 So in minimizing the time it takes to do the mapping.
00:46:02 Yes, minimizing the time,
00:46:04 the cognitive load there is
00:46:06 in ingesting this new knowledge about your API.
00:46:10 An API should not be self referential
00:46:13 or referring to implementation details.
00:46:15 It should only be referring to domain specific concepts
00:46:19 that people already understand.
00:46:23 Brilliant.
00:46:24 So what’s the future of Keras and TensorFlow look like?
00:46:27 What does TensorFlow 3.0 look like?
00:46:30 So that’s kind of too far in the future for me to answer,
00:46:33 especially since I’m not even the one making these decisions.
00:46:37 Okay.
00:46:39 But so from my perspective,
00:46:41 which is just one perspective
00:46:43 among many different perspectives on the TensorFlow team,
00:46:47 I’m really excited by developing even higher level APIs,
00:46:52 higher level than Keras.
00:46:53 I’m really excited by hyperparameter tuning,
00:46:56 by automated machine learning, AutoML.
00:47:01 I think the future is not just, you know,
00:47:03 defining a model like you were assembling Lego blocks
00:47:07 and then calling fit on it.
00:47:09 It’s more like an automagical model
00:47:13 that would just look at your data
00:47:16 and optimize the objective you’re after, right?
00:47:19 So that’s what I’m looking into.
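As a hedged illustration of the hyperparameter tuning direction mentioned here (my own example, using the later keras-tuner library, which is not named in the conversation): you describe a search space instead of a fixed model, and the tuner searches it against your objective.

```python
import keras_tuner as kt
from tensorflow import keras

def build_model(hp):
    # The "model" is a search space: units and learning rate are left to the tuner.
    model = keras.Sequential([
        keras.layers.Dense(
            units=hp.Int("units", min_value=32, max_value=256, step=32),
            activation="relu", input_shape=(20,)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(hp.Choice("lr", [1e-2, 1e-3, 1e-4])),
        loss="binary_crossentropy", metrics=["accuracy"])
    return model

tuner = kt.RandomSearch(build_model, objective="val_accuracy", max_trials=10)
# tuner.search(x_train, y_train, validation_split=0.2, epochs=5)
# best_model = tuner.get_best_models(1)[0]
```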
00:47:23 Yeah, so you put the baby into a room with the problem
00:47:26 and come back a few hours later
00:47:28 with a fully solved problem.
00:47:30 Exactly, it’s not like a box of Legos.
00:47:33 It’s more like the combination of a kid
00:47:35 that’s really good at Legos and a box of Legos.
00:47:38 It’s just building the thing on its own.
00:47:41 Very nice.
00:47:42 So that’s an exciting future.
00:47:44 I think there’s a huge amount of applications
00:47:46 and revolutions to be had
00:47:49 under the constraints of the discussion we previously had.
00:47:52 But what do you think of the current limits of deep learning?
00:47:57 If we look specifically at these function approximators
00:48:03 that try to generalize from data.
00:48:06 You’ve talked about local versus extreme generalization.
00:48:11 You mentioned that neural networks don’t generalize well
00:48:13 and humans do.
00:48:14 So there’s this gap.
00:48:17 And you’ve also mentioned that extreme generalization
00:48:20 requires something like reasoning to fill those gaps.
00:48:23 So how can we start trying to build systems like that?
00:48:27 Right, yeah, so this is by design, right?
00:48:30 Deep learning models are like huge parametric models,
00:48:37 differentiable, so continuous,
00:48:39 that go from an input space to an output space.
00:48:42 And they’re trained with gradient descent.
00:48:44 So they’re trained pretty much point by point.
00:48:47 They are learning a continuous geometric morphing
00:48:50 from an input vector space to an output vector space.
00:48:55 And because this is done point by point,
00:48:58 a deep neural network can only make sense
00:49:02 of points in experience space that are very close
00:49:05 to things that it has already seen in the training data.
00:49:08 At best, it can do interpolation across points.
00:49:13 But that means in order to train your network,
00:49:17 you need a dense sampling of the input cross output space,
00:49:22 almost a point by point sampling,
00:49:25 which can be very expensive if you’re dealing
00:49:27 with complex real world problems,
00:49:29 like autonomous driving, for instance, or robotics.
00:49:33 It’s doable if you’re looking at the subset
00:49:36 of the visual space.
00:49:37 But even then, it’s still fairly expensive.
00:49:38 You still need millions of examples.
00:49:40 And it’s only going to be able to make sense of things
00:49:44 that are very close to what it has seen before.
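A small runnable sketch (my own illustration, not from the conversation) of this local-generalization point: a network fit by gradient descent on a densely sampled interval interpolates well inside it, but has no basis for inputs far outside that sampling.

```python
import numpy as np
from tensorflow import keras

# Dense sampling of a narrow slice of input space: sin(x) on [-pi, pi].
x_train = np.linspace(-np.pi, np.pi, 2000).reshape(-1, 1).astype("float32")
y_train = np.sin(x_train)

model = keras.Sequential([
    keras.layers.Dense(64, activation="tanh", input_shape=(1,)),
    keras.layers.Dense(64, activation="tanh"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x_train, y_train, epochs=50, batch_size=64, verbose=0)

print(model.predict(np.array([[1.0]])))   # inside the sampled region: close to sin(1.0)
print(model.predict(np.array([[10.0]])))  # far outside it: typically nowhere near sin(10.0)
```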
00:49:46 And in contrast to that, well, of course,
00:49:49 you have human intelligence.
00:49:50 But even if you’re not looking at human intelligence,
00:49:53 you can look at very simple rules, algorithms.
00:49:56 If you have a symbolic rule,
00:49:58 it can actually apply to a very, very large set of inputs
00:50:03 because it is abstract.
00:50:04 It is not obtained by doing a point by point mapping.
00:50:10 For instance, if you try to learn a sorting algorithm
00:50:14 using a deep neural network,
00:50:15 well, you’re very much limited to learning point by point
00:50:20 what the sorted representation of this specific list is like.
00:50:24 But instead, you could have a very, very simple
00:50:29 sorting algorithm written in a few lines.
00:50:31 Maybe it’s just two nested loops.
00:50:35 And it can process any list at all because it is abstract,
00:50:41 because it is a set of rules.
00:50:42 So deep learning is really like point by point
00:50:45 geometric morphings, trained with gradient descent.
00:50:48 And meanwhile, abstract rules can generalize much better.
00:50:53 And I think the future is we need to combine the two.
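For contrast, the kind of abstract rule being described: a sorting routine written in a few lines with just two nested loops, which applies to any list at all rather than only to inputs near ones seen during training. This is one possible reading of the example (a plain selection-style sort), written as a sketch.

```python
def sort_list(xs):
    # Two nested loops: swap so the smallest remaining value lands at
    # position i. Works for any comparable list, of any length.
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[j] < xs[i]:
                xs[i], xs[j] = xs[j], xs[i]
    return xs

print(sort_list([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```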
00:50:56 So how do we, do you think, combine the two?
00:50:59 How do we combine good point by point functions
00:51:03 with programs, which is what the symbolic AI type systems?
00:51:08 At which levels the combination happen?
00:51:11 I mean, obviously we’re jumping into the realm
00:51:14 of where there’s no good answers.
00:51:16 It’s just kind of ideas and intuitions and so on.
00:51:20 Well, if you look at the really successful AI systems
00:51:23 today, I think they are already hybrid systems
00:51:26 that are combining symbolic AI with deep learning.
00:51:29 For instance, successful robotics systems
00:51:32 are already mostly model based, rule based,
00:51:37 things like planning algorithms and so on.
00:51:39 At the same time, they’re using deep learning
00:51:42 as perception modules.
00:51:43 Sometimes they’re using deep learning as a way
00:51:46 to inject fuzzy intuition into a rule based process.
00:51:50 If you look at the system like in a self driving car,
00:51:54 it’s not just one big end to end neural network.
00:51:57 You know, that wouldn’t work at all.
00:51:59 Precisely because in order to train that,
00:52:00 you would need a dense sampling of the experience space
00:52:05 when it comes to driving,
00:52:06 which is completely unrealistic, obviously.
00:52:08 Instead, the self driving car is mostly
00:52:13 symbolic, you know, it’s software, it’s programmed by hand.
00:52:18 So it’s mostly based on explicit models.
00:52:21 In this case, mostly 3D models of the environment
00:52:25 around the car, but it’s interfacing with the real world
00:52:29 using deep learning modules, right?
00:52:31 So the deep learning there serves as a way
00:52:33 to convert the raw sensory information
00:52:36 to something usable by symbolic systems.
00:52:39 Okay, well, let’s linger on that a little more.
00:52:42 So dense sampling from input to output.
00:52:45 You said it’s obviously very difficult.
00:52:48 Is it possible?
00:52:50 In the case of self driving, you mean?
00:52:51 Let’s say self driving, right?
00:52:53 Self driving for many people,
00:52:57 let’s not even talk about self driving,
00:52:59 let’s talk about steering, so staying inside the lane.
00:53:05 Lane following, yeah, it’s definitely a problem
00:53:07 you can solve with an end to end deep learning model,
00:53:08 but that’s like one small subset.
00:53:10 Hold on a second.
00:53:11 Yeah, I don’t know why you’re jumping
00:53:12 from the extreme so easily,
00:53:14 because I disagree with you on that.
00:53:16 I think, well, it’s not obvious to me
00:53:21 that you can solve lane following.
00:53:23 No, it’s not obvious, I think it’s doable.
00:53:25 I think in general, there is no hard limitations
00:53:31 to what you can learn with a deep neural network,
00:53:33 as long as the search space is rich enough,
00:53:40 is flexible enough, and as long as you have
00:53:42 this dense sampling of the input cross output space.
00:53:45 The problem is that this dense sampling
00:53:47 could mean anything from 10,000 examples
00:53:51 to like trillions and trillions.
00:53:52 So that’s my question.
00:53:54 So what’s your intuition?
00:53:56 And if you could just give it a chance
00:53:58 and think what kind of problems can be solved
00:54:01 by getting a huge amounts of data
00:54:04 and thereby creating a dense mapping.
00:54:08 So let’s think about natural language dialogue,
00:54:12 the Turing test.
00:54:14 Do you think the Turing test can be solved
00:54:17 with a neural network alone?
00:54:21 Well, the Turing test is all about tricking people
00:54:24 into believing they’re talking to a human.
00:54:26 And I don’t think that’s actually very difficult
00:54:29 because it’s more about exploiting human perception
00:54:35 and not so much about intelligence.
00:54:37 There’s a big difference between mimicking
00:54:39 intelligent behavior and actual intelligent behavior.
00:54:42 So, okay, let’s look at maybe the Alexa prize and so on.
00:54:45 The different formulations of the natural language
00:54:47 conversation that are less about mimicking
00:54:50 and more about maintaining a fun conversation
00:54:52 that lasts for 20 minutes.
00:54:54 That’s a little less about mimicking
00:54:56 and that’s more about, I mean, it’s still mimicking,
00:54:59 but it’s more about being able to carry forward
00:55:01 a conversation with all the tangents that happen
00:55:03 in dialogue and so on.
00:55:05 Do you think that problem is learnable
00:55:08 with a neural network that does the point to point mapping?
00:55:14 So I think it would be very, very challenging
00:55:16 to do this with deep learning.
00:55:17 I don’t think it’s out of the question either.
00:55:21 I wouldn’t rule it out.
00:55:23 The space of problems that can be solved
00:55:25 with a large neural network.
00:55:26 What’s your sense about the space of those problems?
00:55:30 So useful problems for us.
00:55:32 In theory, it’s infinite, right?
00:55:34 You can solve any problem.
00:55:36 In practice, well, deep learning is a great fit
00:55:39 for perception problems.
00:55:41 In general, any problem which is not naturally amenable
00:55:47 to explicit handcrafted rules or rules that you can generate
00:55:52 by exhaustive search over some program space.
00:55:56 So perception, artificial intuition,
00:55:59 as long as you have a sufficient training dataset.
00:56:03 And that’s the question, I mean, perception,
00:56:05 there’s interpretation and understanding of the scene,
00:56:08 which seems to be outside the reach
00:56:10 of current perception systems.
00:56:12 So do you think larger networks will be able
00:56:15 to start to understand the physics
00:56:18 and the physics of the scene,
00:56:21 the three dimensional structure and relationships
00:56:23 of objects in the scene and so on?
00:56:25 Or really that’s where symbolic AI has to step in?
00:56:28 Well, it’s always possible to solve these problems
00:56:34 with deep learning.
00:56:36 It’s just extremely inefficient.
00:56:38 A model, an explicit rule based abstract model,
00:56:42 would be a far better, more compressed
00:56:45 representation of physics
00:56:46 than learning just this mapping between
00:56:49 in this situation, this thing happens.
00:56:50 If you change the situation slightly,
00:56:52 then this other thing happens and so on.
00:56:54 Do you think it’s possible to automatically generate
00:56:57 the programs that would require that kind of reasoning?
00:57:02 Or does it have to, so the way the expert systems fail,
00:57:05 there’s so many facts about the world
00:57:07 had to be hand coded in.
00:57:08 Do you think it’s possible to learn those logical statements
00:57:14 that are true about the world and their relationships?
00:57:18 Do you think, I mean, that’s kind of what theorem proving
00:57:20 at a basic level is trying to do, right?
00:57:22 Yeah, except it’s much harder to formulate statements
00:57:26 about the world compared to formulating
00:57:28 mathematical statements.
00:57:30 Statements about the world tend to be subjective.
00:57:34 So can you learn rule based models?
00:57:39 Yes, definitely.
00:57:40 That’s the field of program synthesis.
00:57:43 However, today we just don’t really know how to do it.
00:57:48 So it’s very much a graph search or tree search problem.
00:57:52 And so we are limited to the sort of tree search and graph
00:57:56 search algorithms that we have today.
00:57:58 Personally, I think genetic algorithms are very promising.
00:58:02 So almost like genetic programming.
00:58:04 Genetic programming, exactly.
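A toy sketch of genetic programming as discrete search over programs, under invented assumptions: candidate programs are tiny arithmetic expression trees, fitness is error on input-output examples, and search proceeds by selection plus random subtree mutation. Nothing here reflects a real synthesis system.

```python
# Toy genetic programming: evolve expression trees to match example (x, y) pairs.
import random
import operator

OPS = [(operator.add, "+"), (operator.sub, "-"), (operator.mul, "*")]
TERMINALS = ["x", 1, 2, 3]

def random_program(depth=2):
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    op = random.choice(OPS)
    return (op, random_program(depth - 1), random_program(depth - 1))

def run(prog, x):
    if prog == "x":
        return x
    if isinstance(prog, int):
        return prog
    (fn, _), left, right = prog
    return fn(run(left, x), run(right, x))

def mutate(prog, depth=2):
    # Replace a random subtree with a fresh random one.
    if not isinstance(prog, tuple) or random.random() < 0.3:
        return random_program(depth)
    op, left, right = prog
    if random.random() < 0.5:
        return (op, mutate(left, depth - 1), right)
    return (op, left, mutate(right, depth - 1))

def fitness(prog, examples):
    return sum(abs(run(prog, x) - y) for x, y in examples)

def evolve(examples, generations=300, pop_size=50):
    population = [random_program() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda p: fitness(p, examples))
        survivors = population[: pop_size // 2]
        population = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return population[0]

# Target behaviour y = x*x + 2, given only as examples, never as a formula.
examples = [(x, x * x + 2) for x in range(-5, 6)]
best = evolve(examples)
print(fitness(best, examples))  # ideally 0 if the exact program was found
```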
00:58:05 Can you discuss the field of program synthesis?
00:58:08 Like how many people are working and thinking about it?
00:58:14 Where we are in the history of program synthesis
00:58:17 and what are your hopes for it?
00:58:20 Well, if it were deep learning, this is like the 90s.
00:58:24 So meaning that we already have existing solutions.
00:58:29 We are starting to have some basic understanding
00:58:34 of what this is about.
00:58:35 But it’s still a field that is in its infancy.
00:58:38 There are very few people working on it.
00:58:40 There are very few real world applications.
00:58:44 So the one real world application I’m aware of
00:58:47 is Flash Fill in Excel.
00:58:51 It’s a way to automatically learn very simple programs
00:58:55 to format cells in an Excel spreadsheet
00:58:58 from a few examples.
00:59:00 For instance, learning a way to format a date, things like that.
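A toy sketch of the Flash Fill idea, with a made-up miniature DSL: enumerate candidate string-transformation programs and keep the first one consistent with every example the user gave. The real feature uses a far richer language and much smarter search.

```python
# Toy program synthesis by enumeration over a tiny date-reformatting DSL.
from itertools import permutations

def make_program(order, sep):
    """A 'program' = reorder the 3 fields of a date and rejoin with a separator."""
    def prog(s):
        fields = s.replace("/", "-").split("-")
        return sep.join(fields[i] for i in order)
    return prog, (order, sep)

def synthesize(examples):
    for order in permutations(range(3)):          # all field reorderings
        for sep in ["-", "/", "."]:               # a few candidate separators
            prog, desc = make_program(order, sep)
            if all(prog(inp) == out for inp, out in examples):
                return prog, desc
    return None, None

# Two examples are enough to pin down the intended transformation here.
examples = [("2019/08/01", "01-08-2019"), ("2020/12/31", "31-12-2020")]
prog, desc = synthesize(examples)
print(desc)                # ((2, 1, 0), '-')
print(prog("2021/05/09"))  # 09-05-2021
```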
00:59:02 Oh, that’s fascinating.
00:59:03 Yeah.
00:59:04 You know, OK, that’s a fascinating topic.
00:59:06 I always wonder when I provide a few samples to Excel,
00:59:10 what it’s able to figure out.
00:59:12 Like just giving it a few dates, what
00:59:15 are you able to figure out from the pattern I just gave you?
00:59:18 That’s a fascinating question.
00:59:19 And it’s fascinating whether that’s learnable patterns.
00:59:23 And you’re saying they’re working on that.
00:59:25 How big is the toolbox currently?
00:59:28 Are we completely in the dark?
00:59:29 So if you said the 90s.
00:59:30 In terms of program synthesis?
00:59:31 No.
00:59:32 So I would say, so maybe 90s is even too optimistic.
00:59:37 Because by the 90s, we already understood back prop.
00:59:41 We already understood the engine of deep learning,
00:59:43 even though we couldn’t really see its potential quite yet.
00:59:47 Today, I don’t think we have found
00:59:48 the engine of program synthesis.
00:59:50 So we’re in the winter before back prop.
00:59:52 Yeah.
00:59:54 In a way, yes.
00:59:55 So I do believe program synthesis and general discrete search
01:00:00 over rule based models is going to be
01:00:02 a cornerstone of AI research in the next century.
01:00:06 And that doesn’t mean we are going to drop deep learning.
01:00:10 Deep learning is immensely useful.
01:00:11 Like, being able to learn a very flexible, adaptable,
01:00:17 parametric model
01:00:18 is actually immensely useful.
01:00:20 All it’s doing is pattern recognition.
01:00:23 But being good at pattern recognition, given lots of data,
01:00:25 is just extremely powerful.
01:00:27 So we are still going to be working on deep learning.
01:00:30 We are going to be working on program synthesis.
01:00:31 We are going to be combining the two in increasingly automated
01:00:34 ways.
01:00:36 So let’s talk a little bit about data.
01:00:38 You’ve tweeted, about 10,000 deep learning papers
01:00:44 have been written about how hard coding priors
01:00:47 about a specific task in a neural network architecture
01:00:49 works better than a lack of a prior.
01:00:52 Basically, summarizing all these efforts,
01:00:55 they put a name to an architecture.
01:00:56 But really, what they’re doing is hard coding some priors
01:00:59 that improve the performance of the system.
01:01:01 Which gets straight to the point, and is probably true.
01:01:06 So you say that you can always buy performance,
01:01:09 in quotes, by either training on more data,
01:01:12 better data, or by injecting task information
01:01:15 into the architecture or the preprocessing.
01:01:18 However, this isn’t informative about the generalization power
01:01:21 of the technique used, the fundamental ability
01:01:23 to generalize.
01:01:24 Do you think we can go far by coming up
01:01:26 with better methods for this kind of cheating,
01:01:29 for better methods of large scale annotation of data?
01:01:33 So building better priors.
01:01:34 If you automate it, it’s not cheating anymore.
01:01:37 Right.
01:01:38 I’m joking about the cheating, but large scale.
01:01:41 So basically, I’m asking about something
01:01:46 that hasn’t, from my perspective,
01:01:48 been researched too much is exponential improvement
01:01:53 in annotation of data.
01:01:55 Do you often think about that?
01:01:58 I think it’s actually been researched quite a bit.
01:02:00 You just don’t see publications about it.
01:02:02 Because people who publish papers
01:02:05 are going to publish about known benchmarks.
01:02:07 Sometimes they’re going to release a new benchmark.
01:02:09 People who actually have real world, large scale
01:02:12 deep learning problems, they’re going
01:02:13 to spend a lot of resources on data annotation
01:02:16 and good data annotation pipelines,
01:02:18 but you don’t see any papers about it.
01:02:19 That’s interesting.
01:02:20 So do you think, certainly resources,
01:02:22 but do you think there’s innovation happening?
01:02:24 Oh, yeah.
01:02:25 To clarify the point in the tweet.
01:02:28 So machine learning in general is
01:02:31 the science of generalization.
01:02:33 You want to generate knowledge that
01:02:37 can be reused across different data sets,
01:02:40 across different tasks.
01:02:42 And if instead you’re looking at one data set
01:02:45 and then you are hard coding knowledge about this task
01:02:50 into your architecture, this is no more useful
01:02:54 than training a network and then saying, oh, I
01:02:56 found these weight values perform well.
01:03:01 So David Ha, I don’t know if you know David,
01:03:05 he had a paper the other day about weight
01:03:08 agnostic neural networks.
01:03:10 And this is a very interesting paper
01:03:12 because it really illustrates the fact
01:03:14 that an architecture, even without weights,
01:03:17 an architecture is knowledge about a task.
01:03:21 It encodes knowledge.
01:03:23 And when it comes to architectures
01:03:25 that are handcrafted by researchers, in some cases,
01:03:30 it is very, very clear that all they are doing
01:03:34 is artificially re-encoding the template that
01:03:38 corresponds to the proper way to solve the task,
01:03:44 as encoded in a given data set.
01:03:45 For instance, I don’t know if you’ve looked
01:03:48 at the bAbI data set, which is about natural language
01:03:52 question answering, it is generated by an algorithm.
01:03:55 So these are question answer pairs
01:03:57 that are generated by an algorithm.
01:03:59 The algorithm is following a certain template.
01:04:01 Turns out, if you craft a network that
01:04:04 literally encodes this template, you
01:04:06 can solve this data set with nearly 100% accuracy.
01:04:09 But that doesn’t actually tell you
01:04:11 anything about how to solve question answering
01:04:14 in general, which is the point.
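A small illustration of the point that an architecture, even before training, encodes knowledge: a randomly initialized convolutional layer already assumes that the same local pattern means the same thing wherever it appears, a prior a plain dense layer does not carry. Shapes here are arbitrary.

```python
# Translation equivariance is baked into a Conv2D layer before any training happens.
import numpy as np
import tensorflow as tf

conv = tf.keras.layers.Conv2D(1, 3, padding="same", use_bias=False)
image = np.zeros((1, 16, 16, 1), dtype="float32")
image[0, 2, 2, 0] = 1.0                      # a "pattern" near the top-left
shifted = np.roll(image, shift=5, axis=2)    # the same pattern shifted to the right

out_a = conv(image).numpy()                  # random, untrained weights
out_b = conv(shifted).numpy()
# The responses are identical up to the same shift: the prior lives in the architecture.
print(np.allclose(np.roll(out_a, shift=5, axis=2), out_b))  # True
```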
01:04:17 The question is just to linger on it,
01:04:19 whether it’s from the data side or from the size
01:04:21 of the network.
01:04:23 I don’t know if you’ve read the blog post by Rich Sutton,
01:04:25 The Bitter Lesson, where he says,
01:04:28 the biggest lesson that we can read from 70 years of AI
01:04:31 research is that general methods that leverage computation
01:04:34 are ultimately the most effective.
01:04:37 So as opposed to figuring out methods
01:04:39 that can generalize effectively, do you
01:04:41 think we can get pretty far by just having something
01:04:47 that leverages computation and the improvement of computation?
01:04:51 Yeah, so I think Rich is making a very good point, which
01:04:54 is that a lot of these papers, which are actually
01:04:57 all about manually hardcoding prior knowledge about a task
01:05:02 into some system, it doesn’t have
01:05:04 to be deep learning architecture, but into some system.
01:05:08 These papers are not actually making any impact.
01:05:11 Instead, what’s making really long term impact
01:05:14 is very simple, very general systems
01:05:18 that are really agnostic to all these tricks.
01:05:21 Because these tricks do not generalize.
01:05:23 And of course, the one general and simple thing
01:05:27 that you should focus on is that which leverages computation.
01:05:33 Because computation, the availability
01:05:36 of large scale computation has been increasing exponentially
01:05:39 following Moore’s law.
01:05:40 So if your algorithm is all about exploiting this,
01:05:44 then your algorithm is suddenly exponentially improving.
01:05:47 So I think Rich is definitely right.
01:05:52 However, he’s right about the past 70 years.
01:05:57 He’s like assessing the past 70 years.
01:05:59 I am not sure that this assessment will still
01:06:02 hold true for the next 70 years.
01:06:04 It might to some extent.
01:06:07 I suspect it will not.
01:06:08 Because the truth of his assessment
01:06:11 is a function of the context in which this research took place.
01:06:16 And the context is changing.
01:06:18 Moore’s law might not be applicable anymore,
01:06:21 for instance, in the future.
01:06:23 And I do believe that when you tweak one aspect of a system,
01:06:31 when you exploit one aspect of a system,
01:06:32 some other aspect starts becoming the bottleneck.
01:06:36 Let’s say you have unlimited computation.
01:06:38 Well, then data is the bottleneck.
01:06:41 And I think we are already starting
01:06:43 to be in a regime where our systems are
01:06:45 so large in scale and so data hungry
01:06:48 that data today and the quality of data
01:06:50 and the scale of data is the bottleneck.
01:06:53 And in this environment, the bitter lesson from Rich
01:06:58 is not going to be true anymore.
01:07:00 So I think we are going to move from a focus
01:07:03 on a computation scale to focus on data efficiency.
01:07:09 Data efficiency.
01:07:10 So that’s getting to the question of symbolic AI.
01:07:13 But to linger on the deep learning approaches,
01:07:16 do you have hope for either unsupervised learning
01:07:19 or reinforcement learning, which are
01:07:23 ways of being more data efficient in terms
01:07:28 of the amount of data they need that required human annotation?
01:07:31 So unsupervised learning and reinforcement learning
01:07:34 are frameworks for learning, but they are not
01:07:36 like any specific technique.
01:07:39 So usually when people say reinforcement learning,
01:07:41 what they really mean is deep reinforcement learning,
01:07:43 which is like one approach which is actually very questionable.
01:07:47 The question I was asking was unsupervised learning
01:07:50 with deep neural networks and deep reinforcement learning.
01:07:54 Well, these are not really data efficient
01:07:56 because you’re still leveraging these huge parametric models
01:08:00 point by point with gradient descent.
01:08:03 It is more efficient in terms of the number of annotations,
01:08:08 the density of annotations you need.
01:08:09 So the idea being to learn the latent space around which
01:08:13 the data is organized and then map the sparse annotations
01:08:17 into it.
01:08:18 And sure, I mean, that’s clearly a very good idea.
01:08:23 It’s not really a topic I would be working on,
01:08:26 but it’s clearly a good idea.
01:08:28 So it would get us to solve some problems that?
01:08:31 It will get us to incremental improvements
01:08:34 in labeled data efficiency.
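A minimal sketch of the idea just discussed, with placeholder shapes and dataset names: learn a latent space from unlabeled data with an autoencoder, then fit a small classifier on only a handful of labeled points mapped into that space.

```python
# Sketch: unsupervised pretraining of a latent space, then sparse-label classification.
import tensorflow as tf

encoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),   # the latent space
])
decoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(784, activation="sigmoid"),
])
autoencoder = tf.keras.Model(encoder.input, decoder(encoder.output))
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_unlabeled, x_unlabeled, epochs=10)   # lots of unlabeled data

classifier_head = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
classifier_head.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# Only the sparse annotations are needed at this stage:
# classifier_head.fit(encoder.predict(x_few_labeled), y_few_labeled, epochs=10)
```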
01:08:38 Do you have concerns about short term or long term threats
01:08:43 from AI, from artificial intelligence?
01:08:47 Yes, definitely to some extent.
01:08:50 And what’s the shape of those concerns?
01:08:52 This is actually something I’ve briefly written about.
01:08:56 But the capabilities of deep learning technology
01:09:02 can be used in many ways that are
01:09:05 concerning from mass surveillance with things
01:09:09 like facial recognition.
01:09:11 In general, tracking lots of data about everyone
01:09:15 and then being able to making sense of this data
01:09:18 to do identification, to do prediction.
01:09:22 That’s concerning.
01:09:23 That’s something that’s being very aggressively pursued
01:09:26 by totalitarian states like China.
01:09:31 One thing I am very much concerned about
01:09:34 is that our lives are increasingly online,
01:09:40 are increasingly digital, made of information,
01:09:43 made of information consumption and information production,
01:09:48 our digital footprint, I would say.
01:09:51 And if you absorb all of this data
01:09:56 and you are in control of where you consume information,
01:10:01 social networks and so on, recommendation engines,
01:10:06 then you can build a sort of reinforcement
01:10:10 loop for human behavior.
01:10:13 You can observe the state of your mind at time t.
01:10:18 You can predict how you would react
01:10:21 to different pieces of content, how
01:10:23 to get you to move your mind in a certain direction.
01:10:27 And then you can feed you the specific piece of content
01:10:33 that would move you in a specific direction.
01:10:35 And you can do this at scale in terms
01:10:41 of doing it continuously in real time.
01:10:44 You can also do it at scale in terms
01:10:46 of scaling this to many, many people, to entire populations.
01:10:50 So potentially, artificial intelligence,
01:10:53 even in its current state, if you combine it
01:10:57 with the internet, with the fact that all of our lives
01:11:01 are moving to digital devices and digital information
01:11:05 consumption and creation, what you get
01:11:08 is the possibility to achieve mass manipulation of behavior
01:11:14 and mass psychological control.
01:11:16 And this is a very real possibility.
01:11:18 Yeah, so you’re talking about any kind of recommender system.
01:11:22 Let’s look at the YouTube algorithm, Facebook,
01:11:26 anything that recommends content you should watch next.
01:11:29 And it’s fascinating to think that there’s
01:11:32 some aspects of human behavior that you can pose as a problem of:
01:11:41 does this person hold Republican beliefs or Democratic beliefs?
01:11:45 And that’s, in a trivial sense, an objective function.
01:11:50 And you can optimize, and you can measure,
01:11:52 and you can turn everybody into a Republican
01:11:54 or everybody into a Democrat.
01:11:56 I do believe it’s true.
01:11:57 So the human mind is very, if you look at the human mind
01:12:03 as a kind of computer program, it
01:12:05 has a very large exploit surface.
01:12:07 It has many, many vulnerabilities.
01:12:09 Exploit surfaces, yeah.
01:12:10 Ways you can control it.
01:12:13 For instance, when it comes to your political beliefs,
01:12:16 this is very much tied to your identity.
01:12:19 So for instance, if I’m in control of your news feed
01:12:23 on your favorite social media platforms,
01:12:26 this is actually where you’re getting your news from.
01:12:29 And of course, I can choose to only show you
01:12:32 news that will make you see the world in a specific way.
01:12:37 But I can also create incentives for you
01:12:41 to post about some political beliefs.
01:12:44 And then when I get you to express a statement,
01:12:47 if it’s a statement that me as the controller,
01:12:51 I want to reinforce.
01:12:53 I can just show it to people who will agree,
01:12:55 and they will like it.
01:12:56 And that will reinforce the statement in your mind.
01:12:59 If this is a statement I want you to,
01:13:02 this is a belief I want you to abandon,
01:13:05 I can, on the other hand, show it to opponents
01:13:09 who will attack you.
01:13:10 And because they attack you, at the very least,
01:13:12 next time you will think twice about posting it.
01:13:16 But maybe you will even stop believing this
01:13:20 because you got pushback.
01:13:22 So there are many ways in which social media platforms
01:13:28 can potentially control your opinions.
01:13:30 And today, so all of these things
01:13:35 are already being controlled by AI algorithms.
01:13:38 These algorithms do not have any explicit political goal
01:13:41 today.
01:13:42 Well, potentially they could, like if some totalitarian
01:13:48 government takes over social media platforms
01:13:52 and decides that now we are going to use this not just
01:13:55 for mass surveillance, but also for mass opinion control
01:13:58 and behavior control.
01:13:59 Very bad things could happen.
01:14:01 But what’s really fascinating and actually quite concerning
01:14:06 is that even without an explicit intent to manipulate,
01:14:11 you’re already seeing very dangerous dynamics
01:14:14 in terms of how these content recommendation
01:14:18 algorithms behave.
01:14:19 Because right now, the goal, the objective function
01:14:24 of these algorithms is to maximize engagement,
01:14:28 which seems fairly innocuous at first.
01:14:32 However, it is not because content
01:14:36 that will maximally engage people, get people to react
01:14:42 in an emotional way, get people to click on something.
01:14:44 It is very often content that is not
01:14:52 healthy to the public discourse.
01:14:54 For instance, fake news are far more
01:14:58 likely to get you to click on them than real news
01:15:01 simply because they are not constrained to reality.
01:15:06 So they can be as outrageous, as surprising,
01:15:11 as good stories as you want because they’re artificial.
01:15:15 To me, that’s an exciting world because so much good
01:15:18 can come.
01:15:19 So there’s an opportunity to educate people.
01:15:24 You can balance people’s worldview with other ideas.
01:15:31 So there’s so many objective functions.
01:15:33 The space of objective functions that
01:15:35 create better civilizations is large, arguably infinite.
01:15:40 But there’s also a large space that
01:15:43 creates division and destruction, civil war,
01:15:51 a lot of bad stuff.
01:15:53 And the worry is, naturally, probably that space
01:15:56 is bigger, first of all.
01:15:59 And if we don’t explicitly think about what kind of effects
01:16:04 are going to be observed from different objective functions,
01:16:08 then we’re going to get into trouble.
01:16:10 But the question is, how do we get into rooms
01:16:14 and have discussions, so inside Google, inside Facebook,
01:16:18 inside Twitter, and think about, OK,
01:16:21 how can we drive up engagement and, at the same time,
01:16:24 create a good society?
01:16:28 Is it even possible to have that kind
01:16:29 of philosophical discussion?
01:16:31 I think you can definitely try.
01:16:33 So from my perspective, I would feel rather uncomfortable
01:16:37 with companies that are in control of these news
01:16:41 feed algorithms, with them making explicit decisions
01:16:47 to manipulate people’s opinions or behaviors,
01:16:50 even if the intent is good, because that’s
01:16:53 a very totalitarian mindset.
01:16:55 So instead, what I would like to see
01:16:57 is probably never going to happen,
01:16:58 because it’s not super realistic,
01:17:00 but that’s actually something I really care about.
01:17:02 I would like all these algorithms
01:17:06 to present configuration settings to their users,
01:17:10 so that the users can actually make the decision about how
01:17:14 they want to be impacted by these information
01:17:19 recommendation, content recommendation algorithms.
01:17:21 For instance, as a user of something
01:17:24 like YouTube or Twitter, maybe I want
01:17:26 to maximize learning about a specific topic.
01:17:30 So I want the algorithm to feed my curiosity,
01:17:36 which is in itself a very interesting problem.
01:17:38 So instead of maximizing my engagement,
01:17:41 it will maximize how fast and how much I’m learning.
01:17:44 And it will also take into account the accuracy,
01:17:47 hopefully, of the information I’m learning.
01:17:50 So yeah, the user should be able to determine exactly
01:17:55 how these algorithms are affecting their lives.
01:17:58 I don’t want actually any entity making decisions
01:18:03 about in which direction they’re going to try to manipulate me.
01:18:09 I want technology.
01:18:11 So AI, these algorithms are increasingly
01:18:14 going to be our interface to a world that is increasingly
01:18:18 made of information.
01:18:19 And I want everyone to be in control of this interface,
01:18:25 to interface with the world on their own terms.
01:18:29 So if someone wants these algorithms
01:18:32 to serve their own personal growth goals,
01:18:37 they should be able to configure these algorithms
01:18:40 in such a way.
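A toy sketch of what such a user-configurable objective could look like, with entirely hypothetical predictors and field names: the platform scores each candidate item on several axes, and the user, not the platform, chooses the weights that define what the feed optimizes.

```python
# Sketch of a recommender objective whose weights are set by the user.
from dataclasses import dataclass

@dataclass
class ItemScores:
    engagement: float   # predicted probability of a click or reaction
    learning: float     # predicted educational value for this user
    accuracy: float     # estimated factual reliability

def rank(items, weights):
    """Rank candidate items under a user-chosen objective function."""
    def objective(scores: ItemScores) -> float:
        return (weights["engagement"] * scores.engagement
                + weights["learning"] * scores.learning
                + weights["accuracy"] * scores.accuracy)
    return sorted(items, key=lambda pair: objective(pair[1]), reverse=True)

candidates = [
    ("outrage-bait thread", ItemScores(engagement=0.9, learning=0.1, accuracy=0.3)),
    ("lecture on topology", ItemScores(engagement=0.3, learning=0.9, accuracy=0.9)),
]
# "Feed my curiosity" settings rather than "maximize engagement" settings:
my_weights = {"engagement": 0.1, "learning": 0.6, "accuracy": 0.3}
print([title for title, _ in rank(candidates, my_weights)])
```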
01:18:41 Yeah, but so I know it’s painful to have explicit decisions.
01:18:46 But there are underlying explicit decisions,
01:18:51 which touch on some of the most beautiful, fundamental
01:18:53 philosophy that we have before us,
01:18:57 which is personal growth.
01:19:01 If I want to watch videos from which I can learn,
01:19:05 what does that mean?
01:19:08 So if I have a checkbox that wants to emphasize learning,
01:19:11 there’s still an algorithm with explicit decisions in it
01:19:15 that would promote learning.
01:19:17 What does that mean for me?
01:19:19 For example, I’ve watched a documentary on flat Earth
01:19:22 theory, I guess.
01:19:27 I learned a lot.
01:19:28 I’m really glad I watched it.
01:19:29 It was a friend recommended it to me.
01:19:32 Because I don’t have such an allergic reaction to crazy
01:19:35 people, as my fellow colleagues do.
01:19:37 But it was very eye opening.
01:19:40 And for others, it might not be.
01:19:42 For others, they might just get turned off from that, same
01:19:45 with Republican and Democrat.
01:19:47 And it’s a non trivial problem.
01:19:50 And first of all, if it’s done well,
01:19:52 I don’t think it’s something
01:19:56 that YouTube wouldn’t be promoting,
01:19:59 or Twitter wouldn’t be.
01:20:00 It’s just a really difficult problem,
01:20:02 how to give people control.
01:20:05 Well, it’s mostly an interface design problem.
01:20:08 The way I see it, you want to create technology
01:20:11 that’s like a mentor, or a coach, or an assistant,
01:20:16 so that it’s not your boss.
01:20:20 You are in control of it.
01:20:22 You are telling it what to do for you.
01:20:25 And if you feel like it’s manipulating you,
01:20:27 it’s not actually doing what you want.
01:20:31 You should be able to switch to a different algorithm.
01:20:34 So that’s fine-tuned control.
01:20:36 You kind of learn to trust it
01:20:38 through the human collaboration.
01:20:40 I mean, that’s how I see autonomous vehicles too,
01:20:41 is giving as much information as possible,
01:20:44 and you learn that dance yourself.
01:20:47 Yeah, Adobe, I don’t know if you use Adobe product
01:20:50 for like Photoshop.
01:20:52 They’re trying to see if they can inject YouTube
01:20:55 into their interface, but basically allow you
01:20:57 to show you all these videos,
01:20:59 that everybody’s confused about what to do with features.
01:21:03 So basically teach people by linking to,
01:21:07 in that way, it’s an assistant that uses videos
01:21:10 as a basic element of information.
01:21:13 Okay, so what practically should people do
01:21:18 to try to fight against abuses of these algorithms,
01:21:24 or algorithms that manipulate us?
01:21:27 Honestly, it’s a very, very difficult problem,
01:21:29 because to start with, there is very little public awareness
01:21:32 of these issues.
01:21:35 Very few people would think there’s anything wrong
01:21:38 with the newsfeed algorithm,
01:21:39 even though there is actually something wrong already,
01:21:42 which is that it’s trying to maximize engagement
01:21:44 most of the time, which has very negative side effects.
01:21:49 So ideally, the very first thing is to stop
01:21:56 trying to purely maximize engagement,
01:21:59 or to propagate content purely based on popularity, right?
01:22:06 Instead, take into account the goals
01:22:11 and the profiles of each user.
01:22:13 So one example is, for instance,
01:22:16 when I look at topic recommendations on Twitter,
01:22:20 it’s like, you know, they have this news tab
01:22:24 with such recommendations.
01:22:25 It’s always the worst coverage,
01:22:28 because it’s content that appeals
01:22:30 to the lowest common denominator
01:22:34 to all Twitter users, because they’re trying to optimize.
01:22:37 They’re purely trying to optimize popularity.
01:22:39 They’re purely trying to optimize engagement.
01:22:41 But that’s not what I want.
01:22:42 So they should put me in control of some setting
01:22:46 so that I define what’s the objective function
01:22:50 that Twitter is going to be following
01:22:52 to show me this content.
01:22:54 And honestly, so this is all about interface design.
01:22:57 And we are not, it’s not realistic
01:22:59 to give users control of a bunch of knobs
01:23:01 that define algorithm.
01:23:03 Instead, we should purely put them in charge
01:23:06 of defining the objective function.
01:23:09 Like, let the user tell us what they want to achieve,
01:23:13 how they want this algorithm to impact their lives.
01:23:15 So do you think it is that,
01:23:16 or do they provide individual article by article
01:23:19 reward structure where you give a signal,
01:23:21 I’m glad I saw this, or I’m glad I didn’t?
01:23:24 So like a Spotify type feedback mechanism,
01:23:28 it works to some extent.
01:23:30 I’m kind of skeptical about it
01:23:32 because the only thing the algorithm will do
01:23:34 is attempt to relate your choices
01:23:39 with the choices of everyone else,
01:23:41 which might, you know, if you have an average profile
01:23:45 that works fine, I’m sure Spotify recommendations work fine
01:23:47 if you just like mainstream stuff.
01:23:49 If you don’t, it can be, it’s not optimal at all actually.
01:23:53 It’ll be an inefficient search
01:23:56 for the part of the Spotify world that represents you.
01:24:00 So it’s a tough problem,
01:24:02 but do note that even a feedback system
01:24:07 like what Spotify has does not give me control
01:24:10 over what the algorithm is trying to optimize for.
01:24:16 Well, public awareness, which is what we’re doing now,
01:24:19 is a good place to start.
01:24:21 Do you have concerns about longterm existential threats
01:24:25 of artificial intelligence?
01:24:28 Well, as I was saying,
01:24:31 our world is increasingly made of information.
01:24:33 AI algorithms are increasingly going to be our interface
01:24:36 to this world of information,
01:24:37 and somebody will be in control of these algorithms.
01:24:41 And that puts us in kind of a bad situation, right?
01:24:45 It has risks.
01:24:46 It has risks coming from potentially large companies
01:24:50 wanting to optimize their own goals,
01:24:53 maybe profit, maybe something else.
01:24:55 Also from governments who might want to use these algorithms
01:25:00 as a means of control of the population.
01:25:03 Do you think there’s existential threat
01:25:05 that could arise from that?
01:25:06 So existential threat.
01:25:09 So maybe you’re referring to the singularity narrative
01:25:13 where robots just take over.
01:25:15 Well, I don’t, I’m not talking about Terminator robots,
01:25:18 and I don’t believe it has to be a singularity.
01:25:21 We’re just talking to, just like you said,
01:25:24 the algorithm controlling masses of populations.
01:25:28 The existential threat being,
01:25:32 hurt ourselves much like a nuclear war would hurt ourselves.
01:25:36 That kind of thing.
01:25:37 I don’t think that requires a singularity.
01:25:39 That requires a loss of control over AI algorithm.
01:25:42 Yes.
01:25:43 So I do agree there are concerning trends.
01:25:47 Honestly, I wouldn’t want to make any longterm predictions.
01:25:52 I don’t think today we really have the capability
01:25:56 to see what the dangers of AI
01:25:58 are going to be in 50 years, in 100 years.
01:26:01 I do see that we are already faced
01:26:04 with concrete and present dangers
01:26:08 surrounding the negative side effects
01:26:11 of content recommendation systems, of newsfeed algorithms
01:26:14 concerning algorithmic bias as well.
01:26:18 So we are delegating more and more
01:26:22 decision processes to algorithms.
01:26:25 Some of these algorithms are handcrafted,
01:26:26 some are learned from data,
01:26:29 but we are delegating control.
01:26:32 Sometimes it’s a good thing, sometimes not so much.
01:26:36 And there is in general very little supervision
01:26:39 of this process, right?
01:26:41 So we are still in this period of very fast change,
01:26:45 even chaos, where society is restructuring itself,
01:26:50 turning into an information society,
01:26:53 which itself is turning into
01:26:54 an increasingly automated information processing society.
01:26:58 And well, yeah, I think the best we can do today
01:27:02 is try to raise awareness around some of these issues.
01:27:06 And I think we’re actually making good progress.
01:27:07 If you look at algorithmic bias, for instance,
01:27:12 three years ago, even two years ago,
01:27:14 very, very few people were talking about it.
01:27:17 And now all the big companies are talking about it.
01:27:20 They are often not in a very serious way,
01:27:22 but at least it is part of the public discourse.
01:27:24 You see people in Congress talking about it.
01:27:27 And it all started from raising awareness.
01:27:31 Right.
01:27:32 So in terms of alignment problem,
01:27:36 trying to teach as we allow algorithms,
01:27:39 just even recommender systems on Twitter,
01:27:43 encoding human values and morals,
01:27:48 decisions that touch on ethics,
01:27:50 how hard do you think that problem is?
01:27:52 How do we have loss functions in neural networks
01:27:57 that have some component,
01:27:58 some fuzzy components of human morals?
01:28:01 Well, I think this is really all about objective function engineering,
01:28:06 which is probably going to be increasingly a topic of concern in the future.
01:28:10 Like for now, we’re just using very naive loss functions
01:28:14 because the hard part is not actually what you’re trying to minimize.
01:28:17 It’s everything else.
01:28:19 But as the everything else is going to be increasingly automated,
01:28:22 we’re going to be focusing our human attention
01:28:27 on increasingly high level components,
01:28:30 like what’s actually driving the whole learning system,
01:28:32 like the objective function.
01:28:33 So loss function engineering is going to be,
01:28:36 loss function engineer is probably going to be a job title in the future.
01:28:40 And then the tooling you’re creating with Keras essentially
01:28:44 takes care of all the details underneath.
01:28:47 And basically the human expert is needed for exactly that.
01:28:52 That’s the idea.
01:28:53 Keras is the interface between the data you’re collecting
01:28:57 and the business goals.
01:28:59 And your job as an engineer is going to be to express your business goals
01:29:03 and your understanding of your business or your product,
01:29:06 your system as a kind of loss function or a kind of set of constraints.
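A minimal sketch of expressing a business goal as a loss function in Keras. The scenario, penalizing missed positives five times more than false alarms, and the exact weighting are invented for illustration.

```python
# Sketch: a custom asymmetric binary cross-entropy encoding a business constraint.
import tensorflow as tf

def asymmetric_bce(false_negative_cost=5.0):
    def loss(y_true, y_pred):
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
        fn_term = -false_negative_cost * y_true * tf.math.log(y_pred)   # missed positives
        fp_term = -(1.0 - y_true) * tf.math.log(1.0 - y_pred)           # false alarms
        return tf.reduce_mean(fn_term + fp_term)
    return loss

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss=asymmetric_bce(false_negative_cost=5.0))
# model.fit(x_train, y_train, epochs=5)
```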
01:29:11 Does the possibility of creating an AGI system excite you or scare you or bore you?
01:29:19 So intelligence can never really be general.
01:29:22 You know, at best it can have some degree of generality like human intelligence.
01:29:26 It also always has some specialization in the same way that human intelligence
01:29:30 is specialized in a certain category of problems,
01:29:33 is specialized in the human experience.
01:29:35 And when people talk about AGI,
01:29:37 I’m never quite sure if they’re talking about very, very smart AI,
01:29:42 so smart that it’s even smarter than humans,
01:29:45 or they’re talking about human like intelligence,
01:29:48 because these are different things.
01:29:49 Let’s say, presumably I’m impressing you today with my humanness.
01:29:54 So imagine that I was in fact a robot.
01:29:59 So what does that mean?
01:30:01 That I’m impressing you with natural language processing.
01:30:04 Maybe if you weren’t able to see me, maybe this is a phone call.
01:30:07 So that kind of system.
01:30:10 Companion.
01:30:11 So that’s very much about building human like AI.
01:30:15 And you’re asking me, you know, is this an exciting perspective?
01:30:18 Yes.
01:30:19 I think so, yes.
01:30:21 Not so much because of what artificial human like intelligence could do,
01:30:28 but, you know, from an intellectual perspective,
01:30:30 I think if you could build truly human like intelligence,
01:30:34 that means you could actually understand human intelligence,
01:30:37 which is fascinating, right?
01:30:39 Human like intelligence is going to require emotions.
01:30:42 It’s going to require consciousness,
01:30:44 which are not things that would normally be required by an intelligent system.
01:30:49 If you look at, you know, we were mentioning earlier like science
01:30:53 as a superhuman problem solving agent or system,
01:30:59 it does not have consciousness, it doesn’t have emotions.
01:31:02 In general, so emotions,
01:31:04 I see consciousness as being on the same spectrum as emotions.
01:31:07 It is a component of the subjective experience
01:31:12 that is meant very much to guide behavior generation, right?
01:31:18 It’s meant to guide your behavior.
01:31:20 In general, human intelligence and animal intelligence
01:31:24 has evolved for the purpose of behavior generation, right?
01:31:29 Including in a social context.
01:31:30 So that’s why we actually need emotions.
01:31:32 That’s why we need consciousness.
01:31:34 An artificial intelligence system developed in a different context
01:31:38 may well never need them, may well never be conscious like science.
01:31:42 Well, on that point, I would argue it’s possible to imagine
01:31:47 that there’s echoes of consciousness in science
01:31:51 when viewed as an organism, that science is consciousness.
01:31:55 So, I mean, how would you go about testing this hypothesis?
01:31:59 How do you probe the subjective experience of an abstract system like science?
01:32:07 Well, the problem is that probing any subjective experience is impossible
01:32:10 because I’m not science, I’m Lex.
01:32:13 So I can’t probe another entity any more than I can the bacteria on my skin.
01:32:20 You’re Lex, I can ask you questions about your subjective experience
01:32:24 and you can answer me, and that’s how I know you’re conscious.
01:32:28 Yes, but that’s because we speak the same language.
01:32:31 You perhaps, we have to speak the language of science in order to ask it.
01:32:35 Honestly, I don’t think consciousness, just like emotions of pain and pleasure,
01:32:40 is something that inevitably arises
01:32:44 from any sort of sufficiently intelligent information processing.
01:32:47 It is a feature of the mind, and if you’ve not implemented it explicitly, it is not there.
01:32:53 So you think it’s an emergent feature of a particular architecture.
01:32:58 So do you think…
01:33:00 It’s a feature in the same sense.
01:33:02 So, again, the subjective experience is all about guiding behavior.
01:33:08 If the problems you’re trying to solve don’t really involve an embodied agent,
01:33:15 maybe in a social context, generating behavior and pursuing goals like this.
01:33:19 And if you look at science, that’s not really what’s happening.
01:33:22 Even though it is, it is a form of artificial AI, artificial intelligence,
01:33:27 in the sense that it is solving problems, it is accumulating knowledge,
01:33:31 accumulating solutions and so on.
01:33:35 So if you’re not explicitly implementing a subjective experience,
01:33:39 implementing certain emotions and implementing consciousness,
01:33:44 it’s not going to just spontaneously emerge.
01:33:47 Yeah.
01:33:48 But so for a system like, human like intelligence system that has consciousness,
01:33:53 do you think it needs to have a body?
01:33:55 Yes, definitely.
01:33:56 I mean, it doesn’t have to be a physical body, right?
01:33:59 And there’s not that much difference between a realistic simulation and the real world.
01:34:03 So there has to be something you have to preserve kind of thing.
01:34:06 Yes, but human like intelligence can only arise in a human like context.
01:34:11 Intelligence needs other humans in order for you to demonstrate
01:34:16 that you have human like intelligence, essentially.
01:34:19 Yes.
01:34:20 So what kind of tests and demonstration would be sufficient for you
01:34:28 to demonstrate human like intelligence?
01:34:30 Yeah.
01:34:31 Just out of curiosity, you’ve talked about in terms of theorem proving
01:34:35 and program synthesis, I think you’ve written about
01:34:38 that there’s no good benchmarks for this.
01:34:40 Yeah.
01:34:40 That’s one of the problems.
01:34:42 So let’s talk program synthesis.
01:34:46 So what do you imagine is a good…
01:34:48 I think it’s related questions for human like intelligence
01:34:51 and for program synthesis.
01:34:53 What’s a good benchmark for either or both?
01:34:56 Right.
01:34:56 So I mean, you’re actually asking two questions,
01:34:59 which is one is about quantifying intelligence
01:35:02 and comparing the intelligence of an artificial system
01:35:06 to the intelligence for human.
01:35:08 And the other is about the degree to which this intelligence is human like.
01:35:13 It’s actually two different questions.
01:35:16 So you mentioned earlier the Turing test.
01:35:19 Well, I actually don’t like the Turing test because it’s very lazy.
01:35:23 It’s all about completely bypassing the problem of defining and measuring intelligence
01:35:28 and instead delegating to a human judge or a panel of human judges.
01:35:34 So it’s a total copout, right?
01:35:38 If you want to measure how human like an agent is,
01:35:43 I think you have to make it interact with other humans.
01:35:47 Maybe it’s not necessarily a good idea to have these other humans be the judges.
01:35:53 Maybe you should just observe behavior and compare it to what a human would actually have done.
01:36:00 When it comes to measuring how smart, how clever an agent is
01:36:05 and comparing that to the degree of human intelligence.
01:36:11 So we’re already talking about two things, right?
01:36:13 The degree, kind of like the magnitude of an intelligence and its direction, right?
01:36:20 Like the norm of a vector and its direction.
01:36:23 And the direction is like human likeness and the magnitude, the norm is intelligence.
01:36:32 You could call it intelligence, right?
01:36:34 So the direction, your sense, the space of directions that are human like is very narrow.
01:36:41 Yeah.
01:36:42 So how would you measure the magnitude of intelligence in a system
01:36:48 in a way that also enables you to compare it to that of a human.
01:36:54 Well, if you look at different benchmarks for intelligence today,
01:36:59 they’re all too focused on skill at a given task.
01:37:04 Like skill at playing chess, skill at playing Go, skill at playing Dota.
01:37:10 And I think that’s not the right way to go about it because you can always
01:37:15 beat a human at one specific task.
01:37:19 The reason why our skill at playing Go or juggling or anything is impressive
01:37:23 is because we are expressing this skill within a certain set of constraints.
01:37:28 If you remove the constraints, the constraints that we have one lifetime,
01:37:32 that we have this body and so on, if you remove the context,
01:37:36 if you have unlimited training data, if you can have access to, you know,
01:37:40 for instance, if you look at juggling, if you have no restriction on the hardware,
01:37:44 then achieving arbitrary levels of skill is not very interesting
01:37:48 and says nothing about the amount of intelligence you’ve achieved.
01:37:52 So if you want to measure intelligence, you need to rigorously define what
01:37:57 intelligence is, which in itself, you know, it’s a very challenging problem.
01:38:02 And do you think that’s possible?
01:38:04 To define intelligence? Yes, absolutely.
01:38:06 I mean, you can provide, many people have provided, you know, some definition.
01:38:10 I have my own definition.
01:38:12 Where does your definition begin?
01:38:13 Where does your definition begin if it doesn’t end?
01:38:16 Well, I think intelligence is essentially the efficiency
01:38:22 with which you turn experience into generalizable programs.
01:38:29 So what that means is it’s the efficiency with which
01:38:32 you turn a sampling of experience space into
01:38:36 the ability to process a larger chunk of experience space.
01:38:46 So measuring skill can be one proxy across many different tasks,
01:38:52 can be one proxy for measuring intelligence.
01:38:54 But if you want to only measure skill, you should control for two things.
01:38:58 You should control for the amount of experience that your system has
01:39:04 and the priors that your system has.
01:39:08 But if you look at two agents and you give them the same priors
01:39:13 and you give them the same amount of experience,
01:39:16 there is one of the agents that is going to learn programs,
01:39:21 representations, something, a model that will perform well
01:39:25 on the larger chunk of experience space than the other.
01:39:28 And that is the smarter agent.
01:39:30 Yeah. So if you fix the experience, which generate better programs,
01:39:37 better meaning more generalizable.
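One rough way to put the definition just described into symbols, offered purely as an illustrative paraphrase rather than a formula quoted from the conversation:

```latex
% Sketch: compare agents with the priors P and the experience E held fixed.
\text{Intelligence of agent } A \;\sim\;
\frac{\text{performance of } A\text{'s learned programs over unseen experience space}}
     {\text{information supplied as priors } P \;+\; \text{information supplied as experience } E}
```

With $P$ and $E$ fixed, the denominator is the same for both agents, so the smarter agent is simply the one whose learned programs cover the larger chunk of experience space, as stated above.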
01:39:39 That’s really interesting.
01:39:40 That’s a very nice, clean definition of…
01:39:42 Oh, by the way, in this definition, it is already very obvious
01:39:47 that intelligence has to be specialized
01:39:49 because you’re talking about experience space
01:39:51 and you’re talking about segments of experience space.
01:39:54 You’re talking about priors and you’re talking about experience.
01:39:57 All of these things define the context in which intelligence emerges.
01:40:04 And you can never look at the totality of experience space, right?
01:40:09 So intelligence has to be specialized.
01:40:12 But it can be sufficiently large, the experience space,
01:40:14 even though it’s specialized.
01:40:16 There’s a certain point when the experience space is large enough
01:40:19 to where it might as well be general.
01:40:22 It feels general. It looks general.
01:40:23 Sure. I mean, it’s very relative.
01:40:25 Like, for instance, many people would say human intelligence is general.
01:40:29 In fact, it is quite specialized.
01:40:32 We can definitely build systems that start from the same innate priors
01:40:37 as what humans have at birth.
01:40:39 Because we already understand fairly well
01:40:42 what sort of priors we have as humans.
01:40:44 Like many people have worked on this problem.
01:40:46 Most notably, Elisabeth Spelke from Harvard.
01:40:51 I don’t know if you know her.
01:40:52 She’s worked a lot on what she calls core knowledge.
01:40:56 And it is very much about trying to determine and describe
01:41:00 what priors we are born with.
01:41:02 Like language skills and so on, all that kind of stuff.
01:41:04 Exactly.
01:41:06 So we have some pretty good understanding of what priors we are born with.
01:41:11 So we could…
01:41:13 So I’ve actually been working on a benchmark for the past couple years,
01:41:17 you know, on and off.
01:41:18 I hope to be able to release it at some point.
01:41:20 That’s exciting.
01:41:21 The idea is to measure the intelligence of systems
01:42:26 by controlling for priors,
01:42:28 controlling for amount of experience,
01:41:30 and by assuming the same priors as what humans are born with.
01:41:34 So that you can actually compare these scores to human intelligence.
01:41:39 You can actually have humans pass the same test in a way that’s fair.
01:41:43 Yeah. And so importantly, such a benchmark should be such that any amount
01:41:52 of practicing does not increase your score.
01:41:56 So try to picture a game where no matter how much you play this game,
01:42:01 that does not change your skill at the game.
01:42:05 Can you picture that?
01:42:05 As a person who deeply appreciates practice, I cannot actually.
01:42:11 There’s actually a very simple trick.
01:42:16 So in order to come up with a task,
01:42:19 so the only thing you can measure is skill at the task.
01:42:21 Yes.
01:42:22 All tasks are going to involve priors.
01:42:24 Yes.
01:42:25 The trick is to know what they are and to describe that.
01:42:29 And then you make sure that this is the same set of priors as what humans start with.
01:42:33 So you create a task that assumes these priors, that exactly documents these priors,
01:42:38 so that the priors are made explicit and there are no other priors involved.
01:42:42 And then you generate a certain number of samples in experience space for this task, right?
01:42:49 And this, for one task, assuming that the task is new for the agent passing it,
01:42:56 that’s one test of this definition of intelligence that we set up.
01:43:04 And now you can scale that to many different tasks,
01:43:06 that each task should be new to the agent passing it, right?
01:43:11 And also it should be human interpretable and understandable
01:43:14 so that you can actually have a human pass the same test.
01:43:16 And then you can compare the score of your machine and the score of your human.
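A small sketch of such an evaluation protocol, with a hypothetical task format and agent interface: every task is new to the agent, comes with a few demonstration pairs as the controlled experience, and is scored on held-out pairs; a human can take exactly the same test.

```python
# Sketch of a benchmark harness where tasks are novel and experience is controlled.
def evaluate(agent, tasks):
    """agent.solve(demos, test_input) -> predicted test_output (hypothetical interface)."""
    correct = total = 0
    for task in tasks:                       # each task unseen by the agent beforehand
        demos = task["demonstrations"]       # e.g. [(input_grid, output_grid), ...]
        for test_input, expected in task["test_pairs"]:
            prediction = agent.solve(demos, test_input)
            correct += int(prediction == expected)
            total += 1
    return correct / total

# score_machine = evaluate(my_program_synthesis_agent, held_out_tasks)
# score_human   = evaluate(human_via_some_ui,          held_out_tasks)  # same tasks, same demos
```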
01:43:19 Which could be a lot of stuff.
01:43:20 You could even start a task like MNIST.
01:43:23 Just as long as you start with the same set of priors.
01:43:28 So the problem with MNIST, humans are already trying to recognize digits, right?
01:43:35 But let’s say we’re considering objects that are not digits,
01:43:42 some completely arbitrary patterns.
01:43:44 Well, humans already come with visual priors about how to process that.
01:43:48 So in order to make the game fair, you would have to isolate these priors
01:43:54 and describe them and then express them as computational rules.
01:43:57 Having worked a lot with vision science people, that’s exceptionally difficult.
01:44:01 A lot of progress has been made.
01:44:03 There’s been a lot of good work on basically reducing all of human vision into some good priors.
01:44:08 We’re still probably far away from that perfectly,
01:44:10 but as a start for a benchmark, that’s an exciting possibility.
01:44:14 Yeah, so Elisabeth Spelke actually lists objectness as one of the core knowledge priors.
01:44:24 Objectness, cool.
01:44:25 Objectness, yeah.
01:44:27 So we have priors about objectness, like about the visual space, about time,
01:44:31 about agents, about goal oriented behavior.
01:44:35 We have many different priors, but what’s interesting is that,
01:44:39 sure, we have this pretty diverse and rich set of priors,
01:44:43 but it’s also not that diverse, right?
01:44:46 We are not born into this world with a ton of knowledge about the world,
01:44:50 with only a small set of core knowledge.
01:44:58 Yeah, sorry, do you have a sense of how it feels to us humans that that set is not that large?
01:45:05 But just even the nature of time that we kind of integrate pretty effectively
01:45:09 through all of our perception, all of our reasoning,
01:45:12 maybe how, you know, do you have a sense of how easy it is to encode those priors?
01:45:17 Maybe it requires building a universe and then the human brain in order to encode those priors.
01:45:25 Or do you have a hope that it can be listed like an axiomatic?
01:45:28 I don’t think so.
01:45:29 So you have to keep in mind that any knowledge about the world that we are
01:45:33 born with is something that has to have been encoded into our DNA by evolution at some point.
01:45:41 Right.
01:45:41 And DNA is a very, very low bandwidth medium.
01:45:46 Like it’s extremely long and expensive to encode anything into DNA because first of all,
01:45:52 you need some sort of evolutionary pressure to guide this writing process.
01:45:57 And then, you know, the higher level of information you’re trying to write, the longer it’s going to take.
01:46:04 And the thing in the environment that you’re trying to encode knowledge about has to be stable
01:46:13 over this duration.
01:46:15 So you can only encode into DNA things that constitute an evolutionary advantage.
01:46:20 So this is actually a very small subset of all possible knowledge about the world.
01:46:25 You can only encode things that are stable, that are true, over very, very long periods of time,
01:46:32 typically millions of years.
01:46:33 For instance, we might have some visual prior about the shape of snakes, right?
01:46:38 But what makes a face, what’s the difference between a face and a non-face?
01:46:44 But consider this interesting question.
01:46:48 Do we have any innate sense of the visual difference between a male face and a female face?
01:46:56 What do you think?
01:46:58 For a human, I mean.
01:46:59 I would have to look back into evolutionary history when the genders emerged.
01:47:04 But yeah, most…
01:47:06 I mean, the faces of humans are quite different from the faces of great apes.
01:47:10 Great apes, right?
01:47:12 Yeah.
01:47:13 That’s interesting.
01:47:14 Yeah, you couldn’t tell the face of a female chimpanzee from the face of a male chimpanzee,
01:47:22 probably.
01:47:23 Yeah, and I don’t think most humans have that ability at all.
01:47:26 So we do have innate knowledge of what makes a face, but it’s actually impossible for us to
01:47:33 have any DNA encoded knowledge of the difference between a female human face and a male human face
01:47:40 because that knowledge, that information came up into the world actually very recently.
01:47:50 If you look at the slowness of the process of encoding knowledge into DNA.
01:47:56 Yeah, so that’s interesting.
01:47:57 That’s a really powerful argument that DNA is a low bandwidth and it takes a long time to encode.
01:48:02 That naturally creates a very efficient encoding.
01:48:05 But one important consequence of this is that, so yes, we are born into this world with a bunch of
01:48:12 knowledge, sometimes high level knowledge about the world, like the shape, the rough shape of a
01:48:17 snake, of the rough shape of a face.
01:48:20 But importantly, because this knowledge takes so long to write, almost all of this innate
01:48:26 knowledge is shared with our cousins, with great apes, right?
01:48:32 So it is not actually this innate knowledge that makes us special.
01:48:36 But to throw it right back at you from the earlier on in our discussion, it’s that encoding
01:48:42 might also include the entirety of the environment of Earth.
01:48:49 To some extent.
01:48:49 So it can include things that are important to survival and reproduction, so for which there is
01:48:56 some evolutionary pressure, and things that are stable, constant over very, very, very long time
01:49:02 periods.
01:49:04 And honestly, it’s not that much information.
01:49:06 There’s also, besides the bandwidth constraints and the constraints of the writing process,
01:49:14 there’s also memory constraints, like DNA, the part of DNA that deals with the human brain,
01:49:21 it’s actually fairly small.
01:49:22 It’s like, you know, on the order of megabytes, right?
01:49:25 There’s not that much high level knowledge about the world you can encode.
01:49:31 That’s quite brilliant and hopeful for a benchmark that you’re referring to of encoding
01:49:38 priors.
01:49:39 I actually look forward to, I’m skeptical whether you can do it in the next couple of
01:49:43 years, but hopefully.
01:49:45 I’ve been working.
01:49:45 So honestly, it’s a very simple benchmark, and it’s not like a big breakthrough or anything.
01:49:49 It’s more like a fun side project, right?
01:49:53 But these fun, so is ImageNet.
01:49:56 These fun side projects could launch entire groups of efforts towards creating reasoning
01:50:04 systems and so on.
01:50:04 And I think…
01:50:05 Yeah, that’s the goal.
01:50:06 It’s trying to measure strong generalization, to measure the strength of abstraction in
01:50:12 our minds, well, in our minds and in artificially intelligent agents.
01:50:16 And if there’s anything true about this science organism, it’s that its individual cells love competition.
01:50:24 So and benchmarks encourage competition.
01:50:26 So that’s an exciting possibility.
01:50:29 If you, do you think an AI winter is coming?
01:50:33 And how do we prevent it?
01:50:35 Not really.
01:50:36 So an AI winter is something that would occur when there’s a big mismatch between how we
01:50:42 are selling the capabilities of AI and the actual capabilities of AI.
01:50:47 And today, some deep learning is creating a lot of value.
01:50:50 And it will keep creating a lot of value in the sense that these models are applicable
01:50:56 to a very wide range of problems that are relevant today.
01:51:00 And we are only just getting started with applying these algorithms to every problem
01:51:05 they could be solving.
01:51:06 So deep learning will keep creating a lot of value for the time being.
01:51:10 What’s concerning, however, is that there’s a lot of hype around deep learning and around
01:51:15 AI.
01:51:16 There are lots of people overselling the capabilities of these systems, not just
01:51:22 the capabilities, but also overselling the fact that they might be more or less, you
01:51:27 know, brain like, giving a kind of mystical aspect to these technologies, and also
01:51:36 overselling the pace of progress, which, you know, might look fast in the sense that
01:51:43 we have this exponentially increasing number of papers.
01:51:47 But again, that’s just a simple consequence of the fact that we have ever more people
01:51:52 coming into the field.
01:51:54 It doesn’t mean the progress is actually exponentially fast.
01:51:58 Let’s say you’re trying to raise money for your startup or your research lab.
01:52:02 You might want to tell, you know, a grandiose story to investors about how deep learning
01:52:09 is just like the brain and how it can solve all these incredible problems like self driving
01:52:14 and robotics and so on.
01:52:15 And maybe you can tell them that the field is progressing so fast and we are going to
01:52:19 have AGI within 15 years or even 10 years.
01:52:23 And none of this is true.
01:52:25 And every time you’re saying these things and an investor or, you know, a decision maker
01:52:32 believes them, well, this is the equivalent of taking on credit card debt, but for trust,
01:52:41 right?
01:52:42 And maybe this will, you know, this will be what enables you to raise a lot of money,
01:52:50 but ultimately you are creating damage, you are damaging the field.
01:52:54 So that’s the concern, that debt; that’s what happened with the other AI winters.
01:53:00 You actually tweeted about this with autonomous vehicles, right?
01:53:04 Almost every single company now has promised that they will have fully autonomous
01:53:08 vehicles by 2021, 2022.
01:53:11 That’s a good example of the consequences of over hyping the capabilities of AI and
01:53:18 the pace of progress.
01:53:19 So because I’ve worked a lot recently in this area, I have a deep concern about what
01:53:25 happens when all of these companies, after they’ve invested billions, have a meeting and
01:53:30 say, first of all, do we actually have an autonomous vehicle?
01:53:33 The answer will definitely be no.
01:53:35 And second will be, wait a minute, we’ve invested one, two, three, four billion dollars
01:53:40 into this and we made no profit.
01:53:43 And the reaction to that may be to go very hard in other directions that might impact
01:53:49 even other industries.
01:53:50 And that’s what we call an AI winter: when there is backlash where no one believes any
01:53:55 of these promises anymore because they’ve turned out to be big lies the first time
01:53:59 around.
01:54:00 And this will definitely happen to some extent for autonomous vehicles, because around 2015 the public
01:54:06 and decision makers were convinced by these
01:54:13 people who were trying to raise money for their startups and so on that L5 driving was coming
01:54:19 in maybe 2016, maybe 2017, maybe 2018.
01:54:22 Now we’re in 2019, we’re still waiting for it.
01:54:27 And so I don’t believe we are going to have a full on AI winter because we have these
01:54:32 technologies that are producing a tremendous amount of real value.
01:54:37 But there is also too much hype.
01:54:39 So there will be some backlash, especially against the startups
01:54:44 that are trying to sell the dream of AGI and the idea that AGI is going to create
01:54:53 infinite value.
01:54:53 Like AGI is like a free lunch.
01:54:55 Like if you can develop an AI system that passes a certain threshold of IQ or something,
01:55:02 then suddenly you have infinite value.
01:55:04 And well, there are actually lots of investors buying into this idea and they will wait maybe
01:55:14 10, 15 years and nothing will happen.
01:55:17 And the next time around, well, maybe there will be a new generation of investors.
01:55:22 No one will care.
01:55:24 Human memory is fairly short after all.
01:55:27 I don’t know about you, but because I’ve spoken about AGI sometimes poetically, I get a lot
01:55:34 of emails from people, usually large manifestos, where they say
01:55:42 to me that they have created an AGI system or they know how to do it.
01:55:47 And there’s a long write-up of how to do it.
01:55:48 I get a lot of these emails, yeah.
01:55:50 They feel a little bit like they’re generated by an AI system actually, but there’s usually
01:55:57 no diagram, as if you have a transformer generating crank papers about AGI.
01:56:06 So the question is, because you have a good radar for crank
01:56:12 papers, how do we know they’re not onto something?
01:56:16 How do I... So when you start to talk about AGI or anything like the reasoning benchmarks
01:56:24 and so on, something that doesn’t have a benchmark, it’s really difficult to know.
01:56:29 I mean, I talked to Jeff Hawkins, who’s really looking at neuroscience approaches,
01:56:35 and there are echoes of really interesting ideas, at least in Jeff’s case,
01:56:41 which he’s showing.
01:56:43 How do you usually think about this?
01:56:46 Like preventing yourself from being too narrow-minded and elitist about deep learning, thinking it
01:56:52 has to work on these particular benchmarks, otherwise it’s trash.
01:56:56 Well, you know, the thing is, intelligence does not exist in the abstract.
01:57:05 Intelligence has to be applied.
01:57:07 So if you don’t have a benchmark, if you have an improvement in some benchmark, maybe it’s
01:57:11 a new benchmark, right?
01:57:12 Maybe it’s not something we’ve been looking at before, but you do need a problem that
01:57:16 you’re trying to solve.
01:57:17 You’re not going to come up with a solution without a problem.
01:57:20 So, general intelligence, I mean, you’ve clearly highlighted generalization.
01:57:26 If you want to claim that you have an intelligent system, it should come with a benchmark.
01:57:31 It should, yes, it should display capabilities of some kind.
01:57:35 It should show that it can create some form of value, even if it’s a very artificial form
01:57:41 of value.
01:57:42 And that’s also the reason why you don’t actually need to care about telling which papers have
01:57:48 actually some hidden potential and which do not.
01:57:53 Because if there is a new technique that’s actually creating value, this is going to
01:57:59 be brought to light very quickly because it’s actually making a difference.
01:58:02 So it’s the difference between something that is ineffectual and something that is actually
01:58:08 useful.
01:58:08 And ultimately usefulness is our guide, not just in this field, but if you look at science
01:58:14 in general, maybe there are many, many people over the years that have had some really interesting
01:58:19 theories of everything, but they were just completely useless.
01:58:22 And you don’t actually need to tell the interesting theories from the useless theories.
01:58:28 All you need is to see, is this actually having an effect on something else?
01:58:34 Is this actually useful?
01:58:35 Is this making an impact or not?
01:58:37 That’s beautifully put.
01:58:38 I mean, the same applies to quantum mechanics, to string theory, to the holographic principle.
01:58:43 We are doing deep learning because it works.
01:58:46 Before it started working, people considered people working on neural networks very much
01:58:52 as cranks.
01:58:54 No one was working on this anymore.
01:58:56 And now it’s working, which is what makes it valuable.
01:58:59 It’s not about being right.
01:59:01 It’s about being effective.
01:59:02 And nevertheless, the individual entities of this scientific mechanism, just like Yoshua
01:59:08 Bengio or Yann LeCun, while being called cranks, stuck with it.
01:59:12 Right?
01:59:12 Yeah.
01:59:13 And so, as individual agents, even if everyone’s laughing at us, we should just stick with it.
01:59:18 If you believe you have something, you should stick with it and see it through.
01:59:23 That’s a beautiful inspirational message to end on.
01:59:25 Francois, thank you so much for talking today.
01:59:27 That was amazing.
01:59:28 Thank you.