Transcript
00:00:00 The following is a conversation with Francois Chollet.
00:00:03 He’s the creator of Keras,
00:00:05 which is an open source deep learning library
00:00:08 that is designed to enable fast, user friendly experimentation
00:00:11 with deep neural networks.
00:00:13 It serves as an interface to several deep learning libraries,
00:00:16 the most popular of which is TensorFlow,
00:00:19 and it was integrated into the TensorFlow main code base
00:00:22 a while ago.
00:00:24 Meaning, if you want to create, train,
00:00:27 and use neural networks,
00:00:28 probably the easiest and most popular option
00:00:31 is to use Keras inside TensorFlow.
00:00:34 Aside from creating an exceptionally useful
00:00:37 and popular library,
00:00:38 Francois is also a world class AI researcher
00:00:41 and software engineer at Google.
00:00:44 And he’s definitely an outspoken,
00:00:46 if not controversial personality in the AI world,
00:00:50 especially in the realm of ideas
00:00:52 around the future of artificial intelligence.
00:00:55 This is the Artificial Intelligence Podcast.
00:00:58 If you enjoy it, subscribe on YouTube,
00:01:01 give it five stars on iTunes,
00:01:02 support it on Patreon,
00:01:04 or simply connect with me on Twitter
00:01:06 at Lex Fridman, spelled F R I D M A N.
00:01:09 And now, here’s my conversation with Francois Chollet.
00:01:14 You’re known for not sugarcoating your opinions
00:01:17 and speaking your mind about ideas in AI,
00:01:19 especially on Twitter.
00:01:21 It’s one of my favorite Twitter accounts.
00:01:22 So what’s one of the more controversial ideas
00:01:26 you’ve expressed online and gotten some heat for?
00:01:30 How do you pick?
00:01:33 How do I pick?
00:01:33 Yeah, no, I think if you go through the trouble
00:01:36 of maintaining a Twitter account,
00:01:39 you might as well speak your mind, you know?
00:01:41 Otherwise, what’s even the point of having a Twitter account?
00:01:44 It’s like having a nice car
00:01:45 and just leaving it in the garage.
00:01:48 Yeah, so what’s one thing for which I got
00:01:50 a lot of pushback?
00:01:53 Perhaps, you know, that time I wrote something
00:01:56 about the idea of intelligence explosion,
00:02:00 and I was questioning the idea
00:02:04 and the reasoning behind this idea.
00:02:06 And I got a lot of pushback on that.
00:02:09 I got a lot of flak for it.
00:02:11 So yeah, so intelligence explosion,
00:02:13 I’m sure you’re familiar with the idea,
00:02:14 but it’s the idea that if you were to build
00:02:18 general AI problem solving algorithms,
00:02:22 well, the problem of building such an AI,
00:02:27 that itself is a problem that could be solved by your AI,
00:02:30 and maybe it could be solved better
00:02:31 than what humans can do.
00:02:33 So your AI could start tweaking its own algorithm,
00:02:36 could start making a better version of itself,
00:02:39 and so on iteratively in a recursive fashion.
00:02:43 And so you would end up with an AI
00:02:47 with exponentially increasing intelligence.
00:02:50 That’s right.
00:02:50 And I was basically questioning this idea,
00:02:55 first of all, because the notion of intelligence explosion
00:02:59 uses an implicit definition of intelligence
00:03:02 that doesn’t sound quite right to me.
00:03:05 It considers intelligence as a property of a brain
00:03:11 that you can consider in isolation,
00:03:13 like the height of a building, for instance.
00:03:16 But that’s not really what intelligence is.
00:03:19 Intelligence emerges from the interaction
00:03:22 between a brain, a body,
00:03:25 like embodied intelligence, and an environment.
00:03:28 And if you’re missing one of these pieces,
00:03:30 then you cannot really define intelligence anymore.
00:03:33 So just tweaking a brain to make it smarter and smarter
00:03:36 doesn’t actually make any sense to me.
00:03:39 So first of all,
00:03:39 you’re crushing the dreams of many people, right?
00:03:43 So there’s a, let’s look at like Sam Harris.
00:03:46 Actually, a lot of physicists, Max Tegmark,
00:03:48 people who think the universe
00:03:52 is an information processing system,
00:03:54 our brain is kind of an information processing system.
00:03:57 So what’s the theoretical limit?
00:03:59 Like, it doesn’t make sense that there should be some,
00:04:04 it seems naive to think that our own brain
00:04:07 is somehow the limit of the capabilities
00:04:10 of this information system.
00:04:11 I’m playing devil’s advocate here.
00:04:13 This information processing system.
00:04:15 And then if you just scale it,
00:04:17 if you’re able to build something
00:04:19 that’s on par with the brain,
00:04:20 you just, the process that builds it just continues
00:04:24 and it’ll improve exponentially.
00:04:26 So that’s the logic that’s used actually
00:04:30 by almost everybody
00:04:32 that is worried about super human intelligence.
00:04:36 So you’re trying to make,
00:04:39 so most people who are skeptical of that
00:04:40 are kind of like, this doesn’t,
00:04:43 their thought process, this doesn’t feel right.
00:04:46 Like that’s for me as well.
00:04:47 So I’m more like, it doesn’t,
00:04:51 the whole thing is shrouded in mystery
00:04:52 where you can’t really say anything concrete,
00:04:55 but you could say this doesn’t feel right.
00:04:57 This doesn’t feel like that’s how the brain works.
00:05:00 And you’re trying to, with your blog posts
00:05:02 and now, make it a little more explicit.
00:05:05 So one idea is that the brain doesn’t exist alone.
00:05:10 It exists within the environment.
00:05:13 So you can’t exponentially,
00:05:15 you would have to somehow exponentially improve
00:05:18 the environment and the brain together almost.
00:05:20 Yeah, in order to create something that’s much smarter
00:05:25 in some kind of,
00:05:27 of course we don’t have a definition of intelligence.
00:05:29 That’s correct, that’s correct.
00:05:31 I don’t think so. If you look at very smart people today,
00:05:34 even humans, not even talking about AIs.
00:05:37 I don’t think their brain
00:05:38 and the performance of their brain is the bottleneck
00:05:41 to their expressed intelligence, to their achievements.
00:05:46 You cannot just tweak one part of this system,
00:05:49 like of this brain, body, environment system
00:05:52 and expect that capabilities like what emerges
00:05:55 out of this system to just explode exponentially.
00:06:00 Because anytime you improve one part of a system
00:06:04 with many interdependencies like this,
00:06:06 there’s a new bottleneck that arises, right?
00:06:09 And I don’t think even today for very smart people,
00:06:12 their brain is not the bottleneck
00:06:15 to the sort of problems they can solve, right?
00:06:17 In fact, many very smart people today,
00:06:20 you know, they are not actually solving
00:06:22 any big scientific problems, they’re not Einstein.
00:06:24 They’re like Einstein, but, you know, in the patent clerk days.
00:06:29 Like Einstein became Einstein
00:06:31 because this was a meeting of a genius
00:06:36 with a big problem at the right time, right?
00:06:39 But maybe this meeting could have never happened
00:06:42 and then Einstein would have just been a patent clerk, right?
00:06:44 And in fact, many people today are probably like
00:06:49 genius level smart, but you wouldn’t know
00:06:52 because they’re not really expressing any of that.
00:06:54 Wow, that’s brilliant.
00:06:55 So we can think of the world, Earth,
00:06:58 but also the universe as just as a space of problems.
00:07:02 So all these problems and tasks
00:07:05 of various difficulty are roaming it.
00:07:06 And there’s agents, creatures like ourselves
00:07:10 and animals and so on that are also roaming it.
00:07:13 And then you get coupled with a problem
00:07:16 and then you solve it.
00:07:17 But without that coupling,
00:07:19 you can’t demonstrate your quote unquote intelligence.
00:07:22 Exactly, intelligence is the meeting
00:07:24 of great problem solving capabilities
00:07:27 with a great problem.
00:07:28 And if you don’t have the problem,
00:07:30 you don’t really express any intelligence.
00:07:32 All you’re left with is potential intelligence,
00:07:34 like the performance of your brain
00:07:36 or how high your IQ is,
00:07:38 which in itself is just a number, right?
00:07:42 So you mentioned problem solving capacity.
00:07:46 Yeah.
00:07:47 What do you think of as problem solving capacity?
00:07:51 Can you try to define intelligence?
00:07:56 Like what does it mean to be more or less intelligent?
00:08:00 Is it completely coupled to a particular problem
00:08:03 or is there something a little bit more universal?
00:08:05 Yeah, I do believe all intelligence
00:08:07 is specialized intelligence.
00:08:09 Even human intelligence has some degree of generality.
00:08:12 Well, all intelligent systems have some degree of generality
00:08:15 but they’re always specialized in one category of problems.
00:08:19 So the human intelligence is specialized
00:08:21 in the human experience.
00:08:23 And that shows at various levels,
00:08:25 that shows in some prior knowledge that’s innate
00:08:30 that we have at birth.
00:08:32 Knowledge about things like agents,
00:08:35 goal driven behavior, visual priors
00:08:38 about what makes an object, priors about time and so on.
00:08:43 That shows also in the way we learn.
00:08:45 For instance, it’s very, very easy for us
00:08:47 to pick up language.
00:08:49 It’s very, very easy for us to learn certain things
00:08:52 because we are basically hard coded to learn them.
00:08:54 And we are specialized in solving certain kinds of problem
00:08:58 and we are quite useless
00:08:59 when it comes to other kinds of problems.
00:09:01 For instance, we are not really designed
00:09:06 to handle very long term problems.
00:09:08 We have no capability of seeing the very long term.
00:09:12 We don’t have very much working memory.
00:09:18 So how do you think about long term?
00:09:20 Do you think long term planning,
00:09:21 are we talking about scale of years, millennia?
00:09:24 What do you mean by long term?
00:09:26 We’re not very good.
00:09:28 Well, human intelligence is specialized
00:09:29 in the human experience.
00:09:30 And human experience is very short.
00:09:32 One lifetime is short.
00:09:34 Even within one lifetime,
00:09:35 we have a very hard time envisioning things
00:09:40 on a scale of years.
00:09:41 It’s very difficult to project yourself
00:09:43 at a scale of five years, at a scale of 10 years and so on.
00:09:46 We can solve only fairly narrowly scoped problems.
00:09:50 So when it comes to solving bigger problems,
00:09:52 larger scale problems,
00:09:53 we are not actually doing it on an individual level.
00:09:56 So it’s not actually our brain doing it.
00:09:59 We have this thing called civilization, right?
00:10:03 Which is itself a sort of problem solving system,
00:10:06 a sort of artificially intelligent system, right?
00:10:10 And it’s not running on one brain,
00:10:12 it’s running on a network of brains.
00:10:14 In fact, it’s running on much more
00:10:15 than a network of brains.
00:10:16 It’s running on a lot of infrastructure,
00:10:20 like books and computers and the internet
00:10:23 and human institutions and so on.
00:10:25 And that is capable of handling problems
00:10:30 on a much greater scale than any individual human.
00:10:33 If you look at computer science, for instance,
00:10:37 that’s an institution that solves problems
00:10:39 and it is superhuman, right?
00:10:42 It operates on a greater scale.
00:10:44 It can solve much bigger problems
00:10:46 than an individual human could.
00:10:49 And science itself, science as a system, as an institution,
00:10:52 is a kind of artificially intelligent problem solving
00:10:57 algorithm that is superhuman.
00:10:59 Yeah, it’s, at least computer science
00:11:02 is like a theorem prover at a scale of thousands,
00:11:07 maybe hundreds of thousands of human beings.
00:11:10 At that scale, what do you think is an intelligent agent?
00:11:14 So there’s us humans at the individual level,
00:11:18 there is millions, maybe billions of bacteria in our skin.
00:11:23 There is, that’s at the smaller scale.
00:11:26 You can even go to the particle level
00:11:29 as systems that behave,
00:11:31 you can say intelligently in some ways.
00:11:35 And then you can look at the earth as a single organism,
00:11:37 you can look at our galaxy
00:11:39 and even the universe as a single organism.
00:11:42 Do you think, how do you think about scale
00:11:44 in defining intelligent systems?
00:11:46 And we’re here at Google, there is millions of devices
00:11:50 doing computation just in a distributed way.
00:11:53 How do you think about intelligence versus scale?
00:11:55 You can always characterize anything as a system.
00:12:00 I think people who talk about things
00:12:03 like intelligence explosion,
00:12:05 tend to focus on one agent, which is basically one brain,
00:12:08 like one brain considered in isolation,
00:12:10 like a brain in a jar that’s controlling a body
00:12:13 in a very like top to bottom kind of fashion.
00:12:16 And that body is pursuing goals into an environment.
00:12:19 So it’s a very hierarchical view.
00:12:20 You have the brain at the top of the pyramid,
00:12:22 then you have the body just plainly receiving orders.
00:12:25 And then the body is manipulating objects
00:12:27 in the environment and so on.
00:12:28 So everything is subordinate to this one thing,
00:12:32 this epicenter, which is the brain.
00:12:34 But in real life, intelligent agents
00:12:37 don’t really work like this, right?
00:12:39 There is no strong delimitation
00:12:40 between the brain and the body to start with.
00:12:43 You have to look not just at the brain,
00:12:45 but at the nervous system.
00:12:46 But then the nervous system and the body
00:12:48 are not really two separate entities.
00:12:50 So you have to look at an entire animal as one agent.
00:12:53 But then you start realizing as you observe an animal
00:12:57 over any length of time,
00:13:00 that a lot of the intelligence of an animal
00:13:03 is actually externalized.
00:13:04 That’s especially true for humans.
00:13:06 A lot of our intelligence is externalized.
00:13:08 When you write down some notes,
00:13:10 that is externalized intelligence.
00:13:11 When you write a computer program,
00:13:14 you are externalizing cognition.
00:13:16 It’s externalized in books, it’s externalized in computers,
00:13:19 the internet, in other humans.
00:13:23 It’s externalized in language and so on.
00:13:25 So there is no hard delimitation
00:13:30 of what makes an intelligent agent.
00:13:32 It’s all about context.
00:13:34 Okay, but AlphaGo is better at Go
00:13:38 than the best human player.
00:13:42 There’s levels of skill here.
00:13:45 So do you think there’s such a ability,
00:13:48 such a concept as intelligence explosion
00:13:52 in a specific task?
00:13:54 And then, well, yeah.
00:13:57 Do you think it’s possible to have a category of tasks
00:14:00 on which you do have something
00:14:02 like an exponential growth of ability
00:14:05 to solve that particular problem?
00:14:07 I think if you consider a specific vertical,
00:14:10 it’s probably possible to some extent.
00:14:15 I also don’t think we have to speculate about it
00:14:18 because we have real world examples
00:14:22 of recursively self improving intelligent systems, right?
00:14:26 So for instance, science is a problem solving system,
00:14:30 a knowledge generation system,
00:14:32 like a system that experiences the world in some sense
00:14:36 and then gradually understands it and can act on it.
00:14:40 And that system is superhuman
00:14:42 and it is clearly recursively self improving
00:14:45 because science feeds into technology.
00:14:47 Technology can be used to build better tools,
00:14:50 better computers, better instrumentation and so on,
00:14:52 which in turn can make science faster, right?
00:14:56 So science is probably the closest thing we have today
00:15:00 to a recursively self improving superhuman AI.
00:15:04 And you can just observe: is science,
00:15:08 is scientific progress currently exploding?
00:15:10 Which is itself an interesting question.
00:15:12 You can use that as a basis to try to understand
00:15:15 what will happen with a superhuman AI
00:15:17 that has a science like behavior.
00:15:21 Let me linger on it a little bit more.
00:15:23 What is your intuition why an intelligence explosion
00:15:27 is not possible?
00:15:28 Like, taking the scientific,
00:15:30 all these scientific revolutions,
00:15:33 why can’t we slightly accelerate that process?
00:15:38 So you can absolutely accelerate
00:15:41 any problem solving process.
00:15:43 So recursive self improvement
00:15:46 is absolutely a real thing.
00:15:48 But what happens with a recursively self improving system
00:15:51 is typically not explosion
00:15:53 because no system exists in isolation.
00:15:56 And so tweaking one part of the system
00:15:58 means that suddenly another part of the system
00:16:00 becomes a bottleneck.
00:16:02 And if you look at science, for instance,
00:16:03 which is clearly a recursively self improving,
00:16:06 clearly a problem solving system,
00:16:09 scientific progress is not actually exploding.
00:16:12 If you look at science,
00:16:13 what you see is the picture of a system
00:16:16 that is consuming an exponentially increasing
00:16:19 amount of resources,
00:16:20 but it’s having a linear output
00:16:23 in terms of scientific progress.
00:16:26 And maybe that will seem like a very strong claim.
00:16:28 Many people are actually saying that,
00:16:31 scientific progress is exponential,
00:16:34 but when they’re claiming this,
00:16:36 they’re actually looking at indicators
00:16:38 of resource consumption by science.
00:16:43 For instance, the number of papers being published,
00:16:47 the number of patents being filed and so on,
00:16:49 which are just completely correlated
00:16:53 with how many people are working on science today.
00:16:58 So it’s actually an indicator of resource consumption,
00:17:00 but what you should look at is the output,
00:17:03 is progress in terms of the knowledge
00:17:06 that science generates,
00:17:08 in terms of the scope and significance
00:17:10 of the problems that we solve.
00:17:12 And some people have actually been trying to measure that.
00:17:16 Like Michael Nielsen, for instance,
00:17:20 he had a very nice paper,
00:17:21 I think that was last year about it.
00:17:25 So his approach to measure scientific progress
00:17:28 was to look at the timeline of scientific discoveries
00:17:33 over the past, you know, 100, 150 years.
00:17:37 And for each major discovery,
00:17:41 ask a panel of experts to rate
00:17:44 the significance of the discovery.
00:17:46 And if the output of science as an institution
00:17:49 were exponential,
00:17:50 you would expect the temporal density of significance
00:17:56 to go up exponentially.
00:17:58 Maybe because there’s a faster rate of discoveries,
00:18:00 maybe because the discoveries are, you know,
00:18:02 increasingly more important.
00:18:04 And what actually happens
00:18:06 if you plot this temporal density of significance
00:18:10 measured in this way,
00:18:11 is that you see very much a flat graph.
00:18:14 You see a flat graph across all disciplines,
00:18:16 across physics, biology, medicine, and so on.
00:18:19 And it actually makes a lot of sense
00:18:22 if you think about it,
00:18:23 because think about the progress of physics
00:18:26 110 years ago, right?
00:18:28 It was a time of crazy change.
00:18:30 Think about the progress of technology,
00:18:31 you know, 170 years ago,
00:18:34 when we started having, you know,
00:18:35 replacing horses with cars,
00:18:37 when we started having electricity and so on.
00:18:40 It was a time of incredible change.
00:18:41 And today is also a time of very, very fast change,
00:18:44 but it would be an unfair characterization
00:18:48 to say that today technology and science
00:18:50 are moving way faster than they did 50 years ago
00:18:52 or 100 years ago.
00:18:54 And if you do try to rigorously plot
00:18:59 the temporal density of the significance,
00:19:04 yeah, of significance, sorry,
00:19:07 you do see very flat curves.
00:19:09 And you can check out the paper
00:19:12 that Michael Nielsen had about this idea.
00:19:16 And so the way I interpret it is,
00:19:20 as you make progress in a given field,
00:19:24 or in a given subfield of science,
00:19:26 it becomes exponentially more difficult
00:19:28 to make further progress.
00:19:30 Like the very first person to work on information theory.
00:19:35 If you enter a new field,
00:19:36 and it’s still the very early years,
00:19:37 there’s a lot of low hanging fruit you can pick.
00:19:41 That’s right, yeah.
00:19:42 But the next generation of researchers
00:19:43 is gonna have to dig much harder, actually,
00:19:48 to make smaller discoveries,
00:19:50 probably larger number of smaller discoveries,
00:19:52 and to achieve the same amount of impact,
00:19:54 you’re gonna need a much greater head count.
00:19:57 And that’s exactly the picture you’re seeing with science,
00:20:00 that the number of scientists and engineers
00:20:03 is in fact increasing exponentially.
00:20:06 The amount of computational resources
00:20:08 that are available to science
00:20:10 is increasing exponentially and so on.
00:20:11 So the resource consumption of science is exponential,
00:20:15 but the output in terms of progress,
00:20:18 in terms of significance, is linear.
00:20:21 And the reason why is because,
00:20:23 and even though science is recursively self improving,
00:20:26 meaning that scientific progress
00:20:28 turns into technological progress,
00:20:30 which in turn helps science.
00:20:32 If you look at computers, for instance,
00:20:35 they are products of science, and computers
00:20:38 are tremendously useful in speeding up science.
00:20:41 The internet, same thing, the internet is a technology
00:20:43 that’s made possible by very recent scientific advances.
00:20:47 And itself, because it enables scientists to network,
00:20:52 to communicate, to exchange papers and ideas much faster,
00:20:55 it is a way to speed up scientific progress.
00:20:57 So even though you’re looking
00:20:58 at a recursively self improving system,
00:21:01 it is consuming exponentially more resources
00:21:04 to produce the same amount of problem solving, very much.
00:21:09 So that’s a fascinating way to paint it,
00:21:11 and certainly that holds for the deep learning community.
00:21:14 If you look at the temporal, what did you call it,
00:21:18 the temporal density of significant ideas,
00:21:21 if you look at in deep learning,
00:21:24 I think, I’d have to think about that,
00:21:26 but if you really look at significant ideas
00:21:29 in deep learning, they might even be decreasing.
00:21:32 So I do believe the per paper significance is decreasing,
00:21:39 but the amount of papers
00:21:41 is still today exponentially increasing.
00:21:43 So I think if you look at an aggregate,
00:21:45 my guess is that you would see a linear progress.
00:21:48 If you were to sum the significance of all papers,
00:21:56 you would see roughly linear progress.
00:21:58 And in my opinion, it is not a coincidence
00:22:03 that you’re seeing linear progress in science
00:22:05 despite exponential resource consumption.
00:22:07 I think the resource consumption
00:22:10 is dynamically adjusting itself to maintain linear progress
00:22:15 because we as a community expect linear progress,
00:22:18 meaning that if we start investing less
00:22:21 and seeing less progress, it means that suddenly
00:22:23 there are some lower hanging fruits that become available
00:22:26 and someone’s gonna step up and pick them, right?
00:22:31 So it’s very much like a market for discoveries and ideas.
00:22:36 But there’s another fundamental part
00:22:38 which you’re highlighting, which as a hypothesis
00:22:41 as science or like the space of ideas,
00:22:45 any one path you travel down,
00:22:48 it gets exponentially more difficult
00:22:51 to develop new ideas.
00:22:54 And your sense is that’s gonna hold
00:22:57 across our mysterious universe.
00:23:01 Yes, well, exponential progress
00:23:03 triggers exponential friction.
00:23:05 So that if you tweak one part of the system,
00:23:07 suddenly some other part becomes a bottleneck, right?
00:23:10 For instance, let’s say you develop some device
00:23:14 that measures its own acceleration
00:23:17 and then it has some engine
00:23:18 and it outputs even more acceleration
00:23:20 in proportion of its own acceleration
00:23:22 and you drop it somewhere,
00:23:23 it’s not gonna reach infinite speed
00:23:25 because it exists in a certain context.
00:23:29 So the air around it is gonna generate friction
00:23:31 and it’s gonna block it at some top speed.
00:23:34 And even if you were to consider the broader context
00:23:37 and lift the bottleneck there,
00:23:39 like the bottleneck of friction,
00:23:43 then some other part of the system
00:23:45 would start stepping in and creating exponential friction,
00:23:48 maybe the speed of light or whatever.
00:23:49 And this definitely holds true
00:23:51 when you look at the problem solving algorithm
00:23:54 that is being run by science as an institution,
00:23:58 science as a system.
00:23:59 As you make more and more progress,
00:24:01 despite having this recursive self improvement component,
00:24:06 you are encountering exponential friction.
00:24:09 The more researchers you have working on different ideas,
00:24:13 the more overhead you have
00:24:14 in terms of communication across researchers.
00:24:18 If you look at, you were mentioning quantum mechanics, right?
00:24:22 Well, if you want to start making significant discoveries
00:24:26 today, significant progress in quantum mechanics,
00:24:29 there is an amount of knowledge you have to ingest,
00:24:33 which is huge.
00:24:34 So there’s a very large overhead
00:24:36 to even start to contribute.
00:24:39 There’s a large amount of overhead
00:24:40 to synchronize across researchers and so on.
00:24:44 And of course, the significant practical experiments
00:24:48 are going to require exponentially expensive equipment
00:24:52 because the easier ones have already been run, right?
00:24:56 So in your senses, there’s no way escaping,
00:25:00 there’s no way of escaping this kind of friction
00:25:04 with artificial intelligence systems.
00:25:08 Yeah, no, I think science is a very good way
00:25:11 to model what would happen with a superhuman
00:25:14 recursively self improving AI.
00:25:16 That’s your sense, I mean, the…
00:25:18 That’s my intuition.
00:25:19 It’s not like a mathematical proof of anything.
00:25:23 That’s not my point.
00:25:24 Like, I’m not trying to prove anything.
00:25:26 I’m just trying to make an argument
00:25:27 to question the narrative of intelligence explosion,
00:25:31 which is quite a dominant narrative.
00:25:32 And you do get a lot of pushback if you go against it.
00:25:35 Because, so for many people, right,
00:25:39 AI is not just a subfield of computer science.
00:25:42 It’s more like a belief system.
00:25:44 Like this belief that the world is headed towards an event,
00:25:48 the singularity, past which, you know, AI will become…
00:25:55 will go exponential very much,
00:25:57 and the world will be transformed,
00:25:58 and humans will become obsolete.
00:26:00 And if you go against this narrative,
00:26:03 because it is not really a scientific argument,
00:26:06 but more of a belief system,
00:26:08 it is part of the identity of many people.
00:26:11 If you go against this narrative,
00:26:12 it’s like you’re attacking the identity
00:26:14 of people who believe in it.
00:26:15 It’s almost like saying God doesn’t exist,
00:26:17 or something.
00:26:19 So you do get a lot of pushback
00:26:21 if you try to question these ideas.
00:26:24 First of all, I believe most people,
00:26:26 they might not be as eloquent or explicit as you’re being,
00:26:29 but most people in computer science
00:26:30 or most people who actually have built
00:26:33 anything that you could call AI, quote, unquote,
00:26:36 would agree with you.
00:26:38 They might not be describing it in the same kind of way.
00:26:40 It’s more, so the pushback you’re getting
00:26:43 is from people who get attached to the narrative
00:26:48 from, not from a place of science,
00:26:51 but from a place of imagination.
00:26:53 That’s correct, that’s correct.
00:26:54 So why do you think that’s so appealing?
00:26:56 Because the usual dreams that people have
00:27:02 when you create a superintelligence system
00:27:03 past the singularity,
00:27:05 that what people imagine is somehow always destructive.
00:27:09 Do you have, if you were put on your psychology hat,
00:27:12 what’s, why is it so appealing to imagine
00:27:17 the ways that all of human civilization will be destroyed?
00:27:20 I think it’s a good story.
00:27:22 You know, it’s a good story.
00:27:23 And very interestingly, it mirrors religious stories,
00:27:28 right, religious mythology.
00:27:30 If you look at the mythology of most civilizations,
00:27:34 it’s about the world being headed towards some final events
00:27:38 in which the world will be destroyed
00:27:40 and some new world order will arise
00:27:42 that will be mostly spiritual,
00:27:44 like the apocalypse followed by a paradise probably, right?
00:27:49 It’s a very appealing story on a fundamental level.
00:27:52 And we all need stories.
00:27:54 We all need stories to structure the way we see the world,
00:27:58 especially at timescales
00:27:59 that are beyond our ability to make predictions, right?
00:28:04 So on a more serious, non exponential explosion
00:28:08 question, do you think there will be a time
00:28:15 when we’ll create something like human level intelligence
00:28:19 or intelligent systems that will make you sit back
00:28:23 and be just surprised at damn how smart this thing is?
00:28:28 That doesn’t require exponential growth
00:28:30 or an exponential improvement,
00:28:32 but what’s your sense of the timeline and so on
00:28:35 that you’ll be really surprised at certain capabilities?
00:28:41 And we’ll talk about limitations and deep learning.
00:28:42 So do you think in your lifetime,
00:28:44 you’ll be really damn surprised?
00:28:46 Around 2013, 2014, I was many times surprised
00:28:51 by the capabilities of deep learning actually.
00:28:53 That was before we had assessed exactly
00:28:55 what deep learning could do and could not do.
00:28:57 And it felt like a time of immense potential.
00:29:00 And then we started narrowing it down,
00:29:03 but I was very surprised.
00:29:04 I would say it has already happened.
00:29:07 Was there a moment, there must’ve been a day in there
00:29:10 where your surprise was almost bordering
00:29:14 on the belief of the narrative that we just discussed.
00:29:19 Was there a moment,
00:29:20 because you’ve written quite eloquently
00:29:22 about the limits of deep learning,
00:29:23 was there a moment that you thought
00:29:25 that maybe deep learning is limitless?
00:29:30 No, I don’t think I’ve ever believed this.
00:29:32 What was really shocking is that it worked.
00:29:35 It worked at all, yeah.
00:29:37 But there’s a big jump between being able
00:29:40 to do really good computer vision
00:29:43 and human level intelligence.
00:29:44 So I don’t think at any point I was under the impression
00:29:49 that the results we got in computer vision
00:29:51 meant that we were very close to human level intelligence.
00:29:54 I don’t think we’re very close to human level intelligence.
00:29:56 I do believe that there’s no reason
00:29:58 why we won’t achieve it at some point.
00:30:01 I also believe that the problem
00:30:06 with talking about human level intelligence
00:30:08 is that implicitly you’re considering
00:30:11 an axis of intelligence with different levels,
00:30:14 but that’s not really how intelligence works.
00:30:16 Intelligence is very multi dimensional.
00:30:19 And so there’s the question of capabilities,
00:30:22 but there’s also the question of being human like,
00:30:25 and it’s two very different things.
00:30:27 Like you can build potentially
00:30:28 very advanced intelligent agents
00:30:30 that are not human like at all.
00:30:32 And you can also build very human like agents.
00:30:35 And these are two very different things, right?
00:30:37 Right.
00:30:38 Let’s go from the philosophical to the practical.
00:30:42 Can you give me a history of Keras
00:30:44 and all the major deep learning frameworks
00:30:46 that you kind of remember in relation to Keras
00:30:48 and in general, TensorFlow, Theano, the old days.
00:30:52 Can you give a brief overview Wikipedia style history
00:30:55 and your role in it before we return to AGI discussions?
00:30:59 Yeah, that’s a broad topic.
00:31:00 So I started working on Keras.
00:31:04 It was the name Keras at the time.
00:31:06 I actually picked the name like
00:31:08 just the day I was going to release it.
00:31:10 So I started working on it in February, 2015.
00:31:14 And so at the time there weren’t too many people
00:31:17 working on deep learning, maybe like fewer than 10,000.
00:31:20 The software tooling was not really developed.
00:31:25 So the main deep learning library was Caffe,
00:31:28 which was mostly C++.
00:31:30 Why do you say Caffe was the main one?
00:31:32 Caffe was vastly more popular than Theano
00:31:36 in late 2014, early 2015.
00:31:38 Caffe was the one library that everyone was using
00:31:42 for computer vision.
00:31:43 And computer vision was the most popular problem
00:31:46 in deep learning at the time.
00:31:46 Absolutely.
00:31:47 Like ConvNets was like the subfield of deep learning
00:31:50 that everyone was working on.
00:31:53 So myself, so in late 2014,
00:31:57 I was actually interested in RNNs,
00:32:00 in recurrent neural networks,
00:32:01 which was a very niche topic at the time, right?
00:32:05 It really took off around 2016.
00:32:08 And so I was looking for good tools.
00:32:11 I had used Torch 7, I had used Theano,
00:32:14 used Theano a lot in Kaggle competitions.
00:32:19 I had used Caffe.
00:32:20 And there was no like good solution for RNNs at the time.
00:32:25 Like there was no reusable open source implementation
00:32:28 of an LSTM, for instance.
00:32:30 So I decided to build my own.
00:32:32 And at first, the pitch for that was,
00:32:35 it was gonna be mostly around LSTM recurrent neural networks.
00:32:39 It was gonna be in Python.
00:32:42 An important decision at the time
00:32:44 that was kind of not obvious
00:32:45 is that the models would be defined via Python code,
00:32:50 which was kind of like going against the mainstream
00:32:54 at the time because Caffe, PyLearn2, and so on,
00:32:58 like all the big libraries were actually going
00:33:00 with the approach of setting configuration files
00:33:03 in YAML to define models.
00:33:05 So some libraries were using code to define models,
00:33:08 like Torch 7, obviously, but that was not Python.
00:33:12 Lasagne was like a Theano based very early library
00:33:16 that was, I think, developed, I don’t remember exactly,
00:33:18 probably late 2014.
00:33:20 It’s Python as well.
00:33:21 It’s Python as well.
00:33:22 It was like on top of Theano.
00:33:24 And so I started working on something
00:33:29 and the value proposition at the time was that
00:33:32 not only what I think was the first
00:33:36 reusable open source implementation of LSTM,
00:33:40 you could combine RNNs and convnets
00:33:44 with the same library,
00:33:45 which was not really possible before,
00:33:46 like Caffe was only doing convnets.
00:33:50 And it was kind of easy to use
00:33:52 because, so before I was using Theano,
00:33:54 I was actually using scikit-learn
00:33:55 and I loved scikit-learn for its usability.
00:33:58 So I drew a lot of inspiration from scikit-learn
00:34:01 when I made Keras.
00:34:02 It’s almost like scikit-learn for neural networks.
00:34:05 The fit function.
00:34:06 Exactly, the fit function,
00:34:07 like reducing a complex training loop
00:34:10 to a single function call, right?
00:34:12 And of course, some people will say,
00:34:14 this is hiding a lot of details,
00:34:16 but that’s exactly the point, right?
00:34:18 The magic is the point.
00:34:20 So it’s magical, but in a good way.
00:34:22 It’s magical in the sense that it’s delightful.
00:34:24 Yeah, yeah.
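A minimal sketch (my own illustration, not from the conversation) of the scikit-learn-style workflow being described: a model defined directly in Python code, with the whole training loop reduced to a single fit() call. It assumes the modern tf.keras API and uses toy random data.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy data standing in for a real dataset.
x_train = np.random.random((256, 20)).astype("float32")
y_train = np.random.randint(0, 2, size=(256, 1))

# The model is defined in Python code, not in a configuration file.
model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# The "fit function": the complex training loop collapsed into one call.
model.fit(x_train, y_train, epochs=5, batch_size=32)
```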
00:34:26 I’m actually quite surprised.
00:34:27 I didn’t know that it was born out of desire
00:34:29 to implement RNNs and LSTMs.
00:34:32 It was.
00:34:33 That’s fascinating.
00:34:34 So you were actually one of the first people
00:34:36 to really try to attempt
00:34:37 to get the major architectures together.
00:34:41 And it’s also interesting.
00:34:42 You made me realize that that was a design decision at all
00:34:45 is defining the model and code.
00:34:47 Just, I’m putting myself in your shoes,
00:34:49 whether to do the YAML, especially since Caffe was the most popular.
00:34:53 It was the most popular by far.
00:34:54 If I was, if I were, yeah, I don’t,
00:34:58 I didn’t like the YAML thing,
00:34:59 but it makes more sense that you will put
00:35:02 in a configuration file, the definition of a model.
00:35:05 That’s an interesting gutsy move
00:35:07 to stick with defining it in code.
00:35:10 Just if you look back.
00:35:11 Other libraries were doing it as well,
00:35:13 but it was definitely the more niche option.
00:35:16 Yeah.
00:35:17 Okay, Keras and then.
00:35:18 So I released Keras in March, 2015,
00:35:21 and it got users pretty much from the start.
00:35:24 So the deep learning community was very, very small
00:35:25 at the time.
00:35:27 Lots of people were starting to be interested in LSTM.
00:35:30 So I released it at the right time
00:35:32 because it was offering an easy to use LSTM implementation.
00:35:35 Exactly at the time where lots of people started
00:35:37 to be intrigued by the capabilities of RNN, RNNs for NLP.
00:35:42 So it grew from there.
00:35:43 Then I joined Google about six months later,
00:35:51 and that was actually completely unrelated to Keras.
00:35:54 So I actually joined a research team
00:35:57 working on image classification,
00:35:59 mostly like computer vision.
00:36:00 So I was doing computer vision research
00:36:02 at Google initially.
00:36:03 And immediately when I joined Google,
00:36:05 I was exposed to the early internal version of TensorFlow.
00:36:10 And the way it appeared to me at the time,
00:36:13 and it was definitely the way it was at the time
00:36:15 is that this was an improved version of Theano.
00:36:20 So I immediately knew I had to port Keras
00:36:24 to this new TensorFlow thing.
00:36:26 And I was actually very busy as a Noogler,
00:36:29 as a new Googler.
00:36:31 So I had no time to work on that.
00:36:34 But then in November, I think it was November, 2015,
00:36:38 TensorFlow got released.
00:36:41 And it was kind of like my wake up call
00:36:44 that, hey, I had to actually go and make it happen.
00:36:47 So in December, I ported Keras to run on top of TensorFlow,
00:36:52 but it was not exactly a port.
00:36:53 It was more like a refactoring
00:36:55 where I was abstracting away
00:36:57 all the backend functionality into one module
00:37:00 so that the same code base
00:37:02 could run on top of multiple backends.
00:37:05 So on top of TensorFlow or Theano.
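A rough illustration (my own example, using the old multi-backend keras package and its backend module as I understand it, not code from the conversation) of the refactoring being described: user code goes through a backend-agnostic module, and the same lines run on whichever backend is configured.

```python
from keras import backend as K  # multi-backend Keras era; backend chosen via keras.json

# Backend-agnostic tensor ops: the same code runs on TensorFlow or Theano.
x = K.constant([[1.0, 2.0], [3.0, 4.0]])
y = K.relu(x - 2.0)

print(K.backend())  # e.g. "tensorflow" or "theano"
print(K.eval(y))    # evaluates the result on whichever backend is active
```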
00:37:07 And for the next year,
00:37:09 Theano stayed as the default option.
00:37:15 It was easier to use, somewhat less buggy.
00:37:20 It was much faster, especially when it came to RNNs.
00:37:23 But eventually, TensorFlow overtook it.
00:37:27 And TensorFlow, the early TensorFlow,
00:37:30 has similar architectural decisions as Theano, right?
00:37:33 So it was a natural transition.
00:37:37 Yeah, absolutely.
00:37:38 So what, I mean, that still Keras is a side,
00:37:42 almost fun project, right?
00:37:45 Yeah, so it was not my job assignment.
00:37:49 It was not.
00:37:50 I was doing it on the side.
00:37:52 And even though it grew to have a lot of users
00:37:55 for a deep learning library at the time, like throughout 2016,
00:37:59 but I wasn’t doing it as my main job.
00:38:02 So things started changing in,
00:38:04 I think it must have been maybe October, 2016.
00:38:10 So one year later.
00:38:12 So Rajat, who was the lead on TensorFlow,
00:38:15 basically showed up one day in our building
00:38:19 where I was doing like,
00:38:20 so I was doing research and things like,
00:38:21 so I did a lot of computer vision research,
00:38:24 also collaborations with Christian Szegedy
00:38:27 on deep learning for theorem proving.
00:38:29 It was a really interesting research topic.
00:38:34 And so Rajat was saying,
00:38:37 hey, we saw Keras, we like it.
00:38:41 We saw that you’re at Google.
00:38:42 Why don’t you come over for like a quarter
00:38:45 and work with us?
00:38:47 And I was like, yeah, that sounds like a great opportunity.
00:38:49 Let’s do it.
00:38:50 And so I started working on integrating the Keras API
00:38:55 into TensorFlow more tightly.
00:38:57 So what followed up is a sort of like temporary
00:39:02 TensorFlow only version of Keras
00:39:05 that was in tf.contrib for a while.
00:39:09 And finally moved to TensorFlow Core.
00:39:12 And I’ve never actually gotten back
00:39:15 to my old team doing research.
00:39:17 Well, it’s kind of funny that somebody like you
00:39:22 who dreams of, or at least sees the power of AI systems
00:39:28 that reason and theorem proving we’ll talk about
00:39:31 has also created a system that makes the most basic
00:39:36 kind of Lego building that is deep learning
00:39:40 super accessible, super easy.
00:39:42 So beautifully so.
00:39:43 It’s a funny irony that you’re both,
00:39:47 you’re responsible for both things,
00:39:49 but so TensorFlow 2.0 is kind of, there’s a sprint.
00:39:54 I don’t know how long it’ll take,
00:39:55 but there’s a sprint towards the finish.
00:39:56 What do you look, what are you working on these days?
00:40:01 What are you excited about?
00:40:02 What are you excited about in 2.0?
00:40:04 I mean, eager execution.
00:40:05 There’s so many things that just make it a lot easier
00:40:08 to work.
00:40:09 What are you excited about and what’s also really hard?
00:40:13 What are the problems you have to kind of solve?
00:40:15 So I’ve spent the past year and a half working on
00:40:19 TensorFlow 2.0 and it’s been a long journey.
00:40:22 I’m actually extremely excited about it.
00:40:25 I think it’s a great product.
00:40:26 It’s a delightful product compared to TensorFlow 1.0.
00:40:29 We’ve made huge progress.
00:40:32 So on the Keras side, what I’m really excited about is that,
00:40:37 so previously Keras has been this very easy to use
00:40:42 high level interface to do deep learning.
00:40:45 But if you wanted to,
00:40:50 if you wanted a lot of flexibility,
00:40:53 the Keras framework was probably not the optimal way
00:40:57 to do things compared to just writing everything
00:40:59 from scratch.
00:41:01 So in some way, the framework was getting in the way.
00:41:04 And in TensorFlow 2.0, you don’t have this at all, actually.
00:41:07 You have the usability of the high level interface,
00:41:11 but you have the flexibility of this lower level interface.
00:41:14 And you have this spectrum of workflows
00:41:16 where you can get more or less usability
00:41:21 and flexibility trade offs depending on your needs, right?
00:41:26 You can write everything from scratch
00:41:29 and you get a lot of help doing so
00:41:32 by subclassing models and writing your own training loops
00:41:36 using eager execution.
00:41:38 It’s very flexible, it’s very easy to debug,
00:41:40 it’s very powerful.
00:41:42 But all of this integrates seamlessly
00:41:45 with higher level features up to the classic Keras workflows,
00:41:49 which are very scikit learn like
00:41:51 and are ideal for a data scientist,
00:41:56 machine learning engineer type of profile.
00:41:58 So now you can have the same framework
00:42:00 offering the same set of APIs
00:42:02 that enable a spectrum of workflows
00:42:05 that are more or less low level, more or less high level
00:42:08 that are suitable for profiles ranging from researchers
00:42:13 to data scientists and everything in between.
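A hedged sketch of the lower-level end of the TensorFlow 2.0 spectrum being described, as I understand it: subclass keras.Model and write your own training step with eager execution and GradientTape. The class and variable names are illustrative, not from the conversation.

```python
import tensorflow as tf

class TinyClassifier(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(64, activation="relu")
        self.out = tf.keras.layers.Dense(10)

    def call(self, x):
        return self.out(self.hidden(x))

model = TinyClassifier()
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def train_step(x, y):
    # Eager execution: ordinary Python control flow, easy to debug.
    with tf.GradientTape() as tape:
        logits = model(x)
        loss = loss_fn(y, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# The same model still plugs into the high-level, scikit-learn-like workflow:
# model.compile(optimizer=optimizer, loss=loss_fn); model.fit(x_train, y_train)
```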
00:42:15 Yeah, so that’s super exciting.
00:42:16 I mean, it’s not just that,
00:42:18 it’s connected to all kinds of tooling.
00:42:21 You can go on mobile, you can go with TensorFlow Lite,
00:42:24 you can go in the cloud or serving and so on.
00:42:27 It all is connected together.
00:42:28 Now some of the best software ever written
00:42:31 is often done by one person, sometimes two.
00:42:36 So within Google, you’re now seeing sort of Keras
00:42:40 having to be integrated into TensorFlow,
00:42:42 which I’m sure has a ton of engineers working on it.
00:42:46 And there’s, I’m sure a lot of tricky design decisions
00:42:51 to be made.
00:42:52 How does that process usually happen
00:42:54 from at least your perspective?
00:42:56 What are the debates like?
00:43:00 Is there a lot of thinking,
00:43:04 considering different options and so on?
00:43:06 Yes.
00:43:08 So a lot of the time I spend at Google
00:43:12 is actually on design discussions, right?
00:43:17 Writing design docs, participating in design review meetings
00:43:20 and so on.
00:43:22 This is as important as actually writing the code.
00:43:25 Right.
00:43:26 So there’s a lot of thought, there’s a lot of thought
00:43:28 and a lot of care that is taken
00:43:32 in coming up with these decisions
00:43:34 and taking into account all of our users
00:43:37 because TensorFlow has this extremely diverse user base,
00:43:40 right?
00:43:41 It’s not like just one user segment
00:43:43 where everyone has the same needs.
00:43:45 We have small scale production users,
00:43:47 large scale production users.
00:43:49 We have startups, we have researchers,
00:43:53 you know, it’s all over the place.
00:43:55 And we have to cater to all of their needs.
00:43:57 If I just look at the standard debates
00:44:00 of C++ or Python, there’s some heated debates.
00:44:04 Do you have those at Google?
00:44:06 I mean, they’re not heated in terms of emotionally,
00:44:08 but there’s probably multiple ways to do it, right?
00:44:10 So how do you arrive through those design meetings
00:44:14 at the best way to do it?
00:44:15 Especially in deep learning where the field is evolving
00:44:19 as you’re doing it.
00:44:21 Is there some magic to it?
00:44:23 Is there some magic to the process?
00:44:26 I don’t know if there’s magic to the process,
00:44:28 but there definitely is a process.
00:44:30 So making design decisions
00:44:33 is about satisfying a set of constraints,
00:44:36 but also trying to do so in the simplest way possible,
00:44:39 because this is what can be maintained,
00:44:42 this is what can be expanded in the future.
00:44:44 So you don’t want to naively satisfy the constraints
00:44:49 by just, you know, for each capability you need available,
00:44:51 you’re gonna come up with one argument in your API
00:44:53 and so on.
00:44:54 You want to design APIs that are modular and hierarchical
00:45:00 so that they have an API surface
00:45:04 that is as small as possible, right?
00:45:07 And you want this modular hierarchical architecture
00:45:11 to reflect the way that domain experts
00:45:14 think about the problem.
00:45:16 Because as a domain expert,
00:45:17 when you are reading about a new API,
00:45:19 you’re reading a tutorial or some docs pages,
00:45:24 you already have a way that you’re thinking about the problem.
00:45:28 You already have like certain concepts in mind
00:45:32 and you’re thinking about how they relate together.
00:45:35 And when you’re reading docs,
00:45:37 you’re trying to build as quickly as possible
00:45:40 a mapping between the concepts featured in your API
00:45:45 and the concepts in your mind.
00:45:46 So you’re trying to map your mental model
00:45:48 as a domain expert to the way things work in the API.
00:45:53 So you need an API and an underlying implementation
00:45:57 that are reflecting the way people think about these things.
00:46:00 So in minimizing the time it takes to do the mapping.
00:46:02 Yes, minimizing the time,
00:46:04 the cognitive load there is
00:46:06 in ingesting this new knowledge about your API.
00:46:10 An API should not be self referential
00:46:13 or referring to implementation details.
00:46:15 It should only be referring to domain specific concepts
00:46:19 that people already understand.
00:46:23 Brilliant.
00:46:24 So what’s the future of Keras and TensorFlow look like?
00:46:27 What does TensorFlow 3.0 look like?
00:46:30 So that’s kind of too far in the future for me to answer,
00:46:33 especially since I’m not even the one making these decisions.
00:46:37 Okay.
00:46:39 But so from my perspective,
00:46:41 which is just one perspective
00:46:43 among many different perspectives on the TensorFlow team,
00:46:47 I’m really excited by developing even higher level APIs,
00:46:52 higher level than Keras.
00:46:53 I’m really excited by hyperparameter tuning,
00:46:56 by automated machine learning, AutoML.
00:47:01 I think the future is not just, you know,
00:47:03 defining a model like you were assembling Lego blocks
00:47:07 and then calling fit on it.
00:47:09 It’s more like an automagical model
00:47:13 that would just look at your data
00:47:16 and optimize the objective you’re after, right?
00:47:19 So that’s what I’m looking into.
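As a hedged illustration of the hyperparameter tuning direction mentioned here (my own example, using the later keras-tuner library, which is not named in the conversation): you describe a search space instead of a fixed model, and the tuner searches it against your objective.

```python
import keras_tuner as kt
from tensorflow import keras

def build_model(hp):
    # The "model" is a search space: units and learning rate are left to the tuner.
    model = keras.Sequential([
        keras.layers.Dense(
            units=hp.Int("units", min_value=32, max_value=256, step=32),
            activation="relu", input_shape=(20,)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(hp.Choice("lr", [1e-2, 1e-3, 1e-4])),
        loss="binary_crossentropy", metrics=["accuracy"])
    return model

tuner = kt.RandomSearch(build_model, objective="val_accuracy", max_trials=10)
# tuner.search(x_train, y_train, validation_split=0.2, epochs=5)
# best_model = tuner.get_best_models(1)[0]
```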
00:47:23 Yeah, so you put the baby into a room with the problem
00:47:26 and come back a few hours later
00:47:28 with a fully solved problem.
00:47:30 Exactly, it’s not like a box of Legos.
00:47:33 It’s more like the combination of a kid
00:47:35 that’s really good at Legos and a box of Legos.
00:47:38 It’s just building the thing on its own.
00:47:41 Very nice.
00:47:42 So that’s an exciting future.
00:47:44 I think there’s a huge amount of applications
00:47:46 and revolutions to be had
00:47:49 under the constraints of the discussion we previously had.
00:47:52 But what do you think of the current limits of deep learning?
00:47:57 If we look specifically at these function approximators
00:48:03 that try to generalize from data.
00:48:06 You’ve talked about local versus extreme generalization.
00:48:11 You mentioned that neural networks don’t generalize well
00:48:13 and humans do.
00:48:14 So there’s this gap.
00:48:17 And you’ve also mentioned that extreme generalization
00:48:20 requires something like reasoning to fill those gaps.
00:48:23 So how can we start trying to build systems like that?
00:48:27 Right, yeah, so this is by design, right?
00:48:30 Deep learning models are like huge parametric models,
00:48:37 differentiable, so continuous,
00:48:39 that go from an input space to an output space.
00:48:42 And they’re trained with gradient descent.
00:48:44 So they’re trained pretty much point by point.
00:48:47 They are learning a continuous geometric morphing
00:48:50 from an input vector space to an output vector space.
00:48:55 And because this is done point by point,
00:48:58 a deep neural network can only make sense
00:49:02 of points in experience space that are very close
00:49:05 to things that it has already seen in the training data.
00:49:08 At best, it can do interpolation across points.
00:49:13 But that means in order to train your network,
00:49:17 you need a dense sampling of the input cross output space,
00:49:22 almost a point by point sampling,
00:49:25 which can be very expensive if you’re dealing
00:49:27 with complex real world problems,
00:49:29 like autonomous driving, for instance, or robotics.
00:49:33 It’s doable if you’re looking at the subset
00:49:36 of the visual space.
00:49:37 But even then, it’s still fairly expensive.
00:49:38 You still need millions of examples.
00:49:40 And it’s only going to be able to make sense of things
00:49:44 that are very close to what it has seen before.
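A small runnable sketch (my own illustration, not from the conversation) of this local-generalization point: a network fit by gradient descent on a densely sampled interval interpolates well inside it, but has no basis for inputs far outside that sampling.

```python
import numpy as np
from tensorflow import keras

# Dense sampling of a narrow slice of input space: sin(x) on [-pi, pi].
x_train = np.linspace(-np.pi, np.pi, 2000).reshape(-1, 1).astype("float32")
y_train = np.sin(x_train)

model = keras.Sequential([
    keras.layers.Dense(64, activation="tanh", input_shape=(1,)),
    keras.layers.Dense(64, activation="tanh"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x_train, y_train, epochs=50, batch_size=64, verbose=0)

print(model.predict(np.array([[1.0]])))   # inside the sampled region: close to sin(1.0)
print(model.predict(np.array([[10.0]])))  # far outside it: typically nowhere near sin(10.0)
```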
00:49:46 And in contrast to that, well, of course,
00:49:49 you have human intelligence.
00:49:50 But even if you’re not looking at human intelligence,
00:49:53 you can look at very simple rules, algorithms.
00:49:56 If you have a symbolic rule,
00:49:58 it can actually apply to a very, very large set of inputs
00:50:03 because it is abstract.
00:50:04 It is not obtained by doing a point by point mapping.
00:50:10 For instance, if you try to learn a sorting algorithm
00:50:14 using a deep neural network,
00:50:15 well, you’re very much limited to learning point by point
00:50:20 what the sorted representation of this specific list is like.
00:50:24 But instead, you could have a very, very simple
00:50:29 sorting algorithm written in a few lines.
00:50:31 Maybe it’s just two nested loops.
00:50:35 And it can process any list at all because it is abstract,
00:50:41 because it is a set of rules.
00:50:42 So deep learning is really like point by point
00:50:45 geometric morphings, trained with gradient descent.
00:50:48 And meanwhile, abstract rules can generalize much better.
00:50:53 And I think the future is we need to combine the two.
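For contrast, the kind of abstract rule being described: a sorting routine written in a few lines with just two nested loops, which applies to any list at all rather than only to inputs near ones seen during training. This is one possible reading of the example (a plain selection-style sort), written as a sketch.

```python
def sort_list(xs):
    # Two nested loops: swap so the smallest remaining value lands at
    # position i. Works for any comparable list, of any length.
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[j] < xs[i]:
                xs[i], xs[j] = xs[j], xs[i]
    return xs

print(sort_list([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```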
00:50:56 So how do we, do you think, combine the two?
00:50:59 How do we combine good point by point functions
00:51:03 with programs, which is what the symbolic AI type systems?
00:51:08 At which levels the combination happen?
00:51:11 I mean, obviously we’re jumping into the realm
00:51:14 of where there’s no good answers.
00:51:16 It’s just kind of ideas and intuitions and so on.
00:51:20 Well, if you look at the really successful AI systems
00:51:23 today, I think they are already hybrid systems
00:51:26 that are combining symbolic AI with deep learning.
00:51:29 For instance, successful robotics systems
00:51:32 are already mostly model based, rule based,
00:51:37 things like planning algorithms and so on.
00:51:39 At the same time, they’re using deep learning
00:51:42 as perception modules.
00:51:43 Sometimes they’re using deep learning as a way
00:51:46 to inject fuzzy intuition into a rule based process.
00:51:50 If you look at the system like in a self driving car,
00:51:54 it’s not just one big end to end neural network.
00:51:57 You know, that wouldn’t work at all.
00:51:59 Precisely because in order to train that,
00:52:00 you would need a dense sampling of the experience space
00:52:05 when it comes to driving,
00:52:06 which is completely unrealistic, obviously.
00:52:08 Instead, the self driving car is mostly
00:52:13 symbolic, you know, it’s software, it’s programmed by hand.
00:52:18 So it’s mostly based on explicit models.
00:52:21 In this case, mostly 3D models of the environment
00:52:25 around the car, but it’s interfacing with the real world
00:52:29 using deep learning modules, right?
00:52:31 So the deep learning there serves as a way
00:52:33 to convert the raw sensory information
00:52:36 to something usable by symbolic systems.
00:52:39 Okay, well, let’s linger on that a little more.
00:52:42 So dense sampling from input to output.
00:52:45 You said it’s obviously very difficult.
00:52:48 Is it possible?
00:52:50 In the case of self driving, you mean?
00:52:51 Let’s say self driving, right?
00:52:53 Self driving for many people,
00:52:57 let’s not even talk about self driving,
00:52:59 let’s talk about steering, so staying inside the lane.
00:53:05 Lane following, yeah, it’s definitely a problem
00:53:07 you can solve with an end to end deep learning model,
00:53:08 but that’s like one small subset.
00:53:10 Hold on a second.
00:53:11 Yeah, I don’t know why you’re jumping
00:53:12 from the extreme so easily,
00:53:14 because I disagree with you on that.
00:53:16 I think, well, it’s not obvious to me
00:53:21 that you can solve lane following.
00:53:23 No, it’s not obvious, I think it’s doable.
00:53:25 I think in general, there is no hard limitations
00:53:31 to what you can learn with a deep neural network,
00:53:33 as long as the search space is rich enough,
00:53:40 is flexible enough, and as long as you have
00:53:42 this dense sampling of the input cross output space.
00:53:45 The problem is that this dense sampling
00:53:47 could mean anything from 10,000 examples
00:53:51 to like trillions and trillions.
00:53:52 So that’s my question.
00:53:54 So what’s your intuition?
00:53:56 And if you could just give it a chance
00:53:58 and think what kind of problems can be solved
00:54:01 by getting a huge amounts of data
00:54:04 and thereby creating a dense mapping.
00:54:08 So let’s think about natural language dialogue,
00:54:12 the Turing test.
00:54:14 Do you think the Turing test can be solved
00:54:17 with a neural network alone?
00:54:21 Well, the Turing test is all about tricking people
00:54:24 into believing they’re talking to a human.
00:54:26 And I don’t think that’s actually very difficult
00:54:29 because it’s more about exploiting human perception
00:54:35 and not so much about intelligence.
00:54:37 There’s a big difference between mimicking
00:54:39 intelligent behavior and actual intelligent behavior.
00:54:42 So, okay, let’s look at maybe the Alexa prize and so on.
00:54:45 The different formulations of the natural language
00:54:47 conversation that are less about mimicking
00:54:50 and more about maintaining a fun conversation
00:54:52 that lasts for 20 minutes.
00:54:54 That’s a little less about mimicking
00:54:56 and that’s more about, I mean, it’s still mimicking,
00:54:59 but it’s more about being able to carry forward
00:55:01 a conversation with all the tangents that happen
00:55:03 in dialogue and so on.
00:55:05 Do you think that problem is learnable
00:55:08 with a neural network that does the point to point mapping?
00:55:14 So I think it would be very, very challenging
00:55:16 to do this with deep learning.
00:55:17 I don’t think it’s out of the question either.
00:55:21 I wouldn’t rule it out.
00:55:23 The space of problems that can be solved
00:55:25 with a large neural network.
00:55:26 What’s your sense about the space of those problems?
00:55:30 So useful problems for us.
00:55:32 In theory, it’s infinite, right?
00:55:34 You can solve any problem.
00:55:36 In practice, well, deep learning is a great fit
00:55:39 for perception problems.
00:55:41 In general, any problem which is not naturally amenable
00:55:47 to explicit handcrafted rules or rules that you can generate
00:55:52 by exhaustive search over some program space.
00:55:56 So perception, artificial intuition,
00:55:59 as long as you have a sufficient training dataset.
00:56:03 And that’s the question, I mean, perception,
00:56:05 there’s interpretation and understanding of the scene,
00:56:08 which seems to be outside the reach
00:56:10 of current perception systems.
00:56:12 So do you think larger networks will be able
00:56:15 to start to understand the physics
00:56:18 and the physics of the scene,
00:56:21 the three dimensional structure and relationships
00:56:23 of objects in the scene and so on?
00:56:25 Or really that’s where symbolic AI has to step in?
00:56:28 Well, it’s always possible to solve these problems
00:56:34 with deep learning.
00:56:36 It’s just extremely inefficient.
00:56:38 A model, an explicit rule based abstract model,
00:56:42 would be a far better, more compressed
00:56:45 representation of physics
00:56:46 than learning just this mapping between
00:56:49 in this situation, this thing happens.
00:56:50 If you change the situation slightly,
00:56:52 then this other thing happens and so on.
00:56:54 Do you think it’s possible to automatically generate
00:56:57 the programs that would require that kind of reasoning?
00:57:02 Or does it have to, so the way the expert systems fail,
00:57:05 there’s so many facts about the world
00:57:07 had to be hand coded in.
00:57:08 Do you think it’s possible to learn those logical statements
00:57:14 that are true about the world and their relationships?
00:57:18 Do you think, I mean, that’s kind of what theorem proving
00:57:20 at a basic level is trying to do, right?
00:57:22 Yeah, except it’s much harder to formulate statements
00:57:26 about the world compared to formulating
00:57:28 mathematical statements.
00:57:30 Statements about the world tend to be subjective.
00:57:34 So can you learn rule based models?
00:57:39 Yes, definitely.
00:57:40 That’s the field of program synthesis.
00:57:43 However, today we just don’t really know how to do it.
00:57:48 So it’s very much a graph search or tree search problem.
00:57:52 And so we are limited to the sort of tree search and graph
00:57:56 search algorithms that we have today.
00:57:58 Personally, I think genetic algorithms are very promising.
00:58:02 So almost like genetic programming.
00:58:04 Genetic programming, exactly.
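A toy sketch of genetic programming as discrete search over programs, under invented assumptions: candidate programs are tiny arithmetic expression trees, fitness is error on input-output examples, and search proceeds by selection plus random subtree mutation. Nothing here reflects a real synthesis system.

```python
# Toy genetic programming: evolve expression trees to match example (x, y) pairs.
import random
import operator

OPS = [(operator.add, "+"), (operator.sub, "-"), (operator.mul, "*")]
TERMINALS = ["x", 1, 2, 3]

def random_program(depth=2):
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    op = random.choice(OPS)
    return (op, random_program(depth - 1), random_program(depth - 1))

def run(prog, x):
    if prog == "x":
        return x
    if isinstance(prog, int):
        return prog
    (fn, _), left, right = prog
    return fn(run(left, x), run(right, x))

def mutate(prog, depth=2):
    # Replace a random subtree with a fresh random one.
    if not isinstance(prog, tuple) or random.random() < 0.3:
        return random_program(depth)
    op, left, right = prog
    if random.random() < 0.5:
        return (op, mutate(left, depth - 1), right)
    return (op, left, mutate(right, depth - 1))

def fitness(prog, examples):
    return sum(abs(run(prog, x) - y) for x, y in examples)

def evolve(examples, generations=300, pop_size=50):
    population = [random_program() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda p: fitness(p, examples))
        survivors = population[: pop_size // 2]
        population = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return population[0]

# Target behaviour y = x*x + 2, given only as examples, never as a formula.
examples = [(x, x * x + 2) for x in range(-5, 6)]
best = evolve(examples)
print(fitness(best, examples))  # ideally 0 if the exact program was found
```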
00:58:05 Can you discuss the field of program synthesis?
00:58:08 Like how many people are working and thinking about it?
00:58:14 Where we are in the history of program synthesis
00:58:17 and what are your hopes for it?
00:58:20 Well, if it were deep learning, this is like the 90s.
00:58:24 So meaning that we already have existing solutions.
00:58:29 We are starting to have some basic understanding
00:58:34 of what this is about.
00:58:35 But it’s still a field that is in its infancy.
00:58:38 There are very few people working on it.
00:58:40 There are very few real world applications.
00:58:44 So the one real world application I’m aware of
00:58:47 is Flash Fill in Excel.
00:58:51 It’s a way to automatically learn very simple programs
00:58:55 to format cells in an Excel spreadsheet
00:58:58 from a few examples.
00:59:00 For instance, learning a way to format a date, things like that.
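A toy sketch of the Flash Fill idea, with a made-up miniature DSL: enumerate candidate string-transformation programs and keep the first one consistent with every example the user gave. The real feature uses a far richer language and much smarter search.

```python
# Toy program synthesis by enumeration over a tiny date-reformatting DSL.
from itertools import permutations

def make_program(order, sep):
    """A 'program' = reorder the 3 fields of a date and rejoin with a separator."""
    def prog(s):
        fields = s.replace("/", "-").split("-")
        return sep.join(fields[i] for i in order)
    return prog, (order, sep)

def synthesize(examples):
    for order in permutations(range(3)):          # all field reorderings
        for sep in ["-", "/", "."]:               # a few candidate separators
            prog, desc = make_program(order, sep)
            if all(prog(inp) == out for inp, out in examples):
                return prog, desc
    return None, None

# Two examples are enough to pin down the intended transformation here.
examples = [("2019/08/01", "01-08-2019"), ("2020/12/31", "31-12-2020")]
prog, desc = synthesize(examples)
print(desc)                # ((2, 1, 0), '-')
print(prog("2021/05/09"))  # 09-05-2021
```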
00:59:02 Oh, that’s fascinating.
00:59:03 Yeah.
00:59:04 You know, OK, that’s a fascinating topic.
00:59:06 I always wonder when I provide a few samples to Excel,
00:59:10 what it’s able to figure out.
00:59:12 Like just giving it a few dates, what
00:59:15 are you able to figure out from the pattern I just gave you?
00:59:18 That’s a fascinating question.
00:59:19 And it’s fascinating whether that’s learnable patterns.
00:59:23 And you’re saying they’re working on that.
00:59:25 How big is the toolbox currently?
00:59:28 Are we completely in the dark?
00:59:29 So if you said the 90s.
00:59:30 In terms of program synthesis?
00:59:31 No.
00:59:32 So I would say, so maybe 90s is even too optimistic.
00:59:37 Because by the 90s, we already understood back prop.
00:59:41 We already understood the engine of deep learning,
00:59:43 even though we couldn’t really see its potential quite yet.
00:59:47 Today, I don’t think we have found
00:59:48 the engine of program synthesis.
00:59:50 So we’re in the winter before back prop.
00:59:52 Yeah.
00:59:54 In a way, yes.
00:59:55 So I do believe program synthesis and general discrete search
01:00:00 over rule based models is going to be
01:00:02 a cornerstone of AI research in the next century.
01:00:06 And that doesn’t mean we are going to drop deep learning.
01:00:10 Deep learning is immensely useful.
01:00:11 Like, being able to learn a very flexible, adaptable,
01:00:17 parametric model
01:00:18 is actually immensely useful.
01:00:20 All it’s doing is pattern recognition.
01:00:23 But being good at pattern recognition, given lots of data,
01:00:25 is just extremely powerful.
01:00:27 So we are still going to be working on deep learning.
01:00:30 We are going to be working on program synthesis.
01:00:31 We are going to be combining the two in increasingly automated
01:00:34 ways.
01:00:36 So let’s talk a little bit about data.
01:00:38 You’ve tweeted, about 10,000 deep learning papers
01:00:44 have been written about how hard coding priors
01:00:47 about a specific task in a neural network architecture
01:00:49 works better than a lack of a prior.
01:00:52 Basically, summarizing all these efforts,
01:00:55 they put a name to an architecture.
01:00:56 But really, what they’re doing is hard coding some priors
01:00:59 that improve the performance of the system.
01:01:01 Which gets straight to the point, and is probably true.
01:01:06 So you say that you can always buy performance,
01:01:09 in quotes, by either training on more data,
01:01:12 better data, or by injecting task information
01:01:15 into the architecture or the preprocessing.
01:01:18 However, this isn’t informative about the generalization power
01:01:21 of the technique used, the fundamental ability
01:01:23 to generalize.
01:01:24 Do you think we can go far by coming up
01:01:26 with better methods for this kind of cheating,
01:01:29 for better methods of large scale annotation of data?
01:01:33 So building better priors.
01:01:34 If you automate it, it’s not cheating anymore.
01:01:37 Right.
01:01:38 I’m joking about the cheating, but large scale.
01:01:41 So basically, I’m asking about something
01:01:46 that hasn’t, from my perspective,
01:01:48 been researched too much is exponential improvement
01:01:53 in annotation of data.
01:01:55 Do you often think about that?
01:01:58 I think it’s actually been researched quite a bit.
01:02:00 You just don’t see publications about it.
01:02:02 Because people who publish papers
01:02:05 are going to publish about known benchmarks.
01:02:07 Sometimes they’re going to release a new benchmark.
01:02:09 People who actually have real world, large scale
01:02:12 deep learning problems, they’re going
01:02:13 to spend a lot of resources on data annotation
01:02:16 and good data annotation pipelines,
01:02:18 but you don’t see any papers about it.
01:02:19 That’s interesting.
01:02:20 So do you think, certainly resources,
01:02:22 but do you think there’s innovation happening?
01:02:24 Oh, yeah.
01:02:25 To clarify the point in the tweet.
01:02:28 So machine learning in general is
01:02:31 the science of generalization.
01:02:33 You want to generate knowledge that
01:02:37 can be reused across different data sets,
01:02:40 across different tasks.
01:02:42 And if instead you’re looking at one data set
01:02:45 and then you are hard coding knowledge about this task
01:02:50 into your architecture, this is no more useful
01:02:54 than training a network and then saying, oh, I
01:02:56 found these weight values perform well.
01:03:01 So David Ha, I don’t know if you know David,
01:03:05 he had a paper the other day about weight
01:03:08 agnostic neural networks.
01:03:10 And this is a very interesting paper
01:03:12 because it really illustrates the fact
01:03:14 that an architecture, even without weights,
01:03:17 an architecture is knowledge about a task.
01:03:21 It encodes knowledge.
01:03:23 And when it comes to architectures
01:03:25 that are handcrafted by researchers, in some cases,
01:03:30 it is very, very clear that all they are doing
01:03:34 is artificially re-encoding the template that
01:03:38 corresponds to the proper way to solve the task,
01:03:44 as encoded in a given data set.
01:03:45 For instance, I don’t know if you’ve looked
01:03:48 at the bAbI data set, which is about natural language
01:03:52 question answering, it is generated by an algorithm.
01:03:55 So these are question answer pairs
01:03:57 that are generated by an algorithm.
01:03:59 The algorithm is following a certain template.
01:04:01 Turns out, if you craft a network that
01:04:04 literally encodes this template, you
01:04:06 can solve this data set with nearly 100% accuracy.
01:04:09 But that doesn’t actually tell you
01:04:11 anything about how to solve question answering
01:04:14 in general, which is the point.
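A small illustration of the point that an architecture, even before training, encodes knowledge: a randomly initialized convolutional layer already assumes that the same local pattern means the same thing wherever it appears, a prior a plain dense layer does not carry. Shapes here are arbitrary.

```python
# Translation equivariance is baked into a Conv2D layer before any training happens.
import numpy as np
import tensorflow as tf

conv = tf.keras.layers.Conv2D(1, 3, padding="same", use_bias=False)
image = np.zeros((1, 16, 16, 1), dtype="float32")
image[0, 2, 2, 0] = 1.0                      # a "pattern" near the top-left
shifted = np.roll(image, shift=5, axis=2)    # the same pattern shifted to the right

out_a = conv(image).numpy()                  # random, untrained weights
out_b = conv(shifted).numpy()
# The responses are identical up to the same shift: the prior lives in the architecture.
print(np.allclose(np.roll(out_a, shift=5, axis=2), out_b))  # True
```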
01:04:17 The question is just to linger on it,
01:04:19 whether it’s from the data side or from the size
01:04:21 of the network.
01:04:23 I don’t know if you’ve read the blog post by Rich Sutton,
01:04:25 The Bitter Lesson, where he says,
01:04:28 the biggest lesson that we can read from 70 years of AI
01:04:31 research is that general methods that leverage computation
01:04:34 are ultimately the most effective.
01:04:37 So as opposed to figuring out methods
01:04:39 that can generalize effectively, do you
01:04:41 think we can get pretty far by just having something
01:04:47 that leverages computation and the improvement of computation?
01:04:51 Yeah, so I think Rich is making a very good point, which
01:04:54 is that a lot of these papers, which are actually
01:04:57 all about manually hardcoding prior knowledge about a task
01:05:02 into some system, it doesn’t have
01:05:04 to be deep learning architecture, but into some system.
01:05:08 These papers are not actually making any impact.
01:05:11 Instead, what’s making really long term impact
01:05:14 is very simple, very general systems
01:05:18 that are really agnostic to all these tricks.
01:05:21 Because these tricks do not generalize.
01:05:23 And of course, the one general and simple thing
01:05:27 that you should focus on is that which leverages computation.
01:05:33 Because computation, the availability
01:05:36 of large scale computation has been increasing exponentially
01:05:39 following Moore’s law.
01:05:40 So if your algorithm is all about exploiting this,
01:05:44 then your algorithm is suddenly exponentially improving.
01:05:47 So I think Rich is definitely right.
01:05:52 However, he’s right about the past 70 years.
01:05:57 He’s like assessing the past 70 years.
01:05:59 I am not sure that this assessment will still
01:06:02 hold true for the next 70 years.
01:06:04 It might to some extent.
01:06:07 I suspect it will not.
01:06:08 Because the truth of his assessment
01:06:11 is a function of the context in which this research took place.
01:06:16 And the context is changing.
01:06:18 Moore’s law might not be applicable anymore,
01:06:21 for instance, in the future.
01:06:23 And I do believe that when you tweak one aspect of a system,
01:06:31 when you exploit one aspect of a system,
01:06:32 some other aspect starts becoming the bottleneck.
01:06:36 Let’s say you have unlimited computation.
01:06:38 Well, then data is the bottleneck.
01:06:41 And I think we are already starting
01:06:43 to be in a regime where our systems are
01:06:45 so large in scale and so data hungry
01:06:48 that data today and the quality of data
01:06:50 and the scale of data is the bottleneck.
01:06:53 And in this environment, the bitter lesson from Rich
01:06:58 is not going to be true anymore.
01:07:00 So I think we are going to move from a focus
01:07:03 on a computation scale to focus on data efficiency.
01:07:09 Data efficiency.
01:07:10 So that’s getting to the question of symbolic AI.
01:07:13 But to linger on the deep learning approaches,
01:07:16 do you have hope for either unsupervised learning
01:07:19 or reinforcement learning, which are
01:07:23 ways of being more data efficient in terms
01:07:28 of the amount of data they need that required human annotation?
01:07:31 So unsupervised learning and reinforcement learning
01:07:34 are frameworks for learning, but they are not
01:07:36 like any specific technique.
01:07:39 So usually when people say reinforcement learning,
01:07:41 what they really mean is deep reinforcement learning,
01:07:43 which is like one approach which is actually very questionable.
01:07:47 The question I was asking was unsupervised learning
01:07:50 with deep neural networks and deep reinforcement learning.
01:07:54 Well, these are not really data efficient
01:07:56 because you’re still leveraging these huge parametric models
01:08:00 point by point with gradient descent.
01:08:03 It is more efficient in terms of the number of annotations,
01:08:08 the density of annotations you need.
01:08:09 So the idea being to learn the latent space around which
01:08:13 the data is organized and then map the sparse annotations
01:08:17 into it.
01:08:18 And sure, I mean, that’s clearly a very good idea.
01:08:23 It’s not really a topic I would be working on,
01:08:26 but it’s clearly a good idea.
01:08:28 So it would get us to solve some problems that?
01:08:31 It will get us to incremental improvements
01:08:34 in labeled data efficiency.
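A minimal sketch of the idea just discussed, with placeholder shapes and dataset names: learn a latent space from unlabeled data with an autoencoder, then fit a small classifier on only a handful of labeled points mapped into that space.

```python
# Sketch: unsupervised pretraining of a latent space, then sparse-label classification.
import tensorflow as tf

encoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),   # the latent space
])
decoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(784, activation="sigmoid"),
])
autoencoder = tf.keras.Model(encoder.input, decoder(encoder.output))
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_unlabeled, x_unlabeled, epochs=10)   # lots of unlabeled data

classifier_head = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
classifier_head.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# Only the sparse annotations are needed at this stage:
# classifier_head.fit(encoder.predict(x_few_labeled), y_few_labeled, epochs=10)
```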
01:08:38 Do you have concerns about short term or long term threats
01:08:43 from AI, from artificial intelligence?
01:08:47 Yes, definitely to some extent.
01:08:50 And what’s the shape of those concerns?
01:08:52 This is actually something I’ve briefly written about.
01:08:56 But the capabilities of deep learning technology
01:09:02 can be used in many ways that are
01:09:05 concerning from mass surveillance with things
01:09:09 like facial recognition.
01:09:11 In general, tracking lots of data about everyone
01:09:15 and then being able to making sense of this data
01:09:18 to do identification, to do prediction.
01:09:22 That’s concerning.
01:09:23 That’s something that’s being very aggressively pursued
01:09:26 by totalitarian states like China.
01:09:31 One thing I am very much concerned about
01:09:34 is that our lives are increasingly online,
01:09:40 are increasingly digital, made of information,
01:09:43 made of information consumption and information production,
01:09:48 our digital footprint, I would say.
01:09:51 And if you absorb all of this data
01:09:56 and you are in control of where you consume information,
01:10:01 social networks and so on, recommendation engines,
01:10:06 then you can build a sort of reinforcement
01:10:10 loop for human behavior.
01:10:13 You can observe the state of your mind at time t.
01:10:18 You can predict how you would react
01:10:21 to different pieces of content, how
01:10:23 to get you to move your mind in a certain direction.
01:10:27 And then you can feed you the specific piece of content
01:10:33 that would move you in a specific direction.
01:10:35 And you can do this at scale in terms
01:10:41 of doing it continuously in real time.
01:10:44 You can also do it at scale in terms
01:10:46 of scaling this to many, many people, to entire populations.
01:10:50 So potentially, artificial intelligence,
01:10:53 even in its current state, if you combine it
01:10:57 with the internet, with the fact that all of our lives
01:11:01 are moving to digital devices and digital information
01:11:05 consumption and creation, what you get
01:11:08 is the possibility to achieve mass manipulation of behavior
01:11:14 and mass psychological control.
01:11:16 And this is a very real possibility.
01:11:18 Yeah, so you’re talking about any kind of recommender system.
01:11:22 Let’s look at the YouTube algorithm, Facebook,
01:11:26 anything that recommends content you should watch next.
01:11:29 And it’s fascinating to think that there’s
01:11:32 some aspects of human behavior that you can pose as a problem of:
01:11:41 does this person hold Republican beliefs or Democratic beliefs?
01:11:45 And that’s, in a trivial sense, an objective function.
01:11:50 And you can optimize, and you can measure,
01:11:52 and you can turn everybody into a Republican
01:11:54 or everybody into a Democrat.
01:11:56 I do believe it’s true.
01:11:57 So the human mind is very, if you look at the human mind
01:12:03 as a kind of computer program, it
01:12:05 has a very large exploit surface.
01:12:07 It has many, many vulnerabilities.
01:12:09 Exploit surfaces, yeah.
01:12:10 Ways you can control it.
01:12:13 For instance, when it comes to your political beliefs,
01:12:16 this is very much tied to your identity.
01:12:19 So for instance, if I’m in control of your news feed
01:12:23 on your favorite social media platforms,
01:12:26 this is actually where you’re getting your news from.
01:12:29 And of course, I can choose to only show you
01:12:32 news that will make you see the world in a specific way.
01:12:37 But I can also create incentives for you
01:12:41 to post about some political beliefs.
01:12:44 And then when I get you to express a statement,
01:12:47 if it’s a statement that me as the controller,
01:12:51 I want to reinforce.
01:12:53 I can just show it to people who will agree,
01:12:55 and they will like it.
01:12:56 And that will reinforce the statement in your mind.
01:12:59 If this is a statement I want you to,
01:13:02 this is a belief I want you to abandon,
01:13:05 I can, on the other hand, show it to opponents
01:13:09 who will attack you.
01:13:10 And because they attack you, at the very least,
01:13:12 next time you will think twice about posting it.
01:13:16 But maybe you will even stop believing this
01:13:20 because you got pushback.
01:13:22 So there are many ways in which social media platforms
01:13:28 can potentially control your opinions.
01:13:30 And today, so all of these things
01:13:35 are already being controlled by AI algorithms.
01:13:38 These algorithms do not have any explicit political goal
01:13:41 today.
01:13:42 Well, potentially they could, like if some totalitarian
01:13:48 government takes over social media platforms
01:13:52 and decides that now we are going to use this not just
01:13:55 for mass surveillance, but also for mass opinion control
01:13:58 and behavior control.
01:13:59 Very bad things could happen.
01:14:01 But what’s really fascinating and actually quite concerning
01:14:06 is that even without an explicit intent to manipulate,
01:14:11 you’re already seeing very dangerous dynamics
01:14:14 in terms of how these content recommendation
01:14:18 algorithms behave.
01:14:19 Because right now, the goal, the objective function
01:14:24 of these algorithms is to maximize engagement,
01:14:28 which seems fairly innocuous at first.
01:14:32 However, it is not because content
01:14:36 that will maximally engage people, get people to react
01:14:42 in an emotional way, get people to click on something.
01:14:44 It is very often content that is not
01:14:52 healthy to the public discourse.
01:14:54 For instance, fake news are far more
01:14:58 likely to get you to click on them than real news
01:15:01 simply because they are not constrained to reality.
01:15:06 So they can be as outrageous, as surprising,
01:15:11 as good stories as you want because they’re artificial.
01:15:15 To me, that’s an exciting world because so much good
01:15:18 can come.
01:15:19 So there’s an opportunity to educate people.
01:15:24 You can balance people’s worldview with other ideas.
01:15:31 So there’s so many objective functions.
01:15:33 The space of objective functions that
01:15:35 create better civilizations is large, arguably infinite.
01:15:40 But there’s also a large space that
01:15:43 creates division and destruction, civil war,
01:15:51 a lot of bad stuff.
01:15:53 And the worry is, naturally, probably that space
01:15:56 is bigger, first of all.
01:15:59 And if we don’t explicitly think about what kind of effects
01:16:04 are going to be observed from different objective functions,
01:16:08 then we’re going to get into trouble.
01:16:10 But the question is, how do we get into rooms
01:16:14 and have discussions, so inside Google, inside Facebook,
01:16:18 inside Twitter, and think about, OK,
01:16:21 how can we drive up engagement and, at the same time,
01:16:24 create a good society?
01:16:28 Is it even possible to have that kind
01:16:29 of philosophical discussion?
01:16:31 I think you can definitely try.
01:16:33 So from my perspective, I would feel rather uncomfortable
01:16:37 with companies that are in control of these news
01:16:41 feed algorithms, with them making explicit decisions
01:16:47 to manipulate people’s opinions or behaviors,
01:16:50 even if the intent is good, because that’s
01:16:53 a very totalitarian mindset.
01:16:55 So instead, what I would like to see
01:16:57 is probably never going to happen,
01:16:58 because it’s not super realistic,
01:17:00 but that’s actually something I really care about.
01:17:02 I would like all these algorithms
01:17:06 to present configuration settings to their users,
01:17:10 so that the users can actually make the decision about how
01:17:14 they want to be impacted by these information
01:17:19 recommendation, content recommendation algorithms.
01:17:21 For instance, as a user of something
01:17:24 like YouTube or Twitter, maybe I want
01:17:26 to maximize learning about a specific topic.
01:17:30 So I want the algorithm to feed my curiosity,
01:17:36 which is in itself a very interesting problem.
01:17:38 So instead of maximizing my engagement,
01:17:41 it will maximize how fast and how much I’m learning.
01:17:44 And it will also take into account the accuracy,
01:17:47 hopefully, of the information I’m learning.
01:17:50 So yeah, the user should be able to determine exactly
01:17:55 how these algorithms are affecting their lives.
01:17:58 I don’t want actually any entity making decisions
01:18:03 about in which direction they’re going to try to manipulate me.
01:18:09 I want technology.
01:18:11 So AI, these algorithms are increasingly
01:18:14 going to be our interface to a world that is increasingly
01:18:18 made of information.
01:18:19 And I want everyone to be in control of this interface,
01:18:25 to interface with the world on their own terms.
01:18:29 So if someone wants these algorithms
01:18:32 to serve their own personal growth goals,
01:18:37 they should be able to configure these algorithms
01:18:40 in such a way.
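A toy sketch of what such a user-configurable objective could look like, with entirely hypothetical predictors and field names: the platform scores each candidate item on several axes, and the user, not the platform, chooses the weights that define what the feed optimizes.

```python
# Sketch of a recommender objective whose weights are set by the user.
from dataclasses import dataclass

@dataclass
class ItemScores:
    engagement: float   # predicted probability of a click or reaction
    learning: float     # predicted educational value for this user
    accuracy: float     # estimated factual reliability

def rank(items, weights):
    """Rank candidate items under a user-chosen objective function."""
    def objective(scores: ItemScores) -> float:
        return (weights["engagement"] * scores.engagement
                + weights["learning"] * scores.learning
                + weights["accuracy"] * scores.accuracy)
    return sorted(items, key=lambda pair: objective(pair[1]), reverse=True)

candidates = [
    ("outrage-bait thread", ItemScores(engagement=0.9, learning=0.1, accuracy=0.3)),
    ("lecture on topology", ItemScores(engagement=0.3, learning=0.9, accuracy=0.9)),
]
# "Feed my curiosity" settings rather than "maximize engagement" settings:
my_weights = {"engagement": 0.1, "learning": 0.6, "accuracy": 0.3}
print([title for title, _ in rank(candidates, my_weights)])
```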
01:18:41 Yeah, but so I know it’s painful to have explicit decisions.
01:18:46 But there are underlying explicit decisions,
01:18:51 which touch on some of the most beautiful, fundamental
01:18:53 philosophy that we have before us,
01:18:57 which is personal growth.
01:19:01 If I want to watch videos from which I can learn,
01:19:05 what does that mean?
01:19:08 So if I have a checkbox that wants to emphasize learning,
01:19:11 there’s still an algorithm with explicit decisions in it
01:19:15 that would promote learning.
01:19:17 What does that mean for me?
01:19:19 For example, I’ve watched a documentary on flat Earth
01:19:22 theory, I guess.
01:19:27 I learned a lot.
01:19:28 I’m really glad I watched it.
01:19:29 It was a friend recommended it to me.
01:19:32 Because I don’t have such an allergic reaction to crazy
01:19:35 people, as my fellow colleagues do.
01:19:37 But it was very eye opening.
01:19:40 And for others, it might not be.
01:19:42 For others, they might just get turned off from that, same
01:19:45 with Republican and Democrat.
01:19:47 And it’s a non trivial problem.
01:19:50 And first of all, if it’s done well,
01:19:52 I don’t think it’s something
01:19:56 that YouTube wouldn’t be promoting,
01:19:59 or Twitter wouldn’t be.
01:20:00 It’s just a really difficult problem,
01:20:02 how to give people control.
01:20:05 Well, it’s mostly an interface design problem.
01:20:08 The way I see it, you want to create technology
01:20:11 that’s like a mentor, or a coach, or an assistant,
01:20:16 so that it’s not your boss.
01:20:20 You are in control of it.
01:20:22 You are telling it what to do for you.
01:20:25 And if you feel like it’s manipulating you,
01:20:27 it’s not actually doing what you want.
01:20:31 You should be able to switch to a different algorithm.
01:20:34 So that’s fine-tuned control.
01:20:36 You kind of learn to trust it
01:20:38 through the human collaboration.
01:20:40 I mean, that’s how I see autonomous vehicles too,
01:20:41 is giving as much information as possible,
01:20:44 and you learn that dance yourself.
01:20:47 Yeah, Adobe, I don’t know if you use Adobe product
01:20:50 for like Photoshop.
01:20:52 They’re trying to see if they can inject YouTube
01:20:55 into their interface, but basically allow you
01:20:57 to show you all these videos,
01:20:59 that everybody’s confused about what to do with features.
01:21:03 So basically teach people by linking to,
01:21:07 in that way, it’s an assistant that uses videos
01:21:10 as a basic element of information.
01:21:13 Okay, so what practically should people do
01:21:18 to try to fight against abuses of these algorithms,
01:21:24 or algorithms that manipulate us?
01:21:27 Honestly, it’s a very, very difficult problem,
01:21:29 because to start with, there is very little public awareness
01:21:32 of these issues.
01:21:35 Very few people would think there’s anything wrong
01:21:38 with the newsfeed algorithm,
01:21:39 even though there is actually something wrong already,
01:21:42 which is that it’s trying to maximize engagement
01:21:44 most of the time, which has very negative side effects.
01:21:49 So ideally, the very first thing is to stop
01:21:56 trying to purely maximize engagement,
01:21:59 or to propagate content purely based on popularity, right?
01:22:06 Instead, take into account the goals
01:22:11 and the profiles of each user.
01:22:13 So one example is, for instance,
01:22:16 when I look at topic recommendations on Twitter,
01:22:20 it’s like, you know, they have this news tab
01:22:24 with such recommendations.
01:22:25 It’s always the worst coverage,
01:22:28 because it’s content that appeals
01:22:30 to the lowest common denominator
01:22:34 to all Twitter users, because they’re trying to optimize.
01:22:37 They’re purely trying to optimize popularity.
01:22:39 They’re purely trying to optimize engagement.
01:22:41 But that’s not what I want.
01:22:42 So they should put me in control of some setting
01:22:46 so that I define what’s the objective function
01:22:50 that Twitter is going to be following
01:22:52 to show me this content.
01:22:54 And honestly, so this is all about interface design.
01:22:57 And we are not, it’s not realistic
01:22:59 to give users control of a bunch of knobs
01:23:01 that define algorithm.
01:23:03 Instead, we should purely put them in charge
01:23:06 of defining the objective function.
01:23:09 Like, let the user tell us what they want to achieve,
01:23:13 how they want this algorithm to impact their lives.
01:23:15 So do you think it is that,
01:23:16 or do they provide individual article by article
01:23:19 reward structure where you give a signal,
01:23:21 I’m glad I saw this, or I’m glad I didn’t?
01:23:24 So like a Spotify type feedback mechanism,
01:23:28 it works to some extent.
01:23:30 I’m kind of skeptical about it
01:23:32 because the only thing the algorithm will do
01:23:34 is attempt to relate your choices
01:23:39 with the choices of everyone else,
01:23:41 which might, you know, if you have an average profile
01:23:45 that works fine, I’m sure Spotify recommendations work fine
01:23:47 if you just like mainstream stuff.
01:23:49 If you don’t, it can be, it’s not optimal at all actually.
01:23:53 It’ll be an inefficient search
01:23:56 for the part of the Spotify world that represents you.
01:24:00 So it’s a tough problem,
01:24:02 but do note that even a feedback system
01:24:07 like what Spotify has does not give me control
01:24:10 over what the algorithm is trying to optimize for.
01:24:16 Well, public awareness, which is what we’re doing now,
01:24:19 is a good place to start.
01:24:21 Do you have concerns about longterm existential threats
01:24:25 of artificial intelligence?
01:24:28 Well, as I was saying,
01:24:31 our world is increasingly made of information.
01:24:33 AI algorithms are increasingly going to be our interface
01:24:36 to this world of information,
01:24:37 and somebody will be in control of these algorithms.
01:24:41 And that puts us in kind of a bad situation, right?
01:24:45 It has risks.
01:24:46 It has risks coming from potentially large companies
01:24:50 wanting to optimize their own goals,
01:24:53 maybe profit, maybe something else.
01:24:55 Also from governments who might want to use these algorithms
01:25:00 as a means of control of the population.
01:25:03 Do you think there’s existential threat
01:25:05 that could arise from that?
01:25:06 So existential threat.
01:25:09 So maybe you’re referring to the singularity narrative
01:25:13 where robots just take over.
01:25:15 Well, I don’t, I’m not talking about Terminator robots,
01:25:18 and I don’t believe it has to be a singularity.
01:25:21 We’re just talking to, just like you said,
01:25:24 the algorithm controlling masses of populations.
01:25:28 The existential threat being,
01:25:32 hurt ourselves much like a nuclear war would hurt ourselves.
01:25:36 That kind of thing.
01:25:37 I don’t think that requires a singularity.
01:25:39 That requires a loss of control over AI algorithm.
01:25:42 Yes.
01:25:43 So I do agree there are concerning trends.
01:25:47 Honestly, I wouldn’t want to make any longterm predictions.
01:25:52 I don’t think today we really have the capability
01:25:56 to see what the dangers of AI
01:25:58 are going to be in 50 years, in 100 years.
01:26:01 I do see that we are already faced
01:26:04 with concrete and present dangers
01:26:08 surrounding the negative side effects
01:26:11 of content recommendation systems, of newsfeed algorithms
01:26:14 concerning algorithmic bias as well.
01:26:18 So we are delegating more and more
01:26:22 decision processes to algorithms.
01:26:25 Some of these algorithms are handcrafted,
01:26:26 some are learned from data,
01:26:29 but we are delegating control.
01:26:32 Sometimes it’s a good thing, sometimes not so much.
01:26:36 And there is in general very little supervision
01:26:39 of this process, right?
01:26:41 So we are still in this period of very fast change,
01:26:45 even chaos, where society is restructuring itself,
01:26:50 turning into an information society,
01:26:53 which itself is turning into
01:26:54 an increasingly automated information processing society.
01:26:58 And well, yeah, I think the best we can do today
01:27:02 is try to raise awareness around some of these issues.
01:27:06 And I think we’re actually making good progress.
01:27:07 If you look at algorithmic bias, for instance,
01:27:12 three years ago, even two years ago,
01:27:14 very, very few people were talking about it.
01:27:17 And now all the big companies are talking about it.
01:27:20 They are often not in a very serious way,
01:27:22 but at least it is part of the public discourse.
01:27:24 You see people in Congress talking about it.
01:27:27 And it all started from raising awareness.
01:27:31 Right.
01:27:32 So in terms of alignment problem,
01:27:36 trying to teach as we allow algorithms,
01:27:39 just even recommender systems on Twitter,
01:27:43 encoding human values and morals,
01:27:48 decisions that touch on ethics,
01:27:50 how hard do you think that problem is?
01:27:52 How do we have loss functions in neural networks
01:27:57 that have some component,
01:27:58 some fuzzy components of human morals?
01:28:01 Well, I think this is really all about objective function engineering,
01:28:06 which is probably going to be increasingly a topic of concern in the future.
01:28:10 Like for now, we’re just using very naive loss functions
01:28:14 because the hard part is not actually what you’re trying to minimize.
01:28:17 It’s everything else.
01:28:19 But as the everything else is going to be increasingly automated,
01:28:22 we’re going to be focusing our human attention
01:28:27 on increasingly high level components,
01:28:30 like what’s actually driving the whole learning system,
01:28:32 like the objective function.
01:28:33 So loss function engineering is going to be,
01:28:36 loss function engineer is probably going to be a job title in the future.
01:28:40 And then the tooling you’re creating with Keras essentially
01:28:44 takes care of all the details underneath.
01:28:47 And basically the human expert is needed for exactly that.
01:28:52 That’s the idea.
01:28:53 Keras is the interface between the data you’re collecting
01:28:57 and the business goals.
01:28:59 And your job as an engineer is going to be to express your business goals
01:29:03 and your understanding of your business or your product,
01:29:06 your system as a kind of loss function or a kind of set of constraints.
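A minimal sketch of expressing a business goal as a loss function in Keras. The scenario, penalizing missed positives five times more than false alarms, and the exact weighting are invented for illustration.

```python
# Sketch: a custom asymmetric binary cross-entropy encoding a business constraint.
import tensorflow as tf

def asymmetric_bce(false_negative_cost=5.0):
    def loss(y_true, y_pred):
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
        fn_term = -false_negative_cost * y_true * tf.math.log(y_pred)   # missed positives
        fp_term = -(1.0 - y_true) * tf.math.log(1.0 - y_pred)           # false alarms
        return tf.reduce_mean(fn_term + fp_term)
    return loss

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss=asymmetric_bce(false_negative_cost=5.0))
# model.fit(x_train, y_train, epochs=5)
```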
01:29:11 Does the possibility of creating an AGI system excite you or scare you or bore you?
01:29:19 So intelligence can never really be general.
01:29:22 You know, at best it can have some degree of generality like human intelligence.
01:29:26 It also always has some specialization in the same way that human intelligence
01:29:30 is specialized in a certain category of problems,
01:29:33 is specialized in the human experience.
01:29:35 And when people talk about AGI,
01:29:37 I’m never quite sure if they’re talking about very, very smart AI,
01:29:42 so smart that it’s even smarter than humans,
01:29:45 or they’re talking about human like intelligence,
01:29:48 because these are different things.
01:29:49 Let’s say, presumably I’m impressing you today with my humanness.
01:29:54 So imagine that I was in fact a robot.
01:29:59 So what does that mean?
01:30:01 That I’m impressing you with natural language processing.
01:30:04 Maybe if you weren’t able to see me, maybe this is a phone call.
01:30:07 So that kind of system.
01:30:10 Companion.
01:30:11 So that’s very much about building human like AI.
01:30:15 And you’re asking me, you know, is this an exciting perspective?
01:30:18 Yes.
01:30:19 I think so, yes.
01:30:21 Not so much because of what artificial human like intelligence could do,
01:30:28 but, you know, from an intellectual perspective,
01:30:30 I think if you could build truly human like intelligence,
01:30:34 that means you could actually understand human intelligence,
01:30:37 which is fascinating, right?
01:30:39 Human like intelligence is going to require emotions.
01:30:42 It’s going to require consciousness,
01:30:44 which are not things that would normally be required by an intelligent system.
01:30:49 If you look at, you know, we were mentioning earlier like science
01:30:53 as a superhuman problem solving agent or system,
01:30:59 it does not have consciousness, it doesn’t have emotions.
01:31:02 In general, so emotions,
01:31:04 I see consciousness as being on the same spectrum as emotions.
01:31:07 It is a component of the subjective experience
01:31:12 that is meant very much to guide behavior generation, right?
01:31:18 It’s meant to guide your behavior.
01:31:20 In general, human intelligence and animal intelligence
01:31:24 has evolved for the purpose of behavior generation, right?
01:31:29 Including in a social context.
01:31:30 So that’s why we actually need emotions.
01:31:32 That’s why we need consciousness.
01:31:34 An artificial intelligence system developed in a different context
01:31:38 may well never need them, may well never be conscious like science.
01:31:42 Well, on that point, I would argue it’s possible to imagine
01:31:47 that there’s echoes of consciousness in science
01:31:51 when viewed as an organism, that science is consciousness.
01:31:55 So, I mean, how would you go about testing this hypothesis?
01:31:59 How do you probe the subjective experience of an abstract system like science?
01:32:07 Well, the problem is that probing any subjective experience is impossible
01:32:10 because I’m not science, I’m Lex.
01:32:13 So I can’t probe another entity any more than I can the bacteria on my skin.
01:32:20 You’re Lex, I can ask you questions about your subjective experience
01:32:24 and you can answer me, and that’s how I know you’re conscious.
01:32:28 Yes, but that’s because we speak the same language.
01:32:31 You perhaps, we have to speak the language of science in order to ask it.
01:32:35 Honestly, I don’t think consciousness, just like emotions of pain and pleasure,
01:32:40 is something that inevitably arises
01:32:44 from any sort of sufficiently intelligent information processing.
01:32:47 It is a feature of the mind, and if you’ve not implemented it explicitly, it is not there.
01:32:53 So you think it’s an emergent feature of a particular architecture.
01:32:58 So do you think…
01:33:00 It’s a feature in the same sense.
01:33:02 So, again, the subjective experience is all about guiding behavior.
01:33:08 If the problems you’re trying to solve don’t really involve an embodied agent,
01:33:15 maybe in a social context, generating behavior and pursuing goals like this.
01:33:19 And if you look at science, that’s not really what’s happening.
01:33:22 Even though it is, it is a form of artificial AI, artificial intelligence,
01:33:27 in the sense that it is solving problems, it is accumulating knowledge,
01:33:31 accumulating solutions and so on.
01:33:35 So if you’re not explicitly implementing a subjective experience,
01:33:39 implementing certain emotions and implementing consciousness,
01:33:44 it’s not going to just spontaneously emerge.
01:33:47 Yeah.
01:33:48 But so for a system like, human like intelligence system that has consciousness,
01:33:53 do you think it needs to have a body?
01:33:55 Yes, definitely.
01:33:56 I mean, it doesn’t have to be a physical body, right?
01:33:59 And there’s not that much difference between a realistic simulation and the real world.
01:34:03 So there has to be something you have to preserve kind of thing.
01:34:06 Yes, but human like intelligence can only arise in a human like context.
01:34:11 Intelligence needs other humans in order for you to demonstrate
01:34:16 that you have human like intelligence, essentially.
01:34:19 Yes.
01:34:20 So what kind of tests and demonstration would be sufficient for you
01:34:28 to demonstrate human like intelligence?
01:34:30 Yeah.
01:34:31 Just out of curiosity, you’ve talked about in terms of theorem proving
01:34:35 and program synthesis, I think you’ve written about
01:34:38 that there’s no good benchmarks for this.
01:34:40 Yeah.
01:34:40 That’s one of the problems.
01:34:42 So let’s talk program synthesis.
01:34:46 So what do you imagine is a good…
01:34:48 I think it’s related questions for human like intelligence
01:34:51 and for program synthesis.
01:34:53 What’s a good benchmark for either or both?
01:34:56 Right.
01:34:56 So I mean, you’re actually asking two questions,
01:34:59 which is one is about quantifying intelligence
01:35:02 and comparing the intelligence of an artificial system
01:35:06 to the intelligence for human.
01:35:08 And the other is about the degree to which this intelligence is human like.
01:35:13 It’s actually two different questions.
01:35:16 So you mentioned earlier the Turing test.
01:35:19 Well, I actually don’t like the Turing test because it’s very lazy.
01:35:23 It’s all about completely bypassing the problem of defining and measuring intelligence
01:35:28 and instead delegating to a human judge or a panel of human judges.
01:35:34 So it’s a total copout, right?
01:35:38 If you want to measure how human like an agent is,
01:35:43 I think you have to make it interact with other humans.
01:35:47 Maybe it’s not necessarily a good idea to have these other humans be the judges.
01:35:53 Maybe you should just observe behavior and compare it to what a human would actually have done.
01:36:00 When it comes to measuring how smart, how clever an agent is
01:36:05 and comparing that to the degree of human intelligence.
01:36:11 So we’re already talking about two things, right?
01:36:13 The degree, kind of like the magnitude of an intelligence and its direction, right?
01:36:20 Like the norm of a vector and its direction.
01:36:23 And the direction is like human likeness and the magnitude, the norm is intelligence.
01:36:32 You could call it intelligence, right?
01:36:34 So the direction, your sense, the space of directions that are human like is very narrow.
01:36:41 Yeah.
01:36:42 So how would you measure the magnitude of intelligence in a system
01:36:48 in a way that also enables you to compare it to that of a human.
01:36:54 Well, if you look at different benchmarks for intelligence today,
01:36:59 they’re all too focused on skill at a given task.
01:37:04 Like skill at playing chess, skill at playing Go, skill at playing Dota.
01:37:10 And I think that’s not the right way to go about it because you can always
01:37:15 beat a human at one specific task.
01:37:19 The reason why our skill at playing Go or juggling or anything is impressive
01:37:23 is because we are expressing this skill within a certain set of constraints.
01:37:28 If you remove the constraints, the constraints that we have one lifetime,
01:37:32 that we have this body and so on, if you remove the context,
01:37:36 if you have unlimited training data, if you can have access to, you know,
01:37:40 for instance, if you look at juggling, if you have no restriction on the hardware,
01:37:44 then achieving arbitrary levels of skill is not very interesting
01:37:48 and says nothing about the amount of intelligence you’ve achieved.
01:37:52 So if you want to measure intelligence, you need to rigorously define what
01:37:57 intelligence is, which in itself, you know, it’s a very challenging problem.
01:38:02 And do you think that’s possible?
01:38:04 To define intelligence? Yes, absolutely.
01:38:06 I mean, you can provide, many people have provided, you know, some definition.
01:38:10 I have my own definition.
01:38:12 Where does your definition begin?
01:38:13 Where does your definition begin if it doesn’t end?
01:38:16 Well, I think intelligence is essentially the efficiency
01:38:22 with which you turn experience into generalizable programs.
01:38:29 So what that means is it’s the efficiency with which
01:38:32 you turn a sampling of experience space into
01:38:36 the ability to process a larger chunk of experience space.
01:38:46 So measuring skill can be one proxy across many different tasks,
01:38:52 can be one proxy for measuring intelligence.
01:38:54 But if you want to only measure skill, you should control for two things.
01:38:58 You should control for the amount of experience that your system has
01:39:04 and the priors that your system has.
01:39:08 But if you look at two agents and you give them the same priors
01:39:13 and you give them the same amount of experience,
01:39:16 there is one of the agents that is going to learn programs,
01:39:21 representations, something, a model that will perform well
01:39:25 on the larger chunk of experience space than the other.
01:39:28 And that is the smarter agent.
01:39:30 Yeah. So if you fix the experience, which generate better programs,
01:39:37 better meaning more generalizable.
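One rough way to put the definition just described into symbols, offered purely as an illustrative paraphrase rather than a formula quoted from the conversation:

```latex
% Sketch: compare agents with the priors P and the experience E held fixed.
\text{Intelligence of agent } A \;\sim\;
\frac{\text{performance of } A\text{'s learned programs over unseen experience space}}
     {\text{information supplied as priors } P \;+\; \text{information supplied as experience } E}
```

With $P$ and $E$ fixed, the denominator is the same for both agents, so the smarter agent is simply the one whose learned programs cover the larger chunk of experience space, as stated above.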
01:39:39 That’s really interesting.
01:39:40 That’s a very nice, clean definition of…
01:39:42 Oh, by the way, in this definition, it is already very obvious
01:39:47 that intelligence has to be specialized
01:39:49 because you’re talking about experience space
01:39:51 and you’re talking about segments of experience space.
01:39:54 You’re talking about priors and you’re talking about experience.
01:39:57 All of these things define the context in which intelligence emerges.
01:40:04 And you can never look at the totality of experience space, right?
01:40:09 So intelligence has to be specialized.
01:40:12 But it can be sufficiently large, the experience space,
01:40:14 even though it’s specialized.
01:40:16 There’s a certain point when the experience space is large enough
01:40:19 to where it might as well be general.
01:40:22 It feels general. It looks general.
01:40:23 Sure. I mean, it’s very relative.
01:40:25 Like, for instance, many people would say human intelligence is general.
01:40:29 In fact, it is quite specialized.
01:40:32 We can definitely build systems that start from the same innate priors
01:40:37 as what humans have at birth.
01:40:39 Because we already understand fairly well
01:40:42 what sort of priors we have as humans.
01:40:44 Like many people have worked on this problem.
01:40:46 Most notably, Elisabeth Spelke from Harvard.
01:40:51 I don’t know if you know her.
01:40:52 She’s worked a lot on what she calls core knowledge.
01:40:56 And it is very much about trying to determine and describe
01:41:00 what priors we are born with.
01:41:02 Like language skills and so on, all that kind of stuff.
01:41:04 Exactly.
01:41:06 So we have some pretty good understanding of what priors we are born with.
01:41:11 So we could…
01:41:13 So I’ve actually been working on a benchmark for the past couple years,
01:41:17 you know, on and off.
01:41:18 I hope to be able to release it at some point.
01:41:20 That’s exciting.
01:41:21 The idea is to measure the intelligence of systems
01:42:26 by controlling for priors,
01:42:28 controlling for amount of experience,
01:41:30 and by assuming the same priors as what humans are born with.
01:41:34 So that you can actually compare these scores to human intelligence.
01:41:39 You can actually have humans pass the same test in a way that’s fair.
01:41:43 Yeah. And so importantly, such a benchmark should be such that any amount
01:41:52 of practicing does not increase your score.
01:41:56 So try to picture a game where no matter how much you play this game,
01:42:01 that does not change your skill at the game.
01:42:05 Can you picture that?
01:42:05 As a person who deeply appreciates practice, I cannot actually.
01:42:11 There’s actually a very simple trick.
01:42:16 So in order to come up with a task,
01:42:19 so the only thing you can measure is skill at the task.
01:42:21 Yes.
01:42:22 All tasks are going to involve priors.
01:42:24 Yes.
01:42:25 The trick is to know what they are and to describe that.
01:42:29 And then you make sure that this is the same set of priors as what humans start with.
01:42:33 So you create a task that assumes these priors, that exactly documents these priors,
01:42:38 so that the priors are made explicit and there are no other priors involved.
01:42:42 And then you generate a certain number of samples in experience space for this task, right?
01:42:49 And this, for one task, assuming that the task is new for the agent passing it,
01:42:56 that’s one test of this definition of intelligence that we set up.
01:43:04 And now you can scale that to many different tasks,
01:43:06 that each task should be new to the agent passing it, right?
01:43:11 And also it should be human interpretable and understandable
01:43:14 so that you can actually have a human pass the same test.
01:43:16 And then you can compare the score of your machine and the score of your human.
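A small sketch of such an evaluation protocol, with a hypothetical task format and agent interface: every task is new to the agent, comes with a few demonstration pairs as the controlled experience, and is scored on held-out pairs; a human can take exactly the same test.

```python
# Sketch of a benchmark harness where tasks are novel and experience is controlled.
def evaluate(agent, tasks):
    """agent.solve(demos, test_input) -> predicted test_output (hypothetical interface)."""
    correct = total = 0
    for task in tasks:                       # each task unseen by the agent beforehand
        demos = task["demonstrations"]       # e.g. [(input_grid, output_grid), ...]
        for test_input, expected in task["test_pairs"]:
            prediction = agent.solve(demos, test_input)
            correct += int(prediction == expected)
            total += 1
    return correct / total

# score_machine = evaluate(my_program_synthesis_agent, held_out_tasks)
# score_human   = evaluate(human_via_some_ui,          held_out_tasks)  # same tasks, same demos
```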
01:43:19 Which could be a lot of stuff.
01:43:20 You could even start a task like MNIST.
01:43:23 Just as long as you start with the same set of priors.
01:43:28 So the problem with MNIST, humans are already trying to recognize digits, right?
01:43:35 But let’s say we’re considering objects that are not digits,
01:43:42 some completely arbitrary patterns.
01:43:44 Well, humans already come with visual priors about how to process that.
01:43:48 So in order to make the game fair, you would have to isolate these priors
01:43:54 and describe them and then express them as computational rules.
01:43:57 Having worked a lot with vision science people, that’s exceptionally difficult.
01:44:01 A lot of progress has been made.
01:44:03 There’s been a lot of good work on basically reducing all of human vision into some good priors.
01:44:08 We’re still probably far away from that perfectly,
01:44:10 but as a start for a benchmark, that’s an exciting possibility.
01:44:14 Yeah, so Elisabeth Spelke actually lists objectness as one of the core knowledge priors.
01:44:24 Objectness, cool.
01:44:25 Objectness, yeah.
01:44:27 So we have priors about objectness, like about the visual space, about time,
01:44:31 about agents, about goal oriented behavior.
01:44:35 We have many different priors, but what’s interesting is that,
01:44:39 sure, we have this pretty diverse and rich set of priors,
01:44:43 but it’s also not that diverse, right?
01:44:46 We are not born into this world with a ton of knowledge about the world,
01:44:50 with only a small set of core knowledge.
01:44:58 Yeah, sorry, do you have a sense of how it feels to us humans that that set is not that large?
01:45:05 But just even the nature of time that we kind of integrate pretty effectively
01:45:09 through all of our perception, all of our reasoning,
01:45:12 maybe how, you know, do you have a sense of how easy it is to encode those priors?
01:45:17 Maybe it requires building a universe and then the human brain in order to encode those priors.
01:45:25 Or do you have a hope that it can be listed like an axiomatic?
01:45:28 I don’t think so.
01:45:29 So you have to keep in mind that any knowledge about the world that we are
01:45:33 born with is something that has to have been encoded into our DNA by evolution at some point.
01:45:41 Right.
01:45:41 And DNA is a very, very low bandwidth medium.
01:45:46 Like it’s extremely long and expensive to encode anything into DNA because first of all,
01:45:52 you need some sort of evolutionary pressure to guide this writing process.
01:45:57 And then, you know, the higher level of information you’re trying to write, the longer it’s going to take.
01:46:04 And the thing in the environment that you’re trying to encode knowledge about has to be stable
01:46:13 over this duration.
01:46:15 So you can only encode into DNA things that constitute an evolutionary advantage.
01:46:20 So this is actually a very small subset of all possible knowledge about the world.
01:46:25 You can only encode things that are stable, that are true, over very, very long periods of time,
01:46:32 typically millions of years.
01:46:33 For instance, we might have some visual prior about the shape of snakes, right?
01:46:38 But what makes a face, what’s the difference between a face and a non-face?
01:46:44 But consider this interesting question.
01:46:48 Do we have any innate sense of the visual difference between a male face and a female face?
01:46:56 What do you think?
01:46:58 For a human, I mean.
01:46:59 I would have to look back into evolutionary history when the genders emerged.
01:47:04 But yeah, most…
01:47:06 I mean, the faces of humans are quite different from the faces of great apes.
01:47:10 Great apes, right?
01:47:12 Yeah.
01:47:13 That’s interesting.
01:47:14 Yeah, you couldn’t tell the face of a female chimpanzee from the face of a male chimpanzee,
01:47:22 probably.
01:47:23 Yeah, and I don’t think most humans have that ability at all.
01:47:26 So we do have innate knowledge of what makes a face, but it’s actually impossible for us to
01:47:33 have any DNA encoded knowledge of the difference between a female human face and a male human face
01:47:40 because that knowledge, that information came up into the world actually very recently.
01:47:50 If you look at the slowness of the process of encoding knowledge into DNA.
01:47:56 Yeah, so that’s interesting.
01:47:57 That’s a really powerful argument that DNA is a low bandwidth and it takes a long time to encode.
01:48:02 That naturally creates a very efficient encoding.
01:48:05 But one important consequence of this is that, so yes, we are born into this world with a bunch of
01:48:12 knowledge, sometimes high level knowledge about the world, like the shape, the rough shape of a
01:48:17 snake, of the rough shape of a face.
01:48:20 But importantly, because this knowledge takes so long to write, almost all of this innate
01:48:26 knowledge is shared with our cousins, with great apes, right?
01:48:32 So it is not actually this innate knowledge that makes us special.
01:48:36 But to throw it right back at you from the earlier on in our discussion, it’s that encoding
01:48:42 might also include the entirety of the environment of Earth.
01:48:49 To some extent.
01:48:49 So it can include things that are important to survival and reproduction, so for which there is
01:48:56 some evolutionary pressure, and things that are stable, constant over very, very, very long time
01:49:02 periods.
01:49:04 And honestly, it’s not that much information.
01:49:06 There’s also, besides the bandwidth constraints and the constraints of the writing process,
01:49:14 there’s also memory constraints, like DNA, the part of DNA that deals with the human brain,
01:49:21 it’s actually fairly small.
01:49:22 It’s like, you know, on the order of megabytes, right?
01:49:25 There’s not that much high level knowledge about the world you can encode.
01:49:31 That’s quite brilliant and hopeful for a benchmark that you’re referring to of encoding
01:49:38 priors.
01:49:39 I actually look forward to, I’m skeptical whether you can do it in the next couple of
01:49:43 years, but hopefully.
01:49:45 I’ve been working.
01:49:45 So honestly, it’s a very simple benchmark, and it’s not like a big breakthrough or anything.
01:49:49 It’s more like a fun side project, right?
01:49:53 But these fun, so is ImageNet.
01:49:56 These fun side projects could launch entire groups of efforts towards creating reasoning
01:50:04 systems and so on.
01:50:04 And I think…
01:50:05 Yeah, that’s the goal.
01:50:06 It’s trying to measure strong generalization, to measure the strength of abstraction in
01:50:12 our minds, well, in our minds and in artificially intelligent agents.
01:50:16 And if there’s anything true about this science organism, it’s that its individual cells love competition.
01:50:24 So and benchmarks encourage competition.
01:50:26 So that’s an exciting possibility.
01:50:29 If you, do you think an AI winter is coming?
01:50:33 And how do we prevent it?
01:50:35 Not really.
01:50:36 So an AI winter is something that would occur when there’s a big mismatch between how we
01:50:42 are selling the capabilities of AI and the actual capabilities of AI.
01:50:47 And today, some deep learning is creating a lot of value.
01:50:50 And it will keep creating a lot of value in the sense that these models are applicable
01:50:56 to a very wide range of problems that are relevant today.
01:51:00 And we are only just getting started with applying these algorithms to every problem
01:51:05 they could be solving.
01:51:06 So deep learning will keep creating a lot of value for the time being.
01:51:10 What’s concerning, however, is that there’s a lot of hype around deep learning and around
01:51:15 AI.
01:51:16 There are lots of people overselling the capabilities of these systems, not just
01:51:22 the capabilities, but also overselling the fact that they might be more or less, you
01:51:27 know, brain like, giving a kind of mystical aspect to these technologies, and also
01:51:36 overselling the pace of progress, which, you know, might look fast in the sense that
01:51:43 we have this exponentially increasing number of papers.
01:51:47 But again, that’s just a simple consequence of the fact that we have ever more people
01:51:52 coming into the field.
01:51:54 It doesn’t mean the progress is actually exponentially fast.
01:51:58 Let’s say you’re trying to raise money for your startup or your research lab.
01:52:02 You might want to tell, you know, a grandiose story to investors about how deep learning
01:52:09 is just like the brain and how it can solve all these incredible problems like self driving
01:52:14 and robotics and so on.
01:52:15 And maybe you can tell them that the field is progressing so fast and we are going to
01:52:19 have AGI within 15 years or even 10 years.
01:52:23 And none of this is true.
01:52:25 And every time you’re saying these things and an investor or, you know, a decision maker
01:52:32 believes them, well, this is the equivalent of taking on credit card debt, but for trust,
01:52:41 right?
01:52:42 And maybe this will, you know, this will be what enables you to raise a lot of money,
01:52:50 but ultimately you are creating damage, you are damaging the field.
01:52:54 So that’s the concern, that debt; that’s what happened with the other AI winters.
01:53:00 You actually tweeted about this with autonomous vehicles, right?
01:53:04 Almost every single company now has promised that they will have fully autonomous
01:53:08 vehicles by 2021, 2022.
01:53:11 That’s a good example of the consequences of over hyping the capabilities of AI and
01:53:18 the pace of progress.
01:53:19 So because I’ve worked a lot recently in this area, I have a deep concern about what
01:53:25 happens when all of these companies, after they’ve invested billions, have a meeting and
01:53:30 say, first of all, do we actually have an autonomous vehicle?
01:53:33 The answer will definitely be no.
01:53:35 And second will be, wait a minute, we’ve invested one, two, three, four billion dollars
01:53:40 into this and we made no profit.
01:53:43 And the reaction to that may be to go very hard in other directions that might impact
01:53:49 even other industries.
01:53:50 And that’s what we call an AI winter: when there is backlash where no one believes any
01:53:55 of these promises anymore because they’ve turned out to be big lies the first time
01:53:59 around.
01:54:00 And this will definitely happen to some extent for autonomous vehicles, because around 2015 the public
01:54:06 and decision makers were convinced by these
01:54:13 people who were trying to raise money for their startups and so on that L5 driving was coming
01:54:19 in maybe 2016, maybe 2017, maybe 2018.
01:54:22 Now we’re in 2019, we’re still waiting for it.
01:54:27 And so I don’t believe we are going to have a full on AI winter because we have these
01:54:32 technologies that are producing a tremendous amount of real value.
01:54:37 But there is also too much hype.
01:54:39 So there will be some backlash, especially against the startups
01:54:44 that are trying to sell the dream of AGI and the idea that AGI is going to create
01:54:53 infinite value.
01:54:53 Like AGI is like a free lunch.
01:54:55 Like if you can develop an AI system that passes a certain threshold of IQ or something,
01:55:02 then suddenly you have infinite value.
01:55:04 And well, there are actually lots of investors buying into this idea and they will wait maybe
01:55:14 10, 15 years and nothing will happen.
01:55:17 And the next time around, well, maybe there will be a new generation of investors.
01:55:22 No one will care.
01:55:24 Human memory is fairly short after all.
01:55:27 I don’t know about you, but because I’ve spoken about AGI sometimes poetically, I get a lot
01:55:34 of emails from people, usually large manifestos, where they say
01:55:42 to me that they have created an AGI system or they know how to do it.
01:55:47 And there’s a long write-up of how to do it.
01:55:48 I get a lot of these emails, yeah.
01:55:50 They feel a little bit like they’re generated by an AI system actually, but there’s usually
01:55:57 no diagram, as if you have a transformer generating crank papers about AGI.
01:56:06 So the question is, because you have a good radar for crank
01:56:12 papers, how do we know they’re not onto something?
01:56:16 How do I... So when you start to talk about AGI or anything like the reasoning benchmarks
01:56:24 and so on, something that doesn’t have a benchmark, it’s really difficult to know.
01:56:29 I mean, I talked to Jeff Hawkins, who’s really looking at neuroscience approaches,
01:56:35 and there are echoes of really interesting ideas, at least in Jeff’s case,
01:56:41 which he’s showing.
01:56:43 How do you usually think about this?
01:56:46 Like preventing yourself from being too narrow-minded and elitist about deep learning, thinking it
01:56:52 has to work on these particular benchmarks, otherwise it’s trash.
01:56:56 Well, you know, the thing is, intelligence does not exist in the abstract.
01:57:05 Intelligence has to be applied.
01:57:07 So if you don’t have a benchmark, if you have an improvement in some benchmark, maybe it’s
01:57:11 a new benchmark, right?
01:57:12 Maybe it’s not something we’ve been looking at before, but you do need a problem that
01:57:16 you’re trying to solve.
01:57:17 You’re not going to come up with a solution without a problem.
01:57:20 So, general intelligence, I mean, you’ve clearly highlighted generalization.
01:57:26 If you want to claim that you have an intelligent system, it should come with a benchmark.
01:57:31 It should, yes, it should display capabilities of some kind.
01:57:35 It should show that it can create some form of value, even if it’s a very artificial form
01:57:41 of value.
01:57:42 And that’s also the reason why you don’t actually need to care about telling which papers have
01:57:48 actually some hidden potential and which do not.
01:57:53 Because if there is a new technique that’s actually creating value, this is going to
01:57:59 be brought to light very quickly because it’s actually making a difference.
01:58:02 So it’s the difference between something that is ineffectual and something that is actually
01:58:08 useful.
01:58:08 And ultimately usefulness is our guide, not just in this field, but if you look at science
01:58:14 in general, maybe there are many, many people over the years that have had some really interesting
01:58:19 theories of everything, but they were just completely useless.
01:58:22 And you don’t actually need to tell the interesting theories from the useless theories.
01:58:28 All you need is to see, is this actually having an effect on something else?
01:58:34 Is this actually useful?
01:58:35 Is this making an impact or not?
01:58:37 That’s beautifully put.
01:58:38 I mean, the same applies to quantum mechanics, to string theory, to the holographic principle.
01:58:43 We are doing deep learning because it works.
01:58:46 Before it started working, people considered people working on neural networks very much
01:58:52 as cranks.
01:58:54 No one was working on this anymore.
01:58:56 And now it’s working, which is what makes it valuable.
01:58:59 It’s not about being right.
01:59:01 It’s about being effective.
01:59:02 And nevertheless, the individual entities of this scientific mechanism, just like Yoshua
01:59:08 Bengio or Yann LeCun, while being called cranks, stuck with it.
01:59:12 Right?
01:59:12 Yeah.
01:59:13 And so, as individual agents, even if everyone’s laughing at us, we should just stick with it.
01:59:18 If you believe you have something, you should stick with it and see it through.
01:59:23 That’s a beautiful inspirational message to end on.
01:59:25 Francois, thank you so much for talking today.
01:59:27 That was amazing.
01:59:28 Thank you.