Transcript
00:00:00 The following is a conversation with Tommaso Poggio.
00:00:02 He’s a professor at MIT and is a director of the Center
00:00:06 for Brains, Minds, and Machines.
00:00:08 Cited over 100,000 times, his work
00:00:11 has had a profound impact on our understanding
00:00:14 of the nature of intelligence in both biological and artificial
00:00:18 neural networks.
00:00:19 He has been an advisor to many highly impactful researchers
00:00:23 and entrepreneurs in AI, including
00:00:26 Demis Hassabis of DeepMind, Amnon Shashua of Mobileye,
00:00:29 and Christoph Koch of the Allen Institute for Brain Science.
00:00:34 This conversation is part of the MIT course
00:00:36 on artificial general intelligence
00:00:38 and the artificial intelligence podcast.
00:00:40 If you enjoy it, subscribe on YouTube, iTunes,
00:00:42 or simply connect with me on Twitter
00:00:44 at Lex Fridman, spelled F R I D.
00:00:48 And now, here’s my conversation with Tommaso Poggio.
00:00:52 You’ve mentioned that in your childhood,
00:00:54 you’ve developed a fascination with physics, especially
00:00:57 the theory of relativity.
00:00:59 And that Einstein was also a childhood hero to you.
00:01:04 What aspect of Einstein’s genius, the nature of his genius,
00:01:09 do you think was essential for discovering
00:01:11 the theory of relativity?
00:01:12 You know, Einstein was a hero to me,
00:01:15 and I’m sure to many people, because he
00:01:18 was able to make, of course, a major, major contribution
00:01:23 to physics with, simplifying a bit, just a gedanken experiment,
00:01:31 a thought experiment, you know, imagining communication
00:01:37 with lights between a stationary observer
00:01:41 and somebody on a train.
00:01:43 And I thought, you know, the fact
00:01:47 that just with the force of his thought, of his thinking,
00:01:51 of his mind, he could get to something so deep
00:01:55 in terms of physical reality, how time
00:01:58 depends on space and speed, it was something
00:02:02 absolutely fascinating.
00:02:04 It was the power of intelligence,
00:02:06 the power of the mind.
00:02:08 Do you think the ability to imagine,
00:02:11 to visualize as he did, as a lot of great physicists do,
00:02:15 do you think that’s in all of us human beings?
00:02:18 Or is there something special to that one particular human
00:02:21 being?
00:02:22 I think, you know, all of us can learn and have, in principle,
00:02:30 similar breakthroughs.
00:02:33 There are lessons to be learned from Einstein.
00:02:37 He was one of five PhD students at ETH,
00:02:42 the Eidgenössische Technische Hochschule in Zurich,
00:02:47 in physics.
00:02:48 And he was the worst of the five,
00:02:50 the only one who did not get an academic position when
00:02:55 he graduated, when he finished his PhD.
00:02:57 And he went to work, as everybody knows,
00:03:01 for the patent office.
00:03:02 And so it’s not so much that he worked for the patent office,
00:03:05 but the fact that obviously he was smart,
00:03:08 but he was not a top student. He obviously
00:03:11 was an anti-conformist.
00:03:13 He was not thinking in the traditional way that probably
00:03:17 his teachers and the other students were doing.
00:03:20 So there is a lot to be said about trying
00:03:23 to do the opposite or something quite different from what
00:03:29 other people are doing.
00:03:31 That’s certainly true for the stock market.
00:03:32 Never buy if everybody’s buying.
00:03:36 And also true for science.
00:03:38 Yes.
00:03:39 So you’ve also mentioned, staying
00:03:42 on the theme of physics, that you were excited at a young age
00:03:47 by the mysteries of the universe that physics could uncover.
00:03:51 Such as, I saw mentioned, the possibility of time travel.
00:03:56 So the most out of the box question,
00:03:58 I think I’ll get to ask today, do you
00:04:00 think time travel is possible?
00:04:03 Well, it would be nice if it were possible right now.
00:04:07 In science, you never say no.
00:04:12 But your understanding of the nature of time.
00:04:15 Yeah.
00:04:15 It’s very likely that it’s not possible to travel in time.
00:04:22 We may be able to travel forward in time
00:04:26 if we can, for instance, freeze ourselves or go
00:04:31 on some spacecraft traveling close to the speed of light.
00:04:37 But in terms of actively traveling, for instance,
00:04:40 back in time, I find probably very unlikely.
00:04:45 So do you still hold the underlying dream
00:04:49 of the engineering intelligence that
00:04:52 will build systems that are able to do such huge leaps,
00:04:56 like discovering the kind of mechanism that would be
00:05:01 required to travel through time?
00:05:02 Do you still hold that dream or echoes of it
00:05:05 from your childhood?
00:05:07 Yeah.
00:05:08 I don't know whether there are certain problems that probably
00:05:12 cannot be solved, depending on what you believe
00:05:16 about physical reality, like maybe it's totally impossible
00:05:21 to create energy from nothing or to travel back in time,
00:05:27 but about making machines that can think as well as we do
00:05:35 or better, or more likely, especially
00:05:38 in the short and midterm, help us think better,
00:05:42 which, in a sense, is happening already
00:05:44 with the computers we have.
00:05:46 And it will happen more and more.
00:05:48 But that I certainly believe.
00:05:50 And I don’t see, in principle, why computers at some point
00:05:55 could not become more intelligent than we are,
00:05:59 although the word intelligence is a tricky one
00:06:03 and one we should discuss,
00:06:05 what I mean by that.
00:06:08 Intelligence, consciousness, words like love,
00:06:13 all these need to be disentangled.
00:06:16 So you’ve mentioned also that you believe
00:06:18 the problem of intelligence is the greatest problem
00:06:22 in science, greater than the origin of life
00:06:24 and the origin of the universe.
00:06:27 You’ve also, in the talk I’ve listened to,
00:06:30 said that you’re open to arguments against you.
00:06:34 So what do you think is the most captivating aspect
00:06:40 of this problem of understanding the nature of intelligence?
00:06:43 Why does it captivate you as it does?
00:06:47 Well, originally, I think one of the motivations
00:06:51 that I had as, I guess, a teenager when I was infatuated
00:06:56 with theory of relativity was really
00:06:59 that I found that there was the problem of time and space
00:07:05 and general relativity.
00:07:07 But there were so many other problems
00:07:10 of the same level of difficulty and importance
00:07:13 that I could, even if I were Einstein,
00:07:16 it was difficult to hope to solve all of them.
00:07:19 So what about solving a problem whose solution allowed
00:07:24 me to solve all the problems?
00:07:26 And this was, what if we could find the key to an intelligence
00:07:33 10 times better or faster than Einstein?
00:07:37 So that’s sort of seeing artificial intelligence
00:07:40 as a tool to expand our capabilities.
00:07:43 But is there just an inherent curiosity in you
00:07:47 in just understanding what it is in here
00:07:52 that makes it all work?
00:07:54 Yes, absolutely, you’re right.
00:07:55 So I started saying this was the motivation when
00:07:59 I was a teenager.
00:08:00 But soon after, I think the problem of human intelligence
00:08:07 became a real focus of my science and my research
00:08:15 because I think for me, the most interesting problem
00:08:22 is really asking who we are.
00:08:28 It’s asking not only a question about science,
00:08:31 but even about the very tool we are using to do science, which
00:08:36 is our brain.
00:08:37 How does our brain work?
00:08:39 Where does it come from?
00:08:42 What are its limitations?
00:08:43 Can we make it better?
00:08:46 And that, in many ways, is the ultimate question
00:08:50 that underlies this whole effort of science.
00:08:54 So you’ve made significant contributions
00:08:56 in both the science of intelligence
00:08:58 and the engineering of intelligence.
00:09:02 In a hypothetical way, let me ask,
00:09:05 how far do you think we can get in creating intelligence
00:09:07 systems without understanding the biological,
00:09:12 without understanding how the human brain creates intelligence?
00:09:15 Put another way, do you think we can
00:09:17 build a strong AI system without really getting at the core
00:09:22 understanding the functional nature of the brain?
00:09:25 Well, this is a real difficult question.
00:09:29 We did solve problems like flying
00:09:35 without really using too much our knowledge
00:09:40 about how birds fly.
00:09:44 It was important, I guess, to know that you could have
00:09:48 things heavier than air being able to fly, like birds.
00:09:56 But beyond that, probably we did not learn very much.
00:10:02 The Wright brothers did learn a lot from observations
00:10:06 of birds in designing their aircraft.
00:10:12 But you can argue we did not use much of biology
00:10:16 in that particular case.
00:10:17 Now, in the case of intelligence,
00:10:20 I think that it’s a bit of a bet right now.
00:10:28 If you ask, OK, we all agree we’ll get at some point,
00:10:36 maybe soon, maybe later, to a machine that
00:10:39 is indistinguishable from my secretary,
00:10:42 say, in terms of what I can ask the machine to do.
00:10:47 I think we’ll get there.
00:10:49 And now the question is, you can ask people,
00:10:51 do you think we’ll get there without any knowledge
00:10:54 about the human brain?
00:10:57 Or that the best way to get there
00:10:59 is to understand better the human brain?
00:11:02 OK, this is, I think, an educated bet
00:11:05 that different people with different backgrounds
00:11:09 will decide in different ways.
00:11:11 The recent history of the progress
00:11:14 in AI in the last, I would say, five years or 10 years
00:11:18 has been that the main breakthroughs,
00:11:23 the main recent breakthroughs, really start from neuroscience.
00:11:32 I can mention reinforcement learning as one.
00:11:35 It’s one of the algorithms at the core of AlphaGo,
00:11:41 which is the system that beat the kind of official world
00:11:45 champion of Go, Lee Sedol, two, three years ago in Seoul.
00:11:52 That’s one.
00:11:53 And that started really with the work of Pavlov in 1900,
00:12:00 Marvin Minsky in the 60s, and many other neuroscientists
00:12:05 later on.
00:12:07 And deep learning started, which is at the core, again,
00:12:12 of AlphaGo and systems like autonomous driving
00:12:17 systems for cars, like the systems that Mobileye,
00:12:22 which is a company started by one of my ex postdocs,
00:12:25 Amnon Shashua, did.
00:12:28 So that is at the core of those things.
00:12:30 And deep learning, really, the initial ideas
00:12:34 in terms of the architecture of these layered
00:12:37 hierarchical networks started with work of Torsten Wiesel
00:12:43 and David Hubel at Harvard up the river in the 60s.
00:12:47 So recent history suggests that neuroscience played a big role
00:12:53 in these breakthroughs.
00:12:54 My personal bet is that there is a good chance they continue
00:12:58 to play a big role.
00:12:59 Maybe not in all the future breakthroughs,
00:13:01 but in some of them.
00:13:03 At least in inspiration.
00:13:05 At least in inspiration, absolutely, yes.
00:13:07 So you studied both artificial and biological neural networks.
00:13:12 You said these mechanisms that underlie deep learning
00:13:17 and reinforcement learning.
00:13:19 But there is nevertheless significant differences
00:13:23 between biological and artificial neural networks
00:13:26 as they stand now.
00:13:27 So between the two, what do you find
00:13:30 is the most interesting, mysterious, maybe even
00:13:33 beautiful difference as it currently
00:13:35 stands in our understanding?
00:13:37 I must confess that until recently, I
00:13:41 found the artificial networks too simplistic
00:13:46 relative to real neural networks.
00:13:49 But recently, I’ve been starting to think that, yes,
00:13:53 there is a very big simplification of what
00:13:57 you find in the brain.
00:13:59 But on the other hand, they are much closer
00:14:03 in terms of the architecture to the brain
00:14:07 than other models that we had, that computer science used
00:14:11 as models of thinking, which were mathematical logic, Lisp,
00:14:16 Prolog, and those kinds of things.
00:14:19 So in comparison to those, they’re
00:14:21 much closer to the brain.
00:14:23 You have networks of neurons, which
00:14:26 is what the brain is about.
00:14:27 And the artificial neurons in the models, as I said,
00:14:32 are a caricature of the biological neurons.
00:14:35 But they’re still neurons, single units communicating
00:14:38 with other units, something that is absent
00:14:41 in the traditional computer type models of mathematics,
00:14:48 reasoning, and so on.
00:14:50 So what aspect would you like to see
00:14:53 in artificial neural networks added over time
00:14:57 as we try to figure out ways to improve them?
00:14:59 So one of the main differences and problems
00:15:07 in terms of deep learning today, and it’s not only
00:15:11 deep learning, and the brain, is the need for deep learning
00:15:16 techniques to have a lot of labeled examples.
00:15:23 For instance, for ImageNet, you have
00:15:24 like a training set, which is 1 million images, each one
00:15:29 labeled by some human in terms of which object is there.
00:15:34 And it’s clear that in biology, a baby
00:15:42 may be able to see millions of images
00:15:44 in the first years of life, but will not
00:15:47 have millions of labels given to him or her by parents
00:15:52 or caretakers.
00:15:56 So how do you solve that?
00:15:59 I think there is this interesting challenge
00:16:03 that today, deep learning and related techniques
00:16:08 are all about big data, big data meaning
00:16:11 a lot of examples labeled by humans,
00:16:18 so this big data
00:16:24 is n going to infinity.
00:16:26 That's the best case, n meaning labeled data.
00:16:30 But I think the biological world is more n going to 1.
00:16:34 A child can learn from a very small number
00:16:38 of labeled examples.
00:16:42 Like you tell a child, this is a car.
00:16:44 You don’t need to say, like in ImageNet, this is a car,
00:16:48 this is a car, this is not a car, this is not a car,
00:16:51 1 million times.
00:16:54 And of course, with AlphaGo, or at least the AlphaZero
00:16:57 variants, because the world of Go
00:17:01 is so simplistic that you can actually
00:17:05 learn by yourself through self play,
00:17:06 you can play against yourself.
00:17:08 In the real world, the visual system
00:17:10 that you’ve studied extensively is a lot more complicated
00:17:14 than the game of Go.
00:17:16 On the comment about children, which
00:17:18 are fascinatingly good at learning new stuff,
00:17:23 how much of it do you think is hardware,
00:17:24 and how much of it is software?
00:17:26 Yeah, that’s a good, deep question.
00:17:29 In a sense, it’s the old question of nurture and nature,
00:17:32 how much is in the gene, and how much
00:17:36 is in the experience of an individual.
00:17:41 Obviously, it’s both that play a role.
00:17:44 And I believe that the way evolution
00:17:53 puts in prior information, so to speak hardwired,
00:17:55 is not really hardwired.
00:17:58 But that's essentially a hypothesis.
00:18:02 I think what's going on is that evolution is almost
00:18:10 necessarily, if you believe in Darwin, very opportunistic.
00:18:14 And think about our DNA and the DNA of Drosophila.
00:18:24 Our DNA does not have many more genes than Drosophila.
00:18:28 The fly.
00:18:29 The fly, the fruit fly.
00:18:32 Now, we know that the fruit fly does not
00:18:35 learn very much during its individual existence.
00:18:39 It looks like one of these machines
00:18:42 that is really mostly, not 100%, but 95%,
00:18:47 hardcoded by the genes.
00:18:51 But since we don’t have many more genes than Drosophila,
00:18:55 evolution could encode in us a general learning machinery,
00:19:02 and then had to give very weak priors.
00:19:09 Like, for instance, let me give a specific example,
00:19:15 which is recent work by a member of our Center for Brains,
00:19:18 Minds, and Machines.
00:19:20 We know because of work of other people in our group
00:19:24 and other groups, that there are cells
00:19:26 in a part of our brain, neurons, that are tuned to faces.
00:19:31 They seem to be involved in face recognition.
00:19:33 Now, this face area seems to be present in young children
00:19:41 and adults.
00:19:44 And one question is, is it there from the beginning?
00:19:48 Is it hardwired by evolution?
00:19:51 Or is it somehow learned very quickly?
00:19:55 So what’s your, by the way, a lot of the questions I’m asking,
00:19:58 the answer is we don’t really know.
00:20:00 But as a person who has contributed
00:20:04 some profound ideas in these fields,
00:20:06 you’re a good person to guess at some of these.
00:20:08 So of course, there’s a caveat before a lot of the stuff
00:20:11 we talk about.
00:20:11 But what is your hunch?
00:20:14 Is the face, the part of the brain
00:20:16 that seems to be concentrated on face recognition,
00:20:20 are you born with that?
00:20:21 Or is it just designed to learn that quickly,
00:20:25 like the face of the mother and so on?
00:20:26 My hunch, my bias was the second one, learned very quickly.
00:20:32 And it turns out that Marge Livingstone at Harvard
00:20:37 has done some amazing experiments in which she raised
00:20:41 baby monkeys, depriving them of faces
00:20:45 during the first weeks of life.
00:20:48 So they see technicians, but the technicians have masks.
00:20:53 Yes.
00:20:55 And so when they looked at the area
00:21:02 in the brain of these monkeys that is usually
00:21:05 tuned to faces, they found no face preference.
00:21:10 So my guess is that what evolution does in this case
00:21:16 is that there is an area which
00:21:19 is plastic, which is kind of predetermined
00:21:22 to be imprinted very easily.
00:21:26 But the command from the gene is not a detailed circuitry
00:21:30 for a face template.
00:21:32 It could be, but this would probably require a lot of bits.
00:21:36 You'd have to specify a lot of connections among a lot of neurons.
00:21:39 Instead, the command from the gene
00:21:42 is something like imprint, memorize what you see most
00:21:47 often in the first two weeks of life,
00:21:49 especially in connection with food and maybe nipples.
00:21:53 I don’t know.
00:21:54 Well, source of food.
00:21:55 And so that area is very plastic at first and then solidifies.
00:22:00 It’d be interesting if a variant of that experiment
00:22:03 would show a different kind of pattern associated
00:22:06 with food than a face pattern, whether that could stick.
00:22:10 There are indications that during that experiment,
00:22:14 what the monkeys saw quite often were
00:22:19 the blue gloves of the technicians who were giving
00:22:23 the milk to the baby monkeys.
00:22:25 And some of the cells, instead of being face sensitive
00:22:29 in that area, are hand sensitive.
00:22:33 That’s fascinating.
00:22:35 Can you talk about what are the different parts of the brain
00:22:40 and, in your view, sort of loosely,
00:22:43 and how do they contribute to intelligence?
00:22:45 Do you see the brain as a bunch of different modules,
00:22:49 and they together come in the human brain
00:22:52 to create intelligence?
00:22:53 Or is it all one mush of the same kind
00:22:59 of fundamental architecture?
00:23:04 Yeah, that’s an important question.
00:23:08 And there was a phase in neuroscience back in the 1950s
00:23:15 or so in which it was believed for a while
00:23:19 that the brain was equipotential.
00:23:21 This was the term.
00:23:22 You could cut out a piece, and nothing special
00:23:28 happened, apart from a little bit less performance.
00:23:32 There was a surgeon, Lashley, who
00:23:37 did a lot of experiments of this type with mice and rats
00:23:41 and concluded that every part of the brain
00:23:45 was essentially equivalent to any other one.
00:23:51 It turns out that that’s really not true.
00:23:56 There are very specific modules in the brain, as you said.
00:24:00 And people may lose the ability to speak
00:24:05 if they have a stroke in a certain region,
00:24:07 or may lose control of their legs in another region.
00:24:12 So they’re very specific.
00:24:14 The brain is also quite flexible and redundant,
00:24:17 so often it can correct things and take over functions
00:24:27 from one part of the brain to the other.
00:24:29 But really, there are specific modules.
00:24:33 So the answer that we know from this old work, which
00:24:40 was basically based on lesions, either on animals,
00:24:44 or very often there was a mine of very interesting data
00:24:52 coming from the war, from different types of injuries
00:25:00 that soldiers had in the brain.
00:25:03 And more recently, functional MRI,
00:25:09 which allows you to check which parts of the brain
00:25:13 are active when you are doing different tasks,
00:25:21 can replace some of this.
00:25:23 You can see that certain parts of the brain are involved,
00:25:27 are active in certain tasks.
00:25:29 Vision, language, yeah, that’s right.
00:25:32 But sort of taking a step back to that part of the brain
00:25:36 that discovers that specializes in the face
00:25:39 and how that might be learned, what's your intuition behind it?
00:25:45 Is it possible that from a physicist perspective,
00:25:48 when you get lower and lower, that it’s all the same stuff
00:25:51 and it just, when you’re born, it’s plastic
00:25:54 and quickly figures out this part is going to be about vision,
00:25:58 this is going to be about language,
00:25:59 this is about common sense reasoning.
00:26:02 Do you have an intuition that that kind of learning
00:26:05 is going on really quickly, or is it really
00:26:07 kind of solidified in hardware?
00:26:09 That’s a great question.
00:26:11 So there are parts of the brain like the cerebellum
00:26:16 or the hippocampus that are quite different from each other.
00:26:21 They clearly have different anatomy,
00:26:23 different connectivity.
00:26:26 Then there is the cortex, which is the most developed part
00:26:33 of the brain in humans.
00:26:36 And in the cortex, you have different regions
00:26:39 of the cortex that are responsible for vision,
00:26:43 for audition, for motor control, for language.
00:26:47 Now, one of the big puzzles of this
00:26:50 is that in the cortex, the cortex is the cortex.
00:26:55 Looks like it is the same in terms of hardware,
00:27:00 in terms of type of neurons and connectivity
00:27:05 across these different modalities.
00:27:08 So for the cortex, I think, setting aside these other parts
00:27:13 of the brain like spinal cord, hippocampus,
00:27:15 cerebellum, and so on, for the cortex,
00:27:18 I think your question about hardware and software
00:27:21 and learning and so on, I think is rather open.
00:27:28 And I find it very interesting
00:27:33 to think about an architecture, a computer architecture, that
00:27:36 is good for vision and at the same time is good for language.
00:27:41 They seem to be such different problem areas that you have to solve.
00:27:49 But the underlying mechanism might be the same.
00:27:51 And that’s really instructive for artificial neural networks.
00:27:55 So we’ve done a lot of great work in vision,
00:27:58 in human vision, computer vision.
00:28:01 And you mentioned the problem of human vision
00:28:03 is really as difficult as the problem of general intelligence.
00:28:07 And maybe that connects to the cortex discussion.
00:28:11 Can you describe the human visual cortex
00:28:15 and how the humans begin to understand the world
00:28:20 through the raw sensory information?
00:28:22 What’s, for folks who are not familiar,
00:28:27 especially on the computer vision side,
00:28:30 we don’t often actually take a step back except saying
00:28:33 with a sentence or two that one is inspired by the other.
00:28:36 What is it that we know about the human visual cortex?
00:28:40 That’s interesting.
00:28:40 We know quite a bit.
00:28:41 At the same time, we don’t know a lot.
00:28:43 But the bit we know, in a sense, we know a lot of the details.
00:28:50 And many we don’t know.
00:28:53 And we know a lot of the top level,
00:28:58 the answer to the top level question.
00:29:00 But we don’t know some basic ones,
00:29:02 even in terms of general neuroscience, forgetting vision.
00:29:06 Why do we sleep?
00:29:08 It’s such a basic question.
00:29:11 And we really don’t have an answer to that.
00:29:15 So taking a step back on that.
00:29:17 So sleep, for example, is fascinating.
00:29:18 Do you think that’s a neuroscience question?
00:29:22 Or if we talk about abstractions, what do you
00:29:25 think is an interesting way to study intelligence
00:29:28 or most effective on the levels of abstraction?
00:29:30 Is it chemical, is it biological,
00:29:33 is it electrophysical, mathematical,
00:29:35 as you’ve done a lot of excellent work on that side?
00:29:37 Or psychology? At which level of abstraction, do you think?
00:29:43 Well, in terms of levels of abstraction,
00:29:46 I think we need all of them.
00:29:50 It’s like if you ask me, what does it
00:29:54 mean to understand a computer?
00:29:57 That’s much simpler.
00:29:58 But in a computer, I could say, well,
00:30:01 I understand how to use PowerPoint.
00:30:04 That’s my level of understanding a computer.
00:30:08 It is reasonable.
00:30:09 It gives me some power to produce slides
00:30:11 and beautiful slides.
00:30:14 Now, you can ask somebody else.
00:30:17 He says, well, I know how the transistors work
00:30:19 that are inside the computer.
00:30:21 I can write the equation for transistor and diodes
00:30:25 and circuits, logical circuits.
00:30:29 And I can ask this guy, do you know how to operate PowerPoint?
00:30:32 No idea.
00:30:34 So if we discovered computers walking amongst us
00:30:39 full of these transistors that are also operating
00:30:43 under Windows and have PowerPoint,
00:30:45 digging in a little bit more:
00:30:49 How useful is it to understand the transistor in order
00:30:53 to be able to understand PowerPoint
00:30:58 and these higher level intelligent processes?
00:31:00 So I think in the case of computers,
00:31:03 because they were made by engineers, by us,
00:31:06 these different levels of understanding
00:31:09 are rather separate on purpose.
00:31:13 They are separate modules so that the engineer that
00:31:17 designed the circuit for the chips does not
00:31:19 need to know what is inside PowerPoint.
00:31:23 And somebody can write the software translating
00:31:27 from one to the other.
00:31:30 So in that case, I don’t think understanding the transistor
00:31:36 helps you understand PowerPoint, or very little.
00:31:41 If you want to understand the computer, this question,
00:31:43 I would say you have to understand it
00:31:45 at different levels.
00:31:46 If you really want to build one, right?
00:31:51 But for the brain, I think these levels of understanding,
00:31:57 so the algorithms, which kind of computation,
00:32:00 the equivalent of PowerPoint, and the circuits,
00:32:04 the transistors, I think they are much more
00:32:07 intertwined with each other.
00:32:09 There is not a neat level of the software separate
00:32:14 from the hardware.
00:32:15 And so that’s why I think in the case of the brain,
00:32:20 the problem is more difficult and, more than for computers,
00:32:23 requires the interaction, the collaboration
00:32:26 between different types of expertise.
00:32:30 The brain is a big hierarchical mess.
00:32:32 You can’t just disentangle levels.
00:32:35 I think you can, but it’s much more difficult.
00:32:37 And it’s not completely obvious.
00:32:40 And as I said, personally,
00:32:44 I think it is the greatest problem in science.
00:32:47 So I think it’s fair that it’s difficult.
00:32:51 That’s a difficult one.
00:32:53 That said, you do talk about compositionality
00:32:56 and why it might be useful.
00:32:58 And when you discuss why these neural networks,
00:33:01 in artificial or biological sense, learn anything,
00:33:05 you talk about compositionality.
00:33:07 See, there’s a sense that nature can be disentangled.
00:33:13 Or, well, all aspects of our cognition
00:33:19 could be disentangled to some degree.
00:33:22 So why do you think, first of all,
00:33:25 how do you see compositionality?
00:33:27 And why do you think it exists at all in nature?
00:33:31 I spoke about, I use the term compositionality
00:33:39 when we looked at deep neural networks, multilayers,
00:33:45 and trying to understand when and why they are more powerful
00:33:50 than more classical one layer networks,
00:33:54 like linear classifiers and so-called kernel machines.
00:34:01 And what we found is that in terms
00:34:05 of approximating or learning or representing
00:34:08 a function, a mapping from an input to an output,
00:34:12 like from an image to the label in the image,
00:34:16 if this function has a particular structure,
00:34:20 then deep networks are much more powerful than shallow networks
00:34:26 to approximate the underlying function.
00:34:28 And the particular structure is a structure of compositionality.
00:34:33 If the function is made up of functions of functions,
00:34:38 so that when you are interpreting an image,
00:34:45 classifying an image, you don’t need
00:34:47 to look at all pixels at once.
00:34:51 But you can compute something from small groups of pixels.
00:34:57 And then you can compute something
00:34:59 on the output of this local computation and so on,
00:35:04 which is similar to what you do when you read a sentence.
00:35:07 You don’t need to read the first and the last letter.
00:35:11 But you can read syllables, combine them in words,
00:35:16 combine the words in sentences.
00:35:18 So this is this kind of structure.
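To make the structure concrete, here is a minimal sketch (an illustrative example added for reference, not something stated in the conversation) of a compositional function of eight variables built from constituent functions of two variables each, the kind of hierarchy that a deep, locally connected network can mirror layer by layer:

```latex
f(x_1,\dots,x_8) = h_3\Bigl(
    h_{21}\bigl(h_{11}(x_1,x_2),\, h_{12}(x_3,x_4)\bigr),\;
    h_{22}\bigl(h_{13}(x_5,x_6),\, h_{14}(x_7,x_8)\bigr)
\Bigr)
```

Each constituent function h looks at only two inputs, like a small patch of pixels or a pair of syllables, while a shallow network would have to approximate f as a single function of all eight variables at once.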
00:35:21 So that’s as part of a discussion
00:35:22 of why deep neural networks may be more
00:35:26 effective than the shallow methods.
00:35:27 And is your sense, for most things
00:35:31 we can use neural networks for, those problems
00:35:37 are going to be compositional in nature, like language,
00:35:42 like vision?
00:35:44 How far can we get in this kind of way?
00:35:47 So here is almost philosophy.
00:35:51 Well, let’s go there.
00:35:53 Yeah, let’s go there.
00:35:54 So a friend of mine, Max Tegmark, who is a physicist at MIT.
00:36:00 I’ve talked to him on this thing.
00:36:01 Yeah, and he disagrees with you, right?
00:36:03 A little bit.
00:36:04 Yeah, we agree on most.
00:36:07 But the conclusion is a bit different.
00:36:10 His conclusion is that for images, for instance,
00:36:14 the compositional structure of this function
00:36:19 that we have to learn or to solve these problems
00:36:23 comes from physics, comes from the fact
00:36:27 that you have local interactions in physics
00:36:31 between atoms and other atoms, between particles
00:36:37 of matter and other particles, between planets
00:36:41 and other planets, between stars and other stars.
00:36:44 It’s all local.
00:36:48 And that’s true.
00:36:51 But you could push this argument a bit further.
00:36:56 Not this argument, actually.
00:36:57 You could argue that maybe that’s part of the truth.
00:37:02 But maybe what happens is kind of the opposite,
00:37:06 is that our brain is wired up as a deep network.
00:37:11 So it can learn, understand, solve
00:37:18 problems that have this compositional structure
00:37:22 and it cannot solve problems that don’t have
00:37:27 this compositional structure.
00:37:29 So the problems we are accustomed to, we think about,
00:37:34 we test our algorithms on, have this compositional structure
00:37:40 because of how our brain is made up.
00:37:42 And that’s, in a sense, an evolutionary perspective
00:37:45 that we’ve.
00:37:46 So the ones that didn’t have, that weren’t
00:37:50 dealing with the compositional nature of reality died off?
00:37:55 Yes, but also could be maybe the reason
00:38:00 why we have this local connectivity in the brain,
00:38:05 like simple cells in cortex looking
00:38:08 only at the small part of the image, each one of them,
00:38:11 and then other cells looking at the small number
00:38:14 of these simple cells and so on.
00:38:16 The reason for this may be purely
00:38:19 that it was difficult to grow long range connectivity.
00:38:25 So suppose that, for biology,
00:38:28 it's possible to grow short-range connectivity but not
00:38:34 long-range, also because there is a limited number of long-range
00:38:38 connections that you can have.
00:38:39 And so you have this limitation from the biology.
00:38:45 And this means you build a deep convolutional network.
00:38:50 This would be something like a deep convolutional network.
00:38:53 And this is great for solving certain class of problems.
00:38:57 These are the ones we find easy and important for our life.
00:39:02 And yes, they were enough for us to survive.
00:39:07 And you can start a successful business
00:39:10 on solving those problems with Mobileye.
00:39:14 Driving is a compositional problem.
00:39:17 So on the learning task, we don’t
00:39:21 know much about how the brain learns
00:39:24 in terms of optimization.
00:39:26 So stochastic gradient descent
00:39:29 is what artificial neural networks use for the most part
00:39:33 to adjust the parameters in such a way that,
00:39:37 based on the labeled data,
00:39:40 they're able to solve the problem.
00:39:42 So what’s your intuition about why it works at all?
00:39:50 How hard of a problem is it to optimize
00:39:53 a neural network, artificial neural network?
00:39:56 Are there other alternatives?
00:39:58 Just in general, what's your intuition
00:40:01 behind this very simplistic algorithm
00:40:03 that seems to do pretty well, surprisingly so?
00:40:06 Yes.
00:40:07 So I find neuroscience, the architecture of cortex,
00:40:13 is really similar to the architecture of deep networks.
00:40:17 So there is a nice correspondence there
00:40:20 between the biology and this kind
00:40:23 of local connectivity, hierarchical architecture.
00:40:28 The stochastic gradient descent, as you said,
00:40:30 is a very simple technique.
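For reference, a minimal sketch of the technique (illustrative NumPy code; the toy least-squares problem, learning rate, and batch size are all invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: targets produced by a "true" linear map plus a little noise.
X = rng.normal(size=(1000, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=1000)

w = np.zeros(10)   # parameters to be adjusted
lr = 0.01          # learning rate
batch_size = 32

for step in range(2000):
    # Pick a random mini-batch: the "stochastic" part.
    idx = rng.integers(0, len(X), size=batch_size)
    xb, yb = X[idx], y[idx]
    # Gradient of the mean squared error on this mini-batch.
    grad = 2 * xb.T @ (xb @ w - yb) / batch_size
    # Take a small step downhill.
    w -= lr * grad

print("error in recovered weights:", np.linalg.norm(w - w_true))
```

The whole method is the loop: sample a batch, compute a gradient, take a small step.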
00:40:35 It seems pretty unlikely that biology could do that
00:40:41 from what we know right now about cortex and neurons
00:40:47 and synapses.
00:40:50 So it’s a big question open whether there
00:40:53 are other optimization learning algorithms that
00:40:59 can replace stochastic gradient descent.
00:41:02 And my guess is yes, but nobody has found yet a real answer.
00:41:11 I mean, people are trying, still trying,
00:41:13 and there are some interesting ideas.
00:41:18 The fact that stochastic gradient descent
00:41:22 is so successful, this has become clearly not so
00:41:26 mysterious.
00:41:27 And the reason is an interesting fact.
00:41:33 It’s a change, in a sense, in how
00:41:36 people think about statistics.
00:41:39 And it is the following: typically, when
00:41:45 you have data and you have, say, a model with parameters,
00:41:51 you are trying to fit the model to the data,
00:41:54 to fit the parameters.
00:41:55 Typically, the kind of crowd wisdom type idea
00:42:04 was that you should have at least twice as many data
00:42:09 points as the number of parameters.
00:42:12 Maybe 10 times is better.
00:42:15 Now, the way you train neural networks these days
00:42:19 is that they have 10 or 100 times more parameters
00:42:23 than data, exactly the opposite.
00:42:26 And it has been one of the puzzles about neural networks.
00:42:34 How can you get something that really works
00:42:37 when you have so much freedom?
00:42:40 From that little data, it can generalize somehow.
00:42:43 Right, exactly.
00:42:44 Do you think the stochastic nature of it
00:42:46 is essential, the randomness?
00:42:48 So I think we have some initial understanding
00:42:50 why this happens.
00:42:52 But one nice side effect of having
00:42:56 this overparameterization, more parameters than data,
00:43:00 is that when you look for the minima of a loss function,
00:43:04 like stochastic gradient descent is doing,
00:43:08 you find... I made some calculations based
00:43:12 on an old basic theorem of algebra called Bézout's
00:43:19 theorem, which gives you an estimate of the number
00:43:23 of solutions of a system of polynomial equations.
00:43:25 Anyway, the bottom line is that there are probably
00:43:30 more minima for a typical deep network
00:43:36 than atoms in the universe.
00:43:39 Just to say, there are a lot because
00:43:42 of the overparameterization.
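For reference, the classical bound being alluded to, in its standard textbook form (stated from general algebra, not quoted from the conversation):

```latex
% Bézout bound: a system of n polynomial equations in n unknowns,
% with degrees d_1, \dots, d_n, has at most
N_{\text{isolated solutions}} \;\le\; d_1 \, d_2 \cdots d_n
% e.g. 1000 quadratic equations: at most 2^{1000} \approx 10^{301}
% isolated solutions, already far more than the roughly 10^{80}
% atoms in the observable universe.
```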
00:43:44 More global minima, zero minima, good minima?
00:43:50 More global minima.
00:43:51 Yeah, a lot of them.
00:43:53 So you have a lot of solutions.
00:43:54 So it’s not so surprising that you can find them
00:43:57 relatively easily.
00:44:00 And this is because of the overparameterization.
00:44:04 The overparameterization sprinkles that entire space
00:44:07 with solutions that are pretty good.
00:44:09 It’s not so surprising, right?
00:44:11 It’s like if you have a system of linear equation
00:44:14 and you have more unknowns than equations, then you have,
00:44:18 we know, you have an infinite number of solutions.
00:44:22 And the question is to pick one.
00:44:24 That’s another story.
00:44:25 But you have an infinite number of solutions.
00:44:27 So there are a lot of values of your unknowns
00:44:31 that satisfy the equations.
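A small sketch of that analogy (illustrative NumPy only; the sizes are arbitrary): an underdetermined system with fewer equations than unknowns has infinitely many exact solutions, and a standard solver simply returns one of them, the minimum-norm one.

```python
import numpy as np

rng = np.random.default_rng(1)

# 5 equations, 50 unknowns: heavily "overparameterized".
A = rng.normal(size=(5, 50))
b = rng.normal(size=5)

# lstsq returns the minimum-norm solution among the infinitely many exact ones.
x_min_norm, *_ = np.linalg.lstsq(A, b, rcond=None)

# Adding any vector from the null space of A gives another exact solution.
v = rng.normal(size=50)
v -= A.T @ np.linalg.solve(A @ A.T, A @ v)   # project v onto the null space of A
x_other = x_min_norm + v

print(np.allclose(A @ x_min_norm, b))   # True
print(np.allclose(A @ x_other, b))      # True: a different, equally exact solution
```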
00:44:33 But it’s possible that there’s a lot of those solutions that
00:44:36 aren’t very good.
00:44:37 What’s surprising is that they’re pretty good.
00:44:39 So that’s a good question.
00:44:40 Why can you pick one that generalizes well?
00:44:42 Yeah.
00:44:44 That’s a separate question with separate answers.
00:44:47 One theorem that people like to talk about that kind of
00:44:51 inspires imagination of the power of neural networks
00:44:53 is the universality, universal approximation theorem,
00:44:57 that you can approximate any computable function
00:45:00 with just a finite number of neurons
00:45:02 in a single hidden layer.
00:45:04 Do you find this theorem surprising?
00:45:07 Do you find it useful, interesting, inspiring?
00:45:12 No, this one, I never found it very surprising.
00:45:16 It was known since the 80s, since I entered the field,
00:45:22 because it’s basically the same as Weierstrass theorem, which
00:45:27 says that I can approximate any continuous function
00:45:32 with a polynomial of sufficiently,
00:45:34 with a sufficient number of terms, monomials.
00:45:38 So basically the same.
00:45:39 And the proofs are very similar.
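For reference, the two statements being compared, in their standard textbook forms (paraphrased here, not quoted from the conversation):

```latex
% Weierstrass approximation theorem: for any continuous f on [a,b]
% and any \varepsilon > 0, there is a polynomial p with
\sup_{x \in [a,b]} |f(x) - p(x)| < \varepsilon .

% Universal approximation with one hidden layer: for a suitable
% nonlinearity \sigma, any continuous f on a compact set K, and any
% \varepsilon > 0, there exist N, weights w_i, b_i and coefficients c_i with
\sup_{x \in K} \Bigl| f(x) - \sum_{i=1}^{N} c_i \,\sigma(w_i \cdot x + b_i) \Bigr| < \varepsilon .
```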
00:45:41 So your intuition was there was never
00:45:43 any doubt that neural networks in theory
00:45:45 could be very strong approximators.
00:45:48 Right.
00:45:48 The question, the interesting question,
00:45:50 is that if this theorem says you can approximate, fine.
00:45:58 But when you ask how many neurons, for instance,
00:46:03 or in the case of polynomials, how many monomials
00:46:06 do I need to get a good approximation?
00:46:11 Then it turns out that that depends
00:46:15 on the dimensionality of your function,
00:46:18 how many variables you have.
00:46:20 But it depends on the dimensionality
00:46:22 of your function in a bad way.
00:46:25 For instance, suppose you want
00:46:28 an error which is no worse than 10% in your approximation.
00:46:35 You come up with a network that approximates your function
00:46:38 within 10%.
00:46:40 Then it turns out that the number of units you need
00:46:44 are in the order of 10 to the dimensionality, d,
00:46:48 how many variables.
00:46:50 So if you have two variables, d equals two,
00:46:54 you have 100 units, and OK.
00:46:57 But if you have, say, 200 by 200 pixel images,
00:47:02 now this is 40,000, whatever.
00:47:06 We again go to the size of the universe pretty quickly.
00:47:09 Exactly, 10 to the 40,000 or something.
00:47:14 And so this is called the curse of dimensionality,
00:47:18 not quite appropriately.
00:47:22 And the hope is with the extra layers,
00:47:24 you can remove the curse.
00:47:28 What we proved is that if you have deep layers,
00:47:32 hierarchical architecture with the local connectivity
00:47:36 of the type of convolutional deep learning,
00:47:39 and if you’re dealing with a function that
00:47:42 has this kind of hierarchical architecture,
00:47:46 then you avoid completely the curse.
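In rough form, the counting argument sketched above (smoothness factors and constants are omitted here; this is a simplified paraphrase of the published bounds, not an exact statement):

```latex
% Shallow network approximating a generic function of d variables
% to accuracy \varepsilon:
N_{\text{shallow}} = O\!\left(\varepsilon^{-d}\right)
% e.g. \varepsilon = 0.1: d = 2 needs on the order of 10^2 units,
% while d = 40{,}000 needs on the order of 10^{40{,}000}.

% Deep, hierarchically local network approximating a compositional
% function whose constituent functions each depend on at most 2 variables:
N_{\text{deep}} = O\!\left((d-1)\,\varepsilon^{-2}\right)
```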
00:47:50 You’ve spoken a lot about supervised deep learning.
00:47:54 What are your thoughts, hopes, views
00:47:56 on the challenges of unsupervised learning
00:47:59 with GANs, with Generative Adversarial Networks?
00:48:05 Do you see those as distinct?
00:48:08 The power of GANs, do you see those
00:48:09 as distinct from supervised methods in neural networks,
00:48:13 or are they really all in the same representation ballpark?
00:48:16 GANs are one way to get an estimation of probability
00:48:24 densities, which is a somewhat new way that people have not
00:48:28 done before.
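As context, a minimal sketch of the adversarial setup (illustrative PyTorch code on a one-dimensional toy distribution; the architectures and hyperparameters are invented): a generator maps noise to samples, a discriminator tries to tell real samples from generated ones, and the two are trained against each other, which implicitly models the data distribution.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# "Real" data: samples from a 1-D Gaussian the generator should imitate.
def real_batch(n):
    return 3.0 + 0.5 * torch.randn(n, 1)

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))   # noise -> sample
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # sample -> real/fake logit

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(5000):
    # Discriminator step: push real samples toward 1, generated samples toward 0.
    x_real = real_batch(64)
    x_fake = G(torch.randn(64, 8)).detach()
    loss_d = bce(D(x_real), torch.ones(64, 1)) + bce(D(x_fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: try to make the discriminator output 1 on generated samples.
    x_fake = G(torch.randn(64, 8))
    loss_g = bce(D(x_fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

samples = G(torch.randn(1000, 8))
print("generated mean/std:", samples.mean().item(), samples.std().item())
# Should drift toward the real mean 3.0 and std 0.5 as training progresses.
```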
00:48:30 I don’t know whether this will really play an important role
00:48:36 in intelligence.
00:48:39 Or it’s interesting.
00:48:43 I’m less enthusiastic about it than many people in the field.
00:48:48 I have the feeling that many people in the field
00:48:50 are really impressed by the ability
00:48:54 of producing realistic looking images in this generative way.
00:49:01 Which explains the popularity of the methods.
00:49:03 But you’re saying that while that’s exciting and cool
00:49:06 to look at, it may not be the tool that’s useful for it.
00:49:11 So you describe it kind of beautifully.
00:49:13 Current supervised methods go n to infinity
00:49:16 in terms of number of labeled points.
00:49:18 And we really have to figure out how to go to n to 1.
00:49:21 And you’re thinking GANs might help,
00:49:23 but they might not be the right.
00:49:25 I don’t think for that problem, which I really think
00:49:28 is important, I think they may help.
00:49:32 They certainly have applications,
00:49:33 for instance, in computer graphics.
00:49:35 And I did work long ago, which was
00:49:41 a little bit similar in terms of saying, OK, I have a network.
00:49:47 And I present images.
00:49:49 Its input is images,
00:49:54 and its output is, for instance, the pose of the image:
00:49:57 a face, how much it is smiling, whether it is rotated 45 degrees or not.
00:50:02 What about having a network that I train with the same data
00:50:07 set, but now I invert input and output.
00:50:10 Now the input is the pose or the expression, a number,
00:50:15 set of numbers.
00:50:16 And the output is the image.
00:50:18 And I train it.
00:50:20 And we did pretty good, interesting results
00:50:22 in terms of producing very realistic looking images.
00:50:27 It was a less sophisticated mechanism,
00:50:31 less sophisticated than GANs,
00:50:35 but the output was pretty much of the same quality.
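A rough sketch of the idea as described, with every detail invented for illustration (toy "images" generated from a three-number "pose", and small PyTorch networks): the same paired dataset is used twice, once to train an analysis network from image to pose, and once with input and output swapped to train a synthesis network from pose back to image.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy paired dataset: each 64-dim "image" is generated from a 3-dim "pose"
# (say rotation, smile amount, lighting); the generating map is made up.
poses = torch.rand(500, 3)
mixing = torch.randn(3, 64)
images = torch.tanh(poses @ mixing) + 0.01 * torch.randn(500, 64)

def train(net, inputs, targets, steps=2000):
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        loss = nn.functional.mse_loss(net(inputs), targets)
        opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

analysis  = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 3))   # image -> pose
synthesis = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 64))   # pose -> image

print("image -> pose loss:", train(analysis, images, poses))
print("pose -> image loss:", train(synthesis, poses, images))  # same data, swapped

# Synthesize a new "image" for a pose that was never seen during training.
new_image = synthesis(torch.tensor([[0.2, 0.9, 0.5]]))
```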
00:50:38 So I think for a computer graphics type application,
00:50:43 yeah, definitely GANs can be quite useful.
00:50:46 And not only for that, but for helping,
00:50:52 for instance, on this problem of unsupervised learning,
00:50:58 of reducing the number of labeled examples.
00:51:02 I think people, it’s like they think they can get out
00:51:07 more than they put in.
00:51:11 There’s no free lunch, as you said.
00:51:14 What do you think, what’s your intuition?
00:51:17 How can we slow the growth of N to infinity
00:51:22 in supervised learning?
00:51:25 So for example, Mobileye has very successfully,
00:51:29 I mean, essentially annotated large amounts of data
00:51:33 to be able to drive a car.
00:51:34 Now one thought is, so we’re trying
00:51:37 to teach machines, school of AI.
00:51:41 And we’re trying to, so how can we become better teachers,
00:51:45 maybe?
00:51:46 That’s one way.
00:51:47 No, I like that.
00:51:51 Because again, one caricature of the history of computer
00:51:57 science, you could say, begins with programmers, expensive.
00:52:05 Continues with labelers, cheap.
00:52:09 And the future will be schools, like we have for kids.
00:52:14 Yeah.
00:52:16 Currently, with the labeling methods, we're not
00:52:20 selective about which examples we teach networks with.
00:52:25 So I think the focus of making networks that learn much faster
00:52:31 is often on the architecture side.
00:52:33 But how can we pick better examples with which to learn?
00:52:37 Do you have intuitions about that?
00:52:39 Well, that’s part of the problem.
00:52:42 But the other one is, if we look at biology,
00:52:50 a reasonable assumption, I think,
00:52:52 is in the same spirit that I said,
00:52:58 evolution is opportunistic and has weak priors.
00:53:03 The way I think the intelligence of a child,
00:53:08 the baby may develop is by bootstrapping weak priors
00:53:16 from evolution.
00:53:17 For instance, you can assume that you
00:53:24 have in most organisms, including human babies,
00:53:28 built in some basic machinery to detect motion
00:53:35 and relative motion.
00:53:38 And in fact, we know all insects from fruit flies
00:53:42 to other animals, they have this,
00:53:49 even in the retinas, in the very peripheral part.
00:53:53 It’s very conserved across species, something
00:53:56 that evolution discovered early.
00:53:59 It may be the reason why babies tend
00:54:01 to look, in the first few days, at moving objects
00:54:06 and not at non-moving objects.
00:54:08 Now, moving objects means, OK, they’re attracted by motion.
00:54:12 But motion also means that motion
00:54:15 gives automatic segmentation from the background.
00:54:20 So because of motion boundaries, either the object
00:54:25 is moving or the eye of the baby is tracking the moving object
00:54:30 and the background is moving, right?
00:54:32 Yeah, so just purely on the visual characteristics
00:54:36 of the scene, that seems to be the most useful.
00:54:37 Right, so it’s like looking at an object without background.
00:54:43 It’s ideal for learning the object.
00:54:45 Otherwise, it’s really difficult because you
00:54:48 have so much stuff.
00:54:50 So suppose you do this at the beginning, first weeks.
00:54:55 Then after that, you can recognize objects.
00:54:58 Now they are imprinted, the number one,
00:55:02 even in the background, even without motion.
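A minimal sketch of why motion gives segmentation essentially for free (illustrative NumPy only, with a synthetic pair of frames): subtracting two consecutive frames cancels the static background, and thresholding the difference leaves a mask around the moving object.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic 64x64 frames: the same textured background,
# plus a bright square "object" that shifts a few pixels between frames.
background = rng.uniform(0.0, 0.3, size=(64, 64))
frame1, frame2 = background.copy(), background.copy()
frame1[20:30, 20:30] += 0.7   # object in the first frame
frame2[20:30, 24:34] += 0.7   # object moved to the right in the second frame

# Motion "segmentation": pixels whose intensity changed between the frames.
motion_mask = np.abs(frame2 - frame1) > 0.3

rows, cols = np.nonzero(motion_mask)
print("moving-object bounding box:",
      rows.min(), rows.max(), cols.min(), cols.max())
# Roughly rows 20-29 and columns 20-33: the region swept by the moving square.
```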
00:55:05 So that’s, by the way, I just want
00:55:08 to ask on the object recognition problem.
00:55:10 So there is this being responsive to movement
00:55:13 and doing edge detection, essentially.
00:55:16 What’s the gap between being effective at visually
00:55:21 recognizing stuff, detecting where it is,
00:55:24 and understanding the scene?
00:55:27 Is this a huge gap in many layers, or is it close?
00:55:32 No, I think that’s a huge gap.
00:55:35 I think present algorithms, with all the success that we have
00:55:42 and the fact that they are very useful,
00:55:45 I think we are in a golden age for applications
00:55:48 of low level vision and low level speech recognition
00:55:53 and so on, Alexa and so on.
00:55:56 There are many more things of similar level
00:55:58 to be done, including medical diagnosis and so on.
00:56:02 But we are far from what we call understanding
00:56:05 of a scene, of language, of actions, of people.
00:56:11 That is, despite the claims, that’s, I think, very far.
00:56:18 We’re a little bit off.
00:56:19 So in popular culture and among many researchers,
00:56:23 some of whom I've spoken with, like Stuart Russell
00:56:25 and Elon Musk, in and out of the AI field,
00:56:30 there’s a concern about the existential threat of AI.
00:56:34 And how do you think about this concern?
00:56:40 And is it valuable to think about large scale, long term,
00:56:45 unintended consequences of intelligent systems
00:56:50 we try to build?
00:56:51 I always think it’s better to worry first, early,
00:56:56 rather than late.
00:56:58 So worry is good.
00:56:59 Yeah.
00:57:00 I’m not against worrying at all.
00:57:03 Personally, I think that it will take a long time
00:57:09 before there is real reason to be worried.
00:57:15 But as I said, I think it’s good to put in place
00:57:19 and think about possible safeguards.
00:57:24 What I find a bit misleading are things
00:57:27 that have been said by people I know, like Elon Musk,
00:57:31 and Bostrom in particular,
00:57:35 and what is his first name?
00:57:36 Nick Bostrom.
00:57:37 Nick Bostrom, right.
00:57:40 And a couple of other people, saying, for instance, that AI
00:57:44 is more dangerous than nuclear weapons.
00:57:46 I think that’s really wrong.
00:57:50 That can be misleading.
00:57:52 Because in terms of priority, we should still
00:57:56 be more worried about nuclear weapons
00:57:59 and what people are doing about it and so on than AI.
00:58:05 And you’ve spoken about Demis Hassabis
00:58:09 and yourself saying that you think
00:58:12 you’ll be about 100 years out before we
00:58:16 have a general intelligence system that’s
00:58:18 on par with a human being.
00:58:20 Do you have any updates for those predictions?
00:58:22 Well, I think he said.
00:58:24 He said 20, I think.
00:58:25 He said 20, right.
00:58:26 This was a couple of years ago.
00:58:27 I have not asked him again.
00:58:29 So should I have?
00:58:31 Your own prediction, what’s your prediction
00:58:36 about when you’ll be truly surprised?
00:58:38 And what’s the confidence interval on that?
00:58:43 It’s so difficult to predict the future and even
00:58:45 the present sometimes.
00:58:47 It’s pretty hard to predict.
00:58:48 But I would be, as I said, this is completely...
00:58:53 I would be more like Rod Brooks.
00:58:56 I think he's at about 200 years.
00:58:58 200 years.
00:59:01 When we have this kind of AGI system,
00:59:04 artificial general intelligence system,
00:59:06 you’re sitting in a room with her, him, it.
00:59:12 Do you think the underlying design of such a system
00:59:17 is something we’ll be able to understand?
00:59:19 It will be simple?
00:59:20 Do you think it’ll be explainable,
00:59:25 understandable by us?
00:59:27 Your intuition, again, we’re in the realm of philosophy
00:59:30 a little bit.
00:59:32 Well, probably no.
00:59:36 But again, it depends what you really
00:59:40 mean for understanding.
00:59:42 So I think we don’t understand how deep networks work.
00:59:53 I think we are beginning to have a theory now.
00:59:56 But in the case of deep networks,
00:59:59 or even in the case of the simpler kernel machines
01:00:04 or linear classifiers, we really don't understand
01:00:08 the individual units or so.
01:00:11 But we understand what the computation and the limitations
01:00:17 and the properties of it are.
01:00:20 It’s similar to many things.
01:00:24 What does it mean to understand how a fusion bomb works?
01:00:29 How many of us understand the basic principle?
01:00:36 And some of us may understand deeper details.
01:00:40 In that sense, understanding is, as a community,
01:00:43 as a civilization, can we build another copy of it?
01:00:47 And in that sense, do you think there
01:00:50 will need to be some evolutionary component where
01:00:53 it runs away from our understanding?
01:00:56 Or do you think it could be engineered from the ground up,
01:00:59 the same way you go from the transistor to PowerPoint?
01:01:02 So many years ago, this was actually 40, 41 years ago,
01:01:09 I wrote a paper with David Marr, who
01:01:13 was one of the founding fathers of computer vision,
01:01:18 computational vision.
01:01:20 I wrote a paper about levels of understanding,
01:01:23 which is related to the question we discussed earlier
01:01:26 about understanding PowerPoint, understanding transistors,
01:01:30 and so on.
01:01:31 And in that kind of framework, we
01:01:36 had the level of the hardware and the top level
01:01:39 of the algorithms.
01:01:42 We did not have learning.
01:01:45 Recently, I updated adding levels.
01:01:48 And one level I added to those three was learning.
01:01:55 And you can imagine, you could have a good understanding
01:01:59 of how you construct a learning machine, like we do.
01:02:04 But being unable to describe in detail what the learning
01:02:09 machines will discover, right?
01:02:13 Now, that would be still a powerful understanding,
01:02:17 if I can build a learning machine,
01:02:19 even if I don’t understand in detail every time it
01:02:24 learns something.
01:02:26 Just like our children, if they start
01:02:28 listening to a certain type of music,
01:02:31 I don’t know, Miley Cyrus or something,
01:02:33 you don’t understand why they came
01:02:36 to that particular preference.
01:02:37 But you understand the learning process.
01:02:39 That’s very interesting.
01:02:41 So on learning, for systems to be part of our world,
01:02:50 one of the challenging things
01:02:53 that you’ve spoken about is learning ethics, learning
01:02:57 morals.
01:02:59 And how hard do you think is the problem of, first of all,
01:03:04 humans understanding our ethics?
01:03:06 What is the origin of ethics on the neural, on the low level?
01:03:10 What is it at the higher level?
01:03:12 Is it something that’s learnable from machines
01:03:15 in your intuition?
01:03:17 I think, yeah, ethics is learnable, very likely.
01:03:23 I think it’s one of these problems where
01:03:29 I think understanding the neuroscience of ethics,
01:03:36 people discuss there is an ethics of neuroscience.
01:03:41 Yeah, yes.
01:03:42 How a neuroscientist should or should not behave.
01:03:46 You can think of a neurosurgeon and the ethics
01:03:50 rules he or she has to follow.
01:03:53 But I’m more interested on the neuroscience of ethics.
01:03:57 You’re blowing my mind right now.
01:03:58 The neuroscience of ethics is very meta.
01:04:01 Yeah, and I think that would be important to understand also
01:04:05 for being able to design machines that
01:04:10 are ethical machines in our sense of ethics.
01:04:15 And you think there is something in neuroscience,
01:04:18 there’s patterns, tools in neuroscience
01:04:21 that could help us shed some light on ethics?
01:04:25 Or is it mostly on the psychology or sociology side,
01:04:28 at a higher level?
01:04:29 No, there is psychology.
01:04:30 But there is also, in the meantime,
01:04:35 there is evidence, fMRI, of specific areas of the brain
01:04:41 that are involved in certain ethical judgment.
01:04:44 And not only this, you can stimulate those areas
01:04:47 with magnetic fields and change the ethical decisions.
01:04:53 Yeah, wow.
01:05:00 So that's work by a colleague of mine, Rebecca Saxe.
01:05:00 And there is other researchers doing similar work.
01:05:05 And I think this is the beginning.
01:05:08 But ideally, at some point, we’ll
01:05:11 have an understanding of how this works.
01:05:15 And why it evolved, right?
01:05:18 The big why question.
01:05:19 Yeah, it must have some purpose.
01:05:22 Yeah, obviously it has some social purposes, probably.
01:05:30 If neuroscience holds the key to at least illuminate
01:05:33 some aspect of ethics, that means
01:05:35 it could be a learnable problem.
01:05:37 Yeah, exactly.
01:05:38 And as we’re getting into harder and harder questions,
01:05:42 let’s go to the hard problem of consciousness.
01:05:45 Is this an important problem for us
01:05:48 to think about and solve on the engineering of intelligence
01:05:52 side of your work, of our dream?
01:05:56 It’s unclear.
01:05:57 So again, this is a deep problem,
01:06:02 partly because it’s very difficult to define
01:06:05 consciousness.
01:06:06 And there is a debate among neuroscientists,
01:06:17 and philosophers of course,
01:06:23 about whether consciousness is something that requires
01:06:28 flesh and blood, so to speak.
01:06:31 Or it could be that we could have silicon devices that
01:06:38 are conscious, or, up to statements
01:06:42 like everything has some degree of consciousness
01:06:45 and some more than others.
01:06:48 This is like Giulio Tononi and phi.
01:06:53 We just recently talked to Christoph Koch.
01:06:56 OK.
01:06:57 Christoph was my first graduate student.
01:07:00 Do you think it’s important to illuminate
01:07:04 aspects of consciousness in order
01:07:07 to engineer intelligence systems?
01:07:10 Do you think an intelligent system would ultimately
01:07:13 have consciousness?
01:07:14 Are they interlinked?
01:07:18 Most of the people working in artificial intelligence,
01:07:22 I think, would answer, we don’t strictly
01:07:25 need consciousness to have an intelligent system.
01:07:30 That’s sort of the easier question,
01:07:31 because it’s a very engineering answer to the question.
01:07:36 Pass the Turing test, we don’t need consciousness.
01:07:38 But if you were to go, do you think
01:07:41 it’s possible that we need to have
01:07:46 that kind of self awareness?
01:07:48 We may, yes.
01:07:49 So for instance, I personally think
01:07:53 that when we test a machine or a person in a Turing test,
01:08:00 in an extended Turing test, I think
01:08:05 consciousness is part of what we require in that test,
01:08:11 implicitly, to say that this is intelligent.
01:08:15 Christoph disagrees.
01:08:17 Yes, he does.
01:08:20 Despite many other romantic notions he holds,
01:08:23 he disagrees with that one.
01:08:24 Yes, that’s right.
01:08:26 So we’ll see.
01:08:29 Do you think, as a quick question,
01:08:34 Ernest Becker’s fear of death, do you
01:08:38 think mortality and those kinds of things
01:08:41 are important for consciousness and for intelligence?
01:08:49 The finiteness of life, finiteness of existence,
01:08:54 or is that just a side effect of evolution,
01:08:56 evolutionary side effect that’s useful for natural selection?
01:09:01 Do you think this kind of thing that this interview is
01:09:03 going to run out of time soon, our life
01:09:06 will run out of time soon, do you
01:09:08 think that’s needed to make this conversation good and life
01:09:11 good?
01:09:12 I never thought about it.
01:09:13 It’s a very interesting question.
01:09:15 I think Steve Jobs, in his commencement speech
01:09:21 at Stanford, argued that having a finite life
01:09:26 was important for stimulating achievements.
01:09:30 So it was different.
01:09:31 Yeah, live every day like it’s your last, right?
01:09:33 Yeah.
01:09:34 So rationally, I don’t think strictly you need mortality
01:09:41 for consciousness.
01:09:43 But who knows?
01:09:45 They seem to go together in our biological system, right?
01:09:48 Yeah, yeah.
01:09:51 You’ve mentioned before, and students of yours are associated with,
01:09:57 AlphaGo and Mobileye, the big recent success stories in AI.
01:10:01 And I think it’s captivated the entire world of what AI can do.
01:10:06 So what do you think will be the next breakthrough?
01:10:10 And what’s your intuition about the next breakthrough?
01:10:13 Of course, I don’t know where the next breakthrough is.
01:10:16 I think that there is a good chance, as I said before,
01:10:21 that the next breakthrough will also
01:10:23 be inspired by neuroscience.
01:10:27 But which one, I don’t know.
01:10:32 And there’s, so MIT has this Quest for Intelligence.
01:10:35 And there’s a few moon shots, which in that spirit,
01:10:39 which ones are you excited about?
01:10:41 Which projects kind of?
01:10:44 Well, of course, I’m excited about one
01:10:47 of the moon shots, which is our Center for Brains, Minds,
01:10:51 and Machines, which is the one which is fully funded by NSF.
01:10:58 And it is about visual intelligence.
01:11:02 And that one is particularly about understanding.
01:11:06 Visual intelligence, so the visual cortex,
01:11:09 and visual intelligence in the sense
01:11:13 of how we look around ourselves and understand
01:11:20 the world around ourselves, meaning what is going on,
01:11:25 how we could go from here to there without hitting
01:11:29 obstacles, whether there are other agents,
01:11:34 people in the environment.
01:11:36 These are all things that we perceive very quickly.
01:11:41 And it’s something actually quite close to being conscious,
01:11:46 not quite.
01:11:47 But there is this interesting experiment
01:11:50 that was run at Google X, which in a sense
01:11:54 is just a virtual reality experiment,
01:11:58 but in which they had a subject sitting, say,
01:12:02 in a chair with goggles, like Oculus and so on, earphones.
01:12:11 And they were seeing through the eyes of a robot
01:12:15 nearby, two cameras, with microphones for receiving.
01:12:19 So their sensory system was there.
01:12:23 And the impression of all the subject, very strong,
01:12:28 they could not shake it off, was that they
01:12:31 were where the robot was.
01:12:35 They could look at themselves from the robot
01:12:38 and still feel they were where the robot is.
01:12:42 They were looking at their own body.
01:12:46 Their self had moved.
01:12:48 So some aspect of scene understanding
01:12:50 has to have the ability to place yourself,
01:12:54 have a self awareness about your position in the world
01:12:57 and what the world is.
01:12:59 So we may have to solve the hard problem of consciousness
01:13:04 to solve it.
01:13:04 Along the way, yes.
01:13:05 It’s quite a moon shot.
01:13:07 So you’ve been an advisor to some incredible minds,
01:13:12 including Demis Hassabis, Christoph Koch, Amnon Shashua,
01:13:15 like you said.
01:13:17 All went on to become seminal figures
01:13:20 in their respective fields.
01:13:22 From your own success as a researcher
01:13:24 and from perspective as a mentor of these researchers,
01:13:29 having guided them in the way of advice,
01:13:34 what does it take to be successful in science
01:13:36 and engineering careers?
01:13:39 Whether you’re talking to somebody in their teens,
01:13:43 20s, and 30s, what does that path look like?
01:13:48 It’s curiosity and having fun.
01:13:53 And I think it’s important also having
01:13:57 fun with other curious minds.
01:14:02 It’s the people you surround yourself with too,
01:14:04 so fun and curiosity.
01:14:06 Is there, you mentioned Steve Jobs,
01:14:09 is there also an underlying ambition
01:14:13 that’s unique that you saw?
01:14:14 Or does it really boil down
01:14:16 to insatiable curiosity and fun?
01:14:18 Well of course, it’s being curious
01:14:22 in an active and ambitious way, yes.
01:14:26 Definitely.
01:14:29 But I think sometimes in science,
01:14:33 there are friends of mine who are like this.
01:14:39 There are some scientists
01:14:40 who like to work by themselves
01:14:44 and kind of communicate only when they complete their work
01:14:50 or discover something.
01:14:52 I think I always found the actual process
01:14:58 of discovering something is more fun
01:15:03 if it’s together with other intelligent
01:15:07 and curious and fun people.
01:15:09 So if you see the fun in that process,
01:15:11 the side effect of that process
01:15:13 will be that you’ll actually end up
01:15:14 discovering some interesting things.
01:15:16 So as you’ve led many incredible efforts here,
01:15:23 what’s the secret to being a good advisor,
01:15:25 mentor, leader in a research setting?
01:15:28 Is it a similar spirit?
01:15:30 Or yeah, what advice could you give
01:15:32 to people, young faculty and so on?
01:15:35 It’s partly repeating what I said
01:15:38 about an environment that should be friendly
01:15:41 and fun and ambitious.
01:15:44 And I think I learned a lot
01:15:49 from some of my advisors and friends
01:15:52 and some who are physicists.
01:15:55 And there was, for instance,
01:15:57 this behavior that was encouraged:
01:16:02 when somebody comes with a new idea in the group,
01:16:06 unless it’s really stupid,
01:16:09 you are always enthusiastic.
01:16:11 And you’re enthusiastic for a few minutes,
01:16:14 for a few hours.
01:16:15 Then you start asking a few questions critically,
01:16:21 testing this.
01:16:23 But this is a process that,
01:16:26 I think, is very good.
01:16:29 You have to be enthusiastic.
01:16:30 Sometimes people are very critical from the beginning.
01:16:33 That’s not…
01:16:36 Yes, you have to give it a chance
01:16:37 for that seed to grow.
01:16:39 That said, with some of your ideas,
01:16:41 which are quite revolutionary,
01:16:42 as we’ve witnessed, especially on the human vision side
01:16:45 and the neuroscience side,
01:16:47 there could be some pretty heated arguments.
01:16:50 Do you enjoy these?
01:16:51 Is that a part of science and academic pursuits
01:16:54 that you enjoy?
01:16:55 Yeah.
01:16:56 Is that something that happens in your group as well?
01:17:01 Yeah, absolutely.
01:17:02 I also spent some time in Germany.
01:17:04 Again, there is this tradition
01:17:05 in which people are more forthright,
01:17:10 less kind than here.
01:17:14 So in the U.S., when you write a bad letter,
01:17:20 you still say, this guy’s nice.
01:17:23 Yes, yes.
01:17:25 So…
01:17:26 Yeah, here in America, it’s degrees of nice.
01:17:28 Yes.
01:17:29 It’s all just degrees of nice, yeah.
01:17:31 Right, right.
01:17:31 So as long as this does not become personal,
01:17:36 and it’s really like a football game
01:17:40 with these rules, that’s great.
01:17:43 That’s fun.
01:17:46 So if you somehow found yourself in a position
01:17:49 to ask one question of an oracle,
01:17:51 like a genie, maybe a god,
01:17:55 and you’re guaranteed to get a clear answer,
01:17:58 what kind of question would you ask?
01:18:01 What would be the question you would ask?
01:18:04 In the spirit of our discussion,
01:18:06 it could be, how could I become 10 times more intelligent?
01:18:10 And so, but see, you only get a clear short answer.
01:18:16 So do you think there’s a clear short answer to that?
01:18:18 No.
01:18:20 And that’s the answer you’ll get.
01:18:22 Okay, so you’ve mentioned Flowers for Algernon.
01:18:26 Oh, yeah.
01:18:27 As a story that inspired you in your childhood,
01:18:32 this story of a mouse, and a human,
01:18:37 achieving genius level intelligence,
01:18:39 and then understanding what was happening
01:18:41 while slowly becoming not intelligent again,
01:18:44 and this tragedy of gaining intelligence
01:18:46 and losing intelligence,
01:18:48 do you think in that spirit, in that story,
01:18:51 do you think intelligence is a gift or a curse
01:18:55 from the perspective of happiness and meaning of life?
01:19:00 You try to create an intelligent system
01:19:02 that understands the universe,
01:19:03 but on an individual level, the meaning of life,
01:19:06 do you think intelligence is a gift?
01:19:10 It’s a good question.
01:19:17 I don’t know.
01:19:22 As one of the people considered among
01:19:26 the smartest people in the world,
01:19:29 in some dimension at the very least, what do you think?
01:19:33 I don’t know, it may be invariant to intelligence,
01:19:37 that degree of happiness.
01:19:39 It would be nice if it were.
01:19:43 That’s the hope.
01:19:44 Yeah.
01:19:46 You could be smart and happy and clueless and happy.
01:19:50 Yeah.
01:19:51 As always, on the discussion of the meaning of life,
01:19:54 it’s probably a good place to end.
01:19:57 Tommaso, thank you so much for talking today.
01:19:59 Thank you, this was great.