Transcript
00:00:00 The following is a conversation with Tommaso Poggio.
00:00:02 He’s a professor at MIT and is a director of the Center
00:00:06 for Brains, Minds, and Machines.
00:00:08 Cited over 100,000 times, his work
00:00:11 has had a profound impact on our understanding
00:00:14 of the nature of intelligence in both biological and artificial
00:00:18 neural networks.
00:00:19 He has been an advisor to many highly impactful researchers
00:00:23 and entrepreneurs in AI, including
00:00:26 Demis Hassabis of DeepMind, Amnon Shashua of Mobileye,
00:00:29 and Christoph Koch of the Allen Institute for Brain Science.
00:00:34 This conversation is part of the MIT course
00:00:36 on artificial general intelligence
00:00:38 and the artificial intelligence podcast.
00:00:40 If you enjoy it, subscribe on YouTube, iTunes,
00:00:42 or simply connect with me on Twitter
00:00:44 at Lex Fridman, spelled F R I D.
00:00:48 And now, here’s my conversation with Tommaso Poggio.
00:00:52 You’ve mentioned that in your childhood,
00:00:54 you’ve developed a fascination with physics, especially
00:00:57 the theory of relativity.
00:00:59 And that Einstein was also a childhood hero to you.
00:01:04 What aspect of Einstein’s genius, the nature of his genius,
00:01:09 do you think was essential for discovering
00:01:11 the theory of relativity?
00:01:12 You know, Einstein was a hero to me,
00:01:15 and I’m sure to many people, because he
00:01:18 was able to make, of course, a major, major contribution
00:01:23 to physics with, simplifying a bit, just a gedanken experiment,
00:01:31 a thought experiment, you know, imagining communication
00:01:37 with lights between a stationary observer
00:01:41 and somebody on a train.
00:01:43 And I thought, you know, the fact
00:01:47 that just with the force of his thought, of his thinking,
00:01:51 of his mind, he could get to something so deep
00:01:55 in terms of physical reality, how time
00:01:58 depends on space and speed, it was something
00:02:02 absolutely fascinating.
00:02:04 It was the power of intelligence,
00:02:06 the power of the mind.
00:02:08 Do you think the ability to imagine,
00:02:11 to visualize as he did, as a lot of great physicists do,
00:02:15 do you think that’s in all of us human beings?
00:02:18 Or is there something special to that one particular human
00:02:21 being?
00:02:22 I think, you know, all of us can learn and have, in principle,
00:02:30 similar breakthroughs.
00:02:33 There are lessons to be learned from Einstein.
00:02:37 He was one of five PhD students at ETH,
00:02:42 the Eidgenössische Technische Hochschule in Zurich,
00:02:47 in physics.
00:02:48 And he was the worst of the five,
00:02:50 the only one who did not get an academic position when
00:02:55 he graduated, when he finished his PhD.
00:02:57 And he went to work, as everybody knows,
00:03:01 for the patent office.
00:03:02 And so it’s not so much that he worked for the patent office,
00:03:05 but the fact that obviously he was smart,
00:03:08 but he was not a top student. He obviously
00:03:11 was an anti-conformist.
00:03:13 He was not thinking in the traditional way that probably
00:03:17 his teachers and the other students were doing.
00:03:20 So there is a lot to be said about trying
00:03:23 to do the opposite or something quite different from what
00:03:29 other people are doing.
00:03:31 That’s certainly true for the stock market.
00:03:32 Never buy if everybody’s buying.
00:03:36 And also true for science.
00:03:38 Yes.
00:03:39 So you’ve also mentioned, staying
00:03:42 on the theme of physics, that you were excited at a young age
00:03:47 by the mysteries of the universe that physics could uncover.
00:03:51 Such as, I saw mentioned, the possibility of time travel.
00:03:56 So the most out of the box question,
00:03:58 I think I’ll get to ask today, do you
00:04:00 think time travel is possible?
00:04:03 Well, it would be nice if it were possible right now.
00:04:07 In science, you never say no.
00:04:12 But your understanding of the nature of time.
00:04:15 Yeah.
00:04:15 It’s very likely that it’s not possible to travel in time.
00:04:22 We may be able to travel forward in time
00:04:26 if we can, for instance, freeze ourselves or go
00:04:31 on some spacecraft traveling close to the speed of light.
00:04:37 But in terms of actively traveling, for instance,
00:04:40 back in time, I find probably very unlikely.
00:04:45 So do you still hold the underlying dream
00:04:49 of the engineering intelligence that
00:04:52 will build systems that are able to do such huge leaps,
00:04:56 like discovering the kind of mechanism that would be
00:05:01 required to travel through time?
00:05:02 Do you still hold that dream or echoes of it
00:05:05 from your childhood?
00:05:07 Yeah.
00:05:08 I don't know whether there are certain problems that probably
00:05:12 cannot be solved, depending on what you believe
00:05:16 about physical reality, like maybe it's totally impossible
00:05:21 to create energy from nothing or to travel back in time,
00:05:27 but about making machines that can think as well as we do
00:05:35 or better, or more likely, especially
00:05:38 in the short and midterm, help us think better,
00:05:42 which, in a sense, is happening already
00:05:44 with the computers we have.
00:05:46 And it will happen more and more.
00:05:48 But that I certainly believe.
00:05:50 And I don’t see, in principle, why computers at some point
00:05:55 could not become more intelligent than we are,
00:05:59 although the word intelligence is a tricky one
00:06:03 and one we should discuss,
00:06:05 what I mean by that.
00:06:08 Intelligence, consciousness, words like love,
00:06:13 all these need to be disentangled.
00:06:16 So you’ve mentioned also that you believe
00:06:18 the problem of intelligence is the greatest problem
00:06:22 in science, greater than the origin of life
00:06:24 and the origin of the universe.
00:06:27 You’ve also, in the talk I’ve listened to,
00:06:30 said that you’re open to arguments against you.
00:06:34 So what do you think is the most captivating aspect
00:06:40 of this problem of understanding the nature of intelligence?
00:06:43 Why does it captivate you as it does?
00:06:47 Well, originally, I think one of the motivations
00:06:51 that I had as, I guess, a teenager when I was infatuated
00:06:56 with theory of relativity was really
00:06:59 that I found that there was the problem of time and space
00:07:05 and general relativity.
00:07:07 But there were so many other problems
00:07:10 of the same level of difficulty and importance
00:07:13 that I could, even if I were Einstein,
00:07:16 it was difficult to hope to solve all of them.
00:07:19 So what about solving a problem whose solution allowed
00:07:24 me to solve all the problems?
00:07:26 And this was, what if we could find the key to an intelligence
00:07:33 10 times better or faster than Einstein?
00:07:37 So that’s sort of seeing artificial intelligence
00:07:40 as a tool to expand our capabilities.
00:07:43 But is there just an inherent curiosity in you
00:07:47 in just understanding what it is in here
00:07:52 that makes it all work?
00:07:54 Yes, absolutely, you’re right.
00:07:55 So I started saying this was the motivation when
00:07:59 I was a teenager.
00:08:00 But soon after, I think the problem of human intelligence
00:08:07 became a real focus of my science and my research
00:08:15 because I think for me, the most interesting problem
00:08:22 is really asking who we are.
00:08:28 It’s asking not only a question about science,
00:08:31 but even about the very tool we are using to do science, which
00:08:36 is our brain.
00:08:37 How does our brain work?
00:08:39 Where does it come from?
00:08:42 What are its limitations?
00:08:43 Can we make it better?
00:08:46 And that, in many ways, is the ultimate question
00:08:50 that underlies this whole effort of science.
00:08:54 So you’ve made significant contributions
00:08:56 in both the science of intelligence
00:08:58 and the engineering of intelligence.
00:09:02 In a hypothetical way, let me ask,
00:09:05 how far do you think we can get in creating intelligence
00:09:07 systems without understanding the biological,
00:09:12 without understanding how the human brain creates intelligence?
00:09:15 Put another way, do you think we can
00:09:17 build a strong AI system without really getting at the core
00:09:22 understanding the functional nature of the brain?
00:09:25 Well, this is a real difficult question.
00:09:29 We did solve problems like flying
00:09:35 without really using too much our knowledge
00:09:40 about how birds fly.
00:09:44 It was important, I guess, to know that you could have
00:09:48 things heavier than air being able to fly, like birds.
00:09:56 But beyond that, probably we did not learn very much.
00:10:02 The Wright brothers did learn a lot from observations
00:10:06 of birds in designing their aircraft.
00:10:12 But you can argue we did not use much of biology
00:10:16 in that particular case.
00:10:17 Now, in the case of intelligence,
00:10:20 I think that it’s a bit of a bet right now.
00:10:28 If you ask, OK, we all agree we’ll get at some point,
00:10:36 maybe soon, maybe later, to a machine that
00:10:39 is indistinguishable from my secretary,
00:10:42 say, in terms of what I can ask the machine to do.
00:10:47 I think we’ll get there.
00:10:49 And now the question is, you can ask people,
00:10:51 do you think we’ll get there without any knowledge
00:10:54 about the human brain?
00:10:57 Or that the best way to get there
00:10:59 is to understand better the human brain?
00:11:02 OK, this is, I think, an educated bet
00:11:05 that different people with different backgrounds
00:11:09 will decide in different ways.
00:11:11 The recent history of the progress
00:11:14 in AI in the last, I would say, five years or 10 years
00:11:18 has been that the main breakthroughs,
00:11:23 the main recent breakthroughs, really start from neuroscience.
00:11:32 I can mention reinforcement learning as one.
00:11:35 It’s one of the algorithms at the core of AlphaGo,
00:11:41 which is the system that beat the kind of official world
00:11:45 champion of Go, Lee Sedol, two, three years ago in Seoul.
00:11:52 That’s one.
00:11:53 And that started really with the work of Pavlov in 1900,
00:12:00 Marvin Minsky in the 60s, and many other neuroscientists
00:12:05 later on.
00:12:07 And deep learning started, which is at the core, again,
00:12:12 of AlphaGo and systems like autonomous driving
00:12:17 systems for cars, like the systems that Mobileye,
00:12:22 which is a company started by one of my ex postdocs,
00:12:25 Amnon Shashua, did.
00:12:28 So that is at the core of those things.
00:12:30 And deep learning, really, the initial ideas
00:12:34 in terms of the architecture of these layered
00:12:37 hierarchical networks started with work of Torsten Wiesel
00:12:43 and David Hubel at Harvard up the river in the 60s.
00:12:47 So recent history suggests that neuroscience played a big role
00:12:53 in these breakthroughs.
00:12:54 My personal bet is that there is a good chance they continue
00:12:58 to play a big role.
00:12:59 Maybe not in all the future breakthroughs,
00:13:01 but in some of them.
00:13:03 At least in inspiration.
00:13:05 At least in inspiration, absolutely, yes.
00:13:07 So you studied both artificial and biological neural networks.
00:13:12 You said these mechanisms that underlie deep learning
00:13:17 and reinforcement learning.
00:13:19 But there is nevertheless significant differences
00:13:23 between biological and artificial neural networks
00:13:26 as they stand now.
00:13:27 So between the two, what do you find
00:13:30 is the most interesting, mysterious, maybe even
00:13:33 beautiful difference as it currently
00:13:35 stands in our understanding?
00:13:37 I must confess that until recently, I
00:13:41 found the artificial networks too simplistic
00:13:46 relative to real neural networks.
00:13:49 But recently, I’ve been starting to think that, yes,
00:13:53 there is a very big simplification of what
00:13:57 you find in the brain.
00:13:59 But on the other hand, they are much closer
00:14:03 in terms of the architecture to the brain
00:14:07 than other models that we had, that computer science used
00:14:11 as models of thinking, which were mathematical logic, Lisp,
00:14:16 Prolog, and those kinds of things.
00:14:19 So in comparison to those, they’re
00:14:21 much closer to the brain.
00:14:23 You have networks of neurons, which
00:14:26 is what the brain is about.
00:14:27 And the artificial neurons in the models, as I said,
00:14:32 are a caricature of the biological neurons.
00:14:35 But they’re still neurons, single units communicating
00:14:38 with other units, something that is absent
00:14:41 in the traditional computer type models of mathematics,
00:14:48 reasoning, and so on.
00:14:50 So what aspect would you like to see
00:14:53 in artificial neural networks added over time
00:14:57 as we try to figure out ways to improve them?
00:14:59 So one of the main differences and problems
00:15:07 in terms of deep learning today, and it’s not only
00:15:11 deep learning, and the brain, is the need for deep learning
00:15:16 techniques to have a lot of labeled examples.
00:15:23 For instance, for ImageNet, you have
00:15:24 like a training set, which is 1 million images, each one
00:15:29 labeled by some human in terms of which object is there.
00:15:34 And it’s clear that in biology, a baby
00:15:42 may be able to see millions of images
00:15:44 in the first years of life, but will not
00:15:47 have millions of labels given to him or her by parents
00:15:52 or caretakers.
00:15:56 So how do you solve that?
00:15:59 I think there is this interesting challenge
00:16:03 that today, deep learning and related techniques
00:16:08 are all about big data, big data meaning
00:16:11 a lot of examples labeled by humans,
00:16:18 so this big data
00:16:24 is n going to infinity.
00:16:26 That's the best case, n meaning labeled data.
00:16:30 But I think the biological world is more n going to 1.
00:16:34 A child can learn from a very small number
00:16:38 of labeled examples.
00:16:42 Like you tell a child, this is a car.
00:16:44 You don’t need to say, like in ImageNet, this is a car,
00:16:48 this is a car, this is not a car, this is not a car,
00:16:51 1 million times.
00:16:54 And of course, with AlphaGo, or at least the AlphaZero
00:16:57 variants, because the world of Go
00:17:01 is so simplistic that you can actually
00:17:05 learn by yourself through self play,
00:17:06 you can play against yourself.
00:17:08 In the real world, the visual system
00:17:10 that you’ve studied extensively is a lot more complicated
00:17:14 than the game of Go.
00:17:16 On the comment about children, which
00:17:18 are fascinatingly good at learning new stuff,
00:17:23 how much of it do you think is hardware,
00:17:24 and how much of it is software?
00:17:26 Yeah, that’s a good, deep question.
00:17:29 In a sense, it’s the old question of nurture and nature,
00:17:32 how much is in the gene, and how much
00:17:36 is in the experience of an individual.
00:17:41 Obviously, it’s both that play a role.
00:17:44 And I believe that the way evolution
00:17:53 puts in prior information, so to speak hardwired,
00:17:55 is not really hardwired.
00:17:58 But that's essentially a hypothesis.
00:18:02 I think what's going on is that evolution is almost
00:18:10 necessarily, if you believe in Darwin, very opportunistic.
00:18:14 And think about our DNA and the DNA of Drosophila.
00:18:24 Our DNA does not have many more genes than Drosophila.
00:18:28 The fly.
00:18:29 The fly, the fruit fly.
00:18:32 Now, we know that the fruit fly does not
00:18:35 learn very much during its individual existence.
00:18:39 It looks like one of these machines
00:18:42 that is really mostly, not 100%, but 95%,
00:18:47 hardcoded by the genes.
00:18:51 But since we don’t have many more genes than Drosophila,
00:18:55 evolution could encode in us a general learning machinery,
00:19:02 and then had to give very weak priors.
00:19:09 Like, for instance, let me give a specific example,
00:19:15 which is recent work by a member of our Center for Brains,
00:19:18 Minds, and Machines.
00:19:20 We know because of work of other people in our group
00:19:24 and other groups, that there are cells
00:19:26 in a part of our brain, neurons, that are tuned to faces.
00:19:31 They seem to be involved in face recognition.
00:19:33 Now, this face area seems to be present in young children
00:19:41 and adults.
00:19:44 And one question is, is it there from the beginning?
00:19:48 Is it hardwired by evolution?
00:19:51 Or is it somehow learned very quickly?
00:19:55 So what’s your, by the way, a lot of the questions I’m asking,
00:19:58 the answer is we don’t really know.
00:20:00 But as a person who has contributed
00:20:04 some profound ideas in these fields,
00:20:06 you’re a good person to guess at some of these.
00:20:08 So of course, there’s a caveat before a lot of the stuff
00:20:11 we talk about.
00:20:11 But what is your hunch?
00:20:14 Is the face, the part of the brain
00:20:16 that seems to be concentrated on face recognition,
00:20:20 are you born with that?
00:20:21 Or is it just designed to learn that quickly,
00:20:25 like the face of the mother and so on?
00:20:26 My hunch, my bias was the second one, learned very quickly.
00:20:32 And it turns out that Marge Livingstone at Harvard
00:20:37 has done some amazing experiments in which she raised
00:20:41 baby monkeys, depriving them of faces
00:20:45 during the first weeks of life.
00:20:48 So they see technicians, but the technicians have masks.
00:20:53 Yes.
00:20:55 And so when they looked at the area
00:21:02 in the brain of these monkeys that is usually
00:21:05 tuned to faces, they found no face preference.
00:21:10 So my guess is that what evolution does in this case
00:21:16 is that there is an area which
00:21:19 is plastic, which is kind of predetermined
00:21:22 to be imprinted very easily.
00:21:26 But the command from the gene is not a detailed circuitry
00:21:30 for a face template.
00:21:32 It could be, but this would probably require a lot of bits.
00:21:36 You'd have to specify a lot of connections among a lot of neurons.
00:21:39 Instead, the command from the gene
00:21:42 is something like imprint, memorize what you see most
00:21:47 often in the first two weeks of life,
00:21:49 especially in connection with food and maybe nipples.
00:21:53 I don’t know.
00:21:54 Well, source of food.
00:21:55 And so that area is very plastic at first and then solidifies.
00:22:00 It’d be interesting if a variant of that experiment
00:22:03 would show a different kind of pattern associated
00:22:06 with food than a face pattern, whether that could stick.
00:22:10 There are indications that during that experiment,
00:22:14 what the monkeys saw quite often were
00:22:19 the blue gloves of the technicians who were giving
00:22:23 the milk to the baby monkeys.
00:22:25 And some of the cells, instead of being face sensitive
00:22:29 in that area, are hand sensitive.
00:22:33 That’s fascinating.
00:22:35 Can you talk about what are the different parts of the brain
00:22:40 and, in your view, sort of loosely,
00:22:43 and how do they contribute to intelligence?
00:22:45 Do you see the brain as a bunch of different modules,
00:22:49 and they together come in the human brain
00:22:52 to create intelligence?
00:22:53 Or is it all one mush of the same kind
00:22:59 of fundamental architecture?
00:23:04 Yeah, that’s an important question.
00:23:08 And there was a phase in neuroscience back in the 1950s
00:23:15 or so in which it was believed for a while
00:23:19 that the brain was equipotential.
00:23:21 This was the term.
00:23:22 You could cut out a piece, and nothing special
00:23:28 happened, apart from a little bit less performance.
00:23:32 There was a surgeon, Lashley, who
00:23:37 did a lot of experiments of this type with mice and rats
00:23:41 and concluded that every part of the brain
00:23:45 was essentially equivalent to any other one.
00:23:51 It turns out that that’s really not true.
00:23:56 There are very specific modules in the brain, as you said.
00:24:00 And people may lose the ability to speak
00:24:05 if they have a stroke in a certain region,
00:24:07 or may lose control of their legs in another region.
00:24:12 So they’re very specific.
00:24:14 The brain is also quite flexible and redundant,
00:24:17 so often it can correct things and take over functions
00:24:27 from one part of the brain to the other.
00:24:29 But really, there are specific modules.
00:24:33 So the answer that we know from this old work, which
00:24:40 was basically based on lesions, either on animals,
00:24:44 or very often there was a mine of very interesting data
00:24:52 coming from the war, from different types of injuries
00:25:00 that soldiers had in the brain.
00:25:03 And more recently, functional MRI,
00:25:09 which allows you to check which parts of the brain
00:25:13 are active when you are doing different tasks,
00:25:21 can replace some of this.
00:25:23 You can see that certain parts of the brain are involved,
00:25:27 are active in certain tasks.
00:25:29 Vision, language, yeah, that’s right.
00:25:32 But sort of taking a step back to that part of the brain
00:25:36 that discovers that specializes in the face
00:25:39 and how that might be learned, what's your intuition behind it?
00:25:45 Is it possible that from a physicist perspective,
00:25:48 when you get lower and lower, that it’s all the same stuff
00:25:51 and it just, when you’re born, it’s plastic
00:25:54 and quickly figures out this part is going to be about vision,
00:25:58 this is going to be about language,
00:25:59 this is about common sense reasoning.
00:26:02 Do you have an intuition that that kind of learning
00:26:05 is going on really quickly, or is it really
00:26:07 kind of solidified in hardware?
00:26:09 That’s a great question.
00:26:11 So there are parts of the brain like the cerebellum
00:26:16 or the hippocampus that are quite different from each other.
00:26:21 They clearly have different anatomy,
00:26:23 different connectivity.
00:26:26 Then there is the cortex, which is the most developed part
00:26:33 of the brain in humans.
00:26:36 And in the cortex, you have different regions
00:26:39 of the cortex that are responsible for vision,
00:26:43 for audition, for motor control, for language.
00:26:47 Now, one of the big puzzles of this
00:26:50 is that in the cortex, the cortex is the cortex.
00:26:55 Looks like it is the same in terms of hardware,
00:27:00 in terms of type of neurons and connectivity
00:27:05 across these different modalities.
00:27:08 So for the cortex, I think, setting aside these other parts
00:27:13 of the brain like spinal cord, hippocampus,
00:27:15 cerebellum, and so on, for the cortex,
00:27:18 I think your question about hardware and software
00:27:21 and learning and so on, I think is rather open.
00:27:28 And I find it very interesting
00:27:33 to think about an architecture, a computer architecture, that
00:27:36 is good for vision and at the same time is good for language.
00:27:41 They seem to be such different problem areas that you have to solve.
00:27:49 But the underlying mechanism might be the same.
00:27:51 And that’s really instructive for artificial neural networks.
00:27:55 So we’ve done a lot of great work in vision,
00:27:58 in human vision, computer vision.
00:28:01 And you mentioned the problem of human vision
00:28:03 is really as difficult as the problem of general intelligence.
00:28:07 And maybe that connects to the cortex discussion.
00:28:11 Can you describe the human visual cortex
00:28:15 and how the humans begin to understand the world
00:28:20 through the raw sensory information?
00:28:22 What’s, for folks who are not familiar,
00:28:27 especially on the computer vision side,
00:28:30 we don’t often actually take a step back except saying
00:28:33 with a sentence or two that one is inspired by the other.
00:28:36 What is it that we know about the human visual cortex?
00:28:40 That’s interesting.
00:28:40 We know quite a bit.
00:28:41 At the same time, we don’t know a lot.
00:28:43 But the bit we know, in a sense, we know a lot of the details.
00:28:50 And many we don’t know.
00:28:53 And we know a lot of the top level,
00:28:58 the answer to the top level question.
00:29:00 But we don’t know some basic ones,
00:29:02 even in terms of general neuroscience, forgetting vision.
00:29:06 Why do we sleep?
00:29:08 It’s such a basic question.
00:29:11 And we really don’t have an answer to that.
00:29:15 So taking a step back on that.
00:29:17 So sleep, for example, is fascinating.
00:29:18 Do you think that’s a neuroscience question?
00:29:22 Or if we talk about abstractions, what do you
00:29:25 think is an interesting way to study intelligence
00:29:28 or most effective on the levels of abstraction?
00:29:30 Is it chemical, is it biological,
00:29:33 is it electrophysical, mathematical,
00:29:35 as you’ve done a lot of excellent work on that side?
00:29:37 Or psychology? At which level of abstraction, do you think?
00:29:43 Well, in terms of levels of abstraction,
00:29:46 I think we need all of them.
00:29:50 It’s like if you ask me, what does it
00:29:54 mean to understand a computer?
00:29:57 That’s much simpler.
00:29:58 But in a computer, I could say, well,
00:30:01 I understand how to use PowerPoint.
00:30:04 That’s my level of understanding a computer.
00:30:08 It is reasonable.
00:30:09 It gives me some power to produce slides
00:30:11 and beautiful slides.
00:30:14 Now, you can ask somebody else.
00:30:17 He says, well, I know how the transistors work
00:30:19 that are inside the computer.
00:30:21 I can write the equation for transistor and diodes
00:30:25 and circuits, logical circuits.
00:30:29 And I can ask this guy, do you know how to operate PowerPoint?
00:30:32 No idea.
00:30:34 So if we discovered computers walking amongst us
00:30:39 full of these transistors that are also operating
00:30:43 under Windows and have PowerPoint,
00:30:45 digging in a little bit more:
00:30:49 How useful is it to understand the transistor in order
00:30:53 to be able to understand PowerPoint
00:30:58 and these higher level intelligent processes?
00:31:00 So I think in the case of computers,
00:31:03 because they were made by engineers, by us,
00:31:06 these different levels of understanding
00:31:09 are rather separate on purpose.
00:31:13 They are separate modules so that the engineer that
00:31:17 designed the circuit for the chips does not
00:31:19 need to know what is inside PowerPoint.
00:31:23 And somebody can write the software translating
00:31:27 from one to the other.
00:31:30 So in that case, I don’t think understanding the transistor
00:31:36 helps you understand PowerPoint, or very little.
00:31:41 If you want to understand the computer, this question,
00:31:43 I would say you have to understand it
00:31:45 at different levels.
00:31:46 If you really want to build one, right?
00:31:51 But for the brain, I think these levels of understanding,
00:31:57 so the algorithms, which kind of computation,
00:32:00 the equivalent of PowerPoint, and the circuits,
00:32:04 the transistors, I think they are much more
00:32:07 intertwined with each other.
00:32:09 There is not a neat level of the software separate
00:32:14 from the hardware.
00:32:15 And so that’s why I think in the case of the brain,
00:32:20 the problem is more difficult and, more than for computers,
00:32:23 requires the interaction, the collaboration
00:32:26 between different types of expertise.
00:32:30 The brain is a big hierarchical mess.
00:32:32 You can’t just disentangle levels.
00:32:35 I think you can, but it’s much more difficult.
00:32:37 And it’s not completely obvious.
00:32:40 And as I said, personally,
00:32:44 I think it is the greatest problem in science.
00:32:47 So I think it’s fair that it’s difficult.
00:32:51 That’s a difficult one.
00:32:53 That said, you do talk about compositionality
00:32:56 and why it might be useful.
00:32:58 And when you discuss why these neural networks,
00:33:01 in artificial or biological sense, learn anything,
00:33:05 you talk about compositionality.
00:33:07 See, there’s a sense that nature can be disentangled.
00:33:13 Or, well, all aspects of our cognition
00:33:19 could be disentangled to some degree.
00:33:22 So why do you think, first of all,
00:33:25 how do you see compositionality?
00:33:27 And why do you think it exists at all in nature?
00:33:31 I spoke about, I use the term compositionality
00:33:39 when we looked at deep neural networks, multilayers,
00:33:45 and trying to understand when and why they are more powerful
00:33:50 than more classical one layer networks,
00:33:54 like linear classifiers and so-called kernel machines.
00:34:01 And what we found is that in terms
00:34:05 of approximating or learning or representing
00:34:08 a function, a mapping from an input to an output,
00:34:12 like from an image to the label in the image,
00:34:16 if this function has a particular structure,
00:34:20 then deep networks are much more powerful than shallow networks
00:34:26 to approximate the underlying function.
00:34:28 And the particular structure is a structure of compositionality.
00:34:33 If the function is made up of functions of functions,
00:34:38 so that when you are interpreting an image,
00:34:45 classifying an image, you don’t need
00:34:47 to look at all pixels at once.
00:34:51 But you can compute something from small groups of pixels.
00:34:57 And then you can compute something
00:34:59 on the output of this local computation and so on,
00:35:04 which is similar to what you do when you read a sentence.
00:35:07 You don’t need to read the first and the last letter.
00:35:11 But you can read syllables, combine them in words,
00:35:16 combine the words in sentences.
00:35:18 So this is this kind of structure.
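To make the structure concrete, here is a minimal sketch (an illustrative example added for reference, not something stated in the conversation) of a compositional function of eight variables built from constituent functions of two variables each, the kind of hierarchy that a deep, locally connected network can mirror layer by layer:

```latex
f(x_1,\dots,x_8) = h_3\Bigl(
    h_{21}\bigl(h_{11}(x_1,x_2),\, h_{12}(x_3,x_4)\bigr),\;
    h_{22}\bigl(h_{13}(x_5,x_6),\, h_{14}(x_7,x_8)\bigr)
\Bigr)
```

Each constituent function h looks at only two inputs, like a small patch of pixels or a pair of syllables, while a shallow network would have to approximate f as a single function of all eight variables at once.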
00:35:21 So that’s as part of a discussion
00:35:22 of why deep neural networks may be more
00:35:26 effective than the shallow methods.
00:35:27 And is your sense, for most things
00:35:31 we can use neural networks for, those problems
00:35:37 are going to be compositional in nature, like language,
00:35:42 like vision?
00:35:44 How far can we get in this kind of way?
00:35:47 So here is almost philosophy.
00:35:51 Well, let’s go there.
00:35:53 Yeah, let’s go there.
00:35:54 So a friend of mine, Max Tegmark, who is a physicist at MIT.
00:36:00 I’ve talked to him on this thing.
00:36:01 Yeah, and he disagrees with you, right?
00:36:03 A little bit.
00:36:04 Yeah, we agree on most.
00:36:07 But the conclusion is a bit different.
00:36:10 His conclusion is that for images, for instance,
00:36:14 the compositional structure of this function
00:36:19 that we have to learn or to solve these problems
00:36:23 comes from physics, comes from the fact
00:36:27 that you have local interactions in physics
00:36:31 between atoms and other atoms, between particles
00:36:37 of matter and other particles, between planets
00:36:41 and other planets, between stars and other stars.
00:36:44 It’s all local.
00:36:48 And that’s true.
00:36:51 But you could push this argument a bit further.
00:36:56 Not this argument, actually.
00:36:57 You could argue that maybe that’s part of the truth.
00:37:02 But maybe what happens is kind of the opposite,
00:37:06 is that our brain is wired up as a deep network.
00:37:11 So it can learn, understand, solve
00:37:18 problems that have this compositional structure
00:37:22 and it cannot solve problems that don’t have
00:37:27 this compositional structure.
00:37:29 So the problems we are accustomed to, we think about,
00:37:34 we test our algorithms on, have this compositional structure
00:37:40 because of how our brain is made up.
00:37:42 And that’s, in a sense, an evolutionary perspective
00:37:45 that we’ve.
00:37:46 So the ones that didn’t have, that weren’t
00:37:50 dealing with the compositional nature of reality died off?
00:37:55 Yes, but also could be maybe the reason
00:38:00 why we have this local connectivity in the brain,
00:38:05 like simple cells in cortex looking
00:38:08 only at the small part of the image, each one of them,
00:38:11 and then other cells looking at the small number
00:38:14 of these simple cells and so on.
00:38:16 The reason for this may be purely
00:38:19 that it was difficult to grow long range connectivity.
00:38:25 So suppose that, for biology,
00:38:28 it's possible to grow short-range connectivity but not
00:38:34 long-range, also because there is a limited number of long-range
00:38:38 connections that you can have.
00:38:39 And so you have this limitation from the biology.
00:38:45 And this means you build a deep convolutional network.
00:38:50 This would be something like a deep convolutional network.
00:38:53 And this is great for solving certain class of problems.
00:38:57 These are the ones we find easy and important for our life.
00:39:02 And yes, they were enough for us to survive.
00:39:07 And you can start a successful business
00:39:10 on solving those problems with Mobileye.
00:39:14 Driving is a compositional problem.
00:39:17 So on the learning task, we don’t
00:39:21 know much about how the brain learns
00:39:24 in terms of optimization.
00:39:26 So stochastic gradient descent
00:39:29 is what artificial neural networks use for the most part
00:39:33 to adjust the parameters in such a way that,
00:39:37 based on the labeled data,
00:39:40 they're able to solve the problem.
00:39:42 So what’s your intuition about why it works at all?
00:39:50 How hard of a problem is it to optimize
00:39:53 a neural network, artificial neural network?
00:39:56 Are there other alternatives?
00:39:58 Just in general, what's your intuition
00:40:01 behind this very simplistic algorithm
00:40:03 that seems to do pretty well, surprisingly so?
00:40:06 Yes.
00:40:07 So I find neuroscience, the architecture of cortex,
00:40:13 is really similar to the architecture of deep networks.
00:40:17 So there is a nice correspondence there
00:40:20 between the biology and this kind
00:40:23 of local connectivity, hierarchical architecture.
00:40:28 The stochastic gradient descent, as you said,
00:40:30 is a very simple technique.
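For reference, a minimal sketch of the technique (illustrative NumPy code; the toy least-squares problem, learning rate, and batch size are all invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: targets produced by a "true" linear map plus a little noise.
X = rng.normal(size=(1000, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=1000)

w = np.zeros(10)   # parameters to be adjusted
lr = 0.01          # learning rate
batch_size = 32

for step in range(2000):
    # Pick a random mini-batch: the "stochastic" part.
    idx = rng.integers(0, len(X), size=batch_size)
    xb, yb = X[idx], y[idx]
    # Gradient of the mean squared error on this mini-batch.
    grad = 2 * xb.T @ (xb @ w - yb) / batch_size
    # Take a small step downhill.
    w -= lr * grad

print("error in recovered weights:", np.linalg.norm(w - w_true))
```

The whole method is the loop: sample a batch, compute a gradient, take a small step.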
00:40:35 It seems pretty unlikely that biology could do that
00:40:41 from what we know right now about cortex and neurons
00:40:47 and synapses.
00:40:50 So it’s a big question open whether there
00:40:53 are other optimization learning algorithms that
00:40:59 can replace stochastic gradient descent.
00:41:02 And my guess is yes, but nobody has found yet a real answer.
00:41:11 I mean, people are trying, still trying,
00:41:13 and there are some interesting ideas.
00:41:18 The fact that stochastic gradient descent
00:41:22 is so successful, this has become clearly not so
00:41:26 mysterious.
00:41:27 And the reason is an interesting fact.
00:41:33 It’s a change, in a sense, in how
00:41:36 people think about statistics.
00:41:39 And it is the following: typically, when
00:41:45 you have data and you have, say, a model with parameters,
00:41:51 you are trying to fit the model to the data,
00:41:54 to fit the parameters.
00:41:55 Typically, the kind of crowd wisdom type idea
00:42:04 was that you should have at least twice as many data
00:42:09 points as the number of parameters.
00:42:12 Maybe 10 times is better.
00:42:15 Now, the way you train neural networks these days
00:42:19 is that they have 10 or 100 times more parameters
00:42:23 than data, exactly the opposite.
00:42:26 And it has been one of the puzzles about neural networks.
00:42:34 How can you get something that really works
00:42:37 when you have so much freedom?
00:42:40 From that little data, it can generalize somehow.
00:42:43 Right, exactly.
00:42:44 Do you think the stochastic nature of it
00:42:46 is essential, the randomness?
00:42:48 So I think we have some initial understanding
00:42:50 why this happens.
00:42:52 But one nice side effect of having
00:42:56 this overparameterization, more parameters than data,
00:43:00 is that when you look for the minima of a loss function,
00:43:04 like stochastic gradient descent is doing,
00:43:08 you find... I made some calculations based
00:43:12 on an old basic theorem of algebra called Bézout's
00:43:19 theorem, which gives you an estimate of the number
00:43:23 of solutions of a system of polynomial equations.
00:43:25 Anyway, the bottom line is that there are probably
00:43:30 more minima for a typical deep network
00:43:36 than atoms in the universe.
00:43:39 Just to say, there are a lot because
00:43:42 of the overparameterization.
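For reference, the classical bound being alluded to, in its standard textbook form (stated from general algebra, not quoted from the conversation):

```latex
% Bézout bound: a system of n polynomial equations in n unknowns,
% with degrees d_1, \dots, d_n, has at most
N_{\text{isolated solutions}} \;\le\; d_1 \, d_2 \cdots d_n
% e.g. 1000 quadratic equations: at most 2^{1000} \approx 10^{301}
% isolated solutions, already far more than the roughly 10^{80}
% atoms in the observable universe.
```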
00:43:44 More global minima, zero minima, good minima?
00:43:50 More global minima.
00:43:51 Yeah, a lot of them.
00:43:53 So you have a lot of solutions.
00:43:54 So it’s not so surprising that you can find them
00:43:57 relatively easily.
00:44:00 And this is because of the overparameterization.
00:44:04 The overparameterization sprinkles that entire space
00:44:07 with solutions that are pretty good.
00:44:09 It’s not so surprising, right?
00:44:11 It’s like if you have a system of linear equation
00:44:14 and you have more unknowns than equations, then you have,
00:44:18 we know, you have an infinite number of solutions.
00:44:22 And the question is to pick one.
00:44:24 That’s another story.
00:44:25 But you have an infinite number of solutions.
00:44:27 So there are a lot of values of your unknowns
00:44:31 that satisfy the equations.
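A small sketch of that analogy (illustrative NumPy only; the sizes are arbitrary): an underdetermined system with fewer equations than unknowns has infinitely many exact solutions, and a standard solver simply returns one of them, the minimum-norm one.

```python
import numpy as np

rng = np.random.default_rng(1)

# 5 equations, 50 unknowns: heavily "overparameterized".
A = rng.normal(size=(5, 50))
b = rng.normal(size=5)

# lstsq returns the minimum-norm solution among the infinitely many exact ones.
x_min_norm, *_ = np.linalg.lstsq(A, b, rcond=None)

# Adding any vector from the null space of A gives another exact solution.
v = rng.normal(size=50)
v -= A.T @ np.linalg.solve(A @ A.T, A @ v)   # project v onto the null space of A
x_other = x_min_norm + v

print(np.allclose(A @ x_min_norm, b))   # True
print(np.allclose(A @ x_other, b))      # True: a different, equally exact solution
```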
00:44:33 But it’s possible that there’s a lot of those solutions that
00:44:36 aren’t very good.
00:44:37 What’s surprising is that they’re pretty good.
00:44:39 So that’s a good question.
00:44:40 Why can you pick one that generalizes well?
00:44:42 Yeah.
00:44:44 That’s a separate question with separate answers.
00:44:47 One theorem that people like to talk about that kind of
00:44:51 inspires imagination of the power of neural networks
00:44:53 is the universality, universal approximation theorem,
00:44:57 that you can approximate any computable function
00:45:00 with just a finite number of neurons
00:45:02 in a single hidden layer.
00:45:04 Do you find this theorem surprising?
00:45:07 Do you find it useful, interesting, inspiring?
00:45:12 No, this one, I never found it very surprising.
00:45:16 It was known since the 80s, since I entered the field,
00:45:22 because it’s basically the same as Weierstrass theorem, which
00:45:27 says that I can approximate any continuous function
00:45:32 with a polynomial of sufficiently,
00:45:34 with a sufficient number of terms, monomials.
00:45:38 So basically the same.
00:45:39 And the proofs are very similar.
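For reference, the two statements being compared, in their standard textbook forms (paraphrased here, not quoted from the conversation):

```latex
% Weierstrass approximation theorem: for any continuous f on [a,b]
% and any \varepsilon > 0, there is a polynomial p with
\sup_{x \in [a,b]} |f(x) - p(x)| < \varepsilon .

% Universal approximation with one hidden layer: for a suitable
% nonlinearity \sigma, any continuous f on a compact set K, and any
% \varepsilon > 0, there exist N, weights w_i, b_i and coefficients c_i with
\sup_{x \in K} \Bigl| f(x) - \sum_{i=1}^{N} c_i \,\sigma(w_i \cdot x + b_i) \Bigr| < \varepsilon .
```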
00:45:41 So your intuition was there was never
00:45:43 any doubt that neural networks in theory
00:45:45 could be very strong approximators.
00:45:48 Right.
00:45:48 The question, the interesting question,
00:45:50 is that if this theorem says you can approximate, fine.
00:45:58 But when you ask how many neurons, for instance,
00:46:03 or in the case of polynomials, how many monomials
00:46:06 do I need to get a good approximation?
00:46:11 Then it turns out that that depends
00:46:15 on the dimensionality of your function,
00:46:18 how many variables you have.
00:46:20 But it depends on the dimensionality
00:46:22 of your function in a bad way.
00:46:25 For instance, suppose you want
00:46:28 an error which is no worse than 10% in your approximation.
00:46:35 You come up with a network that approximates your function
00:46:38 within 10%.
00:46:40 Then it turns out that the number of units you need
00:46:44 are in the order of 10 to the dimensionality, d,
00:46:48 how many variables.
00:46:50 So if you have two variables, d equals two,
00:46:54 you have 100 units, and OK.
00:46:57 But if you have, say, 200 by 200 pixel images,
00:47:02 now this is 40,000, whatever.
00:47:06 We again go to the size of the universe pretty quickly.
00:47:09 Exactly, 10 to the 40,000 or something.
00:47:14 And so this is called the curse of dimensionality,
00:47:18 not quite appropriately.
00:47:22 And the hope is with the extra layers,
00:47:24 you can remove the curse.
00:47:28 What we proved is that if you have deep layers,
00:47:32 hierarchical architecture with the local connectivity
00:47:36 of the type of convolutional deep learning,
00:47:39 and if you’re dealing with a function that
00:47:42 has this kind of hierarchical architecture,
00:47:46 then you avoid completely the curse.
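In rough form, the counting argument sketched above (smoothness factors and constants are omitted here; this is a simplified paraphrase of the published bounds, not an exact statement):

```latex
% Shallow network approximating a generic function of d variables
% to accuracy \varepsilon:
N_{\text{shallow}} = O\!\left(\varepsilon^{-d}\right)
% e.g. \varepsilon = 0.1: d = 2 needs on the order of 10^2 units,
% while d = 40{,}000 needs on the order of 10^{40{,}000}.

% Deep, hierarchically local network approximating a compositional
% function whose constituent functions each depend on at most 2 variables:
N_{\text{deep}} = O\!\left((d-1)\,\varepsilon^{-2}\right)
```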
00:47:50 You’ve spoken a lot about supervised deep learning.
00:47:54 What are your thoughts, hopes, views
00:47:56 on the challenges of unsupervised learning
00:47:59 with GANs, with Generative Adversarial Networks?
00:48:05 Do you see those as distinct?
00:48:08 The power of GANs, do you see those
00:48:09 as distinct from supervised methods in neural networks,
00:48:13 or are they really all in the same representation ballpark?
00:48:16 GANs are one way to get an estimation of probability
00:48:24 densities, which is a somewhat new way that people have not
00:48:28 done before.
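As context, a minimal sketch of the adversarial setup (illustrative PyTorch code on a one-dimensional toy distribution; the architectures and hyperparameters are invented): a generator maps noise to samples, a discriminator tries to tell real samples from generated ones, and the two are trained against each other, which implicitly models the data distribution.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# "Real" data: samples from a 1-D Gaussian the generator should imitate.
def real_batch(n):
    return 3.0 + 0.5 * torch.randn(n, 1)

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))   # noise -> sample
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # sample -> real/fake logit

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(5000):
    # Discriminator step: push real samples toward 1, generated samples toward 0.
    x_real = real_batch(64)
    x_fake = G(torch.randn(64, 8)).detach()
    loss_d = bce(D(x_real), torch.ones(64, 1)) + bce(D(x_fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: try to make the discriminator output 1 on generated samples.
    x_fake = G(torch.randn(64, 8))
    loss_g = bce(D(x_fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

samples = G(torch.randn(1000, 8))
print("generated mean/std:", samples.mean().item(), samples.std().item())
# Should drift toward the real mean 3.0 and std 0.5 as training progresses.
```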
00:48:30 I don’t know whether this will really play an important role
00:48:36 in intelligence.
00:48:39 Or it’s interesting.
00:48:43 I’m less enthusiastic about it than many people in the field.
00:48:48 I have the feeling that many people in the field
00:48:50 are really impressed by the ability
00:48:54 of producing realistic looking images in this generative way.
00:49:01 Which explains the popularity of the methods.
00:49:03 But you’re saying that while that’s exciting and cool
00:49:06 to look at, it may not be the tool that’s useful for it.
00:49:11 So you describe it kind of beautifully.
00:49:13 Current supervised methods go n to infinity
00:49:16 in terms of number of labeled points.
00:49:18 And we really have to figure out how to go to n to 1.
00:49:21 And you’re thinking GANs might help,
00:49:23 but they might not be the right.
00:49:25 I don’t think for that problem, which I really think
00:49:28 is important, I think they may help.
00:49:32 They certainly have applications,
00:49:33 for instance, in computer graphics.
00:49:35 And I did work long ago, which was
00:49:41 a little bit similar in terms of saying, OK, I have a network.
00:49:47 And I present images.
00:49:49 Its input is images,
00:49:54 and its output is, for instance, the pose of the image:
00:49:57 a face, how much it is smiling, whether it is rotated 45 degrees or not.
00:50:02 What about having a network that I train with the same data
00:50:07 set, but now I invert input and output.
00:50:10 Now the input is the pose or the expression, a number,
00:50:15 set of numbers.
00:50:16 And the output is the image.
00:50:18 And I train it.
00:50:20 And we did pretty good, interesting results
00:50:22 in terms of producing very realistic looking images.
00:50:27 It was a less sophisticated mechanism,
00:50:31 less sophisticated than GANs,
00:50:35 but the output was pretty much of the same quality.
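A rough sketch of the idea as described, with every detail invented for illustration (toy "images" generated from a three-number "pose", and small PyTorch networks): the same paired dataset is used twice, once to train an analysis network from image to pose, and once with input and output swapped to train a synthesis network from pose back to image.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy paired dataset: each 64-dim "image" is generated from a 3-dim "pose"
# (say rotation, smile amount, lighting); the generating map is made up.
poses = torch.rand(500, 3)
mixing = torch.randn(3, 64)
images = torch.tanh(poses @ mixing) + 0.01 * torch.randn(500, 64)

def train(net, inputs, targets, steps=2000):
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        loss = nn.functional.mse_loss(net(inputs), targets)
        opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

analysis  = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 3))   # image -> pose
synthesis = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 64))   # pose -> image

print("image -> pose loss:", train(analysis, images, poses))
print("pose -> image loss:", train(synthesis, poses, images))  # same data, swapped

# Synthesize a new "image" for a pose that was never seen during training.
new_image = synthesis(torch.tensor([[0.2, 0.9, 0.5]]))
```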
00:50:38 So I think for a computer graphics type application,
00:50:43 yeah, definitely GANs can be quite useful.
00:50:46 And not only for that, but for helping,
00:50:52 for instance, on this problem of unsupervised learning,
00:50:58 of reducing the number of labeled examples.
00:51:02 I think people, it’s like they think they can get out
00:51:07 more than they put in.
00:51:11 There’s no free lunch, as you said.
00:51:14 What do you think, what’s your intuition?
00:51:17 How can we slow the growth of N to infinity
00:51:22 in supervised learning?
00:51:25 So for example, Mobileye has very successfully,
00:51:29 I mean, essentially annotated large amounts of data
00:51:33 to be able to drive a car.
00:51:34 Now one thought is, so we’re trying
00:51:37 to teach machines, school of AI.
00:51:41 And we’re trying to, so how can we become better teachers,
00:51:45 maybe?
00:51:46 That’s one way.
00:51:47 No, I like that.
00:51:51 Because again, one caricature of the history of computer
00:51:57 science, you could say, begins with programmers, expensive.
00:52:05 Continues with labelers, cheap.
00:52:09 And the future will be schools, like we have for kids.
00:52:14 Yeah.
00:52:16 Currently, with the labeling methods, we're not
00:52:20 selective about which examples we teach networks with.
00:52:25 So I think the focus of making networks that learn much faster
00:52:31 is often on the architecture side.
00:52:33 But how can we pick better examples with which to learn?
00:52:37 Do you have intuitions about that?
00:52:39 Well, that’s part of the problem.
00:52:42 But the other one is, if we look at biology,
00:52:50 a reasonable assumption, I think,
00:52:52 is in the same spirit that I said,
00:52:58 evolution is opportunistic and has weak priors.
00:53:03 The way I think the intelligence of a child,
00:53:08 the baby may develop is by bootstrapping weak priors
00:53:16 from evolution.
00:53:17 For instance, you can assume that you
00:53:24 have in most organisms, including human babies,
00:53:28 built in some basic machinery to detect motion
00:53:35 and relative motion.
00:53:38 And in fact, we know all insects from fruit flies
00:53:42 to other animals, they have this,
00:53:49 even in the retinas, in the very peripheral part.
00:53:53 It’s very conserved across species, something
00:53:56 that evolution discovered early.
00:53:59 It may be the reason why babies tend
00:54:01 to look, in the first few days, at moving objects
00:54:06 and not at non-moving objects.
00:54:08 Now, moving objects means, OK, they’re attracted by motion.
00:54:12 But motion also means that motion
00:54:15 gives automatic segmentation from the background.
00:54:20 So because of motion boundaries, either the object
00:54:25 is moving or the eye of the baby is tracking the moving object
00:54:30 and the background is moving, right?
00:54:32 Yeah, so just purely on the visual characteristics
00:54:36 of the scene, that seems to be the most useful.
00:54:37 Right, so it’s like looking at an object without background.
00:54:43 It’s ideal for learning the object.
00:54:45 Otherwise, it’s really difficult because you
00:54:48 have so much stuff.
00:54:50 So suppose you do this at the beginning, first weeks.
00:54:55 Then after that, you can recognize objects.
00:54:58 Now they are imprinted, the number one,
00:55:02 even in the background, even without motion.
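A minimal sketch of why motion gives segmentation essentially for free (illustrative NumPy only, with a synthetic pair of frames): subtracting two consecutive frames cancels the static background, and thresholding the difference leaves a mask around the moving object.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic 64x64 frames: the same textured background,
# plus a bright square "object" that shifts a few pixels between frames.
background = rng.uniform(0.0, 0.3, size=(64, 64))
frame1, frame2 = background.copy(), background.copy()
frame1[20:30, 20:30] += 0.7   # object in the first frame
frame2[20:30, 24:34] += 0.7   # object moved to the right in the second frame

# Motion "segmentation": pixels whose intensity changed between the frames.
motion_mask = np.abs(frame2 - frame1) > 0.3

rows, cols = np.nonzero(motion_mask)
print("moving-object bounding box:",
      rows.min(), rows.max(), cols.min(), cols.max())
# Roughly rows 20-29 and columns 20-33: the region swept by the moving square.
```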
00:55:05 So that’s, by the way, I just want
00:55:08 to ask on the object recognition problem.
00:55:10 So there is this being responsive to movement
00:55:13 and doing edge detection, essentially.
00:55:16 What’s the gap between being effective at visually
00:55:21 recognizing stuff, detecting where it is,
00:55:24 and understanding the scene?
00:55:27 Is this a huge gap in many layers, or is it close?
00:55:32 No, I think that’s a huge gap.
00:55:35 I think present algorithms, with all the success that we have
00:55:42 and the fact that they are very useful,
00:55:45 I think we are in a golden age for applications
00:55:48 of low level vision and low level speech recognition
00:55:53 and so on, Alexa and so on.
00:55:56 There are many more things of similar level
00:55:58 to be done, including medical diagnosis and so on.
00:56:02 But we are far from what we call understanding
00:56:05 of a scene, of language, of actions, of people.
00:56:11 That is, despite the claims, that’s, I think, very far.
00:56:18 We’re a little bit off.
00:56:19 So in popular culture and among many researchers,
00:56:23 some of whom I've spoken with, like Stuart Russell
00:56:25 and Elon Musk, in and out of the AI field,
00:56:30 there’s a concern about the existential threat of AI.
00:56:34 And how do you think about this concern?
00:56:40 And is it valuable to think about large scale, long term,
00:56:45 unintended consequences of intelligent systems
00:56:50 we try to build?
00:56:51 I always think it’s better to worry first, early,
00:56:56 rather than late.
00:56:58 So worry is good.
00:56:59 Yeah.
00:57:00 I’m not against worrying at all.
00:57:03 Personally, I think that it will take a long time
00:57:09 before there is real reason to be worried.
00:57:15 But as I said, I think it’s good to put in place
00:57:19 and think about possible safeguards.
00:57:24 What I find a bit misleading are things
00:57:27 that have been said by people I know, like Elon Musk,
00:57:31 and Bostrom in particular,
00:57:35 and what is his first name?
00:57:36 Nick Bostrom.
00:57:37 Nick Bostrom, right.
00:57:40 And a couple of other people, saying, for instance, that AI
00:57:44 is more dangerous than nuclear weapons.
00:57:46 I think that’s really wrong.
00:57:50 That can be misleading.
00:57:52 Because in terms of priority, we should still
00:57:56 be more worried about nuclear weapons
00:57:59 and what people are doing about it and so on than AI.
00:58:05 And you’ve spoken about Demis Hassabis
00:58:09 and yourself saying that you think
00:58:12 you’ll be about 100 years out before we
00:58:16 have a general intelligence system that’s
00:58:18 on par with a human being.
00:58:20 Do you have any updates for those predictions?
00:58:22 Well, I think he said.
00:58:24 He said 20, I think.
00:58:25 He said 20, right.
00:58:26 This was a couple of years ago.
00:58:27 I have not asked him again.
00:58:29 So should I have?
00:58:31 Your own prediction, what’s your prediction
00:58:36 about when you’ll be truly surprised?
00:58:38 And what’s the confidence interval on that?
00:58:43 It’s so difficult to predict the future and even
00:58:45 the present sometimes.
00:58:47 It’s pretty hard to predict.
00:58:48 But I would be, as I said, this is completely...
00:58:53 I would be more like Rod Brooks.
00:58:56 I think he's at about 200 years.
00:58:58 200 years.
00:59:01 When we have this kind of AGI system,
00:59:04 artificial general intelligence system,
00:59:06 you’re sitting in a room with her, him, it.
00:59:12 Do you think the underlying design of such a system
00:59:17 is something we’ll be able to understand?
00:59:19 It will be simple?
00:59:20 Do you think it’ll be explainable,
00:59:25 understandable by us?
00:59:27 Your intuition, again, we’re in the realm of philosophy
00:59:30 a little bit.
00:59:32 Well, probably no.
00:59:36 But again, it depends what you really
00:59:40 mean for understanding.
00:59:42 So I think we don’t understand how deep networks work.
00:59:53 I think we are beginning to have a theory now.
00:59:56 But in the case of deep networks,
00:59:59 or even in the case of the simpler kernel machines
01:00:04 or linear classifiers, we really don't understand
01:00:08 the individual units or so.
01:00:11 But we understand what the computation and the limitations
01:00:17 and the properties of it are.
01:00:20 It’s similar to many things.
01:00:24 What does it mean to understand how a fusion bomb works?
01:00:29 How many of us understand the basic principle?
01:00:36 And some of us may understand deeper details.
01:00:40 In that sense, understanding is, as a community,
01:00:43 as a civilization, can we build another copy of it?
01:00:47 And in that sense, do you think there
01:00:50 will need to be some evolutionary component where
01:00:53 it runs away from our understanding?
01:00:56 Or do you think it could be engineered from the ground up,
01:00:59 the same way you go from the transistor to PowerPoint?
01:01:02 So many years ago, this was actually 40, 41 years ago,
01:01:09 I wrote a paper with David Marr, who
01:01:13 was one of the founding fathers of computer vision,
01:01:18 computational vision.
01:01:20 I wrote a paper about levels of understanding,
01:01:23 which is related to the question we discussed earlier
01:01:26 about understanding PowerPoint, understanding transistors,
01:01:30 and so on.
01:01:31 And in that kind of framework, we
01:01:36 had the level of the hardware and the top level
01:01:39 of the algorithms.
01:01:42 We did not have learning.
01:01:45 Recently, I updated adding levels.
01:01:48 And one level I added to those three was learning.
01:01:55 And you can imagine, you could have a good understanding
01:01:59 of how you construct a learning machine, like we do.
01:02:04 But being unable to describe in detail what the learning
01:02:09 machines will discover, right?
01:02:13 Now, that would be still a powerful understanding,
01:02:17 if I can build a learning machine,
01:02:19 even if I don’t understand in detail every time it
01:02:24 learns something.
01:02:26 Just like our children, if they start
01:02:28 listening to a certain type of music,
01:02:31 I don’t know, Miley Cyrus or something,
01:02:33 you don’t understand why they came
01:02:36 to that particular preference.
01:02:37 But you understand the learning process.
01:02:39 That’s very interesting.
01:02:41 So on learning, for systems to be part of our world,
01:02:50 one of the challenging things
01:02:53 that you’ve spoken about is learning ethics, learning
01:02:57 morals.
01:02:59 And how hard do you think is the problem of, first of all,
01:03:04 humans understanding our ethics?
01:03:06 What is the origin of ethics on the neural, on the low level?
01:03:10 What is it at the higher level?
01:03:12 Is it something that’s learnable from machines
01:03:15 in your intuition?
01:03:17 I think, yeah, ethics is learnable, very likely.
01:03:23 I think it’s one of these problems where
01:03:29 I think understanding the neuroscience of ethics,
01:03:36 people discuss there is an ethics of neuroscience.
01:03:41 Yeah, yes.
01:03:42 How a neuroscientist should or should not behave.
01:03:46 You can think of a neurosurgeon and the ethics
01:03:50 rules he or she has to follow.
01:03:53 But I’m more interested on the neuroscience of ethics.
01:03:57 You’re blowing my mind right now.
01:03:58 The neuroscience of ethics is very meta.
01:04:01 Yeah, and I think that would be important to understand also
01:04:05 for being able to design machines that
01:04:10 are ethical machines in our sense of ethics.
01:04:15 And you think there is something in neuroscience,
01:04:18 there’s patterns, tools in neuroscience
01:04:21 that could help us shed some light on ethics?
01:04:25 Or is it mostly on the psychology or sociology side,
01:04:28 at a higher level?
01:04:29 No, there is psychology.
01:04:30 But there is also, in the meantime,
01:04:35 there is evidence, fMRI, of specific areas of the brain
01:04:41 that are involved in certain ethical judgment.
01:04:44 And not only this, you can stimulate those areas
01:04:47 with magnetic fields and change the ethical decisions.
01:04:53 Yeah, wow.
01:05:00 So that's work by a colleague of mine, Rebecca Saxe.
01:05:00 And there is other researchers doing similar work.
01:05:05 And I think this is the beginning.
01:05:08 But ideally, at some point, we’ll
01:05:11 have an understanding of how this works.
01:05:15 And why it evolved, right?
01:05:18 The big why question.
01:05:19 Yeah, it must have some purpose.
01:05:22 Yeah, obviously it has some social purposes, probably.
01:05:30 If neuroscience holds the key to at least illuminate
01:05:33 some aspect of ethics, that means
01:05:35 it could be a learnable problem.
01:05:37 Yeah, exactly.
01:05:38 And as we’re getting into harder and harder questions,
01:05:42 let’s go to the hard problem of consciousness.
01:05:45 Is this an important problem for us
01:05:48 to think about and solve on the engineering of intelligence
01:05:52 side of your work, of our dream?
01:05:56 It’s unclear.
01:05:57 So again, this is a deep problem,
01:06:02 partly because it’s very difficult to define
01:06:05 consciousness.
01:06:06 And there is a debate among neuroscientists,
01:06:17 and philosophers of course,
01:06:23 about whether consciousness is something that requires
01:06:28 flesh and blood, so to speak.
01:06:31 Or it could be that we could have silicon devices that
01:06:38 are conscious, or, up to statements
01:06:42 like everything has some degree of consciousness
01:06:45 and some more than others.
01:06:48 This is like Giulio Tononi and phi.
01:06:53 We just recently talked to Christoph Koch.
01:06:56 OK.
01:06:57 Christoph was my first graduate student.
01:07:00 Do you think it’s important to illuminate
01:07:04 aspects of consciousness in order
01:07:07 to engineer intelligence systems?
01:07:10 Do you think an intelligent system would ultimately
01:07:13 have consciousness?
01:07:14 Are they interlinked?
01:07:18 Most of the people working in artificial intelligence,
01:07:22 I think, would answer, we don’t strictly
01:07:25 need consciousness to have an intelligent system.
01:07:30 That’s sort of the easier question,
01:07:31 because it’s a very engineering answer to the question.
01:07:36 Pass the Turing test, we don’t need consciousness.
01:07:38 But if you were to go, do you think
01:07:41 it’s possible that we need to have
01:07:46 that kind of self awareness?
01:07:48 We may, yes.
01:07:49 So for instance, I personally think
01:07:53 that when we test a machine or a person in a Turing test,
01:08:00 in an extended Turing test, I think
01:08:05 consciousness is part of what we require in that test,
01:08:11 implicitly, to say that this is intelligent.
01:08:15 Christoph disagrees.
01:08:17 Yes, he does.
01:08:20 Despite many other romantic notions he holds,
01:08:23 he disagrees with that one.
01:08:24 Yes, that’s right.
01:08:26 So we’ll see.
01:08:29 Do you think, as a quick question,
01:08:34 Ernest Becker’s fear of death, do you
01:08:38 think mortality and those kinds of things
01:08:41 are important for consciousness and for intelligence?
01:08:49 The finiteness of life, finiteness of existence,
01:08:54 or is that just a side effect of evolution,
01:08:56 evolutionary side effect that’s useful for natural selection?
01:09:01 Do you think this kind of thing that this interview is
01:09:03 going to run out of time soon, our life
01:09:06 will run out of time soon, do you
01:09:08 think that’s needed to make this conversation good and life
01:09:11 good?
01:09:12 I never thought about it.
01:09:13 It’s a very interesting question.
01:09:15 I think Steve Jobs, in his commencement speech
01:09:21 at Stanford, argued that having a finite life
01:09:26 was important for stimulating achievements.
01:09:30 So it was different.
01:09:31 Yeah, live every day like it’s your last, right?
01:09:33 Yeah.
01:09:34 So rationally, I don’t think strictly you need mortality
01:09:41 for consciousness.
01:09:43 But who knows?
01:09:45 They seem to go together in our biological system, right?
01:09:48 Yeah, yeah.
01:09:51 You’ve mentioned before, and students of yours are associated with,
01:09:57 AlphaGo and Mobileye, the big recent success stories in AI.
01:10:01 And I think it’s captivated the entire world of what AI can do.
01:10:06 So what do you think will be the next breakthrough?
01:10:10 And what’s your intuition about the next breakthrough?
01:10:13 Of course, I don’t know where the next breakthrough is.
01:10:16 I think that there is a good chance, as I said before,
01:10:21 that the next breakthrough will also
01:10:23 be inspired by neuroscience.
01:10:27 But which one, I don’t know.
01:10:32 And there’s, so MIT has this Quest for Intelligence.
01:10:35 And there’s a few moon shots, which in that spirit,
01:10:39 which ones are you excited about?
01:10:41 Which projects kind of?
01:10:44 Well, of course, I’m excited about one
01:10:47 of the moon shots, which is our Center for Brains, Minds,
01:10:51 and Machines, which is the one which is fully funded by NSF.
01:10:58 And it is about visual intelligence.
01:11:02 And that one is particularly about understanding.
01:11:06 Visual intelligence, so the visual cortex,
01:11:09 and visual intelligence in the sense
01:11:13 of how we look around ourselves and understand
01:11:20 the world around ourselves, meaning what is going on,
01:11:25 how we could go from here to there without hitting
01:11:29 obstacles, whether there are other agents,
01:11:34 people in the environment.
01:11:36 These are all things that we perceive very quickly.
01:11:41 And it’s something actually quite close to being conscious,
01:11:46 not quite.
01:11:47 But there is this interesting experiment
01:11:50 that was run at Google X, which in a sense
01:11:54 is just a virtual reality experiment,
01:11:58 but in which they had a subject sitting, say,
01:12:02 in a chair with goggles, like Oculus and so on, earphones.
01:12:11 And they were seeing through the eyes of a robot
01:12:15 nearby, two cameras, with microphones for receiving.
01:12:19 So their sensory system was there.
01:12:23 And the impression of all the subject, very strong,
01:12:28 they could not shake it off, was that they
01:12:31 were where the robot was.
01:12:35 They could look at themselves from the robot
01:12:38 and still feel they were where the robot is.
01:12:42 They were looking at their own body.
01:12:46 Their self had moved.
01:12:48 So some aspect of scene understanding
01:12:50 has to have the ability to place yourself,
01:12:54 have a self awareness about your position in the world
01:12:57 and what the world is.
01:12:59 So we may have to solve the hard problem of consciousness
01:13:04 to solve it.
01:13:04 Along the way, yes.
01:13:05 It’s quite a moon shot.
01:13:07 So you’ve been an advisor to some incredible minds,
01:13:12 including Demis Hassabis, Christoph Koch, Amnon Shashua,
01:13:15 like you said.
01:13:17 All went on to become seminal figures
01:13:20 in their respective fields.
01:13:22 From your own success as a researcher
01:13:24 and from perspective as a mentor of these researchers,
01:13:29 having guided them in the way of advice,
01:13:34 what does it take to be successful in science
01:13:36 and engineering careers?
01:13:39 Whether you’re talking to somebody in their teens,
01:13:43 20s, and 30s, what does that path look like?
01:13:48 It’s curiosity and having fun.
01:13:53 And I think it’s important also having
01:13:57 fun with other curious minds.
01:14:02 It’s the people you surround yourself with too,
01:14:04 so fun and curiosity.
01:14:06 Is there, you mentioned Steve Jobs,
01:14:09 is there also an underlying ambition
01:14:13 that’s unique that you saw?
01:14:14 Or does it really boil down
01:14:16 to insatiable curiosity and fun?
01:14:18 Well of course, it’s being curious
01:14:22 in an active and ambitious way, yes.
01:14:26 Definitely.
01:14:29 But I think sometimes in science,
01:14:33 there are friends of mine who are like this.
01:14:39 There are some scientists
01:14:40 who like to work by themselves
01:14:44 and kind of communicate only when they complete their work
01:14:50 or discover something.
01:14:52 I think I always found the actual process
01:14:58 of discovering something is more fun
01:15:03 if it’s together with other intelligent
01:15:07 and curious and fun people.
01:15:09 So if you see the fun in that process,
01:15:11 the side effect of that process
01:15:13 will be that you’ll actually end up
01:15:14 discovering some interesting things.
01:15:16 So as you’ve led many incredible efforts here,
01:15:23 what’s the secret to being a good advisor,
01:15:25 mentor, leader in a research setting?
01:15:28 Is it a similar spirit?
01:15:30 Or yeah, what advice could you give
01:15:32 to people, young faculty and so on?
01:15:35 It’s partly repeating what I said
01:15:38 about an environment that should be friendly
01:15:41 and fun and ambitious.
01:15:44 And I think I learned a lot
01:15:49 from some of my advisors and friends
01:15:52 and some who are physicists.
01:15:55 And there was, for instance,
01:15:57 this behavior that was encouraged:
01:16:02 when somebody comes with a new idea in the group,
01:16:06 unless it’s really stupid,
01:16:09 you are always enthusiastic.
01:16:11 And you’re enthusiastic for a few minutes,
01:16:14 for a few hours.
01:16:15 Then you start asking a few questions critically,
01:16:21 testing this.
01:16:23 But this is a process that,
01:16:26 I think, is very good.
01:16:29 You have to be enthusiastic.
01:16:30 Sometimes people are very critical from the beginning.
01:16:33 That’s not…
01:16:36 Yes, you have to give it a chance
01:16:37 for that seed to grow.
01:16:39 That said, with some of your ideas,
01:16:41 which are quite revolutionary,
01:16:42 as we’ve witnessed, especially on the human vision side
01:16:45 and the neuroscience side,
01:16:47 there could be some pretty heated arguments.
01:16:50 Do you enjoy these?
01:16:51 Is that a part of science and academic pursuits
01:16:54 that you enjoy?
01:16:55 Yeah.
01:16:56 Is that something that happens in your group as well?
01:17:01 Yeah, absolutely.
01:17:02 I also spent some time in Germany.
01:17:04 Again, there is this tradition
01:17:05 in which people are more forthright,
01:17:10 less kind than here.
01:17:14 So in the U.S., when you write a bad letter,
01:17:20 you still say, this guy’s nice.
01:17:23 Yes, yes.
01:17:25 So…
01:17:26 Yeah, here in America, it’s degrees of nice.
01:17:28 Yes.
01:17:29 It’s all just degrees of nice, yeah.
01:17:31 Right, right.
01:17:31 So as long as this does not become personal,
01:17:36 and it’s really like a football game
01:17:40 with these rules, that’s great.
01:17:43 That’s fun.
01:17:46 So if you somehow found yourself in a position
01:17:49 to ask one question of an oracle,
01:17:51 like a genie, maybe a god,
01:17:55 and you’re guaranteed to get a clear answer,
01:17:58 what kind of question would you ask?
01:18:01 What would be the question you would ask?
01:18:04 In the spirit of our discussion,
01:18:06 it could be, how could I become 10 times more intelligent?
01:18:10 And so, but see, you only get a clear short answer.
01:18:16 So do you think there’s a clear short answer to that?
01:18:18 No.
01:18:20 And that’s the answer you’ll get.
01:18:22 Okay, so you’ve mentioned Flowers for Algernon.
01:18:26 Oh, yeah.
01:18:27 As a story that inspired you in your childhood,
01:18:32 this story of a mouse, and a human,
01:18:37 achieving genius level intelligence,
01:18:39 and then understanding what was happening
01:18:41 while slowly becoming not intelligent again,
01:18:44 and this tragedy of gaining intelligence
01:18:46 and losing intelligence,
01:18:48 do you think in that spirit, in that story,
01:18:51 do you think intelligence is a gift or a curse
01:18:55 from the perspective of happiness and meaning of life?
01:19:00 You try to create an intelligent system
01:19:02 that understands the universe,
01:19:03 but on an individual level, the meaning of life,
01:19:06 do you think intelligence is a gift?
01:19:10 It’s a good question.
01:19:17 I don’t know.
01:19:22 As one of the people considered among
01:19:26 the smartest people in the world,
01:19:29 in some dimension at the very least, what do you think?
01:19:33 I don’t know, it may be invariant to intelligence,
01:19:37 that degree of happiness.
01:19:39 It would be nice if it were.
01:19:43 That’s the hope.
01:19:44 Yeah.
01:19:46 You could be smart and happy and clueless and happy.
01:19:50 Yeah.
01:19:51 As always, on the discussion of the meaning of life,
01:19:54 it’s probably a good place to end.
01:19:57 Tommaso, thank you so much for talking today.
01:19:59 Thank you, this was great.