Yann LeCun: Dark Matter of Intelligence and Self-Supervised Learning #258

Transcript

00:00:00 The following is a conversation with Yann LeCun,

00:00:02 his second time on the podcast.

00:00:04 He is the chief AI scientist at Meta, formerly Facebook,

00:00:09 professor at NYU, Turing Award winner,

00:00:13 one of the seminal figures in the history

00:00:15 of machine learning and artificial intelligence,

00:00:18 and someone who is brilliant and opinionated

00:00:21 in the best kind of way.

00:00:23 And so it was always fun to talk to him.

00:00:26 This is the Lex Fridman Podcast.

00:00:28 To support it, please check out our sponsors

00:00:29 in the description.

00:00:31 And now, here’s my conversation with Yann LeCun.

00:00:36 You cowrote the article,

00:00:37 Self Supervised Learning, the Dark Matter of Intelligence.

00:00:40 Great title, by the way, with Ishan Misra.

00:00:43 So let me ask, what is self supervised learning,

00:00:46 and why is it the dark matter of intelligence?

00:00:49 I’ll start by the dark matter part.

00:00:53 There is obviously a kind of learning

00:00:55 that humans and animals are doing

00:00:59 that we currently are not reproducing properly

00:01:02 with machines or with AI, right?

00:01:04 So the most popular approaches to machine learning today are,

00:01:08 or paradigms, I should say,

00:01:09 are supervised learning and reinforcement learning.

00:01:12 And they are extremely inefficient.

00:01:15 Supervised learning requires many samples

00:01:17 for learning anything.

00:01:19 And reinforcement learning requires a ridiculously large

00:01:22 number of trial and errors for a system to learn anything.

00:01:29 And that’s why we don’t have self driving cars.

00:01:32 That was a big leap from one to the other.

00:01:34 Okay, so that, to solve difficult problems,

00:01:38 you have to have a lot of human annotation

00:01:42 for supervised learning to work.

00:01:44 And to solve those difficult problems

00:01:45 with reinforcement learning,

00:01:46 you have to have some way to maybe simulate that problem

00:01:50 such that you can do that large scale kind of learning

00:01:52 that reinforcement learning requires.

00:01:54 Right, so how is it that most teenagers can learn

00:01:58 to drive a car in about 20 hours of practice,

00:02:02 whereas even with millions of hours of simulated practice,

00:02:07 a self driving car can’t actually learn

00:02:09 to drive itself properly.

00:02:12 And so obviously we’re missing something, right?

00:02:13 And it’s quite obvious for a lot of people

00:02:15 that the immediate response you get from many people is,

00:02:19 well, humans use their background knowledge

00:02:22 to learn faster, and they’re right.

00:02:25 Now, how was that background knowledge acquired?

00:02:28 And that’s the big question.

00:02:30 So now you have to ask, how do babies

00:02:34 in the first few months of life learn how the world works?

00:02:37 Mostly by observation,

00:02:38 because they can hardly act in the world.

00:02:40 And they learn an enormous amount

00:02:42 of background knowledge about the world

00:02:43 that may be the basis of what we call common sense.

00:02:47 This type of learning is not learning a task.

00:02:51 It’s not being reinforced for anything.

00:02:53 It’s just observing the world and figuring out how it works.

00:02:58 Building world models, learning world models.

00:03:01 How do we do this?

00:03:02 And how do we reproduce this in machines?

00:03:04 So self supervised learning is one instance

00:03:09 or one attempt at trying to reproduce this kind of learning.

00:03:13 Okay, so you’re looking at just observation,

00:03:16 so not even the interacting part of a child.

00:03:18 It’s just sitting there watching mom and dad walk around,

00:03:21 pick up stuff, all of that.

00:03:23 That’s what we mean about background knowledge.

00:03:25 Perhaps not even watching mom and dad,

00:03:27 just watching the world go by.

00:03:30 Just having eyes open or having eyes closed

00:03:31 or the very act of opening and closing eyes

00:03:34 that the world appears and disappears,

00:03:36 all that basic information.

00:03:39 And you’re saying in order to learn to drive,

00:03:43 like the reason humans are able to learn to drive quickly,

00:03:45 some faster than others,

00:03:47 is because of the background knowledge.

00:03:48 They’re able to watch cars operate in the world

00:03:51 in the many years leading up to it,

00:03:53 the physics of basic objects, all that kind of stuff.

00:03:55 That’s right.

00:03:56 I mean, the basic physics of objects,

00:03:57 you don’t even need to know how a car works, right?

00:04:00 Because that you can learn fairly quickly.

00:04:02 I mean, the example I use very often

00:04:03 is you’re driving next to a cliff.

00:04:06 And you know in advance because of your understanding

00:04:10 of intuitive physics that if you turn the wheel

00:04:13 to the right, the car will veer to the right,

00:04:15 will run off the cliff, fall off the cliff,

00:04:17 and nothing good will come out of this, right?

00:04:20 But if you are a sort of tabula rasa

00:04:23 reinforcement learning system

00:04:25 that doesn’t have a model of the world,

00:04:28 you have to repeat falling off this cliff

00:04:30 thousands of times before you figure out it’s a bad idea.

00:04:32 And then a few more thousand times

00:04:34 before you figure out how to not do it.

00:04:36 And then a few more million times

00:04:38 before you figure out how to not do it

00:04:39 in every situation you ever encounter.

00:04:42 So self supervised learning still has to have

00:04:45 some source of truth being told to it by somebody.

00:04:50 So you have to figure out a way without human assistance

00:04:54 or without significant amount of human assistance

00:04:56 to get that truth from the world.

00:04:59 So the mystery there is how much signal is there?

00:05:03 How much truth is there that the world gives you?

00:05:06 Whether it’s the human world,

00:05:08 like you watch YouTube or something like that,

00:05:10 or it’s the more natural world.

00:05:12 So how much signal is there?

00:05:14 So here’s the trick.

00:05:16 There is way more signal in sort of a self supervised

00:05:20 setting than there is in either a supervised

00:05:22 or reinforcement setting.

00:05:24 And this is going back to my analogy of the cake,

00:05:30 The cake as someone has called it,

00:05:32 where when you try to figure out how much information

00:05:36 you ask the machine to predict

00:05:37 and how much feedback you give the machine at every trial,

00:05:41 in reinforcement learning,

00:05:41 you give the machine a single scalar.

00:05:43 You tell the machine you did good, you did bad.

00:05:45 And you only tell this to the machine once in a while.

00:05:49 When I say you, it could be the universe

00:05:51 telling the machine, right?

00:05:54 But it’s just one scalar.

00:05:55 And so as a consequence,

00:05:57 you cannot possibly learn something very complicated

00:05:59 without many, many, many trials

00:06:01 where you get many, many feedbacks of this type.

00:06:04 Supervised learning, you give a few bits to the machine

00:06:08 at every sample.

00:06:11 Let’s say you’re training a system on recognizing images

00:06:15 on ImageNet with 1000 categories,

00:06:17 that’s a little less than 10 bits of information per sample.

00:06:22 But self supervised learning, here is the setting.

00:06:24 Ideally, we don’t know how to do this yet,

00:06:26 but ideally you would show a machine a segment of video

00:06:31 and then stop the video and ask the machine to predict

00:06:34 what’s going to happen next.

00:06:37 And so we let the machine predict

00:06:38 and then you let time go by

00:06:41 and show the machine what actually happened

00:06:44 and hope the machine will learn to do a better job

00:06:47 at predicting next time around.

00:06:49 There’s a huge amount of information you give the machine

00:06:51 because it’s an entire video clip

00:06:54 of the future after the video clip you fed it

00:06:59 in the first place.
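
A minimal sketch of that setting, assuming a PyTorch-style setup; the tiny network, frame sizes, and plain pixel-regression loss are illustrative stand-ins, not the approach being advocated (a naive regression loss like this is exactly what runs into the uncertainty problem discussed later).

```python
import torch
import torch.nn as nn

# Toy "predict what happens next" loop: feed past frames, predict the next
# frame, then compare against what actually happened and update.
predictor = nn.Sequential(
    nn.Flatten(),
    nn.Linear(4 * 16 * 16, 256), nn.ReLU(),
    nn.Linear(256, 16 * 16),
)
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-3)

past_frames = torch.randn(8, 4, 16, 16)   # a batch of short clips (fake data)
next_frame = torch.randn(8, 16 * 16)      # what actually happened afterwards

prediction = predictor(past_frames)       # let the machine predict
loss = nn.functional.mse_loss(prediction, next_frame)
loss.backward()                           # learn to do a better job next time
optimizer.step()
```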

00:07:00 So both for language and for vision, there’s a subtle,

00:07:05 seemingly trivial construction,

00:07:06 but maybe that’s representative

00:07:08 of what is required to create intelligence,

00:07:10 which is filling the gap.

00:07:13 So it sounds dumb, but can you,

00:07:19 it is possible you could solve all of intelligence

00:07:22 in this way, just for both language,

00:07:25 just give a sentence and continue it

00:07:28 or give a sentence and there’s a gap in it,

00:07:32 some words blanked out and you fill in what words go there.

00:07:35 For vision, you give a sequence of images

00:07:39 and predict what’s going to happen next,

00:07:40 or you fill in what happened in between.

00:07:43 Do you think it’s possible that formulation alone

00:07:48 as a signal for self supervised learning

00:07:50 can solve intelligence for vision and language?

00:07:53 I think that’s the best shot at the moment.

00:07:56 So whether this will take us all the way

00:07:59 to human level intelligence or something,

00:08:01 or just cat level intelligence is not clear,

00:08:04 but among all the possible approaches

00:08:07 that people have proposed, I think it’s our best shot.

00:08:09 So I think this idea of an intelligent system

00:08:14 filling in the blanks, either predicting the future,

00:08:18 inferring the past, filling in missing information,

00:08:23 I’m currently filling the blank

00:08:25 of what is behind your head

00:08:26 and what your head looks like from the back,

00:08:30 because I have basic knowledge about how humans are made.

00:08:33 And I don’t know what you’re going to say,

00:08:36 at which point you’re going to speak,

00:08:37 whether you’re going to move your head this way or that way,

00:08:38 which way you’re going to look,

00:08:40 but I know you’re not going to just dematerialize

00:08:42 and reappear three meters down the hall,

00:08:46 because I know what’s possible and what’s impossible

00:08:49 according to intuitive physics.

00:08:51 You have a model of what’s possible and what’s impossible

00:08:53 and then you’d be very surprised if it happens

00:08:55 and then you’ll have to reconstruct your model.

00:08:57 Right, so that’s the model of the world.

00:08:59 It’s what tells you, what fills in the blanks.

00:09:02 So given your partial information about the state

00:09:04 of the world, given by your perception,

00:09:08 your model of the world fills in the missing information

00:09:11 and that includes predicting the future,

00:09:13 re predicting the past, filling in things

00:09:16 you don’t immediately perceive.

00:09:18 And that doesn’t have to be purely generic vision

00:09:22 or visual information or generic language.

00:09:24 You can go to specifics like predicting

00:09:28 what control decision you make when you’re driving

00:09:31 in a lane, you have a sequence of images from a vehicle

00:09:35 and then you have information if you record it on video

00:09:39 where the car ended up going so you can go back in time

00:09:43 and predict where the car went

00:09:45 based on the visual information.

00:09:46 That’s very specific, domain specific.

00:09:49 Right, but the question is whether we can come up

00:09:51 with sort of a generic method for training machines

00:09:57 to do this kind of prediction or filling in the blanks.

00:09:59 So right now, this type of approach has been unbelievably

00:10:04 successful in the context of natural language processing.

00:10:08 Every modern natural language processing system is pre trained

00:10:10 in a self supervised manner to fill in the blanks.

00:10:13 You show it a sequence of words, you remove 10% of them

00:10:16 and then you train some gigantic neural net

00:10:17 to predict the words that are missing.

00:10:20 And once you’ve pre trained that network,

00:10:22 you can use the internal representation learned by it

00:10:26 as input to something that you train supervised

00:10:30 or whatever.
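
A minimal sketch of that fill-in-the-blanks pre-training recipe, with a toy vocabulary and a stand-in for the "gigantic neural net"; the names and masking rate here are illustrative (real systems only score the masked positions and then reuse the internal representation downstream).

```python
import random
import torch
import torch.nn as nn

vocab = ["the", "cat", "is", "chasing", "mouse", "in", "kitchen", "[MASK]"]
word_to_id = {w: i for i, w in enumerate(vocab)}
sentence = ["the", "cat", "is", "chasing", "the", "mouse", "in", "the", "kitchen"]

tokens = [word_to_id[w] for w in sentence]
targets = list(tokens)                      # the words we will ask it to recover
for i in range(len(tokens)):
    if random.random() < 0.15:              # blank out roughly 10-15% of the words
        tokens[i] = word_to_id["[MASK]"]

model = nn.Sequential(nn.Embedding(len(vocab), 32),   # stand-in for a huge network
                      nn.Linear(32, len(vocab)))       # one score per vocabulary word
logits = model(torch.tensor(tokens))
loss = nn.functional.cross_entropy(logits, torch.tensor(targets))
loss.backward()                             # the pre-training signal
```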

00:10:32 That’s been incredibly successful.

00:10:33 Not so successful in images, although it’s making progress

00:10:37 and it’s based on sort of manual data augmentation.

00:10:42 We can go into this later,

00:10:43 but what has not been successful yet is training from video.

00:10:47 So getting a machine to learn to represent

00:10:49 the visual world, for example, by just watching video.

00:10:52 Nobody has really succeeded in doing this.

00:10:54 Okay, well, let’s kind of give a high level overview.

00:10:57 What’s the difference in kind and in difficulty

00:11:02 between vision and language?

00:11:03 So you said people haven’t been able to really

00:11:08 kind of crack the problem of vision open

00:11:10 in terms of self supervised learning,

00:11:11 but that may not be necessarily

00:11:13 because it’s fundamentally more difficult.

00:11:15 Maybe like when we’re talking about achieving,

00:11:18 like passing the Turing test in the full spirit

00:11:22 of the Turing test in language might be harder than vision.

00:11:24 That’s not obvious.

00:11:26 So in your view, which is harder

00:11:29 or perhaps are they just the same problem?

00:11:31 When the farther we get to solving each,

00:11:34 the more we realize it’s all the same thing.

00:11:36 It’s all the same cake.

00:11:37 I think what I’m looking for are methods

00:11:40 that make them look essentially like the same cake,

00:11:43 but currently they’re not.

00:11:44 And the main issue with learning world models

00:11:48 or learning predictive models is that the prediction

00:11:53 is never a single thing

00:11:55 because the world is not entirely predictable.

00:11:59 It may be deterministic or stochastic.

00:12:00 We can get into the philosophical discussion about it,

00:12:02 but even if it’s deterministic,

00:12:05 it’s not entirely predictable.

00:12:07 And so if I play a short video clip

00:12:11 and then I ask you to predict what’s going to happen next,

00:12:14 there’s many, many plausible continuations

00:12:16 for that video clip and the number of continuation grows

00:12:20 with the interval of time that you’re asking the system

00:12:23 to make a prediction for.

00:12:26 And so one big question with self supervised learning

00:12:29 is how you represent this uncertainty,

00:12:32 how you represent multiple discrete outcomes,

00:12:35 how you represent a sort of continuum

00:12:37 of possible outcomes, et cetera.

00:12:40 And if you are sort of a classical machine learning person,

00:12:45 you say, oh, you just represent a distribution, right?

00:12:49 And that we know how to do when we’re predicting words,

00:12:52 missing words in the text,

00:12:53 because you can have a neural net give a score

00:12:56 for every word in the dictionary.

00:12:58 It’s a big list of numbers, maybe 100,000 or so.

00:13:02 And you can turn them into a probability distribution

00:13:05 that tells you when I say a sentence,

00:13:09 the cat is chasing the blank in the kitchen.

00:13:13 There are only a few words that make sense there.

00:13:15 It could be a mouse or it could be a laser spot

00:13:18 or something like that, right?

00:13:21 And if I say the blank is chasing the blank in the Savannah,

00:13:25 you also have a bunch of plausible options

00:13:27 for those two words, right?

00:13:30 Because you have kind of a underlying reality

00:13:33 that you can refer to to sort of fill in those blanks.

00:13:38 So you cannot say for sure in the Savannah,

00:13:42 if it’s a lion or a cheetah or whatever,

00:13:44 you cannot know if it's a zebra or a gnu or whatever,

00:13:49 wildebeest, the same thing.

00:13:55 But you can represent the uncertainty

00:13:56 by just a long list of numbers.
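
A minimal sketch of that "long list of numbers": made-up scores over a toy dictionary, turned into a probability distribution for the blank with a softmax.

```python
import torch

words = ["mouse", "laser", "dog", "car", "zebra"]
scores = torch.tensor([4.0, 3.0, 0.5, -2.0, -1.0])   # made-up network outputs
probs = torch.softmax(scores, dim=0)                 # sums to 1 over the dictionary
for word, p in zip(words, probs):
    print(f"{word}: {p:.2f}")                        # most mass on plausible fillers
```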

00:13:58 Now, if I do the same thing with video,

00:14:01 when I ask you to predict a video clip,

00:14:04 it’s not a discrete set of potential frames.

00:14:07 You have to have somewhere representing

00:14:10 a sort of infinite number of plausible continuations

00:14:13 of multiple frames in a high dimensional continuous space.

00:14:17 And we just have no idea how to do this properly.

00:14:20 Finite, high dimensional.

00:14:22 So like you,

00:14:23 It’s finite high dimensional, yes.

00:14:25 Just like the words,

00:14:26 they try to get it down to a small finite set

00:14:32 of like under a million, something like that.

00:14:34 Something like that.

00:14:35 I mean, it’s kind of ridiculous that we’re doing

00:14:38 a distribution over every single possible word

00:14:40 for language and it works.

00:14:42 It feels like that’s a really dumb way to do it.

00:14:46 Like there seems to be like there should be

00:14:49 some more compressed representation

00:14:52 of the distribution of the words.

00:14:55 You’re right about that.

00:14:56 And so do you have any interesting ideas

00:14:58 about how to represent all of reality in a compressed way

00:15:01 such that you can form a distribution over it?

00:15:03 That’s one of the big questions, how do you do that?

00:15:06 Right, I mean, what’s kind of another thing

00:15:08 that really is stupid about, I shouldn’t say stupid,

00:15:13 but like simplistic about current approaches

00:15:15 to self supervised learning in NLP in text

00:15:19 is that not only do you represent

00:15:21 a giant distribution over words,

00:15:23 but for multiple words that are missing,

00:15:25 those distributions are essentially independent

00:15:27 of each other.

00:15:30 And you don’t pay too much of a price for this.

00:15:33 So you can’t, so the system in the sentence

00:15:37 that I gave earlier, if it gives a certain probability

00:15:41 for a lion and cheetah, and then a certain probability

00:15:44 for gazelle, wildebeest and zebra,

00:15:51 those two probabilities are independent of each other.

00:15:55 And it’s not the case that those things are independent.

00:15:58 Lions actually attack like bigger animals than cheetahs.

00:16:01 So there’s a huge independent hypothesis in this process,

00:16:05 which is not actually true.

00:16:07 The reason for this is that we don’t know

00:16:09 how to represent properly distributions

00:16:13 over combinatorial sequences of symbols,

00:16:16 essentially because the number grows exponentially

00:16:19 with the length of the symbols.

00:16:21 And so we have to use tricks for this,

00:16:22 but those techniques kind of get around it,

00:16:26 like don’t even deal with it.

00:16:27 So the big question is would there be some sort

00:16:31 of abstract latent representation of text

00:16:35 that would say that when I switch lion for gazelle,

00:16:40 lion for cheetah, I also have to switch zebra for gazelle?

00:16:45 Yeah, so this independence assumption,

00:16:48 let me throw some criticism at you that I often hear

00:16:51 and see how you respond.

00:16:52 So this kind of filling in the blanks is just statistics.

00:16:56 You’re not learning anything

00:16:58 like the deep underlying concepts.

00:17:01 You’re just mimicking stuff from the past.

00:17:05 You’re not learning anything new such that you can use it

00:17:08 to generalize about the world.

00:17:11 Or okay, let me just say the crude version,

00:17:14 which is just statistics.

00:17:16 It’s not intelligence.

00:17:18 What do you have to say to that?

00:17:19 What do you usually say to that

00:17:20 if you kind of hear this kind of thing?

00:17:22 I don’t get into those discussions

00:17:23 because they are kind of pointless.

00:17:26 So first of all, it’s quite possible

00:17:28 that intelligence is just statistics.

00:17:30 It’s just statistics of a particular kind.

00:17:32 Yes, this is the philosophical question.

00:17:35 It’s kind of is it possible

00:17:38 that intelligence is just statistics?

00:17:40 Yeah, but what kind of statistics?

00:17:43 So if you are asking the question,

00:17:47 are the models of the world that we learn,

00:17:50 do they have some notion of causality?

00:17:52 Yes.

00:17:53 So if the criticism comes from people who say,

00:17:57 current machine learning system don’t care about causality,

00:17:59 which by the way is wrong, I agree with them.

00:18:04 Your model of the world should have your actions

00:18:06 as one of the inputs.

00:18:09 And that will drive you to learn causal models of the world

00:18:11 where you know what intervention in the world

00:18:15 will cause what result.

00:18:16 Or you can do this by observation of other agents

00:18:19 acting in the world and observing the effect.

00:18:22 Other humans, for example.

00:18:24 So I think at some level of description,

00:18:28 intelligence is just statistics.

00:18:31 But that doesn’t mean you don’t have models

00:18:35 that have deep mechanistic explanation for what goes on.

00:18:40 The question is how do you learn them?

00:18:41 That’s the question I’m interested in.

00:18:44 Because a lot of people who actually voice their criticism

00:18:49 say that those mechanistic models

00:18:51 have to come from someplace else.

00:18:52 They have to come from human designers,

00:18:54 they have to come from I don’t know what.

00:18:56 And obviously we learn them.

00:18:59 Or if we don’t learn them as an individual,

00:19:01 nature learned them for us using evolution.

00:19:04 So regardless of what you think,

00:19:07 those processes have been learned somehow.

00:19:10 So if you look at the human brain,

00:19:12 just like when we humans introspect

00:19:14 about how the brain works,

00:19:16 it seems like when we think about what is intelligence,

00:19:20 we think about the high level stuff,

00:19:22 like the models we’ve constructed,

00:19:23 concepts like cognitive science,

00:19:25 like concepts of memory and reasoning module,

00:19:28 almost like these high level modules.

00:19:32 Does this serve as a good analogy?

00:19:35 Like are we ignoring the dark matter,

00:19:40 the basic low level mechanisms?

00:19:43 Just like we ignore the way the operating system works,

00:19:45 we’re just using the high level software.

00:19:49 We’re ignoring that at the low level,

00:19:52 the neural network might be doing something like statistics.

00:19:56 Like meaning, sorry to use this word

00:19:59 probably incorrectly and crudely,

00:20:00 but doing this kind of fill in the gap kind of learning

00:20:03 and just kind of updating the model constantly

00:20:05 in order to be able to support the raw sensory information

00:20:09 to predict it and then adjust to the prediction

00:20:11 when it’s wrong.

00:20:12 But like when we look at our brain at the high level,

00:20:15 it feels like we’re doing, like we’re playing chess,

00:20:18 like we’re like playing with high level concepts

00:20:22 and we’re stitching them together

00:20:23 and we’re putting them into longterm memory.

00:20:26 But really what’s going underneath

00:20:28 is something we’re not able to introspect,

00:20:30 which is this kind of simple, large neural network

00:20:34 that’s just filling in the gaps.

00:20:36 Right, well, okay.

00:20:37 So there’s a lot of questions and a lot of answers there.

00:20:39 Okay, so first of all,

00:20:40 there’s a whole school of thought in neuroscience,

00:20:42 computational neuroscience in particular,

00:20:45 that likes the idea of predictive coding,

00:20:47 which is really related to the idea

00:20:50 I was talking about in self supervised learning.

00:20:52 So everything is about prediction.

00:20:53 The essence of intelligence is the ability to predict

00:20:56 and everything the brain does is trying to predict,

00:20:59 predict everything from everything else.

00:21:02 Okay, and that’s really sort of the underlying principle,

00:21:04 if you want, that self supervised learning

00:21:07 is trying to kind of reproduce this idea of prediction

00:21:10 as kind of an essential mechanism

00:21:13 of task independent learning, if you want.

00:21:16 The next step is what kind of intelligence

00:21:19 are you interested in reproducing?

00:21:21 And of course, we all think about trying to reproduce

00:21:24 sort of high level cognitive processes in humans,

00:21:28 but like with machines, we’re not even at the level

00:21:30 of even reproducing the learning processes in a cat brain.

00:21:37 The most intelligent of our intelligent systems

00:21:39 don’t have as much common sense as a house cat.

00:21:43 So how is it that cats learn?

00:21:45 And cats don’t do a whole lot of reasoning.

00:21:47 They certainly have causal models.

00:21:49 They certainly have, because many cats can figure out

00:21:53 how they can act on the world to get what they want.

00:21:56 They certainly have a fantastic model of intuitive physics,

00:22:01 certainly the dynamics of their own bodies,

00:22:04 but also of prey and things like that.

00:22:06 So they’re pretty smart.

00:22:09 They only do this with about 800 million neurons.

00:22:12 We are not anywhere close to reproducing this kind of thing.

00:22:17 So to some extent, I could say,

00:22:21 let’s not even worry about like the high level cognition

00:22:26 and kind of longterm planning and reasoning

00:22:27 that humans can do until we figure out like,

00:22:30 can we even reproduce what cats are doing?

00:22:32 Now that said, this ability to learn world models,

00:22:37 I think is the key to the possibility of learning machines

00:22:41 that can also reason.

00:22:43 So whenever I give a talk, I say there are three challenges

00:22:45 in the three main challenges in machine learning.

00:22:47 The first one is getting machines to learn

00:22:49 to represent the world

00:22:51 and I’m proposing self supervised learning.

00:22:54 The second is getting machines to reason

00:22:58 in ways that are compatible

00:22:59 with essentially gradient based learning

00:23:01 because this is what deep learning is all about really.

00:23:05 And the third one is something

00:23:06 we have no idea how to solve,

00:23:07 at least I have no idea how to solve

00:23:09 is can we get machines to learn hierarchical representations

00:23:14 of action plans?

00:23:17 We know how to train them

00:23:18 to learn hierarchical representations of perception

00:23:22 with convolutional nets and things like that

00:23:23 and transformers, but what about action plans?

00:23:26 Can we get them to spontaneously learn

00:23:28 good hierarchical representations of actions?

00:23:30 Also gradient based.

00:23:32 Yeah, all of that needs to be somewhat differentiable

00:23:35 so that you can apply sort of gradient based learning,

00:23:38 which is really what deep learning is about.

00:23:42 So it’s background, knowledge, ability to reason

00:23:46 in a way that’s differentiable

00:23:50 that is somehow connected, deeply integrated

00:23:53 with that background knowledge

00:23:55 or builds on top of that background knowledge

00:23:57 and then given that background knowledge

00:23:59 be able to make hierarchical plans in the world.

00:24:02 So if you take classical optimal control,

00:24:05 there’s something in classical optimal control

00:24:07 called model predictive control.

00:24:10 And it’s been around since the early sixties.

00:24:13 NASA uses that to compute trajectories of rockets.

00:24:16 And the basic idea is that you have a predictive model

00:24:20 of the rocket, let’s say,

00:24:21 or whatever system you intend to control,

00:24:25 which given the state of the system at time T

00:24:28 and given an action that you're taking on the system.

00:24:31 So for a rocket it would be the thrust

00:24:33 and all the controls you can have,

00:24:35 it gives you the state of the system

00:24:37 at time T plus Delta T, right?

00:24:38 So basically a differential equation, something like that.

00:24:43 And if you have this model

00:24:45 and you have this model in the form of some sort of neural net

00:24:48 or some sort of a set of formula

00:24:50 that you can back propagate gradient through,

00:24:52 you can do what’s called model predictive control

00:24:55 or gradient based model predictive control.

00:24:57 So you can unroll that model in time.

00:25:02 You feed it a hypothesized sequence of actions.

00:25:08 And then you have some objective function

00:25:10 that measures how well at the end of the trajectory,

00:25:13 the system has succeeded or matched what you wanted to do.

00:25:17 If it's a robot arm,

00:25:18 Have you grasped the object you want to grasp?

00:25:20 If it’s a rocket, are you at the right place

00:25:23 near the space station, things like that.

00:25:26 And by back propagation through time,

00:25:28 and again, this was invented in the 1960s,

00:25:30 by optimal control theorists, you can figure out

00:25:34 what is the optimal sequence of actions

00:25:36 that will get my system to the best final state.
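
A minimal sketch of gradient-based model predictive control, assuming a hand-built and deliberately trivial dynamics model; the "reasoning" step is just gradient descent on a hypothesized action sequence, with the objective measured on the final state.

```python
import torch

def model(state, action):                 # state at time t + action -> state at t+dt
    return state + 0.1 * action           # stand-in for the real dynamics

target = torch.tensor([5.0, 0.0])                   # desired final state
actions = torch.zeros(20, 2, requires_grad=True)    # hypothesized sequence of actions
optimizer = torch.optim.SGD([actions], lr=0.5)

for _ in range(200):                      # optimize the plan, not the model
    state = torch.zeros(2)
    for a in actions:                     # unroll the model through time
        state = model(state, a)
    loss = ((state - target) ** 2).sum()  # how well did we match what we wanted?
    optimizer.zero_grad()
    loss.backward()                       # backpropagation through time
    optimizer.step()

first_action = actions.detach()[0]        # receding horizon: execute this, then replan
```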

00:25:42 So that’s a form of reasoning.

00:25:44 It’s basically planning.

00:25:45 And a lot of planning systems in robotics

00:25:48 are actually based on this.

00:25:49 And you can think of this as a form of reasoning.

00:25:53 So to take the example of the teenager driving a car,

00:25:57 you have a pretty good dynamical model of the car.

00:26:00 It doesn’t need to be very accurate.

00:26:01 But you know, again, that if you turn the wheel

00:26:03 to the right and there is a cliff,

00:26:05 you’re gonna run off the cliff, right?

00:26:06 You don’t need to have a very accurate model

00:26:08 to predict that.

00:26:09 And you can run this in your mind

00:26:10 and decide not to do it for that reason.

00:26:13 Because you can predict in advance

00:26:14 that the result is gonna be bad.

00:26:15 So you can sort of imagine different scenarios

00:26:17 and then employ or take the first step

00:26:21 in the scenario that is most favorable

00:26:23 and then repeat the process again.

00:26:24 The scenario that is most favorable

00:26:27 and then repeat the process of planning.

00:26:28 That’s called receding horizon model predictive control.

00:26:31 So even all those things have names going back decades.

00:26:36 And so if you’re not a classical optimal control,

00:26:40 the model of the world is not generally learned.

00:26:44 Sometimes a few parameters you have to identify.

00:26:46 That’s called systems identification.

00:26:47 But generally, the model is mostly deterministic

00:26:52 and mostly built by hand.

00:26:53 So the question of AI,

00:26:55 I think the big challenge of AI for the next decade

00:26:58 is how do we get machines to learn predictive models

00:27:01 of the world that deal with uncertainty

00:27:03 and deal with the real world in all this complexity?

00:27:05 So it’s not just the trajectory of a rocket,

00:27:08 which you can reduce to first principles.

00:27:10 It’s not even just the trajectory of a robot arm,

00:27:13 which again, you can model by careful mathematics.

00:27:16 But it’s everything else,

00:27:17 everything we observe in the world.

00:27:18 People, behavior,

00:27:20 physical systems that involve collective phenomena,

00:27:25 like water or trees and branches in a tree or something

00:27:31 or complex things that humans have no trouble

00:27:36 developing abstract representations

00:27:38 and predictive model for,

00:27:39 but we still don’t know how to do with machines.

00:27:41 Where do you put in these three,

00:27:43 maybe in the planning stages,

00:27:46 the game theoretic nature of this world,

00:27:50 where your actions not only respond

00:27:52 to the dynamic nature of the world, the environment,

00:27:55 but also affect it.

00:27:57 So if there’s other humans involved,

00:27:59 is this point number four,

00:28:02 or is it somehow integrated

00:28:03 into the hierarchical representation of action

00:28:05 in your view?

00:28:06 I think it’s integrated.

00:28:07 It’s just that now your model of the world has to deal with,

00:28:11 it just makes it more complicated.

00:28:13 The fact that humans are complicated

00:28:15 and not easily predictable,

00:28:17 that makes your model of the world much more complicated,

00:28:19 that much more complicated.

00:28:21 Well, there’s a chess,

00:28:22 I mean, I suppose chess is an analogy.

00:28:25 So Monte Carlo tree search.

00:28:28 There’s a, I go, you go, I go, you go.

00:28:32 Like Andrej Karpathy recently gave a talk at MIT

00:28:35 about car doors.

00:28:37 I think there’s some machine learning too,

00:28:39 but mostly car doors.

00:28:40 And there’s a dynamic nature to the car,

00:28:43 like the person opening the door,

00:28:44 checking, I mean, he wasn’t talking about that.

00:28:46 He was talking about the perception problem

00:28:48 of what the ontology of what defines a car door,

00:28:50 this big philosophical question.

00:28:52 But to me, it was interesting

00:28:54 because it’s obvious that the person opening the car doors,

00:28:57 they’re trying to get out, like here in New York,

00:28:59 trying to get out of the car.

00:29:01 You slowing down is going to signal something.

00:29:03 You speeding up is gonna signal something,

00:29:05 and that’s a dance.

00:29:06 It’s a asynchronous chess game.

00:29:10 I don’t know.

00:29:10 So it feels like it’s not just,

00:29:16 I mean, I guess you can integrate all of them

00:29:18 to one giant model, like the entirety

00:29:21 of these little interactions.

00:29:24 Because it’s not as complicated as chess.

00:29:25 It’s just like a little dance.

00:29:27 We do like a little dance together,

00:29:28 and then we figure it out.

00:29:29 Well, in some ways it’s way more complicated than chess

00:29:32 because it’s continuous, it’s uncertain

00:29:36 in a continuous manner.

00:29:38 It doesn’t feel more complicated.

00:29:39 But it doesn’t feel more complicated

00:29:41 because that’s what we’ve evolved to solve.

00:29:43 This is the kind of problem we’ve evolved to solve.

00:29:45 And so we’re good at it

00:29:46 because nature has made us good at it.

00:29:50 Nature has not made us good at chess.

00:29:52 We completely suck at chess.

00:29:55 In fact, that’s why we designed it as a game,

00:29:57 is to be challenging.

00:30:00 And if there is something that recent progress

00:30:02 in chess and Go has made us realize

00:30:05 is that humans are really terrible at those things,

00:30:07 like really bad.

00:30:09 There was a story right before AlphaGo

00:30:11 that the best Go players thought

00:30:15 they were maybe two or three stones behind an ideal player

00:30:18 that they would call God.

00:30:20 In fact, no, there are like nine or 10 stones behind.

00:30:23 I mean, we’re just bad.

00:30:25 So we’re not good at,

00:30:27 and it’s because we have limited working memory.

00:30:30 We’re not very good at doing this tree exploration

00:30:32 that computers are much better at doing than we are.

00:30:36 But we are much better

00:30:37 at learning differentiable models of the world.

00:30:40 I mean, I said differentiable in a kind of,

00:30:43 I should say not differentiable in the sense that

00:30:46 we run backprop through it,

00:30:47 but in the sense that our brain has some mechanism

00:30:50 for estimating gradients of some kind.

00:30:54 And that’s what makes us efficient.

00:30:56 So if you have an agent that consists of a model

00:31:02 of the world, which in the human brain

00:31:04 is basically the entire front half of your brain,

00:31:08 an objective function,

00:31:10 which in humans is a combination of two things.

00:31:14 There is your sort of intrinsic motivation module,

00:31:17 which is in the basal ganglia,

00:31:19 the base of your brain.

00:31:20 That’s the thing that measures pain and hunger

00:31:22 and things like that,

00:31:23 like immediate feelings and emotions.

00:31:28 And then there is the equivalent

00:31:30 of what people in reinforcement learning call a critic,

00:31:32 which is a sort of module that predicts ahead

00:31:36 what the outcome of a situation will be.

00:31:41 And so it’s not a cost function,

00:31:43 but it’s sort of not an objective function,

00:31:45 but it’s sort of a train predictor

00:31:49 of the ultimate objective function.

00:31:50 And that also is differentiable.

00:31:52 And so if all of this is differentiable,

00:31:54 your cost function, your critic, your world model,

00:31:59 then you can use gradient based type methods

00:32:03 to do planning, to do reasoning, to do learning,

00:32:05 to do all the things that we’d like

00:32:08 an intelligent agent to do.

00:32:11 And gradient based learning,

00:32:14 like what’s your intuition?

00:32:15 That’s probably at the core of what can solve intelligence.

00:32:18 So you don’t need like logic based reasoning in your view.

00:32:25 I don’t know how to make logic based reasoning

00:32:27 compatible with efficient learning.

00:32:31 Okay, I mean, there is a big question,

00:32:32 perhaps a philosophical question.

00:32:33 I mean, it’s not that philosophical,

00:32:35 but that we can ask is that all the learning algorithms

00:32:40 we know from engineering and computer science

00:32:43 proceed by optimizing some objective function.

00:32:48 So one question we may ask is,

00:32:51 does learning in the brain minimize an objective function?

00:32:54 I mean, it could be a composite

00:32:57 of multiple objective functions,

00:32:58 but it’s still an objective function.

00:33:01 Second, if it does optimize an objective function,

00:33:04 does it do it by some sort of gradient estimation?

00:33:09 It doesn’t need to be a back prop,

00:33:10 but some way of estimating the gradient in efficient manner

00:33:14 whose complexity is on the same order of magnitude

00:33:17 as actually running the inference.

00:33:20 Because you can’t afford to do things

00:33:24 like perturbing a weight in your brain

00:33:26 to figure out what the effect is.

00:33:28 And then sort of, you can do sort of

00:33:30 estimating gradient by perturbation.

00:33:33 To me, it seems very implausible

00:33:35 that the brain uses some sort of zeroth order black box

00:33:41 gradient free optimization,

00:33:43 because it’s so much less efficient

00:33:45 than gradient optimization.

00:33:46 So it has to have a way of estimating gradient.
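
A minimal sketch of that efficiency argument: an analytic (backprop) gradient costs one forward and one backward pass, whereas perturbation-based, zeroth-order estimation needs roughly one extra forward pass per parameter. The tiny objective here is only illustrative.

```python
import torch

w = (0.03 * torch.randn(1000)).requires_grad_()
x = torch.randn(1000)

def f(weights):                          # some scalar objective of the weights
    return torch.tanh(weights @ x)

# Backprop: one forward pass + one backward pass gives the whole gradient.
f(w).backward()
exact = w.grad.clone()

# Perturbation: one extra forward pass per weight just to estimate the same thing.
eps = 1e-4
estimate = torch.zeros_like(w)
with torch.no_grad():
    base = f(w)
    for i in range(w.numel()):           # 1000 forward passes for 1000 weights
        w[i] += eps
        estimate[i] = (f(w) - base) / eps
        w[i] -= eps

print(torch.allclose(exact, estimate, atol=1e-2))   # same answer, vastly more work
```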

00:33:49 Is it possible that some kind of logic based reasoning

00:33:52 emerges in pockets as a useful,

00:33:55 like you said, if the brain is an objective function,

00:33:58 maybe it’s a mechanism for creating objective functions.

00:34:01 It’s a mechanism for creating knowledge bases, for example,

00:34:06 that can then be queried.

00:34:08 Like maybe it’s like an efficient representation

00:34:10 of knowledge that’s learned in a gradient based way

00:34:12 or something like that.

00:34:13 Well, so I think there is a lot of different types

00:34:15 of intelligence.

00:34:17 So first of all, I think the type of logical reasoning

00:34:19 that we think about, that is maybe stemming

00:34:23 from sort of classical AI of the 1970s and 80s.

00:34:29 I think humans use that relatively rarely

00:34:33 and are not particularly good at it.

00:34:34 But we judge each other based on our ability

00:34:37 to solve those rare problems.

00:34:40 It’s called an IQ test.

00:34:41 I don’t think so.

00:34:42 Like I’m not very good at chess.

00:34:45 Yes, I’m judging you this whole time.

00:34:47 Because, well, we actually.

00:34:49 With your heritage, I’m sure you’re good at chess.

00:34:53 No, stereotypes.

00:34:55 Not all stereotypes are true.

00:34:58 Well, I’m terrible at chess.

00:34:59 So, but I think perhaps another type of intelligence

00:35:04 that I have is this ability of sort of building models

00:35:08 of the world from reasoning, obviously,

00:35:13 but also data.

00:35:15 And those models generally are more kind of analogical.

00:35:18 So it’s reasoning by simulation,

00:35:22 and by analogy, where you use one model

00:35:25 to apply to a new situation.

00:35:26 Even though you’ve never seen that situation,

00:35:28 you can sort of connect it to a situation

00:35:31 you’ve encountered before.

00:35:33 And your reasoning is more akin

00:35:36 to some sort of internal simulation.

00:35:38 So you’re kind of simulating what’s happening

00:35:41 when you’re building, I don’t know,

00:35:42 a box out of wood or something, right?

00:35:44 You can imagine in advance what would be the result

00:35:47 of cutting the wood in this particular way.

00:35:49 Are you going to use screws or nails or whatever?

00:35:52 When you are interacting with someone,

00:35:54 you also have a model of that person

00:35:55 and sort of interact with that person,

00:35:59 having this model in mind to kind of tell the person

00:36:03 what you think is useful to them.

00:36:05 So I think this ability to construct models of the world

00:36:10 is basically the essence, the essence of intelligence.

00:36:13 And the ability to use it then to plan actions

00:36:18 that will fulfill a particular criterion,

00:36:23 of course, is necessary as well.

00:36:25 So I’m going to ask you a series of impossible questions

00:36:27 as we keep asking, as I’ve been doing.

00:36:30 So if that’s the fundamental sort of dark matter

00:36:33 of intelligence, this ability to form a background model,

00:36:36 what’s your intuition about how much knowledge is required?

00:36:41 You know, I think dark matter,

00:36:43 you could put a percentage on it

00:36:45 of the composition of the universe

00:36:50 and how much of it is dark matter,

00:36:51 how much of it is dark energy,

00:36:52 how much information do you think is required

00:36:57 to be a house cat?

00:36:59 So you have to be able to, when you see a box going in,

00:37:02 when you see a human compute the most evil action,

00:37:06 if there’s a thing that’s near an edge,

00:37:07 you knock it off, all of that,

00:37:10 plus the extra stuff you mentioned,

00:37:12 which is a great self awareness of the physics

00:37:15 of your own body and the world.

00:37:18 How much knowledge is required, do you think, to solve it?

00:37:22 I don’t even know how to measure an answer to that question.

00:37:25 I’m not sure how to measure it,

00:37:26 but whatever it is, it fits in about 800,000 neurons,

00:37:32 800 million neurons.

00:37:33 What’s the representation does?

00:37:36 Everything, all knowledge, everything, right?

00:37:40 You know, it’s less than a billion.

00:37:41 A dog is 2 billion, but a cat is less than 1 billion.

00:37:45 And so multiply that by a thousand

00:37:48 and you get the number of synapses.

00:37:50 And I think almost all of it is learned

00:37:52 through this, you know, a sort of self supervised learning,

00:37:55 although, you know, I think a tiny sliver

00:37:58 is learned through reinforcement learning

00:37:59 and certainly very little through, you know,

00:38:02 classical supervised learning,

00:38:03 although it’s not even clear how supervised running

00:38:05 actually works in the biological world.

00:38:09 So I think almost all of it is self supervised learning,

00:38:12 but it’s driven by the sort of ingrained objective functions

00:38:18 that a cat or a human have at the base of their brain,

00:38:21 which kind of drives their behavior.

00:38:24 So, you know, nature tells us you’re hungry.

00:38:29 It doesn’t tell us how to feed ourselves.

00:38:31 That’s something that the rest of our brain

00:38:33 has to figure out, right?

00:38:35 What’s interesting is there might be more

00:38:37 like deeper objective functions

00:38:39 driving the whole thing.

00:38:41 So hunger may be some kind of,

00:38:44 now you go to like neurobiology,

00:38:46 it might be just the brain trying to maintain homeostasis.

00:38:52 So hunger is just one of the human perceivable symptoms

00:38:58 of the brain being unhappy

00:38:59 with the way things are currently.

00:39:01 It could be just like one really dumb objective function

00:39:04 at the core.

00:39:04 But that’s how behavior is driven.

00:39:08 The fact that, you know, our basal ganglia

00:39:12 drive us to do things that are different

00:39:14 from say an orangutan or certainly a cat

00:39:18 is what makes, you know, human nature

00:39:20 versus orangutan nature versus cat nature.

00:39:23 So for example, you know, our basal ganglia

00:39:27 drives us to seek the company of other humans.

00:39:32 And that’s because nature has figured out

00:39:34 that we need to be social animals for our species to survive.

00:39:37 And it’s true of many primates.

00:39:41 It’s not true of orangutans.

00:39:42 Orangutans are solitary animals.

00:39:44 They don’t seek the company of others.

00:39:46 In fact, they avoid them.

00:39:49 In fact, they scream at them when they come too close

00:39:51 because they’re territorial.

00:39:52 Because for their survival, you know,

00:39:55 evolution has figured out that’s the best thing.

00:39:58 I mean, they’re occasionally social, of course,

00:40:00 for, you know, reproduction and stuff like that.

00:40:03 But they’re mostly solitary.

00:40:05 So all of those behaviors are not part of intelligence.

00:40:09 You know, people say,

00:40:10 oh, you’re never gonna have intelligent machines

00:40:11 because, you know, human intelligence is social.

00:40:13 But then you look at orangutans, you look at octopus.

00:40:16 Octopus never know their parents.

00:40:18 They barely interact with any other.

00:40:20 And they get to be really smart in less than a year,

00:40:23 in like half a year.

00:40:26 You know, in a year, they’re adults.

00:40:27 In two years, they’re dead.

00:40:28 So there are things that we think, as humans,

00:40:33 are intimately linked with intelligence,

00:40:35 like social interaction, like language.

00:40:39 We think, I think we give way too much importance

00:40:42 to language as a substrate of intelligence as humans.

00:40:46 Because we think our reasoning is so linked with language.

00:40:49 So to solve the house cat intelligence problem,

00:40:53 you think you could do it on a desert island.

00:40:55 You could have, you could just have a cat sitting there

00:41:00 looking at the waves, at the ocean waves,

00:41:03 and figure a lot of it out.

00:41:05 It needs to have sort of, you know,

00:41:07 the right set of drives to kind of, you know,

00:41:11 get it to do the thing and learn the appropriate things,

00:41:13 right, but like for example, you know,

00:41:17 baby humans are driven to learn to stand up and walk.

00:41:22 You know, that’s kind of, this desire is hardwired.

00:41:26 How to do it precisely is not, that’s learned.

00:41:28 But the desire to walk, move around and stand up,

00:41:32 that’s sort of probably hardwired.

00:41:35 But it’s very simple to hardwire this kind of stuff.

00:41:38 Oh, like the desire to, well, that’s interesting.

00:41:42 You’re hardwired to want to walk.

00:41:45 That’s not, there’s gotta be a deeper need for walking.

00:41:50 I think it was probably socially imposed by society

00:41:53 that you need to walk like all the other bipedals.

00:41:55 No, like a lot of simple animals that, you know,

00:41:58 will probably walk without ever watching

00:42:01 any other members of the species.

00:42:03 It seems like a scary thing to have to do

00:42:06 because you suck at bipedal walking at first.

00:42:09 It seems crawling is much safer, much more like,

00:42:13 why are you in a hurry?

00:42:15 Well, because you have this thing that drives you to do it,

00:42:18 you know, which is sort of part of the sort of

00:42:24 human development.

00:42:25 Is that understood actually what?

00:42:26 Not entirely, no.

00:42:28 What’s the reason you get on two feet?

00:42:29 It’s really hard.

00:42:30 Like most animals don’t get on two feet.

00:42:32 Well, they get on four feet.

00:42:33 You know, many mammals get on four feet.

00:42:35 Yeah, they do. Very quickly.

00:42:36 Some of them extremely quickly.

00:42:38 But I don’t, you know, like from the last time

00:42:41 I’ve interacted with a table,

00:42:42 that’s much more stable than a thing than two legs.

00:42:44 It’s just a really hard problem.

00:42:46 Yeah, I mean, birds have figured it out with two feet.

00:42:48 Well, technically we can go into ontology.

00:42:52 They have four, I guess they have two feet.

00:42:54 They have two feet.

00:42:55 Chickens.

00:42:56 You know, dinosaurs have two feet, many of them.

00:42:58 Allegedly.

00:43:01 I’m just now learning that T. rex was eating grass,

00:43:04 not other animals.

00:43:05 T. rex might’ve been a friendly pet.

00:43:08 What do you think about,

00:43:10 I don’t know if you looked at the test

00:43:13 for general intelligence that François Chollet put together.

00:43:16 I don’t know if you got a chance to look

00:43:18 at that kind of thing.

00:43:19 What’s your intuition about how to solve

00:43:21 like an IQ type of test?

00:43:23 I don’t know.

00:43:24 I think it’s so outside of my radar screen

00:43:26 that it’s not really relevant, I think, in the short term.

00:43:30 Well, I guess one way to ask,

00:43:33 another way, perhaps closer to what you work on, is like,

00:43:37 how do you solve MNIST with very little example data?

00:43:42 That’s right.

00:43:43 And that’s the answer to this probably

00:43:44 is self supervised learning.

00:43:45 Just learn to represent images

00:43:47 and then learning to recognize handwritten digits

00:43:51 on top of this will only require a few samples.

00:43:53 And we observe this in humans, right?

00:43:55 You show a young child a picture book

00:43:58 with a couple of pictures of an elephant and that’s it.

00:44:01 The child knows what an elephant is.

00:44:03 And we see this today with practical systems

00:44:06 that we train image recognition systems

00:44:09 with enormous amounts of images,

00:44:13 either completely self supervised

00:44:15 or very weakly supervised.

00:44:16 For example, you can train a neural net

00:44:20 to predict whatever hashtag people type on Instagram, right?

00:44:24 Then you can do this with billions of images

00:44:25 because there’s billions per day that are showing up.

00:44:28 So the amount of training data there

00:44:30 is essentially unlimited.

00:44:32 And then you take the output representation,

00:44:35 a couple of layers down from the outputs

00:44:37 of what the system learned and feed this as input

00:44:40 to a classifier for any object in the world that you want

00:44:43 and it works pretty well.

00:44:44 So that’s transfer learning, okay?

00:44:47 Or weakly supervised transfer learning.
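
A minimal sketch of that transfer recipe, with made-up sizes: pre-train a backbone on some huge weak task (here a stand-in "hashtag" head), then freeze it and train a small supervised classifier on the representation a couple of layers below the output.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Flatten(),
                         nn.Linear(3 * 32 * 32, 512), nn.ReLU(),
                         nn.Linear(512, 256), nn.ReLU())
hashtag_head = nn.Linear(256, 17000)     # weak pre-training target (e.g. hashtags)
# ... pre-train backbone + hashtag_head on huge amounts of weakly labeled images ...

for p in backbone.parameters():          # keep the learned representation fixed
    p.requires_grad = False
classifier = nn.Linear(256, 10)          # small head for the task you actually care about

images = torch.randn(16, 3, 32, 32)      # a handful of labeled examples
labels = torch.randint(0, 10, (16,))
features = backbone(images)              # representation "a couple of layers down"
loss = nn.functional.cross_entropy(classifier(features), labels)
loss.backward()
```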

00:44:51 People are making very, very fast progress

00:44:53 using self supervised learning

00:44:55 for this kind of scenario as well.

00:44:58 And my guess is that that’s gonna be the future.

00:45:02 For self supervised learning,

00:45:03 how much cleaning do you think is needed

00:45:06 for filtering malicious signal or what’s a better term?

00:45:11 But like a lot of people use hashtags on Instagram

00:45:16 to get like good SEO that doesn’t fully represent

00:45:21 the contents of the image.

00:45:23 Like they’ll put a picture of a cat

00:45:24 and hashtag it with like science, awesome, fun.

00:45:28 I don’t know all kinds, why would you put science?

00:45:31 That’s not very good SEO.

00:45:33 The way my colleagues who worked on this project

00:45:34 at Facebook, now Meta AI, a few years ago dealt with this

00:45:39 is that they only selected something like 17,000 tags

00:45:43 that correspond to kind of physical things or situations,

00:45:48 like that has some visual content.

00:45:52 So you wouldn’t have like hash TBT or anything like that.

00:45:57 Oh, so they keep a very select set of hashtags

00:46:00 is what you’re saying?

00:46:01 Yeah.

00:46:02 Okay.

00:46:03 But it’s still in the order of 10 to 20,000.

00:46:06 So it’s fairly large.

00:46:07 Okay.

00:46:09 Can you tell me about data augmentation?

00:46:11 What the heck is data augmentation and how is it used

00:46:14 maybe contrast of learning for video?

00:46:19 What are some cool ideas here?

00:46:20 Right, so data augmentation.

00:46:22 I mean, first data augmentation is the idea

00:46:24 of artificially increasing the size of your training set

00:46:26 by distorting the images that you have

00:46:30 in ways that don’t change the nature of the image, right?

00:46:32 So you do MNIST, you can do data augmentation on MNIST

00:46:35 and people have done this since the 1990s, right?

00:46:37 You take a MNIST digit and you shift it a little bit

00:46:40 or you change the size or rotate it, skew it,

00:46:45 you know, et cetera.

00:46:47 Add noise.

00:46:48 Add noise, et cetera.

00:46:49 And it works better if you train a supervised classifier

00:46:52 with augmented data, you’re gonna get better results.
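
A minimal sketch of the distortions just listed, assuming torchvision's standard transforms; the exact ranges are illustrative.

```python
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomAffine(degrees=15,            # small rotations
                            translate=(0.1, 0.1),  # shifts
                            scale=(0.9, 1.1),      # size changes
                            shear=10),             # skew
    transforms.Lambda(lambda img: img + 0.05 * torch.randn_like(img)),  # add noise
])

digit = torch.rand(1, 28, 28)     # stand-in for an MNIST digit (1 channel, 28x28)
distorted = augment(digit)        # same label, an "extra" training sample
```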

00:46:55 Now it’s become really interesting

00:46:58 over the last couple of years

00:47:00 because a lot of self supervised learning techniques

00:47:04 to pre train vision systems are based on data augmentation.

00:47:07 And the basic techniques is originally inspired

00:47:12 by techniques that I worked on in the early 90s

00:47:15 and Jeff Hinton worked on also in the early 90s.

00:47:17 They were sort of parallel work.

00:47:20 I used to call this Siamese network.

00:47:21 So basically you take two identical copies

00:47:24 of the same network, they share the same weights

00:47:27 and you show two different views of the same object.

00:47:31 Either those two different views may have been obtained

00:47:33 by data augmentation

00:47:35 or maybe it’s two different views of the same scene

00:47:37 from a camera that you moved or at different times

00:47:40 or something like that, right?

00:47:41 Or two pictures of the same person, things like that.

00:47:44 And then you train this neural net,

00:47:46 those two identical copies of this neural net

00:47:48 to produce an output representation, a vector

00:47:52 in such a way that the representation for those two images

00:47:56 are as close to each other as possible,

00:47:58 as identical to each other as possible, right?

00:48:00 Because you want the system

00:48:02 to basically learn a function that will be invariant,

00:48:06 that will not change, whose output will not change

00:48:08 when you transform those inputs in those particular ways,

00:48:12 right?

00:48:14 So that’s easy to do.

00:48:15 What’s complicated is how do you make sure

00:48:17 that when you show two images that are different,

00:48:19 the system will produce different things?

00:48:21 Because if you don’t have a specific provision for this,

00:48:26 the system will just ignore the inputs when you train it,

00:48:29 it will end up ignoring the input

00:48:30 and just produce a constant vector

00:48:31 that is the same for every input, right?

00:48:33 That’s called a collapse.

00:48:35 Now, how do you avoid collapse?

00:48:36 So there’s two ideas.

00:48:38 One idea that I proposed in the early 90s

00:48:41 with my colleagues at Bell Labs,

00:48:43 Jane Bromley and a couple of other people,

00:48:46 which we now call contrastive learning,

00:48:48 which is to have negative examples, right?

00:48:50 So you have pairs of images that you know are different

00:48:54 and you show them to the network and those two copies,

00:48:57 and then you push the two output vectors away

00:48:59 from each other and it will eventually guarantee

00:49:02 that things that are semantically similar

00:49:04 produce similar representations

00:49:06 and things that are different

00:49:07 produce different representations.
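
A minimal sketch of that Siamese, contrastive setup: one encoder applied to both inputs (so the two "copies" share weights), with positive pairs pulled together and negative pairs pushed apart up to a margin. The architecture and margin here are illustrative, not the original Bell Labs system.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))  # shared weights

def contrastive_loss(x1, x2, same, margin=1.0):
    z1, z2 = encoder(x1), encoder(x2)          # two "copies" of the same network
    d = (z1 - z2).norm(dim=1)                  # distance between representations
    pull = same * d.pow(2)                     # positive pairs: make d small
    push = (1 - same) * torch.clamp(margin - d, min=0).pow(2)  # negatives: d > margin
    return (pull + push).mean()

a, b = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)
same = torch.randint(0, 2, (8,)).float()       # 1 = same object/view, 0 = different
contrastive_loss(a, b, same).backward()
```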

00:49:10 We actually came up with this idea

00:49:11 for a project of doing signature verification.

00:49:14 So we would collect signatures from,

00:49:18 like multiple signatures from the same person

00:49:20 and then train a neural net to produce the same representation

00:49:23 and then force the system to produce different

00:49:27 representation for different signatures.

00:49:31 This was actually, the problem was proposed by people

00:49:33 from what was a subsidiary of AT&T at the time called NCR.

00:49:38 And they were interested in storing

00:49:40 representation of the signature on the 80 bytes

00:49:43 of the magnetic strip of a credit card.

00:49:46 So we came up with this idea of having a neural net

00:49:48 with 80 outputs that we would quantize on bytes

00:49:52 so that we could encode the signature.

00:49:53 And that encoding was then used to compare

00:49:55 whether the signature matches or not.

00:49:57 That’s right.

00:49:57 So then you would sign, you would run through the neural net

00:50:00 and then you would compare the output vector

00:50:02 to whatever is stored on your card.

00:50:03 Did it actually work?

00:50:04 It worked, but they ended up not using it.

00:50:08 Because nobody cares actually.

00:50:10 I mean, the American financial payment system

00:50:13 is incredibly lax in that respect compared to Europe.

00:50:17 Oh, with the signatures?

00:50:18 What’s the purpose of signatures anyway?

00:50:20 This is very different.

00:50:21 Nobody looks at them, nobody cares.

00:50:23 It’s, yeah.

00:50:24 Yeah, no, so that’s contrastive learning, right?

00:50:27 So you need positive and negative pairs.

00:50:29 And the problem with that is that,

00:50:31 even though I had the original paper on this,

00:50:34 I’m actually not very positive about it

00:50:36 because it doesn’t work in high dimension.

00:50:38 If your representation is high dimensional,

00:50:41 there’s just too many ways for two things to be different.

00:50:44 And so you would need lots and lots

00:50:45 and lots of negative pairs.

00:50:48 So there is a particular implementation of this,

00:50:50 which is relatively recent from actually

00:50:52 the Google Toronto group where, you know,

00:50:56 Jeff Hinton is the senior member there.

00:50:58 It’s called SIMCLR, S I M C L R.

00:51:02 And it's, you know, basically a particular way

00:51:03 of implementing this idea of contrastive learning,

00:51:06 with a particular objective function.
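
Roughly, the objective SimCLR uses looks like the following simplified sketch (the NT-Xent loss); the embeddings, batch size, and temperature here are placeholders. Every other image in the batch acts as a negative for a given pair of views.

```python
import torch
import torch.nn.functional as F

def nt_xent(za, zb, temperature: float = 0.5):
    """Simplified SimCLR-style loss. za, zb are (N, D) embeddings of two views of
    the same N images; all other images in the batch serve as negatives."""
    z = F.normalize(torch.cat([za, zb], dim=0), dim=1)   # (2N, D), unit-norm embeddings
    sim = z @ z.t() / temperature                        # pairwise cosine similarities
    n = za.shape[0]
    # For sample i < N its positive is i + N, and vice versa; self-similarity is masked out.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    sim.fill_diagonal_(float("-inf"))
    return F.cross_entropy(sim, targets)

za, zb = torch.randn(8, 128), torch.randn(8, 128)
loss = nt_xent(za, zb)
```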

00:51:08 Now, what I’m much more enthusiastic about these days

00:51:13 is non contrastive methods.

00:51:14 So other ways to guarantee that the representations

00:51:19 would be different for different inputs.

00:51:24 And it’s actually based on an idea that Jeff Hinton

00:51:28 proposed in the early nineties with his student

00:51:30 at the time, Sue Becker.

00:51:31 And it’s based on the idea of maximizing

00:51:33 the mutual information between the outputs

00:51:35 of the two systems.

00:51:36 You only show positive pairs.

00:51:37 You only show pairs of images that you know

00:51:39 are somewhat similar.

00:51:41 And you train the two networks to be informative,

00:51:44 but also to be as informative of each other as possible.

00:51:48 So basically one representation has to be predictable

00:51:51 from the other, essentially.

00:51:54 And, you know, he proposed that idea,

00:51:56 had, you know, a couple of papers in the early nineties,

00:51:59 and then nothing was done about it for decades.

00:52:02 And I kind of revived this idea together

00:52:04 with my postdocs at FAIR,

00:52:07 particularly a postdoc called Stéphane Deny,

00:52:08 who is now a junior professor in Finland

00:52:11 at Aalto University.

00:52:13 We came up with something that we call Barlow Twins.

00:52:18 And it’s a particular way of maximizing

00:52:20 the information content of a vector,

00:52:24 you know, using some hypotheses.

00:52:27 And we have kind of another version of it

00:52:30 that’s more recent now called VICREG, V I C A R E G.

00:52:33 That means Variance, Invariance, Covariance,

00:52:35 Regularization.

00:52:36 And it’s the thing I’m the most excited about

00:52:38 in machine learning in the last 15 years.

00:52:40 I mean, I’m not, I’m really, really excited about this.
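
A sketch of a VICReg-style objective, assuming two batches of embeddings from the two branches; the loss weights are illustrative defaults, and the real method also applies the loss on top of an expander network that is not shown here. The variance and covariance terms are what prevent collapse without any negative pairs.

```python
import torch
import torch.nn.functional as F

def vicreg_loss(za, zb, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """za, zb: (N, D) embeddings of two views. Invariance pulls the views together;
    variance and covariance regularization keep the embeddings from collapsing."""
    n, d = za.shape
    invariance = F.mse_loss(za, zb)

    # Variance term: encourage each embedding dimension to keep std >= 1.
    std_a = torch.sqrt(za.var(dim=0) + eps)
    std_b = torch.sqrt(zb.var(dim=0) + eps)
    variance = F.relu(1.0 - std_a).mean() + F.relu(1.0 - std_b).mean()

    # Covariance term: decorrelate dimensions by penalizing off-diagonal covariance.
    def off_diag_cov(z):
        z = z - z.mean(dim=0)
        cov = (z.t() @ z) / (n - 1)
        return (cov.pow(2).sum() - cov.pow(2).diagonal().sum()) / d

    covariance = off_diag_cov(za) + off_diag_cov(zb)
    return sim_w * invariance + var_w * variance + cov_w * covariance

loss = vicreg_loss(torch.randn(16, 64), torch.randn(16, 64))
```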

00:52:43 What kind of data augmentation is useful

00:52:46 for that noncontrastive learning method?

00:52:49 Are we talking about, does that not matter that much?

00:52:51 Or it seems like a very important part of the step.

00:52:55 Yeah.

00:52:55 How you generate the images that are similar,

00:52:57 but sufficiently different.

00:52:58 Yeah, that’s right.

00:52:59 It’s an important step and it’s also an annoying step

00:53:01 because you need to have that knowledge

00:53:02 of what data augmentation you can do

00:53:05 that do not change the nature of the object.

00:53:09 And so the standard scenario,

00:53:12 which a lot of people working in this area are using

00:53:14 is you use the type of distortion.

00:53:18 So basically you do a geometric distortion.

00:53:21 So one basically just shifts the image a little bit,

00:53:23 it’s called cropping.

00:53:24 Another one kind of changes the scale a little bit.

00:53:26 Another one kind of rotates it.

00:53:28 Another one changes the colors.

00:53:30 You can do a shift in color balance

00:53:32 or something like that, saturation.

00:53:34 Another one sort of blurs it.

00:53:36 Another one adds noise.

00:53:37 So you have like a catalog of kind of standard things

00:53:40 and people try to use the same ones

00:53:42 for different algorithms so that they can compare.
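
That catalog of standard distortions typically looks something like this sketch using torchvision; the exact parameters vary from paper to paper and are only illustrative here.

```python
from torchvision import transforms

# A typical "catalog" of view-generating distortions for self-supervised pretraining.
# Parameter values are illustrative, not taken from any particular paper.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),        # cropping / rescaling
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),                               # small rotation
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),  # color / saturation shift
    transforms.RandomGrayscale(p=0.2),
    transforms.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0)),   # blur
    transforms.ToTensor(),
])

# Two independently augmented views of the same image (a PIL image `img` is assumed):
# view_a, view_b = augment(img), augment(img)
```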

00:53:44 But some algorithms, some self supervised algorithms

00:53:47 actually can deal with much bigger,

00:53:49 like more aggressive data augmentation and some don’t.

00:53:52 So that kind of makes the whole thing difficult.

00:53:55 But that’s the kind of distortions we’re talking about.

00:53:57 And so you train with those distortions

00:54:02 and then you chop off the last layer, a couple layers

00:54:07 of the network and you use the representation

00:54:11 as input to a classifier.

00:54:12 You train the classifier on ImageNet, let’s say,

00:54:16 or whatever, and measure the performance.
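
A sketch of that evaluation protocol, assuming a pretrained backbone, its feature dimension, and a labeled data loader are available; all of these names are placeholders. Only the linear classifier on top is trained, which is why this is often called a linear probe.

```python
import torch
import torch.nn as nn

def linear_probe(pretrained_backbone: nn.Module, feat_dim: int, num_classes: int, loader):
    """Freeze the self-supervised representation and train only a linear classifier on it."""
    pretrained_backbone.eval()
    for p in pretrained_backbone.parameters():
        p.requires_grad_(False)                          # the representation stays fixed

    classifier = nn.Linear(feat_dim, num_classes)        # only this layer is trained
    opt = torch.optim.SGD(classifier.parameters(), lr=0.1)

    for images, labels in loader:
        with torch.no_grad():
            feats = pretrained_backbone(images)           # the "chopped off" representation
        loss = nn.functional.cross_entropy(classifier(feats), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return classifier
```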

00:54:19 And interestingly enough, the methods that are really good

00:54:23 at eliminating the information that is irrelevant,

00:54:25 which is the distortions between those images,

00:54:29 do a good job at eliminating it.

00:54:31 And as a consequence, you cannot use the representations

00:54:36 in those systems for things like object detection

00:54:39 and localization because that information is gone.

00:54:41 So the type of data augmentation you need to do

00:54:44 depends on the tasks you want eventually the system

00:54:47 to solve and the type of data augmentation,

00:54:50 standard data augmentation that we use today

00:54:52 are only appropriate for object recognition

00:54:54 or image classification.

00:54:56 They’re not appropriate for things like.

00:54:57 Can you help me understand why it fails for localization?

00:55:00 So you’re saying it’s just not good at the negative,

00:55:03 like at classifying the negative,

00:55:05 so that’s why it can’t be used for the localization?

00:55:07 No, it’s just that you train the system,

00:55:10 you give it an image and then you give it the same image

00:55:13 shifted and scaled and you tell it that’s the same image.

00:55:17 So the system basically is trained

00:55:19 to eliminate the information about position and size.

00:55:22 So now you want to use that to figure out

00:55:26 where an object is and what size it is.

00:55:27 Like a bounding box, like they’d be able to actually.

00:55:30 Okay, it can still find the object in the image,

00:55:34 it’s just not very good at finding

00:55:35 the exact boundaries of that object, interesting.

00:55:38 Interesting, which that’s an interesting

00:55:42 sort of philosophical question,

00:55:43 how important is object localization anyway?

00:55:46 We’re like obsessed by measuring image segmentation,

00:55:51 obsessed by measuring perfectly knowing

00:55:53 the boundaries of objects when arguably

00:55:56 that’s not that essential to understanding

00:56:01 what are the contents of the scene.

00:56:03 On the other hand, I think evolutionarily,

00:56:05 the first vision systems in animals

00:56:08 were basically all about localization,

00:56:10 very little about recognition.

00:56:12 And in the human brain, you have two separate pathways

00:56:15 for recognizing the nature of a scene or an object

00:56:20 and localizing objects.

00:56:22 So you use the first pathway, called the ventral pathway,

00:56:25 for telling what you’re looking at.

00:56:29 The other pathway, the dorsal pathway,

00:56:30 is used for navigation, for grasping, for everything else.

00:56:34 And basically a lot of the things you need for survival

00:56:36 are localization and detection.

00:56:41 Is similarity learning or contrastive learning,

00:56:45 are these non contrastive methods

00:56:46 the same as understanding something?

00:56:48 Just because you know a distorted cat

00:56:50 is the same as a non distorted cat,

00:56:52 does that mean you understand what it means to be a cat?

00:56:56 To some extent.

00:56:57 I mean, it’s a superficial understanding, obviously.

00:57:00 But what is the ceiling of this method, do you think?

00:57:02 Is this just one trick on the path

00:57:05 to doing self supervised learning?

00:57:07 Can we go really, really far?

00:57:10 I think we can go really far.

00:57:11 So if we figure out how to use techniques of that type,

00:57:16 perhaps very different, but the same nature,

00:57:19 to train a system from video to do video prediction,

00:57:23 essentially, I think we’ll have a path towards,

00:57:30 I wouldn’t say unlimited, but a path towards some level

00:57:33 of physical common sense in machines.

00:57:38 And I also think that that ability to learn

00:57:44 how the world works from a sort of high throughput channel

00:57:47 like vision is a necessary step towards

00:57:53 sort of real artificial intelligence.

00:57:55 In other words, I believe in grounded intelligence.

00:57:58 I don’t think we can train a machine

00:57:59 to be intelligent purely from text.

00:58:02 Because I think the amount of information about the world

00:58:04 that’s contained in text is tiny compared

00:58:07 to what we need to know.

00:58:11 So for example, and people have attempted to do this

00:58:15 for 30 years, the Cyc project and things like that,

00:58:18 basically kind of writing down all the facts that are known

00:58:21 and hoping that some sort of common sense will emerge.

00:58:25 I think it’s basically hopeless.

00:58:27 But let me take an example.

00:58:28 You take an object, I describe a situation to you.

00:58:31 I take an object, I put it on the table

00:58:33 and I push the table.

00:58:34 It’s completely obvious to you that the object

00:58:37 will be pushed with the table,

00:58:39 because it’s sitting on it.

00:58:41 There’s no text in the world, I believe, that explains this.

00:58:45 And so if you train a machine as powerful as it could be,

00:58:49 your GPT 5000 or whatever it is,

00:58:53 it’s never gonna learn about this.

00:58:57 That information is just not present in any text.

00:59:01 Well, the question, like with the Cyc project,

00:59:03 the dream I think is to have like 10 million,

00:59:08 say facts like that, that give you a headstart,

00:59:13 like a parent guiding you.

00:59:15 Now, we humans don’t need a parent to tell us

00:59:17 that the table will move, sorry,

00:59:19 the smartphone will move with the table.

00:59:21 But we get a lot of guidance in other ways.

00:59:25 So it’s possible that we can give it a quick shortcut.

00:59:28 What about a cat?

00:59:29 The cat knows that.

00:59:30 No, but they evolved, so.

00:59:33 No, they learn like us.

00:59:35 Sorry, the physics of stuff?

00:59:37 Yeah.

00:59:38 Well, yeah, so you’re saying it’s,

00:59:41 so you’re putting a lot of intelligence

00:59:45 onto the nurture side, not the nature.

00:59:47 Yes.

00:59:47 We seem to have, you know,

00:59:50 there’s a very inefficient arguably process of evolution

00:59:53 that got us from bacteria to who we are today.

00:59:57 Started at the bottom, now we’re here.

00:59:59 So the question is how, okay,

01:00:04 the question is how fundamental is that,

01:00:06 the nature of the whole hardware?

01:00:08 And then is there any way to shortcut it

01:00:11 if it’s fundamental?

01:00:12 If it’s not, if it’s most of intelligence,

01:00:14 most of the cool stuff we’ve been talking about

01:00:15 is mostly nurture, mostly trained.

01:00:18 We figure it out by observing the world.

01:00:20 We can form that big, beautiful, sexy background model

01:00:24 that you’re talking about just by sitting there.

01:00:28 Then, okay, then you need to, then like maybe,

01:00:34 it is all supervised learning all the way down.

01:00:37 Self supervised learning, say.

01:00:39 Whatever it is that makes, you know,

01:00:41 human intelligence different from other animals,

01:00:44 which, you know, a lot of people think is language

01:00:46 and logical reasoning and this kind of stuff.

01:00:48 It cannot be that complicated because it only popped up

01:00:51 in the last million years.

01:00:52 Yeah.

01:00:54 And, you know, it only involves, you know,

01:00:57 maybe less than 1% of our genome,

01:00:59 which is the difference between the human genome

01:01:01 and the chimp genome or whatever.

01:01:03 So it can’t be that complicated.

01:01:06 You know, it can’t be that fundamental.

01:01:08 I mean, most of the complicated stuff

01:01:10 already exists in cats and dogs and, you know,

01:01:13 certainly primates, nonhuman primates.

01:01:17 Yeah, that little thing with humans

01:01:18 might be just something about social interaction

01:01:22 and ability to maintain ideas

01:01:24 across like a collective of people.

01:01:28 It sounds very dramatic and very impressive,

01:01:30 but it probably isn’t mechanistically speaking.

01:01:33 It is, but we’re not there yet.

01:01:34 Like, you know, we have, I mean, this is number 634,

01:01:39 you know, in the list of problems we have to solve.

01:01:43 So basic physics of the world is number one.

01:01:46 What do you, just a quick tangent on data augmentation.

01:01:51 So a lot of it is hard coded versus learned.

01:01:57 Do you have any intuition that maybe

01:02:00 there could be some weird data augmentation,

01:02:03 like generative type of data augmentation,

01:02:06 like doing something weird to images,

01:02:07 which then improves the similarity learning process?

01:02:13 So not just kind of dumb, simple distortions,

01:02:16 but by you shaking your head,

01:02:18 just saying that even simple distortions are enough.

01:02:20 I think, no, I think data augmentation

01:02:22 is a temporary necessary evil.

01:02:26 So what people are working on now is two things.

01:02:28 One is the type of self supervised learning,

01:02:32 like trying to translate the type of self supervised learning

01:02:35 people use in language, translating these two images,

01:02:38 which is basically a denoising autoencoder method, right?

01:02:41 So you take an image, you block, you mask some parts of it,

01:02:47 and then you train some giant neural net

01:02:49 to reconstruct the parts that are missing.

01:02:52 And until very recently,

01:02:56 there was no working methods for that.

01:02:59 All the autoencoder type methods for images

01:03:01 weren’t producing very good representation,

01:03:03 but there’s a paper now coming out of the fair group

01:03:06 at MNL Park that actually works very well.

01:03:08 So that doesn’t require data augmentation,

01:03:12 that requires only masking, okay.

01:03:15 Only masking for images, okay.

01:03:18 Right, so you mask part of the image

01:03:20 and you train a system, which in this case is a transformer

01:03:24 because the transformer represents the image

01:03:28 as non overlapping patches,

01:03:30 so it’s easy to mask patches and things like that.

01:03:33 Okay, but then my question transfers to that problem,

01:03:35 the masking, like why should the mask be square or rectangle?

01:03:40 So it doesn’t matter, like, you know,

01:03:41 I think we’re gonna come up probably in the future

01:03:44 with sort of ways to mask that are kind of random,

01:03:50 essentially, I mean, they are random already, but.

01:03:52 No, no, but like something that’s challenging,

01:03:56 like optimally challenging.

01:03:59 So like, I mean, maybe it’s a metaphor that doesn’t apply,

01:04:02 but you’re, it seems like there’s a data augmentation

01:04:06 or masking, there’s an interactive element with it.

01:04:09 Like you’re almost like playing with an image.

01:04:12 And like, it’s like the way we play with an image

01:04:14 in our minds.

01:04:15 No, but it’s like dropout.

01:04:16 It’s like Boston machine training.

01:04:18 You, you know, every time you see a percept,

01:04:23 you also, you can perturb it in some way.

01:04:26 And then the principle of the training procedure

01:04:31 is to minimize the difference of the output

01:04:33 or the representation between the clean version

01:04:36 and the corrupted version, essentially, right?

01:04:40 And you can do this in real time, right?

01:04:42 So, you know, Boltzmann machines work like this, right?

01:04:44 You show a percept, you tell the machine

01:04:47 that’s a good combination of activities

01:04:49 of your input neurons.

01:04:50 And then you either let them go their merry way

01:04:56 without clamping them to values,

01:04:58 or you only do this with a subset.

01:05:01 And what you’re doing is you’re training the system

01:05:03 so that the stable state of the entire network

01:05:07 is the same regardless of whether it sees

01:05:08 the entire input or whether it sees only part of it.

01:05:12 You know, denoising autoencoder method

01:05:14 is basically the same thing, right?

01:05:15 You’re training a system to reproduce the input,

01:05:18 the complete inputs and filling the input

01:05:20 and filling the blanks, regardless of which parts

01:05:23 are missing, and that’s really the underlying principle.
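
A tiny denoising-autoencoder sketch of that principle, with placeholder sizes and corruption rate: hide part of the input and train the network to reproduce the complete, clean input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy denoising autoencoder on flat 784-dimensional inputs (sizes are placeholders).
auto_encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 784))

def denoising_step(x, drop=0.5):
    mask = (torch.rand_like(x) > drop).float()     # which parts the network gets to see
    reconstruction = auto_encoder(x * mask)        # fill in the blanks
    return F.mse_loss(reconstruction, x)           # compared against the full, clean input

loss = denoising_step(torch.rand(16, 784))
```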

01:05:26 And you could imagine sort of, even in the brain,

01:05:28 some sort of neural principle where, you know,

01:05:30 neurons kind of oscillate, right?

01:05:32 So they take their activity and then temporarily

01:05:35 they kind of shut off to, you know,

01:05:38 force the rest of the system to basically reconstruct

01:05:42 the input without their help, you know?

01:05:44 And, I mean, you could imagine, you know,

01:05:49 more or less biologically possible processes.

01:05:51 Something like that.

01:05:51 And I guess with this denoising autoencoder

01:05:54 and masking and data augmentation,

01:05:58 you don’t have to worry about being super efficient.

01:06:01 You could just do as much as you want

01:06:03 and get better over time.

01:06:06 Because I was thinking, like, you might want to be clever

01:06:08 about the way you do all these procedures, you know,

01:06:12 but that’s only, it’s somehow costly to do every iteration,

01:06:16 but it’s not really.

01:06:17 Not really.

01:06:19 Maybe.

01:06:20 And then there is, you know,

01:06:21 data augmentation without explicit data augmentation.

01:06:24 There's data augmentation by waiting,

01:06:25 which is, you know, the sort of video prediction.

01:06:29 You’re observing a video clip,

01:06:31 observing the, you know, the continuation of that video clip.

01:06:36 You try to learn a representation

01:06:38 using dual joint embedding architectures

01:06:40 in such a way that the representation of the future clip

01:06:43 is easily predictable from the representation

01:06:45 of the observed clip.
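
A minimal sketch of that joint-embedding prediction setup, with placeholder encoders and clip shapes; a real system would also need a collapse-prevention term of the kind discussed above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Encode the observed clip and the future clip, and train a small predictor so the
# future representation is easy to predict from the observed one.
clip_encoder = nn.Sequential(nn.Flatten(), nn.Linear(8 * 3 * 64 * 64, 256))  # 8-frame clip -> embedding
predictor = nn.Linear(256, 256)

def prediction_loss(observed_clip, future_clip):
    z_obs = clip_encoder(observed_clip)                   # representation of what was seen
    z_fut = clip_encoder(future_clip)                     # representation of what came next
    return F.mse_loss(predictor(z_obs), z_fut.detach())   # predict the future in representation space
    # (in practice a collapse-preventing term, e.g. VICReg-style, would be added)

loss = prediction_loss(torch.randn(2, 8, 3, 64, 64), torch.randn(2, 8, 3, 64, 64))
```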

01:06:48 Do you think YouTube has enough raw data

01:06:52 from which to learn how to be a cat?

01:06:56 I think so.

01:06:57 So the amount of data is not the constraint.

01:07:01 No, it would require some selection, I think.

01:07:04 Some selection?

01:07:05 Some selection of, you know, maybe the right type of data.

01:07:08 You need some.

01:07:09 Don’t go down the rabbit hole of just cat videos.

01:07:11 You might need to watch some lectures or something.

01:07:14 No, you wouldn’t.

01:07:15 How meta would that be

01:07:17 if it like watches lectures about intelligence

01:07:21 and then learns,

01:07:22 watches your lectures in NYU

01:07:24 and learns from that how to be intelligent?

01:07:26 I don’t think that would be enough.

01:07:30 What’s your, do you find multimodal learning interesting?

01:07:33 We’ve been talking about visual language,

01:07:35 like combining those together,

01:07:36 maybe audio, all those kinds of things.

01:07:38 There’s a lot of things that I find interesting

01:07:40 in the short term,

01:07:41 but are not addressing the important problems

01:07:44 that I think are really kind of the big challenges.

01:07:46 So I think, you know, things like multitask learning,

01:07:48 continual learning, you know, adversarial issues.

01:07:54 I mean, those have great practical interests

01:07:57 in the relatively short term, possibly,

01:08:00 but I don’t think they’re fundamental.

01:08:01 You know, active learning,

01:08:02 even to some extent, reinforcement learning.

01:08:04 I think those things will become either obsolete

01:08:07 or useless or easy

01:08:10 once we figured out how to do self supervised

01:08:14 representation learning

01:08:15 or learning predictive world models.

01:08:19 And so I think that’s what, you know,

01:08:21 the entire community should be focusing on.

01:08:24 At least people who are interested

01:08:25 in sort of fundamental questions

01:08:26 or, you know, really kind of pushing the envelope

01:08:28 of AI towards the next stage.

01:08:31 But of course, there’s like a huge amount of,

01:08:33 you know, very interesting work to do

01:08:34 in sort of practical questions

01:08:35 that have, you know, short term impact.

01:08:38 Well, you know, it’s difficult to talk about

01:08:41 the temporal scale,

01:08:42 because all of human civilization

01:08:44 will eventually be destroyed

01:08:45 because the sun will die out.

01:08:48 And even if Elon Musk is successful

01:08:50 in multi planetary colonization across the galaxy,

01:08:54 eventually the entirety of it

01:08:56 will just become giant black holes.

01:08:58 And that’s gonna take a while though.

01:09:02 So, but what I’m saying is then that logic

01:09:04 can be used to say it’s all meaningless.

01:09:07 I’m saying all that to say that multitask learning

01:09:11 might be, you’re calling it practical

01:09:15 or pragmatic or whatever.

01:09:17 That might be the thing that achieves something

01:09:19 very akin to intelligence

01:09:22 while we’re trying to solve the more general problem

01:09:26 of self supervised learning of background knowledge.

01:09:29 So the reason I bring that up,

01:09:30 maybe one way to ask that question.

01:09:33 I’ve been very impressed

01:09:34 by what Tesla Autopilot team is doing.

01:09:36 I don’t know if you’ve gotten a chance to glance

01:09:38 at this particular one example of multitask learning,

01:09:42 where they’re literally taking the problem,

01:09:44 like, I don’t know, Charles Darwin studying animals.

01:09:48 They’re studying the problem of driving

01:09:51 and asking, okay, what are all the things

01:09:53 you have to perceive?

01:09:55 And the way they’re solving it is one,

01:09:57 there’s an ontology where you’re bringing that to the table.

01:10:00 So you’re formulating a bunch of different tasks.

01:10:02 It’s like over a hundred tasks or something like that

01:10:04 that they’re involved in driving.

01:10:06 And then they’re deploying it

01:10:07 and then getting data back from people that run into trouble

01:10:10 and they’re trying to figure out, do we add tasks?

01:10:12 Do we, like, we focus on each individual task separately?

01:10:16 In fact, I would say,

01:10:18 I would classify Andrej Karpathy's talk in two ways.

01:10:20 So one was about doors

01:10:22 and the other one about how much ImageNet sucks.

01:10:24 He kept going back and forth on those two topics,

01:10:28 which ImageNet sucks,

01:10:30 meaning you can’t just use a single benchmark.

01:10:33 There’s so, like, you have to have like a giant suite

01:10:37 of benchmarks to understand how well your system actually works.

01:10:39 Oh, I agree with him.

01:10:40 I mean, he’s a very sensible guy.

01:10:43 Now, okay, it’s very clear that if you’re faced

01:10:47 with an engineering problem that you need to solve

01:10:50 in a relatively short time,

01:10:51 particularly if you have Elon Musk breathing down your neck,

01:10:55 you’re going to have to take shortcuts, right?

01:10:58 You might think about the fact that the right thing to do

01:11:02 and the longterm solution involves, you know,

01:11:04 some fancy self supervised learning,

01:11:06 but you have, you know, Elon Musk breathing down your neck

01:11:10 and, you know, this involves, you know, human lives.

01:11:13 And so you have to basically just do

01:11:17 the systematic engineering and, you know,

01:11:22 fine tuning and refinements

01:11:23 and trial and error and all that stuff.

01:11:26 There’s nothing wrong with that.

01:11:27 That’s called engineering.

01:11:28 That’s called, you know, putting technology out in the world.

01:11:35 And you have to kind of ironclad it before you do this,

01:11:39 you know, so much for, you know,

01:11:44 grand ideas and principles.

01:11:48 But, you know, I’m placing myself sort of, you know,

01:11:50 some, you know, upstream of this, you know,

01:11:54 quite a bit upstream of this.

01:11:55 You’re a Plato, think about platonic forms.

01:11:58 You’re not platonic because eventually

01:12:01 I want that stuff to get used,

01:12:03 but it’s okay if it takes five or 10 years

01:12:06 for the community to realize this is the right thing to do.

01:12:09 I’ve done this before.

01:12:11 It’s been the case before that, you know,

01:12:13 I’ve made that case.

01:12:14 I mean, if you look back in the mid 2000, for example,

01:12:17 and you ask yourself the question, okay,

01:12:19 I want to recognize cars or faces or whatever,

01:12:24 you know, I can use convolutional net.

01:12:25 So I can use sort of more conventional

01:12:28 kind of computer vision techniques, you know,

01:12:29 using interest point detectors or SIFT features

01:12:33 and, you know, sticking an SVM on top.

01:12:35 At that time, the datasets were so small

01:12:37 that those methods that use more hand engineering

01:12:41 worked better than ConvNets.

01:12:43 It was just not enough data for ConvNets

01:12:45 and ConvNets were a little slow with the kind of hardware

01:12:48 that was available at the time.

01:12:50 And there was a sea change when, basically,

01:12:53 when, you know, datasets became bigger

01:12:56 and GPUs became available.

01:12:58 That’s what, you know, two of the main factors

01:13:02 that basically made people change their mind.

01:13:07 And you can look at the history of,

01:13:11 like, all sub branches of AI or pattern recognition.

01:13:16 And there’s a similar trajectory followed by techniques

01:13:19 where people start by, you know, engineering the hell out of it.

01:13:25 You know, be it optical character recognition,

01:13:29 speech recognition, computer vision,

01:13:31 like image recognition in general,

01:13:34 natural language understanding, like, you know, translation,

01:13:37 things like that, right?

01:13:38 You start to engineer the hell out of it.

01:13:41 You start to acquire all the knowledge,

01:13:42 the prior knowledge you know about image formation,

01:13:44 about, you know, the shape of characters,

01:13:46 about, you know, morphological operations,

01:13:49 about, like, feature extraction, Fourier transforms,

01:13:52 you know, Zernike moments, you know, whatever, right?

01:13:54 People have come up with thousands of ways

01:13:56 of representing images

01:13:57 so that they could be easily classified afterwards.

01:14:01 Same for speech recognition, right?

01:14:03 There is, you know, it took decades

01:14:04 for people to figure out a good front end

01:14:06 to preprocess speech signals

01:14:09 so that, you know, all the information

01:14:11 about what is being said is preserved,

01:14:13 but most of the information

01:14:14 about the identity of the speaker is gone.

01:14:16 You know, cepstral coefficients or whatever, right?

01:14:20 And same for text, right?

01:14:23 You do named entity recognition and you parse

01:14:26 and you do tagging of the parts of speech

01:14:31 and, you know, you do this sort of tree representation

01:14:34 of clauses and all that stuff, right?

01:14:36 Before you can do anything.

01:14:40 So that’s how it starts, right?

01:14:43 Just engineer the hell out of it.

01:14:45 And then you start having data

01:14:47 and maybe you have more powerful computers.

01:14:50 Maybe you know something about statistical learning.

01:14:52 So you start using machine learning

01:14:53 and it’s usually a small sliver

01:14:54 on top of your kind of handcrafted system

01:14:56 where, you know, you extract features by hand.

01:14:59 Okay, and now, you know, nowadays the standard way

01:15:02 of doing this is that you train the entire thing end to end

01:15:04 with a deep learning system and it learns its own features

01:15:06 and, you know, speech recognition systems nowadays

01:15:10 or OCR systems are completely end to end.

01:15:12 It’s, you know, it’s some giant neural net

01:15:15 that takes raw waveforms

01:15:17 and produces a sequence of characters coming out.

01:15:20 And it’s just a huge neural net, right?

01:15:22 There’s no, you know, Markov model,

01:15:24 there’s no language model that is explicit

01:15:26 other than, you know, something that’s ingrained

01:15:28 in the sort of neural language model, if you want.

01:15:30 Same for translation, same for all kinds of stuff.

01:15:33 So you see this continuous evolution

01:15:36 from, you know, less and less hand crafting

01:15:40 and more and more learning.

01:15:43 And I think, I mean, it’s true in biology as well.

01:15:50 So, I mean, we might disagree about this,

01:15:52 maybe not, this one little piece at the end,

01:15:56 you mentioned active learning.

01:15:58 It feels like active learning,

01:16:01 which is the selection of data

01:16:02 and also the interactivity needs to be part

01:16:05 of this giant neural network.

01:16:06 You cannot just be an observer

01:16:08 to do self supervised learning.

01:16:09 You have to, well, I don’t,

01:16:12 self supervised learning is just a word,

01:16:14 but I would, whatever this giant stack

01:16:16 of a neural network that’s automatically learning,

01:16:19 it feels, my intuition is that you have to have a system,

01:16:26 whether it’s a physical robot or a digital robot,

01:16:30 that’s interacting with the world

01:16:32 and doing so in a flawed way and improving over time

01:16:35 in order to form the self supervised learning.

01:16:41 Well, you can’t just give it a giant sea of data.

01:16:44 Okay, I agree and I disagree.

01:16:47 I agree in the sense that I think, I agree in two ways.

01:16:52 The first way I agree is that if you want,

01:16:55 and you certainly need a causal model of the world

01:16:57 that allows you to predict the consequences

01:16:59 of your actions, to train that model,

01:17:01 you need to take actions, right?

01:17:02 You need to be able to act in a world

01:17:04 and see the effect for you to be,

01:17:07 to learn causal models of the world.

01:17:08 So that’s not obvious because you can observe others.

01:17:11 You can observe others.

01:17:12 And you can infer that they’re similar to you

01:17:14 and then you can learn from that.

01:17:16 Yeah, but then you have to kind of hardwire that part,

01:17:18 right, and then, you know, mirror neurons

01:17:19 and all that stuff, right?

01:17:20 So, and it’s not clear to me

01:17:23 how you would do this in a machine.

01:17:24 So I think the action part would be necessary

01:17:30 for having causal models of the world.

01:17:32 The second reason it may be necessary,

01:17:36 or at least more efficient,

01:17:37 is that active learning basically, you know,

01:17:41 goes for the jugular of what you don’t know, right?

01:17:44 Is, you know, obvious areas of uncertainty

01:17:48 about your world and about how the world behaves.

01:17:52 And you can resolve this uncertainty

01:17:56 by systematic exploration of that part

01:17:58 that you don’t know.

01:18:00 And if you know that you don’t know,

01:18:01 then, you know, it makes you curious.

01:18:03 You kind of look into situations that,

01:18:05 and, you know, across the animal world,

01:18:09 different species have different levels of curiosity,

01:18:12 right, depending on how they’re built, right?

01:18:15 So, you know, cats and rats are incredibly curious,

01:18:18 dogs not so much, I mean, less.

01:18:20 Yeah, so it could be useful

01:18:22 to have that kind of curiosity.

01:18:23 So it’d be useful,

01:18:24 but curiosity just makes the process faster.

01:18:26 It doesn’t make the process exist.

01:18:28 The, so what process, what learning process is it

01:18:33 that active learning makes more efficient?

01:18:37 And I’m asking that first question, you know,

01:18:42 you know, we haven’t answered that question yet.

01:18:43 So, you know, I worry about active learning

01:18:45 once this question is…

01:18:47 So it’s the more fundamental question to ask.

01:18:49 And if active learning or interaction

01:18:53 increases the efficiency of the learning,

01:18:56 see, sometimes it becomes very different

01:18:59 if the increase is several orders of magnitude, right?

01:19:03 Like…

01:19:04 That’s true.

01:19:05 But fundamentally it’s still the same thing

01:19:07 and building up the intuition about how to,

01:19:10 in a self supervised way to construct background models,

01:19:13 efficient or inefficient, is the core problem.

01:19:18 What do you think about Yoshua Bengio’s

01:19:20 talking about consciousness

01:19:22 and all of these kinds of concepts?

01:19:24 Okay, I don’t know what consciousness is, but…

01:19:29 It’s a good opener.

01:19:31 And to some extent, a lot of the things

01:19:33 that are said about consciousness

01:19:34 remind me of the questions people were asking themselves

01:19:38 in the 18th century or 17th century

01:19:40 when they discovered that, you know, how the eye works

01:19:44 and the fact that the image at the back of the eye

01:19:46 was upside down, right?

01:19:49 Because you have a lens.

01:19:50 And so on your retina, the image that forms is an image

01:19:54 of the world, but it’s upside down.

01:19:55 How is it that you see right side up?

01:19:57 And, you know, with what we know today in science,

01:20:00 you know, we realize this question doesn’t make any sense

01:20:03 or is kind of ridiculous in some way, right?

01:20:05 So I think a lot of what is said about consciousness

01:20:07 is of that nature.

01:20:08 Now, that said, there is a lot of really smart people

01:20:10 that for whom I have a lot of respect

01:20:13 who are talking about this topic,

01:20:14 people like David Chalmers, who is a colleague of mine at NYU.

01:20:17 I have kind of an unorthodox, folk, speculative hypothesis

01:20:28 about consciousness.

01:20:29 So we’re talking about the study of a world model.

01:20:32 And I think, you know, our entire prefrontal cortex

01:20:35 basically is the engine for a world model.

01:20:40 But when we are attending at a particular situation,

01:20:44 we’re focused on that situation.

01:20:46 We basically cannot attend to anything else.

01:20:48 And that seems to suggest that we basically have

01:20:53 only one world model engine in our prefrontal cortex.

01:20:59 That engine is configurable to the situation at hand.

01:21:02 So we are building a box out of wood,

01:21:04 or we are driving down the highway, or playing chess.

01:21:09 We basically have a single model of the world

01:21:12 that we configure into the situation at hand,

01:21:15 which is why we can only attend to one task at a time.

01:21:19 Now, if there is a task that we do repeatedly,

01:21:22 it goes from the sort of deliberate reasoning

01:21:25 using model of the world and prediction

01:21:27 and perhaps something like model predictive control,

01:21:29 which I was talking about earlier,

01:21:31 to something that is more subconscious

01:21:33 that becomes automatic.

01:21:34 So I don’t know if you’ve ever played

01:21:35 against a chess grandmaster.

01:21:39 I get wiped out in 10 plays, right?

01:21:43 And I have to think about my move for like 15 minutes.

01:21:50 And the person in front of me, the grandmaster,

01:21:52 would just react within seconds, right?

01:21:56 He doesn’t need to think about it.

01:21:58 That’s become part of the subconscious

01:21:59 because it’s basically just pattern recognition

01:22:02 at this point.

01:22:04 Same, the first few hours you drive a car,

01:22:07 you are really attentive, you can’t do anything else.

01:22:09 And then after 20, 30 hours of practice, 50 hours,

01:22:13 it becomes subconscious, you can talk to the person next to you,

01:22:15 things like that, right?

01:22:17 Unless the situation becomes unpredictable

01:22:19 and then you have to stop talking.

01:22:21 So that suggests you only have one model in your head.

01:22:24 And it might suggest the idea that consciousness

01:22:27 basically is the module that configures

01:22:29 this world model of yours.

01:22:31 You need to have some sort of executive kind of overseer

01:22:36 that configures your world model for the situation at hand.

01:22:40 And that leads to kind of the really curious concept

01:22:43 that consciousness is not a consequence

01:22:46 of the power of our minds,

01:22:47 but of the limitation of our brains.

01:22:49 That because we have only one world model,

01:22:52 we have to be conscious.

01:22:53 If we had as many world models

01:22:55 as situations we encounter,

01:22:58 then we could do all of them simultaneously

01:23:00 and we wouldn’t need this sort of executive control

01:23:02 that we call consciousness.

01:23:04 Yeah, interesting.

01:23:05 And somehow maybe that executive controller,

01:23:08 I mean, the hard problem of consciousness,

01:23:10 there’s some kind of chemicals in biology

01:23:12 that’s creating a feeling,

01:23:15 like it feels to experience some of these things.

01:23:18 That’s kind of like the hard question is,

01:23:22 what the heck is that and why is that useful?

01:23:24 Maybe the more pragmatic question,

01:23:26 why is it useful to feel like this is really you

01:23:29 experiencing this versus just like information

01:23:33 being processed?

01:23:34 It could be just a very nice side effect

01:23:39 of the way we evolved.

01:23:41 That’s just very useful to feel a sense of ownership

01:23:48 to the decisions you make, to the perceptions you make,

01:23:51 to the model you’re trying to maintain.

01:23:53 Like you own this thing and this is the only one you got

01:23:56 and if you lose it, it’s gonna really suck.

01:23:58 And so you should really send the brain

01:24:00 some signals about it.

01:24:02 So what ideas do you believe might be true

01:24:06 that most or at least many people disagree with?

01:24:11 Let’s say in the space of machine learning.

01:24:13 Well, it depends who you talk about,

01:24:14 but I think, so certainly there is a bunch of people

01:24:20 who are nativists, right?

01:24:21 Who think that a lot of the basic things about the world

01:24:23 are kind of hardwired in our minds.

01:24:26 Things like the world is three dimensional, for example,

01:24:28 is that hardwired?

01:24:30 Things like object permanence,

01:24:32 is this something that we learn

01:24:35 before the age of three months or so?

01:24:37 Or are we born with it?

01:24:39 And there are very wide disagreements

01:24:42 among the cognitive scientists for this.

01:24:46 I think those things are actually very simple to learn.

01:24:50 Is it the case that the oriented edge detectors in V1

01:24:54 are learned or are they hardwired?

01:24:56 I think they are learned.

01:24:57 They might be learned before birth

01:24:58 because it’s really easy to generate signals

01:25:00 from the retina that actually will train edge detectors.

01:25:04 And again, those are things that can be learned

01:25:06 within minutes of opening your eyes, right?

01:25:09 I mean, since the 1990s,

01:25:12 we have algorithms that can learn oriented edge detectors

01:25:15 completely unsupervised

01:25:16 with the equivalent of a few minutes of real time.

01:25:19 So those things have to be learned.
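
As an illustration of that kind of result, running a generic unsupervised method such as ICA on small patches of a natural image recovers oriented, Gabor-like filters. This sketch uses scikit-learn; the patch size and number of filters are arbitrary choices, and ICA is only one of several methods (sparse coding is another) that produce such filters.

```python
import numpy as np
from sklearn.datasets import load_sample_image
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.decomposition import FastICA

# Sample many small patches from a natural image and fit ICA on them.
image = load_sample_image("china.jpg").mean(axis=2) / 255.0           # grayscale natural image
patches = extract_patches_2d(image, (12, 12), max_patches=20000, random_state=0)
patches = patches.reshape(len(patches), -1)
patches -= patches.mean(axis=1, keepdims=True)                         # remove each patch's DC component

ica = FastICA(n_components=36, random_state=0, max_iter=500)
ica.fit(patches)
edge_filters = ica.components_.reshape(36, 12, 12)                     # oriented, localized, edge-like filters
```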

01:25:22 And there’s also those MIT experiments

01:25:24 where you kind of plug the optic nerve

01:25:27 onto the auditory cortex of a baby ferret, right?

01:25:30 And that auditory cortex

01:25:31 becomes a visual cortex essentially.

01:25:33 So clearly there’s learning taking place there.

01:25:37 So I think a lot of what people think are so basic

01:25:41 that they need to be hardwired,

01:25:43 I think a lot of those things are learned

01:25:44 because they are easy to learn.

01:25:46 So you put a lot of value in the power of learning.

01:25:49 What kind of things do you suspect might not be learned?

01:25:53 Is there something that could not be learned?

01:25:56 So your intrinsic drives are not learned.

01:25:59 There are the things that make humans human

01:26:03 or make cats different from dogs, right?

01:26:07 It’s the basic drives that are kind of hardwired

01:26:10 in our basal ganglia.

01:26:13 I mean, there are people who are working

01:26:14 on this kind of stuff that’s called intrinsic motivation

01:26:16 in the context of reinforcement learning.

01:26:18 So these are objective functions

01:26:20 where the reward doesn’t come from the external world.

01:26:23 It’s computed by your own brain.

01:26:24 Your own brain computes whether you’re happy or not, right?

01:26:28 It measures your degree of comfort or discomfort.

01:26:33 And because it’s your brain computing this,

01:26:36 presumably it knows also how to estimate

01:26:37 gradients of this, right?

01:26:38 So it’s easier to learn when your objective is intrinsic.

01:26:47 So that has to be hardwired.

01:26:50 The critic that makes longterm prediction of the outcome,

01:26:53 which is the eventual result of this, that’s learned.

01:26:57 And perception is learned

01:26:59 and your model of the world is learned.

01:27:01 But let me take an example of why the critic,

01:27:04 I mean, an example of how the critic may be learned, right?

01:27:06 If I come to you, I reach across the table

01:27:11 and I pinch your arm, right?

01:27:13 Complete surprise for you.

01:27:15 You would not have expected this from me.

01:27:16 I was expecting that the whole time, but yes, right.

01:27:18 Let’s say for the sake of the story, yes.

01:27:20 So, okay, your basal ganglia is gonna light up

01:27:24 because it’s gonna hurt, right?

01:27:28 And now your model of the world includes the fact that

01:27:31 I may pinch you if I approach my…

01:27:34 Don’t trust humans.

01:27:36 Right, my hand to your arm.

01:27:37 So if I try again, you’re gonna recoil.

01:27:40 And that’s your critic, your predictive,

01:27:44 your predictor of your ultimate pain system

01:27:50 that predicts that something bad is gonna happen

01:27:52 and you recoil to avoid it.

01:27:53 So even that can be learned.

01:27:55 That is learned, definitely.

01:27:56 This is what allows you also to define some goals, right?

01:28:00 So the fact that you’re a school child,

01:28:04 you wake up in the morning and you go to school

01:28:06 and it’s not because you necessarily like waking up early

01:28:12 and going to school,

01:28:12 but you know that there is a long term objective

01:28:14 you’re trying to optimize.

01:28:15 So Ernest Becker, I’m not sure if you’re familiar with him,

01:28:18 the philosopher, he wrote the book Denial of Death

01:28:20 and his idea is that one of the core motivations

01:28:23 of human beings is our terror of death, our fear of death.

01:28:27 That’s what makes us unique from cats.

01:28:28 Cats are just surviving.

01:28:30 They do not have a deep cognizance, an introspection,

01:28:37 that over the horizon is the end.

01:28:41 And then he says that, I mean,

01:28:43 there’s a terror management theory

01:28:44 that just all these psychological experiments

01:28:46 that show basically this idea

01:28:50 that all of human civilization, everything we create

01:28:54 is kind of trying to forget if even for a brief moment

01:28:58 that we’re going to die.

01:29:00 When do you think humans understand

01:29:03 that they’re going to die?

01:29:04 Is it learned early on also?

01:29:07 I don’t know at what point.

01:29:11 I mean, it’s a question like at what point

01:29:13 do you realize that what death really is?

01:29:16 And I think most people don’t actually realize

01:29:18 what death is, right?

01:29:19 I mean, most people believe that you go to heaven

01:29:20 or something, right?

01:29:21 So to push back on that, what Ernest Becker says

01:29:25 and Sheldon Solomon, all of those folks,

01:29:29 and I find those ideas a little bit compelling

01:29:31 is that there is moments in life, early in life,

01:29:34 a lot of this fun happens early in life

01:29:36 when you do deeply experience

01:29:41 the terror of this realization.

01:29:43 And all the things you think about about religion,

01:29:45 all those kinds of things that we kind of think about

01:29:48 more like teenage years and later,

01:29:50 we’re talking about way earlier.

01:29:52 No, it was like seven or eight years,

01:29:53 something like that, yeah.

01:29:54 You realize, holy crap, this is like the mystery,

01:29:59 the terror, like it’s almost like you’re a little prey,

01:30:03 a little baby deer sitting in the darkness

01:30:05 of the jungle or the woods looking all around you.

01:30:08 There’s darkness full of terror.

01:30:09 I mean, that realization says, okay,

01:30:12 I’m gonna go back in the comfort of my mind

01:30:14 where there is a deep meaning,

01:30:16 where there is maybe like pretend I’m immortal

01:30:20 in however way, however kind of idea I can construct

01:30:25 to help me understand that I’m immortal.

01:30:27 Religion helps with that.

01:30:28 You can delude yourself in all kinds of ways,

01:30:31 like lose yourself in the busyness of each day,

01:30:34 have little goals in mind, all those kinds of things

01:30:36 to think that it’s gonna go on forever.

01:30:38 And you kind of know you’re gonna die, yeah,

01:30:40 and it’s gonna be sad, but you don’t really understand

01:30:43 that you’re going to die.

01:30:45 And so that’s their idea.

01:30:46 And I find that compelling because it does seem

01:30:49 to be a core unique aspect of human nature

01:30:52 that we’re able to think that we’re going,

01:30:55 we’re able to really understand that this life is finite.

01:30:59 That seems important.

01:31:00 There’s a bunch of different things there.

01:31:02 So first of all, I don’t think there is a qualitative

01:31:04 difference between us and cats in that respect.

01:31:07 I think the difference is that we just have a better

01:31:10 long term ability to predict in the long term.

01:31:14 And so we have a better understanding of how the world works.

01:31:17 So we have better understanding of finiteness of life

01:31:20 and things like that.

01:31:21 So we have a better planning engine than cats?

01:31:23 Yeah.

01:31:24 Okay.

01:31:25 But what’s the motivation for planning that far?

01:31:28 Well, I think it’s just a side effect of the fact

01:31:30 that we have just a better planning engine

01:31:32 because it makes us, as I said,

01:31:34 the essence of intelligence is the ability to predict.

01:31:37 And so the, because we’re smarter as a side effect,

01:31:41 we also have this ability to kind of make predictions

01:31:43 about our own future existence or lack thereof.

01:31:47 Okay.

01:31:48 You say religion helps with that.

01:31:50 I think religion hurts actually.

01:31:53 It makes people worry about like,

01:31:55 what’s going to happen after their death, et cetera.

01:31:57 If you believe that, you just don’t exist after death.

01:32:00 Like, it solves completely the problem, at least.

01:32:02 You’re saying if you don’t believe in God,

01:32:04 you don’t worry about what happens after death?

01:32:07 Yeah.

01:32:08 I don’t know.

01:32:09 You only worry about this life

01:32:11 because that’s the only one you have.

01:32:14 I think it’s, well, I don’t know.

01:32:16 If I were to say what Ernest Becker says,

01:32:17 and obviously I agree with him more than not,

01:32:22 is you do deeply worry.

01:32:26 If you believe there’s no God,

01:32:27 there’s still a deep worry of the mystery of it all.

01:32:31 Like, how does that make any sense that it just ends?

01:32:35 I don’t think we can truly understand that this ride,

01:32:39 I mean, so much of our life, the consciousness,

01:32:41 the ego is invested in this being.

01:32:46 And then…

01:32:47 Science keeps bringing humanity down from its pedestal.

01:32:51 And that’s just another example of it.

01:32:54 That’s wonderful, but for us individual humans,

01:32:57 we don’t like to be brought down from a pedestal.

01:33:00 You’re saying like, but see, you’re fine with it because,

01:33:03 well, so what Ernest Becker would say is you’re fine with it

01:33:06 because there’s just a more peaceful existence for you,

01:33:08 but you’re not really fine.

01:33:09 You’re hiding from it.

01:33:10 In fact, some of the people that experience

01:33:12 the deepest trauma earlier in life,

01:33:16 they often, before they seek extensive therapy,

01:33:19 will say that I’m fine.

01:33:21 It’s like when you talk to people who are truly angry,

01:33:23 how are you doing, I’m fine.

01:33:25 The question is, what’s going on?

01:33:27 Now I had a near death experience.

01:33:29 I had a very bad motorbike accident when I was 17.

01:33:33 So, but that didn’t have any impact

01:33:36 on my reflection on that topic.

01:33:40 So I’m basically just playing a bit of devil’s advocate,

01:33:43 pushing back on wondering,

01:33:45 is it truly possible to accept death?

01:33:47 And the flip side, that’s more interesting,

01:33:49 I think for AI and robotics is how important

01:33:53 is it to have this as one of the suite of motivations

01:33:57 is to not just avoid falling off the roof

01:34:03 or something like that, but ponder the end of the ride.

01:34:10 If you listen to the stoics, it’s a great motivator.

01:34:14 It adds a sense of urgency.

01:34:16 So maybe to truly fear death or be cognizant of it

01:34:21 might give a deeper meaning and urgency to the moment

01:34:26 to live fully.

01:34:30 Maybe I don’t disagree with that.

01:34:32 I mean, I think what motivates me here

01:34:34 is knowing more about human nature.

01:34:38 I mean, I think human nature and human intelligence

01:34:41 is a big mystery.

01:34:42 It’s a scientific mystery

01:34:45 in addition to philosophical and et cetera,

01:34:48 but I’m a true believer in science.

01:34:50 So, and I do have kind of a belief

01:34:56 that for complex systems like the brain and the mind,

01:34:59 the way to understand it is to try to reproduce it

01:35:04 with artifacts that you build

01:35:07 because you know what’s essential to it

01:35:08 when you try to build it.

01:35:10 The same way I’ve used this analogy before with you,

01:35:12 I believe, the same way we only started

01:35:15 to understand aerodynamics

01:35:18 when we started building airplanes

01:35:19 and that helped us understand how birds fly.

01:35:22 So I think there’s kind of a similar process here

01:35:25 where we don’t have a full theory of intelligence,

01:35:29 but building intelligent artifacts

01:35:31 will help us perhaps develop some underlying theory

01:35:35 that encompasses not just artificial implements,

01:35:39 but also human and biological intelligence in general.

01:35:43 So you’re an interesting person to ask this question

01:35:46 about sort of all kinds of different other

01:35:49 intelligent entities or intelligences.

01:35:53 What are your thoughts about kind of like the Turing test

01:35:56 or the Chinese room question?

01:35:59 If we create an AI system that exhibits

01:36:02 a lot of properties of intelligence and consciousness,

01:36:07 how comfortable are you thinking of that entity

01:36:10 as intelligent or conscious?

01:36:12 So you’re trying to build now systems

01:36:14 that have intelligence and there’s metrics

01:36:16 about their performance, but that metric is external.

01:36:22 So how are you, are you okay calling a thing intelligent

01:36:26 or are you going to be like most humans

01:36:29 and be once again unhappy to be brought down

01:36:32 from a pedestal of consciousness slash intelligence?

01:36:34 No, I’ll be very happy to understand

01:36:39 more about human nature, human mind and human intelligence

01:36:45 through the construction of machines

01:36:47 that have similar abilities.

01:36:50 And if a consequence of this is to bring down humanity

01:36:54 one notch down from its already low pedestal,

01:36:58 I’m just fine with it.

01:36:59 That’s just the reality of life.

01:37:01 So I’m fine with that.

01:37:02 Now you were asking me about things that,

01:37:05 opinions I have that a lot of people may disagree with.

01:37:07 I think if we think about the design

01:37:12 of autonomous intelligence systems,

01:37:14 so assuming that we are somewhat successful

01:37:16 at some level of getting machines to learn models

01:37:20 of the world, predictive models of the world,

01:37:22 we build intrinsic motivation objective functions

01:37:25 to drive the behavior of that system.

01:37:28 The system also has perception modules

01:37:30 that allows it to estimate the state of the world

01:37:32 and then have some way of figuring out

01:37:34 the sequence of actions that,

01:37:36 to optimize a particular objective.

01:37:39 If it has a critic of the type that I was describing before,

01:37:42 the thing that makes you recoil your arm

01:37:44 the second time I try to pinch you,

01:37:48 an intelligent autonomous machine will have emotions.

01:37:51 I think emotions are an integral part

01:37:54 of autonomous intelligence.

01:37:56 If you have an intelligent system

01:37:59 that is driven by intrinsic motivation, by objectives,

01:38:03 if it has a critic that allows it to predict in advance

01:38:07 whether the outcome of a situation is gonna be good or bad,

01:38:11 is going to have emotions, it’s gonna have fear.

01:38:13 Yes.

01:38:14 When it predicts that the outcome is gonna be bad

01:38:18 and something to avoid is gonna have elation

01:38:20 when it predicts it’s gonna be good.

01:38:24 If it has drives to relate with humans,

01:38:28 in some ways the way humans have,

01:38:30 it’s gonna be social, right?

01:38:34 And so it’s gonna have emotions

01:38:36 about attachment and things of that type.

01:38:38 So I think the sort of sci fi thing

01:38:44 where you see Commander Data,

01:38:46 like having an emotion chip that you can turn off, right?

01:38:50 I think that’s ridiculous.

01:38:51 So, I mean, here’s the difficult

01:38:53 philosophical social question.

01:38:57 Do you think there will be a time like a civil rights

01:39:01 movement for robots where, okay, forget the movement,

01:39:05 but a discussion like the Supreme Court

01:39:09 that particular kinds of robots,

01:39:12 you know, particular kinds of systems

01:39:16 deserve the same rights as humans

01:39:18 because they can suffer just as humans can,

01:39:22 all those kinds of things.

01:39:24 Well, perhaps, perhaps not.

01:39:27 Like imagine that humans were,

01:39:29 that you could, you know, die and be restored.

01:39:33 Like, you know, you could be sort of, you know,

01:39:35 be 3D reprinted and, you know,

01:39:37 your brain could be reconstructed in its finest details.

01:39:40 Our ideas of rights will change in that case.

01:39:43 If you can always just,

01:39:45 there’s always a backup you could always restore.

01:39:48 Maybe like the importance of murder

01:39:50 will go down one notch.

01:39:51 That’s right.

01:39:52 But also your desire to do dangerous things,

01:39:57 like, you know, skydiving or, you know,

01:40:03 or, you know, race car driving,

01:40:05 you know, car racing or that kind of stuff,

01:40:07 you know, would probably increase

01:40:09 or, you know, aeroplanes, aerobatics

01:40:11 or that kind of stuff, right?

01:40:12 It would be fine to do a lot of those things

01:40:14 or explore, you know, dangerous areas and things like that.

01:40:17 It would kind of change your relationship.

01:40:19 So now it’s very likely that robots would be like that

01:40:22 because, you know, they’ll be based on perhaps technology

01:40:27 that is somewhat similar to today’s technology

01:40:30 and you can always have a backup.

01:40:32 So it’s possible, I don’t know if you like video games,

01:40:35 but there’s a game called Diablo and…

01:40:39 Oh, my sons are huge fans of this.

01:40:41 Yes.

01:40:44 In fact, they made a game that’s inspired by it.

01:40:47 Awesome.

01:40:47 Like built a game?

01:40:49 My three sons have a game design studio between them, yeah.

01:40:52 That’s awesome.

01:40:53 They came out with a game.

01:40:54 They just came out with a game.

01:40:55 Last year, no, this was last year,

01:40:56 early last year, about a year ago.

01:40:58 That’s awesome.

01:40:59 But so in Diablo, there’s something called hardcore mode,

01:41:02 which if you die, there’s no, you’re gone.

01:41:05 Right.

01:41:06 That’s it.

01:41:07 And so it’s possible with AI systems

01:41:10 for them to be able to operate successfully

01:41:13 and for us to treat them in a certain way

01:41:15 because they have to be integrated in human society,

01:41:18 they have to be able to die, no copies allowed.

01:41:22 In fact, copying is illegal.

01:41:23 It’s possible with humans as well,

01:41:25 like cloning will be illegal, even when it’s possible.

01:41:28 But cloning is not copying, right?

01:41:29 I mean, you don’t reproduce the mind of the person

01:41:33 and the experience.

01:41:33 Right.

01:41:34 It’s just a delayed twin, so.

01:41:36 But then it’s, but we were talking about with computers

01:41:39 that you will be able to copy.

01:41:40 Right.

01:41:41 You will be able to perfectly save,

01:41:42 pickle the mind state.

01:41:46 And it’s possible that that will be illegal

01:41:49 because that goes against,

01:41:53 that will destroy the motivation of the system.

01:41:55 Okay, so let’s say you have a domestic robot, okay?

01:42:00 Sometime in the future.

01:42:01 Yes.

01:42:02 And the domestic robot comes to you kind of

01:42:06 somewhat pre trained, it can do a bunch of things,

01:42:08 but it has a particular personality

01:42:10 that makes it slightly different from the other robots

01:42:12 because that makes them more interesting.

01:42:14 And then because it’s lived with you for five years,

01:42:18 you’ve grown some attachment to it and vice versa,

01:42:21 and it’s learned a lot about you.

01:42:24 Or maybe it’s not a real household robot.

01:42:25 Maybe it’s a virtual assistant that lives in your,

01:42:29 you know, augmented reality glasses or whatever, right?

01:42:32 You know, the horror movie type thing, right?

01:42:36 And that system to some extent,

01:42:39 the intelligence in that system is a bit like your child

01:42:43 or maybe your PhD student in the sense that

01:42:47 there’s a lot of you in that machine now, right?

01:42:50 And so if it were a living thing,

01:42:53 you would do this for free if you want, right?

01:42:56 If it’s your child, your child can, you know,

01:42:58 then live his or her own life.

01:43:01 And you know, the fact that they learn stuff from you

01:43:04 doesn’t mean that you have any ownership of it, right?

01:43:06 But if it’s a robot that you’ve trained,

01:43:09 perhaps you have some intellectual property claim

01:43:13 about.

01:43:14 Oh, intellectual property.

01:43:15 Oh, I thought you meant like a permanence value

01:43:18 in the sense that part of you is in.

01:43:20 Well, there is permanence value, right?

01:43:21 So you would lose a lot if that robot were to be destroyed

01:43:24 and you had no backup, you would lose a lot, right?

01:43:26 You lose a lot of investment, you know,

01:43:28 kind of like, you know, a person dying, you know,

01:43:31 that a friend of yours dying

01:43:34 or a coworker or something like that.

01:43:38 But also you have like intellectual property rights

01:43:42 in the sense that that system is fine tuned

01:43:45 to your particular existence.

01:43:47 So that’s now a very unique instantiation

01:43:49 of that original background model,

01:43:51 whatever it was that arrived.

01:43:54 And then there are issues of privacy, right?

01:43:55 Because now imagine that that robot has its own kind

01:44:00 of volition and decides to work for someone else.

01:44:02 Or kind of, you know, thinks life with you

01:44:06 is sort of untenable or whatever.

01:44:07 Now, all the things that that system learned from you,

01:44:14 you know, can you like, you know,

01:44:16 delete all the personal information

01:44:18 that that system knows about you?

01:44:19 I mean, that would be kind of an ethical question.

01:44:22 Like, you know, can you erase the mind

01:46:24 of an intelligent robot to protect your privacy?

01:44:30 You can’t do this with humans.

01:44:31 You can ask them to shut up,

01:46:32 but you don’t have complete power over them.

01:44:35 You can’t erase humans, yeah, it’s the problem

01:44:38 with the relationships, you know, if you break up,

01:44:40 you can’t erase the other human.

01:44:42 With robots, I think it will have to be the same thing

01:44:44 with robots, that risk, that there has to be some risk

01:44:52 to our interactions to truly experience them deeply,

01:44:55 it feels like.

01:44:56 So you have to be able to lose your robot friend

01:44:59 and that robot friend to go tweeting

01:45:01 about how much of an asshole you were.

01:45:03 But then are you allowed to, you know,

01:45:06 murder the robot to protect your private information

01:45:08 if the robot decides to leave?

01:45:09 I have this intuition that for robots with certain,

01:45:14 like, it’s almost like a regulation.

01:45:16 If you declare your robot to be,

01:45:19 let’s call it sentient or something like that,

01:45:20 like this robot is designed for human interaction,

01:45:24 then you’re not allowed to murder these robots.

01:45:26 It’s the same as murdering other humans.

01:45:28 Well, but what if you do a backup of the robot

01:45:30 that you preserve on a hard drive

01:45:32 or the equivalent in the future?

01:45:33 That might be illegal.

01:45:34 It’s like piracy is illegal.

01:45:38 No, but it’s your own robot, right?

01:45:39 But you can’t, you don’t.

01:45:41 But then you can wipe out his brain.

01:45:45 So this robot doesn’t know anything about you anymore,

01:45:47 but you still have, technically it’s still in existence

01:45:50 because you backed it up.

01:45:51 And then there’ll be these great speeches

01:45:53 at the Supreme Court by saying,

01:45:55 oh, sure, you can erase the mind of the robot

01:45:57 just like you can erase the mind of a human.

01:46:00 We both can suffer.

01:46:01 There’ll be some epic like Obama type character

01:46:03 with a speech that we,

01:46:05 like the robots and the humans are the same.

01:46:08 We can both suffer.

01:46:09 We can both hope.

01:46:11 We can both, all of those kinds of things,

01:46:14 raise families, all that kind of stuff.

01:46:17 It’s interesting for these, just like you said,

01:46:20 emotion seems to be a fascinatingly powerful aspect

01:46:24 of human interaction, human robot interaction.

01:46:27 And if they’re able to exhibit emotions

01:46:30 at the end of the day,

01:46:31 that’s probably going to have us deeply consider

01:46:35 human rights, like what we value in humans,

01:46:38 what we value in other animals.

01:46:40 That’s why robots and AI is great.

01:46:42 It makes us ask really good questions.

01:46:44 The hard questions, yeah.

01:46:45 But you asked about the Chinese room type argument.

01:46:49 Is it real?

01:46:50 If it looks real.

01:46:51 I think the Chinese room argument is a really good one.

01:46:54 So.

01:46:55 So for people who don’t know what Chinese room is,

01:46:58 you can, I don’t even know how to formulate it well,

01:47:00 but basically you can mimic the behavior

01:47:04 of an intelligence system by just following

01:47:06 a giant algorithm code book that tells you exactly

01:47:10 how to respond in exactly each case.

01:47:12 But is that really intelligent?

01:47:14 It’s like a giant lookup table.

01:47:16 When this person says this, you answer this.

01:47:18 When this person says this, you answer this.

01:47:21 And if you understand how that works,

01:47:24 you have this giant, nearly infinite lookup table.

01:47:27 Is that really intelligence?

01:47:28 Cause intelligence seems to be a mechanism

01:47:31 that’s much more interesting and complex

01:47:33 than this lookup table.

01:47:34 I don’t think so.

01:47:35 So the, I mean, the real question comes down to,

01:47:38 do you think, you know, you can,

01:47:42 you can mechanize intelligence in some way,

01:47:44 even if that involves learning?

01:47:47 And the answer is, of course, yes, there’s no question.

01:47:50 There’s a second question then, which is,

01:47:53 assuming you can reproduce intelligence

01:47:56 in sort of different hardware than biological hardware,

01:47:59 you know, like computers, can you, you know,

01:48:04 match human intelligence in all the domains

01:48:09 in which humans are intelligent?

01:48:12 Is it possible, right?

01:48:13 So that’s the hypothesis of strong AI.

01:48:17 The answer to this, in my opinion, is an unqualified yes.

01:48:20 This will as well happen at some point.

01:48:22 There’s no question that machines at some point

01:48:25 will become more intelligent than humans

01:48:26 in all domains where humans are intelligent.

01:48:28 This is not for tomorrow.

01:48:30 It is going to take a long time,

01:48:32 regardless of what, you know,

01:48:34 Elon and others have claimed or believed.

01:48:38 This is a lot harder than many of those guys think it is.

01:48:43 And many of those guys who thought it was simpler than that

01:48:45 you know, five years ago,

01:48:47 now think it’s hard because it’s been five years

01:48:49 and they realize it’s going to take a lot longer.

01:48:53 That includes a bunch of people at DeepMind, for example.

01:48:55 But…

01:48:56 Oh, interesting.

01:48:57 I haven’t actually touched base with the DeepMind folks,

01:48:59 but some of them, Elon or Demis Hassabis.

01:49:03 I mean, sometimes in your role,

01:49:05 you have to kind of create deadlines

01:49:08 that are nearer than farther away

01:49:10 to kind of create an urgency.

01:49:12 Because, you know, you have to believe the impossible

01:49:14 is possible in order to accomplish it.

01:49:16 And there’s, of course, a flip side to that coin,

01:49:18 but it’s a weird, you can’t be too cynical

01:49:21 if you want to get something done.

01:49:22 Absolutely.

01:49:23 I agree with that.

01:49:24 But, I mean, you have to inspire people, right?

01:49:26 To work on sort of ambitious things.

01:49:31 So, you know, it’s certainly a lot harder than we believe,

01:49:35 but there’s no question in my mind that this will happen.

01:49:38 And now, you know, people are kind of worried about

01:49:40 what does that mean for humans?

01:49:42 They are going to be brought down from their pedestal,

01:49:45 you know, a bunch of notches with that.

01:49:47 And, you know, is that going to be good or bad?

01:49:51 I mean, it’s just going to give more power, right?

01:49:53 It’s an amplifier for human intelligence, really.

01:49:56 So, speaking of doing cool, ambitious things,

01:49:59 FAIR, the Facebook AI research group,

01:50:02 has recently celebrated its eighth birthday.

01:50:05 Or, maybe you can correct me on that.

01:50:08 Looking back, what has been the successes, the failures,

01:50:12 the lessons learned from the eight years of FAIR?

01:50:14 And maybe you can also give context of

01:50:16 where does the newly minted meta AI fit into,

01:50:21 how does it relate to FAIR?

01:50:22 Right, so let me tell you a little bit

01:50:23 about the organization of all this.

01:50:26 Yeah, FAIR was created almost exactly eight years ago.

01:50:30 It wasn’t called FAIR yet.

01:50:31 It took that name a few months later.

01:50:34 And at the time I joined Facebook,

01:50:37 there was a group called the AI group

01:50:39 that had about 12 engineers and a few scientists,

01:50:43 like, you know, 10 engineers and two scientists

01:50:45 or something like that.

01:50:47 I ran it for three and a half years as a director,

01:50:50 you know, hired the first few scientists

01:50:52 and kind of set up the culture and organized it,

01:50:55 you know, explained to the Facebook leadership

01:50:57 what fundamental research was about

01:51:00 and how it can work within industry

01:51:03 and how it needs to be open and everything.

01:51:07 And I think it’s been an unqualified success

01:51:12 in the sense that FAIR has simultaneously produced,

01:51:17 you know, top level research

01:51:19 and advanced the science and the technology,

01:51:21 provided tools, open source tools,

01:51:23 like PyTorch and many others,

01:51:26 but at the same time has had a direct

01:51:29 or mostly indirect impact on Facebook at the time,

01:51:34 now Meta, in the sense that a lot of systems

01:51:38 that Meta is built around now are based

01:51:43 on research projects that started at FAIR.

01:51:48 And so if you were to take out, you know,

01:51:49 deep learning out of Facebook services now

01:51:52 and Meta more generally,

01:51:55 I mean, the company would literally crumble.

01:51:57 I mean, it’s completely built around AI these days.

01:52:01 And it’s really essential to the operations.

01:52:04 So what happened after three and a half years

01:52:06 is that I changed role, I became chief scientist.

01:52:10 So I’m not doing day to day management of FAIR anymore.

01:52:14 I’m more of a kind of, you know,

01:52:17 think about strategy and things like that.

01:52:18 And I carry my, I conduct my own research.

01:52:21 I have, you know, my own kind of research group

01:52:23 working on self supervised learning and things like this,

01:52:25 which I didn’t have time to do when I was director.

01:52:28 So now FAIR is run by Joelle Pineau and Antoine Bordes together

01:52:34 because FAIR is kind of split in two now.

01:52:36 There’s something called FAIR Labs,

01:52:37 which is sort of bottom up science driven research

01:52:40 and FAIR Accel, which is slightly more organized

01:52:43 for bigger projects that require a little more

01:52:46 kind of focus and more engineering support

01:52:49 and things like that.

01:52:49 So Joelle leads FAIR Labs and Antoine Bordes leads FAIR Accel.

01:52:52 Where are they located?

01:52:54 It’s delocalized all over.

01:52:58 So there’s no question that the leadership of the company

01:53:02 believes that this was a very worthwhile investment.

01:53:06 And what that means is that it’s there for the long run.

01:53:12 Right?

01:53:13 So if you want to talk in these terms, which I don’t like,

01:53:17 this is a business model, if you want,

01:53:19 where FAIR, despite being a very fundamental research lab

01:53:23 brings a lot of value to the company,

01:53:25 either directly or, mostly, indirectly through other groups.

01:53:29 Now what happened three and a half years ago

01:53:31 when I stepped down was also the creation of Facebook AI,

01:53:34 which was basically a larger organization

01:53:37 that covers FAIR, so FAIR is included in it,

01:53:41 but also has other organizations

01:53:43 that are focused on applied research

01:53:47 or advanced development of AI technology

01:53:51 that is more focused on the products of the company.

01:53:54 So less emphasis on fundamental research.

01:53:56 Less fundamental, but it’s still research.

01:53:58 I mean, there’s a lot of papers coming out

01:53:59 of those organizations and the people are awesome

01:54:03 and wonderful to interact with.

01:54:06 But it serves as kind of a way

01:54:10 to kind of scale up if you want sort of AI technology,

01:54:15 which, you know, may be very experimental

01:54:17 and sort of lab prototypes into things that are usable.

01:54:20 So FAIR is a subset of Meta AI.

01:54:23 Is FAIR going to become like KFC?

01:54:24 It’ll just keep the F.

01:54:26 Nobody cares what the F stands for.

01:54:29 We’ll know soon enough, probably by the end of 2021.

01:54:35 I guess it’s not a giant change, MAIR, FAIR.

01:54:38 Well, MAIR doesn’t sound too good,

01:54:39 but the brand people are kind of deciding on this

01:54:43 and they’ve been hesitating for a while now.

01:54:45 And they tell us they’re going to come up with an answer

01:54:48 as to whether FAIR is going to change name

01:54:50 or whether we’re going to change just the meaning of the F.

01:54:53 That’s a good call.

01:54:54 I would keep FAIR and change the meaning of the F.

01:54:56 That would be my preference.

01:54:57 I would turn the F into fundamental AI research.

01:55:02 Oh, that’s really good.

01:55:03 Within Meta AI.

01:55:04 So this would be meta FAIR,

01:55:06 but people will call it FAIR, right?

01:55:08 Yeah, exactly.

01:55:09 I like it.

01:55:10 And now Meta AI is part of the Reality Lab.

01:55:16 So Meta now, the new Facebook is called Meta

01:55:21 and it’s kind of divided into Facebook, Instagram, WhatsApp

01:55:30 and Reality Lab.

01:55:32 And Reality Lab is about AR, VR, telepresence,

01:55:37 communication technology and stuff like that.

01:55:40 It’s kind of the, you can think of it as the sort of,

01:55:44 a combination of sort of new products

01:55:47 and technology part of Meta.

01:55:51 Is that where the touch sensing for robots,

01:55:54 I saw that you were posting about that.

01:55:56 Touch sensing for robots is part of FAIR, actually.

01:55:58 That’s a FAIR project.

01:55:59 Oh, it is.

01:55:59 Okay, cool.

01:56:00 Yeah, this is also the, no, but there is the other way,

01:56:03 the haptic glove, right?

01:56:05 Yes, that’s more Reality Lab.

01:56:07 That’s Reality Lab research.

01:56:10 Reality Lab research.

01:56:11 By the way, the touch sensors are super interesting.

01:56:14 Like integrating that modality

01:56:16 into the whole sensing suite is very interesting.

01:56:20 So what do you think about the Metaverse?

01:56:23 What do you think about this whole kind of expansion

01:56:27 of the view of the role of Facebook and Meta in the world?

01:56:30 Well, Metaverse really should be thought of

01:56:32 as the next step in the internet, right?

01:56:35 Sort of trying to kind of make the experience

01:56:41 more compelling of being connected

01:56:46 either with other people or with content.

01:56:49 And we are evolved and trained to evolve

01:56:54 in 3D environments where we can see other people.

01:56:58 We can talk to them when we’re near them

01:57:01 whereas people far away can’t hear us,

01:57:04 things like that, right?

01:57:05 So there’s a lot of social conventions

01:57:08 that exist in the real world that we can try to transpose.

01:57:10 Now, what is going to be eventually the,

01:57:15 how compelling is it going to be?

01:57:16 Like, is it going to be the case

01:57:18 that people are going to be willing to do this

01:57:21 if they have to wear a huge pair of goggles all day?

01:57:24 Maybe not.

01:57:26 But then again, if the experience

01:57:27 is sufficiently compelling, maybe so.

01:57:30 Or if the device that you have to wear

01:57:32 is just basically a pair of glasses,

01:57:34 and technology makes sufficient progress for that.

01:57:38 AR is a much easier concept to grasp

01:57:41 that you’re going to have augmented reality glasses

01:57:45 that basically contain some sort of virtual assistant

01:57:48 that can help you in your daily lives.

01:57:50 But at the same time with the AR,

01:57:51 you have to contend with reality.

01:57:53 With VR, you can completely detach yourself from reality.

01:57:55 So it gives you freedom.

01:57:57 It might be easier to design worlds in VR.

01:58:00 Yeah, but you can imagine the metaverse

01:58:02 being a mix, right?

01:58:06 Or like, you can have objects that exist in the metaverse

01:58:09 that pop up on top of the real world,

01:58:11 or only exist in virtual reality.

01:58:14 Okay, let me ask the hard question.

01:58:17 Oh, because all of this was easy so far.

01:58:18 This was easy.

01:58:20 The Facebook, now Meta, the social network

01:58:24 has been painted by the media as a net negative for society,

01:58:28 even destructive and evil at times.

01:58:30 You’ve pushed back against this, defending Facebook.

01:58:34 Can you explain your defense?

01:58:36 Yeah, so the description,

01:58:38 the company that is being described in some media

01:58:43 is not the company we know when we work inside.

01:58:47 And it could be claimed that a lot of employees

01:58:52 are uninformed about what really goes on in the company,

01:58:54 but I’m a vice president.

01:58:56 I mean, I have a pretty good vision of what goes on.

01:58:58 I don’t know everything, obviously.

01:59:00 I’m not involved in everything,

01:59:01 but certainly not in decisions about content moderation

01:59:05 or anything like this,

01:59:06 but I have some decent vision of what goes on.

01:59:10 And this evil that is being described, I just don’t see it.

01:59:13 And then I think there is an easy story to buy,

01:59:18 which is that all the bad things in the world

01:59:21 and the reason your friends believe crazy stuff,

01:59:25 there’s an easy scapegoat in social media in general,

01:59:32 Facebook in particular.

01:59:34 But you have to look at the data.

01:59:35 Is it the case that Facebook, for example,

01:59:40 polarizes people politically?

01:59:42 Are there academic studies that show this?

01:59:45 Is it the case that teenagers think less of themselves

01:59:50 if they use Instagram more?

01:59:52 Is it the case that people get more riled up

01:59:57 against opposite sides in a debate or political opinion

02:00:02 if they are more on Facebook or if they are less?

02:00:05 And study after study show that none of this is true.

02:00:10 These are independent studies by academics.

02:00:12 They’re not funded by Facebook or Meta.

02:00:15 Study by Stanford, by some of my colleagues at NYU actually

02:00:18 with whom I have no connection.

02:00:20 There’s a study recently, they paid people,

02:00:24 I think it was in former Yugoslavia,

02:00:29 I’m not exactly sure in what part,

02:00:31 but they paid people to not use Facebook for a while

02:00:34 in the period before the anniversary

02:00:40 of the Srebrenica massacres.

02:00:43 So people get riled up, like should we have a celebration?

02:00:47 I mean, a memorial kind of celebration for it or not.

02:00:51 So they paid a bunch of people

02:00:52 to not use Facebook for a few weeks.

02:00:56 And it turns out that those people ended up

02:00:59 being more polarized than they were at the beginning

02:01:02 and the people who were more on Facebook were less polarized.

02:01:06 There’s a study by economists at Stanford

02:01:10 that try to identify the causes

02:01:12 of increasing polarization in the US.

02:01:16 And it’s been going on for 40 years

02:01:17 since before Mark Zuckerberg was born, continuously.

02:01:22 And so if there is a cause,

02:01:25 it’s not Facebook or social media.

02:01:27 So you could say social media just accelerated it,

02:01:29 but no, I mean, it’s basically a continuous evolution

02:01:33 by some measure of polarization in the US.

02:01:35 And then you compare this with other countries

02:01:37 like the western half of Germany,

02:01:41 because you can’t go back 40 years on the East side,

02:01:44 or Denmark or other countries.

02:01:47 And they use Facebook just as much

02:01:49 and they’re not getting more polarized,

02:01:50 they’re getting less polarized.

02:01:52 So if you want to look for a causal relationship there,

02:01:57 you can find a scapegoat, but you can’t find a cause.

02:01:59 Now, if you want to fix the problem,

02:02:01 you have to find the right cause.

02:02:03 And what riles me up is that people now are accusing Facebook

02:02:07 of bad deeds that are done by others

02:02:09 and those others, we’re not doing anything about them.

02:02:12 And by the way, those others include the owner

02:02:14 of the Wall Street Journal

02:02:15 in which all of those papers were published.

02:02:17 So I should mention that I’m talking to Schrep,

02:02:20 Mike Schroepfer, on this podcast and also Mark Zuckerberg

02:02:23 and probably these are conversations you can have with them

02:02:26 because it’s very interesting to me,

02:02:27 even if Facebook has some measurable negative effect,

02:02:31 you can’t just consider that in isolation.

02:02:33 You have to consider all the positive ways

02:02:35 that it connects us.

02:02:36 So like every technology.

02:02:38 It connects people, it’s a question.

02:02:39 You can’t just say like there’s an increase in division.

02:02:43 Yes, probably Google search engine

02:02:46 has created an increase in division.

02:02:47 But you have to consider how much information

02:02:49 it has brought to the world.

02:02:51 Like I’m sure Wikipedia created more division.

02:02:53 If you just look at the division,

02:02:55 we have to look at the full context of the world

02:02:57 and whether they made a better world.

02:02:59 And you have to.

02:02:59 The printing press has created more division, right?

02:03:01 Exactly.

02:03:02 I mean, so when the printing press was invented,

02:03:06 the first books that were printed were things like the Bible

02:03:10 and that allowed people to read the Bible by themselves,

02:03:13 not get the message uniquely from priests in Europe.

02:03:17 And that created the Protestant movement

02:03:20 and 200 years of religious persecution and wars.

02:03:23 So that’s a bad side effect of the printing press.

02:03:26 Social networks aren’t being nearly as bad

02:03:28 as the printing press,

02:03:29 but nobody would say the printing press was a bad idea.

02:03:33 Yeah, a lot of it is perception

02:03:35 and there’s a lot of different incentives operating here.

02:03:38 Maybe a quick comment,

02:03:40 since you’re one of the top leaders at Facebook

02:03:42 and at Meta, sorry, that’s in the tech space,

02:03:46 I’m sure Facebook involves a lot of incredible

02:03:49 technological challenges that need to be solved.

02:03:52 A lot of it probably is in the computer infrastructure,

02:03:55 the hardware, I mean, it’s just a huge amount.

02:05:58 Maybe can you give me context about how much of Schrep’s life

02:04:03 is AI and how much of it is low level compute?

02:04:06 How much of it is flying all around doing business stuff?

02:04:09 And the same with Mark Zuckerberg.

02:04:12 They really focus on AI.

02:04:13 I mean, certainly in the run up of the creation of FAIR

02:04:19 and for at least a year after that, if not more,

02:04:24 Mark was very, very much focused on AI

02:04:26 and was spending quite a lot of effort on it.

02:04:29 And that’s his style.

02:04:30 When he gets interested in something,

02:04:32 he reads everything about it.

02:04:34 He read some of my papers, for example, before I joined.

02:04:39 And so he learned a lot about it.

02:04:41 He said he liked notes.

02:04:43 Right.

02:04:46 And Schrep was really into it also.

02:04:51 I mean, Schrep is really kind of,

02:04:54 has something I’ve tried to preserve also

02:04:57 despite my not so young age,

02:05:00 which is a sense of wonder about science and technology.

02:05:03 And he certainly has that.

02:05:06 He’s also a wonderful person.

02:05:07 I mean, in terms of like as a manager,

02:05:10 like dealing with people and everything.

02:05:12 Mark also, actually.

02:05:14 I mean, they’re very human people.

02:05:18 In the case of Mark, it’s shockingly human

02:05:20 given his trajectory.

02:05:25 I mean, the personality of him that is painted in the press,

02:05:28 it’s just completely wrong.

02:05:29 Yeah.

02:05:30 But you have to know how to play the press.

02:05:31 So that’s, I put some of that responsibility on him too.

02:05:36 You have to, it’s like, you know,

02:05:40 like the director, the conductor of an orchestra,

02:05:44 you have to play the press and the public

02:05:46 in a certain kind of way

02:05:48 where you convey your true self to them.

02:05:49 If there’s a depth and kindness to it.

02:05:51 It’s hard.

02:05:51 And he’s probably not the best at it.

02:05:53 So, yeah.

02:05:56 You have to learn.

02:05:57 And it’s sad to see, and I’ll talk to him about it,

02:06:00 but Schrep is slowly stepping down.

02:06:04 It’s always sad to see folks sort of be there

02:06:07 for a long time and slowly.

02:06:09 I guess time is sad.

02:06:11 I think he’s done the thing he set out to do.

02:06:14 And, you know, he’s got, you know,

02:06:19 family priorities and stuff like that.

02:06:21 And I understand, you know, after 13 years or something.

02:06:27 It’s been a good run.

02:06:28 Which in Silicon Valley is basically a lifetime.

02:06:32 Yeah.

02:06:32 You know, because, you know, it’s dog years.

02:06:35 So, NeurIPS, the conference just wrapped up.

02:06:38 Let me just go back to something else.

02:06:40 You posted that a paper you coauthored

02:06:42 was rejected from NeurIPS.

02:06:44 As you said, proudly, in quotes, rejected.

02:06:48 It’s a joke.

02:06:48 Yeah, I know.

02:06:49 So, can you describe this paper?

02:06:53 And like, what was the idea in it?

02:06:55 And also, maybe this is a good opportunity to ask

02:06:59 what are the pros and cons, what works and what doesn’t

02:07:01 about the review process?

02:07:03 Yeah, let me talk about the paper first.

02:07:04 I’ll talk about the review process afterwards.

02:07:09 The paper is called VICReg.

02:07:10 So, this is, I mentioned that before.

02:07:12 Variance-Invariance-Covariance Regularization.

02:07:14 And it’s a technique, a noncontrastive learning technique

02:07:18 for what I call joint embedding architecture.

02:07:21 So, Siamese nets are an example

02:07:23 of joint embedding architecture.

02:07:24 So, joint embedding architecture is,

02:07:29 let me back up a little bit, right?

02:07:30 So, if you want to do self supervised learning,

02:07:33 you can do it by prediction.

02:07:36 So, let’s say you want to train the system

02:07:37 to predict video, right?

02:07:38 You show it a video clip and you train the system

02:07:42 to predict the next, the continuation of that video clip.

02:07:45 Now, because you need to handle uncertainty,

02:07:47 because there are many continuations that are plausible,

02:07:51 you need to have, you need to handle this in some way.

02:07:54 You need to have a way for the system

02:07:56 to be able to produce multiple predictions.

02:08:00 And the way, the only way I know to do this

02:08:03 is through what’s called a latent variable.

02:08:05 So, you have some sort of hidden vector

02:08:08 of a variable that you can vary over a set

02:08:11 or draw from a distribution.

02:08:12 And as you vary this vector over a set,

02:08:14 the output, the prediction varies

02:08:16 over a set of plausible predictions, okay?

02:08:18 So, that’s called,

02:08:19 I call this a generative latent variable model.
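A minimal sketch of such a generative latent variable model, with made-up dimensions and a toy MLP standing in for real video encoders and decoders: feeding the same observed segment with different draws of the latent vector z sweeps the output over a set of plausible continuations rather than producing one deterministic prediction.

```python
import torch
import torch.nn as nn

class LatentVariablePredictor(nn.Module):
    # Toy decoder: concatenates an encoding of the observed clip with a latent
    # vector z and maps the pair to a prediction of the continuation.
    def __init__(self, obs_dim=64, latent_dim=8, pred_dim=64):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, 128),
            nn.ReLU(),
            nn.Linear(128, pred_dim),
        )

    def forward(self, observed, z):
        return self.decoder(torch.cat([observed, z], dim=-1))

model = LatentVariablePredictor()
observed = torch.randn(1, 64)  # encoding of the initial video segment
# Varying z over draws from a distribution yields multiple plausible predictions
# for the same observed input.
predictions = [model(observed, torch.randn(1, 8)) for _ in range(5)]
```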

02:08:24 Got it.

02:08:24 Okay, now there is an alternative to this,

02:08:27 to handle uncertainty.

02:08:28 And instead of directly predicting the next frames

02:08:33 of the clip, you also run those through another neural net.

02:08:41 So, you now have two neural nets,

02:08:42 one that looks at the initial segment of the video clip,

02:08:48 and another one that looks at the continuation

02:08:51 during training, right?

02:08:53 And what you’re trying to do is learn a representation

02:08:57 of those two video clips that is maximally informative

02:09:00 about the video clips themselves,

02:09:03 but is such that you can predict the representation

02:09:07 of the second video clip

02:09:08 from the representation of the first one easily, okay?

02:09:12 And you can sort of formalize this

02:09:13 in terms of maximizing mutual information

02:09:15 and some stuff like that, but it doesn’t matter.

02:09:18 What you want is informative representations

02:09:24 of the two video clips that are mutually predictable.

02:09:28 What that means is that there’s a lot of details

02:09:30 in the second video clips that are irrelevant.

02:09:36 Let’s say a video clip consists of a camera panning

02:09:40 the scene, there’s gonna be a piece of that room

02:09:43 that is gonna be revealed, and I can somewhat predict

02:09:46 what that room is gonna look like,

02:09:48 but I may not be able to predict the details

02:09:50 of the texture of the ground

02:09:52 and where the tiles are ending and stuff like that, right?

02:09:54 So, those are irrelevant details

02:09:56 that perhaps my representation will eliminate.

02:09:59 And so, what I need is to train this second neural net

02:10:03 in such a way that whenever the continuation video clip

02:10:08 varies over all the plausible continuations,

02:10:13 the representation doesn’t change.

02:10:15 Got it.

02:10:16 So, it’s the, yeah, yeah, got it.

02:10:18 Over the space of the representations,

02:10:20 doing the same kind of thing

02:10:21 as you do with similarity learning.

02:10:24 Right.

02:10:25 So, these are two ways to handle multimodality

02:10:28 in a prediction, right?

02:10:29 In the first way, you parameterize the prediction

02:10:32 with a latent variable,

02:10:33 but you predict pixels essentially, right?

02:10:35 In the second one, you don’t predict pixels,

02:10:38 you predict an abstract representation of pixels,

02:10:40 and you guarantee that this abstract representation

02:10:43 has as much information as possible about the input,

02:10:46 but sort of, you know,

02:10:47 drops all the stuff that you really can’t predict,

02:10:49 essentially.

02:10:52 I used to be a big fan of the first approach.

02:10:53 And in fact, in this paper with Ishan Misra,

02:10:55 this blog post, the Dark Matter of Intelligence,

02:10:58 I was kind of advocating for this.

02:10:59 And in the last year and a half,

02:11:01 I’ve completely changed my mind.

02:11:02 I’m now a big fan of the second one.

02:11:04 And it’s because of a small collection of algorithms

02:11:10 that have been proposed over the last year and a half or so,

02:11:13 two years, to do this, including VICReg,

02:11:17 its predecessor called Barlow Twins,

02:11:19 which I mentioned, a method from our friends at DeepMind

02:11:23 called BYOL, and there’s a bunch of others now

02:11:28 that kind of work similarly.

02:11:29 So, they’re all based on this idea of joint embedding.

02:11:32 Some of them have an explicit criterion

02:11:34 that is an approximation of mutual information.

02:11:36 Some others, like BYOL, work, but we don’t really know why.

02:11:39 And there’s been like lots of theoretical papers

02:11:41 about why BYOL works.

02:11:42 No, it’s not that, because we take it out

02:11:43 and it still works, and blah, blah, blah.

02:11:46 I mean, so there’s like a big debate,

02:11:47 but the important point is that we now have a collection

02:11:51 of noncontrastive joint embedding methods,

02:11:53 which I think is the best thing since sliced bread.

02:11:56 So, I’m super excited about this

02:11:58 because I think it’s our best shot

02:12:01 for techniques that would allow us

02:12:02 to kind of build predictive world models.

02:12:06 And at the same time,

02:12:07 learn hierarchical representations of the world,

02:12:09 where what matters about the world is preserved

02:12:11 and what is irrelevant is eliminated.
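For concreteness, here is a sketch of a VICReg-style loss based on its public description: an invariance term between the two branches' embeddings, a variance term that keeps each embedding dimension's standard deviation above a margin to prevent collapse, and a covariance term that decorrelates dimensions. The weights and epsilon below are typical defaults I'm assuming, not values quoted in this conversation.

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, gamma=1.0, eps=1e-4):
    # z_a, z_b: (N, D) embeddings of two views/segments from the two branches.
    n, d = z_a.shape

    # Invariance term: the two representations should be mutually predictable.
    invariance = F.mse_loss(z_a, z_b)

    # Variance term: keep the std of every embedding dimension above gamma,
    # preventing the trivial collapse to a constant representation.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    variance = torch.mean(F.relu(gamma - std_a)) + torch.mean(F.relu(gamma - std_b))

    # Covariance term: push off-diagonal covariances toward zero so that
    # embedding dimensions carry non-redundant information.
    z_a_c = z_a - z_a.mean(dim=0)
    z_b_c = z_b - z_b.mean(dim=0)
    cov_a = (z_a_c.T @ z_a_c) / (n - 1)
    cov_b = (z_b_c.T @ z_b_c) / (n - 1)

    def off_diagonal(m):
        return m - torch.diag(torch.diag(m))

    covariance = (off_diagonal(cov_a).pow(2).sum() / d
                  + off_diagonal(cov_b).pow(2).sum() / d)

    return sim_w * invariance + var_w * variance + cov_w * covariance

# Example: embeddings of two augmented views (or two video segments) of a batch.
z_a, z_b = torch.randn(256, 128), torch.randn(256, 128)
loss = vicreg_loss(z_a, z_b)
```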

02:12:14 And by the way, the representations,

02:12:15 the before and after, is in the space

02:12:19 in a sequence of images, or is it for single images?

02:12:22 It would be either for a single image, for a sequence.

02:12:24 It doesn’t have to be images.

02:12:25 This could be applied to text.

02:12:26 This could be applied to just about any signal.

02:12:28 I’m looking for methods that are generally applicable

02:12:32 that are not specific to one particular modality.

02:12:36 It could be audio or whatever.

02:12:37 Got it.

02:12:38 So, what’s the story behind this paper?

02:12:40 This paper is describing one such method?

02:12:43 It’s this VICReg method.

02:12:44 So, this is coauthored.

02:12:45 The first author is a student called Adrien Bardes,

02:12:49 who is a resident PhD student at FAIR Paris,

02:12:52 who is coadvised by me and Jean Ponce,

02:12:55 who is a professor at École Normale Supérieure,

02:12:58 also a research director at INRIA.

02:13:01 So, this is a wonderful program in France

02:13:03 where PhD students can basically do their PhD in industry,

02:13:06 and that’s kind of what’s happening here.

02:13:10 And this paper is a follow-up on the Barlow Twins paper

02:13:15 by my former postdoc, Stéphane Deny,

02:13:18 with Li Jing and Jure Zbontar

02:13:21 and a bunch of other people from FAIR.

02:13:24 And one of the main criticism from reviewers

02:13:27 is that VICReg is not different enough from Barlow Twins.

02:13:31 But, you know, my impression is that it’s, you know,

02:13:36 Barlow Twins with a few bugs fixed, essentially,

02:13:39 and in the end, this is what people will use.

02:13:43 Right, so.

02:13:44 But, you know, I’m used to stuff

02:13:47 that I submit being rejected for a while.

02:13:49 So, it might be rejected and actually exceptionally well cited

02:13:51 because people use it.

02:13:52 Well, it’s already cited like a bunch of times.

02:13:54 So, I mean, the question is then to the deeper question

02:13:57 about peer review and conferences.

02:14:00 I mean, computer science is a field that’s kind of unique

02:14:02 that the conference is highly prized.

02:14:04 That’s one.

02:14:05 Right.

02:14:06 And it’s interesting because the peer review process there

02:14:09 is similar, I suppose, to journals,

02:14:11 but it’s accelerated significantly.

02:14:13 Well, not significantly, but it goes fast.

02:14:16 And it’s a nice way to get stuff out quickly,

02:14:19 to peer review it quickly,

02:14:20 go to present it quickly to the community.

02:14:22 So, not quickly, but quicker.

02:14:25 Yeah.

02:14:26 But nevertheless, it has many of the same flaws

02:14:27 of peer review,

02:14:29 because it’s a limited number of people look at it.

02:14:31 There’s bias and the following,

02:14:32 like that if you want to do new ideas,

02:14:35 you’re going to get pushback.

02:14:38 There’s self interested people that kind of can infer

02:14:42 who submitted it and kind of, you know,

02:14:45 be cranky about it, all that kind of stuff.

02:14:47 Yeah, I mean, there’s a lot of social phenomena there.

02:14:51 There’s one social phenomenon, which is that

02:14:53 because the field has been growing exponentially,

02:14:56 the vast majority of people in the field

02:14:58 are extremely junior.

02:15:00 Yeah.

02:15:00 So, as a consequence,

02:15:01 and that’s just a consequence of the field growing, right?

02:15:04 So, as the number of, as the size of the field

02:15:07 kind of starts saturating,

02:15:08 you will have less of that problem

02:15:11 of reviewers being very inexperienced.

02:15:15 A consequence of this is that, you know, young reviewers,

02:15:20 I mean, there’s a phenomenon which is that

02:15:22 reviewers try to make their life easy

02:15:24 and to make their life easy when reviewing a paper

02:15:27 is very simple.

02:15:28 You just have to find a flaw in the paper, right?

02:15:29 So, basically they see the task as finding flaws in papers

02:15:34 and most papers have flaws, even the good ones.

02:15:36 Yeah.

02:15:38 So, it’s easy to, you know, to do that.

02:15:41 Your job is easier as a reviewer if you just focus on this.

02:15:46 But what’s important is like,

02:15:49 is there a new idea in that paper

02:15:51 that is likely to influence?

02:15:54 It doesn’t matter if the experiments are not that great,

02:15:56 if the protocol is, you know, so-so, you know,

02:16:00 things like that.

02:16:01 As long as there is a worthy idea in it

02:16:05 that will influence the way people think about the problem,

02:16:09 even if they make it better, you know, eventually,

02:16:11 I think that’s really what makes a paper useful.

02:16:15 And so, this combination of social phenomena

02:16:19 creates a disease that has plagued, you know,

02:16:24 other fields in the past, like speech recognition,

02:16:26 where basically, you know, people chase numbers

02:16:28 on benchmarks and it’s much easier to get a paper accepted

02:16:34 if it brings an incremental improvement

02:16:37 on a sort of mainstream well accepted method or problem.

02:16:44 And those are, to me, boring papers.

02:16:46 I mean, they’re not useless, right?

02:16:47 Because industry, you know, thrives

02:16:50 on those kinds of progress,

02:16:52 but they’re not the ones that I’m interested in,

02:16:54 in terms of like new concepts and new ideas.

02:16:55 So, papers that are really trying to strike

02:16:59 kind of new advances generally don’t make it.

02:17:02 Now, thankfully we have arXiv.

02:17:04 arXiv, exactly.

02:17:05 And then there’s open review type of situations

02:17:08 where you, and then, I mean, Twitter’s a kind of open review.

02:17:11 I’m a huge believer that review should be done

02:17:13 by thousands of people, not two people.

02:17:15 I agree.

02:17:16 And so arXiv, like do you see a future

02:17:19 where a lot of really strong papers,

02:17:21 it’s already the present, but a growing future

02:17:23 where it’ll just be arXiv

02:17:26 and you’re presenting an ongoing continuous conference

02:17:31 called Twitter slash the internet slash Arxiv Sanity.

02:17:35 Andrej just released a new version.

02:17:38 So just not, you know, not being so elitist

02:17:40 about this particular gating.

02:17:43 It’s not a question of being elitist or not.

02:17:44 It’s a question of basically providing recommendations

02:17:50 and sort of approvals for people who don’t see themselves

02:17:53 as having the ability to do so by themselves, right?

02:17:55 And so it saves time, right?

02:17:57 If you rely on other people’s opinion

02:18:00 and you trust those people or those groups

02:18:03 to evaluate a paper for you, that saves you time

02:18:09 because, you know, you don’t have to like scrutinize

02:18:12 the paper as much when it’s brought to your attention.

02:18:15 I mean, it’s the whole idea of sort of, you know,

02:18:16 collective recommender system, right?

02:18:18 So I actually thought about this a lot, you know,

02:18:22 about 10, 15 years ago,

02:18:24 because there were discussions at NIPS

02:18:27 and, you know, we were about to create ICLR

02:18:30 with Yoshua Bengio.

02:18:31 And so I wrote a document kind of describing

02:18:34 a reviewing system, which basically was, you know,

02:18:38 you post your paper on some repository,

02:18:39 let’s say archive or now could be open review.

02:18:42 And then you can form a reviewing entity,

02:18:46 which is equivalent to a reviewing board, you know,

02:18:48 of a journal or program committee of a conference.

02:18:53 You have to list the members.

02:18:55 And then that group reviewing entity can choose

02:19:00 to review a particular paper spontaneously or not.

02:19:03 There is no exclusive relationship anymore

02:19:05 between a paper and a venue or reviewing entity.

02:19:09 Any reviewing entity can review any paper

02:19:12 or may choose not to.

02:19:15 And then, you know, given evaluation,

02:19:16 it’s not published, not published,

02:19:17 it’s just an evaluation and a comment,

02:19:20 which would be public, signed by the reviewing entity.

02:19:23 And if it’s signed by a reviewing entity,

02:19:25 you know, it’s one of the members of reviewing entity.

02:19:27 So if the reviewing entity is, you know,

02:19:30 Lex Friedman’s, you know, preferred papers, right?

02:19:33 You know, it’s Lex Friedman writing the review.
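A toy data model of that proposed reviewing system, just to make the moving parts explicit; the class and field names here are hypothetical, not taken from the original document being described.

```python
from dataclasses import dataclass, field

@dataclass
class Review:
    entity: str      # the reviewing entity, e.g. a self-organised board
    signer: str      # the member of that entity who signs the evaluation
    evaluation: str  # public comment; an evaluation, not an accept/reject decision
    score: float

@dataclass
class Paper:
    repository_id: str  # identifier on an open repository (arXiv, OpenReview, ...)
    reviews: list = field(default_factory=list)

    def attach(self, review: Review) -> None:
        # No exclusive paper-venue relationship: any entity may review any
        # paper, or choose not to, and all evaluations stay public and signed.
        self.reviews.append(review)

paper = Paper("example-paper-id")
paper.attach(Review(entity="Lex Friedman's preferred papers",
                    signer="Lex Friedman",
                    evaluation="worth reading", score=0.9))
```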

02:19:35 Yes, so for me, that’s a beautiful system, I think.

02:19:40 But in addition to that,

02:19:42 it feels like there should be a reputation system

02:19:45 for the reviewers.

02:19:47 For the reviewing entities,

02:19:49 not the reviewers individually.

02:19:50 The reviewing entities, sure.

02:19:51 But even within that, the reviewers too,

02:19:53 because there’s another thing here.

02:19:57 It’s not just the reputation,

02:19:59 it’s an incentive for an individual person to do great.

02:20:02 Right now, in the academic setting,

02:20:05 the incentive is kind of internal,

02:20:07 just wanting to do a good job.

02:20:09 But honestly, that’s not a strong enough incentive

02:20:11 to do a really good job in reading a paper,

02:20:13 in finding the beautiful amidst the mistakes and the flaws

02:20:16 and all that kind of stuff.

02:20:17 Like if you’re the person that first discovered

02:20:20 a powerful paper, and you get to be proud of that discovery,

02:20:25 then that gives a huge incentive to you.

02:20:27 That’s a big part of my proposal, actually,

02:20:29 where I describe that as, you know,

02:20:31 if your evaluation of papers is predictive

02:20:35 of future success, okay,

02:20:37 then your reputation should go up as a reviewing entity.

02:20:42 So yeah, exactly.

02:20:43 I mean, I even had a master’s student

02:20:46 who was a master’s student in library science

02:20:49 and computer science actually kind of work out exactly

02:20:52 how that should work with formulas and everything.

02:20:55 So in terms of implementation,

02:20:56 do you think that’s something that’s doable?

02:20:58 I mean, I’ve been sort of, you know,

02:20:59 talking about this to sort of various people

02:21:02 like, you know, Andrew McCallum, who started Open Review.

02:21:05 And the reason why we picked Open Review

02:21:07 for ICLR initially,

02:21:09 even though it was very early for them,

02:21:11 is because my hope was that ICLR,

02:21:14 it was eventually going to kind of

02:21:16 inaugurate this type of system.

02:21:18 So ICLR kept the idea of open reviews.

02:21:22 So where the reviews are, you know,

02:21:23 published with a paper, which I think is very useful,

02:21:27 but in many ways that’s kind of reverted

02:21:29 to kind of more of a conventional type of conference

02:21:33 for everything else.

02:21:34 And that, I mean, I don’t run ICLR.

02:21:37 I’m just the president of the foundation,

02:21:41 but you know, people who run it

02:21:44 should make decisions about how to run it.

02:21:45 And I’m not going to tell them because they are volunteers

02:21:48 and I’m really thankful that they do that.

02:21:50 So, but I’m saddened by the fact

02:21:53 that we’re not being innovative enough.

02:21:57 Yeah, me too.

02:21:57 I hope that changes.

02:21:59 Yeah.

02:22:00 ’Cause the communication of science broadly,

02:22:02 but communication of computer science ideas in particular,

02:22:05 is how you make those ideas have impact, I think.

02:22:08 Yeah, and I think, you know, a lot of this is

02:22:11 because people have in their mind kind of an objective,

02:22:16 which is, you know, fairness for authors

02:22:19 and the ability to count points basically

02:22:22 and give credits accurately.

02:22:24 But that comes at the expense of the progress of science.

02:22:28 So to some extent,

02:22:29 we’re slowing down the progress of science.

02:22:32 And are we actually achieving fairness?

02:22:34 And we’re not achieving fairness.

02:22:35 You know, we still have biases.

02:22:37 You know, we’re doing, you know, a double blind review,

02:22:39 but you know, the biases are still there.

02:22:44 There are different kinds of biases.

02:22:46 You write that the phenomenon of emergence,

02:22:49 collective behavior exhibited by a large collection

02:22:51 of simple elements in interaction

02:22:54 is one of the things that got you

02:22:55 into neural nets in the first place.

02:22:57 I love cellular automata.

02:22:59 I love simple interacting elements

02:23:02 and the things that emerge from them.

02:23:04 Do you think we understand how complex systems can emerge

02:23:07 from such simple components that interact simply?

02:23:11 No, we don’t.

02:23:12 It’s a big mystery.

02:23:13 Also, it’s a mystery for physicists.

02:23:14 It’s a mystery for biologists.

02:23:17 You know, how is it that the universe around us

02:23:22 seems to be increasing in complexity and not decreasing?

02:23:25 I mean, that is a kind of curious property of physics

02:23:29 that despite the second law of thermodynamics,

02:23:32 we seem to be, you know, evolution and learning

02:23:35 and et cetera seems to be kind of at least locally

02:23:40 to increase complexity and not decrease it.

02:23:44 So perhaps the ultimate purpose of the universe

02:23:46 is to just get more complex.

02:23:49 Have these, I mean, small pockets of beautiful complexity.

02:23:55 Does that, cellular automata,

02:23:57 these kinds of emergence of complex systems

02:23:59 give you some intuition or guide your understanding

02:24:04 of machine learning systems and neural networks and so on?

02:24:06 Or are these, for you right now, disparate concepts?

02:24:09 Well, it got me into it.

02:24:10 You know, I discovered the existence of the perceptron

02:24:15 when I was a college student, you know, by reading a book

02:24:19 and it was a debate between Chomsky and Piaget

02:24:21 and Seymour Papert from MIT was kind of singing the praise

02:24:25 of the perceptron in that book.

02:24:27 And I, that was the first time I heard about a learning machine,

02:24:29 right, so I started digging the literature

02:24:31 and I found those paper, those books,

02:24:33 which were basically transcription of workshops

02:24:37 or conferences from the fifties and sixties

02:24:39 about self organizing systems.

02:24:42 So there were, there was a series of conferences

02:24:44 on self organizing systems and there’s books on this.

02:24:48 Some of them are, you can actually get them

02:24:50 at the internet archive, you know, the digital version.

02:24:55 And there are like fascinating articles in there by,

02:24:58 there’s a guy whose name has been largely forgotten,

02:25:00 Heinz von Förster, he’s a German physicist

02:25:04 who immigrated to the US and worked

02:25:07 on self organizing systems in the fifties.

02:25:11 And in the sixties he created at University of Illinois

02:25:13 at Urbana-Champaign, he created the Biological

02:25:16 Computer Laboratory, BCL, which was all about neural nets.

02:25:21 Unfortunately, that was kind of towards the end

02:25:23 of the popularity of neural nets.

02:25:24 So that lab never kind of thrived very much,

02:25:27 but he wrote a bunch of papers about self organization

02:25:30 and about the mystery of self organization.

02:25:33 An example he has is you take, imagine you are in space,

02:25:37 there’s no gravity and you have a big box

02:25:38 with magnets in it, okay.

02:25:42 You know, kind of rectangular magnets

02:25:43 with North Pole on one end, South Pole on the other end.

02:25:46 You shake the box gently and the magnets will kind of stick

02:25:49 to themselves and probably form like complex structure,

02:25:53 you know, spontaneously.

02:25:55 You know, that could be an example of self organization,

02:25:57 but you know, you have lots of examples,

02:25:58 neural nets are an example of self organization too,

02:26:01 you know, in many respect.

02:26:03 And it’s a bit of a mystery, you know,

02:26:05 how like what is possible with this, you know,

02:26:09 pattern formation in physical systems, in chaotic systems

02:26:12 and things like that, you know, the emergence of life,

02:26:16 you know, things like that.

02:26:16 So, you know, how does that happen?

02:26:19 So it’s a big puzzle for physicists as well.

02:26:22 It feels like understanding this,

02:26:24 the mathematics of emergence

02:26:27 in some constrained situations

02:26:29 might help us create intelligence,

02:26:32 like help us add a little spice to the systems

02:26:36 because in complex systems with emergence

02:26:40 you seem to be able to get a lot from little.

02:26:44 And so that seems like a shortcut

02:26:47 to get big leaps in performance, but…

02:26:51 But there’s a missing concept that we don’t have.

02:26:55 Yeah.

02:26:55 And it’s something also I’ve been fascinated by

02:26:58 since my undergrad days,

02:27:00 and it’s how you measure complexity, right?

02:27:03 So we don’t actually have good ways of measuring,

02:27:06 or at least we don’t have good ways of interpreting

02:27:09 the measures that we have at our disposal.

02:27:11 Like how do you measure the complexity of something, right?

02:27:14 So there’s all those things, you know,

02:27:15 like, you know, Kolmogorov-Chaitin-Solomonoff complexity

02:27:18 of, you know, the length of the shortest program

02:27:20 that would generate a bit string can be thought of

02:27:23 as the complexity of that bit string, right?

02:27:26 I’ve been fascinated by that concept.

02:27:28 The problem with that is that

02:27:30 that complexity is defined up to a constant,

02:27:32 which can be very large.

02:27:34 Right.

02:27:35 There are similar concepts that are derived from,

02:27:37 you know, Bayesian probability theory,

02:27:42 where, you know, the complexity of something

02:27:44 is the negative log of its probability, essentially, right?

02:27:48 And you have a complete equivalence between the two things.

02:27:51 And there you would think, you know,

02:27:52 the probability is something that’s well defined mathematically,

02:27:55 which means complexity is well defined.

02:27:57 But it’s not true.

02:27:58 You need to have a model of the distribution.

02:28:01 You may need to have a prior

02:28:02 if you’re doing Bayesian inference.

02:28:04 And the prior plays the same role

02:28:05 as the choice of the computer

02:28:07 with which you measure Kolmogorov complexity.

02:28:09 And so every measure of complexity we have

02:28:12 has some arbitrariness,

02:28:15 you know, an additive constant,

02:28:16 which can be arbitrarily large.

02:28:19 And so, you know, how can we come up with a good theory

02:28:23 of how things become more complex

02:28:24 if we don’t have a good measure of complexity?
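To make the "defined up to a constant" point precise with the standard statements (general results, not anything specific to this conversation), the invariance theorem for Kolmogorov complexity and the code-length view of probability read

$$K_U(x) \;\le\; K_V(x) + c_{U,V}, \qquad L_P(x) = -\log_2 P(x),$$

where the constant $c_{U,V}$ depends on the two universal machines $U$ and $V$ but not on the string $x$, and where changing the model or prior $P$ shifts the description length $L_P(x)$ in the same arbitrary way that changing the reference computer shifts $K(x)$.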

02:28:26 Yeah, which we need for this.

02:28:28 One way that people study this in the space of biology,

02:28:32 the people that study the origin of life

02:28:33 or try to recreate the life in the laboratory.

02:28:37 And the more interesting one is the alien one,

02:28:39 is when we go to other planets,

02:28:41 how do we recognize this life?

02:28:43 Because, you know, complexity, we associate complexity,

02:28:46 maybe some level of mobility with life.

02:28:50 You know, we have to be able to, like,

02:28:51 have concrete algorithms for, like,

02:28:57 measuring the level of complexity we see

02:29:00 in order to know the difference between life and non life.

02:29:02 And the problem is that complexity

02:29:04 is in the eye of the beholder.

02:29:05 So let me give you an example.

02:29:07 If I give you an image of the MNIST digits, right,

02:29:13 and I flip through MNIST digits,

02:29:15 there is obviously some structure to it

02:29:18 because local structure, you know,

02:29:20 neighboring pixels are correlated

02:29:23 across the entire data set.

02:29:25 Now imagine that I apply a random permutation

02:29:30 to all the pixels, a fixed random permutation.

02:29:33 Now I show you those images,

02:29:35 they will look, you know, really disorganized to you,

02:29:38 more complex.

02:29:40 In fact, they’re not more complex in absolute terms,

02:29:42 they’re exactly the same as originally, right?

02:29:45 And if you knew what the permutation was,

02:29:46 you know, you could undo the permutation.

02:29:49 Now, imagine I give you special glasses

02:29:52 that undo that permutation.

02:29:54 Now, all of a sudden, what looked complicated

02:29:56 becomes simple.
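
A minimal sketch of this fixed-permutation thought experiment, assuming random 28x28 arrays as stand-ins for MNIST digits so the snippet stays self-contained; the "special glasses" are simply the inverse permutation.

```python
import numpy as np

# Stand-in data: random 28x28 "images" play the role of MNIST digits,
# so the script runs without downloading anything.
rng = np.random.default_rng(0)
images = rng.random((100, 28, 28))

# One fixed random permutation of the 784 pixel positions,
# applied identically to every image.
perm = rng.permutation(28 * 28)

def scramble(batch, perm):
    """Apply the same pixel permutation to every image in the batch."""
    flat = batch.reshape(len(batch), -1)
    return flat[:, perm].reshape(batch.shape)

def unscramble(batch, perm):
    """The 'special glasses': invert the fixed permutation."""
    inv = np.argsort(perm)
    flat = batch.reshape(len(batch), -1)
    return flat[:, inv].reshape(batch.shape)

scrambled = scramble(images, perm)
restored = unscramble(scrambled, perm)

# No information was lost: the permutation is exactly invertible,
# even though the scrambled images look like noise to a human viewer.
assert np.allclose(restored, images)
```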

02:29:57 Right.

02:29:57 So if you have, you know,

02:30:00 humans on one end, and then another race of aliens

02:30:03 that sees the universe with permutation glasses.

02:30:05 Yeah, with the permutation glasses.

02:30:06 Okay, what we perceive as simple, to them

02:30:09 is highly complicated, it’s probably heat.

02:30:11 Yeah.

02:30:12 Heat, yeah.

02:30:13 Okay, and what they perceive as simple to us

02:30:15 is random fluctuation, it’s heat.

02:30:18 Yeah.

02:30:19 Yeah, it’s truly in the eye of the beholder.

02:30:22 Yeah.

02:30:23 It depends what kind of glasses you’re wearing.

02:30:24 Right.

02:30:25 It depends what kind of algorithm you’re running

02:30:26 in your perception system.

02:30:28 So I don’t think we’ll have a theory of intelligence,

02:30:31 self organization, evolution, things like this,

02:30:34 until we have a good handle on a notion of complexity

02:30:38 which we know is in the eye of the beholder.

02:30:42 Yeah, it’s sad to think that we might not be able

02:30:44 to detect or interact with alien species

02:30:47 because we’re wearing different glasses.

02:30:50 Because their notion of locality

02:30:51 might be different from ours.

02:30:52 Yeah, exactly.

02:30:53 This actually connects with fascinating questions

02:30:55 in physics at the moment, like modern physics,

02:30:58 quantum physics, like, you know, questions about,

02:31:00 like, you know, can we recover the information

02:31:02 that’s lost in a black hole and things like this, right?

02:31:04 And that relies on notions of complexity,

02:31:09 which, you know, I find this fascinating.

02:31:11 Can you describe your personal quest

02:31:13 to build an expressive electronic wind instrument, EWI?

02:31:19 What is it?

02:31:20 What does it take to build it?

02:31:24 Well, I’m a tinkerer.

02:31:25 I like building things.

02:31:26 I like building things with combinations of electronics

02:31:28 and, you know, mechanical stuff.

02:31:32 You know, I have a bunch of different hobbies,

02:31:34 but, you know, probably my first one was little,

02:31:37 was building model airplanes and stuff like that.

02:31:39 And I still do that to some extent.

02:31:41 But also electronics, I taught myself electronics

02:31:43 before I studied it.

02:31:46 And the reason I taught myself electronics

02:31:48 is because of music.

02:31:49 My cousin was an aspiring electronic musician

02:31:53 and he had an analog synthesizer.

02:31:55 And I was, you know, basically modifying it for him

02:31:58 and building sequencers and stuff like that, right, for him.

02:32:00 I was in high school when I was doing this.

02:32:02 That’s the interesting era, like, progressive rock, like the 80s.

02:32:06 Like, what’s the greatest band of all time,

02:32:08 according to Yann LeCun?

02:32:09 Oh, man, there’s too many of them.

02:32:11 But, you know, it’s a combination of, you know,

02:32:16 Mahavishnu Orchestra, Weather Report,

02:32:19 Yes, Genesis, you know, pre-Peter Gabriel,

02:32:27 Gentle Giant, you know, things like that.

02:32:29 Great.

02:32:29 Okay, so this love of electronics

02:32:32 and this love of music combined together.

02:32:34 Right, so I was actually trained to play

02:32:36 Baroque and Renaissance music and I played in an orchestra

02:32:42 when I was in high school and first years of college.

02:32:45 And I played the recorder, crumhorn,

02:32:48 a little bit of oboe, you know, things like that.

02:32:50 So I’m a wind instrument player.

02:32:52 But I always wanted to play improvised music,

02:32:54 even though I don’t know anything about it.

02:32:56 And the only way I figured, you know,

02:32:58 short of like learning to play saxophone

02:33:01 was to play electronic wind instruments.

02:33:03 So they behave, you know, the fingering is similar

02:33:05 to a saxophone, but, you know,

02:33:07 you have a wide variety of sounds

02:33:09 because you control the synthesizer with it.

02:33:11 So I had a bunch of those, you know,

02:33:13 going back to the late 80s from either Yamaha or Akai.

02:33:18 They’re both kind of the main manufacturers of those.

02:33:22 So they’re classics, you know,

02:33:23 going back several decades.

02:33:25 But I’ve never been completely satisfied with them

02:33:27 because of lack of expressivity.

02:33:31 And, you know, those things, you know,

02:33:32 are somewhat expressive.

02:33:33 I mean, they measure the breath pressure,

02:33:34 they measure the lip pressure.

02:33:36 And, you know, you have various parameters.

02:33:39 You can vary with fingers,

02:33:41 but they’re not really as expressive

02:33:44 as an acoustic instrument, right?

02:33:47 You hear John Coltrane play two notes

02:33:49 and you know it’s John Coltrane,

02:33:50 you know, it’s got a unique sound.

02:33:53 Or Miles Davis, right?

02:33:54 You can hear it’s Miles Davis playing the trumpet

02:33:57 because the sound reflects their, you know,

02:34:02 physiognomy, basically, the shape of the vocal tract

02:34:07 kind of shapes the sound.

02:34:09 So how do you do this with an electronic instrument?

02:34:12 And I was, many years ago,

02:34:13 I met a guy called David Wessel.

02:34:15 He was a professor at Berkeley

02:34:18 and created the Center for Music Technology there.

02:34:23 And he was interested in that question.

02:34:25 And so I kept kind of thinking about this for many years.

02:34:28 And finally, because of COVID, you know, I was at home,

02:34:31 I was in my workshop.

02:34:32 My workshop serves also as my kind of Zoom room

02:34:36 and home office.

02:34:37 And this is in New Jersey?

02:34:38 In New Jersey.

02:34:39 And I started really being serious about, you know,

02:34:43 building my own EWI instrument.

02:34:45 What else is going on in that New Jersey workshop?

02:34:48 Is there some crazy stuff you’ve built,

02:34:50 like just, or like left on the workshop floor, left behind?

02:34:55 A lot of crazy stuff is, you know,

02:34:57 electronics built with microcontrollers of various kinds

02:35:01 and, you know, weird flying contraptions.

02:35:06 So you still love flying?

02:35:08 It’s a family disease.

02:35:09 My dad got me into it when I was a kid.

02:35:13 And he was building model airplanes when he was a kid.

02:35:16 And he was a mechanical engineer.

02:35:19 He taught himself electronics also.

02:35:21 So he built his early radio control systems

02:35:24 in the late 60s, early 70s.

02:35:27 And so that’s what got me into,

02:35:29 I mean, he got me into kind of, you know,

02:35:31 engineering and science and technology.

02:35:33 Do you also have an interest in appreciation of flight

02:35:36 in other forms, like with drones, quadcopters,

02:35:38 or do you, is it model airplane, the thing that’s?

02:35:41 You know, before drones were, you know,

02:35:45 kind of a consumer product, you know,

02:35:49 I built my own, you know,

02:35:50 with also building a microcontroller

02:35:52 with gyroscopes and accelerometers for stabilization,

02:35:56 writing the firmware for it, you know.

02:35:57 And then when it became kind of a standard thing

02:35:59 you could buy, it was boring, you know,

02:36:00 I stopped doing it.

02:36:01 It was not fun anymore.

02:36:03 Yeah.

02:36:04 You were doing it before it was cool.

02:36:06 Yeah.

02:36:07 What advice would you give to a young person today

02:36:10 in high school and college

02:36:11 that dreams of doing something big like Yann LeCun,

02:36:15 like let’s talk in the space of intelligence,

02:36:18 dreams of having a chance to solve

02:36:21 some fundamental problem in space of intelligence,

02:36:23 both for their career and just in life,

02:36:26 being somebody who was a part

02:36:28 of creating something special?

02:36:30 So try to get interested in big questions,

02:36:35 things like, you know, what is intelligence?

02:36:38 What is the universe made of?

02:36:40 What’s life all about?

02:36:41 Things like that.

02:36:45 Like even like crazy big questions,

02:36:47 like what’s time?

02:36:49 Like nobody knows what time is.

02:36:53 And then learn basic things,

02:36:58 like basic methods, either from math,

02:37:00 from physics or from engineering.

02:37:03 Things that have a long shelf life.

02:37:05 Like if you have a choice between,

02:37:07 like, you know, learning, you know,

02:37:10 mobile programming on iPhone

02:37:12 or quantum mechanics, take quantum mechanics.

02:37:16 Because you’re gonna learn things

02:37:18 that you have no idea exist.

02:37:20 And you may not, you may never be a quantum physicist,

02:37:25 but you will learn about path integrals.

02:37:26 And path integrals are used everywhere.

02:37:29 It’s the same formula that you use

02:37:30 for, you know, Bayesian integration and stuff like that.
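
One hedged way to make that analogy concrete: both a Euclidean path integral and a Bayesian marginal likelihood are integrals of an exponentiated score over all possible configurations. The symbols below are the generic textbook ones, not anything taken from this conversation.

```latex
% Euclidean path integral / partition function: sum over all trajectories
% x(t), each weighted by the exponential of minus its action S[x].
\[ Z = \int \mathcal{D}[x]\; e^{-S[x]} \]

% Bayesian marginal likelihood: sum over all hypotheses theta, each weighted
% by the exponential of (log-prior + log-likelihood) -- the same structural
% form, with log p(D|theta) + log p(theta) playing the role of -S.
\[ p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta
        = \int e^{\,\log p(D \mid \theta) \,+\, \log p(\theta)}\, d\theta \]
```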

02:37:33 So the ideas, the little ideas within quantum mechanics,

02:37:37 within some of these kind of more solidified fields

02:37:41 will have a longer shelf life.

02:37:42 You’ll somehow use them indirectly in your work.

02:37:46 Learn classical mechanics, like you’ll learn

02:37:48 about Lagrangian, for example,

02:37:51 which is like a huge, hugely useful concept,

02:37:55 you know, for all kinds of different things.

02:37:57 Learn statistical physics, because all the math

02:38:01 that, you know, machine learning uses

02:38:05 basically was figured out

02:38:07 by statistical physicists in the, you know,

02:38:09 late 19th, early 20th century, right?

02:38:10 So, and some of it actually more recently,

02:38:14 by people like Giorgio Parisi,

02:38:16 who just got the Nobel prize for the replica method,

02:38:19 among other things, it’s used for a lot of different things.

02:38:23 You know, variational inference,

02:38:25 that math comes from statistical physics.
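
For reference, a sketch of the standard identity behind that remark, written in the free-energy form borrowed from statistical physics; q is any approximating distribution over the latent variable z.

```latex
% Evidence lower bound (ELBO) in free-energy form: expected log joint plus
% the entropy of q, which equals minus the variational free energy F[q].
\[ \log p(x) \;\ge\; \mathbb{E}_{q(z)}\!\big[\log p(x, z)\big] + H\big[q(z)\big]
              \;=\; -F[q] \]

% F[q] = expected "energy" minus entropy, the same functional minimized in
% mean-field statistical physics; the bound is tight when q equals the
% true posterior p(z | x).
\[ F[q] = \mathbb{E}_{q(z)}\!\big[-\log p(x, z)\big] - H\big[q(z)\big] \]
```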

02:38:28 So a lot of those kind of, you know, basic courses,

02:38:33 you know, if you do electrical engineering,

02:38:36 you take signal processing,

02:38:37 you’ll learn about Fourier transforms.

02:38:39 Again, something super useful is at the basis

02:38:42 of things like graph neural nets,

02:38:44 which is an entirely new sub area of, you know,

02:38:49 AI machine learning, deep learning,

02:38:50 which I think is super promising

02:38:52 for all kinds of applications.
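
A minimal sketch of that Fourier-to-graphs connection, assuming a small random undirected graph as a stand-in: the eigenvectors of the graph Laplacian play the role of Fourier modes, and a spectral graph convolution is just a reweighting of those modes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = rng.integers(0, 2, size=(n, n))
A = np.triu(A, 1)
A = A + A.T                      # symmetric adjacency matrix, no self-loops

D = np.diag(A.sum(axis=1))       # degree matrix
L = D - A                        # combinatorial graph Laplacian

# Eigenvectors of L play the role of Fourier modes; eigenvalues are the
# graph "frequencies".
eigvals, U = np.linalg.eigh(L)

x = rng.standard_normal(n)       # a signal living on the graph nodes
x_hat = U.T @ x                  # graph Fourier transform
x_back = U @ x_hat               # inverse transform recovers the signal
assert np.allclose(x_back, x)

# A spectral "convolution": filter the signal by reweighting frequencies,
# e.g. a smoothing filter that damps high graph frequencies.
h = np.exp(-0.5 * eigvals)       # filter response in the spectral domain
x_filtered = U @ (h * x_hat)
```

In a graph neural net the filter response would be learned from data rather than fixed by hand as it is here.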

02:38:54 Something very promising,

02:38:55 if you’re more interested in applications,

02:38:56 is the applications of AI machine learning

02:38:58 and deep learning to science,

02:39:01 or to science that can help solve big problems

02:39:05 in the world.

02:39:05 I have colleagues at Meta, at FAIR,

02:39:09 who started this project called Open Catalyst,

02:39:11 and it’s an open, collaborative project.

02:39:14 And the idea is to use deep learning

02:39:16 to help design new chemical compounds or materials

02:39:21 that would facilitate the separation

02:39:23 of hydrogen from oxygen.

02:39:25 If you can efficiently separate oxygen from hydrogen

02:39:29 with electricity, you solve climate change.

02:39:33 It’s as simple as that,

02:39:34 because you cover, you know,

02:39:37 some random desert with solar panels,

02:39:40 and you have them work all day,

02:39:42 produce hydrogen,

02:39:43 and then you shoot the hydrogen wherever it’s needed.

02:39:45 You don’t need anything else.

02:39:48 You know, you have controllable power

02:39:53 that can be transported anywhere.

02:39:55 So if we have a large scale,

02:39:59 efficient energy storage technology,

02:40:02 like producing hydrogen, we solve climate change.

02:40:06 Here’s another way to solve climate change,

02:40:08 is figuring out how to make fusion work.

02:40:10 Now, the problem with fusion

02:40:11 is that you make a super hot plasma,

02:40:13 and the plasma is unstable and you can’t control it.

02:40:16 Maybe with deep learning,

02:40:17 you can find controllers that will stabilize plasma

02:40:19 and make, you know, practical fusion reactors.

02:40:21 I mean, that’s very speculative,

02:40:23 but, you know, it’s worth trying,

02:40:24 because, you know, the payoff is huge.

02:40:28 There’s a group at Google working on this,

02:40:29 led by John Platt.

02:40:31 So control, convert as many problems

02:40:33 in science and physics and biology and chemistry

02:40:36 into a learnable problem

02:40:39 and see if a machine can learn it.

02:40:41 Right, I mean, there’s properties of, you know,

02:40:43 complex materials that we don’t understand

02:40:46 from first principle, for example, right?

02:40:48 So, you know, if we could design new, you know,

02:40:53 new materials, we could make more efficient batteries.

02:40:56 You know, we could make maybe faster electronics.

02:40:58 We could, I mean, there’s a lot of things we can imagine

02:41:01 doing, or, you know, lighter materials

02:41:04 for cars or airplanes or things like that.

02:41:06 Maybe better fuel cells.

02:41:07 I mean, there’s all kinds of stuff we can imagine.

02:41:09 If we had good fuel cells, hydrogen fuel cells,

02:41:12 we could use them to power airplanes,

02:41:13 and, you know, cars and other transportation,

02:41:17 and we wouldn’t have emission problems,

02:41:20 CO2 emission problems, for air transportation anymore.

02:41:24 So there’s a lot of those things, I think,

02:41:26 where AI, you know, can be used.

02:41:30 And this is not even talking about

02:41:31 all the sort of medicine, biology,

02:41:33 and everything like that, right?

02:41:35 You know, like, you know, protein folding,

02:41:37 you know, figuring out, like, how could you design

02:41:40 your protein so that it sticks to another protein

02:41:41 at a particular site, because that’s how you design drugs

02:41:44 in the end.

02:41:46 So, you know, deep learning would be useful,

02:41:47 although those are kind of, you know,

02:41:49 would be sort of enormous progress

02:41:51 if we could use it for that.

02:41:53 Here’s an example.

02:41:54 If you take, this is like from recent material physics,

02:41:58 you take a monoatomic layer of graphene, right?

02:42:02 So it’s just carbon on a hexagonal mesh,

02:42:04 and you make it a single atom thick.

02:42:09 You put another one on top,

02:42:10 you twist them by some magic number of degrees,

02:42:13 three degrees or something.

02:42:14 It becomes superconductor.

02:42:16 Nobody has any idea why.

02:42:18 Okay.

02:42:20 I want to know how that was discovered,

02:42:22 but that’s the kind of thing that machine learning

02:42:23 can actually discover, these kinds of things.

02:42:25 Maybe not, but there is a hint, perhaps,

02:42:28 that with machine learning, we would train a system

02:42:31 to basically be a phenomenological model

02:42:34 of some complex emergent phenomenon,

02:42:37 which, you know, superconductivity is one of those,

02:42:42 where, you know, this collective phenomenon

02:42:44 is too difficult to describe from first principles

02:42:46 with the current, you know,

02:42:48 the usual sort of reductionist type method,

02:42:51 but we could have deep learning systems

02:42:54 that predict the properties of a system

02:42:57 from a description of it after being trained

02:42:59 with sufficiently many samples.

02:43:04 This guy, Pascal Fua, at EPFL,

02:43:06 he has a startup company

02:43:09 where he basically trained a convolutional net,

02:43:13 essentially, to predict the aerodynamic properties

02:43:16 of solids, and you can generate as much data as you want

02:43:19 by just running computational fluid dynamics, right?

02:43:21 So you give, like, a wing, airfoil,

02:43:27 or something, shape of some kind,

02:43:29 and you run computational fluid dynamics,

02:43:31 you get, as a result, the drag and, you know,

02:43:36 lift and all that stuff, right?

02:43:37 And you can generate lots of data,

02:43:40 train a neural net to make those predictions,

02:43:41 and now what you have is a differentiable model

02:43:44 of, let’s say, drag and lift

02:43:47 as a function of the shape of that solid,

02:43:48 and so you can do backprop and gradient descent,

02:43:49 you can optimize the shape

02:43:51 so you get the properties you want.
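
A minimal sketch of that differentiable-surrogate idea, under stated assumptions: the untrained MLP below stands in for a convolutional net that would in practice be trained on CFD results first, and the 32-dimensional shape vector and the drag-versus-lift objective are arbitrary placeholders, not the actual setup.

```python
import torch
import torch.nn as nn

shape_dim = 32

surrogate = nn.Sequential(          # maps shape parameters -> (drag, lift)
    nn.Linear(shape_dim, 64),
    nn.Tanh(),
    nn.Linear(64, 2),
)
for p in surrogate.parameters():    # freeze the (pretend-trained) surrogate
    p.requires_grad_(False)

shape = torch.zeros(shape_dim, requires_grad=True)   # design variables
opt = torch.optim.Adam([shape], lr=1e-2)

for step in range(200):
    drag, lift = surrogate(shape)
    loss = drag - 0.1 * lift        # toy objective: low drag, some lift
    opt.zero_grad()
    loss.backward()                 # gradients flow through the frozen net
    opt.step()                      # update the shape, not the network
```

The design point is that the surrogate’s weights stay frozen and only the shape parameters receive gradients, which is what turns an expensive simulation loop into cheap gradient-based shape optimization.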

02:43:54 Yeah, that’s incredible.

02:43:56 That’s incredible, and on top of all that,

02:43:58 probably you should read a little bit of literature

02:44:01 and a little bit of history

02:44:03 for inspiration and for wisdom,

02:44:06 because after all, all of these technologies

02:44:08 will have to work in the human world.

02:44:10 Yes.

02:44:11 And the human world is complicated.

02:44:12 It is, sadly.

02:44:15 Yann, this is an amazing conversation.

02:44:18 I’m really honored that you would talk with me today.

02:44:20 Thank you for all the amazing work you’re doing

02:44:22 at FAIR, at Meta, and thank you for being so passionate

02:44:26 after all these years about everything

02:44:28 that’s going on, you’re a beacon of hope

02:44:29 for the machine learning community,

02:44:31 and thank you so much for spending

02:44:33 your valuable time with me today.

02:44:34 That was awesome.

02:44:35 Thanks for having me on.

02:44:36 That was a pleasure.

02:44:38 Thanks for listening to this conversation with Yann LeCun.

02:44:41 To support this podcast,

02:44:42 please check out our sponsors in the description.

02:44:45 And now, let me leave you with some words

02:44:47 from Isaac Asimov.

02:44:50 Your assumptions are your windows on the world.

02:44:53 Scrub them off every once in a while,

02:44:56 or the light won’t come in.

02:44:58 Thank you for listening, and hope to see you next time.