Dileep George: Brain-Inspired AI #115

Transcript

00:00:00 The following is a conversation with Dileep George, a researcher at the intersection of

00:00:05 neuroscience and artificial intelligence, cofounder of Vicarious with Scott Phoenix,

00:00:10 and formerly cofounder of Numenta with Jeff Hawkins, who’s been on this podcast, and

00:00:16 Donna Dubinsky. From his early work on hierarchical temporal memory to recursive cortical networks

00:00:23 to today, Dileep's always sought to engineer intelligence that is closely inspired by the

00:00:29 human brain. As a side note, I think we understand very little about the fundamental principles

00:00:35 underlying the function of the human brain, but the little we do know gives hints that may be

00:00:41 more useful for engineering intelligence than any idea in mathematics, computer science, physics,

00:00:46 and scientific fields outside of biology. And so the brain is a kind of existence proof that says

00:00:53 it’s possible. Keep at it. I should also say that brain-inspired AI is often overhyped and used as

00:01:01 fodder, just as quantum computing is, for marketing speak, but I’m not afraid of exploring these

00:01:08 sometimes overhyped areas since where there’s smoke, there’s sometimes fire.

00:01:13 Quick summary of the ads. Three sponsors, Babbel, Raycon Earbuds, and Masterclass. Please consider

00:01:20 supporting this podcast by clicking the special links in the description to get the discount.

00:01:25 It really is the best way to support this podcast. If you enjoy this thing, subscribe on YouTube,

00:01:31 review it with five stars on Apple Podcast, support on Patreon, or connect with me on Twitter

00:01:36 at Lex Fridman. As usual, I’ll do a few minutes of ads now and never any ads in the middle that

00:01:42 can break the flow of the conversation. This show is sponsored by Babbel, an app and website that

00:01:48 gets you speaking in a new language within weeks. Go to babbel.com and use code LEX to get three

00:01:54 months free. They offer 14 languages, including Spanish, French, Italian, German, and yes, Russian.

00:02:03 Daily lessons are 10 to 15 minutes, super easy, effective, designed by over 100 language experts.

00:02:10 Let me read a few lines from the Russian poem Noch, Ulitsa, Fonar, Apteka by Alexander Blok

00:02:18 that you’ll start to understand if you sign up to Babbel.

00:02:34 Now I say that you’ll only start to understand this poem because Russian starts with a language

00:02:41 and ends with vodka. Now the latter part is definitely not endorsed or provided by Babbel

00:02:47 and will probably lose me the sponsorship, but once you graduate from Babbel,

00:02:51 you can enroll in my advanced course of late night Russian conversation over vodka.

00:02:56 I have not yet developed an app for that. It’s in progress. So get started by visiting babbel.com

00:03:02 and use code LEX to get three months free. This show is sponsored by Raycon earbuds.

00:03:09 Get them at buyraycon.com slash LEX. They’ve become my main method of listening to podcasts,

00:03:14 audiobooks, and music when I run, do pushups and pull ups, or just living life. In fact,

00:03:20 I often listen to brown noise with them when I’m thinking deeply about something. It helps me focus.

00:03:26 They’re super comfortable, pair easily, great sound, great bass, six hours of playtime.

00:03:33 I’ve been putting in a lot of miles to get ready for a potential ultra marathon

00:03:38 and listening to audiobooks on World War II. The sound is rich and really comes in clear.

00:03:45 So again, get them at buyraycon.com slash LEX. This show is sponsored by Masterclass.

00:03:52 Sign up at masterclass.com slash LEX to get a discount and to support this podcast.

00:03:57 When I first heard about Masterclass, I thought it was too good to be true. I still think it’s

00:04:02 too good to be true. For 180 bucks a year, you get an all access pass to watch courses from,

00:04:08 to list some of my favorites: Chris Hadfield on Space Exploration, Neil deGrasse Tyson on

00:04:13 Scientific Thinking and Communication, Will Wright, creator of SimCity and The Sims, on Game Design.

00:04:19 Every time I do this read, I really want to play a city builder game. Carlos Santana on guitar,

00:04:26 Garry Kasparov on chess, Daniel Negreanu on poker, and many more. Chris Hadfield explaining how rockets

00:04:32 work and the experience of being launched into space alone is worth the money. By the way,

00:04:38 you can watch it on basically any device. Once again, sign up at masterclass.com to get a discount

00:04:43 and to support this podcast. And now here’s my conversation with Dileep George. Do you think

00:04:50 we need to understand the brain in order to build it? Yes. If you want to build the brain, we

00:04:56 definitely need to understand how it works. Blue Brain or Henry Markram’s project is trying to

00:05:04 build a brain without understanding it, just trying to put details of the brain from neuroscience

00:05:11 experiments into a giant simulation by putting more and more neurons, more and more details.

00:05:18 But that is not going to work because when it doesn’t perform as you expect it to,

00:05:26 then what do you do? You just keep adding more details. How do you debug it? So unless you

00:05:32 understand, unless you have a theory about how the system is supposed to work, how the pieces are

00:05:37 supposed to fit together, what they’re going to contribute, you can’t build it. At the functional

00:05:42 level, understand. So can you actually linger on and describe the Blue Brain project? It’s kind of

00:05:48 a fascinating principle and idea to try to simulate the brain. We’re talking about the human

00:05:56 brain, right? Right. Human brains and rat brains or cat brains have lots in common, in that the cortex,

00:06:03 the neocortex structure is very similar. So initially they were trying to just simulate

00:06:11 a cat brain. To understand the nature of evil. To understand the nature of evil. Or as it happens

00:06:21 in most of these simulations, you easily get one thing out, which is oscillations. If you simulate

00:06:29 a large number of neurons, they oscillate and you can adjust the parameters and say that,

00:06:35 oh, oscillations match the rhythm that we see in the brain, et cetera. I see. So the idea is,

00:06:43 is the simulation at the level of individual neurons? Yeah. So the Blue Brain project,

00:06:49 the original idea as proposed was you put very detailed biophysical neurons, biophysical models

00:06:59 of neurons, and you interconnect them according to the statistics of connections that we have found

00:07:06 from real neuroscience experiments, and then turn it on and see what happens. And these neural

00:07:14 models are incredibly complicated in themselves, right? Because these neurons are modeled using

00:07:22 this idea called Hodgkin-Huxley models, which are about how signals propagate in a cable.

00:07:28 And there are active dendrites, all those phenomena, and those phenomena themselves

00:07:34 we don’t understand that well. And then we put in connectivity, which is part guesswork,

00:07:40 part observed. And of course, if we do not have any theory about how it is supposed to work,

00:07:48 we just have to take whatever comes out of it as, okay, this is something interesting.

00:07:54 But in your sense, these models of the way a signal travels along,

00:07:59 like with the axons and all the basic models, they’re too crude.

00:08:04 Oh, well, actually, they are pretty detailed and pretty sophisticated. And they do replicate

00:08:12 the neural dynamics. If you take a single neuron and you try to turn on the different channels,

00:08:20 the calcium channels and the different receptors, and see what the effect of turning on or off those

00:08:28 channels are in the neuron’s spike output, people have built pretty sophisticated models of that.

00:08:35 And they are, I would say, in the regime of correct.
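
To make the discussion concrete, here is a minimal sketch, in Python, of the kind of single-compartment Hodgkin-Huxley model being described: one neuron with the standard sodium, potassium, and leak channels, textbook parameters, and a constant injected current. It is only an illustration of the modeling style, not the Blue Brain Project's actual code, and it leaves out everything the conversation flags as hard, such as dendrites, connectivity, and what the circuit is for.

```python
# A minimal single-compartment Hodgkin-Huxley sketch (illustrative only),
# integrated with forward Euler. Units: mV, ms, uF/cm^2, mS/cm^2, uA/cm^2.
import numpy as np

C_m = 1.0                                # membrane capacitance
g_Na, g_K, g_L = 120.0, 36.0, 0.3        # maximal conductances
E_Na, E_K, E_L = 50.0, -77.0, -54.4      # reversal potentials

# Voltage-dependent rate functions for the gating variables m, h, n.
alpha_m = lambda V: 0.1 * (V + 40.0) / (1.0 - np.exp(-(V + 40.0) / 10.0))
beta_m  = lambda V: 4.0 * np.exp(-(V + 65.0) / 18.0)
alpha_h = lambda V: 0.07 * np.exp(-(V + 65.0) / 20.0)
beta_h  = lambda V: 1.0 / (1.0 + np.exp(-(V + 35.0) / 10.0))
alpha_n = lambda V: 0.01 * (V + 55.0) / (1.0 - np.exp(-(V + 55.0) / 10.0))
beta_n  = lambda V: 0.125 * np.exp(-(V + 65.0) / 80.0)

dt, T = 0.01, 50.0                       # time step and duration in ms
V, m, h, n = -65.0, 0.05, 0.6, 0.32      # initial state near rest
I_ext = 10.0                             # injected current
spikes = 0

for _ in range(int(T / dt)):
    # Ionic currents for the current state.
    I_Na = g_Na * m**3 * h * (V - E_Na)
    I_K  = g_K  * n**4 * (V - E_K)
    I_L  = g_L  * (V - E_L)
    dV = (I_ext - I_Na - I_K - I_L) / C_m
    # Gating variables relax toward their voltage-dependent steady states.
    dm = alpha_m(V) * (1.0 - m) - beta_m(V) * m
    dh = alpha_h(V) * (1.0 - h) - beta_h(V) * h
    dn = alpha_n(V) * (1.0 - n) - beta_n(V) * n
    prev_V = V
    V, m, h, n = V + dt * dV, m + dt * dm, h + dt * dh, n + dt * dn
    if prev_V < 0.0 <= V:                # count upward zero crossings as spikes
        spikes += 1

print(f"{spikes} spikes in {T} ms with I_ext = {I_ext} uA/cm^2")
```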

00:08:41 Well, see, the correctness, that’s interesting, because you mentioned at several levels,

00:08:45 the correctness is measured by looking at some kind of aggregate statistics.

00:08:49 It would be more of the spiking dynamics of a single neuron.

00:08:53 Spiking dynamics of a single neuron, okay.

00:08:54 Yeah. And yeah, these models, because they are going to the level of mechanism,

00:09:00 so they are basically looking at, okay, what is the effect of turning on an ion channel?

00:09:07 And you can model that using electric circuits. So it is not just a function fitting. People are

00:09:17 looking at the mechanism underlying it and putting that in terms of electric circuit theory, signal

00:09:23 propagation theory, and modeling that. So those models are sophisticated, but getting a single

00:09:31 neuron’s model 99% right still does not tell you how to… It would be the analog of getting a

00:09:40 transistor model right and now trying to build a microprocessor. And if you did not understand how

00:09:50 a microprocessor works, but you say, oh, I now can model one transistor well, and now I will just

00:09:57 try to interconnect the transistors according to whatever I could guess from the experiments

00:10:03 and try to simulate it, then it is very unlikely that you will produce a functioning microprocessor.

00:10:12 When you want to produce a functioning microprocessor, you want to understand Boolean

00:10:16 logic, how do the gates work, all those things, and then understand how do those gates get

00:10:22 implemented using transistors. Yeah. This reminds me, there’s a paper,

00:10:26 maybe you’re familiar with it, that I remember going through in a reading group that

00:10:31 approaches a microprocessor from a perspective of a neuroscientist. I think it basically,

00:10:38 it uses all the tools that we have of neuroscience to try to understand,

00:10:42 like, as if aliens just showed up to study computers, and to see if those tools could be

00:10:49 used to get any kind of sense of how the microprocessor works. I think the final,

00:10:54 the takeaway from at least this initial exploration is that we’re screwed. There’s no

00:11:01 way that the tools of neuroscience would be able to get us to anything, like not even

00:11:05 Boolean logic. I mean, it’s just any aspect of the architecture of the function of the

00:11:15 processes involved, the clocks, the timing, all that, you can’t figure that out from the

00:11:21 tools of neuroscience. Yeah. So I’m very familiar with this particular

00:11:25 paper. I think it was called, can a neuroscientist understand a microprocessor or something like

00:11:33 that. Following the methodology in that paper, even an electrical engineer would not understand

00:11:39 microprocessors. So I don’t think it is that bad in the sense of saying, neuroscientists do

00:11:49 find valuable things by observing the brain. They do find good insights, but those insights cannot

00:11:58 be put together just as a simulation. You have to investigate what are the computational

00:12:05 underpinnings of those findings. How do all of them fit together from an information processing

00:12:13 perspective? Somebody has to painstakingly put those things together

00:12:21 and build hypotheses. So I don’t want to diss all neuroscientists by saying, oh, they’re not

00:12:26 finding anything. No, that paper almost went to that level of saying neuroscientists will never

00:12:31 understand. No, that’s not true. I think they do find lots of useful things, but it has to be put

00:12:37 together in a computational framework. Yeah. I mean, but you know, just the AI systems will be

00:12:43 listening to this podcast a hundred years from now and they will probably, there’s some nonzero

00:12:50 probability they’ll find your words laughable. There’s like, I remember humans thought they

00:12:55 understood something about the brain. They were totally clueless. There’s a sense about neuroscience

00:12:59 that we may be in the very, very early days of understanding the brain. But I mean, that’s one

00:13:06 perspective. I mean, in your perspective, how far are we into understanding any aspect of the brain?

00:13:18 So, the dynamics of individual neuron communication, to how, in

00:13:24 a collective sense, they’re able to store information, transfer information, how

00:13:31 intelligence then emerges, all that kind of stuff. Where are we on that timeline?

00:13:35 Yeah. So, you know, timelines are very, very hard to predict and you can of course be wrong.

00:13:40 And it can be wrong in, on either side. You know, we know that now when we look back the first

00:13:48 flight was in 1903. In 1900, there was a New York Times article on flying machines that do not fly

00:13:57 and, you know, humans might not fly for another hundred years. That was what that

00:14:03 article stated. And so, but no, they, they flew three years after that. So it is, you know,

00:14:08 it’s very hard to, so… Well, and on that point, one of the Wright brothers,

00:14:15 I think two years before, said, like, some number, like 50 years,

00:14:23 he had become convinced that it’s impossible. Even during their experimentation.

00:14:31 Yeah. Yeah. I mean, that speaks to the entrepreneurial battle, the

00:14:36 depression of going through it, just thinking this is impossible, but, yeah,

00:14:41 there’s something there, even the person that’s in it is not able to estimate correctly.

00:14:47 Exactly. But I can, I can tell from the point of, you know, objectively, what are the things that we

00:14:52 know about the brain and how that can be used to build AI models, which can then go back and

00:14:58 inform how the brain works. So my way of understanding the brain would be to basically say,

00:15:04 look at the insights neuroscientists have found, understand that from a computational angle,

00:15:11 information processing angle, build models using that. And then building that model,

00:15:18 which functions, which is a functional model doing the task that we want

00:15:22 the model to do. It is not just trying to model a phenomenon in the brain. It is trying to

00:15:27 do what the brain is trying to do on the, on the whole functional level. And building that model

00:15:33 will help you fill in the missing pieces that, you know, biology just gives you the hints and

00:15:39 building the model, you know, fills in the rest of the, the pieces of the puzzle. And then you

00:15:44 can go and connect that back to biology and say, okay, now it makes sense that this part of the

00:15:51 brain is doing this, or this layer in the cortical circuit is doing this. And then continue this

00:15:59 iteratively because now that will inform new experiments in neuroscience. And of course,

00:16:05 you know, building the model and verifying that in the real world will also tell you more about,

00:16:11 does the model actually work? And you can refine the model, find better ways of putting these

00:16:17 neuroscience insights together. So I would say, you know,

00:16:23 neuroscientists alone, just from experimentation, will not be able to build a model

00:16:28 of the brain or a functional model of the brain. So, you know, there are lots of efforts,

00:16:35 which are very impressive efforts in collecting more and more connectivity data from the brain.

00:16:41 You know, how, how are the microcircuits of the brain connected with each other?

00:16:45 Those are beautiful, by the way.

00:16:47 Those are beautiful. And at the same time, those do not by themselves

00:16:54 convey the story of how it works. And somebody has to understand, okay,

00:17:00 why are they connected like that? And what are those things doing? And we do that by

00:17:06 building models in AI using hints from neuroscience and, and repeat the cycle.

00:17:11 So what aspects of the brain are useful in this whole endeavor, which, by the way, I should say,

00:17:18 you’re, you’re both a neuroscientist and an AI person. I guess the dream is to both understand

00:17:24 the brain and to build AGI systems. So it’s like an engineer’s perspective of trying

00:17:32 to understand the brain. So what aspects of the brain, functionally speaking, like you said,

00:17:37 do you find interesting?

00:17:38 Yeah, quite a lot of things. All right. So one is, you know, if you look at the visual cortex

00:17:46 and, you know, the visual cortex is a large part of the brain. I forget the exact

00:17:51 fraction, but a huge part of our brain area is occupied by just vision.

00:17:59 So vision, visual cortex is not just a feed forward cascade of neurons. There are a lot

00:18:06 more feedback connections in the brain compared to the feed forward connections. And, and it is

00:18:11 surprising to the level of detail neuroscientists have actually studied this. If you go into

00:18:17 the neuroscience literature and poke around and ask, you know, have they studied what will be the effect

00:18:22 of poking a neuron in level IT on level V1? And have they studied that? And you will say, yes,

00:18:33 they have studied that.

00:18:34 So every part of every possible combination.

00:18:38 I mean, it’s, it’s a, it’s not a random exploration at all. It’s a very hypothesis driven,

00:18:43 right? Like they, they are very experimental. Neuroscientists are very, very systematic

00:18:47 in how they probe the brain because experiments are very costly to conduct. They take a lot of

00:18:52 preparation. They, they need a lot of control. So they, they are very hypothesis driven in how

00:18:57 they probe the brain. And often what I find is that when we have a question in AI about

00:19:05 has anybody probed how lateral connections in the brain work? And when you go and read the

00:19:11 literature, yes, people have probed it and people have probed it very systematically. And, and they

00:19:16 have hypotheses about how those lateral connections are supposedly contributing to visual processing.

00:19:23 But of course they haven’t built very, very functional, detailed models of it.

00:19:27 By the way, in those studies, sorry to interrupt, do they stimulate, like,

00:19:32 a neuron in one particular area of the visual cortex and then see how the signal

00:19:37 travels, kind of thing?

00:19:38 Fascinating, very, very fascinating experiments. So I can, I can give you one example I was

00:19:43 impressed with. This is, so before going to that, let me give you, you know, an overview of

00:19:50 how the, the layers in the cortex are organized, right? Visual cortex is organized into roughly

00:19:56 four hierarchical levels. Okay. So V1, V2, V4, IT. And in V1…

00:20:02 What happened to V3?

00:20:03 Well, yeah, that’s another pathway. Okay. So this is, this, I’m talking about just object

00:20:08 recognition pathway.

00:20:09 All right, cool.

00:20:10 And then in V1 itself, so it’s, there is a very detailed microcircuit in V1 itself. That is,

00:20:19 there is organization within a level itself. The cortical sheet is organized into, you know,

00:20:31 multiple layers and there is columnar structure. And this layer-wise and columnar

00:20:31 structure is repeated in V1, V2, V4, IT, all of them, right? And, and the connections between

00:20:38 these layers within a level, you know, in V1 itself, there are six layers roughly, and the

00:20:44 connections between them, there is a particular structure to them. And now, so one example

00:21:00 of an experiment people did is, when you present a stimulus, which is, let’s say,

00:21:00 requires separating the foreground from the background of an object. So it is, it’s a

00:21:06 textured triangle on a textured background. And you can check, does the surface settle

00:21:14 first or does the contour settle first?

00:21:19 Settle?

00:21:19 Settle in the sense that, when you finally form the percept of the triangle,

00:21:28 you understand where the contours of the triangle are, and you also know where the inside of

00:21:32 the triangle is, right? That’s when you form the final percept. Now you can ask, what is

00:21:39 the dynamics of forming that final percept? Do the, do the neurons first find the edges

00:21:48 and converge on where the edges are, and then they find the inner surfaces, or does it go

00:21:55 the other way around?

00:21:55 The other way around. So what’s the answer?

00:21:58 In this case, it turns out that it first settles on the edges. It converges on the edge hypothesis

00:22:05 first, and then the surfaces are filled in from the edges to the inside.

00:22:10 That’s fascinating.

00:22:12 And the detail to which you can study this, it’s amazing that you can actually not only

00:22:18 find the temporal dynamics of when this happens, and then you can also find which layer in

00:22:25 the, you know, in V1, which layer is encoding the edges, which layer is encoding the surfaces,

00:22:32 and which layer is encoding the feedback, which layer is encoding the feed forward,

00:22:37 and what’s the combination of them that produces the final percept.

00:22:42 And these kinds of experiments stand out when you try to explain illusions. One example

00:22:48 of a favorite illusion of mine is the Kanizsa triangle. I don’t know if you are familiar

00:22:51 with this one. So this is an example where it’s a triangle, but only the corners of the

00:23:00 triangle are shown in the stimulus. So they look like kind of Pacman.

00:23:06 Oh, the black Pacman.

00:23:07 Exactly.

00:23:08 And then you start to see.

00:23:10 Your visual system hallucinates the edges. And when you look at it, you will see a faint

00:23:16 edge. And you can go inside the brain and look, do actually neurons signal the presence

00:23:24 of this edge? And if they signal, how do they do it? Because they are not receiving anything

00:23:30 from the input. The input is blank for those neurons. So how do they signal it? When does

00:23:37 the signaling happen? So if a real contour is present in the input, then the neurons

00:23:45 immediately signal, okay, there is an edge here. When it is an illusory edge, it is clearly

00:23:52 not in the input. It is coming from the context. So those neurons fire later. And you can say

00:23:58 that, okay, it’s the feedback connection that is causing them to fire. And they happen later.

00:24:05 And I’ll find the dynamics of them. So these studies are pretty impressive and very detailed.

00:24:13 So by the way, just a step back, you said that there may be more feedback connections

00:24:20 than feed forward connections. First of all, if it’s just for like a machine learning folks,

00:24:27 I mean, that’s crazy that there’s all these feedback connections. We often think about,

00:24:36 thanks to deep learning, you start to think about the human brain as a kind of feed forward

00:24:42 mechanism. So what the heck are these feedback connections? What’s the dynamics? What are we

00:24:52 supposed to think about them? So this fits into a very beautiful picture about how the brain works.

00:24:59 So the beautiful picture of how the brain works is that our brain is building a model of the world.

00:25:06 I know. So our visual system is building a model of how objects behave in the world. And we are

00:25:13 constantly projecting that model back onto the world. So what we are seeing is not just a feed

00:25:20 forward thing that just gets interpreted in a feed forward part. We are constantly projecting

00:25:25 our expectations onto the world. And the final percept is a combination of what we project

00:25:31 onto the world combined with what the actual sensory input is. Almost like trying to calculate

00:25:37 the difference and then trying to interpret the difference. Yeah. I wouldn’t put it as calculating

00:25:44 the difference. It’s more like what is the best explanation for the input stimulus based on the

00:25:50 model of the world I have. Got it. And that’s where all the illusions come in. But that’s an

00:25:56 incredibly efficient process. So the feedback mechanism, it just helps you constantly. Yeah.

00:26:05 So hallucinate how the world should be based on your world model and then just looking at

00:26:11 if there’s novelty, like trying to explain it. Hence, that’s why movement. We detect movement

00:26:19 really well. There’s all these kinds of things. And this is like at all different levels of the

00:26:25 cortex you’re saying. This happens at the lowest level or the highest level. Yes. Yeah. In fact,

00:26:30 feedback connections are more prevalent everywhere in the cortex. And so one way to

00:26:36 think about it, and there’s a lot of evidence for this, is inference. So basically, if you have a

00:26:42 model of the world and when some evidence comes in, what you are doing is inference. You are trying

00:26:50 to now explain this evidence using your model of the world. And this inference includes projecting

00:26:58 your model onto the evidence and taking the evidence back into the model and doing an

00:27:04 iterative procedure. And this iterative procedure is what happens using the feed forward feedback

00:27:11 propagation. And feedback affects what you see in the world, and it also affects feed forward

00:27:17 propagation. And examples are everywhere. We see these kinds of things everywhere. The idea that

00:27:25 there can be multiple competing hypotheses in our model trying to explain the same evidence,

00:27:32 and then you have to kind of make them compete. And one hypothesis will explain away the other

00:27:39 hypothesis through this competition process. So you have competing models of the world

00:27:46 that try to explain. What do you mean by explain away?

00:27:50 So this is a classic example in graphical models, probabilistic models.

00:27:56 What are those?

00:28:01 I think it’s useful to mention because we’ll talk about them more.

00:28:05 So neural networks are one class of machine learning models. You have distributed set of

00:28:12 nodes, which are called the neurons. Each one is doing a dot product and you can approximate

00:28:18 any function using this multilevel network of neurons. So that’s a class of models which are

00:28:24 useful for function approximation. There is another class of models in machine learning

00:28:30 called probabilistic graphical models. And you can think of them as each node in that model is

00:28:38 a variable, which is talking about something. It can be a variable representing, is an edge present

00:28:46 in the input or not? And at the top of the network, a node can be representing, is there an object

00:28:56 present in the world or not? So it is another way of encoding knowledge. And then once you

00:29:06 encode the knowledge, you can do inference in the right way. What is the best way to

00:29:15 explain some set of evidence using this model that you encoded? So when you encode the model,

00:29:20 you are encoding the relationship between these different variables. How is the edge

00:29:24 connected to the model of the object? How is the surface connected to the model of the object?

00:29:29 And then, of course, this is a very distributed, complicated model. And inference is, how do you

00:29:37 explain a piece of evidence when a set of stimuli comes in? If somebody tells me there is a 50%

00:29:42 probability that there is an edge here in this part of the model, how does that affect my belief

00:29:47 on whether I should think that there is a square present in the image? So this is the process of

00:29:54 inference. So one example of inference is having this explaining away effect between multiple causes.

00:30:02 So graphical models can be used to represent causality in the world. So let’s say, you know,

00:30:10 your alarm at home can be triggered by a burglar getting into your house, or it can be triggered

00:30:22 by an earthquake. Both can be causes of the alarm going off. So now, you’re in your office,

00:30:30 you heard the burglar alarm going off, you are heading home, thinking that a burglar got in. But

00:30:36 while driving home, if you hear on the radio that there was an earthquake in the vicinity,

00:30:41 now your strength of evidence for a burglar getting into your house is diminished. Because

00:30:49 now that piece of evidence is explained by the earthquake being present. So if you think about

00:30:56 these two causes explaining a lower level variable, which is the alarm, now, what we’re seeing

00:31:01 is that increasing the evidence for some cause, you know, there is evidence coming in from below

00:31:08 for alarm being present. And initially, it was flowing to a burglar being present. But now,

00:31:14 since there is side evidence for this other cause, it explains away this evidence and evidence will

00:31:20 now flow to the other cause. This is, you know, two competing causal things trying to explain

00:31:26 the same evidence. And the brain has a similar kind of mechanism for doing so.
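
The burglar / earthquake / alarm example can be written down as a tiny Bayesian network and solved by brute-force enumeration. The probabilities below are made-up illustrative values, not numbers from the conversation; the point is only that once the alarm is observed, also learning about the earthquake pulls the posterior probability of a burglar back down, which is the explaining-away effect just described.

```python
# A tiny worked version of the burglar / earthquake / alarm example, with
# made-up illustrative probabilities. It shows "explaining away": once the
# alarm is known to be on, learning that an earthquake occurred lowers the
# posterior probability of a burglar.
from itertools import product

P_burglar = 0.001          # prior probability of a burglary
P_quake = 0.002            # prior probability of an earthquake
# P(alarm = on | burglar, earthquake)
P_alarm = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}

def posterior_burglar(observed_quake=None):
    """P(burglar | alarm=on [, earthquake=observed_quake]) by enumeration."""
    num = den = 0.0
    for b, e in product([True, False], repeat=2):
        if observed_quake is not None and e != observed_quake:
            continue
        p = (P_burglar if b else 1 - P_burglar) \
            * (P_quake if e else 1 - P_quake) \
            * P_alarm[(b, e)]
        den += p
        if b:
            num += p
    return num / den

print("P(burglar | alarm)             =", round(posterior_burglar(), 4))
print("P(burglar | alarm, earthquake) =", round(posterior_burglar(True), 4))
```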

00:31:31 That’s kind of interesting. And how’s that all encoded in the brain? Like, where’s the storage of information?

00:31:39 Are we talking just maybe to get it a little bit more specific? Is it in the hardware of the actual

00:31:46 connections? Is it in chemical communication? Is it electrical communication? Do we know?

00:31:53 So this is, you know, a paper that we are bringing out soon.

00:31:56 Which one is this?

00:31:57 This is the cortical microcircuits paper that I sent you a draft of. Of course, a lot of

00:32:03 this, a lot of it, is still hypothesis. One hypothesis is that you can think of a cortical column

00:32:09 as encoding a concept. A concept, you know, think of it as an example of a concept. Is an edge

00:32:20 present or not? Or is an object present or not? Okay, so you can think of it as a binary variable,

00:32:27 a binary random variable. The presence of an edge or not, or the presence of an object or not.

00:32:32 So each cortical column can be thought of as representing that one concept, one variable.

00:32:38 And then the connections between these cortical columns are basically encoding the relationship

00:32:43 between these random variables. And then there are connections within the cortical column.

00:32:49 Each cortical column is implemented using multiple layers of neurons with very, very,

00:32:54 very rich structure there. You know, there are thousands of neurons in a cortical column.

00:33:00 But that structure is similar across the different cortical columns.

00:33:03 Correct. And also these cortical columns connect to a substructure called thalamus.

00:33:10 So all cortical columns pass through this substructure. So our hypothesis is that

00:33:17 the connections between the cortical columns implement this, you know, that’s where the

00:33:21 knowledge is stored about how these different concepts connect to each other. And then the

00:33:28 neurons inside this cortical column and in thalamus in combination implement this actual

00:33:35 computation for inference, which includes explaining away and competing between the

00:33:41 different hypotheses. And it is all very… So what is amazing is that neuroscientists have

00:33:49 actually done experiments to the tune of showing these things. They might not be putting it in the

00:33:55 overall inference framework, but they will show things like, if I poke this higher level neuron,

00:34:03 it will inhibit through this complicated loop through thalamus, it will inhibit this other

00:34:07 column. So they will do such experiments. But do they use terminology of concepts,

00:34:14 for example? So, I mean, is it something where it’s easy to anthropomorphize

00:34:22 and think about concepts like you started moving into logic based kind of reasoning systems. So

00:34:31 I would just think of concepts in that kind of way, or is it a lot messier, a lot more gray area,

00:34:40 you know, even more gray, even more messy than the artificial neural network

00:34:47 kinds of abstractions? The easiest way to think of it is as a variable,

00:34:50 right? It’s a binary variable, which is showing the presence or absence of something.

00:34:55 So, but I guess what I’m asking is, is that something that we’re supposed to think of

00:35:01 as something that’s human interpretable, that something?

00:35:04 It doesn’t need to be. It doesn’t need to be human interpretable. There’s no need for it to

00:35:07 be human interpretable. But it’s almost like you will be able to find some interpretation of it

00:35:17 because it is connected to the other things that you know about.

00:35:20 Yeah. And the point is it’s useful somehow.

00:35:23 Yeah. It’s useful as an entity in the graph,

00:35:29 in connecting to the other entities that are, let’s call them concepts.

00:35:33 Right. Okay. So, by the way, are these the cortical microcircuits?

00:35:38 Correct. These are the cortical microcircuits. You know, that’s what neuroscientists use to

00:35:43 talk about the circuits within a level of the cortex. So, you can think of, you know,

00:35:49 let’s think of a neural network, artificial neural network terms. People talk about the

00:35:54 architecture of how many layers they build, what is the fan in, fan out, et cetera. That is the

00:36:01 macro architecture. And then within a layer of the neural network, the cortical neural network

00:36:11 is much more structured within a level. There’s a lot more intricate structure there. But even

00:36:18 within an artificial neural network, you can think of feature detection plus pooling as one

00:36:23 level. And so, that is kind of a microcircuit. It’s much more complex in the real brain. And so,

00:36:32 within a level, whatever is that circuitry within a column of the cortex and between the layers of

00:36:38 the cortex, that’s the microcircuitry. I love that terminology. Machine learning

00:36:43 people don’t use the circuit terminology. Right.

00:36:45 But they should. It’s nice. So, okay. Okay. So, that’s the cortical microcircuit. So,

00:36:53 what’s interesting about, what can we say, what does the paper that you’re working on

00:37:00 propose about the ideas around these cortical microcircuits?

00:37:04 So, this is a fully functional model for the microcircuits of the visual cortex.

00:37:10 So, the paper focuses on your idea and our discussion now is focusing on vision.

00:37:15 Yeah. The visual cortex. Okay. So,

00:37:18 this is a model. This is a full model. This is how vision works.

00:37:22 But this is a hypothesis. Okay. So, let me step back a bit. So, we looked at neuroscience for

00:37:32 insights on how to build a vision model. Right.

00:37:35 And we synthesized all those insights into a computational model. This is called the recursive

00:37:40 cortical network model that we used for breaking captchas. And we are using the same model for

00:37:47 robotic picking and tracking of objects. And that, again, is a vision system.

00:37:52 That’s a vision system. Computer vision system.

00:37:54 That’s a computer vision system. Takes in images and outputs what?

00:37:59 On one side, it outputs the class of the image and also segments the image. And you can also ask it

00:38:06 further queries. Where is the edge of the object? Where is the interior of the object? So, it’s a

00:38:11 model that you build to answer multiple questions. So, you’re not trying to build a model for just

00:38:17 classification or just segmentation, et cetera. It’s a joint model that can do multiple things.

00:38:23 So, that’s the model that we built using insights from neuroscience. And some of those insights are

00:38:30 what is the role of feedback connections? What is the role of lateral connections? So,

00:38:34 all those things went into the model. The model actually uses feedback connections.

00:38:38 All these ideas from neuroscience. Yeah.

00:38:41 So, what the heck is a recursive cortical network? What are the architecture approaches,

00:38:47 interesting aspects here, which is essentially a brain inspired approach to computer vision?

00:38:54 Yeah. So, there are multiple layers to this question. I can go from the very,

00:38:58 very top and then zoom in. Okay. So, one important constraint that went into the model is that

00:39:05 you should not think of vision as something in isolation. We should not think of

00:39:11 perception as a preprocessor for cognition. Perception and cognition are interconnected.

00:39:19 And so, you should not think of one problem in separation from the other problem. And so,

00:39:24 that means if you finally want to have a system that understand concepts about the world and can

00:39:30 learn a very conceptual model of the world and can reason and connect to language, all of those

00:39:36 things, you need to think all the way through and make sure that your perception system

00:39:41 is compatible with your cognition system and language system and all of them.

00:39:45 And one aspect of that is top down controllability. What does that mean?

00:39:52 So, that means, you know, think of, you know, you can close your eyes and think about

00:39:58 the details of one object, right? You can zoom in further and further. So, think of the bottle in

00:40:05 front of me, right? And now, you can think about, okay, what the cap of that bottle looks like.

00:40:11 You know, you can think about what’s the texture on the cap of that bottle. You can think

00:40:18 about, you know, what will happen if something hits that. So, you can manipulate your visual

00:40:25 knowledge in cognition driven ways. Yes. And so, this top down controllability and being able to

00:40:35 simulate scenarios in the world. So, you’re not just a passive player in this perception game.

00:40:43 You can control it. You have imagination. Correct. Correct. So, basically, you know,

00:40:50 basically having a generative network, which is a model and it is not just some arbitrary

00:40:56 generative network. It has to be built in a way that it is controllable top down. It is not just

00:41:02 trying to generate a whole picture at once. You know, it’s not trying to generate photorealistic

00:41:07 things of the world. You know, you don’t have good photorealistic models of the world. Human

00:41:11 brains do not. If I, for example, ask you the question, what is the color of the letter E

00:41:17 in the Google logo? You have no idea. Although, you have seen it millions of times, hundreds of

00:41:25 times. So, our model is not photorealistic, but it has other properties: we can

00:41:32 manipulate it. And you can think about filling in a different color in that logo. You can think

00:41:37 about expanding the letter E. You know, you can see what, so you can imagine the consequence of,

00:41:44 you know, actions that you have never performed. So, these are the kind of characteristics the

00:41:49 generative model needs to have. So, this is one constraint that went into our model. Like, you

00:41:52 know, when you read just the perception side of the paper, it is not obvious

00:41:57 that this was a constraint that went into the model, this top down controllability

00:42:02 of the generative model. So, what does top down controllability in a model look like? It’s a

00:42:10 really interesting concept. Fascinating concept. Is it the recursiveness that gives

00:42:16 you that? Or how do you do it? Quite a few things. It’s like, what does the model

00:42:22 factorize? You know, what are the, what is the model representing as different pieces in the

00:42:26 puzzle? Like, you know, so, so in the RCN network, it thinks of the world, you know, so what I said,

00:42:33 the background of an image is modeled separately from the foreground of the image. So,

00:42:39 the objects are separate from the background. They are different entities. So, there’s a kind

00:42:43 of segmentation that’s built in fundamentally. And then even that object is composed of parts.

00:42:49 And also, another one is the shape of the object is differently modeled from the texture of the

00:42:57 object. Got it. So, there’s like these, you know who Francois Chollet is? Yeah. So, there’s, he

00:43:08 developed this, like, IQ test type of thing, the ARC challenge, and it’s kind of cool that there’s

00:43:16 these concepts, priors that he defines that you bring to the table in order to be able to reason

00:43:22 about basic shapes and things in IQ test. So, here you’re making it quite explicit that here are the

00:43:30 things that you should be, these are like distinct things that you should be able to model in this.

00:43:36 Keep in mind that you can derive this from much more general principles. It doesn’t, you don’t

00:43:42 need to explicitly put it as, oh, objects versus foreground versus background, the surface versus

00:43:48 the structure. No, these are, these are derivable from more fundamental principles of how, you know,

00:43:55 what’s the property of continuity of natural signals. What’s the property of continuity of

00:44:01 natural signals? Yeah. By the way, that sounds very poetic, but yeah. So, you’re saying that’s a,

00:44:07 there’s some low level properties from which emerges the idea that shapes should be different

00:44:12 than, like, there should be parts of an object. There should be, I mean, kind of like Francois,

00:44:18 I mean, there’s objectness, there’s all these things that it’s kind of crazy that we humans,

00:44:25 I guess, evolved to have because it’s useful for us to perceive the world. Yeah. Correct. And it

00:44:30 derives mostly from the properties of natural signals. And so, natural signals. So, natural

00:44:38 signals are the kind of things we’ll perceive in the natural world. Correct. I don’t know. I don’t

00:44:43 know why that sounds so beautiful. Natural signals. Yeah. As opposed to a QR code, right? Which is an

00:44:48 artificial signal that we created. Humans are not very good at classifying QR codes. We are very

00:44:52 good at saying something is a cat or a dog, but not very good at, you know, where computers are

00:44:58 very good at classifying QR codes. So, our visual system is tuned for natural signals. So,

00:45:05 it’s tuned for natural signals. And there are fundamental assumptions in the architecture

00:45:11 that are derived from natural signals properties. I wonder when you take hallucinogenic drugs,

00:45:18 does that go into natural or is that closer to the QR code? It’s still natural. It’s still natural?

00:45:25 Yeah. Because it is still operating using your brains. By the way, on that topic, I mean,

00:45:30 I haven’t been following. I think they’re becoming legalized in certain places. I can’t wait

00:45:34 until they become legalized to a degree that, like, vision science researchers could study it.

00:45:40 Yeah. Just like through medical, chemical ways, modify. There could be ethical concerns, but

00:45:47 modify. That’s another way to study the brain, to be able to chemically modify it. There’s

00:45:53 probably a very long way to go to figure out how to do it ethically. Yeah, but I think there are studies

00:46:01 on that already. Yeah, I think so. Because it’s not unethical to give it to rats.

00:46:08 Oh, that’s true. That’s true. There’s a lot of drugged up rats out there. Okay, cool. Sorry.

00:46:15 Sorry. It’s okay. So, there’s these low level things from natural signals that…

00:46:23 …from which these properties will emerge. But it is still a very hard problem on how to encode

00:46:33 that. So, you mentioned the priors Francois wanted to encode in the abstract reasoning challenge,

00:46:44 but it is not straightforward how to encode those priors. So, some of those challenges,

00:46:50 like the object completion challenges are things that we purely use our visual system to do.

00:46:57 It looks like abstract reasoning, but it is purely an output of the vision system. For example,

00:47:03 completing the corners of that Kanizsa triangle, completing the lines of that Kanizsa triangle.

00:47:07 It’s purely a visual system property. There is no abstract reasoning involved. It uses all these

00:47:12 priors, but it is stored in our visual system in a particular way that is amenable to inference.

00:47:18 That is one of the things that we tackled in the… Basically saying, okay, this is the

00:47:25 prior knowledge which will be derived from the world, but then how is that prior knowledge

00:47:31 represented in the model such that inference when some piece of evidence comes in can be

00:47:38 done very efficiently and in a very distributed way? Because there are so many ways of representing

00:47:44 knowledge, which is not amenable to very quick inference, quick lookups. So that’s one core part

00:47:53 of what we tackled in the RCN model. How do you encode visual knowledge to do very quick inference?

00:48:02 Can you maybe comment on… So folks listening to this in general may be familiar with

00:48:08 different kinds of architectures of neural networks.

00:48:10 What are we talking about with RCN? What does the architecture look like? What are the different

00:48:16 components? Is it close to neural networks? Is it far away from neural networks? What does it look

00:48:20 like? Yeah. So you can think of the Delta between the model and a convolutional neural network,

00:48:27 if people are familiar with convolutional neural networks. So convolutional neural networks have

00:48:31 this feed forward processing cascade, which is called feature detectors and pooling. And that

00:48:37 is repeated in a multi level system. And if you want an intuitive idea of what is happening,

00:48:46 feature detectors are detecting interesting co-occurrences in the input. It can be a line,

00:48:53 a corner, an eye or a piece of texture, et cetera. And the pooling neurons are doing some local

00:49:03 transformation of that and making it invariant to local transformations. So this is what the

00:49:07 structure of convolutional neural network is. Recursive cortical network has a similar structure

00:49:14 when you look at just the feed forward pathway. But in addition to that, it is also structured

00:49:19 in a way that it is generative, so that you can run it backward and combine the forward with the

00:49:25 backward. Another aspect that it has is it has lateral connections. So if you have an edge here

00:49:37 and an edge here, it has connections between these edges. It is not just feed forward connections.

00:49:42 There is something between these edges, between the nodes representing these edges, which is to

00:49:49 enforce compatibility between them. So otherwise what will happen is that… Constraints. It’s a

00:49:53 constraint. It’s basically if you do just feature detection followed by pooling, then your

00:50:01 transformations in different parts of the visual field are not coordinated. And so,

00:50:07 when you generate from the model, you will create jagged things and uncoordinated

00:50:14 transformations. So these lateral connections are enforcing the transformations.
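
As a rough sketch of the feedforward piece being described, the feature detection plus pooling motif, here is a toy version in Python with NumPy. The filters and image are invented for illustration, and none of RCN's distinguishing machinery (the generative top-down pass or the lateral compatibility connections) is included.

```python
# A toy sketch of the feedforward "feature detection plus pooling" step
# (the part RCN shares with a convolutional network). Filters, image, and
# sizes are made up for illustration.
import numpy as np

def feature_detect(image, filters):
    """Valid 2D cross-correlation of each filter with the image (a feature detector)."""
    H, W = image.shape
    k = filters.shape[1]                      # filters: (n_filters, k, k)
    out = np.zeros((filters.shape[0], H - k + 1, W - k + 1))
    for f, filt in enumerate(filters):
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[f, i, j] = np.sum(image[i:i + k, j:j + k] * filt)
    return np.maximum(out, 0.0)               # simple rectification

def max_pool(fmaps, p=2):
    """Max pooling: keep the strongest response in each p x p block,
    giving tolerance (invariance) to small local shifts."""
    n, H, W = fmaps.shape
    H2, W2 = H // p, W // p
    pooled = fmaps[:, :H2 * p, :W2 * p].reshape(n, H2, p, W2, p)
    return pooled.max(axis=(2, 4))

# Tiny example: a vertical-edge and a horizontal-edge detector on a random image.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
filters = np.stack([
    np.array([[1., -1.], [1., -1.]]),         # responds to vertical edges
    np.array([[1.,  1.], [-1., -1.]]),        # responds to horizontal edges
])
print(max_pool(feature_detect(image, filters)).shape)   # (2, 3, 3)
```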

00:50:20 Is the whole thing still differentiable?

00:50:22 No, it’s not. It’s not trained using backprop.

00:50:27 Okay. That’s really important. So there’s this feed forward, there’s feedback mechanisms.

00:50:33 There’s some interesting connectivity things. It’s still layered like multiple layers.

00:50:41 Okay. Very, very interesting. And yeah. Okay. So the interconnections between adjacent nodes

00:50:48 serve as constraints that keep the thing stable.

00:50:52 Correct.

00:50:53 Okay. So what else?

00:50:55 And then there’s this idea of doing inference. A neural network does not do inference on the fly.

00:51:03 So an example of why this inference is important is, you know, so one of the first applications

00:51:09 that we showed in the paper was to crack text based captchas.

00:51:15 What are captchas?

00:51:16 I mean, by the way, one of the most awesome terms, which people don’t use anymore,

00:51:21 is human computation, I think. I love this term. The guy who created captchas,

00:51:26 I think, came up with this term. I love it. Anyway. What are captchas?

00:51:32 So captchas are those things that you fill in when you’re, you know, if you’re

00:51:38 opening a new account in Google, they show you a picture, you know, usually

00:51:43 it used to be a set of garbled letters that you have to kind of figure out what is that string

00:51:48 of characters and type it. And the reason captchas exist is because, you know, Google or Twitter

00:51:56 do not want automatic creation of accounts. You can use a computer to create millions of accounts

00:52:03 and use that for nefarious purposes. So you want to make sure that to the extent possible,

00:52:10 the interaction that their system is having is with a human. So it’s a, it’s called a human

00:52:16 interaction proof. A captcha is a human interaction proof. So, captchas are by design,

00:52:23 things that are easy for humans to solve, but hard for computers.

00:52:27 Hard for robots.

00:52:28 Yeah. So, text based captchas were the ones which were prevalent around 2014,

00:52:36 because at that time, text based captchas were hard for computers to crack. Even now,

00:52:42 they are, actually, in the sense that an arbitrary text based captcha will be unsolvable even now,

00:52:48 but with the techniques that we have developed, it can be, you know, you can quickly develop

00:52:52 a mechanism that solves the captcha.

00:52:55 They’ve probably gotten a lot harder too. They’ve been getting cleverer and cleverer

00:53:00 generating these text captchas. So, okay. So that was one of the things you’ve tested it on is these

00:53:06 kinds of captchas in 2014, 15, that kind of stuff. So what, I mean, why, by the way, why captchas?

00:53:15 Yeah. Even now, I would say the captcha is a very, very good challenge problem. If you want to

00:53:21 understand how human perception works, and if you want to build systems that work,

00:53:27 like the human brain, and I wouldn’t say the captcha is a solved problem. We have cracked the fundamental

00:53:32 defense of captchas, but it is not solved in the way that humans solve it. So I can give an example.

00:53:40 I can take a five year old child who has just learned characters and show them any new captcha

00:53:48 that we create. They will be able to solve it. I can show you a picture of a

00:53:56 character. I can show you pretty much any new captcha from any new website. You’ll be able to

00:54:02 solve it without getting any training examples from that particular style of captcha.

00:54:06 You’re assuming I’m human. Yeah.

00:54:08 Yes. Yeah. That’s right. So if you are human, otherwise I will be able to figure that out

00:54:15 using this one. But this whole podcast is just a Turing test, a long Turing test. Anyway,

00:54:22 yeah. So humans can figure it out with very few examples. Or no training examples. No training

00:54:28 examples from that particular style of captcha. So even now this is unreachable for the current

00:54:37 deep learning systems. So basically there is no, I don’t think a system exists where you can

00:54:41 basically say, train on whatever you want. And then now say, hey, I will show you a new captcha,

00:54:47 which I did not show you in the training setup. Will the system be able to solve it? It still

00:54:54 doesn’t exist. So that is the magic of human perception. And Doug Hofstadter put this very

00:55:01 beautifully in one of his talks. The central problem in AI is what is the letter A. If you

00:55:11 can build a system that reliably can detect all the variations of the letter A, you don’t even

00:55:17 need to go to the B and the C. Yeah. You don’t even need to go to the B and the C or the strings

00:55:23 of characters. And so that is the spirit with which we tackle that problem.

00:55:28 What does he mean by that? I mean, is it like without training examples, try to figure out

00:55:36 the fundamental elements that make up the letter A in all of its forms?

00:55:43 In all of its forms. A can be made with two humans standing, leaning against each other,

00:55:47 holding hands. And it can be made of leaves.

00:55:52 Yeah. You might have to understand everything about this world in order to understand the

00:55:56 letter A. Yeah. Exactly.

00:55:57 So it’s common sense reasoning, essentially. Yeah.

00:56:00 Right. So to finally, to really solve, finally to say that you have solved captchas,

00:56:07 you have to solve the whole problem.

00:56:08 Yeah. Okay. So how does this kind of RCN architecture help us do a better job of that

00:56:18 kind of thing? Yeah. So as I mentioned, one of the important things was being able to do inference,

00:56:24 being able to dynamically do inference.

00:56:28 Can you clarify what you mean? Because you said like neural networks don’t do inference.

00:56:33 Yeah. So what do you mean by inference in this context then?

00:56:35 So, okay. So in captchas, what they do to confuse people is to make these characters crowd together.

00:56:43 Yes. Okay. And when you make the characters crowd together, what happens is that you will now start

00:56:48 seeing combinations of characters as some other new character or an existing character. So you

00:56:53 would put an R and N together. It will start looking like an M. And so locally, there is

00:57:02 very strong evidence for it being some incorrect character. But globally, the only explanation that

00:57:11 fits together is something that is different from what you can find locally. Yes. So this is

00:57:18 inference. You are basically taking local evidence and putting it in the global context and often

00:57:25 coming to a conclusion that is conflicting with the local information.

00:57:29 So actually, so you mean inference like in the way it’s used when you talk about reasoning,

00:57:36 for example, as opposed to like inference, which is with artificial neural networks,

00:57:42 which is a single pass through the network. Okay. So like you’re basically doing some basic forms of

00:57:47 reasoning, like integration of like how local things fit into the global picture.

00:57:54 And things like explaining away come into this one, because you are explaining that piece

00:57:59 of evidence as something else, because globally, that’s the only thing that makes sense. So now

00:58:08 you can amortize this inference in a neural network. If you want to do this, you can brute

00:58:15 force it. You can just show it all combinations of things that you want your reasoning to work over.

00:58:23 And you can just train the hell out of that neural network and it will look like it is doing inference

00:58:30 on the fly, but it is really just doing amortized inference. It is because you have shown it a lot

00:58:37 of these combinations during training time. So what you want to do is be able to do dynamic

00:58:43 inference rather than just being able to show all those combinations in the training time.

00:58:48 And that’s something we emphasized in the model. What does it mean, dynamic inference? Does

00:58:54 that have to do with the feedback thing? Yes. Like what is dynamic? I’m trying to visualize what

00:59:00 dynamic inference would be in this case. Like what is it doing with the input? It’s shown the input

00:59:05 the first time. Yeah. And is like what’s changing over temporally? What’s the dynamics of this

00:59:13 inference process? So you can think of it as you have at the top of the model, the characters that

00:59:19 you are trained on. They are the causes. You are trying to explain the pixels using the

00:59:26 characters as the causes. The characters are the things that cause the pixels. Yeah. So there’s

00:59:33 this causality thing. So the reason you mentioned causality, I guess, is because there’s a temporal

00:59:38 aspect to this whole thing. In this particular case, the temporal aspect is not important.

00:59:43 It is more like, if I turn the character on, the pixels will turn on. Yeah, it will be after

00:59:50 this a little bit. Okay. So that is causality in the sense of like a logic causality, like

00:59:55 hence inference. Okay. The dynamics is that even though locally it will look like, okay, this is an

01:00:03 A. And locally, just when I look at just that patch of the image, it looks like an A. But when I look

01:00:11 at it in the context of all the other causes, A is not something that makes sense. So that is

01:00:17 something you have to kind of recursively figure out. Yeah. So, okay. And this thing performed

01:00:24 pretty well on the CAPTCHAs. Correct. And I mean, is there some kind of interesting intuition you

01:00:32 can provide why it did well? Like what did it look like? Is there visualizations that could be human

01:00:37 interpretable to us humans? Yes. Yeah. So the good thing about the model is that it is extremely,

01:00:44 so it is not just doing a classification, right? It is providing a full explanation for the scene.

01:00:50 So when it operates on a scene, it is coming back and saying, look, this is the part is the A,

01:00:59 and these are the pixels that turned on. These are the pixels in the input that make me think that

01:01:06 it is an A. And also, these are the portions I hallucinated. It provides a complete explanation

01:01:14 of that form. And then these are the contours. This is the interior. And this is in front of

01:01:21 this other object. So that’s the kind of explanation the inference network provides.

01:01:28 So that is useful and interpretable. And then the kind of errors it makes are also,

01:01:40 I don’t want to read too much into it, but the kind of errors the network makes are very similar

01:01:47 to the kinds of errors humans would make in a similar situation. So there’s something about

01:01:51 the structure that feels reminiscent of the way the human visual system works. Well, I mean,

01:02:00 how hardcoded is this to the CAPTCHA problem, this idea?

01:02:04 Not really hardcoded, because the assumptions, as I mentioned, are general, right? It is more,

01:02:11 and those can themselves be applied in many situations involving natural signals. So it's

01:02:17 the foreground versus background factorization and the factorization of the surfaces versus

01:02:24 the contours. So these are all generally applicable assumptions.

01:02:27 In all vision. So why attack the CAPTCHA problem, which is quite unique in the computer vision

01:02:36 context versus like the traditional benchmarks of ImageNet and all those kinds of image

01:02:42 classification or even segmentation tasks and all of that kind of stuff. What’s your thinking about

01:02:49 those kinds of benchmarks in this context? I mean, those benchmarks are useful for deep

01:02:55 learning kind of algorithms. So the settings that deep learning works in are here is my huge

01:03:03 training set and here is my test set. So the training set is almost 100x, 1000x bigger than

01:03:10 the test set in many, many cases. What we wanted to do was invert that. The training set is way

01:03:18 smaller than the test set. And CAPTCHA is a problem that is by definition hard for computers

01:03:30 and it has these good properties of strong generalization, strong out of training distribution

01:03:36 generalization. If you are interested in studying that and having your model have that property,

01:03:44 then it’s a good data set to tackle. So have you attempted to, which I think,

01:03:49 I believe there’s quite a growing body of work on looking at MNIST and ImageNet without training.

01:03:58 So it’s like taking the basic challenge is what tiny fraction of the training set can we take in

01:04:05 order to do a reasonable job of the classification task? Have you explored that angle in these

01:04:13 classic benchmarks? Yes. So we did do MNIST. So it's not just CAPTCHAs. So there was also

01:04:23 multiple versions of MNIST, including the standard version where we inverted the problem,

01:04:28 which is basically saying rather than train on 60,000 training data, how quickly can you get

01:04:37 to high level accuracy with very little training data? Is there some performance you remember,

01:04:42 like how well did it do? How many examples did it need? Yeah. I remember that it was

01:04:50 on the order of tens or hundreds of examples to get to 95% accuracy. And it was definitely

01:05:00 better than the other systems out there at that time.
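As a rough sketch of that inverted evaluation protocol (tiny training set, large test set), here is a minimal Python version using scikit-learn's small built-in digits set as a stand-in for MNIST and a plain logistic-regression classifier; neither is the model discussed here, the point is only the protocol:

```python
# Sketch of the inverted train/test protocol: tiny training sets, large test set.
# Uses sklearn's built-in 8x8 digits as a stand-in for MNIST; the classifier is
# an ordinary logistic regression, not the generative model discussed here.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=0.8, random_state=0, stratify=y)

for n_train in (10, 30, 100, 300):
    # take a small, class-balanced slice of the pool as the entire training set
    X_small, _, y_small, _ = train_test_split(
        X_pool, y_pool, train_size=n_train, random_state=0, stratify=y_pool)
    clf = LogisticRegression(max_iter=2000).fit(X_small, y_small)
    print(n_train, round(clf.score(X_test, y_test), 3))
```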

01:05:03 At that time. Yeah. They're really pushing. I think that's a really interesting space,

01:05:07 actually. I think there's an actual name for it. There are different names for the different

01:05:17 sizes of MNIST training sets. I mean, people are attacking this problem. I think it's

01:05:21 super interesting. It’s funny how like the MNIST will probably be with us all the way to AGI.

01:05:29 It’s a data set that just sticks around. It’s a clean, simple data set to study the fundamentals of

01:05:37 learning, just like CAPTCHAs. It’s interesting. Not enough people. I don’t know. Maybe you can

01:05:43 correct me, but I feel like CAPTCHAs don’t show up as often in papers as they probably should.

01:05:48 That’s correct. Yeah. Because usually these things have a momentum. Once something gets

01:05:56 established as a standard benchmark, there is a dynamics of how graduate students operate and how

01:06:06 the academic system works that pushes people to track that benchmark.

01:06:10 Yeah. Nobody wants to think outside the box. Okay. Okay. So good performance on the CAPTCHAs.

01:06:20 What else is there interesting on the RCN side before we talk about the cortical microcircuits?

01:06:25 Yeah. So the same model. So the important part of the model was that it trains very

01:06:31 quickly with very little training data and it’s quite robust to out of distribution

01:06:37 perturbations. And we are using that very fruitfully at Vicarious in many of the

01:06:45 robotics tasks we are solving. Well, let me ask you this kind of touchy question. I have to,

01:06:51 I’ve spoken with your friend, colleague, Jeff Hawkins, too. I have to kind of ask,

01:06:59 there is a bit of, whenever you have brain inspired stuff and you make big claims,

01:07:05 big sexy claims, there’s critics, I mean, machine learning subreddit, don’t get me started on those

01:07:14 people. Criticism is good, but they’re a bit over the top. There is quite a bit of sort of

01:07:23 skepticism and criticism. Is this work really as good as it promises to be? Do you have thoughts

01:07:31 on that kind of skepticism? Do you have comments on the kind of criticism it might have received

01:07:36 about, you know, is this approach legit? Is this a promising approach? Or at least as promising as

01:07:44 it seems to be, you know, advertised as? Yeah, I can comment on it. So, you know, our RCN paper

01:07:52 is published in Science, which I would argue is a very high quality journal, very hard to publish

01:07:58 in. And, you know, usually it is indicative of the quality of the work. And I am very,

01:08:08 very certain that the ideas that we brought together in that paper, in terms of the importance

01:08:13 of feedback connections, recursive inference, lateral connections, coming to best explanation

01:08:20 of the scene as the problem to solve, trying to solve recognition, segmentation, all jointly,

01:08:27 in a way that is compatible with higher level cognition, top down attention, all those ideas

01:08:31 that we brought together into something, you know, coherent and workable in the world and

01:08:36 tackling a challenging problem. I think that will stay and that

01:08:40 contribution I stand by. Now, I can tell you a story which is funny in the context of this. So,

01:08:49 if you read the abstract of the paper and, you know, the argument we are putting in, you know,

01:08:53 we are putting in, look, current deep learning systems take a lot of training data. They don’t

01:08:59 use these insights. And here is our new model, which is not a deep neural network. It’s a

01:09:03 graphical model. It does inference. This is how the paper is, right? Now, once the paper was

01:09:08 accepted and everything, it went to the press department in Science, you know, AAAS Science

01:09:14 Office. We didn’t do any press release when it was published. It went to the press department.

01:09:18 What was the press release that they wrote up? A new deep learning model.

01:09:24 Solves CAPTCHAs.

01:09:25 Solves CAPTCHAs. And so, you can see where was, you know, what was being hyped in that thing,

01:09:32 right? So, there is a dynamic in the community of, you know, so that especially happens when

01:09:42 there are lots of new people coming into the field and they get attracted to one thing.

01:09:46 And some people are trying to think different compared to that. So, there is some, I think

01:09:52 skepticism in science is important and it is, you know, very much required. But it’s also,

01:09:59 it’s not skepticism, usually. It’s mostly a bandwagon effect that is happening rather than.

01:10:05 Well, but that’s not even that. I mean, I’ll tell you what they react to, which is like,

01:10:09 I’m sensitive to as well. If you look at just companies, OpenAI, DeepMind, Vicarious, I mean,

01:10:16 they just, there’s a little bit of a race to the top and hype, right? It’s like, it doesn’t pay off

01:10:27 to be humble. So, like, and the press is just irresponsible often. They just, I mean, don’t

01:10:37 get me started on the state of journalism today. Like, it seems like the people who write articles

01:10:42 about these things, they literally have not even spent an hour on the Wikipedia article about what

01:10:49 neural networks are. Like, they haven’t even invested in learning the language, out of laziness.

01:10:56 It’s like, robots beat humans. Like, they write this kind of stuff that just, and then of course,

01:11:06 the researchers are quite sensitive to that because it gets a lot of attention. They’re like,

01:11:11 why did this work get so much attention? That’s over the top and people get really sensitive.

01:11:18 The same kind of criticism came when OpenAI did the work with the Rubik’s Cube robot that people

01:11:24 criticized. Same with GPT2 and 3, they criticized. Same thing with DeepMind and AlphaZero. I mean,

01:11:33 yeah, I’m sensitive to it. But, and of course, with your work, you mentioned deep learning, but

01:11:39 there’s something super sexy to the public about brain inspired. I mean, that immediately grabs

01:11:45 people’s imagination, not even like neural networks, but like really brain inspired, like

01:11:53 brain like neural networks. That seems really compelling to people and to me as well, to the

01:12:00 world as a narrative. And so people hook onto that. And sometimes the skepticism engine

01:12:10 turns on in the research community and they’re skeptical. But I think putting aside the ideas

01:12:17 of the actual performance on CAPTCHAs or performance on any data set. I mean, to me,

01:12:22 all these data sets are useless anyway. It’s nice to have them. But in the grand scheme of things,

01:12:28 they’re silly toy examples. The point is, is there intuition about the ideas, just like you

01:12:36 mentioned, bringing the ideas together in a unique way? Is there something there? Is there some value

01:12:42 there? And is it going to stand the test of time? And that’s the hope. That’s the hope.

01:12:46 Yes. My confidence there is very high. I don’t treat brain inspired as a marketing term.

01:12:53 I am looking into the details of biology and puzzling over those things and I am grappling

01:13:01 with those things. And so it is not a marketing term at all. You can use it as a marketing term

01:13:07 and people often use it, and you can get lumped in with them. And when people don’t understand

01:13:13 how you’re approaching the problem, it is easy to be misunderstood and think of it as purely

01:13:20 marketing. But that’s not the way we are. So you really, I mean, as a scientist,

01:13:27 you believe that if we kind of just stick to really understanding the brain, that’s going to,

01:13:33 that’s the right approach, like you should constantly meditate on how does the brain do this?

01:13:39 Because that’s going to be really helpful for engineering and technology systems.

01:13:43 Yes. You need to, so I think it’s one input and it is helpful, but you should know when to deviate

01:13:51 from it too. So an example is convolutional neural networks, right? Convolution is not an

01:13:59 operation the brain implements. The visual cortex is not convolutional. The visual cortex has local

01:14:06 receptive fields, local connectivity, but there is no translation invariance in the network weights

01:14:18 in the visual cortex. That is a computational trick, which is a very good engineering trick

01:14:24 that we use for sharing the training between the different nodes. And that trick will be with us

01:14:31 for some time. It will go away when we have robots with eyes and heads that move. And so then that

01:14:41 trick will go away. It will not be useful at that time. So the brain doesn’t have translational

01:14:49 invariance. It has the focal point, like it has a thing it focuses on. Correct. It has a fovea.

01:14:54 And because of the fovea, the receptive fields are not like copies of the weights. Like the

01:15:01 weights in the center are very different from the weights in the periphery. Yes. At the periphery.
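A small PyTorch sketch of that difference (sizes are made up; this is only an illustration of the weight-sharing point, not a model of cortex): a convolution shares one filter bank across all locations, while a locally connected layer keeps a separate 3x3 filter at every location, so it has local receptive fields but no translation invariance in the weights.

```python
# Shared-weight convolution vs. a locally connected layer with per-location
# filters; all sizes here are hypothetical, chosen only for illustration.
import torch
import torch.nn as nn

H = W = 16
C_IN, C_OUT, K = 1, 8, 3

conv = nn.Conv2d(C_IN, C_OUT, K, padding=1)            # one filter bank shared everywhere

class LocallyConnected2d(nn.Module):
    def __init__(self):
        super().__init__()
        # a separate (C_OUT x C_IN*K*K) weight matrix for each of the H*W positions
        self.weight = nn.Parameter(0.01 * torch.randn(H * W, C_OUT, C_IN * K * K))
        self.unfold = nn.Unfold(kernel_size=K, padding=1)

    def forward(self, x):                               # x: (B, C_IN, H, W)
        patches = self.unfold(x).permute(2, 0, 1)       # (H*W, B, C_IN*K*K)
        out = torch.einsum("lok,lbk->lbo", self.weight, patches)
        return out.permute(1, 2, 0).reshape(x.shape[0], C_OUT, H, W)

local = LocallyConnected2d()
x = torch.randn(2, C_IN, H, W)
n_params = lambda m: sum(p.numel() for p in m.parameters())
print(conv(x).shape, local(x).shape)                    # same output shape
print(n_params(conv), n_params(local))                  # 80 shared weights vs 18432 per-location ones
```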

01:15:05 I mean, I actually wrote a paper on this and got a chance to really study peripheral

01:15:12 vision, which is a fascinating thing. It’s a very poorly understood thing, what the brain, you know,

01:15:21 at every level the brain does with the periphery. It does some funky stuff. Yeah. So it’s another

01:15:28 kind of trick than convolution. Like, you know, convolution in neural networks is

01:15:39 an efficiency trick. And the brain does a whole other kind of thing.

01:15:44 Correct. So you need to understand the principles of processing so that you can still apply

01:15:51 engineering tricks where you want to. You don’t want to be slavishly mimicking all the things of

01:15:55 the brain. And so, yeah, so it should be one input. And I think it is extremely helpful,

01:16:02 but it should be the point of really understanding so that you know when to deviate from it.

01:16:06 So, okay. That’s really cool. That’s work from a few years ago. You did work at Numenta with Jeff

01:16:14 Hawkins with hierarchical temporal memory. How is your just, if you could give a brief history,

01:16:23 how is your view of the way the models of the brain changed over the past few years leading up

01:16:30 to now? Is there some interesting aspects where there was an adjustment to your understanding of

01:16:36 the brain or is it all just building on top of each other? In terms of the higher level ideas,

01:16:42 especially the ones Jeff wrote about in the book, if you blur out, right. Yeah. On Intelligence.

01:16:47 Right. On Intelligence. If you blur out the details and if you just zoom out and look at the

01:16:52 higher level idea, things are, I would say, consistent with what he wrote about. But many

01:17:02 things will be consistent with that because it’s a blur. Deep learning systems are also

01:17:08 multi level, hierarchical, all of those things. But in terms of the detail, a lot of things are

01:17:16 different. And those details matter a lot. So one point of difference I had with Jeff was how to

01:17:28 approach, how much of biological plausibility and realism do you want in the learning algorithms?

01:17:36 So when I was there, this was almost 10 years ago now.

01:17:41 It flies when you’re having fun.

01:17:43 Yeah. I don’t know what Jeff thinks now, but 10 years ago, the difference was that

01:17:49 I did not want to be so constrained on saying my learning algorithms need to be

01:17:56 biologically plausible based on some filter of biological plausibility available at that time.

01:18:03 To me, that is a dangerous cut to make because we are discovering more and more things about

01:18:09 the brain all the time. New biophysical mechanisms, new channels are being discovered

01:18:14 all the time. So I don’t want to upfront kill off a learning algorithm just because we don’t

01:18:21 really understand the full biophysics or whatever of how the brain learns.

01:18:27 Exactly. Exactly.

01:18:29 Let me ask and I’m sorry to interrupt. What’s your sense? What’s our best understanding of

01:18:34 how the brain learns?

01:18:36 So things like backpropagation, credit assignment. So many of these algorithms have,

01:18:42 learning algorithms have things in common, right? Backpropagation is one way of

01:18:47 credit assignment. There is another algorithm called expectation maximization, which is,

01:18:52 you know, another weight adjustment algorithm.

01:18:55 But is it your sense the brain does something like this?

01:18:58 Has to. There is no way around it in the sense of saying that you do have to adjust the

01:19:04 connections.

01:19:06 So yeah, and you’re saying credit assignment, you have to reward the connections that were

01:19:09 useful in making a correct prediction and not, yeah, I guess what else, but yeah, it

01:19:14 doesn’t have to be differentiable.

01:19:16 Yeah, it doesn’t have to be differentiable. Yeah. But you have to have a, you know, you

01:19:22 have a model that you start with, you have data comes in and you have to have a way of

01:19:27 adjusting the model such that it better fits the data. So that is all of learning, right?

01:19:33 And some of them can be using backprop to do that. Some of it can be using, you know,

01:19:40 very local graph changes to do that.

01:19:45 That can be, you know, many of these learning algorithms have similar update properties

01:19:52 locally in terms of what the neurons need to do locally.
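As a toy illustration of that generic loop (a model, incoming data, and a weight adjustment that makes the model fit the data better; not a claim about how cortex actually learns), here is a one-layer example where the gradient step is itself a local rule, an outer product of post-synaptic error and pre-synaptic activity:

```python
# Generic "adjust the model to fit the data" loop on one linear layer.
# For a single layer, the gradient update happens to be purely local:
# (post-synaptic error) x (pre-synaptic activity).
import numpy as np

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((3, 5))     # hypothetical connection weights
x = rng.standard_normal(5)
x /= np.linalg.norm(x)                    # keep the update step well behaved
target = rng.standard_normal(3)           # desired output for this input

lr = 0.5
for _ in range(30):
    error = W @ x - target                # post-synaptic error
    W -= lr * np.outer(error, x)          # local update: error times input activity

print(np.linalg.norm(W @ x - target))     # ~0: the weights were adjusted to fit the data
```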

01:19:57 I wonder if small differences in learning algorithms can have huge differences in the

01:20:01 actual effect. So the dynamics of, I mean, sort of the reverse like spiking, like if

01:20:09 credit assignment is like a lightning versus like a rainstorm or something, like whether

01:20:18 there’s like a looping local type of situation with the credit assignment, whether there is

01:20:26 like regularization, like how it injects robustness into the whole thing, like whether

01:20:34 it’s chemical or electrical or mechanical. Yeah. All those kinds of things. I feel like

01:20:42 it, that, yeah, I feel like those differences could be essential, right? It could be. It’s

01:20:48 just that you don’t know enough to, on the learning side, you don’t know, you don’t know

01:20:54 enough to say that is definitely not the way the brain does it. Got it. So you don’t want

01:20:59 to be stuck to it. So that, yeah. So you’ve been open minded on that side of things.

01:21:04 On the inference side, on the recognition side, I am much more, I’m able to be constrained

01:21:09 because it’s much easier to do experiments because, you know, it’s like, okay, here’s

01:21:13 the stimulus, you know, how many steps did it take to get the answer? I can trace it

01:21:18 back. I can understand the speed of that computation, et cetera. I’m able to do

01:21:23 all of that, et cetera, much more readily on the inference side. Got it. And

01:21:28 then you can’t do good experiments on the learning side. Correct. So let’s go right

01:21:34 back into the cortical microcircuits. So what are these ideas beyond recursive cortical

01:21:42 network that you’re looking at now? So we have made a, you know, pass through multiple

01:21:48 of the steps that, you know, as I mentioned earlier, you know, we were looking at perception

01:21:54 from the angle of cognition, right? It was not just perception for perception’s sake.

01:21:58 How do you, how do you connect it to cognition? How do you learn concepts and how do you learn

01:22:04 abstract reasoning? Similar to some of the things Francois talked about, right? So we

01:22:13 have taken one pass through it basically saying, what is the basic cognitive architecture that

01:22:19 you need to have, which has a perceptual system, which has a system that learns dynamics of

01:22:25 the world and then has something like a routine program learning system on top of it to learn

01:22:32 concepts. So we have built one, you know, the version 0.1 of that system. This

01:22:38 was another Science Robotics paper. The title of that paper was, you know, something

01:22:44 like cognitive programs. How do you build cognitive programs? And the application there

01:22:49 was on manipulation, robotic manipulation? It was, so think of it like this. Suppose

01:22:56 you wanted to tell a new person that you met, you don’t know the language that person uses.

01:23:04 You want to communicate to that person to achieve some task, right? So I want to say,

01:23:10 hey, you need to pick up all the red cups from the kitchen counter and put them here, right?

01:23:17 How do you communicate that, right? You can show pictures. You can basically say, look,

01:23:21 this is the starting state. The things are here. This is the ending state. And what does

01:23:28 the person need to understand from that? The person needs to understand what conceptually

01:23:32 happened in those pictures from the input to the output, right? So we are looking at

01:23:39 preverbal conceptual understanding. Without language, how do you have a set of concepts

01:23:45 that you can manipulate in your head? And from a set of images of input and output,

01:23:52 can you infer what is happening in those images?
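Here is a toy sketch of that kind of inference from a before and after pair (a deliberately simplified construction, not the Science Robotics system): a tiny space of candidate programs is searched, each candidate is simulated on the starting scene, and the one whose simulated result matches the ending scene is taken as the inferred concept.

```python
# Toy inference of a "cognitive program" from a before/after pair of symbolic
# scenes: search a small space of candidate programs, simulate each one, and
# keep the candidate whose effect matches the observed change.
before = [{"color": "red", "loc": "counter"},
          {"color": "red", "loc": "counter"},
          {"color": "blue", "loc": "counter"}]
after  = [{"color": "red", "loc": "table"},
          {"color": "red", "loc": "table"},
          {"color": "blue", "loc": "counter"}]

def move_color(scene, color, dest):
    """Simulate moving every object of a given color to a destination."""
    return [dict(o, loc=dest) if o["color"] == color else dict(o) for o in scene]

# Candidate "programs": (description, function on a scene)
candidates = [(f"move {c} objects to {d}", lambda s, c=c, d=d: move_color(s, c, d))
              for c in ("red", "blue") for d in ("table", "counter")]

for description, program in candidates:
    if program(before) == after:                 # simulate and compare
        print("inferred concept:", description)  # -> move red objects to table
```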

01:23:55 Got it. With concepts that are pre-language. Okay. So what does it mean for a concept to be pre-language?

01:24:02 Like why is language so important here?

01:24:10 So I want to make a distinction between concepts that are just learned from text

01:24:17 by just feeding brute force text. You can start extracting things like, okay,

01:24:23 a cow is likely to be on grass. So those kinds of things, you can extract purely from text.

01:24:32 But that’s kind of a simple association thing rather than a concept as an abstraction of

01:24:37 something that happens in the real world in a grounded way that I can simulate it in my

01:24:44 mind and connect it back to the real world. And you think kind of the visual world,

01:24:51 concepts in the visual world are somehow lower level than just the language?

01:24:58 The lower level kind of makes it feel like, okay, that’s unimportant. It’s more like,

01:25:04 I would say the concepts in the visual and the motor system and the concept learning system,

01:25:15 which if you cut off the language part, just what we learn by interacting with the world

01:25:20 and abstractions from that, that is a prerequisite for any real language understanding.

01:25:26 So you disagree with Chomsky because he says language is at the bottom of everything.

01:25:32 No, I disagree with Chomsky completely on so many levels, from universal grammar to…

01:25:39 So that was a paper in Science Robotics, beyond the recursive cortical network.

01:25:43 What other interesting problems are there, the open problems and brain inspired approaches

01:25:50 that you’re thinking about?

01:25:51 I mean, everything is open, right? No problem is fully solved. I think of perception as kind of

01:26:02 the first thing that you have to build, but the last thing that will actually be solved.

01:26:07 Because if you do not build the perception system in the right way, you cannot build the concept system in

01:26:12 the right way. So you have to build a perception system, however wrong that might be, you have to

01:26:18 still build that and learn concepts from there and then keep iterating. And finally, perception

01:26:24 will get solved fully when perception, cognition, language, all those things work together finally.

01:26:30 So great, we’ve talked a lot about perception, but then maybe on the concept side and like common

01:26:37 sense or just general reasoning side, is there some intuition you can draw from the brain about

01:26:45 how we can do that?

01:26:46 So I have this classic example I give. So suppose I give you a few sentences and then ask you a

01:26:56 question following that sentence. This is a natural language processing problem, right? So here

01:27:01 it goes. I’m telling you, Sally pounded a nail on the ceiling. Okay, that’s a sentence. Now I’m

01:27:10 asking you a question. Was the nail horizontal or vertical?

01:27:14 Vertical.

01:27:15 Okay, how did you answer that?

01:27:16 Well, I imagined Sally, it was kind of hard to imagine what the hell she was doing, but I

01:27:24 imagined I had a visual of the whole situation.

01:27:28 Exactly, exactly. So here, you know, I pose a question in natural language. The answer to

01:27:34 that question was you got the answer from actually simulating the scene. Now I can go more and more

01:27:40 detailed about, okay, was Sally standing on something while doing this? Could she have been

01:27:47 standing on a light bulb to do this? I could ask more and more questions about this and I can ask,

01:27:53 make you simulate the scene in more and more detail, right? Where is all that knowledge that

01:27:59 you’re accessing stored? It is not in your language system. It was not just by reading

01:28:05 text, you got that knowledge. It is stored from the everyday experiences that you have had from,

01:28:12 and by the age of five, you have pretty much all of this, right? And it is stored in your visual

01:28:18 system, motor system in a way such that it can be accessed through language.
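A toy sketch of that answer-by-simulation idea (the surface table and the nail rule are invented for illustration): the orientation of the nail is never looked up in text; it falls out of simulating the geometry of the surface it is pounded into.

```python
# Answering a language question by running a tiny "simulation" of the scene
# instead of retrieving the answer from text. The world knowledge here is a
# made-up lookup of surface normals, purely for illustration.
SURFACE_NORMALS = {"ceiling": "down", "wall": "sideways", "floor": "up"}

def simulate_nail(surface):
    """A nail is driven along the surface normal; its orientation follows from that."""
    normal = SURFACE_NORMALS[surface]
    return "vertical" if normal in ("up", "down") else "horizontal"

# "Sally pounded a nail on the ceiling." -> query the simulated scene, not text.
print(simulate_nail("ceiling"))   # vertical
print(simulate_nail("wall"))      # horizontal
```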

01:28:24 Got it. I mean, right. So the language is just almost sort of the query into the whole visual

01:28:30 cortex and that does the whole feedback thing. But I mean, it is all reasoning kind of connected to

01:28:36 the perception system in some way. You can do a lot of it. You know, you can still do a lot of it

01:28:43 by quick associations without having to go into the depth. And most of the time you will be right,

01:28:49 right? You can just do quick associations, but I can easily create tricky situations for you.

01:28:55 Where that quick associations is wrong and you have to actually run the simulation.

01:29:00 So figuring out how these concepts connect. Do I have a good idea of how to do that?

01:29:06 That’s exactly one of the problems that we are working on. And the way we are approaching that

01:29:13 is basically saying, okay, so the takeaway is that language

01:29:20 is simulation control, and your perceptual plus motor system is building a simulation of the world.

01:29:28 And so that’s basically the way we are approaching it. And the first thing that we built was a

01:29:34 controllable perceptual system. And we built schema networks, which was a controllable dynamics

01:29:40 system. Then we built a concept learning system that puts all these things together

01:29:44 into programs or abstractions that you can run and simulate. And now we are taking the step

01:29:51 of connecting it to language. And it will be very simple examples. Initially, it will not be

01:29:57 the GPT3 like examples, but it will be grounded simulation based language.

01:30:02 And for like the querying would be like question answering kind of thing?

01:30:08 Correct. Correct. And so that’s what we’re trying to do. We’re trying to build a system

01:30:13 of that kind. And it will be in some simple world initially, you know,

01:30:19 but it will be about, okay, can the system connect the language and ground it in the right way and

01:30:25 run the right simulations to come up with the answer. And the goal is to try to do things that,

01:30:29 for example, GPT3 couldn’t do. Correct. Speaking of which, if we could talk about GPT3 a little

01:30:38 bit, I think it’s an interesting thought provoking set of ideas that OpenAI is pushing forward. I

01:30:46 think it’s good for us to talk about the limits and the possibilities in the neural network. So

01:30:51 in general, what are your thoughts about this recently released very large 175 billion parameter

01:30:58 language model? So I haven’t directly evaluated it yet. From what I have seen on Twitter and

01:31:05 other people evaluating it, it looks very intriguing. I am very intrigued by some of

01:31:09 the properties it is displaying. And of course the text generation part of that was already

01:31:17 evident in GPT2 that it can generate coherent text over long distances. But of course the

01:31:26 weaknesses are also pretty visible in saying that, okay, it is not really carrying a world state

01:31:32 around. And sometimes you get sentences like, I went up the hill to reach the valley, things

01:31:39 like that, some completely incompatible statements. Or when you’re traveling from one place to the other,

01:31:46 it doesn’t take into account the time of travel, things like that. So those things I think will

01:31:50 happen less in GPT3 because it is trained on even more data and it can do even more longer distance

01:31:59 coherence. But it will still have the fundamental limitations that it doesn’t have a world model

01:32:07 and it can’t run simulations in its head to find whether something is true in the world or not.

01:32:13 So it’s taking a huge amount of text from the internet and forming a compressed representation.

01:32:20 Do you think in that could emerge something that’s an approximation of a world model,

01:32:27 which essentially could be used for reasoning? I’m not talking about GPT3, I’m talking about GPT4,

01:32:35 5 and GPT10. Yeah, I mean they will look more impressive than GPT3. So if you take that to

01:32:42 the extreme, I’m taking the other extreme, a Markov chain of just first order. If you

01:32:51 read Shannon’s book, he has a model of English text which is based on first

01:32:59 order Markov chains, second order Markov chains, third order Markov chains and saying that okay,

01:33:03 third order Markov chains look better than first order Markov chains. So does that mean a first

01:33:09 order Markov chain has a model of the world? Yes, it does. So yes, at that level. When you go to higher

01:33:18 order models or more sophisticated structure in the model like the transformer networks have,

01:33:24 yes they have a model of the text world, but that is not a model of the world. It’s a model

01:33:32 of the text world and it will have interesting properties and it will be useful, but just scaling

01:33:41 it up is not going to give us AGI or natural language understanding or meaning. Well the

01:33:49 question is whether being forced to compress a very large amount of text forces you to construct

01:33:58 things that are very much like, because the ideas of concepts and meaning are a spectrum.

01:34:06 Sure, yeah. So in order to form that kind of compression,

01:34:13 maybe it will be forced to figure out abstractions which look an awful lot like the kind of things

01:34:24 that we think about as concepts, as world models, as common sense. Is that possible?

01:34:31 No, I don’t think it is possible because the information is not there.

01:34:34 The information is there behind the text, right?

01:34:38 No, unless somebody has written down all the details about how everything works in the world

01:34:44 to the absurd amounts like, okay, it is easier to walk forward than backward, that you have to open

01:34:51 the door to go out of the thing, doctors wear underwear. Unless all these things somebody

01:34:56 has written down somewhere or somehow the program found it to be useful for compression from some

01:35:01 other text, the information is not there. So that’s an argument that text is a lot

01:35:07 lower fidelity than the experience of our physical world.

01:35:13 Right, correct. Pictures worth a thousand words.

01:35:17 Well, in this case, pictures aren’t really… So the richest aspect of the physical world isn’t

01:35:24 even just pictures, it’s the interactivity with the world.

01:35:28 Exactly, yeah.

01:35:29 It’s being able to interact. It’s almost like…

01:35:36 It’s almost like if you could interact… Well, maybe I agree with you that pictures

01:35:42 worth a thousand words, but a thousand…

01:35:45 It’s still… Yeah, you could capture it with the GPTX.

01:35:49 So I wonder if there’s some interactive element where a system could live in text world where it

01:35:54 could be part of the chat, be part of talking to people. It’s interesting. I mean, fundamentally…

01:36:03 So you’re making a statement about the limitation of text. Okay, so let’s say we have a text

01:36:10 corpus that includes basically every experience we could possibly have. I mean, just a very large

01:36:19 corpus of text and also interactive components. I guess the question is whether the neural network

01:36:25 architecture, these very simple transformers, but if they had like hundreds of trillions or

01:36:33 whatever comes after a trillion parameters, whether that could store the information

01:36:42 needed, that’s architecturally. Do you have thoughts about the limitation on that side of

01:36:46 things with neural networks? I mean, so transformers are still a feed forward neural

01:36:52 network. It has a very interesting architecture, which is good for text modeling and probably some

01:36:59 aspects of video modeling, but it is still a feed forward architecture. You believe in the

01:37:04 feedback mechanism, the recursion. Oh, and also causality, being able to do counterfactual

01:37:11 reasoning, being able to do interventions, which is actions in the world. So all those things

01:37:20 require different kinds of models to be built. I don’t think transformers captures that family. It

01:37:28 is very good at statistical modeling of text and it will become better and better with more data,

01:37:35 bigger models, but that is only going to get so far. So I had this joke on Twitter saying that,

01:37:44 hey, this is a model that has read all of quantum mechanics and theory of relativity and we are

01:37:51 asking it to do text completion or we are asking it to solve simple puzzles. When you have AGI,

01:37:59 that is not what you ask the system to do. We will ask the system to do experiments and come

01:38:08 up with hypothesis and revise the hypothesis based on evidence from experiments, all those things.

01:38:13 Those are the things that we want the system to do when we have AGI, not solve simple puzzles.

01:38:20 Like impressive demos, somebody generating a red button in HTML.

01:38:24 Right, which are all useful. There is no dissing the usefulness of it.

01:38:29 So by the way, I am playing a little bit of a devil’s advocate, so calm down internet.

01:38:47 So I am curious, almost, in which ways a dumb but large neural network will surprise us.

01:38:47 I completely agree with your intuition. It is just that I do not want to dogmatically

01:38:58 100% put all the chips there. We have been surprised so much. Even the current GPT2 and

01:39:06 GPT3 are so surprising. The self play mechanisms of AlphaZero are really surprising. The fact that

01:39:18 reinforcement learning works at all to me is really surprising. The fact that neural networks work at

01:39:23 all is quite surprising given how nonlinear the space is, the fact that it is able to find local

01:39:30 minima that are at all reasonable. It is very surprising. I wonder sometimes whether we humans

01:39:39 just want AGI not to be such a dumb thing. Because exactly what you are saying is like

01:39:52 the ideas of concepts and be able to reason with those concepts and connect those concepts in

01:39:57 hierarchical ways and then to be able to have world models. Just everything we are describing

01:40:05 in human language in this poetic way seems to make sense. That is what intelligence and reasoning

01:40:11 are like. I wonder if at the core of it, it could be much dumber. Well, finally it is still

01:40:17 connections and message passing over them. So in that way it is dumb. So I guess the recursion,

01:40:24 the feedback mechanism, that does seem to be a fundamental kind of thing.

01:40:32 The idea of concepts. Also memory. Correct. Having an episodic memory. That seems to be

01:40:39 an important thing. So how do we get memory? So we have another piece of work which came

01:40:45 out recently on how do you form episodic memories and form abstractions from them.

01:40:52 And we haven’t figured out all the connections of that to the overall cognitive architecture.

01:40:57 But what are your ideas about how you could have episodic memory? So at least it is very clear

01:41:04 that you need to have two kinds of memory. That is very, very clear. There are things that happen

01:41:13 as statistical patterns in the world, but then there is the one timeline of things that happen

01:41:19 only once in your life. And this day is not going to happen ever again. And that needs to be stored

01:41:27 as just a stream of strings. This is my experience. And then the question is about

01:41:36 how do you take that experience and connect it to the statistical part of it? How do you

01:41:40 now say that, okay, I experienced this thing. Now I want to be careful about similar situations.

01:41:47 So you need to be able to index that similarity using your other knowledge, that is, the model of the

01:41:57 world that you have learned. Although the situation came from the episode, you need to be able to

01:42:02 index into the other one. So the episodic memory being implemented as an indexing over the other model

01:42:13 that you’re building. So the memories remain and they’re indexed into the statistical thing

01:42:24 that you form. Yeah, statistical causal structural model that you built over time. So it’s basically

01:42:30 the idea is that the hippocampus is just storing or sequencing a set of pointers that happens over

01:42:41 time. And then whenever you want to reconstitute that memory and evaluate the different aspects of

01:42:48 it, whether it was good, bad, do I need to encounter the situation again? You need the cortex

01:42:55 to reinstantiate, to replay that memory. So how do you find that memory? Like which

01:43:00 direction is the important direction? Both directions are again, bidirectional.

01:43:05 I mean, I guess how do you retrieve the memory? So this is again, hypothesis. We’re making this

01:43:11 up. So when you come to a new situation, your cortex is doing inference over in the new situation.

01:43:21 And then of course, hippocampus is connected to different parts of the cortex and you have this

01:43:27 deja vu situation, right? Okay, I have seen this thing before. And then in the hippocampus, you can

01:43:35 have an index of, okay, this is when it happened as a timeline. And then you can use the hippocampus

01:43:44 to drive the similar timelines to say now I am, rather than being driven by my current input

01:43:52 stimuli, I am going back in time and rewinding my experience from there, putting it back into the

01:43:58 cortex. And then putting it back into the cortex of course affects what you’re going to see next

01:44:03 in your current situation. Got it. Yeah. So that’s the whole thing, having a world model and then

01:44:09 yeah, connecting to the perception. Yeah, it does seem to be that that’s what’s happening. On the

01:44:16 neural network side, it’s interesting to think of how we actually do that. Yeah. To have a knowledge

01:44:24 base. Yes. It is possible that you can put many of these structures into neural networks and we will

01:44:31 find ways of combining properties of neural networks and graphical models. So, I mean,

01:44:39 it’s already started happening. Graph neural networks are kind of a merge between them.
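Picking up the hippocampus-as-index idea from a moment ago, here is a toy sketch (vector states and cosine matching are simplifications invented for illustration): episodes are stored as an ordered list of pointers to state vectors, a new situation retrieves the most similar stored state, the deja vu, and the episode is replayed forward from there.

```python
# Toy "episodic index": store a one-time stream of state vectors in order,
# retrieve the most similar stored state for a new cue, and replay onward.
import numpy as np

class EpisodicIndex:
    def __init__(self):
        self.timeline = []                      # ordered (timestamp, state) pairs

    def store(self, t, state):
        self.timeline.append((t, np.asarray(state, dtype=float)))

    def recall(self, cue, replay_len=3):
        """Find the most similar stored state (the 'deja vu') and replay forward."""
        cue = np.asarray(cue, dtype=float)
        sims = [cue @ s / (np.linalg.norm(cue) * np.linalg.norm(s) + 1e-9)
                for _, s in self.timeline]
        start = int(np.argmax(sims))
        return self.timeline[start:start + replay_len]

memory = EpisodicIndex()
for t in range(5):                              # a one-time stream of experiences
    memory.store(t, np.random.randn(8))

cue = memory.timeline[2][1] + 0.1 * np.random.randn(8)   # a similar new situation
print([t for t, _ in memory.recall(cue)])       # e.g. [2, 3, 4]: rewind and replay
```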

01:44:43 Yeah. And there will be more of that thing. So, but to me it is, the direction is pretty clear,

01:44:51 looking at biology and the history of evolutionary history of intelligence, it is pretty clear that,

01:44:59 okay, what is needed is more structure in the models and modeling of the world and supporting

01:45:06 dynamic inference. Well, let me ask you, there’s a guy named Elon Musk, there’s a company called

01:45:13 Neuralink and there’s a general field called brain computer interfaces. Yeah. It’s kind of an

01:45:20 interface between your two loves. Yes. The brain and the intelligence. So there’s like

01:45:26 very direct applications of brain computer interfaces for people with different conditions,

01:45:32 more in the short term. Yeah. But there’s also these sci fi futuristic kinds of ideas of AI

01:45:38 systems being able to communicate in a high bandwidth way with the brain, bidirectional.

01:45:45 Yeah. What are your thoughts about Neuralink and BCI in general as a possibility? So I think BCI

01:45:53 is a cool research area. And in fact, when I got interested in brains initially, when I was

01:46:02 enrolled at Stanford and when I got interested in brains, it was through a brain computer

01:46:07 interface talk that Krishna Shenoy gave. That’s when I even started thinking about the problem.

01:46:14 So it is definitely a fascinating research area and the applications are enormous. So there is a

01:46:21 science fiction scenario of brains directly communicating. Let’s keep that aside for the

01:46:26 time being. Even just the intermediate milestones they’re pursuing, which are very reasonable as far

01:46:32 as I can see, being able to control an external limb using direct connections from the brain

01:46:40 and being able to write things into the brain. So those are all good steps to take and they have

01:46:49 enormous applications. People losing limbs being able to control prosthetics, quadriplegics being

01:46:55 able to control something, and therapeutics. I also know about another company working in

01:47:01 the space called Paradromics. They’re based on a different electrode array, but trying to attack

01:47:09 some of the same problems. So I think it’s a very… Also surgery? Correct. Surgically implanted

01:47:14 electrodes. Yeah. So yeah, I think of it as a very, very promising field, especially when it is

01:47:22 helping people overcome some limitations. Now, at some point, of course, it will advance to the level of

01:47:29 being able to communicate. How hard is that problem do you think? Let’s say we magically solve

01:47:37 what I think is a really hard problem of doing all of this safely. Yeah. So being able to connect

01:47:45 electrodes and not just thousands, but like millions to the brain. I think it’s very,

01:47:51 very hard because you also do not know what will happen to the brain with that in the sense of how

01:47:58 does the brain adapt to something like that? And as we were learning, the brain is quite,

01:48:04 in terms of neuroplasticity, is pretty malleable. Correct. So it’s going to adjust. Correct. So the

01:48:10 machine learning side, the computer side is going to adjust, and then the brain is going to adjust.

01:48:14 Exactly. And then what soup does this land us into? The kind of hallucinations you might get

01:48:20 from this that might be pretty intense. Just connecting to all of Wikipedia. It’s interesting

01:48:28 whether we need to be able to figure out the basic protocol of the brain’s communication schemes

01:48:34 in order to get them to the machine and the brain to talk. Because another possibility is the brain

01:48:41 actually just adjusts to whatever the heck the computer is doing. Exactly. That’s the way, I think.

01:48:45 I find that to be a more promising way. It’s basically saying, okay, attach electrodes

01:48:51 to some part of the cortex. Maybe if it is done from birth, the brain will adapt. Let’s say that

01:48:58 that part is not damaged, it was not used for anything, and these electrodes are attached there.

01:49:02 And now you train that part of the brain to do this high bandwidth communication between

01:49:09 something else. And if you do it like that, then it is brain adapting to… And of course,

01:49:15 your external system is designed so that it is adaptable. Just like we designed computers

01:49:21 or mouse, keyboard, all of them to be interacting with humans. So of course, that feedback system

01:49:28 is designed to be human compatible, but now it is not trying to record from all of the brain

01:49:37 and have two systems trying to adapt to each other. It’s the brain adapting, in one way.

01:49:44 That’s fascinating. The brain is connected to the internet. Just imagine just connecting it

01:49:51 to Twitter and just taking that stream of information. Yeah. But again, if we take a

01:49:59 step back, I don’t know what your intuition is. I feel like that is not as hard of a problem as

01:50:08 doing it safely. There’s a huge barrier to surgery because the biological system, it’s a mush of

01:50:19 like weird stuff. So the surgery part of it, the biology part of it, the long-term repercussions

01:50:26 part of it. I don’t know what else will… We often find after a long time in biology that,

01:50:35 okay, that idea was wrong. So people used to cut off the gland called the thymus or something.

01:50:43 And then they found that, oh no, that actually causes cancer.

01:50:50 And then there’s a subtle like millions of variables involved. But this whole process,

01:50:55 the nice thing, just like again with Elon, just like colonizing Mars, seems like a ridiculously

01:51:02 difficult idea. But in the process of doing it, we might learn a lot about the biology of the

01:51:08 neurobiology of the brain, the neuroscience side of things. It’s like, if you want to learn

01:51:13 something, do the most difficult version of it and see what you learn. The intermediate steps

01:51:19 that they are taking sounded all very reasonable to me. It’s great. Well, but like everything with

01:51:25 Elon, the timeline seems insanely fast. So that’s the only caveat. Well,

01:51:34 we’ve been talking about cognition a little bit. So like reasoning,

01:51:38 we haven’t mentioned the other C word, which is consciousness. Do you ever think about that one?

01:51:43 Is that useful at all in this whole context of what it takes to create an intelligent reasoning

01:51:51 being? Or is that completely outside of your, like the engineering perspective of intelligence?

01:51:58 It is not outside the realm, but it doesn’t on a day to day basis inform what we do,

01:52:05 but it’s more, so in many ways, the company name is connected to this idea of consciousness.

01:52:12 What’s the company name? Vicarious. So Vicarious is the company name. And so what does Vicarious

01:52:19 mean? At the first level, it is about modeling the world and it is internalizing the external actions.

01:52:29 So you interact with the world and learn a lot about the world. And now after having learned

01:52:34 a lot about the world, you can run those things in your mind without actually having to act

01:52:42 in the world. So you can run things vicariously just in your brain. And similarly, you can

01:52:48 experience another person’s thoughts by having a model of how that person works

01:52:54 and running that model, putting yourself in some other person’s shoes. So that is being vicarious.

01:53:01 Now it’s the same modeling apparatus that you’re using to model the external world

01:53:06 or some other person’s thoughts. You can turn it to yourself. If that same modeling thing is

01:53:14 applied to your own modeling apparatus, then that is what gives rise to consciousness, I think.

01:53:21 Well, that’s more like self awareness. There’s the hard problem of consciousness, which is

01:53:25 when the model feels like something, when this whole process is like you really are in it.

01:53:37 You feel like an entity in this world. Not just you know that you’re an entity, but it feels like

01:53:43 something to be that entity. And thereby, we attribute this. Then it starts to be where

01:53:54 something that has consciousness can suffer. You start to have these kinds of things that we can

01:53:59 reason about that is much heavier. It seems like there’s much greater cost to your decisions.

01:54:09 And mortality is tied up into that. The fact that these things end. First of all, I end at some

01:54:18 point, and then other things end. That somehow seems to be, at least for us humans, a deep

01:54:27 motivator. That idea of motivation in general, we talk about goals in AI, but goals aren’t quite

01:54:38 the same thing as our mortality. It feels like, first of all, humans don’t have a goal, and they

01:54:46 just kind of create goals at different levels. They make up goals because we’re terrified by

01:54:54 the mystery of the thing that gets us all. We make these goals up. We’re like a goal generation

01:55:02 machine, as opposed to a machine which optimizes the trajectory towards a singular goal. It feels

01:55:10 like that’s an important part of cognition, that whole mortality thing. Well, it is a part of human

01:55:18 cognition, but there is no reason for that mortality to come into the equation for an artificial

01:55:30 system, because we can copy the artificial system. The problem with humans is that I can’t clone

01:55:36 you. Even if I clone you as the hardware, your experience that was stored in your brain,

01:55:45 your episodic memory, all those will not be captured in the new clone. But that’s not the

01:55:52 same with an AI system. But it’s also possible that the thing that you mentioned with us humans

01:56:02 is actually of fundamental importance for intelligence. The fact that you can copy an AI

01:56:07 system means that that AI system is not yet an AGI. If you look at existence proof, if we reason

01:56:18 based on existence proof, you could say that it doesn’t feel like death is a fundamental property

01:56:24 of an intelligent system. But we don’t yet. Give me an example of an immortal intelligent being.

01:56:33 We don’t have those. It’s very possible that that is a fundamental property of intelligence,

01:56:42 is a thing that has a deadline for itself. Well, you can think of it like this. Suppose you invent

01:56:49 a way to freeze people for a long time. It’s not dying. So you can be frozen and woken up

01:56:58 thousands of years from now. So there’s no fear of death. Well, no, it’s not about time. It’s about

01:57:08 the knowledge that it’s temporary. And that aspect of it, the finiteness of it, I think

01:57:17 creates a kind of urgency. Correct. For us, for humans. Yeah, for humans. Yes. And that is part

01:57:23 of our drives. And that’s why I’m not too worried about AI having motivations to kill all humans

01:57:35 and those kinds of things. Why? Just wait. So why do you need to do that? I’ve never heard that

01:57:43 before. That’s a good point. Yeah, just murder seems like a lot of work. Let’s just wait it out.

01:57:52 They’ll probably hurt themselves. Let me ask you, people often kind of wonder, world class researchers

01:58:01 such as yourself, what kind of books, technical fiction, philosophical, had an impact on you and

01:58:10 your life and maybe ones you could possibly recommend that others read? Maybe if you have

01:58:17 three books that pop into mind. Yeah. So I definitely liked Judea Pearl’s book,

01:58:23 Probabilistic Reasoning in Intelligent Systems. It’s a very deep technical book. But what I liked

01:58:30 is that, so there are many places where you can learn about probabilistic graphical models from.

01:58:36 But throughout this book, Judea Pearl kind of sprinkles his philosophical observations and he

01:58:42 thinks about, connects us to how the brain thinks and attentions and resources, all those things. So

01:58:48 that whole thing makes it more interesting to read. He emphasizes the importance of causality.

01:58:54 So that was in his later book. So this was the first book, Probabilistic Reasoning in Intelligent

01:58:58 Systems. He mentions causality, but he hadn’t really sunk his teeth into causality. But he

01:59:05 really sunk his teeth into, how do you actually formalize it? And the second book,

01:59:11 Causality, the one in 2000, that one is really hard. So I would recommend that.

01:59:17 Yeah. So that looks at the mathematical, his model of…

01:59:22 Do-calculus.

01:59:23 Do-calculus. Yeah. It was pretty dense mathematically.

01:59:25 Right. The Book of Why is definitely more enjoyable.

01:59:28 For sure.

01:59:29 Yeah. So I would recommend Probabilistic Reasoning in Intelligent Systems.

01:59:34 Another book I liked was one from Doug Hofstadter. This was a long time ago. He had a book,

01:59:41 I think it was called The Mind’s I. It was probably Hofstadter and Daniel Dennett together.

01:59:49 Yeah. And I actually was, I bought that book. It’s on my shelf. I haven’t read it yet,

01:59:54 but I couldn’t get an electronic version of it, which is annoying because I read everything on

02:00:00 Kindle. So I had to actually purchase the physical copy. It’s one of the only physical books

02:00:06 I have because anyway, a lot of people recommended it highly. So yeah.

02:00:11 And the third one I would definitely recommend reading is, this is not a technical book. It is

02:00:18 history. The name of the book, I think, is The Bishop’s Boys. It’s about the Wright brothers

02:00:25 and their path and how it was… There are multiple books on this topic and all of them

02:00:34 are great. It’s fascinating how flight was treated as an unsolvable problem. And also,

02:00:46 what aspects did people emphasize? People thought, oh, it is all about

02:00:51 just powerful engines. You just need to have powerful lightweight engines. And so some people

02:01:00 thought of it as, how far can we just throw the thing? Just throw it.

02:01:04 Like a catapult.

02:01:05 Yeah. So it’s very fascinating. And even after they made the invention,

02:01:11 people were not believing it.

02:01:13 Ah, the social aspect of it.

02:01:15 The social aspect. It’s very fascinating.

02:01:18 I mean, do you draw any parallels with how birds fly? So there’s the natural approach to flight

02:01:28 and then there’s the engineered approach. Do you see the same kind of thing with the brain

02:01:33 and our trying to engineer intelligence?

02:01:37 Yeah. It’s a good analogy to have. Of course, all analogies have their limits.

02:01:43 So people in AI often use airplanes as an example of, hey, we didn’t learn anything from birds.

02:01:55 But the funny thing is that, and the saying is, airplanes don’t flap wings. This is what they

02:02:02 say. The funny thing and the ironic thing is that you don’t need to flap to fly is something

02:02:09 Wright brothers found by observing birds. So they have in their notebook, in some of these books,

02:02:18 they show their notebook drawings. They make detailed notes about buzzards just soaring over

02:02:26 thermals. And they basically say, look, flapping is not the important, propulsion is not the

02:02:31 important problem to solve here. We want to solve control. And once you solve control,

02:02:37 propulsion will fall into place. All of these are people, they realize this by observing birds.

02:02:44 Beautifully put. That’s actually brilliant because people do use that analogy a lot. I’m

02:02:49 going to have to remember that one. Do you have advice for people interested in artificial

02:02:54 intelligence like young folks today? I talk to undergraduate students all the time,

02:02:59 interested in neuroscience, interested in understanding how the brain works. Is there

02:03:03 advice you would give them about their career, maybe about their life in general?

02:03:09 Sure. I think every piece of advice should be taken with a pinch of salt, of course,

02:03:14 because each person is different, their motivations are different. But I can definitely

02:03:20 say if your goal is to understand the brain from the angle of wanting to build one, then

02:03:28 being an experimental neuroscientist might not be the way to go about it. A better way to pursue it

02:03:36 might be through computer science, electrical engineering, machine learning, and AI. And of

02:03:42 course, you have to study the neuroscience, but that you can do on your own. If you’re more

02:03:48 attracted by finding something intriguing about, discovering something intriguing about the brain,

02:03:53 then of course, it is better to be an experimentalist. So find that motivation,

02:03:58 what are you intrigued by? And of course, find your strengths too. Some people are very good

02:04:03 experimentalists and they enjoy doing that. And it’s interesting to see which department,

02:04:10 if you’re picking in terms of your education path, whether to go with like, at MIT, it’s

02:04:18 brain and computer, no, it’d be BCS. Yeah. Brain and Cognitive Sciences, yeah. Or the CS side of

02:04:29 things. And actually the brain folks, the neuroscience folks, are more and more now

02:04:34 embracing learning TensorFlow and PyTorch, right? They see the power of trying to engineer

02:04:44 ideas that they get from the brain, and then explore how those could be used to create

02:04:52 intelligent systems. So that might be the right department actually. Yeah. So this was a question

02:04:58 in one of the Redwood Neuroscience Institute workshops that Jeff Hawkins organized almost 10

02:05:06 years ago. This question was put to a panel, right? What should be the undergrad major you should

02:05:11 take if you want to understand the brain? And the majority opinion in that one was electrical

02:05:17 engineering. Interesting. Because, I mean, I’m a double-E undergrad, so I got lucky in that way.

02:05:25 But I think it does have some of the right ingredients because you learn about circuits.

02:05:30 You learn about how you can construct circuits to implement functions. You learn about

02:05:37 microprocessors. You learn information theory. You learn signal processing. You learn continuous

02:05:43 math. So in that way, it’s a good step. If you want to go to computer science or neuroscience,

02:05:50 it’s a good step. The downside, you’re more likely to be forced to use MATLAB.

02:05:56 You’re more likely to be forced to use MATLAB. So one of the interesting things, I mean,

02:06:07 this is changing, the world is changing, but certain departments lagged on the programming

02:06:13 side of things, on developing good habits in terms of software engineering. But I think that’s more

02:06:19 and more changing. And students can take that into their own hands and learn to program. I feel

02:06:26 like everybody in the sciences should learn to program, because it

02:06:34 empowers you. It puts the data at your fingertips. So you can organize it. You can find all kinds of

02:06:40 things in the data. And then, for the appropriate sciences, you can also build systems

02:06:46 based on that, and then engineer intelligent systems.

02:06:49 We already talked about mortality, so we’ve already hit a ridiculous point. But let me ask you,

02:07:04 one of the things about intelligence is it’s goal driven. And you study the brain. So the question

02:07:13 is like, what’s the goal that the brain is operating under? What’s the meaning of it all

02:07:17 for us humans in your view? What’s the meaning of life? The meaning of life is whatever you

02:07:23 construct out of it. It’s completely open. It’s open. So there’s nothing, like you mentioned,

02:07:31 you like constraints. So it’s wide open. Is there some useful aspect that you think about in terms

02:07:42 of the openness of it, and the basic mechanisms of generating goals, when studying

02:07:50 cognition in the brain? Or is it just that, because everything we’ve talked

02:07:56 about, kind of, the perception system, is to understand the environment, to be

02:08:00 able to not die, not fall over, you don’t think we need to

02:08:09 think about anything bigger than that? Yeah, I think so, because it’s basically being able to

02:08:16 understand the machinery of the world such that you can pursue whatever goals you want.

02:08:21 So the machinery of the world is really ultimately what we should be striving to understand. The

02:08:26 rest is just whatever the heck you want to do or whatever fun you have.

02:08:31 Or whatever is culturally popular. I think that’s beautifully put. I don’t think there’s a better

02:08:42 way to end it. Dileep, I’m so honored that you showed up here and wasted your time with me. It’s

02:08:49 been an awesome conversation. Thanks so much for talking today. Oh, thank you so much. This was

02:08:54 so much more fun than I expected. Thank you. Thanks for listening to this conversation with

02:09:00 Dileep George. And thank you to our sponsors, Babbel, Raycon Earbuds, and Masterclass. Please

02:09:07 consider supporting this podcast by going to babbel.com and using code LEX, going to buyraycon.com,

02:09:16 and signing up at masterclass.com. Click the links, get the discount. It really is the best

02:09:22 way to support this podcast. If you enjoy this thing, subscribe on YouTube, review it with five

02:09:27 stars on Apple Podcast, support it on Patreon, or connect with me on Twitter at Lex Fridman,

02:09:33 spelled, yes, without the E, just F R I D M A N. And now let me leave you with some words from Marcus

02:09:43 Aurelius. You have power over your mind, not outside events. Realize this and you will find

02:09:51 strength. Thank you for listening and hope to see you next time.