Transcript
00:00:00 The following is a conversation with Dileep George, a researcher at the intersection of
00:00:05 neuroscience and artificial intelligence, cofounder of Vicarious with Scott Phoenix,
00:00:10 and formerly cofounder of Numenta with Jeff Hawkins, who’s been on this podcast, and
00:00:16 Donna Dubinsky. From his early work on hierarchical temporal memory to recursive cortical networks
00:00:23 to today, Dileep’s always sought to engineer intelligence that is closely inspired by the
00:00:29 human brain. As a side note, I think we understand very little about the fundamental principles
00:00:35 underlying the function of the human brain, but the little we do know gives hints that may be
00:00:41 more useful for engineering intelligence than any idea in mathematics, computer science, physics,
00:00:46 and scientific fields outside of biology. And so the brain is a kind of existence proof that says
00:00:53 it’s possible. Keep at it. I should also say that brain-inspired AI is often overhyped and used as
00:01:01 fodder for marketing speak, just as quantum computing is, but I’m not afraid of exploring these
00:01:08 sometimes overhyped areas since where there’s smoke, there’s sometimes fire.
00:01:13 Quick summary of the ads. Three sponsors, Babbel, Raycon Earbuds, and Masterclass. Please consider
00:01:20 supporting this podcast by clicking the special links in the description to get the discount.
00:01:25 It really is the best way to support this podcast. If you enjoy this thing, subscribe on YouTube,
00:01:31 review it with five stars on Apple Podcast, support on Patreon, or connect with me on Twitter
00:01:36 at Lex Fridman. As usual, I’ll do a few minutes of ads now and never any ads in the middle that
00:01:42 can break the flow of the conversation. This show is sponsored by Babbel, an app and website that
00:01:48 gets you speaking in a new language within weeks. Go to babbel.com and use code LEX to get three
00:01:54 months free. They offer 14 languages, including Spanish, French, Italian, German, and yes, Russian.
00:02:03 Daily lessons are 10 to 15 minutes, super easy, effective, designed by over 100 language experts.
00:02:10 Let me read a few lines from the Russian poem Noch, Ulitsa, Fonar, Apteka by Alexander Blok
00:02:18 that you’ll start to understand if you sign up to Babbel.
00:02:34 Now I say that you’ll only start to understand this poem because Russian starts with a language
00:02:41 and ends with vodka. Now the latter part is definitely not endorsed or provided by Babbel
00:02:47 and will probably lose me the sponsorship, but once you graduate from Babbel,
00:02:51 you can enroll in my advanced course of late night Russian conversation over vodka.
00:02:56 I have not yet developed an app for that. It’s in progress. So get started by visiting babbel.com
00:03:02 and use code LEX to get three months free. This show is sponsored by Raycon earbuds.
00:03:09 Get them at buyraycon.com slash LEX. They’ve become my main method of listening to podcasts,
00:03:14 audiobooks, and music when I run, do push-ups and pull-ups, or just live life. In fact,
00:03:20 I often listen to brown noise with them when I’m thinking deeply about something. It helps me focus.
00:03:26 They’re super comfortable, pair easily, great sound, great bass, six hours of playtime.
00:03:33 I’ve been putting in a lot of miles to get ready for a potential ultra marathon
00:03:38 and listening to audiobooks on World War II. The sound is rich and really comes in clear.
00:03:45 So again, get them at buyraycon.com slash LEX. This show is sponsored by Masterclass.
00:03:52 Sign up at masterclass.com slash LEX to get a discount and to support this podcast.
00:03:57 When I first heard about Masterclass, I thought it was too good to be true. I still think it’s
00:04:02 too good to be true. For 180 bucks a year, you get an all-access pass to watch courses from,
00:04:08 to list some of my favorites, Chris Hadfield on space exploration, Neil deGrasse Tyson on
00:04:13 scientific thinking and communication, Will Wright, creator of SimCity and The Sims, on game design.
00:04:19 Every time I do this read, I really want to play a city builder game. Carlos Santana on guitar,
00:04:26 Garry Kasparov on chess, Daniel Negreanu on poker, and many more. Chris Hadfield explaining how rockets
00:04:32 work and the experience of being launched into space alone is worth the money. By the way,
00:04:38 you can watch it on basically any device. Once again, sign up at masterclass.com slash LEX to get a discount
00:04:43 and to support this podcast. And now here’s my conversation with Dileep George. Do you think
00:04:50 we need to understand the brain in order to build it? Yes. If you want to build the brain, we
00:04:56 definitely need to understand how it works. Blue Brain or Henry Markram’s project is trying to
00:05:04 build a brain without understanding it, just trying to put details of the brain from neuroscience
00:05:11 experiments into a giant simulation by putting more and more neurons, more and more details.
00:05:18 But that is not going to work, because when it doesn’t perform the way you expect it to,
00:05:26 then what do you do? You just keep adding more details. How do you debug it? So unless you
00:05:32 understand, unless you have a theory about how the system is supposed to work, how the pieces are
00:05:37 supposed to fit together, what they’re going to contribute, you can’t build it. At the functional
00:05:42 level, understand. So can you actually linger on and describe the Blue Brain project? It’s kind of
00:05:48 a fascinating principle and idea to try to simulate the brain. We’re talking about the human
00:05:56 brain, right? Right. Human brains and rat brains or cat brains have lots in common that the cortex,
00:06:03 the neocortex structure is very similar. So initially they were trying to just simulate
00:06:11 a cat brain. To understand the nature of evil. To understand the nature of evil. Or as it happens
00:06:21 in most of these simulations, you easily get one thing out, which is oscillations. If you simulate
00:06:29 a large number of neurons, they oscillate and you can adjust the parameters and say that,
00:06:35 oh, oscillations match the rhythm that we see in the brain, et cetera. I see. So the idea is,
00:06:43 is the simulation at the level of individual neurons? Yeah. So the Blue Brain project,
00:06:49 the original idea as proposed was you put very detailed biophysical neurons, biophysical models
00:06:59 of neurons, and you interconnect them according to the statistics of connections that we have found
00:07:06 from real neuroscience experiments, and then turn it on and see what happens. And these neural
00:07:14 models are incredibly complicated in themselves, right? Because these neurons are modeled using
00:07:22 this idea called Hodgkin-Huxley models, which are about how signals propagate in a cable.
00:07:28 And there are active dendrites, all those phenomena, which those phenomena themselves,
00:07:34 we don’t understand that well. And then we put in connectivity, which is part guesswork,
00:07:40 part observed. And of course, if we do not have any theory about how it is supposed to work,
00:07:48 we just have to take whatever comes out of it as, okay, this is something interesting.
00:07:54 But in your sense, these models of the way a signal travels along,
00:07:59 like with the axons and all the basic models, they’re too crude?
00:08:04 Oh, well, actually, they are pretty detailed and pretty sophisticated. And they do replicate
00:08:12 the neural dynamics. If you take a single neuron and you try to turn on the different channels,
00:08:20 the calcium channels and the different receptors, and see what the effect of turning on or off those
00:08:28 channels are in the neuron’s spike output, people have built pretty sophisticated models of that.
00:08:35 And they are, I would say, in the regime of correct.
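For readers who want a concrete sense of what a biophysical model of a single neuron looks like, here is a minimal single-compartment Hodgkin-Huxley simulation in Python. It is only a textbook-style sketch with the standard squid-axon parameters, far simpler than the detailed multi-compartment models used in Blue Brain, and none of the numbers come from the conversation.

```python
# Minimal single-compartment Hodgkin-Huxley sketch (illustrative only; standard textbook
# parameters, not the detailed Blue Brain models discussed above).
import numpy as np

C_m = 1.0                            # membrane capacitance, uF/cm^2
g_Na, g_K, g_L = 120.0, 36.0, 0.3    # maximal conductances, mS/cm^2
E_Na, E_K, E_L = 50.0, -77.0, -54.4  # reversal potentials, mV

# Standard HH gating rate functions (voltage in mV, time in ms)
def a_m(V): return 0.1 * (V + 40.0) / (1.0 - np.exp(-(V + 40.0) / 10.0))
def b_m(V): return 4.0 * np.exp(-(V + 65.0) / 18.0)
def a_h(V): return 0.07 * np.exp(-(V + 65.0) / 20.0)
def b_h(V): return 1.0 / (1.0 + np.exp(-(V + 35.0) / 10.0))
def a_n(V): return 0.01 * (V + 55.0) / (1.0 - np.exp(-(V + 55.0) / 10.0))
def b_n(V): return 0.125 * np.exp(-(V + 65.0) / 80.0)

def simulate(I_inj=10.0, T=50.0, dt=0.01):
    """Forward-Euler integration of the HH equations with a constant injected current."""
    steps = int(T / dt)
    V, m, h, n = -65.0, 0.05, 0.6, 0.32     # approximate resting-state initial conditions
    trace = np.empty(steps)
    for i in range(steps):
        # Ionic currents for the current state
        I_Na = g_Na * m**3 * h * (V - E_Na)
        I_K = g_K * n**4 * (V - E_K)
        I_L = g_L * (V - E_L)
        # Update membrane potential and gating variables
        V += dt * (I_inj - I_Na - I_K - I_L) / C_m
        m += dt * (a_m(V) * (1 - m) - b_m(V) * m)
        h += dt * (a_h(V) * (1 - h) - b_h(V) * h)
        n += dt * (a_n(V) * (1 - n) - b_n(V) * n)
        trace[i] = V
    return trace

spikes = simulate()
print("peak membrane potential (mV):", round(float(spikes.max()), 1))  # spiking regime
```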
00:08:41 Well, see, the correctness, that’s interesting, because you mentioned at several levels,
00:08:45 the correctness is measured by looking at some kind of aggregate statistics.
00:08:49 It would be more of the spiking dynamics of a single neuron.
00:08:53 Spiking dynamics of a single neuron, okay.
00:08:54 Yeah. And yeah, these models, because they are going to the level of mechanism,
00:09:00 so they are basically looking at, okay, what is the effect of turning on an ion channel?
00:09:07 And you can model that using electric circuits. So it is not just a function fitting. People are
00:09:17 looking at the mechanism underlying it and putting that in terms of electric circuit theory, signal
00:09:23 propagation theory, and modeling that. So those models are sophisticated, but getting a single
00:09:40 neuron’s model 99% right still does not tell you how to… It would be the analog of getting a
00:09:40 transistor model right and now trying to build a microprocessor. And if you did not understand how
00:09:50 a microprocessor works, but you say, oh, I now can model one transistor well, and now I will just
00:09:57 try to interconnect the transistors according to whatever I could guess from the experiments
00:10:03 and try to simulate it, then it is very unlikely that you will produce a functioning microprocessor.
00:10:12 When you want to produce a functioning microprocessor, you want to understand Boolean
00:10:16 logic, how do the gates work, all those things, and then understand how do those gates get
00:10:22 implemented using transistors. Yeah. This reminds me, there’s a paper,
00:10:26 maybe you’re familiar with it, that I remember going through in a reading group that
00:10:31 approaches a microprocessor from a perspective of a neuroscientist. I think it basically,
00:10:38 it uses all the tools of neuroscience that we have to try to understand,
00:10:42 like, as if aliens just showed up to study computers, and to see if those tools could be
00:10:49 used to get any kind of sense of how the microprocessor works. I think the final,
00:10:54 the takeaway from at least this initial exploration is that we’re screwed. There’s no
00:11:01 way that the tools of neuroscience would be able to get us to anything, like not even
00:11:05 Boolean logic. I mean, it’s just any aspect of the architecture of the function of the
00:11:15 processes involved, the clocks, the timing, all that, you can’t figure that out from the
00:11:21 tools of neuroscience. Yeah. So I’m very familiar with this particular
00:11:25 paper. I think it was called “Could a Neuroscientist Understand a Microprocessor?” or something like
00:11:33 that. Following the methodology in that paper, even an electrical engineer would not understand
00:11:39 microprocessors. So I don’t think it is that bad in the sense of saying, neuroscientists do
00:11:49 find valuable things by observing the brain. They do find good insights, but those insights cannot
00:11:58 be put together just as a simulation. You have to investigate what are the computational
00:12:05 underpinnings of those findings. How do all of them fit together from an information processing
00:12:13 perspective? Somebody has to painstakingly put those things together
00:12:21 and build hypothesis. So I don’t want to diss all of neuroscientists saying, oh, they’re not
00:12:26 finding anything. No, that paper almost went to that level of neuroscientists will never
00:12:31 understand. No, that’s not true. I think they do find lots of useful things, but it has to be put
00:12:37 together in a computational framework. Yeah. I mean, but you know, just the AI systems will be
00:12:43 listening to this podcast a hundred years from now and they will probably, there’s some nonzero
00:12:50 probability they’ll find your words laughable. There’s like, I remember humans thought they
00:12:55 understood something about the brain. They were totally clueless. There’s a sense about neuroscience
00:12:59 that we may be in the very, very early days of understanding the brain. But I mean, that’s one
00:13:06 perspective. I mean, in your perspective, how far are we into understanding any aspect of the brain?
00:13:18 So, from the dynamics of individual neuron communication to how, when taken in
00:13:24 a collective sense, they’re able to store information, transfer information, how
00:13:31 intelligence then emerges, all that kind of stuff. Where are we on that timeline?
00:13:35 Yeah. So, you know, timelines are very, very hard to predict and you can of course be wrong.
00:13:40 And it can be wrong on either side. You know, we know now, when we look back, that the first
00:13:48 flight was in 1903. In 1900, there was a New York Times article on flying machines that do not fly
00:13:57 and, you know, humans might not fly for another hundred years. That was what that
00:14:03 article stated. But no, they flew three years after that. So it is, you know,
00:14:08 it’s very hard to, so… Well, and on that point, one of the Wright brothers,
00:14:15 I think two years before, said, like, some number, like 50 years, that
00:14:23 he had become convinced that it’s impossible. Even during their experimentation.
00:14:31 Yeah. Yeah. I mean, that speaks to the entrepreneurial battle, the
00:14:36 depression of going through it, just thinking this is impossible, but yeah,
00:14:41 there’s something there. Even the person that’s in it is not able to estimate correctly.
00:14:47 Exactly. But I can, I can tell from the point of, you know, objectively, what are the things that we
00:14:52 know about the brain and how that can be used to build AI models, which can then go back and
00:14:58 inform how the brain works. So my way of understanding the brain would be to basically say,
00:15:04 look at the insights neuroscientists have found, understand that from a computational angle,
00:15:11 information processing angle, build models using that. And then build that model
00:15:18 which functions, which is a functional model, which is doing the task that we want
00:15:22 the model to do. It is not just trying to model a phenomenon in the brain. It is trying to
00:15:27 do what the brain is trying to do on the, on the whole functional level. And building that model
00:15:33 will help you fill in the missing pieces that, you know, biology just gives you the hints and
00:15:39 building the model, you know, fills in the rest of the, the pieces of the puzzle. And then you
00:15:44 can go and connect that back to biology and say, okay, now it makes sense that this part of the
00:15:51 brain is doing this, or this layer in the cortical circuit is doing this. And then continue this
00:15:59 iteratively because now that will inform new experiments in neuroscience. And of course,
00:16:05 you know, building the model and verifying that in the real world will also tell you more about,
00:16:11 does the model actually work? And you can refine the model, find better ways of putting these
00:16:17 neuroscience insights together. So I would say, you know,
00:16:23 neuroscientists alone, just from experimentation, will not be able to build a model
00:16:28 of the brain or a functional model of the brain. So, you know, there’s lots of efforts,
00:16:35 which are very impressive efforts in collecting more and more connectivity data from the brain.
00:16:41 You know, how, how are the microcircuits of the brain connected with each other?
00:16:45 Those are beautiful, by the way.
00:16:47 Those are beautiful. And at the same time, those do not by themselves
00:16:54 convey the story of how it works. Somebody has to understand, okay,
00:17:00 why are they connected like that? And what are those things doing? And we do that by
00:17:06 building models in AI using hints from neuroscience, and repeat the cycle.
00:17:11 So what aspects of the brain are useful in this whole endeavor? Which, by the way, I should say,
00:17:18 you’re both a neuroscientist and an AI person. I guess the dream is to both understand
00:17:24 the brain and to build AGI systems. So it’s like an engineer’s perspective of trying
00:17:32 to understand the brain. So what aspects of the brain, functionally speaking, like you said,
00:17:37 do you find interesting?
00:17:38 Yeah, quite a lot of things. All right. So one is, you know, if you look at the visual cortex,
00:17:46 you know, the visual cortex is a large part of the brain. I forget the exact
00:17:51 fraction, but a huge part of our brain area is occupied by just vision.
00:17:59 So vision, visual cortex is not just a feed forward cascade of neurons. There are a lot
00:18:06 more feedback connections in the brain compared to the feed forward connections. And, and it is
00:18:11 surprising to the level of detail neuroscientists have actually studied this. If you, if you go into
00:18:17 neuroscience literature and poke around and ask, you know, have they studied what will be the effect
00:18:22 of poking a neuron in level IT in level V1? And have they studied that? And you will say, yes,
00:18:33 they have studied that.
00:18:34 So every part of every possible combination.
00:18:38 I mean, it’s not a random exploration at all. It’s very hypothesis driven,
00:18:43 right? They are very experimental. Neuroscientists are very, very systematic
00:18:47 in how they probe the brain because experiments are very costly to conduct. They take a lot of
00:18:52 preparation. They, they need a lot of control. So they, they are very hypothesis driven in how
00:18:57 they probe the brain. And often what I find is that when we have a question in AI about
00:19:05 has anybody probed how lateral connections in the brain works? And when you go and read the
00:19:11 literature, yes, people have probed it and people have probed it very systematically. And, and they
00:19:16 have hypotheses about how those lateral connections are supposedly contributing to visual processing.
00:19:23 But of course they haven’t built very, very functional, detailed models of it.
00:19:27 By the way, how do the, in those studies, sorry to interrupt, do they stimulate, like,
00:19:32 a neuron in one particular area of the visual cortex and then see how the signal
00:19:37 travels, kind of thing?
00:19:38 Fascinating, very, very fascinating experiments. So I can, I can give you one example I was
00:19:43 impressed with. So before going to that, let me give you, you know, an overview of
00:19:50 how the layers in the cortex are organized, right? Visual cortex is organized into roughly
00:19:56 four hierarchical levels. Okay. So V1, V2, V4, IT. And in V1…
00:20:02 What happened to V3?
00:20:03 Well, yeah, that’s another pathway. Okay. So this is, this, I’m talking about just object
00:20:08 recognition pathway.
00:20:09 All right, cool.
00:20:10 And then in V1 itself, there is a very detailed microcircuit in V1 itself. That is,
00:20:19 there is organization within a level itself. The cortical sheet is organized into, you know,
00:20:25 multiple layers and there is columnar structure. And this layer-wise and columnar
00:20:31 structure is repeated in V1, V2, V4, IT, all of them, right? And the connections between
00:20:38 these layers within a level, you know, in V1 itself, there are six layers roughly, and the
00:20:44 connections between them, there is a particular structure to them. And now, so one example
00:20:51 of an experiment people did is, when you present a stimulus which, let’s say,
00:21:00 requires separating the foreground from the background of an object. So it’s a
00:21:06 textured triangle on a textured background. And you can check, does the surface settle
00:21:14 first or does the contour settle first?
00:21:19 Settle?
00:21:19 Settle in the sense that the, so when you finally form the percept of the, of the triangle,
00:21:28 you understand where the contours of the triangle are, and you also know where the inside of
00:21:32 the triangle is, right? That’s when you form the final percept. Now you can ask, what is
00:21:39 the dynamics of forming that final percept? Do the, do the neurons first find the edges
00:21:48 and converge on where the edges are, and then they find the inner surfaces, or does it go
00:21:55 the other way around?
00:21:55 The other way around. So what’s the answer?
00:21:58 In this case, it turns out that it first settles on the edges. It converges on the edge hypothesis
00:22:05 first, and then the surfaces are filled in from the edges to the inside.
00:22:10 That’s fascinating.
00:22:12 And the detail to which you can study this, it’s amazing that you can actually not only
00:22:18 find the temporal dynamics of when this happens, and then you can also find which layer in
00:22:25 the, you know, in V1, which layer is encoding the edges, which layer is encoding the surfaces,
00:22:32 and which layer is encoding the feedback, which layer is encoding the feed forward,
00:22:37 and what’s the combination of them that produces the final percept.
00:22:42 And these kinds of experiments stand out when you try to explain illusions. One example
00:22:48 of a favorite illusion of mine is the Kanizsa triangle. I don’t know whether you are familiar
00:22:51 with this one. So this is an example where it’s a triangle, but only the corners of the
00:23:00 triangle are shown in the stimulus. So they look like kind of Pacman.
00:23:06 Oh, the black Pacman.
00:23:07 Exactly.
00:23:08 And then you start to see.
00:23:10 Your visual system hallucinates the edges. And when you look at it, you will see a faint
00:23:16 edge. And you can go inside the brain and look, do actually neurons signal the presence
00:23:24 of this edge? And if they signal, how do they do it? Because they are not receiving anything
00:23:30 from the input. The input is blank for those neurons. So how do they signal it? When does
00:23:37 the signaling happen? So if a real contour is present in the input, then the neurons
00:23:45 immediately signal, okay, there is an edge here. When it is an illusory edge, it is clearly
00:23:52 not in the input. It is coming from the context. So those neurons fire later. And you can say
00:23:58 that, okay, it’s the feedback connection that is causing them to fire. And they happen later.
00:24:05 And you can find the dynamics of them. So these studies are pretty impressive and very detailed.
00:24:13 So by the way, just a step back, you said that there may be more feedback connections
00:24:20 than feed forward connections. First of all, just for, like, machine learning folks,
00:24:27 I mean, that’s crazy that there’s all these feedback connections. We often think about,
00:24:36 thanks to deep learning, you start to think about the human brain as a kind of feed forward
00:24:42 mechanism. So what the heck are these feedback connections? What’s the dynamics? What are we
00:24:52 supposed to think about them? So this fits into a very beautiful picture about how the brain works.
00:24:59 So the beautiful picture of how the brain works is that our brain is building a model of the world.
00:25:06 I know. So our visual system is building a model of how objects behave in the world. And we are
00:25:13 constantly projecting that model back onto the world. So what we are seeing is not just a feed
00:25:20 forward thing that just gets interpreted in a feed forward part. We are constantly projecting
00:25:25 our expectations onto the world. And what the final percept is, is a combination of what we project
00:25:31 onto the world combined with what the actual sensory input is. Almost like trying to calculate
00:25:37 the difference and then trying to interpret the difference. Yeah. I wouldn’t put it as calculating
00:25:44 the difference. It’s more like what is the best explanation for the input stimulus based on the
00:25:50 model of the world I have. Got it. And that’s where all the illusions come in. But that’s an
00:25:56 incredibly efficient process. So the feedback mechanism, it just helps you constantly. Yeah.
00:26:05 So hallucinate how the world should be based on your world model and then just looking at
00:26:11 if there’s novelty, like trying to explain it. Hence, that’s why movement. We detect movement
00:26:19 really well. There’s all these kinds of things. And this is like at all different levels of the
00:26:25 cortex you’re saying. This happens at the lowest level or the highest level. Yes. Yeah. In fact,
00:26:30 feedback connections are more prevalent everywhere in the cortex. And so one way to
00:26:36 think about it, and there’s a lot of evidence for this, is inference. So basically, if you have a
00:26:42 model of the world and when some evidence comes in, what you are doing is inference. You are trying
00:26:50 to now explain this evidence using your model of the world. And this inference includes projecting
00:26:58 your model onto the evidence and taking the evidence back into the model and doing an
00:27:04 iterative procedure. And this iterative procedure is what happens using the feed forward and feedback
00:27:11 propagation. And feedback affects what you see in the world, and it also affects feed forward
00:27:17 propagation. And examples are everywhere. We see these kinds of things everywhere. The idea that
00:27:25 there can be multiple competing hypotheses in our model trying to explain the same evidence,
00:27:32 and then you have to kind of make them compete. And one hypothesis will explain away the other
00:27:39 hypothesis through this competition process. So you have competing models of the world
00:27:46 that try to explain. What do you mean by explain away?
00:27:50 So this is a classic example in graphical models, probabilistic models.
00:27:56 What are those?
00:28:01 I think it’s useful to mention because we’ll talk about them more.
00:28:05 So neural networks are one class of machine learning models. You have a distributed set of
00:28:12 nodes, which are called the neurons. Each one is doing a dot product and you can approximate
00:28:18 any function using this multilevel network of neurons. So that’s a class of models which are
00:28:24 useful for function approximation. There is another class of models in machine learning
00:28:30 called probabilistic graphical models. And you can think of them as each node in that model is
00:28:38 a variable, which is talking about something. It can be a variable representing, is an edge present
00:28:46 in the input or not? And at the top of the network, a node can be representing, is there an object
00:28:56 present in the world or not? So it is another way of encoding knowledge. And then once you
00:29:06 encode the knowledge, you can do inference in the right way. What is the best way to
00:29:15 explain some set of evidence using this model that you encoded? So when you encode the model,
00:29:20 you are encoding the relationship between these different variables. How is the edge
00:29:24 connected to the model of the object? How is the surface connected to the model of the object?
00:29:29 And then, of course, this is a very distributed, complicated model. And inference is, how do you
00:29:37 explain a piece of evidence when a set of stimulus comes in? If somebody tells me there is a 50%
00:29:42 probability that there is an edge here in this part of the model, how does that affect my belief
00:29:47 on whether I should think that there is a square present in the image? So this is the process of
00:29:54 inference. So one example of inference is having this explaining away effect between multiple causes.
00:30:02 So graphical models can be used to represent causality in the world. So let’s say, you know,
00:30:10 your alarm at home can be triggered by a burglar getting into your house, or it can be triggered
00:30:22 by an earthquake. Both can be causes of the alarm going off. So now, you’re in your office,
00:30:30 you hear the burglar alarm going off, you are heading home, thinking that a burglar got in. But
00:30:36 while driving home, if you hear on the radio that there was an earthquake in the vicinity,
00:30:41 now your strength of evidence for a burglar getting into your house is diminished. Because
00:30:49 now that piece of evidence is explained by the earthquake being present. So if you think about
00:30:56 these two causes explaining a lower level variable, which is the alarm, now, what we’re seeing
00:31:01 is that increasing the evidence for some cause, you know, there is evidence coming in from below
00:31:08 for alarm being present. And initially, it was flowing to a burglar being present. But now,
00:31:14 since there is side evidence for this other cause, it explains away this evidence and evidence will
00:31:20 now flow to the other cause. This is, you know, two competing causal things trying to explain
00:31:26 the same evidence. And the brain has a similar kind of mechanism for doing so. That’s kind of
00:31:31 interesting. And how’s that all encoded in the brain? Like, where’s the storage of information?
00:31:39 Are we talking just maybe to get it a little bit more specific? Is it in the hardware of the actual
00:31:46 connections? Is it in chemical communication? Is it electrical communication? Do we know?
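As a concrete illustration of the explaining-away effect described a moment ago, here is a tiny Bayesian-network calculation for the alarm example. The probabilities are the standard made-up textbook numbers, not anything from the model being discussed.

```python
# Explaining away in a tiny Bayesian network: Burglary and Earthquake both cause Alarm.
# Illustrative sketch with classic textbook numbers (not from the conversation).

P_B = 0.001          # prior probability of a burglary
P_E = 0.002          # prior probability of an earthquake
# P(alarm | burglary, earthquake) for each combination of the two causes
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}

def posterior_burglary(alarm=True, earthquake=None):
    """P(Burglary | evidence) by brute-force enumeration over the hidden variables."""
    weights = {True: 0.0, False: 0.0}
    for b in (True, False):
        for e in (True, False):
            if earthquake is not None and e != earthquake:
                continue  # inconsistent with the observed evidence
            p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
            p *= P_A[(b, e)] if alarm else 1 - P_A[(b, e)]
            weights[b] += p
    return weights[True] / (weights[True] + weights[False])

print(posterior_burglary(alarm=True))                   # ~0.37: alarm alone suggests a burglar
print(posterior_burglary(alarm=True, earthquake=True))  # ~0.003: the earthquake explains it away
```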
00:31:53 So this is, you know, a paper that we are bringing out soon.
00:31:56 Which one is this?
00:31:57 This is the cortical microcircuits paper that I sent you a draft of. Of course, a lot of
00:32:03 this, a lot of it, is still hypothesis. One hypothesis is that you can think of a cortical column
00:32:09 as encoding a concept. A concept, you know, think of it as an example of a concept. Is an edge
00:32:20 present or not? Or is an object present or not? Okay, so you can think of it as a binary variable,
00:32:27 a binary random variable. The presence of an edge or not, or the presence of an object or not.
00:32:32 So each cortical column can be thought of as representing that one concept, one variable.
00:32:38 And then the connections between these cortical columns are basically encoding the relationship
00:32:43 between these random variables. And then there are connections within the cortical column.
00:32:49 Each cortical column is implemented using multiple layers of neurons with very, very,
00:32:54 very rich structure there. You know, there are thousands of neurons in a cortical column.
00:33:00 But that structure is similar across the different cortical columns.
00:33:03 Correct. And also these cortical columns connect to a substructure called thalamus.
00:33:10 So all cortical columns pass through this substructure. So our hypothesis is that
00:33:17 the connections between the cortical columns implement this, you know, that’s where the
00:33:21 knowledge is stored about how these different concepts connect to each other. And then the
00:33:28 neurons inside this cortical column and in thalamus in combination implement this actual
00:33:35 computation for inference, which includes explaining away and competing between the
00:33:41 different hypotheses. And it is all very… So what is amazing is that neuroscientists have
00:33:49 actually done experiments to the tune of showing these things. They might not be putting it in the
00:33:55 overall inference framework, but they will show things like, if I poke this higher level neuron,
00:34:03 it will inhibit through this complicated loop through thalamus, it will inhibit this other
00:34:07 column. So they will do such experiments. But do they use terminology of concepts,
00:34:14 for example? So, I mean, is it something where it’s easy to anthropomorphize
00:34:22 and think about concepts like you started moving into logic based kind of reasoning systems. So
00:34:31 I would just think of concepts in that kind of way, or is it a lot messier, a lot more gray area,
00:34:40 you know, even more gray, even more messy than the artificial neural network
00:34:47 kinds of abstractions? The easiest way is to think of it as a variable,
00:34:50 right? It’s a binary variable, which is showing the presence or absence of something.
00:34:55 So, but I guess what I’m asking is, is that something that we’re supposed to think of as
00:35:01 something that’s human interpretable, that something?
00:35:04 It doesn’t need to be. It doesn’t need to be human interpretable. There’s no need for it to
00:35:07 be human interpretable. But it’s almost like you will be able to find some interpretation of it
00:35:17 because it is connected to the other things that you know about.
00:35:20 Yeah. And the point is it’s useful somehow.
00:35:23 Yeah. It’s useful as an entity in the graph,
00:35:29 in connecting to the other entities that are, let’s call them concepts.
00:35:33 Right. Okay. So, by the way, are these the cortical microcircuits?
00:35:38 Correct. These are the cortical microcircuits. You know, that’s what neuroscientists use to
00:35:43 talk about the circuits within a level of the cortex. So, you can think of, you know,
00:35:49 let’s think in artificial neural network terms. People talk about the
00:35:54 architecture of how many layers they build, what is the fan in, fan out, et cetera. That is the
00:36:01 macro architecture. And then within a layer of the neural network, the cortical neural network
00:36:11 is much more structured within a level. There’s a lot more intricate structure there. But even
00:36:18 within an artificial neural network, you can think of feature detection plus pooling as one
00:36:23 level. And so, that is kind of a microcircuit. It’s much more complex in the real brain. And so,
00:36:32 within a level, whatever is that circuitry within a column of the cortex and between the layers of
00:36:38 the cortex, that’s the microcircuitry. I love that terminology. Machine learning
00:36:43 people don’t use the circuit terminology. Right.
00:36:45 But they should. It’s nice. So, okay. Okay. So, that’s the cortical microcircuit. So,
00:36:53 what’s interesting about, what can we say, what does the paper that you’re working on
00:37:00 propose about the ideas around these cortical microcircuits?
00:37:04 So, this is a fully functional model for the microcircuits of the visual cortex.
00:37:10 So, the paper focuses on your idea and our discussion now is focusing on vision.
00:37:15 Yeah. The visual cortex. Okay. So,
00:37:18 this is a model. This is a full model. This is how vision works.
00:37:22 But this is a hypothesis. Okay. So, let me step back a bit. So, we looked at neuroscience for
00:37:32 insights on how to build a vision model. Right.
00:37:35 And we synthesized all those insights into a computational model. This is called the recursive
00:37:40 cortical network model that we used for breaking CAPTCHAs. And we are using the same model for
00:37:47 robotic picking and tracking of objects. And that, again, is a vision system.
00:37:52 That’s a vision system. Computer vision system.
00:37:54 That’s a computer vision system. Takes in images and outputs what?
00:37:59 On one side, it outputs the class of the image and also segments the image. And you can also ask it
00:38:06 further queries. Where is the edge of the object? Where is the interior of the object? So, it’s a
00:38:11 model that you build to answer multiple questions. So, you’re not trying to build a model for just
00:38:17 classification or just segmentation, et cetera. It’s a joint model that can do multiple things.
00:38:23 So, that’s the model that we built using insights from neuroscience. And some of those insights are
00:38:30 what is the role of feedback connections? What is the role of lateral connections? So,
00:38:34 all those things went into the model. The model actually uses feedback connections.
00:38:38 All these ideas from neuroscience. Yeah.
00:38:41 So, what the heck is a recursive cortical network? What are the architecture approaches,
00:38:47 interesting aspects here, which is essentially a brain inspired approach to computer vision?
00:38:54 Yeah. So, there are multiple layers to this question. I can go from the very,
00:38:58 very top and then zoom in. Okay. So, one important thing, constraint that went into the model is that
00:39:05 you should not think of vision as something in isolation. We should not think of
00:39:11 perception as just a preprocessor for cognition. Perception and cognition are interconnected.
00:39:19 And so, you should not think of one problem in separation from the other problem. And so,
00:39:24 that means if you finally want to have a system that understand concepts about the world and can
00:39:30 learn a very conceptual model of the world and can reason and connect to language, all of those
00:39:36 things, you need to think all the way through and make sure that your perception system
00:39:41 is compatible with your cognition system and language system and all of them.
00:39:45 And one aspect of that is top down controllability. What does that mean?
00:39:52 So, that means, you know, so think of, you know, you can close your eyes and think about
00:39:58 the details of one object, right? I can zoom in further and further. So, think of the bottle in
00:40:05 front of me, right? And now, you can think about, okay, what the cap of that bottle looks like.
00:40:11 You know, you can think about what’s the texture on the cap of that bottle. You know, you can think
00:40:18 about, you know, what will happen if something hits that. So, you can manipulate your visual
00:40:25 knowledge in cognition driven ways. Yes. And so, this top down controllability and being able to
00:40:35 simulate scenarios in the world. So, you’re not just a passive player in this perception game.
00:40:43 You can control it. You have imagination. Correct. Correct. So, basically, you know,
00:40:50 basically having a generative network, which is a model and it is not just some arbitrary
00:40:56 generative network. It has to be built in a way that it is controllable top down. It is not just
00:41:02 trying to generate a whole picture at once. You know, it’s not trying to generate photorealistic
00:41:07 things of the world. You know, you don’t have good photorealistic models of the world. Human
00:41:11 brains do not have. If I, for example, ask you the question, what is the color of the letter E
00:41:17 in the Google logo? You have no idea. Although, you have seen it millions of times, hundreds of
00:41:25 times. So, it’s not, our model is not photorealistic, but it has other properties that we can
00:41:32 manipulate it. And you can think about filling in a different color in that logo. You can think
00:41:37 about expanding the letter E. You know, you can see what, so you can imagine the consequence of,
00:41:44 you know, actions that you have never performed. So, these are the kind of characteristics the
00:41:49 generative model need to have. So, this is one constraint that went into our model. Like, you
00:41:52 know, so when you read just the perception side of the paper, it is not obvious
00:41:57 that this was a constraint that went into the model, this top down controllability
00:42:02 of the generative model. So, what does top down controllability in a model look like? It’s a
00:42:10 really interesting concept. Fascinating concept. Is it the recursiveness that gives
00:42:16 you that? Or how do you do it? Quite a few things. It’s, like, how does the model
00:42:22 factorize? You know, what is the model representing as different pieces in the
00:42:26 puzzle? Like, you know, in the RCN network, it thinks of the world, so, as I said,
00:42:33 the background of an image is modeled separately from the foreground of the image. So,
00:42:39 the objects are separate from the background. They are different entities. So, there’s a kind
00:42:43 of segmentation that’s built in fundamentally. And then even that object is composed of parts.
00:42:49 And also, another one is the shape of the object is differently modeled from the texture of the
00:42:57 object. Got it. So, there’s like these, you know who Francois Chollet is? Yeah. So, there’s, he
00:43:08 developed this, like, IQ test type of thing, the ARC challenge, and it’s kind of cool that there’s
00:43:16 these concepts, priors that he defines that you bring to the table in order to be able to reason
00:43:22 about basic shapes and things in IQ test. So, here you’re making it quite explicit that here are the
00:43:30 things that you should be, these are like distinct things that you should be able to model in this.
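A very rough sketch of the kind of factorization being described, with the background, the object’s shape, and its surface appearance kept as separate, independently editable pieces. This is purely illustrative and is not the actual RCN representation.

```python
# Toy factorized scene: background, object shape, and surface appearance are separate
# pieces that can be recombined or edited independently (illustrative only).
import numpy as np

H, W = 6, 8
background = np.full((H, W), 0.2)          # a plain background "texture"

shape_mask = np.zeros((H, W), dtype=bool)  # where the object is: a small rectangle
shape_mask[2:5, 3:7] = True

def render(surface_value):
    """Compose a scene: the same shape can carry any surface appearance."""
    scene = background.copy()
    scene[shape_mask] = surface_value
    return scene

bright = render(0.9)   # one surface appearance
dark = render(0.5)     # change the surface without touching shape or background
print(np.array_equal(bright > 0.8, shape_mask))  # True: shape is preserved across the edit
```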
00:43:36 Keep in mind that you can derive this from much more general principles. It doesn’t, you don’t
00:43:42 need to explicitly put it as, oh, objects versus foreground versus background, the surface versus
00:43:48 the structure. No, these are, these are derivable from more fundamental principles of how, you know,
00:43:55 what’s the property of continuity of natural signals. What’s the property of continuity of
00:44:01 natural signals? Yeah. By the way, that sounds very poetic, but yeah. So, you’re saying that’s a,
00:44:07 there’s some low level properties from which emerges the idea that shapes should be different
00:44:12 than, like, there should be parts of an object. There should be, I mean, kind of like Francois,
00:44:18 I mean, there’s objectness, there’s all these things that it’s kind of crazy that we humans,
00:44:25 I guess, evolved to have because it’s useful for us to perceive the world. Yeah. Correct. And it
00:44:30 derives mostly from the properties of natural signals. And so, natural signals. So, natural
00:44:38 signals are the kind of things we’ll perceive in the natural world. Correct. I don’t know. I don’t
00:44:43 know why that sounds so beautiful. Natural signals. Yeah. As opposed to a QR code, right? Which is an
00:44:48 artificial signal that we created. Humans are not very good at classifying QR codes. We are very
00:44:52 good at saying something is a cat or a dog, but not very good at, you know, where computers are
00:44:58 very good at classifying QR codes. So, our visual system is tuned for natural signals. So,
00:45:05 it’s tuned for natural signals. And there are fundamental assumptions in the architecture
00:45:11 that are derived from natural signals properties. I wonder when you take hallucinogenic drugs,
00:45:18 does that go into natural or is that closer to the QR code? It’s still natural. It’s still natural?
00:45:25 Yeah. Because it is still operating using your brain. By the way, on that topic, I mean,
00:45:30 I haven’t been following. I think they’re becoming legalized in certain places. I can’t wait till
00:45:34 they become legalized to a degree that, like, vision science researchers could study it.
00:45:40 Yeah. Just like through medical, chemical ways, modify. There could be ethical concerns, but
00:45:47 modify. That’s another way to study the brain, to be able to chemically modify it. It’s
00:45:53 probably a long way off, figuring out how to do it ethically. Yeah, but I think there are studies
00:46:01 on that already. Yeah, I think so. Because it’s not unethical to give it to rats.
00:46:08 Oh, that’s true. That’s true. There’s a lot of drugged up rats out there. Okay, cool. Sorry.
00:46:15 Sorry. It’s okay. So, there’s these low level things from natural signals that…
00:46:23 …from which these properties will emerge. But it is still a very hard problem on how to encode
00:46:33 that. So, you mentioned the priors Francois wanted to encode in the abstract reasoning challenge,
00:46:44 but it is not straightforward how to encode those priors. So, some of those challenges,
00:46:50 like the object completion challenges are things that we purely use our visual system to do.
00:46:57 It looks like abstract reasoning, but it is purely an output of the vision system. For example,
00:47:03 completing the corners of that Kanizsa triangle, completing the lines of that Kanizsa triangle.
00:47:07 It’s purely a visual system property. There is no abstract reasoning involved. It uses all these
00:47:12 priors, but it is stored in our visual system in a particular way that is amenable to inference.
00:47:18 That is one of the things that we tackled in the… Basically saying, okay, these are the
00:47:25 prior knowledge which will be derived from the world, but then how is that prior knowledge
00:47:31 represented in the model such that inference when some piece of evidence comes in can be
00:47:38 done very efficiently and in a very distributed way? Because there are so many ways of representing
00:47:44 knowledge, which is not amenable to very quick inference, quick lookups. So that’s one core part
00:47:53 of what we tackled in the RCN model. How do you encode visual knowledge to do very quick inference?
00:48:02 Can you maybe comment on… So folks listening to this in general may be familiar with
00:48:08 different kinds of architectures of a neural networks.
00:48:10 What are we talking about with RCN? What does the architecture look like? What are the different
00:48:16 components? Is it close to neural networks? Is it far away from neural networks? What does it look
00:48:20 like? Yeah. So you can think of the Delta between the model and a convolutional neural network,
00:48:27 if people are familiar with convolutional neural networks. So convolutional neural networks have
00:48:31 this feed forward processing cascade, which is called feature detectors and pooling. And that
00:48:37 is repeated in a multi level system. And if you want an intuitive idea of what is happening,
00:48:46 feature detectors are detecting interesting co occurrences in the input. It can be a line,
00:48:53 a corner, an eye or a piece of texture, et cetera. And the pooling neurons are doing some local
00:49:03 transformation of that and making it invariant to local transformations. So this is what the
00:49:07 structure of convolutional neural network is. Recursive cortical network has a similar structure
00:49:14 when you look at just the feed forward pathway. But in addition to that, it is also structured
00:49:19 in a way that it is generative so that it can run it backward and combine the forward with the
00:49:25 backward. Another aspect that it has is it has lateral connections. So if you have an edge here
00:49:37 and an edge here, it has connections between these edges. It is not just feed forward connections.
00:49:42 There is something between these edges, between the nodes representing these edges, which is to
00:49:49 enforce compatibility between them. So otherwise what will happen is that… Constraints. It’s a
00:49:53 constraint. Basically, if you do just feature detection followed by pooling, then your
00:50:01 transformations in different parts of the visual field are not coordinated. And so you will create
00:50:07 a jagged, when you generate from the model, you will create jagged things and uncoordinated
00:50:14 transformations. So these lateral connections are enforcing the transformations.
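For comparison, here is a bare numpy sketch of the feed-forward cascade being contrasted with: feature detection (filtering) followed by pooling for local invariance. This mirrors only the convolutional-network analogy; RCN’s generative, lateral, and feedback structure is not represented here.

```python
# Feature detection followed by pooling: the feed-forward skeleton only (illustrative
# sketch; does not include RCN's lateral connections, feedback, or generative path).
import numpy as np

def detect_features(image, kernel):
    """Valid-mode 2D cross-correlation: responds where the local patch matches the feature."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Max over non-overlapping windows: small shifts of a feature no longer change the output."""
    H, W = (fmap.shape[0] // size) * size, (fmap.shape[1] // size) * size
    return fmap[:H, :W].reshape(H // size, size, W // size, size).max(axis=(1, 3))

image = np.zeros((8, 8))
image[:, 4] = 1.0                              # a vertical edge in the input
vertical_edge = np.array([[-1.0, 1.0, -1.0]])  # a crude vertical-line detector
pooled = max_pool(detect_features(image, vertical_edge))
print(pooled)                                  # the edge survives as a pooled feature response
```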
00:50:20 Is the whole thing still differentiable?
00:50:22 No, it’s not. It’s not trained using backprop.
00:50:27 Okay. That’s really important. So there’s this feed forward, there’s feedback mechanisms.
00:50:33 There’s some interesting connectivity things. It’s still layered like multiple layers.
00:50:41 Okay. Very, very interesting. And yeah. Okay. So the interconnections between adjacent nodes
00:50:48 serve as constraints that keep the thing stable.
00:50:52 Correct.
00:50:53 Okay. So what else?
00:50:55 And then there’s this idea of doing inference. A neural network does not do inference on the fly.
00:51:03 So an example of why this inference is important is, you know, so one of the first applications
00:51:09 that we showed in the paper was to crack text-based CAPTCHAs.
00:51:15 What are CAPTCHAs?
00:51:16 I mean, by the way, one of the most awesome, like, people don’t use this term anymore,
00:51:21 human computation, I think. I love this term. The guy who created CAPTCHAs,
00:51:26 I think, came up with this term. I love it. Anyway. What are CAPTCHAs?
00:51:32 So CAPTCHAs are those things that you fill in when you’re, you know, if you’re
00:51:38 opening a new account in Google, they show you a picture, you know, usually
00:51:43 it used to be a set of garbled letters that you have to kind of figure out, what is that string
00:51:48 of characters, and type it. And the reason CAPTCHAs exist is because, you know, Google or Twitter
00:51:56 do not want automatic creation of accounts. You can use a computer to create millions of accounts
00:52:03 and use that for nefarious purposes. So you want to make sure that to the extent possible,
00:52:10 the interaction that their system is having is with a human. So it’s called a human
00:52:16 interaction proof. A CAPTCHA is a human interaction proof. So CAPTCHAs are, by design,
00:52:23 things that are easy for humans to solve, but hard for computers.
00:52:27 Hard for robots.
00:52:28 Yeah. So, text-based CAPTCHAs were the ones that were prevalent around 2014,
00:52:36 because at that time, text-based CAPTCHAs were hard for computers to crack. Even now,
00:52:42 they are, actually, in the sense that an arbitrary text-based CAPTCHA will be unsolvable even now,
00:52:48 but with the techniques that we have developed, you know, you can quickly develop
00:52:52 a mechanism that solves the CAPTCHA.
00:52:55 They’ve probably gotten a lot harder too. They’ve been getting cleverer and cleverer at
00:53:00 generating these text CAPTCHAs. So, okay. So that was one of the things you tested it on, these
00:53:06 kinds of CAPTCHAs in 2014, ’15, that kind of stuff. So what, I mean, why, by the way, why CAPTCHAs?
00:53:15 Yeah. Even now, I would say CAPTCHA is a very, very good challenge problem if you want to
00:53:21 understand how human perception works, and if you want to build systems that work
00:53:27 like the human brain. And I wouldn’t say CAPTCHA is a solved problem. We have cracked the fundamental
00:53:32 defense of CAPTCHAs, but it is not solved in the way that humans solve it. So I can give an example.
00:53:40 I can take a five-year-old child who has just learned characters and show them any new CAPTCHA
00:53:48 that we create. They will be able to solve it. I can show you a picture of a
00:53:56 character. I can show you pretty much any new CAPTCHA from any new website. You’ll be able to
00:54:02 solve it without getting any training examples from that particular style of CAPTCHA.
00:54:06 You’re assuming I’m human. Yeah.
00:54:08 Yes. Yeah. That’s right. So if you are human, otherwise I will be able to figure that out
00:54:15 using this one. But this whole podcast is just a Turing test, a long Turing test. Anyway,
00:54:22 yeah. So humans can figure it out with very few examples. Or no training examples. No training
00:54:28 examples from that particular style of CAPTCHA. So even now this is unreachable for the current
00:54:37 deep learning system. So basically there is no, I don’t think a system exists where you can
00:54:41 basically say, train on whatever you want, and then now say, hey, I will show you a new CAPTCHA,
00:54:47 which I did not show you in the training setup. Will the system be able to solve it? It still
00:54:54 doesn’t exist. So that is the magic of human perception. And Doug Hofstadter put this very
00:55:01 beautifully in one of his talks. The central problem in AI is what is the letter A. If you
00:55:11 can build a system that reliably can detect all the variations of the letter A, you don’t even
00:55:17 need to go to the B and the C. Yeah. You don’t even need to go to the B and the C or the strings
00:55:23 of characters. And so that is the spirit with which we tackle that problem.
00:55:28 What does he mean by that? I mean, is it like, without training examples, try to figure out
00:55:36 the fundamental elements that make up the letter A in all of its forms?
00:55:43 In all of its forms. A can be made with two humans standing, leaning against each other,
00:55:47 holding hands. And it can be made of leaves.
00:55:52 Yeah. You might have to understand everything about this world in order to understand the
00:55:56 letter A. Yeah. Exactly.
00:55:57 So it’s common sense reasoning, essentially. Yeah.
00:56:00 Right. So finally, to really solve it, to finally say that you have solved CAPTCHA,
00:56:07 you have to solve the whole problem.
00:56:08 Yeah. Okay. So how does this kind of RCN architecture help us to do a better job of that
00:56:18 kind of thing? Yeah. So as I mentioned, one of the important things was being able to do inference,
00:56:24 being able to dynamically do inference.
00:56:28 Can you clarify what you mean? Because you said like neural networks don’t do inference.
00:56:33 Yeah. So what do you mean by inference in this context then?
00:56:35 So, okay. So in CAPTCHAs, what they do to confuse people is to make these characters crowd together.
00:56:43 Yes. Okay. And when you make the characters crowd together, what happens is that you will now start
00:56:48 seeing combinations of characters as some other new character or an existing character. So you
00:56:53 would put an R and N together. It will start looking like an M. And so locally, there is
00:57:02 very strong evidence for it being some incorrect character. But globally, the only explanation that
00:57:11 fits together is something that is different from what you can find locally. Yes. So this is
00:57:18 inference. You are basically taking local evidence and putting it in the global context and often
00:57:25 coming to a conclusion that is conflicting with the local information.
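A toy version of the local-versus-global tension just described: each position has local character scores, but only readings that a global model (here, a tiny hypothetical lexicon) can explain are accepted. This is only an illustration of the idea, not the actual RCN inference procedure.

```python
# Local evidence vs. global explanation (illustrative only; made-up scores and lexicon).
local_scores = [
    {"c": 0.9},                 # position 0
    {"o": 0.9},                 # position 1
    {"m": 0.55, "rn": 0.45},    # position 2: crowded strokes -- "m" wins locally
]

lexicon = {"corn", "cat"}       # hypothetical global knowledge: the words the model knows

def best_global_reading():
    """Brute-force over joint readings, keeping only those the global model can explain."""
    best, best_score = None, 0.0

    def expand(i, prefix, score):
        nonlocal best, best_score
        if i == len(local_scores):
            if prefix in lexicon and score > best_score:
                best, best_score = prefix, score
            return
        for piece, p in local_scores[i].items():
            expand(i + 1, prefix + piece, score * p)

    expand(0, "", 1.0)
    return best, best_score

# Locally the best reading is "com" (0.9 * 0.9 * 0.55), but the global model cannot
# explain it, so the locally weaker "rn" hypothesis wins and the output is "corn".
print(best_global_reading())
```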
00:57:29 So actually, so you mean inference like in the way it’s used when you talk about reasoning,
00:57:36 for example, as opposed to like inference, which is with artificial neural networks,
00:57:42 which is a single pass through the network. Okay. So, like, you’re basically doing some basic forms of
00:57:47 reasoning, like integration of like how local things fit into the global picture.
00:57:54 And things like explaining away come into this one, because you are explaining that piece
00:57:59 of evidence as something else, because globally, that’s the only thing that makes sense. So now
00:58:08 you can amortize this inference in a neural network. If you want to do this, you can brute
00:58:15 force it. You can just show it all combinations of things that you want your reasoning to work over.
00:58:23 And you can just train the heck out of that neural network and it will look like it is doing inference
00:58:30 on the fly, but it is really just doing amortized inference. It is because you have shown it a lot
00:58:37 of these combinations during training time. So what you want to do is be able to do dynamic
00:58:43 inference rather than just being able to show all those combinations in the training time.
00:58:48 And that’s something we emphasized in the model. What does it mean, dynamic inference? Does
00:58:54 that have to do with the feedback thing? Yes. Like, what is dynamic? I’m trying to visualize what
00:59:00 dynamic inference would be in this case. Like what is it doing with the input? It’s shown the input
00:59:05 the first time. Yeah. And is like what’s changing over temporally? What’s the dynamics of this
00:59:13 inference process? So you can think of it as you have at the top of the model, the characters that
00:59:19 you are trained on. They are the causes that you are trying to explain the pixels using the
00:59:26 characters as the causes. The characters are the things that cause the pixels. Yeah. So there’s
00:59:33 this causality thing. So the reason you mentioned causality, I guess, is because there’s a temporal
00:59:38 aspect to this whole thing. In this particular case, the temporal aspect is not important.
00:59:43 It is more like when if I turn the character on, the pixels will turn on. Yeah, it will be after
00:59:50 this a little bit. Okay. So that is causality in the sense of like a logic causality, like
00:59:55 hence inference. Okay. The dynamics is that even though locally it will look like, okay, this is an
01:00:03 A. And locally, just when I look at just that patch of the image, it looks like an A. But when I look
01:00:11 at it in the context of all the other causes, A is not something that makes sense. So that is
01:00:17 something you have to kind of recursively figure out. Yeah. So, okay. And this thing performed
01:00:24 pretty well on the CAPTCHAs. Correct. And I mean, is there some kind of interesting intuition you
01:00:32 can provide why it did well? Like what did it look like? Is there visualizations that could be human
01:00:37 interpretable to us humans? Yes. Yeah. So the good thing about the model is that it is extremely,
01:00:44 so it is not just doing a classification, right? It is providing a full explanation for the scene.
01:00:50 So when it operates on a scene, it is coming back and saying, look, this part is the A,
01:00:59 and these are the pixels that turned on. These are the pixels in the input that make me think that
01:01:06 it is an A. And also, these are the portions I hallucinated. It provides a complete explanation
01:01:14 of that form. And then these are the contours. This is the interior. And this is in front of
01:01:21 this other object. So that’s the kind of explanation the inference network provides.
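For a sense of what such a "full explanation" might contain, here is a hypothetical data structure sketched in Python. The field names and the occlusion example are made up; they only illustrate the kind of per-object evidence, hallucinated (occluded) pixels, contour/interior split, and depth ordering described above, not Vicarious's actual output format.

```python
# A made-up container for a per-object scene explanation. Illustration only.
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Pixel = Tuple[int, int]

@dataclass
class ObjectExplanation:
    label: str                       # e.g. "A"
    evidence_pixels: Set[Pixel]      # input pixels that support this object
    hallucinated_pixels: Set[Pixel]  # pixels inferred but not visible (occluded)
    contour_pixels: Set[Pixel]       # the object's outline
    interior_pixels: Set[Pixel]      # the surface / fill
    depth_order: int                 # 0 = frontmost, larger = further back

@dataclass
class SceneExplanation:
    objects: List[ObjectExplanation] = field(default_factory=list)

    def unexplained(self, all_on_pixels: Set[Pixel]) -> Set[Pixel]:
        """Input pixels that no object in the explanation accounts for."""
        covered = set().union(*(o.evidence_pixels for o in self.objects))
        return all_on_pixels - covered

# A bar occluding part of an "A": the occluded pixel is still reported,
# but as hallucinated rather than as observed evidence.
scene = SceneExplanation(objects=[
    ObjectExplanation("bar", {(0, x) for x in range(5)}, set(),
                      {(0, 0), (0, 4)}, {(0, 1), (0, 2), (0, 3)}, depth_order=0),
    ObjectExplanation("A", {(1, 1), (2, 1), (2, 3)}, {(0, 2)},
                      {(1, 1), (2, 1), (2, 3)}, set(), depth_order=1),
])
print(scene.unexplained({(1, 1), (2, 1), (2, 3), (9, 9)}))  # {(9, 9)}
```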
01:01:28 So that is useful and interpretable. And then the kind of errors it makes are also,
01:01:40 I don’t want to read too much into it, but the kind of errors the network makes are very similar
01:01:47 to the kinds of errors humans would make in a similar situation. So there’s something about
01:01:51 the structure that feels reminiscent of the way the human visual system works. Well, I mean,
01:02:00 how hardcoded is this to the CAPTCHA problem, this idea?
01:02:04 Not really hardcoded because the assumptions, as I mentioned, are general, right? It is more,
01:02:11 and those themselves can be applied in many situations which are natural signals. So it’s
01:02:17 the foreground versus background factorization and the factorization of the surfaces versus
01:02:24 the contours. So these are all generally applicable assumptions.
01:02:27 In all vision. So why attack the CAPTCHA problem, which is quite unique in the computer vision
01:02:36 context versus like the traditional benchmarks of ImageNet and all those kinds of image
01:02:42 classification or even segmentation tasks and all of that kind of stuff. What’s your thinking about
01:02:49 those kinds of benchmarks in this context? I mean, those benchmarks are useful for deep
01:02:55 learning kind of algorithms. So the settings that deep learning works in are here is my huge
01:03:03 training set and here is my test set. So the training set is almost 100x, 1000x bigger than
01:03:10 the test set in many, many cases. What we wanted to do was invert that. The training set is way
01:03:18 smaller than the test set. And CAPTCHA is a problem that is by definition hard for computers
01:03:30 and it has these good properties of strong generalization, strong out of training distribution
01:03:36 generalization. If you are interested in studying that and having your model have that property,
01:03:44 then it’s a good data set to tackle. So have you attempted to, which I think,
01:03:49 I believe there’s quite a growing body of work on looking at MNIST and ImageNet without training.
01:03:58 So it’s like taking the basic challenge is what tiny fraction of the training set can we take in
01:04:05 order to do a reasonable job of the classification task? Have you explored that angle in these
01:04:13 classic benchmarks? Yes. So we did do MNIST. So it’s not just CAPTCHA. So there was also
01:04:23 multiple versions of MNIST, including the standard version where we inverted the problem,
01:04:28 which is basically saying rather than train on 60,000 training data, how quickly can you get
01:04:37 to high level accuracy with very little training data? Is there some performance you remember,
01:04:42 like how well did it do? How many examples did it need? Yeah. I remember that it was
01:04:50 on the order of tens or hundreds of examples to get to 95% accuracy. And it was definitely
01:05:00 better than the other systems out there at that time.
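The inverted protocol itself is easy to reproduce with any off-the-shelf classifier. Below is a minimal sketch, assuming scikit-learn is available; it uses plain logistic regression as a stand-in baseline, not RCN, so the accuracies it reaches with tens of examples per class will be well below the numbers mentioned above.

```python
# Train on a handful of examples per class, evaluate on the full MNIST test set.
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0
X_train, y_train, X_test, y_test = X[:60000], y[:60000], X[60000:], y[60000:]

def subsample_per_class(X, y, k, seed=0):
    """Pick k training examples from each digit class."""
    rng = np.random.default_rng(seed)
    idx = np.concatenate([
        rng.choice(np.where(y == c)[0], size=k, replace=False)
        for c in np.unique(y)
    ])
    return X[idx], y[idx]

for k in (1, 10, 100):
    Xs, ys = subsample_per_class(X_train, y_train, k)
    clf = LogisticRegression(max_iter=1000).fit(Xs, ys)
    print(f"{k:4d} examples/class -> test accuracy {clf.score(X_test, y_test):.3f}")
```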
01:05:03 At that time. Yeah. They’re really pushing. I think that’s a really interesting space,
01:05:07 actually. I think there’s an actual name for MNIST. There are different names for the different
01:05:17 sizes of training sets. I mean, people are attacking this problem. I think it’s
01:05:21 super interesting. It’s funny how like the MNIST will probably be with us all the way to AGI.
01:05:29 It’s a data set that just sticks around. It’s a clean, simple data set to study the fundamentals of
01:05:37 learning, just like CAPTCHAs. It’s interesting. Not enough people. I don’t know. Maybe you can
01:05:43 correct me, but I feel like CAPTCHAs don’t show up as often in papers as they probably should.
01:05:48 That’s correct. Yeah. Because usually these things have a momentum. Once something gets
01:05:56 established as a standard benchmark, there is a dynamics of how graduate students operate and how
01:06:06 academic system works that pushes people to track that benchmark.
01:06:10 Yeah. Nobody wants to think outside the box. Okay. Okay. So good performance on the CAPTCHAs.
01:06:20 What else is there that’s interesting on the RCN side before we talk about the cortical microcircuits?
01:06:25 Yeah. So the same model. So the important part of the model was that it trains very
01:06:31 quickly with very little training data and it’s quite robust to out of distribution
01:06:37 perturbations. And we are using that very fruitfully at Vicarious in many of the
01:06:45 robotics tasks we are solving. Well, let me ask you this kind of touchy question. I have to,
01:06:51 I’ve spoken with your friend, colleague, Jeff Hawkins, too. I have to kind of ask,
01:06:59 there is a bit of, whenever you have brain inspired stuff and you make big claims,
01:07:05 big sexy claims, there’s critics, I mean, machine learning subreddit, don’t get me started on those
01:07:14 people. Criticism is good, but they’re a bit over the top. There is quite a bit of sort of
01:07:23 skepticism and criticism. Is this work really as good as it promises to be? Do you have thoughts
01:07:31 on that kind of skepticism? Do you have comments on the kind of criticism I might have received
01:07:36 about, you know, is this approach legit? Is this a promising approach? Or at least as promising as
01:07:44 it seems to be, you know, advertised as? Yeah, I can comment on it. So, you know, our RCN paper
01:07:52 is published in Science, which I would argue is a very high quality journal, very hard to publish
01:07:58 in. And, you know, usually it is indicative of the quality of the work. And I am very,
01:08:08 very certain that the ideas that we brought together in that paper, in terms of the importance
01:08:13 of feedback connections, recursive inference, lateral connections, coming to best explanation
01:08:20 of the scene as the problem to solve, trying to solve recognition, segmentation, all jointly,
01:08:27 in a way that is compatible with higher level cognition, top down attention, all those ideas
01:08:31 that we brought together into something, you know, coherent and workable in the world and
01:08:36 solving a challenging, tackling a challenging problem. I think that will stay and that
01:08:40 contribution I stand by. Now, I can tell you a story which is funny in the context of this. So,
01:08:49 if you read the abstract of the paper and, you know, the argument we are putting in, you know,
01:08:53 we are putting in, look, current deep learning systems take a lot of training data. They don’t
01:08:59 use these insights. And here is our new model, which is not a deep neural network. It’s a
01:09:03 graphical model. It does inference. This is how the paper is, right? Now, once the paper was
01:09:08 accepted and everything, it went to the press department in Science, you know, AAAS Science
01:09:14 Office. We didn’t do any press release when it was published. It went to the press department.
01:09:18 What was the press release that they wrote up? A new deep learning model.
01:09:24 Solves CAPTCHAs.
01:09:25 Solves CAPTCHAs. And so, you can see, you know, what was being hyped in that thing,
01:09:32 right? So, there is a dynamic in the community of, you know, so that especially happens when
01:09:42 there are lots of new people coming into the field and they get attracted to one thing.
01:09:46 And some people are trying to think differently compared to that. So, there is some, I think
01:09:52 skepticism in science is important and it is, you know, very much required. But often,
01:09:59 it’s not really skepticism. Usually, it’s mostly a bandwagon effect that is happening.
01:10:05 Well, but that’s not even that. I mean, I’ll tell you what they react to, which is like,
01:10:09 I’m sensitive to as well. If you look at just companies, OpenAI, DeepMind, Vicarious, I mean,
01:10:16 they just, there’s a little bit of a race to the top and hype, right? It’s like, it doesn’t pay off
01:10:27 to be humble. So, like, and the press is just irresponsible often. They just, I mean, don’t
01:10:37 get me started on the state of journalism today. Like, it seems like the people who write articles
01:10:42 about these things, they literally have not even spent an hour on the Wikipedia article about what
01:10:49 neural networks are. Like, they haven’t even invested in learning the language, out of laziness.
01:10:56 It’s like, robots beat humans. Like, they write this kind of stuff that just, and then of course,
01:11:06 the researchers are quite sensitive to that because it gets a lot of attention. They’re like,
01:11:11 why did this word get so much attention? That’s over the top and people get really sensitive.
01:11:18 The same kind of criticism came when OpenAI did the Rubik’s Cube work with the robot, which people
01:11:24 criticized. Same with GPT2 and 3, they criticized. Same thing with DeepMind’s AlphaZero. I mean,
01:11:33 yeah, I’m sensitive to it. But, and of course, with your work, you mentioned deep learning, but
01:11:39 there’s something super sexy to the public about brain inspired. I mean, that immediately grabs
01:11:45 people’s imagination, not even like neural networks, but like really brain inspired, like
01:11:53 brain like neural networks. That seems really compelling to people and to me as well, to the
01:12:00 world as a narrative. And so people hook onto that. And sometimes the skepticism engine
01:12:10 turns on in the research community and they’re skeptical. But I think putting aside the ideas
01:12:17 of the actual performance and captures or performance in any data set. I mean, to me,
01:12:22 all these data sets are useless anyway. It’s nice to have them. But in the grand scheme of things,
01:12:28 they’re silly toy examples. The point is, is there intuition about the ideas, just like you
01:12:36 mentioned, bringing the ideas together in a unique way? Is there something there? Is there some value
01:12:42 there? And is it going to stand the test of time? And that’s the hope. That’s the hope.
01:12:46 Yes. My confidence there is very high. I don’t treat brain inspired as a marketing term.
01:12:53 I am looking into the details of biology and puzzling over those things and I am grappling
01:13:01 with those things. And so it is not a marketing term at all. You can use it as a marketing term
01:13:07 and people often use it and you can get lumped in with them. And when people don’t understand
01:13:13 how you’re approaching the problem, it is easy to be misunderstood and think of it as purely
01:13:20 marketing. But that’s not the way we are. So you really, I mean, as a scientist,
01:13:27 you believe that if we kind of just stick to really understanding the brain, that’s going to,
01:13:33 that’s the right, like you should constantly meditate on the, how does the brain do this?
01:13:39 Because that’s going to be really helpful for engineering and technology systems.
01:13:43 Yes. You need to, so I think it’s one input and it is helpful, but you should know when to deviate
01:13:51 from it too. So an example is convolutional neural networks, right? Convolution is not an
01:13:59 operation the brain implements. The visual cortex is not convolutional. The visual cortex has local
01:14:06 receptive fields, local connectivity, but there is no translation invariance in the network weights
01:14:18 in the visual cortex. That is a computational trick, which is a very good engineering trick
01:14:24 that we use for sharing the training between the different nodes. And that trick will be with us
01:14:31 for some time. It will go away when we have robots with eyes and heads that move. And so then that
01:14:41 trick will go away. It will not be useful at that time. So the brain doesn’t have translational
01:14:49 invariance. It has the focal point, like it has a thing it focuses on. Correct. It has a fovea.
01:14:54 And because of the fovea, the receptive fields are not like the copying of the weights. Like the
01:15:01 weights in the center are very different from the weights in the periphery. Yes. At the periphery.
01:15:05 I mean, I actually wrote a paper and just got a chance to really study peripheral
01:15:12 vision, which is a fascinating thing. It’s a very poorly understood thing, what the brain, you know,
01:15:21 at every level, does with the periphery. It does some funky stuff. Yeah. So it’s another
01:15:28 kind of trick than convolution. Like, you know, convolution in neural networks is
01:15:39 a trick for efficiency, an efficiency trick. And the brain does a whole other kind of thing.
01:15:44 Correct. So you need to understand the principles of processing so that you can still apply
01:15:51 engineering tricks where you want to. You don’t want to be slavishly mimicking all the things of
01:15:55 the brain. And so, yeah, so it should be one input. And I think it is extremely helpful,
01:16:02 but it should be the point of really understanding so that you know when to deviate from it.
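The weight-sharing point can be made concrete by comparing a convolutional layer with a "locally connected" layer that keeps a separate filter at every location, which is closer to the cortex picture described above. A rough PyTorch sketch, with arbitrary sizes chosen only for illustration:

```python
# Convolution reuses one small filter bank everywhere; a locally connected
# layer pays for a separate filter at every output location.
import torch
import torch.nn as nn

class LocallyConnected2d(nn.Module):
    """Conv-like layer without weight sharing: one filter per output location."""
    def __init__(self, in_ch, out_ch, in_size, kernel):
        super().__init__()
        self.kernel = kernel
        self.out_size = in_size - kernel + 1            # stride 1, no padding
        n_loc = self.out_size * self.out_size
        self.weight = nn.Parameter(
            torch.randn(n_loc, out_ch, in_ch * kernel * kernel) * 0.01)

    def forward(self, x):                                # x: (B, C, H, W)
        patches = nn.functional.unfold(x, self.kernel)   # (B, C*k*k, n_loc)
        patches = patches.transpose(1, 2)                # (B, n_loc, C*k*k)
        out = torch.einsum("blf,lof->blo", patches, self.weight)
        B = x.shape[0]
        return out.transpose(1, 2).reshape(B, -1, self.out_size, self.out_size)

conv = nn.Conv2d(1, 8, kernel_size=5)                    # shared weights
local = LocallyConnected2d(1, 8, in_size=28, kernel=5)   # per-location weights

x = torch.randn(2, 1, 28, 28)
print(conv(x).shape, local(x).shape)                     # both (2, 8, 24, 24)
print(sum(p.numel() for p in conv.parameters()))         # 208
print(sum(p.numel() for p in local.parameters()))        # 576 * 8 * 25 = 115200
```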
01:16:06 So, okay. That’s really cool. That’s work from a few years ago. You did work at Numenta with Jeff
01:16:14 Hawkins with hierarchical temporal memory. How is your just, if you could give a brief history,
01:16:23 how is your view of the way the models of the brain changed over the past few years leading up
01:16:30 to now? Is there some interesting aspects where there was an adjustment to your understanding of
01:16:36 the brain or is it all just building on top of each other? In terms of the higher level ideas,
01:16:42 especially the ones Jeff wrote about in the book, if you blur out, right. Yeah. On Intelligence.
01:16:47 Right. On Intelligence. If you blur out the details and if you just zoom out and look at the
01:16:52 higher level idea, things are, I would say, consistent with what he wrote about. But many
01:17:02 things will be consistent with that because it’s a blur. Deep learning systems are also
01:17:08 multi level, hierarchical, all of those things. But in terms of the detail, a lot of things are
01:17:16 different. And those details matter a lot. So one point of difference I had with Jeff was how to
01:17:28 approach, how much of biological plausibility and realism do you want in the learning algorithms?
01:17:36 So when I was there, this was almost 10 years ago now.
01:17:41 It flies when you’re having fun.
01:17:43 Yeah. I don’t know what Jeff thinks now, but 10 years ago, the difference was that
01:17:49 I did not want to be so constrained on saying my learning algorithms need to be
01:17:56 biologically plausible based on some filter of biological plausibility available at that time.
01:18:03 To me, that is a dangerous cut to make because we are discovering more and more things about
01:18:09 the brain all the time. New biophysical mechanisms, new channels are being discovered
01:18:14 all the time. So I don’t want to upfront kill off a learning algorithm just because we don’t
01:18:21 really understand the full biophysics or whatever of how the brain learns.
01:18:27 Exactly. Exactly.
01:18:29 Let me ask and I’m sorry to interrupt. What’s your sense? What’s our best understanding of
01:18:34 how the brain learns?
01:18:36 So things like backpropagation, credit assignment. So many of these
01:18:42 learning algorithms have things in common, right? Backpropagation is one way of doing
01:18:47 credit assignment. There is another algorithm called expectation maximization, which is,
01:18:52 you know, another weight adjustment algorithm.
01:18:55 But is it your sense the brain does something like this?
01:18:58 Has to. There is no way around it in the sense of saying that you do have to adjust the
01:19:04 connections.
01:19:06 So yeah, and you’re saying credit assignment, you have to reward the connections that were
01:19:09 useful in making a correct prediction and not, yeah, I guess what else, but yeah, it
01:19:14 doesn’t have to be differentiable.
01:19:16 Yeah, it doesn’t have to be differentiable. Yeah. But you have to have a, you know, you
01:19:22 have a model that you start with, you have data comes in and you have to have a way of
01:19:27 adjusting the model such that it better fits the data. So that is all of learning, right?
01:19:33 And some of them can be using backprop to do that. Some of it can be using, you know,
01:19:40 very local graph changes to do that.
01:19:45 That can be, you know, many of these learning algorithms have similar update properties
01:19:52 locally in terms of what the neurons need to do locally.
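As one concrete instance of a learning rule whose update uses only locally available quantities, here is the classic delta rule on a single linear unit, sketched in Python. It is a toy illustration of "adjusting the model to better fit the data" without a deep backward pass, not a claim about what cortex actually computes.

```python
# Delta-rule learning: each weight update uses just the pre-synaptic activity
# and the post-synaptic error, both available locally at the connection.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                   # inputs
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=200)     # noisy targets

w = np.zeros(5)
lr = 0.01
for epoch in range(50):
    for x_i, y_i in zip(X, y):
        pred = x_i @ w
        err = y_i - pred                        # locally available at the output unit
        w += lr * err * x_i                     # Hebb-like: pre-activity times error

print(np.round(w - true_w, 2))                  # close to zero after training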
01:19:57 I wonder if small differences in learning algorithms can have huge differences in the
01:20:01 actual effect. So the dynamics of, I mean, sort of the reverse like spiking, like if
01:20:09 credit assignment is like a lightning versus like a rainstorm or something, like whether
01:20:18 there’s like a looping local type of situation with the credit assignment, whether there is
01:20:26 like regularization, like how it injects robustness into the whole thing, like whether
01:20:34 it’s chemical or electrical or mechanical. Yeah. All those kinds of things. I feel like
01:20:42 it, that, yeah, I feel like those differences could be essential, right? It could be. It’s
01:20:48 just that you don’t know enough to, on the learning side, you don’t know, you don’t know
01:20:54 enough to say that is definitely not the way the brain does it. Got it. So you don’t want
01:20:59 to be stuck to it. So that, yeah. So you’ve been open minded on that side of things.
01:21:04 On the inference side, on the recognition side, I am much more, I’m able to be constrained
01:21:09 because it’s much easier to do experiments because, you know, it’s like, okay, here’s
01:21:13 the stimulus, you know, how many steps did it take to get the answer? I can trace it
01:21:18 back. I can, I can understand the speed of that computation, et cetera. I’m able to do
01:21:23 all of that much more readily on the inference side. Got it. And
01:21:28 then you can’t do good experiments on the learning side. Correct. So let’s go right
01:21:34 back into the cortical microcircuits. So what are these ideas beyond the recursive cortical
01:21:42 network that you’re looking at now? So we have made a, you know, pass through multiple
01:21:48 of the steps that, you know, as I mentioned earlier, you know, we were looking at perception
01:21:54 from the angle of cognition, right? It was not just perception for perception’s sake.
01:21:58 How do you, how do you connect it to cognition? How do you learn concepts and how do you learn
01:22:04 abstract reasoning? Similar to some of the things Francois talked about, right? So we
01:22:13 have taken one pass through it basically saying, what is the basic cognitive architecture that
01:22:19 you need to have, which has a perceptual system, which has a system that learns dynamics of
01:22:25 the world and then has something like a routine program learning system on top of it to learn
01:22:32 concepts. So we have built one, you know, the version point one of that system. This
01:22:38 was another Science Robotics paper. The title of that paper was, you know, something
01:22:44 like cognitive programs. How do you build cognitive programs? And the application there
01:22:49 was on manipulation, robotic manipulation? It was, so think of it like this. Suppose
01:22:56 you wanted to tell a new person that you met, you don’t know the language that person uses.
01:23:04 You want to communicate to that person to achieve some task, right? So I want to say,
01:23:10 hey, you need to pick up all the red cups from the kitchen counter and put it here, right?
01:23:17 How do you communicate that, right? You can show pictures. You can basically say, look,
01:23:21 this is the starting state. The things are here. This is the ending state. And what does
01:23:28 the person need to understand from that? The person needs to understand what conceptually
01:23:32 happened in those pictures from the input to the output, right? So we are looking at
01:23:39 preverbal conceptual understanding. Without language, how do you have a set of concepts
01:23:45 that you can manipulate in your head? And from a set of images of input and output,
01:23:52 can you infer what is happening in those images?
01:23:55 Got it. With concepts that are pre language. Okay. So what’s it mean for a concept to be pre language?
01:24:02 Like why is language so important here?
01:24:10 So I want to make a distinction between concepts that are just learned from text
01:24:17 by just feeding brute force text. You can start extracting things like, okay,
01:24:23 a cow is likely to be on grass. So those kinds of things, you can extract purely from text.
01:24:32 But that’s kind of a simple association thing rather than a concept as an abstraction of
01:24:37 something that happens in the real world in a grounded way that I can simulate it in my
01:24:44 mind and connect it back to the real world. And you think kind of the visual world,
01:24:51 concepts in the visual world are somehow lower level than just the language?
01:24:58 The lower level kind of makes it feel like, okay, that’s unimportant. It’s more like,
01:25:04 I would say the concepts in the visual and the motor system and the concept learning system,
01:25:15 which if you cut off the language part, just what we learn by interacting with the world
01:25:20 and abstractions from that, that is a prerequisite for any real language understanding.
01:25:26 So you disagree with Chomsky because he says language is at the bottom of everything.
01:25:32 No, I disagree with Chomsky completely on how many levels from universal grammar to…
01:25:39 So that was a paper in Science Robotics, beyond the recursive cortical network.
01:25:43 What other interesting problems are there, the open problems in brain inspired approaches
01:25:50 that you’re thinking about?
01:25:51 I mean, everything is open, right? No problem is solved, solved. I think of perception as kind of
01:26:02 the first thing that you have to build, but the last thing that will actually be solved.
01:26:07 Because if you do not build the perception system in the right way, you cannot build the concept system in
01:26:12 the right way. So you have to build a perception system, however wrong that might be, you have to
01:26:18 still build that and learn concepts from there and then keep iterating. And finally, perception
01:26:24 will get solved fully when perception, cognition, language, all those things work together finally.
01:26:30 So great, we’ve talked a lot about perception, but then maybe on the concept side and like common
01:26:37 sense or just general reasoning side, is there some intuition you can draw from the brain about
01:26:45 how we can do that?
01:26:46 So I have this classic example I give. So suppose I give you a few sentences and then ask you a
01:26:56 question following that sentence. This is a natural language processing problem, right? So here
01:27:01 it goes. I’m telling you, Sally pounded a nail on the ceiling. Okay, that’s a sentence. Now I’m
01:27:10 asking you a question. Was the nail horizontal or vertical?
01:27:14 Vertical.
01:27:15 Okay, how did you answer that?
01:27:16 Well, I imagined Sally, it was kind of hard to imagine what the hell she was doing, but I
01:27:24 imagined I had a visual of the whole situation.
01:27:28 Exactly, exactly. So here, you know, I pose a question in natural language. The answer to
01:27:34 that question was you got the answer from actually simulating the scene. Now I can go more and more
01:27:40 detailed about, okay, was Sally standing on something while doing this? Could she have been
01:27:47 standing on a light bulb to do this? I could ask more and more questions about this and I can ask,
01:27:53 make you simulate the scene in more and more detail, right? Where is all that knowledge that
01:27:59 you’re accessing stored? It is not in your language system. It was not just by reading
01:28:05 text, you got that knowledge. It is stored from the everyday experiences that you have had from,
01:28:12 and by the age of five, you have pretty much all of this, right? And it is stored in your visual
01:28:18 system, motor system in a way such that it can be accessed through language.
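A toy rendering of answering the nail question by simulation rather than text lookup might look like the following Python sketch. The geometry, the surface table, and the "pounded nails align with the surface normal" rule are all made up for illustration; the point is only that the answer falls out of a little world model rather than out of text statistics.

```python
# Answer "was the nail horizontal or vertical?" from a tiny geometric model.
import numpy as np

SURFACE_NORMALS = {
    "ceiling": np.array([0.0, 0.0, -1.0]),   # normal points down into the room
    "floor":   np.array([0.0, 0.0,  1.0]),
    "wall":    np.array([1.0, 0.0,  0.0]),
}

def simulate_pound_nail(surface: str) -> np.ndarray:
    """A pounded nail ends up aligned with the surface normal."""
    return SURFACE_NORMALS[surface]

def describe_orientation(axis: np.ndarray) -> str:
    up = np.array([0.0, 0.0, 1.0])
    cos = abs(float(axis @ up)) / float(np.linalg.norm(axis))
    return "vertical" if cos > 0.7 else "horizontal"

for surface in ("ceiling", "wall"):
    nail_axis = simulate_pound_nail(surface)
    print(f"Nail pounded into the {surface}: {describe_orientation(nail_axis)}")
# -> vertical for the ceiling, horizontal for the wall
```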
01:28:24 Got it. I mean, right. So the language is just almost sort of the query into the whole visual
01:28:30 cortex and that does the whole feedback thing. But I mean, it is all reasoning kind of connected to
01:28:36 the perception system in some way. You can do a lot of it. You know, you can still do a lot of it
01:28:43 by quick associations without having to go into the depth. And most of the time you will be right,
01:28:49 right? You can just do quick associations, but I can easily create tricky situations for you.
01:28:55 Where that quick associations is wrong and you have to actually run the simulation.
01:29:00 So figuring out how these concepts connect. Do I have a good idea of how to do that?
01:29:06 That’s exactly one of the problems that we are working on. And the way we are approaching that
01:29:13 is basically saying, okay, the takeaway is that language
01:29:20 is simulation control, and your perceptual plus motor system is building a simulation of the world.
01:29:28 And so that’s basically the way we are approaching it. And the first thing that we built was a
01:29:34 controllable perceptual system. And we built schema networks, which was a controllable dynamic
01:29:40 system. Then we built a concept learning system that puts all these things together
01:29:44 into programs or abstractions that you can run and simulate. And now we are taking the step
01:29:51 of connecting it to language. And it will be very simple examples. Initially, it will not be
01:29:57 the GPT3 like examples, but it will be grounded simulation based language.
01:30:02 And for like the querying would be like question answering kind of thing?
01:30:08 Correct. Correct. And so that’s what we’re trying to do. We’re trying to build a system
01:30:13 like that. And it will be in some simple world initially, you know,
01:30:19 but it will be about, okay, can the system connect the language and ground it in the right way and
01:30:25 run the right simulations to come up with the answer. And the goal is to try to do things that,
01:30:29 for example, GPT3 couldn’t do. Correct. Speaking of which, if we could talk about GPT3 a little
01:30:38 bit, I think it’s an interesting thought provoking set of ideas that OpenAI is pushing forward. I
01:30:46 think it’s good for us to talk about the limits and the possibilities in the neural network. So
01:30:51 in general, what are your thoughts about this recently released very large 175 billion parameter
01:30:58 language model? So I haven’t directly evaluated it yet. From what I have seen on Twitter and
01:31:05 other people evaluating it, it looks very intriguing. I am very intrigued by some of
01:31:09 the properties it is displaying. And of course the text generation part of that was already
01:31:17 evident in GPT2 that it can generate coherent text over long distances. But of course the
01:31:26 weaknesses are also pretty visible in saying that, okay, it is not really carrying a world state
01:31:32 around. And sometimes you get sentences like, I went up the hill to reach the valley, or things
01:31:39 like that, some completely incompatible statements. Or when you’re traveling from one place to the other,
01:31:46 it doesn’t take into account the time of travel, things like that. So those things I think will
01:31:50 happen less in GPT3 because it is trained on even more data and it can do even longer distance
01:31:59 coherence. But it will still have the fundamental limitations that it doesn’t have a world model
01:32:07 and it can’t run simulations in its head to find whether something is true in the world or not.
01:32:13 So it’s taking a huge amount of text from the internet and forming a compressed representation.
01:32:20 Do you think in that could emerge something that’s an approximation of a world model,
01:32:27 which essentially could be used for reasoning? I’m not talking about GPT3, I’m talking about GPT4,
01:32:35 5 and GPT10. Yeah, I mean they will look more impressive than GPT3. So if you take that to
01:32:42 the extreme, then consider a Markov chain of just first order. I’m taking the other
01:32:51 extreme here. If you read Shannon’s book, he has a model of English text which is based on first
01:32:59 order Markov chains, second order Markov chains, third order Markov chains and saying that okay,
01:33:03 third order Markov chains look better than first order Markov chains. So does that mean a first
01:33:09 order Markov chain has a model of the world? Yes, it does. So yes, at that level, when you go to higher
01:33:18 order models or more sophisticated structure in the model, like the transformer networks have,
01:33:24 yes they have a model of the text world, but that is not a model of the world. It’s a model
01:33:32 of the text world and it will have interesting properties and it will be useful, but just scaling
01:33:41 it up is not going to give us AGI or natural language understanding or meaning. Well the
01:33:49 question is whether being forced to compress a very large amount of text forces you to construct
01:33:58 things that are very much like, because the ideas of concepts and meaning is a spectrum.
01:34:06 Sure, yeah. So in order to form that kind of compression,
01:34:13 maybe it will be forced to figure out abstractions which look awfully a lot like the kind of things
01:34:24 that we think about as concepts, as world models, as common sense. Is that possible?
01:34:31 No, I don’t think it is possible because the information is not there.
01:34:34 The information is there behind the text, right?
01:34:38 No, unless somebody has written down all the details about how everything works in the world
01:34:44 to the absurd amounts like, okay, it is easier to walk forward than backward, that you have to open
01:34:51 the door to go out of the thing, doctors wear underwear. Unless all these things somebody
01:34:56 has written down somewhere or somehow the program found it to be useful for compression from some
01:35:01 other text, the information is not there. So that’s an argument that text is a lot
01:35:07 lower fidelity than the experience of our physical world.
01:35:13 Right, correct. A picture’s worth a thousand words.
01:35:17 Well, in this case, pictures aren’t really… So the richest aspect of the physical world isn’t
01:35:24 even just pictures, it’s the interactivity with the world.
01:35:28 Exactly, yeah.
01:35:29 It’s being able to interact. It’s almost like…
01:35:36 It’s almost like if you could interact… Well, maybe I agree with you that a picture’s
01:35:42 worth a thousand words, but a thousand…
01:35:45 It’s still… Yeah, you could capture it with the GPTX.
01:35:49 So I wonder if there’s some interactive element where a system could live in text world where it
01:35:54 could be part of the chat, be part of talking to people. It’s interesting. I mean, fundamentally…
01:36:03 So you’re making a statement about the limitation of text. Okay, so let’s say we have a text
01:36:10 corpus that includes basically every experience we could possibly have. I mean, just a very large
01:36:19 corpus of text and also interactive components. I guess the question is whether the neural network
01:36:25 architecture, these very simple transformers, but if they had like hundreds of trillions or
01:36:33 whatever comes after a trillion parameters, whether that could store the information
01:36:42 needed, that’s architecturally. Do you have thoughts about the limitation on that side of
01:36:46 things with neural networks? I mean, so transformers are still a feed forward neural
01:36:52 network. It has a very interesting architecture, which is good for text modeling and probably some
01:36:59 aspects of video modeling, but it is still a feed forward architecture. You believe in the
01:37:04 feedback mechanism, the recursion. Oh, and also causality, being able to do counterfactual
01:37:11 reasoning, being able to do interventions, which is actions in the world. So all those things
01:37:20 require different kinds of models to be built. I don’t think transformers captures that family. It
01:37:28 is very good at statistical modeling of text and it will become better and better with more data,
01:37:35 bigger models, but that is only going to get so far. So I had this joke on Twitter saying that,
01:37:44 hey, this is a model that has read all of quantum mechanics and theory of relativity and we are
01:37:51 asking it to do text completion or we are asking it to solve simple puzzles. When you have AGI,
01:37:59 that is not what you ask the system to do. We will ask the system to do experiments and come
01:38:08 up with hypothesis and revise the hypothesis based on evidence from experiments, all those things.
01:38:13 Those are the things that we want the system to do when we have AGI, not solve simple puzzles.
01:38:20 Like impressive demos, somebody generating a red button in HTML.
01:38:24 Right, which are all useful. There is no dissing the usefulness of it.
01:38:29 So by the way, I am playing a little bit of a devil’s advocate, so calm down internet.
01:38:37 So I am curious almost in which ways a dumb but large neural network will surprise us.
01:38:47 I completely agree with your intuition. It is just that I do not want to dogmatically
01:38:58 100% put all the chips there. We have been surprised so much. Even the current GPT2 and
01:39:06 GPT3 are so surprising. The self play mechanisms of AlphaZero are really surprising. The fact that
01:39:18 reinforcement learning works at all to me is really surprising. The fact that neural networks work at
01:39:23 all is quite surprising given how nonlinear the space is, the fact that it is able to find local
01:39:30 minima that are at all reasonable. It is very surprising. I wonder sometimes whether us humans
01:39:39 just want for AGI not to be such a dumb thing. Because exactly what you are saying is like
01:39:52 the ideas of concepts and be able to reason with those concepts and connect those concepts in
01:39:57 hierarchical ways and then to be able to have world models. Just everything we are describing
01:40:05 in human language in this poetic way seems to make sense. That is what intelligence and reasoning
01:40:11 are like. I wonder if at the core of it, it could be much dumber. Well, finally it is still
01:40:17 connections and message passing. So in that way it is dumb. So I guess the recursion,
01:40:24 the feedback mechanism, that does seem to be a fundamental kind of thing.
01:40:32 The idea of concepts. Also memory. Correct. Having an episodic memory. That seems to be
01:40:39 an important thing. So how do we get memory? So we have another piece of work which came
01:40:45 out recently on how do you form episodic memories and form abstractions from them.
01:40:52 And we haven’t figured out all the connections of that to the overall cognitive architecture.
01:40:57 But what are your ideas about how you could have episodic memory? So at least it is very clear
01:41:04 that you need to have two kinds of memory. That is very, very clear. There are things that happen
01:41:13 as statistical patterns in the world, but then there is the one timeline of things that happen
01:41:19 only once in your life. And this day is not going to happen ever again. And that needs to be stored
01:41:27 as just a stream of strings. This is my experience. And then the question is about
01:41:36 how do you take that experience and connect it to the statistical part of it? How do you
01:41:40 now say that, okay, I experienced this thing. Now I want to be careful about similar situations.
01:41:47 So you need to be able to index that similarity using that other thing, that is, the model of the
01:41:57 world that you have learned. Although the situation came from the episode, you need to be able to
01:42:02 index into the other one. So the episodic memory is being implemented as an indexing over the other model
01:42:13 that you’re building. So the memories remain and they’re indexed into the statistical thing
01:42:24 that you form. Yeah, statistical causal structural model that you built over time. So it’s basically
01:42:30 the idea is that the hippocampus is just storing or sequencing a set of pointers that happens over
01:42:41 time. And then whenever you want to reconstitute that memory and evaluate the different aspects of
01:42:48 it, whether it was good, bad, do I need to encounter the situation again? You need the cortex
01:42:55 to reinstantiate, to replay that memory. So how do you find that memory? Like which
01:43:00 direction is the important direction? Both directions are again, bidirectional.
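One hypothetical way to render this indexing idea in code: a stand-in "cortical model" embeds situations, the "hippocampus" keeps an ordered timeline of pointers (timestamps plus embeddings plus the stored sequence), and recall means finding the episode whose index best matches the current situation and replaying it. The random-projection embedding below is purely illustrative, not a model of the hippocampus.

```python
# Episodic memory as an index over a learned model: store pointers on a
# timeline, retrieve by similarity to the current situation, then replay.
import numpy as np

rng = np.random.default_rng(0)
PROJ = rng.normal(size=(16, 64))                 # stand-in for a learned world model

def embed(situation: np.ndarray) -> np.ndarray:
    z = PROJ @ situation
    return z / np.linalg.norm(z)

class EpisodicMemory:
    def __init__(self):
        self.timeline = []                       # ordered (time, embedding, sequence)

    def store(self, t, situation, sequence):
        self.timeline.append((t, embed(situation), sequence))

    def recall(self, situation):
        """Return the stored sequence whose index best matches the query."""
        q = embed(situation)
        _, _, seq = max(self.timeline, key=lambda item: float(item[1] @ q))
        return seq

memory = EpisodicMemory()
day1 = rng.normal(size=64)
memory.store(t=1, situation=day1, sequence=["enter kitchen", "smell smoke", "leave"])
memory.store(t=2, situation=rng.normal(size=64), sequence=["walk in park"])

# A new situation similar to day 1 cues a replay of that episode.
print(memory.recall(day1 + 0.1 * rng.normal(size=64)))
```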
01:43:05 I mean, I guess how do you retrieve the memory? So this is again, hypothesis. We’re making this
01:43:11 up. So when you come to a new situation, your cortex is doing inference over in the new situation.
01:43:21 And then of course, hippocampus is connected to different parts of the cortex and you have this
01:43:27 deja vu situation, right? Okay, I have seen this thing before. And then in the hippocampus, you can
01:43:35 have an index of, okay, this is when it happened as a timeline. And then you can use the hippocampus
01:43:44 to drive the similar timelines to say now I am, rather than being driven by my current input
01:43:52 stimuli, I am going back in time and rewinding my experience from there, putting back into the
01:43:58 cortex. And then putting it back into the cortex of course affects what you’re going to see next
01:44:03 in your current situation. Got it. Yeah. So that’s the whole thing, having a world model and then
01:44:09 yeah, connecting to the perception. Yeah, it does seem to be that that’s what’s happening. On the
01:44:16 neural network side, it’s interesting to think of how we actually do that. Yeah. To have a knowledge
01:44:24 base. Yes. It is possible that you can put many of these structures into neural networks and we will
01:44:31 find ways of combining properties of neural networks and graphical models. So, I mean,
01:44:39 it’s already started happening. Graph neural networks are kind of a merge between them.
01:44:43 Yeah. And there will be more of that thing. So, but to me it is, the direction is pretty clear,
01:44:51 looking at biology and the history of evolutionary history of intelligence, it is pretty clear that,
01:44:59 okay, what is needed is more structure in the models and modeling of the world and supporting
01:45:06 dynamic inference. Well, let me ask you, there’s a guy named Elon Musk, there’s a company called
01:45:13 Neuralink and there’s a general field called brain computer interfaces. Yeah. It’s kind of an
01:45:20 interface between your two loves. Yes. The brain and the intelligence. So there’s like
01:45:26 very direct applications of brain computer interfaces for people with different conditions,
01:45:32 more in the short term. Yeah. But there’s also these sci fi futuristic kinds of ideas of AI
01:45:38 systems being able to communicate in a high bandwidth way with the brain, bidirectional.
01:45:45 Yeah. What are your thoughts about Neuralink and BCI in general as a possibility? So I think BCI
01:45:53 is a cool research area. And in fact, when I got interested in brains initially, when I was
01:46:02 enrolled at Stanford and when I got interested in brains, it was through a brain computer
01:46:07 interface talk that Krishna Shenoy gave. That’s when I even started thinking about the problem.
01:46:14 So it is definitely a fascinating research area and the applications are enormous. So there is a
01:46:21 science fiction scenario of brains directly communicating. Let’s keep that aside for the
01:46:26 time being. Even just the intermediate milestones they are pursuing, which are very reasonable as far
01:46:32 as I can see, being able to control an external limb using direct connections from the brain
01:46:40 and being able to write things into the brain. So those are all good steps to take and they have
01:46:49 enormous applications. People losing limbs being able to control prosthetics, quadriplegics being
01:46:55 able to control something, and therapeutics. I also know about another company working in
01:47:01 the space called Paradromics. They’re based on a different electrode array, but trying to attack
01:47:09 some of the same problems. So I think it’s a very… Also surgery? Correct. Surgically implanted
01:47:14 electrodes. Yeah. So yeah, I think of it as a very, very promising field, especially when it is
01:47:22 helping people overcome some limitations. Now, at some point, of course, it will advance to the level of
01:47:29 being able to communicate. How hard is that problem do you think? Let’s say we magically solve
01:47:37 what I think is a really hard problem of doing all of this safely. Yeah. So being able to connect
01:47:45 electrodes and not just thousands, but like millions to the brain. I think it’s very,
01:47:51 very hard because you also do not know what will happen to the brain with that in the sense of how
01:47:58 does the brain adapt to something like that? And as we were learning, the brain is quite,
01:48:04 in terms of neuroplasticity, is pretty malleable. Correct. So it’s going to adjust. Correct. So the
01:48:10 machine learning side, the computer side is going to adjust, and then the brain is going to adjust.
01:48:14 Exactly. And then what soup does this land us into? The kind of hallucinations you might get
01:48:20 from this might be pretty intense. Just connecting to all of Wikipedia. It’s interesting
01:48:28 whether we need to be able to figure out the basic protocol of the brain’s communication schemes
01:48:34 in order to get them to the machine and the brain to talk. Because another possibility is the brain
01:48:41 actually just adjust to whatever the heck the computer is doing. Exactly. That’s the way I think
01:48:45 that I find that to be a more promising way. It’s basically saying, okay, attach electrodes
01:48:51 to some part of the cortex. Maybe if it is done from birth, the brain will adapt. Say that
01:48:58 that part is not damaged, it was just not used for anything, and these electrodes are attached there.
01:49:02 And now you train that part of the brain to do this high bandwidth communication between
01:49:09 something else. And if you do it like that, then it is brain adapting to… And of course,
01:49:15 your external system is designed so that it is adaptable. Just like we designed computers
01:49:21 or mouse, keyboard, all of them to be interacting with humans. So of course, that feedback system
01:49:28 is designed to be human compatible, but now it is not trying to record from all of the brain.
01:49:37 And it’s not two systems trying to adapt to each other. It’s the brain adapting in one way.
01:49:44 That’s fascinating. The brain is connected to the internet. Just imagine just connecting it
01:49:51 to Twitter and just taking that stream of information. Yeah. But again, if we take a
01:49:59 step back, I don’t know what your intuition is. I feel like that is not as hard of a problem as
01:50:08 doing it safely. There’s a huge barrier to surgery because the biological system, it’s a mush of
01:50:19 like weird stuff. So the surgery part of it, the biology part of it, the long-term repercussions
01:50:26 part of it. I don’t know what else will… We often find after a long time in biology that,
01:50:35 okay, that idea was wrong. So people used to cut off the gland called the thymus or something.
01:50:43 And then they found that, oh no, that actually causes cancer.
01:50:50 And then there’s a subtle like millions of variables involved. But this whole process,
01:50:55 the nice thing, just like again with Elon, just like colonizing Mars, seems like a ridiculously
01:51:02 difficult idea. But in the process of doing it, we might learn a lot about the biology of the
01:51:08 neurobiology of the brain, the neuroscience side of things. It’s like, if you want to learn
01:51:13 something, do the most difficult version of it and see what you learn. The intermediate steps
01:51:19 that they are taking sounded all very reasonable to me. It’s great. Well, but like everything with
01:51:25 Elon, the timeline seems insanely fast. So that’s the only thing. Well,
01:51:34 we’ve been talking about cognition a little bit. So like reasoning,
01:51:38 we haven’t mentioned the other C word, which is consciousness. Do you ever think about that one?
01:51:43 Is that useful at all in this whole context of what it takes to create an intelligent reasoning
01:51:51 being? Or is that completely outside of your, like the engineering perspective of intelligence?
01:51:58 It is not outside the realm, but it doesn’t on a day to day basis inform what we do,
01:52:05 but it’s more, so in many ways, the company name is connected to this idea of consciousness.
01:52:12 What’s the company name? Vicarious. So Vicarious is the company name. And so what does Vicarious
01:52:19 mean? At the first level, it is about modeling the world and it is internalizing the external actions.
01:52:29 So you interact with the world and learn a lot about the world. And now after having learned
01:52:34 a lot about the world, you can run those things in your mind without actually having to act
01:52:42 in the world. So you can run things vicariously just in your brain. And similarly, you can
01:52:48 experience another person’s thoughts by having a model of how that person works
01:52:54 and running there, putting yourself in some other person’s shoes. So that is being vicarious.
01:53:01 Now it’s the same modeling apparatus that you’re using to model the external world
01:53:06 or some other person’s thoughts. You can turn it to yourself. If that same modeling thing is
01:53:14 applied to your own modeling apparatus, then that is what gives rise to consciousness, I think.
01:53:21 Well, that’s more like self awareness. There’s the hard problem of consciousness, which is
01:53:25 when the model feels like something, when this whole process is like you really are in it.
01:53:37 You feel like an entity in this world. Not just you know that you’re an entity, but it feels like
01:53:43 something to be that entity. And thereby, we attribute this. Then it starts to be where
01:53:54 something that has consciousness can suffer. You start to have these kinds of things that we can
01:53:59 reason about that is much heavier. It seems like there’s much greater cost to your decisions.
01:54:09 And mortality is tied up into that. The fact that these things end. First of all, I end at some
01:54:18 point, and then other things end. That somehow seems to be, at least for us humans, a deep
01:54:27 motivator. That idea of motivation in general, we talk about goals in AI, but goals aren’t quite
01:54:38 the same thing as our mortality. It feels like, first of all, humans don’t have a goal, and they
01:54:46 just kind of create goals at different levels. They make up goals because we’re terrified by
01:54:54 the mystery of the thing that gets us all. We make these goals up. We’re like a goal generation
01:55:02 machine, as opposed to a machine which optimizes the trajectory towards a singular goal. It feels
01:55:10 like that’s an important part of cognition, that whole mortality thing. Well, it is a part of human
01:55:18 cognition, but there is no reason for that mortality to come to the equation for an artificial
01:55:30 system, because we can copy the artificial system. The problem with humans is that I can’t clone
01:55:36 you. Even if I clone you as the hardware, your experience that was stored in your brain,
01:55:45 your episodic memory, all those will not be captured in the new clone. But that’s not the
01:55:52 same with an AI system. But it’s also possible that the thing that you mentioned with us humans
01:56:02 is actually of fundamental importance for intelligence. The fact that you can copy an AI
01:56:07 system means that that AI system is not yet an AGI. If you look at existence proof, if we reason
01:56:18 based on existence proof, you could say that it doesn’t feel like death is a fundamental property
01:56:24 of an intelligent system. But we don’t yet. Give me an example of an immortal intelligent being.
01:56:33 We don’t have those. It’s very possible that that is a fundamental property of intelligence,
01:56:42 is a thing that has a deadline for itself. Well, you can think of it like this. Suppose you invent
01:56:49 a way to freeze people for a long time. It’s not dying. So you can be frozen and woken up
01:56:58 thousands of years from now. So it’s no fear of death. Well, no, it’s not about time. It’s about
01:57:08 the knowledge that it’s temporary. And that aspect of it, the finiteness of it, I think
01:57:17 creates a kind of urgency. Correct. For us, for humans. Yeah, for humans. Yes. And that is part
01:57:23 of our drives. And that’s why I’m not too worried about AI having motivations to kill all humans
01:57:35 and those kinds of things. Why? Just wait. So why do you need to do that? I’ve never heard that
01:57:43 before. That’s a good point. Yeah, just murder seems like a lot of work. Let’s just wait it out.
01:57:52 They’ll probably hurt themselves. Let me ask you, people often kind of wonder, world class researchers
01:58:01 such as yourself, what kind of books, technical fiction, philosophical, had an impact on you and
01:58:10 your life and maybe ones you could possibly recommend that others read? Maybe if you have
01:58:17 three books that pop into mind. Yeah. So I definitely liked Judea Pearl’s book,
01:58:23 Probabilistic Reasoning in Intelligent Systems. It’s a very deep technical book. But what I liked
01:58:30 is that, so there are many places where you can learn about probabilistic graphical models from.
01:58:36 But throughout this book, Judea Pearl kind of sprinkles his philosophical observations and he
01:58:42 thinks about, connects us to how the brain thinks and attentions and resources, all those things. So
01:58:48 that whole thing makes it more interesting to read. He emphasizes the importance of causality.
01:58:54 So that was in his later book. So this was the first book, Probabilistic Reasoning in Intelligent
01:58:58 Systems. He mentions causality, but he hadn’t really sunk his teeth into causality. But he
01:59:05 really sunk his teeth into, how do you actually formalize it? And the second book,
01:59:11 Causality, the one in 2000, that one is really hard. So I would recommend that.
01:59:17 Yeah. So that looks at the mathematical, his model of…
01:59:22 Do-calculus.
01:59:23 Do-calculus. Yeah. It was pretty dense mathematically.
01:59:25 Right. The Book of Why is definitely more enjoyable.
01:59:28 For sure.
01:59:29 Yeah. So I would recommend Probabilistic Reasoning in Intelligent Systems.
01:59:34 Another book I liked was one from Doug Hofstadter. This was a long time ago. He had a book,
01:59:41 I think it was called The Mind’s I. It was probably Hofstadter and Daniel Dennett together.
01:59:49 Yeah. And I actually was, I bought that book. It’s on my shelf. I haven’t read it yet,
01:59:54 but I couldn’t get an electronic version of it, which is annoying because you read everything on
02:00:00 Kindle. So you had to actually purchase the physical. It’s one of the only physical books
02:00:06 I have because anyway, a lot of people recommended it highly. So yeah.
02:00:11 And the third one I would definitely recommend reading is, this is not a technical book. It is
02:00:18 history. The name of the book, I think, is The Bishop’s Boys. It’s about the Wright brothers
02:00:25 and their path and how it was… There are multiple books on this topic and all of them
02:00:34 are great. It’s fascinating how flight was treated as an unsolvable problem. And also,
02:00:46 what aspects did people emphasize? People thought, oh, it is all about
02:00:51 just powerful engines. You just need to have powerful lightweight engines. And so some people
02:01:00 thought of it as, how far can we just throw the thing? Just throw it.
02:01:04 Like a catapult.
02:01:05 Yeah. So it’s very fascinating. And even after they made the invention,
02:01:11 people were not believing it.
02:01:13 Ah, the social aspect of it.
02:01:15 The social aspect. It’s very fascinating.
02:01:18 I mean, do you draw any parallels between how birds fly? So there’s the natural approach to flight
02:01:28 and then there’s the engineered approach. Do you see the same kind of thing with the brain
02:01:33 and our trying to engineer intelligence?
02:01:37 Yeah. It’s a good analogy to have. Of course, all analogies have their limits.
02:01:43 So people in AI often use airplanes as an example of, hey, we didn’t learn anything from birds.
02:01:55 But the funny thing is that, and the saying is, airplanes don’t flap wings. This is what they
02:02:02 say. The funny thing and the ironic thing is that you don’t need to flap to fly is something
02:02:09 Wright brothers found by observing birds. So they have in their notebook, in some of these books,
02:02:18 they show their notebook drawings. They make detailed notes about buzzards just soaring over
02:02:26 thermals. And they basically say, look, flapping is not the important thing, propulsion is not the
02:02:31 important problem to solve here. We want to solve control. And once you solve control,
02:02:37 propulsion will fall into place. All of this, they realized by observing birds.
02:02:44 Beautifully put. That’s actually brilliant because people do use that analogy a lot. I’m
02:02:49 going to have to remember that one. Do you have advice for people interested in artificial
02:02:54 intelligence like young folks today? I talk to undergraduate students all the time,
02:02:59 interested in neuroscience, interested in understanding how the brain works. Is there
02:03:03 advice you would give them about their career, maybe about their life in general?
02:03:09 Sure. I think every piece of advice should be taken with a pinch of salt, of course,
02:03:14 because each person is different, their motivations are different. But I can definitely
02:03:20 say if your goal is to understand the brain from the angle of wanting to build one, then
02:03:28 being an experimental neuroscientist might not be the way to go about it. A better way to pursue it
02:03:36 might be through computer science, electrical engineering, machine learning, and AI. And of
02:03:42 course, you have to study the neuroscience, but that you can do on your own. If you’re more
02:03:48 attracted by finding something intriguing about, discovering something intriguing about the brain,
02:03:53 then of course, it is better to be an experimentalist. So find that motivation,
02:03:58 what are you intrigued by? And of course, find your strengths too. Some people are very good
02:04:03 experimentalists and they enjoy doing that. And it’s interesting to see which department,
02:04:10 if you’re picking in terms of your education path, whether to go with like, at MIT, it’s
02:04:18 brain and computer, no, it’d be CS. Yeah. Brain and cognitive sciences, yeah. Or the CS side of
02:04:29 things. And actually the brain folks, the neuroscience folks are more and more now
02:04:34 embracing of learning TensorFlow and PyTorch, right? They see the power of trying to engineer
02:04:44 ideas that they get from the brain into, and then explore how those could be used to create
02:04:52 intelligent systems. So that might be the right department actually. Yeah. So this was a question
02:04:58 in one of the Redwood Neuroscience Institute workshops that Jeff Hawkins organized almost 10
02:05:06 years ago. This question was put to a panel, right? What should be the undergrad major you should
02:05:11 take if you want to understand the brain? And the majority opinion in that one was electrical
02:05:17 engineering. Interesting. Because, I mean, I’m a double undergrad, so I got lucky in that way.
02:05:25 But I think it does have some of the right ingredients because you learn about circuits.
02:05:30 You learn about how you can construct circuits to perform functions. You learn about
02:05:37 microprocessors. You learn information theory. You learn signal processing. You learn continuous
02:05:43 math. So in that way, it’s a good step. If you want to go to computer science or neuroscience,
02:05:50 it’s a good step. The downside is that you’re more likely to be forced to use MATLAB.
02:05:56 You’re more likely to be forced to use MATLAB. So one of the interesting things is, I mean,
02:06:07 this is changing, the world is changing, but certain departments lagged on the programming
02:06:13 side of things, on developing good habits in terms of software engineering. But I think that’s more
02:06:19 and more changing. And students can take that into their own hands and learn to program. I feel
02:06:26 like everybody in the sciences should learn to program, because it
02:06:34 empowers you. It puts the data at your fingertips, so you can organize it. You can find all kinds of
02:06:40 things in the data. And then, in the appropriate sciences, you can also build systems
02:06:46 based on that, and then engineer intelligent systems.
02:06:49 We already talked about mortality, so we’ve hit a ridiculous point. But let me ask you,
02:07:04 one of the things about intelligence is that it’s goal-driven. And you study the brain. So the question
02:07:13 is, what’s the goal that the brain is operating under? What’s the meaning of it all
02:07:17 for us humans, in your view? What’s the meaning of life? The meaning of life is whatever you
02:07:23 construct out of it. It’s completely open. It’s open. So there’s nothing fixed. Like you mentioned,
02:07:31 you like constraints, and this is wide open. Is there some useful aspect you think about in terms
02:07:42 of the openness of it, and the basic mechanisms of generating goals, when studying
02:07:50 cognition in the brain? Or is it just that, because everything we’ve talked
02:07:56 about, kind of, the perception system is there to understand the environment, to be
02:08:00 able to not die, to not fall over, you don’t think we need to
02:08:09 think about anything bigger than that? Yeah, I think so, because it’s basically being able to
02:08:16 understand the machinery of the world such that you can pursue whatever goals you want.
02:08:21 So the machinery of the world is really ultimately what we should be striving to understand. The
02:08:26 rest is just whatever the heck you want to do or whatever fun you have.
02:08:31 Or whatever is culturally popular. I think that’s beautifully put. I don’t think there’s a better
02:08:42 way to end it. Dilip, I’m so honored that you showed up here and wasted your time with me. It’s
02:08:49 been an awesome conversation. Thanks so much for talking today. Oh, thank you so much. This was
02:08:54 so much more fun than I expected. Thank you. Thanks for listening to this conversation with
02:09:00 Dilip George. And thank you to our sponsors, Babbel, Raycon Earbuds, and Masterclass. Please
02:09:07 consider supporting this podcast by going to babbel.com and using code LEX, going to buyraycon.com,
02:09:16 and signing up at masterclass.com. Click the links, get the discount. It really is the best
02:09:22 way to support this podcast. If you enjoy this thing, subscribe on YouTube, review it with five
02:09:27 stars on Apple Podcast, support it on Patreon, or connect with me on Twitter at Lex Friedman,
02:09:33 spelled, yes, without the E, just F R I D M A N. And now let me leave you with some words from Marcus
02:09:43 Aurelius. You have power over your mind, not outside events. Realize this and you will find
02:09:51 strength. Thank you for listening and hope to see you next time.