Marcus Hutter: Universal Artificial Intelligence, AIXI, and AGI #75

Transcript

00:00:00 The following is a conversation with Marcus Hutter,

00:00:03 senior research scientist at Google DeepMind.

00:00:06 Throughout his career of research,

00:00:08 including with Jürgen Schmidhuber and Shane Legg,

00:00:11 he has proposed a lot of interesting ideas

00:00:13 in and around the field of artificial general

00:00:16 intelligence, including the development of AIXI,

00:00:20 spelled A-I-X-I, a model which is a mathematical approach to AGI

00:00:25 that incorporates ideas of Kolmogorov complexity,

00:00:28 Solomonoff induction, and reinforcement learning.

00:00:33 In 2006, Marcus launched the 50,000 Euro Hutter Prize

00:00:38 for lossless compression of human knowledge.

00:00:41 The idea behind this prize is that the ability

00:00:43 to compress well is closely related to intelligence.

00:00:47 This, to me, is a profound idea.

00:00:51 Specifically, if you can compress the first 100

00:00:54 megabytes or 1 gigabyte of Wikipedia

00:00:56 better than your predecessors, your compressor

00:00:59 likely has to also be smarter.

00:01:02 The intention of this prize is to encourage

00:01:04 the development of intelligent compressors as a path to AGI.

00:01:09 In conjunction with his podcast release just a few days ago,

00:01:13 Marcus announced a 10x increase in several aspects

00:01:16 of this prize, including the money, to 500,000 Euros.

00:01:22 The better your compressor works relative to the previous

00:01:25 winners, the higher fraction of that prize money

00:01:27 is awarded to you.

00:01:29 You can learn more about it if you Google simply Hutter Prize.

00:01:35 I’m a big fan of benchmarks for developing AI systems,

00:01:38 and the Hutter Prize may indeed be

00:01:39 one that will spark some good ideas for approaches that

00:01:43 will make progress on the path of developing AGI systems.

00:01:47 This is the Artificial Intelligence Podcast.

00:01:50 If you enjoy it, subscribe on YouTube,

00:01:52 give it five stars on Apple Podcast,

00:01:54 support it on Patreon, or simply connect with me on Twitter

00:01:58 at Lex Fridman, spelled F R I D M A N.

00:02:02 As usual, I’ll do one or two minutes of ads

00:02:04 now and never any ads in the middle

00:02:06 that can break the flow of the conversation.

00:02:09 I hope that works for you and doesn’t

00:02:11 hurt the listening experience.

00:02:13 This show is presented by Cash App, the number one finance

00:02:16 app in the App Store.

00:02:17 When you get it, use code LEX PODCAST.

00:02:21 Cash App lets you send money to friends,

00:02:23 buy Bitcoin, and invest in the stock market

00:02:26 with as little as $1.

00:02:27 Broker services are provided by Cash App Investing,

00:02:30 a subsidiary of Square, a member SIPC.

00:02:34 Since Cash App allows you to send and receive money

00:02:37 digitally, peer to peer, security

00:02:39 in all digital transactions is very important.

00:02:42 Let me mention the PCI data security standard

00:02:45 that Cash App is compliant with.

00:02:48 I’m a big fan of standards for safety and security.

00:02:52 PCI DSS is a good example of that,

00:02:55 where a bunch of competitors got together

00:02:57 and agreed that there needs to be

00:02:59 a global standard around the security of transactions.

00:03:02 Now, we just need to do the same for autonomous vehicles

00:03:06 and AI systems in general.

00:03:08 So again, if you get Cash App from the App Store or Google

00:03:11 Play and use the code LEX PODCAST, you’ll get $10.

00:03:16 And Cash App will also donate $10 to FIRST,

00:03:19 one of my favorite organizations that

00:03:21 is helping to advance robotics and STEM education

00:03:24 for young people around the world.

00:03:27 And now, here’s my conversation with Marcus Hutter.

00:03:32 Do you think of the universe as a computer

00:03:34 or maybe an information processing system?

00:03:37 Let’s go with a big question first.

00:03:39 Okay, with a big question first.

00:03:41 I think it’s a very interesting hypothesis or idea.

00:03:45 And I have a background in physics,

00:03:47 so I know a little bit about physical theories,

00:03:50 the standard model of particle physics

00:03:52 and general relativity theory.

00:03:54 And they are amazing and describe virtually everything

00:03:57 in the universe.

00:03:58 And they’re all, in a sense, computable theories.

00:03:59 I mean, they’re very hard to compute.

00:04:01 And they’re very elegant, simple theories,

00:04:04 which describe virtually everything in the universe.

00:04:07 So there’s a strong indication that somehow

00:04:12 the universe is computable, but it’s a plausible hypothesis.

00:04:17 So why do you think, just like you said, general relativity,

00:04:21 quantum field theory, why do you think that

00:04:23 the laws of physics are so nice and beautiful

00:04:26 and simple and compressible?

00:04:29 Do you think our universe was designed,

00:04:32 or is it naturally this way?

00:04:34 Are we just focusing on the parts

00:04:36 that are especially compressible?

00:04:39 Or do human minds just enjoy something about that simplicity?

00:04:42 And in fact, there are other things

00:04:44 that are not so compressible.

00:04:46 I strongly believe and I’m pretty convinced

00:04:49 that the universe is inherently beautiful, elegant

00:04:52 and simple and described by these equations.

00:04:55 And we’re not just picking that.

00:04:57 I mean, if there were some phenomena

00:05:00 which cannot be neatly described,

00:05:02 scientists would still try to describe them.

00:05:04 And there’s biology, which is more messy,

00:05:06 but we understand that it’s an emergent phenomenon

00:05:09 and these are complex systems,

00:05:11 but they still follow the same rules

00:05:12 of quantum electrodynamics.

00:05:14 All of chemistry follows that and we know that.

00:05:16 I mean, we cannot compute everything

00:05:18 because we have limited computational resources.

00:05:20 No, I think it’s not a bias of the humans,

00:05:22 but it’s objectively simple.

00:05:23 I mean, of course, you never know,

00:05:25 maybe there’s some corners very far out in the universe

00:05:28 or super, super tiny below the nucleus of atoms

00:05:32 or parallel universes which are not nice and simple,

00:05:38 but there’s no evidence for that.

00:05:40 And we should apply Occam’s razor

00:05:42 and choose the simplest theory consistent with it.

00:05:45 But also it’s a little bit self referential.

00:05:48 So maybe a quick pause.

00:05:49 What is Occam’s razor?

00:05:50 So Occam’s razor says that you should not multiply entities

00:05:55 beyond necessity, which sort of,

00:05:58 if you translate it to proper English means,

00:06:01 and in the scientific context means

00:06:03 that if you have two theories or hypotheses or models,

00:06:06 which equally well describe the phenomenon,

00:06:09 your study or the data,

00:06:11 you should choose the more simple one.

00:06:13 So that’s just the principle or sort of,

00:06:16 that’s not like a provable law, perhaps.

00:06:20 Perhaps we’ll kind of discuss it and think about it,

00:06:23 but what’s the intuition of why the simpler answer

00:06:28 is the one that is likely to be the more correct descriptor

00:06:33 of whatever we’re talking about?

00:06:35 I believe that Occam’s razor

00:06:36 is probably the most important principle in science.

00:06:40 I mean, of course we need logical deduction

00:06:42 and we do experimental design,

00:06:44 but science is about finding, understanding the world,

00:06:49 finding models of the world.

00:06:51 And we can come up with crazy complex models,

00:06:53 which explain everything but predict nothing.

00:06:56 But the simple models seem to have predictive power

00:07:00 and it’s a valid question why?

00:07:03 And there are two answers to that.

00:07:06 You can just accept it.

00:07:07 That is the principle of science and we use this principle

00:07:10 and it seems to be successful.

00:07:12 We don’t know why, but it just happens to be.

00:07:15 Or you can try to find another principle

00:07:18 which explains Occam’s razor.

00:07:21 And if we start with the assumption

00:07:24 that the world is governed by simple rules,

00:07:27 then there’s a bias towards simplicity

00:07:31 and applying Occam’s razor is the mechanism

00:07:36 to finding these rules.

00:07:37 And actually in a more quantitative sense,

00:07:39 and we come back to that later in terms of Solomonoff induction,

00:07:41 you can rigorously prove that.

00:07:43 If you assume that the world is simple,

00:07:45 then Occam’s razor is the best you can do

00:07:47 in a certain sense.

00:07:49 So I apologize for the romanticized question,

00:07:51 but why do you think, outside of its effectiveness,

00:07:56 why do you think we find simplicity

00:07:58 so appealing as human beings?

00:08:00 Why does E equals MC squared seem so beautiful to us humans?

00:08:05 I guess mostly, in general, many things

00:08:08 can be explained by an evolutionary argument.

00:08:12 And there’s some artifacts in humans

00:08:14 which are just artifacts and not evolutionarily necessary.

00:08:18 But with this beauty and simplicity,

00:08:21 it’s, I believe, at least the core is about,

00:08:28 like science, finding regularities in the world,

00:08:31 understanding the world, which is necessary for survival.

00:08:35 If I look at a bush and I just see noise,

00:08:39 and there is a tiger and it eats me, then I’m dead.

00:08:42 But if I try to find a pattern,

00:08:44 and we know that humans are prone to find more patterns

00:08:49 in data than there are, like the Mars face

00:08:53 and all these things, but this bias

00:08:55 towards finding patterns, even if there are none,

00:08:58 but, I mean, it’s best, of course, if there are, yeah,

00:09:01 helps us for survival.

00:09:04 Yeah, that’s fascinating.

00:09:04 I haven’t thought really about the,

00:09:07 I thought I just loved science,

00:09:08 but indeed, in terms of just for survival purposes,

00:09:13 there is an evolutionary argument

00:09:15 for why we find the work of Einstein so beautiful.

00:09:21 Maybe a quick small tangent.

00:09:24 Could you describe what

00:09:26 Solomonoff induction is?

00:09:28 Yeah, so that’s a theory which I claim,

00:09:32 and Mr. Solomonoff sort of claimed a long time ago,

00:09:35 that this solves the big philosophical problem of induction.

00:09:39 And I believe the claim is essentially true.

00:09:42 And what it does is the following.

00:09:44 So, okay, for the picky listener,

00:09:49 induction can be interpreted narrowly and widely.

00:09:53 Narrow means inferring models from data.

00:09:58 And widely means also then using these models

00:10:01 for doing predictions,

00:10:02 so prediction is also part of induction.

00:10:04 So I’m a little bit sloppy sort of with the terminology,

00:10:07 and maybe that comes from Ray Solomonoff, you know,

00:10:10 being sloppy, maybe I shouldn’t say that.

00:10:12 He can’t complain anymore.

00:10:15 So let me explain a little bit this theory in simple terms.

00:10:20 So assume you have a data sequence,

00:10:21 make it very simple, the simplest one say 1, 1, 1, 1, 1,

00:10:24 and you see 100 ones, what do you think comes next?

00:10:28 The natural answer, I’m gonna speed up a little bit,

00:10:30 the natural answer is of course, you know, one, okay?

00:10:33 And the question is why, okay?

00:10:36 Well, we see a pattern there, yeah, okay,

00:10:38 there’s a one and we repeat it.

00:10:40 And why should it suddenly after 100 ones be different?

00:10:43 So what we’re looking for is simple explanations or models

00:10:47 for the data we have.

00:10:48 And now the question is,

00:10:49 a model has to be presented in a certain language,

00:10:53 in which language do we use?

00:10:55 In science, we want formal languages,

00:10:57 and we can use mathematics,

00:10:58 or we can use programs on a computer.

00:11:01 So abstractly on a Turing machine, for instance,

00:11:04 or it can be a general purpose computer.

00:11:06 So, and there are of course, lots of models of,

00:11:09 you can say maybe it’s 100 ones and then 100 zeros

00:11:11 and 100 ones, that’s a model, right?

00:11:13 But there are simpler models, there’s the model “print 1 in a loop,”

00:11:17 and it also explains the data.

00:11:19 And if you push that to the extreme,

00:11:23 you are looking for the shortest program,

00:11:25 which if you run this program reproduces the data you have,

00:11:29 it will not stop, it will continue naturally.

00:11:32 And this you take for your prediction.

00:11:34 And on the sequence of ones, it’s very plausible, right?

00:11:37 That “print 1 in a loop” is the shortest program.

00:11:39 We can give some more complex examples

00:11:41 like one, two, three, four, five.

00:11:43 What comes next?

00:11:44 The shortest program is again, you know,

00:11:46 a counter, and so that is roughly speaking

00:11:50 how Solomonoff induction works.

00:11:53 The extra twist is that it can also deal with noisy data.

00:11:56 So if you have, for instance, a coin flip,

00:11:58 say a biased coin, which comes up heads with 60% probability,

00:12:03 then it will predict, it will learn and figure this out,

00:12:06 and after a while it predicts, oh, the next coin flip

00:12:09 will be heads with probability 60%.

00:12:11 So it’s the stochastic version of that.

00:12:13 But the goal is, the dream is always the search

00:12:16 for the short program.

00:12:17 Yes, yeah.

00:12:18 Well, in Solomonoff induction, precisely what you do is,

00:12:21 so you combine, so looking for the shortest program

00:12:24 is like applying Occam’s razor,

00:12:26 like looking for the simplest theory.

00:12:28 There’s also Epicurus’ principle, which says,

00:12:31 if you have multiple hypotheses,

00:12:32 which equally well describe your data,

00:12:34 don’t discard any of them, keep all of them around,

00:12:36 you never know.

00:12:37 And you can put that together and say,

00:12:39 okay, I have a bias towards simplicity,

00:12:42 but I don’t rule out the larger models.

00:12:44 And technically what we do is,

00:12:46 we weigh the shorter models higher

00:12:49 and the longer models lower.

00:12:52 And you use Bayesian techniques, you have a prior,

00:12:55 which is precisely two to the minus

00:12:59 the complexity of the program.

00:13:01 And you weigh all these hypotheses and take this mixture,

00:13:04 and then you also get the stochasticity in.
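
(To put what was just described in symbols, as an editorial gloss in the standard notation for Solomonoff’s prior: U is a universal monotone Turing machine, ℓ(p) is the length of program p in bits, and the sum runs over all programs whose output starts with the observed string x.)

    M(x) = \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)},
    \qquad
    M(x_{t+1} \mid x_{1:t}) = \frac{M(x_{1:t}\, x_{t+1})}{M(x_{1:t})}

Every hypothesis consistent with the data is kept (Epicurus) but weighted down exponentially in its length (Occam), and prediction is simply the conditional of this mixture.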

00:13:06 Yeah, like many of your ideas,

00:13:08 that’s just a beautiful idea of weighing based

00:13:10 on the simplicity of the program.

00:13:12 I love that, that seems to me

00:13:15 maybe a very human centric concept.

00:13:17 It seems to be a very appealing way

00:13:19 of discovering good programs in this world.

00:13:24 You’ve used the term compression quite a bit.

00:13:27 I think it’s a beautiful idea.

00:13:30 Sort of, we just talked about simplicity

00:13:32 and maybe science or just all of our intellectual pursuits

00:13:41 is basically the attempt to compress the complexity

00:13:41 all around us into something simple.

00:13:43 So what does this word mean to you, compression?

00:13:49 I essentially have already explained it.

00:13:53 So compression means for me,

00:13:53 finding short programs for the data

00:13:58 or the phenomenon at hand.

00:13:59 You could interpret it more widely,

00:14:01 finding simple theories,

00:14:03 which can be mathematical theories

00:14:05 or maybe even informal, like just in words.

00:14:09 Compression means finding short descriptions,

00:14:11 explanations, programs for the data.

00:14:14 Do you see science as a kind of our human attempt

00:14:20 at compression, so we’re speaking more generally,

00:14:23 because when you say programs,

00:14:24 you’re kind of zooming in on a particular sort of

00:14:26 almost like a computer science,

00:14:28 artificial intelligence focus,

00:14:30 but do you see all of human endeavor

00:14:31 as a kind of compression?

00:14:34 Well, at least all of science,

00:14:35 I see as an endeavor of compression,

00:14:37 not all of humanity, maybe.

00:14:39 And well, there are also some other aspects of science

00:14:42 like experimental design, right?

00:14:43 I mean, we create experiments specifically

00:14:47 to get extra knowledge.

00:14:48 And that isn’t part of the decision making process,

00:14:53 but once we have the data,

00:14:55 to understand the data is essentially compression.

00:14:58 So I don’t see any difference between

00:15:00 compression, understanding, and prediction.

00:15:05 So we’re jumping around topics a little bit,

00:15:07 but returning back to simplicity,

00:15:10 a fascinating concept of Kolmogorov complexity.

00:15:14 So in your sense, do most objects

00:15:17 in our mathematical universe

00:15:19 have high Kolmogorov complexity?

00:15:21 And maybe what is, first of all,

00:15:24 what is Kolmogorov complexity?

00:15:25 Okay, Kolmogorov complexity is a notion

00:15:28 of simplicity or complexity,

00:15:31 and it takes the compression view to the extreme.

00:15:35 So I explained before that if you have some data sequence,

00:15:39 just think about a file in a computer

00:15:41 and that’s sort of, you know, just a string of bits.

00:15:45 And we have data compressors,

00:15:49 like we compress big files into zip files

00:15:52 with certain compressors.

00:15:53 And you can also produce self-extracting archives.

00:15:56 That means as an executable,

00:15:58 if you run it, it reproduces your original file

00:16:00 without needing an extra decompressor.

00:16:02 It’s just the decompressor plus the archive together in one.

00:16:06 And now there are better and worse compressors,

00:16:08 and you can ask, what is the ultimate compressor?

00:16:11 So what is the shortest possible self-extracting archive

00:16:14 you could produce for a certain data set here,

00:16:17 which reproduces the data set.

00:16:19 And the length of this is called the Kolmogorov complexity.

00:16:23 And arguably that is the information content

00:16:26 in the data set.

00:16:27 I mean, if the data set is very redundant or very boring,

00:16:30 you can compress it very well.

00:16:31 So the information content should be low

00:16:34 and you know, it is low according to this definition.

00:16:36 So it’s the length of the shortest program

00:16:39 that summarizes the data?

00:16:41 Yes.
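
(For reference, the usual formal definition behind this exchange, added as an editorial gloss: U is a fixed universal Turing machine and ℓ(p) is the length of program p in bits.)

    K(x) = \min_{p} \{\, \ell(p) : U(p) = x \,\}

So K(x) is the length of the shortest self-extracting archive for x; it is incomputable in general, which is why practical compressors can only ever approximate it from above.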

00:16:42 And what’s your sense of our sort of universe

00:16:46 when we think about the different objects in our universe

00:16:51 that we try to describe, concepts or whatever, at every level,

00:16:55 do they have high or low Kolmogorov complexity?

00:16:58 So what’s the hope?

00:17:00 Do we have a lot of hope

00:17:01 in being able to summarize much of our world?

00:17:05 That’s a tricky and difficult question.

00:17:08 So as I said before, I believe that the whole universe

00:17:13 based on the evidence we have is very simple.

00:17:16 So it has a very short description.

00:17:19 Sorry, to linger on that, the whole universe,

00:17:23 what does that mean?

00:17:24 You mean at the very basic fundamental level

00:17:26 in order to create the universe?

00:17:28 Yes, yeah.

00:17:29 So you need a very short program and you run it.

00:17:32 To get the thing going.

00:17:34 To get the thing going

00:17:35 and then it will reproduce our universe.

00:17:37 There’s a problem with noise.

00:17:39 We can come back to that later possibly.

00:17:42 Is noise a problem or is it a bug or a feature?

00:17:46 I would say it makes our life as a scientist

00:17:49 really, really much harder.

00:17:52 I mean, think about without noise,

00:17:53 we wouldn’t need all of the statistics.

00:17:55 But then maybe we wouldn’t feel like there’s a free will.

00:17:58 Maybe we need that for the…

00:18:01 This is an illusion that noise can give you free will.

00:18:04 At least in that way, it’s a feature.

00:18:06 But also, if you don’t have noise,

00:18:09 you have chaotic phenomena,

00:18:10 which are effectively like noise.

00:18:12 So we can’t get away with statistics even then.

00:18:15 I mean, think about rolling a dice

00:18:17 and forget about quantum mechanics

00:18:19 and you know exactly how you throw it.

00:18:21 But I mean, it’s still so hard to compute the trajectory

00:18:24 that effectively it is best to model it

00:18:26 as coming up with a number,

00:18:30 with probability one over six.

00:18:33 But from this set of philosophical

00:18:36 Kolmogorov complexity perspective,

00:18:38 if we didn’t have noise,

00:18:39 then arguably you could describe the whole universe

00:18:43 with the standard model plus general relativity.

00:18:47 I mean, we don’t have a theory of everything yet,

00:18:49 but sort of assuming we are close to it or have it.

00:18:52 Plus the initial conditions, which may hopefully be simple.

00:18:55 And then you just run it

00:18:56 and then you would reproduce the universe.

00:18:59 But that’s spoiled by noise or by chaotic systems

00:19:03 or by initial conditions, which may be complex.

00:19:06 So now if we don’t take the whole universe,

00:19:09 but just a subset, just take planet Earth.

00:19:13 Planet Earth cannot be compressed

00:19:15 into a couple of equations.

00:19:17 This is a hugely complex system.

00:19:19 So interesting.

00:19:20 So when you look at the whole,

00:19:21 like the whole thing might be simple,

00:19:23 but when you just take a small window, then…

00:19:26 It may become complex and that may be counterintuitive,

00:19:28 but there’s a very nice analogy.

00:19:31 The book, the library of all books.

00:19:34 So imagine you have a normal library with interesting books

00:19:36 and you go there, great, lots of information

00:19:39 and quite complex.

00:19:41 So now I create a library which contains all possible books,

00:19:45 say of 500 pages.

00:19:46 So the first book just has A, A, A, A, A over all the pages.

00:19:49 The next book A, A, A and ends with B and so on.

00:19:52 I create this library of all books.

00:19:54 I can write a super short program which creates this library.

00:19:57 So this library which has all books

00:19:59 has zero information content.

00:20:01 And you take a subset of this library

00:20:02 and suddenly you have a lot of information in there.
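
(A tiny Python sketch of the library analogy, with a made-up two-letter alphabet and book length, just to make the point concrete.)

    # The set of ALL "books" of a given length has a very short description:
    # the few lines of generator code below. A particular subset, by contrast,
    # generally cannot be described much more compactly than by listing it.
    import random
    from itertools import product

    def all_books(length, alphabet="AB"):
        """A short program that generates every possible book: near-zero information."""
        return ("".join(pages) for pages in product(alphabet, repeat=length))

    books = list(all_books(10))           # 2**10 = 1024 "books"
    favourites = random.sample(books, 5)  # saying WHICH five is where the information lives
    print(len(books), favourites)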

00:20:05 So that’s fascinating.

00:20:06 I think one of the most beautiful object,

00:20:08 mathematical objects that at least today

00:20:10 seems to be understudied or under talked about

00:20:12 is cellular automata.

00:20:14 What lessons do you draw from sort of the game of life

00:20:18 for cellular automata where you start with the simple rules

00:20:20 just like you’re describing with the universe

00:20:22 and somehow complexity emerges.

00:20:26 Do you feel like you have an intuitive grasp

00:20:30 on the fascinating behavior of such systems

00:20:34 where like you said, some chaotic behavior could happen,

00:20:37 some complexity could emerge,

00:20:39 some of it could die out, and some very rigid structures form.

00:20:43 Do you have a sense about cellular automata

00:20:46 that somehow transfers maybe

00:20:48 to the bigger questions of our universe?

00:20:50 Yeah, the cellular automata

00:20:51 and especially the Conway’s game of life

00:20:54 is really great because these rules are so simple.

00:20:56 You can explain it to every child

00:20:57 and even by hand you can simulate a little bit

00:21:00 and you see these beautiful patterns emerge

00:21:04 and people have proven that it’s even Turing complete.

00:21:06 You can not only use a computer to simulate the Game of Life,

00:21:09 but you can also use the Game of Life to simulate any computer.

00:21:13 That is truly amazing.

00:21:16 And it’s the prime example probably to demonstrate

00:21:21 that very simple rules can lead to very rich phenomena.

00:21:25 And people sometimes ask,

00:21:26 how is chemistry and biology so rich?

00:21:29 I mean, this can’t be based on simple rules.

00:21:32 But no, we know quantum electrodynamics

00:21:34 describes all of chemistry.

00:21:36 And we come later back to that.

00:21:38 I claim intelligence can be explained

00:21:40 or described in one single equation.

00:21:43 This very rich phenomenon.

00:21:45 You asked also about whether I understand this phenomenon

00:21:49 and the answer is probably no.

00:21:54 And there’s this saying,

00:21:55 you never understand really things,

00:21:56 you just get used to them.

00:21:58 And I think I got pretty used to cellular automata.

00:22:03 So you believe that you understand

00:22:05 now why this phenomenon happens.

00:22:07 But I give you a different example.

00:22:09 I didn’t play too much with Conway’s game of life

00:22:11 but a little bit more with fractals

00:22:15 and with the Mandelbrot set and these beautiful patterns,

00:22:18 just look Mandelbrot set.

00:22:21 And well, when the computers were really slow

00:22:23 and I just had a black and white monitor

00:22:25 and programmed my own programs in assembler.

00:22:29 Assembler, wow.

00:22:30 Wow, you’re legit.

00:22:33 To get these fractals on the screen

00:22:35 and I was mesmerized, and much later.

00:22:37 So I returned to this every couple of years

00:22:40 and then I tried to understand what is going on.

00:22:42 And you can understand a little bit.

00:22:44 So I tried to derive the locations,

00:22:48 there are these circles and the apple shape

00:22:53 and then you have smaller Mandelbrot sets

00:22:57 recursively in this set.

00:22:59 And there’s a way to mathematically

00:23:01 by solving high order polynomials

00:23:03 to figure out where these centers are

00:23:05 and what size they are approximately.

00:23:08 And by sort of mathematically approaching this problem,

00:23:12 you slowly get a feeling of why things are like they are

00:23:18 and that sort of is, you know, a

00:23:21 first step to understanding why this is such a rich phenomenon.

00:23:24 Do you think it’s possible, what’s your intuition?

00:23:27 Do you think it’s possible to reverse engineer

00:23:28 and find the short program that generated these fractals

00:23:33 sort of by looking at the fractals?

00:23:36 Well, in principle, yes, yeah.

00:23:38 So, I mean, in principle, what you can do is

00:23:42 you take, you know, any data set, you know,

00:23:43 you take these fractals or you take whatever your data set,

00:23:46 whatever you have, say a picture of Conway’s Game of Life

00:23:51 and you run through all programs.

00:23:53 You take programs of size one, two, three, four

00:23:55 and all these programs, you run them all in parallel

00:23:57 in so called dovetailing fashion,

00:23:59 give them computational resources,

00:24:01 the first one 50%, the second one half of the remaining resources, and so on,

00:24:03 and let them run, wait until they halt,

00:24:06 give an output, compare it to your data

00:24:09 and if some of these programs produce the correct data,

00:24:12 then you stop and then you have already some program.

00:24:14 It may be a long program because it’s fast,

00:24:16 and then you continue and you get shorter

00:24:18 and shorter programs until you eventually

00:24:20 find the shortest program.

00:24:22 The interesting thing, you can never know

00:24:24 whether it’s the shortest program

00:24:25 because there could be an even shorter program

00:24:27 which is just even slower and you just have to wait here.

00:24:32 But asymptotically and actually after a finite time,

00:24:35 you have the shortest program.

00:24:36 So this is a theoretical but completely impractical way

00:24:40 of finding the underlying structure in every data set

00:24:47 and that is what Solomonoff induction does

00:24:49 and Kolmogorov complexity.

00:24:50 In practice, of course, we have to approach the problem

00:24:52 more intelligently.
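
(A toy Python version of the dovetailing search just described. The mini-language below, a Brainfuck-like machine with only the four symbols + . [ ] acting on one byte cell, is invented purely for illustration; real Solomonoff induction and Kolmogorov complexity quantify over all programs of a universal machine, which is exactly what makes them incomputable.)

    from itertools import product

    def run(prog, max_steps):
        """Run a toy program for at most max_steps steps; return its output so far."""
        stack, match = [], {}
        for i, c in enumerate(prog):          # pre-match brackets; reject unbalanced programs
            if c == '[':
                stack.append(i)
            elif c == ']':
                if not stack:
                    return None
                j = stack.pop()
                match[i], match[j] = j, i
        if stack:
            return None
        cell, ip, out = 0, 0, []
        for _ in range(max_steps):
            if ip >= len(prog):
                break
            c = prog[ip]
            if c == '+':
                cell = (cell + 1) % 256       # increment the single memory cell
            elif c == '.':
                out.append(cell)              # emit the current cell value
            elif c == '[' and cell == 0:
                ip = match[ip]                # skip the loop body
            elif c == ']' and cell != 0:
                ip = match[ip]                # jump back to the loop start
            ip += 1
        return out

    def shortest_explaining_program(data, max_len=6, budgets=(10, 100, 1000)):
        """Dovetail over program length and step budget; a shortest match is found first."""
        for budget in budgets:                              # growing resources
            for length in range(1, max_len + 1):            # short programs first
                for prog in map(''.join, product('+.[]', repeat=length)):
                    out = run(prog, budget)
                    if out is not None and out[:len(data)] == data:
                        return prog
        return None

    print(shortest_explaining_program([1, 1, 1]))   # -> '+...'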

00:24:53 And then if you take resource limitations into account,

00:24:58 there’s, for instance, a field of pseudo random numbers

00:25:01 and these are deterministic sequences,

00:25:06 but no algorithm which is fast,

00:25:09 fast means runs in polynomial time,

00:25:10 can detect that it’s actually deterministic.

00:25:13 So we can produce interesting,

00:25:16 I mean, random numbers maybe not that interesting,

00:25:17 but just an example.

00:25:18 We can produce complex looking data

00:25:22 and we can then prove that no fast algorithm

00:25:25 can detect the underlying pattern.

00:25:27 Which is, unfortunately, that’s a big challenge

00:25:34 for our search for simple programs

00:25:35 in the space of artificial intelligence, perhaps.

00:25:38 Yes, it definitely is for artificial intelligence

00:25:40 and it’s quite surprising that it’s, I can’t say easy.

00:25:44 I mean, physicists worked really hard to find these theories,

00:25:48 but apparently it was possible for human minds

00:25:51 to find these simple rules in the universe.

00:25:54 It could have been different, right?

00:25:59 It could have been different.

00:26:00 It’s awe inspiring.

00:26:04 So let me ask another absurdly big question.

00:26:09 What is intelligence in your view?

00:26:13 So I have, of course, a definition.

00:26:17 I wasn’t sure what you’re going to say

00:26:18 because you could have just as easily said,

00:26:20 I have no clue.

00:26:21 Which many people would say,

00:26:23 but I’m not modest in this question.

00:26:26 So the informal version,

00:26:31 which I worked out together with Shane Legg,

00:26:33 who cofounded DeepMind,

00:26:35 is that intelligence measures an agent’s ability

00:26:38 to perform well in a wide range of environments.

00:26:42 So that doesn’t sound very impressive.

00:26:45 And these words have been very carefully chosen

00:26:49 and there is a mathematical theory behind that

00:26:52 and we come back to that later.

00:26:54 And if you look at this definition by itself,

00:26:59 it seems like, yeah, okay,

00:27:01 but it seems a lot of things are missing.

00:27:03 But if you think it through,

00:27:05 then you realize that most,

00:27:08 and I claim all of the other traits,

00:27:10 at least of rational intelligence,

00:27:12 which we usually associate with intelligence,

00:27:14 are emergent phenomena from this definition.

00:27:17 Like creativity, memorization, planning, knowledge.

00:27:22 You all need that in order to perform well

00:27:25 in a wide range of environments.

00:27:27 So you don’t have to explicitly mention

00:27:29 that in a definition.

00:27:29 Interesting.

00:27:30 So yeah, so the consciousness, abstract reasoning,

00:27:34 all these kinds of things are just emergent phenomena

00:27:36 that help you in towards,

00:27:40 can you say the definition again?

00:27:41 So multiple environments.

00:27:44 Did you mention the word goals?

00:27:45 No, but we have an alternative definition.

00:27:47 Instead of performing well,

00:27:48 you can just replace it by goals.

00:27:50 So intelligence measures an agent’s ability

00:27:53 to achieve goals in a wide range of environments.

00:27:55 That’s more or less equal.

00:27:56 But interesting,

00:27:57 because in there, there’s an injection of the word goals.

00:27:59 So we want to specify there should be a goal.

00:28:03 Yeah, but perform well is sort of,

00:28:04 what does it mean?

00:28:05 It’s the same problem.

00:28:06 Yeah.

00:28:07 There’s a little bit of a gray area,

00:28:09 but it’s much closer to something that could be formalized.
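
(The formalization being alluded to here is, roughly, the Legg-Hutter universal intelligence measure; the symbols below follow their paper rather than anything said on air: E is the class of computable reward-bearing environments, K(μ) the Kolmogorov complexity of environment μ, and V_μ^π the expected total reward that policy π obtains in μ.)

    \Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)}\, V_{\mu}^{\pi}

So “performing well in a wide range of environments” becomes a simplicity-weighted average of performance over all computable environments.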

00:28:14 In your view, are humans,

00:28:16 where do humans fit into that definition?

00:28:18 Are they general intelligence systems

00:28:21 that are able to perform in,

00:28:24 like how good are they at fulfilling that definition

00:28:27 at performing well in multiple environments?

00:28:31 Yeah, that’s a big question.

00:28:32 I mean, the humans are performing best among all species.

00:28:37 We know of, yeah.

00:28:40 Depends.

00:28:41 You could say that trees and plants are doing a better job.

00:28:44 They’ll probably outlast us.

00:28:46 Yeah, but they are in a much more narrow environment, right?

00:28:49 I mean, you just have a little bit of air pollution

00:28:51 and these trees die and we can adapt, right?

00:28:54 We build houses, we build filters,

00:28:55 we do geoengineering.

00:28:59 So the multiple environment part.

00:29:01 Yeah, that is very important, yeah.

00:29:02 So that distinguishes narrow intelligence

00:29:04 from wide intelligence, also in AI research.

00:29:08 So let me ask the Alan Turing question.

00:29:12 Can machines think?

00:29:14 Can machines be intelligent?

00:29:15 So in your view, I have to kind of ask,

00:29:19 the answer is probably yes,

00:29:20 but I want to kind of hear what your thoughts on it.

00:29:24 Can machines be made to fulfill this definition

00:29:27 of intelligence, to achieve intelligence?

00:29:30 Well, we are sort of getting there

00:29:33 and on a small scale, we are already there.

00:29:36 The wide range of environments are missing,

00:29:38 but we have self driving cars,

00:29:40 we have programs which play Go and chess,

00:29:42 we have speech recognition.

00:29:44 So that’s pretty amazing,

00:29:45 but these are narrow environments.

00:29:49 But if you look at AlphaZero,

00:29:51 that was also developed by DeepMind.

00:29:53 I mean, got famous with AlphaGo

00:29:55 and then came AlphaZero a year later.

00:29:57 That was truly amazing.

00:29:59 So reinforcement learning algorithm,

00:30:01 which is able just by self play,

00:30:04 to play chess and then also Go.

00:30:08 And I mean, yes, they’re both games,

00:30:10 but they’re quite different games.

00:30:11 And you don’t feed them the rules of the game.

00:30:15 And the most remarkable thing,

00:30:16 which is still a mystery to me,

00:30:18 that usually for any decent chess program,

00:30:21 I don’t know much about Go,

00:30:22 you need opening books and end game tables and so on too.

00:30:26 And nothing in there, nothing was put in there.

00:30:29 Especially with AlphaZero,

00:30:31 the self playing mechanism starting from scratch,

00:30:33 being able to learn actually new strategies is…

00:30:39 Yeah, it rediscovered all these famous openings

00:30:43 within four hours by itself.

00:30:46 What I was really happy about,

00:30:47 I’m a terrible chess player, but I like the Queen’s Gambit.

00:30:50 And AlphaZero figured out that this is the best opening.

00:30:53 Finally, somebody proved you correct.

00:30:59 So yes, to answer your question,

00:31:01 yes, I believe that general intelligence is possible.

00:31:05 And it also, I mean, it depends how you define it.

00:31:08 Do you say AGI, artificial general

00:31:11 intelligence,

00:31:13 only refers to when you achieve human level,

00:31:16 or is a subhuman level, but quite broad,

00:31:18 also general intelligence?

00:31:19 So we have to distinguish,

00:31:20 or is it only superhuman intelligence,

00:31:23 general artificial intelligence?

00:31:25 Is there a test in your mind,

00:31:26 like the Turing test for natural language

00:31:28 or some other test that would impress the heck out of you

00:31:32 that would kind of cross the line of your sense

00:31:36 of intelligence within the framework that you said?

00:31:39 Well, the Turing test has been criticized a lot,

00:31:42 but I think it’s not as bad as some people think.

00:31:45 And some people think it’s too strong.

00:31:47 So it tests not just for a system to be intelligent,

00:31:52 but it also has to fake being human, to deceive,

00:31:56 which is much harder.

00:31:58 And on the other hand, they say it’s too weak

00:32:01 because it just maybe fakes emotions

00:32:05 or intelligent behavior.

00:32:07 It’s not real.

00:32:09 But I don’t think that’s the problem or a big problem.

00:32:11 So if you would pass the Turing test,

00:32:15 so a conversation over a terminal with a bot for an hour,

00:32:20 or maybe a day or so,

00:32:21 and you can fool a human into not knowing

00:32:25 whether this is a human or not,

00:32:26 so that’s the Turing test,

00:32:27 I would be truly impressed.

00:32:30 And we have this annual competition, the Loebner Prize.

00:32:34 And I mean, it started with ELIZA,

00:32:35 that was the first conversational program.

00:32:38 And what is it called?

00:32:40 The Japanese Mitsuku, or so.

00:32:41 That’s the winner of the last couple of years.

00:32:44 And well.

00:32:45 Quite impressive.

00:32:46 Yeah, it’s quite impressive.

00:32:47 And then Google has developed Meena, right?

00:32:50 Just recently, that’s an open domain conversational bot,

00:32:55 just a couple of weeks ago, I think.

00:32:57 Yeah, I kind of like the metric

00:32:58 that sort of the Alexa Prize has proposed.

00:33:01 I mean, maybe it’s obvious to you.

00:33:02 It wasn’t to me of setting sort of a length

00:33:06 of a conversation.

00:33:07 Like you want the bot to be sufficiently interesting

00:33:10 that you would want to keep talking to it

00:33:12 for like 20 minutes.

00:33:13 And that’s a surprisingly effective metric in aggregate,

00:33:19 because really, like nobody has the patience

00:33:24 to be able to talk to a bot that’s not interesting

00:33:27 and intelligent and witty,

00:33:29 and is able to go on to different tangents, jump domains,

00:33:32 be able to say something interesting

00:33:35 to maintain your attention.

00:33:36 And maybe many humans will also fail this test.

00:33:39 That’s the, unfortunately, we set,

00:33:42 just like with autonomous vehicles, with chatbots,

00:33:45 we also set a bar that’s way too high to reach.

00:33:48 I said, you know, the Turing test is not as bad

00:33:50 as some people believe,

00:33:51 but what is really not useful about the Turing test,

00:33:55 it gives us no guidance

00:33:58 how to develop these systems in the first place.

00:34:00 Of course, you know, we can develop them by trial and error

00:34:02 and, you know, do whatever and then run the test

00:34:05 and see whether it works or not.

00:34:06 But a mathematical definition of intelligence

00:34:12 gives us, you know, an objective,

00:34:16 which we can then analyze by theoretical tools

00:34:19 or computational, and, you know,

00:34:22 maybe even prove how close we are.

00:34:25 And we will come back to that later with the AIXI model.

00:34:28 So, I mentioned the compression, right?

00:34:31 So in natural language processing,

00:34:33 they have achieved amazing results.

00:34:36 And one way to test this, of course,

00:34:38 you know, take the system, you train it,

00:34:40 and then you see how well it performs on the task.

00:34:43 But a lot of performance measurement

00:34:47 is done by so called perplexity,

00:34:49 which is essentially the same as complexity

00:34:51 or compression length.

00:34:53 So the NLP community develops new systems

00:34:55 and then they measure the compression length

00:34:57 and then they have ranking lists

00:35:01 because there’s a strong correlation

00:35:02 between compressing well

00:35:04 and the system performing well at the task at hand.

00:35:07 It’s not perfect, but it’s good enough

00:35:09 for them as an intermediate aim.
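
(The link between perplexity and compression length being appealed to here is the standard one: a model with conditional probabilities p, fed into an arithmetic coder, needs about -log2 p(x_t | x_<t) bits per token, so the compressed length of N tokens is roughly

    \sum_{t=1}^{N} -\log_2 p(x_t \mid x_{<t}) = N \log_2(\mathrm{perplexity}),

which is why lower perplexity and shorter compressed length rank models in the same order.)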

00:35:14 So you mean a measure,

00:35:16 so this is kind of almost returning

00:35:18 to the Kolmogorov complexity.

00:35:19 So you’re saying good compression

00:35:22 usually means good intelligence.

00:35:24 Yes.

00:35:27 So you mentioned you’re one of the only people

00:35:31 who dared boldly to try to formalize

00:35:36 the idea of artificial general intelligence,

00:35:38 to have a mathematical framework for intelligence,

00:35:42 just like as we mentioned,

00:35:45 termed AIXI, A, I, X, I.

00:35:49 So let me ask the basic question.

00:35:51 What is AIXI?

00:35:54 Okay, so let me first say what it stands for because…

00:35:57 What it stands for, actually,

00:35:58 that’s probably the more basic question.

00:36:00 What it…

00:36:01 The first question is usually how it’s pronounced,

00:36:04 but finally I put it on the website how it’s pronounced

00:36:07 and you figured it out.

00:36:10 The name comes from AI, artificial intelligence,

00:36:13 and the X, I, is the Greek letter Xi,

00:36:16 which is used for Solomonoff’s distribution

00:36:19 for quite stupid reasons,

00:36:22 which I’m not willing to repeat here in front of camera.

00:36:24 Sure.

00:36:27 So it just happened to be more or less arbitrary.

00:36:29 I chose the Xi.

00:36:31 But it also has nice other interpretations.

00:36:34 So there are actions and perceptions in this model.

00:36:38 An agent has actions and perceptions over time.

00:36:42 So this is A index I, X index I.

00:36:44 So there’s the action at time I

00:36:46 and then followed by perception at time I.

00:36:49 Yeah, we’ll go with that.

00:36:50 I’ll edit out the first part.

00:36:52 I’m just kidding.

00:36:53 I have some more interpretations.

00:36:55 So at some point, maybe five years ago or 10 years ago,

00:36:59 I discovered in Barcelona, it was on a big church

00:37:04 there was some text engraved in stone,

00:37:08 and the word aixi appeared there a couple of times.

00:37:11 I was very surprised and happy about that.

00:37:16 And I looked it up.

00:37:17 So it is the Catalan language

00:37:19 and it means, with some interpretation, that’s it,

00:37:22 that’s the right thing to do.

00:37:23 Yeah, eureka.

00:37:24 Oh, so it’s almost like destined somehow.

00:37:27 It came to you in a dream.

00:37:32 And similar, there’s a Chinese word, Aixi,

00:37:34 also written like Aixi, if you transcribe that to Pinyin.

00:37:37 And the final one is that it’s AI crossed with induction

00:37:41 because that is, and that’s going more to the content now.

00:37:44 So good old fashioned AI is more about planning

00:37:47 and known deterministic world

00:37:48 and induction is more about often IID data

00:37:51 and inferring models.

00:37:53 And essentially what this Aixi model does

00:37:54 is combining these two.

00:37:56 And I actually also recently, I think heard that

00:37:59 in Japanese AI means love.

00:38:02 So if you can combine XI somehow with that,

00:38:06 I think we can, there might be some interesting ideas there.

00:38:10 So Aixi, let’s then take the next step.

00:38:12 Can you maybe talk at the big level

00:38:16 of what is this mathematical framework?

00:38:19 Yeah, so it consists essentially of two parts.

00:38:22 One is the learning and induction and prediction part.

00:38:26 And the other one is the planning part.

00:38:28 So let’s come first to the learning,

00:38:31 induction, prediction part,

00:38:32 which essentially I explained already before.

00:38:35 So what we need for any agent to act well

00:38:40 is that it can somehow predict what happens.

00:38:43 I mean, if you have no idea what your actions do,

00:38:47 how can you decide which actions are good or not?

00:38:48 So you need to have some model of the effect of your actions.

00:38:52 So what you do is you have some experience,

00:38:56 you build models like scientists of your experience,

00:38:59 then you hope these models are roughly correct,

00:39:01 and then you use these models for prediction.

00:39:03 And the model is, sorry to interrupt,

00:39:05 and the model is based on your perception of the world,

00:39:08 how your actions will affect that world.

00:39:10 That’s not…

00:39:12 So how do you think about a model?

00:39:12 That’s not the important part,

00:39:14 but it is technically important,

00:39:16 but at this stage we can just think about predicting,

00:39:18 let’s say, stock market data, weather data,

00:39:20 or IQ sequences, one, two, three, four, five,

00:39:23 what comes next, yeah?

00:39:24 So of course our actions affect what we’re doing,

00:39:28 but I’ll come back to that in a second.

00:39:30 So, and I’ll keep just interrupting.

00:39:32 So just to draw a line between prediction and planning,

00:39:37 what do you mean by prediction in this way?

00:39:40 It’s trying to predict the environment

00:39:43 without your long term action in the environment?

00:39:47 What is prediction?

00:39:49 Okay, if you want to put the actions in now,

00:39:51 okay, then let’s put it in now, yeah?

00:39:53 So…

00:39:54 We don’t have to put them now.

00:39:55 Yeah, yeah.

00:39:56 Scratch it, scratch it, dumb question, okay.

00:39:58 So the simplest form of prediction is

00:40:01 that you just have data which you passively observe,

00:40:04 and you want to predict what happens

00:40:06 without interfering, as I said,

00:40:08 weather forecasting, stock market, IQ sequences,

00:40:12 or just anything, okay?

00:40:16 And Solomonoff’s theory of induction is based on compression,

00:40:18 so you look for the shortest program

00:40:20 which describes your data sequence,

00:40:22 and then you take this program, run it,

00:40:24 it reproduces your data sequence by definition,

00:40:26 and then you let it continue running,

00:40:29 and then it will produce some predictions,

00:40:30 and you can rigorously prove that for any prediction task,

00:40:37 this is essentially the best possible predictor.

00:40:40 Of course, if there’s a prediction task,

00:40:43 or a task which is unpredictable,

00:40:45 like, you know, you have fair coin flips.

00:40:46 Yeah, I cannot predict the next fair coin flip.

00:40:48 What Solomonoff does is say,

00:40:49 okay, the next head is probably 50%.

00:40:51 It’s the best you can do.

00:40:52 So if something is unpredictable,

00:40:54 Solomonoff will also not magically predict it.

00:40:56 But if there is some pattern and predictability,

00:40:59 then Solomonoff induction will figure that out eventually,

00:41:03 and not just eventually, but rather quickly,

00:41:06 and you can prove convergence rates,

00:41:10 whatever your data is.

00:41:11 So there’s pure magic in a sense.

00:41:14 What’s the catch?

00:41:15 Well, the catch is that it’s not computable,

00:41:17 and we come back to that later.

00:41:18 You cannot just implement it

00:41:19 even with Google resources here,

00:41:21 and run it and predict the stock market and become rich.

00:41:24 I mean, Ray Solomonoff already tried it at the time.

00:41:28 But so the basic task is you’re in the environment,

00:41:31 and you’re interacting with the environment

00:41:33 to try to learn to model that environment,

00:41:35 and the model is in the space of all these programs,

00:41:38 and your goal is to get a bunch of programs that are simple.

00:41:41 Yeah, so let’s go to the actions now.

00:41:44 But actually, good that you asked.

00:41:45 Usually I skip this part,

00:41:46 although there is also a minor contribution which I did,

00:41:48 so the action part,

00:41:49 but I usually sort of just jump to the decision part.

00:41:51 So let me explain the action part now.

00:41:53 Thanks for asking.

00:41:55 So you have to modify it a little bit

00:41:58 by now not just predicting a sequence

00:42:01 which just comes to you,

00:42:03 but you have an observation, then you act somehow,

00:42:06 and then you want to predict the next observation

00:42:09 based on the past observation and your action.

00:42:11 Then you take the next action.

00:42:14 You don’t care about predicting it because you’re doing it.

00:42:17 Then you get the next observation,

00:42:19 and you want, well, before you get it,

00:42:20 you want to predict it, again,

00:42:21 based on your past action and observation sequence.

00:42:24 You just condition extra on your actions.

00:42:28 There’s an interesting alternative

00:42:30 that you also try to predict your own actions.

00:42:35 If you want.

00:42:36 In the past or the future?

00:42:37 In your future actions.

00:42:39 That’s interesting.

00:42:40 Yeah. Wait, let me wrap.

00:42:43 I think my brain just broke.

00:42:45 We should maybe discuss that later

00:42:47 after I’ve explained the AIXI model.

00:42:48 That’s an interesting variation.

00:42:50 But that is a really interesting variation,

00:42:52 and a quick comment.

00:42:53 I don’t know if you want to insert that in here,

00:42:55 but you’re looking at the, in terms of observations,

00:42:59 you’re looking at the entire, the big history,

00:43:01 the long history of the observations.

00:43:03 Exactly. That’s very important.

00:43:04 The whole history from birth sort of of the agent,

00:43:07 and we can come back to that.

00:43:09 And also why this is important.

00:43:10 Often, you know, in RL, you have MDPs,

00:43:13 Markov decision processes, which are much more limiting.

00:43:15 Okay. So now we can predict conditioned on actions.

00:43:19 So even if you influence environment,

00:43:21 but prediction is not all we want to do, right?

00:43:24 We also want to act really in the world.

00:43:26 And the question is how to choose the actions.

00:43:29 And we don’t want to greedily choose the actions,

00:43:33 you know, just, you know, what is best in the next time step.

00:43:36 And we first, I should say, you know, what is, you know,

00:43:38 how do we measure performance?

00:43:39 So we measure performance by giving the agent reward.

00:43:43 That’s the so called reinforcement learning framework.

00:43:45 So every time step, you can give it a positive reward

00:43:48 or negative reward, or maybe no reward.

00:43:50 It could be very scarce, right?

00:43:51 Like if you play chess, just at the end of the game,

00:43:54 you give plus one for winning or minus one for losing.

00:43:56 So in the AIXI framework, that’s completely sufficient.

00:43:59 So occasionally you give a reward signal

00:44:01 and you ask the agent to maximize reward,

00:44:04 but not greedily sort of, you know, the next one, next one,

00:44:06 because that’s very bad in the long run if you’re greedy.

00:44:10 So, but over the lifetime of the agent.

00:44:12 So let’s assume the agent lives for M time steps,

00:44:14 or say dies in sort of a hundred years sharp.

00:44:16 That’s just, you know, the simplest model to explain.

00:44:19 So it looks at the future reward sum

00:44:22 and ask what is my action sequence,

00:44:24 or actually more precisely my policy,

00:44:26 which leads in expectation, because I don’t know the world,

00:44:32 to the maximum reward sum.
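
(In symbols, the objective just described, added as an editorial gloss: m is the lifetime, r_t the reward at time t, and the expectation is taken over the agent’s uncertainty about the environment.)

    V^{\pi} = \mathbb{E}^{\pi}\Big[\sum_{t=1}^{m} r_t\Big],
    \qquad
    \pi^{*} = \arg\max_{\pi} V^{\pi}

So the agent does not greedily maximize the next reward but the expected sum of rewards up to the horizon m.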

00:44:34 Let me give you an analogy.

00:44:36 In chess, for instance,

00:44:38 we know how to play optimally in theory.

00:44:40 It’s just a mini max strategy.

00:44:42 I play the move which seems best to me

00:44:44 under the assumption that the opponent plays the move

00:44:46 which is best for him.

00:44:48 So best for him, so worst for me, under the assumption that

00:44:52 I again play the best move.

00:44:54 And then you have this expectimax tree

00:44:55 to the end of the game, and then you back propagate,

00:44:58 and then you get the best possible move.

00:45:00 So that is the optimal strategy,

00:45:02 which von Neumann already figured out a long time ago,

00:45:06 for playing adversarial games.

00:45:09 Luckily, or maybe unluckily for the theory,

00:45:11 it becomes harder.

00:45:12 The world is not always adversarial.

00:45:14 So it can be, if there are other humans,

00:45:17 even cooperative, or nature is usually,

00:45:20 I mean, the dead nature is stochastic, you know,

00:45:22 things just happen randomly, or don’t care about you.

00:45:26 So what you have to take into account is the noise,

00:45:29 and not necessarily adversariality.

00:45:30 So you replace the minimum on the opponent’s side

00:45:34 by an expectation,

00:45:36 which is general enough to include also adversarial cases.

00:45:40 So now instead of a mini max strategy,

00:45:41 you have an expectimax strategy.

00:45:43 So far, so good.

00:45:44 So that is well known.

00:45:45 It’s called sequential decision theory.
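
(A minimal Python sketch of the expectimax recursion just described; the toy environment model below is invented purely for illustration, and in the AIXI setting the true model is replaced by the universal mixture discussed next.)

    # Expectimax: maximize over own actions, take the expectation over the
    # environment's stochastic responses, and recurse out to a fixed horizon.
    def expectimax(history, depth, actions, model):
        if depth == 0:
            return 0.0
        best = float('-inf')
        for a in actions:
            value = 0.0
            for (obs, reward), prob in model(history, a).items():
                value += prob * (reward +
                                 expectimax(history + [(a, obs, reward)],
                                            depth - 1, actions, model))
            best = max(best, value)
        return best

    def toy_model(history, action):
        """Hypothetical environment: action 'a' pays 1 with prob 0.6, 'b' with prob 0.5."""
        p = 0.6 if action == 'a' else 0.5
        return {('win', 1.0): p, ('lose', 0.0): 1.0 - p}

    print(expectimax([], 3, ['a', 'b'], toy_model))   # about 1.8: always picking 'a'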

00:45:48 But the question is,

00:45:49 on which probability distribution do you base that?

00:45:52 If I have the true probability distribution,

00:45:55 like say I play backgammon, right?

00:45:56 There’s dice, and there’s certain randomness involved.

00:45:59 Yeah, I can calculate probabilities

00:46:00 and feed it in the expectimax,

00:46:02 or the sequential decision tree,

00:46:04 come up with the optimal decision if I have enough compute.

00:46:07 But for the real world, we don’t know that, you know,

00:46:09 what is the probability the driver in front of me brakes?

00:46:13 I don’t know.

00:46:14 So depends on all kinds of things,

00:46:16 and especially new situations, I don’t know.

00:46:19 So this is this unknown thing about prediction,

00:46:22 and there’s where Solomonov comes in.

00:46:24 So what you do is in sequential decision tree,

00:46:26 you just replace the true distribution,

00:46:28 which we don’t know, by this universal distribution.

00:46:32 I didn’t explicitly talk about it,

00:46:34 but this is used for universal prediction

00:46:36 and plug it into the sequential decision tree mechanism.

00:46:40 And then you get the best of both worlds.

00:46:42 You have a long term planning agent,

00:46:45 but it doesn’t need to know anything about the world

00:46:48 because the Solomonov induction part learns.

00:46:51 Can you explicitly try to describe

00:46:54 the universal distribution

00:46:56 and how Solomonov induction plays a role here?

00:46:59 I’m trying to understand.

00:47:00 So what it does it, so in the simplest case,

00:47:03 I said, take the shortest program, describing your data,

00:47:06 run it, have a prediction which would be deterministic.

00:47:09 Yes. Okay.

00:47:10 But you should not just take the shortest program,

00:47:13 but also consider the longer ones,

00:47:15 but give them lower a priori probability.

00:47:18 So in the Bayesian framework, you say a priori,

00:47:22 any distribution, which is a model or a stochastic program,

00:47:29 has a certain a priori probability,

00:47:30 which is two to the minus the length of this program.

00:47:33 And why two to the minus the length, you know, I could explain.

00:47:35 So longer programs are punished a priori.

00:47:39 And then you multiply it

00:47:41 with the so called likelihood function,

00:47:43 which is, as the name suggests,

00:47:46 is how likely is this model given the data at hand.

00:47:51 So if you have a very wrong model,

00:47:53 it’s very unlikely that this model is true.

00:47:55 And so it is very small number.

00:47:56 So even if the model is simple, it gets penalized by that.

00:48:00 And what you do is then you take just the sum,

00:48:02 or this is the average over it.

00:48:04 And this gives you a probability distribution.

00:48:07 So it’s universal distribution or Solomonov distribution.

00:48:10 So it’s weighed by the simplicity of the program

00:48:13 and the likelihood.

00:48:14 Yes.

00:48:15 It’s kind of a nice idea.

00:48:17 Yeah.
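
(Putting the two halves together gives, roughly, the one-line AIXI equation from Hutter’s own writings, reproduced here from memory as a gloss: expectimax planning up to the horizon m, with the unknown environment replaced by the length-weighted mixture over all programs q of a universal machine U that reproduce the interaction history.)

    a_t = \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
          \big(r_t + \cdots + r_m\big)
          \sum_{q \,:\, U(q, a_{1:m}) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}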

00:48:18 So okay, and then you said you’re playing N or M,

00:48:23 or I forgot the letter, steps into the future.

00:48:25 So how difficult is that problem?

00:48:28 What’s involved there?

00:48:29 Okay, so basic optimization problem.

00:48:31 What are we talking about?

00:48:32 Yeah, so you have a planning problem up to horizon M,

00:48:34 and that’s exponential time in the horizon M,

00:48:38 which is, I mean, it’s computable, but intractable.

00:48:41 I mean, even for chess, it’s already intractable

00:48:43 to do that exactly.

00:48:44 And you know, for Go.

00:48:45 But it could be also a discounted kind of framework, where…

00:48:48 Yeah, so having a hard horizon, you know, at 100 years,

00:48:52 it’s just for simplicity of discussing the model

00:48:55 and also sometimes the math is simple.

00:48:58 But there are lots of variations,

00:49:00 and it’s actually a quite interesting parameter.

00:49:03 There’s nothing really problematic about it,

00:49:07 but it’s very interesting.

00:49:08 So for instance, you think, no,

00:49:09 let’s let the parameter M tend to infinity, right?

00:49:12 You want an agent which lives forever, right?

00:49:15 If you do it normally, you have two problems.

00:49:17 First, the mathematics breaks down

00:49:19 because you have an infinite reward sum,

00:49:21 which may give infinity,

00:49:22 and getting reward 0.1 every time step gives infinity,

00:49:25 and getting reward one every time step gives infinity,

00:49:27 so they’re equally good.

00:49:29 Not really what we want.

00:49:31 Other problem is that if you have an infinite life,

00:49:35 you can be lazy for as long as you want for 10 years

00:49:38 and then catch up with the same expected reward.

00:49:41 And think about yourself or maybe some friends or so.

00:49:47 If they knew they lived forever, why work hard now?

00:49:51 Just enjoy your life and then catch up later.

00:49:54 So that’s another problem with infinite horizon.

00:49:56 And you mentioned, yes, we can go to discounting,

00:49:59 but then the standard discounting

00:50:01 is so called geometric discounting.

00:50:03 So a dollar today is worth about

00:50:05 as much as $1.05 tomorrow.

00:50:08 So if you do the so called geometric discounting,

00:50:10 you have introduced an effective horizon.

00:50:12 So the agent is now motivated to look ahead

00:50:15 a certain amount of time effectively.

00:50:18 It’s like a moving horizon.

00:50:20 And for any fixed effective horizon,

00:50:23 there is a problem to solve,

00:50:26 which requires a larger horizon.

00:50:28 So if I look ahead five time steps,

00:50:30 I’m a terrible chess player, right?

00:50:32 I’ll need to look ahead longer.

00:50:34 If I play go, I probably have to look ahead even longer.

00:50:36 So for every problem, for every horizon,

00:50:40 there is a problem which this horizon cannot solve.

00:50:43 But I introduced the so called near harmonic horizon,

00:50:46 which goes down with one over T

00:50:48 rather than exponential in T,

00:50:49 which produces an agent,

00:50:51 which effectively looks into the future

00:50:53 proportional to its age.

00:50:55 So if it’s five years old, it plans for five years.

00:50:57 If it’s 100 years old, it then plans for 100 years.

00:51:00 And it’s a little bit similar to humans too, right?

00:51:02 I mean, children don’t plan ahead very long,

00:51:04 but when we get adult, we plan ahead longer.

00:51:07 Maybe when we get very old,

00:51:08 I mean, we know that we don’t live forever.

00:51:10 Maybe then our horizon shrinks again.
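
The effect of the two discounting schemes mentioned above can be illustrated numerically. The sketch below assumes a geometric discount of 0.95 and a summable power-law discount whose weights fall off from the agent’s current age; the 80% coverage rule used to measure the “effective horizon” is just one convenient choice, not anything from the conversation.

    # Geometric discounting gives a fixed effective horizon; a harmonic-style
    # discount (weights falling like 1/k^2 from the current age) gives a horizon
    # that grows roughly in proportion to the agent's age.

    def effective_horizon(weights, fraction=0.8):
        """Steps needed to cover `fraction` of the total discounted weight."""
        total = sum(weights)
        acc = 0.0
        for k, w in enumerate(weights, start=1):
            acc += w
            if acc >= fraction * total:
                return k
        return len(weights)

    GAMMA = 0.95
    N = 100_000  # truncation of the (in principle infinite) sums

    for age in (5, 50, 500):
        geometric = [GAMMA ** k for k in range(N)]
        harmonic = [1.0 / (age + k) ** 2 for k in range(N)]
        print(age, effective_horizon(geometric), effective_horizon(harmonic))
    # The geometric horizon stays the same at every age; the harmonic one scales with age.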

00:51:12 So that’s really interesting.

00:51:16 So adjusting the horizon,

00:51:18 is there some mathematical benefit of that?

00:51:20 Or is it just a nice,

00:51:22 I mean, intuitively, empirically,

00:51:25 it would probably be a good idea

00:51:26 to sort of push the horizon back,

00:51:27 extend the horizon as you experience more of the world.

00:51:33 But is there some mathematical conclusions here

00:51:35 that are beneficial?

00:51:37 With Solomonoff induction, or the prediction part,

00:51:38 we have extremely strong finite time,

00:51:42 or rather, finite data results.

00:51:44 So you have so and so much data,

00:51:46 then you lose so and so much.

00:51:47 So it’s a, the theory is really great.

00:51:49 With the AIXI model, with the planning part,

00:51:51 many results are only asymptotic, which, well, this is…

00:51:56 What does asymptotic mean?

00:51:57 Asymptotic means you can prove, for instance,

00:51:59 that in the long run, if the agent, you know,

00:52:02 acts long enough, then, you know,

00:52:04 it performs optimal or some nice thing happens.

00:52:06 So, but you don’t know how fast it converges.

00:52:09 So it may converge fast,

00:52:10 but we’re just not able to prove it

00:52:12 because it’s a difficult problem.

00:52:13 Or maybe there’s a bug in the model

00:52:17 so that it’s really that slow.

00:52:19 So that is what asymptotic means,

00:52:21 sort of eventually, but we don’t know how fast.

00:52:24 And if I give the agent a fixed horizon M,

00:52:28 then I cannot prove asymptotic results, right?

00:52:32 So I mean, sort of if it dies in a hundred years,

00:52:35 then in a hundred years it’s over, I cannot say eventually.

00:52:37 So this is the advantage of the discounting

00:52:40 that I can prove asymptotic results.

00:52:42 So just to clarify, so, okay, I’ve built up a model,

00:52:46 and we’re now in the moment where

00:52:51 I have this way of looking several steps ahead.

00:52:55 How do I pick what action I will take?

00:52:58 It’s like with the playing chess, right?

00:53:00 You do this minimax.

00:53:02 In this case here, do expectimax based on the Solomonoff

00:53:05 distribution, you propagate back,

00:53:09 and then, voila, an action falls out,

00:53:12 the action which maximizes the future expected reward

00:53:15 on the Solomonoff distribution,

00:53:16 and then you just take this action.

00:53:18 And then repeat.

00:53:19 And then you get a new observation,

00:53:20 and you feed in this action and observation,

00:53:22 then you repeat.

00:53:23 And the reward, so on.

00:53:24 Yeah, the reward too, yeah.

00:53:26 And then maybe you can even predict your own action.

00:53:29 I love that idea.
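
A minimal sketch of the expectimax recursion described here, in Python: maximize over actions, take an expectation over the next percepts under the environment model, propagate the value back, and take the maximizing first action. The action set, percept set, and `mixture_prob` placeholder are invented for illustration; a real agent would plug the learned Solomonoff-style predictive distribution into `mixture_prob`.

    # Expectimax up to a fixed horizon: alternate a max over actions with an
    # expectation over percepts weighted by the environment model's probabilities.

    ACTIONS = ("left", "right")
    PERCEPTS = ((0, 0.0), (1, 1.0))  # (observation, reward) pairs

    def mixture_prob(history, action, percept):
        """Placeholder for the environment model's probability of `percept`
        after `history` and `action`; here simply uniform."""
        return 1.0 / len(PERCEPTS)

    def value(history, steps_left):
        """Expectimax value of `history` with `steps_left` actions remaining."""
        if steps_left == 0:
            return 0.0
        best = float("-inf")
        for a in ACTIONS:
            expected = 0.0
            for percept in PERCEPTS:
                obs, reward = percept
                p = mixture_prob(history, a, percept)
                expected += p * (reward + value(history + [(a, percept)], steps_left - 1))
            best = max(best, expected)
        return best

    def best_action(history, steps_left):
        """Pick the first action of an expectimax-optimal plan."""
        def q(a):
            return sum(
                mixture_prob(history, a, percept)
                * (percept[1] + value(history + [(a, percept)], steps_left - 1))
                for percept in PERCEPTS
            )
        return max(ACTIONS, key=q)

    print(best_action([], steps_left=3))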

00:53:29 But okay, this big framework,

00:53:33 what is it, I mean,

00:53:36 it’s kind of a beautiful mathematical framework

00:53:38 to think about artificial general intelligence.

00:53:41 What can you, what does it help you intuit

00:53:45 about how to build such systems?

00:53:49 Or maybe from another perspective,

00:53:51 what does it help us in understanding AGI?

00:53:56 So when I started in the field,

00:54:00 I was always interested in two things.

00:54:01 One was AGI, the name didn’t exist then,

00:54:05 what’s called general AI or strong AI,

00:54:09 and the physics theory of everything.

00:54:10 So I switched back and forth between computer science

00:54:13 and physics quite often.

00:54:14 You said the theory of everything.

00:54:15 The theory of everything, yeah.

00:54:17 Those are basically the two biggest problems

00:54:19 before all of humanity.

00:54:21 Yeah, I can explain, if you want, at some later time,

00:54:28 why I’m interested in these two questions.

00:54:29 Can I ask you in a small tangent,

00:54:32 if only one could be solved,

00:54:37 which one would you,

00:54:38 if an apple fell on your head

00:54:41 and there was a brilliant insight

00:54:43 and you could arrive at the solution to one,

00:54:46 would it be AGI or the theory of everything?

00:54:49 Definitely AGI, because once the AGI problem is solved,

00:54:51 I can ask the AGI to solve the other problem for me.

00:54:56 Yeah, brilliant input.

00:54:57 Okay, so as you were saying about it.

00:55:01 Okay, so, and the reason why I didn’t settle,

00:55:04 I mean, this thought about,

00:55:07 once you have solved AGI, it solves all kinds of other,

00:55:09 not just the theory of everything problem,

00:55:11 but all kinds of more useful problems to humanity

00:55:14 is very appealing to many people.

00:55:16 And I had this thought also,

00:55:18 but I was quite disappointed with the state of the art

00:55:23 of the field of AI.

00:55:25 There was some theory about logical reasoning,

00:55:28 but I was never convinced that this would fly.

00:55:30 And then there were these more heuristic approaches

00:55:33 with neural networks and I didn’t like these heuristics.

00:55:37 So, and also I didn’t have any good idea myself.

00:55:42 So that’s the reason why I toggled back and forth

00:55:44 quite some while and even worked four and a half years

00:55:46 in a company developing software,

00:55:48 something completely unrelated.

00:55:49 But then I had this idea about the AIXI model.

00:55:52 And so what it gives you, it gives you a gold standard.

00:55:57 So I have proven that this is the most intelligent agent

00:56:02 which anybody could build, in quotation marks,

00:56:06 because it’s just mathematical

00:56:08 and you need infinite compute.

00:56:11 But this is the limit and this is completely specified.

00:56:14 It’s not just a framework and every year,

00:56:19 tens of frameworks are developed,

00:56:21 which are just skeletons and then pieces are missing.

00:56:23 And usually these missing pieces,

00:56:25 turn out to be really, really difficult.

00:56:27 And so this is completely and uniquely defined

00:56:31 and we can analyze that mathematically.

00:56:33 And we’ve also developed some approximations.

00:56:37 I can talk about that a little bit later.

00:56:40 That would be sort of the top down approach,

00:56:41 like, say, von Neumann’s minimax theory,

00:56:44 that’s the theoretical optimal play of games.

00:56:47 And now we need to approximate it,

00:56:48 put heuristics in, prune the tree, blah, blah, blah,

00:56:51 and so on.

00:56:51 So we can do that also with the AIXI model,

00:56:53 but for general AI.

00:56:55 It can also inspire those,

00:56:57 and most researchers go bottom up, right?

00:57:00 They have the systems,

00:57:01 they try to make it more general, more intelligent.

00:57:04 It can inspire in which direction to go.

00:57:08 What do you mean by that?

00:57:09 So if you have some choice to make, right?

00:57:11 So how should I evaluate my system

00:57:13 if I can’t do cross validation?

00:57:15 How should I do my learning

00:57:18 if my standard regularization doesn’t work well?

00:57:21 So the answer is always this,

00:57:22 we have a system which does everything, that’s AIXI.

00:57:25 It’s just completely in the ivory tower,

00:57:27 completely useless from a practical point of view.

00:57:30 But you can look at it and see,

00:57:31 ah, yeah, maybe I can take some aspects.

00:57:34 And instead of Kolmogorov complexity,

00:57:36 you just take some compressors

00:57:38 which have been developed so far.

00:57:39 And for the planning, well, we have UCT,

00:57:42 which has also been used in Go.

00:57:45 And at least it’s inspired me a lot

00:57:50 to have this formal definition.

00:57:54 And if you look at other fields,

00:57:55 like I always come back to physics

00:57:57 because I have a physics background,

00:57:58 think about the phenomenon of energy.

00:58:00 That was for a long time a mysterious concept.

00:58:03 And at some point it was completely formalized.

00:58:05 And that really helped a lot.

00:58:08 And you can point out a lot of these things

00:58:10 which were first mysterious and vague,

00:58:12 and then they have been rigorously formalized.

00:58:15 Speed and acceleration have been confused, right?

00:58:18 Until it was formally defined,

00:58:19 yeah, there was a time like this.

00:58:21 And people who don’t have any background

00:58:25 often still confuse it.

00:58:28 And this AIXI model, or the intelligence definition,

00:58:31 which is sort of the dual to it,

00:58:33 we come back to that later,

00:58:34 formalizes the notion of intelligence

00:58:37 uniquely and rigorously.

00:58:38 So in a sense, it serves as kind of the light

00:58:41 at the end of the tunnel.

00:58:43 So for, I mean, there’s a million questions

00:58:46 I could ask here.

00:58:47 So maybe kind of, okay,

00:58:50 let’s feel around in the dark a little bit.

00:58:52 So there’s been, here at DeepMind,

00:58:54 but in general, been a lot of breakthrough ideas,

00:58:56 just like we’ve been saying around reinforcement learning.

00:58:59 So how do you see the progress

00:59:02 in reinforcement learning as different?

00:59:04 Like which subset of AIXI does it occupy?

00:59:08 The current, like you said,

00:59:10 maybe the Markov assumption is made quite often

00:59:14 in reinforcement learning.

00:59:16 There’s other assumptions made

00:59:20 in order to make the system work.

00:59:21 What do you see as the difference or connection

00:59:24 between reinforcement learning and AIXI?

00:59:26 And so the major difference is that

00:59:30 essentially all other approaches,

00:59:33 they make stronger assumptions.

00:59:35 So in reinforcement learning, the Markov assumption

00:59:38 is that the next state or next observation

00:59:41 only depends on the previous observation

00:59:43 and not the whole history,

00:59:45 which makes, of course, the mathematics much easier

00:59:47 rather than dealing with histories.
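
Written out, with notation added here for clarity rather than taken from the conversation, the contrast is between conditioning on the current state and conditioning on the whole interaction history:

    P(o_{t+1} \mid s_t, a_t) \qquad \text{(Markov / MDP assumption)}

    P(o_{t+1} \mid o_1 a_1 \, o_2 a_2 \cdots o_t a_t) \qquad \text{(full-history setting)}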

00:59:49 Of course, they profit from it also,

00:59:51 because then you have algorithms

00:59:53 that run on current computers

00:59:54 and do something practically useful.

00:59:56 But for general AI, all the assumptions

00:59:59 which are made by other approaches,

01:00:01 we know already now they are limiting.

01:00:04 So, for instance, usually you need

01:00:07 an ergodicity assumption in the MDP frameworks

01:00:09 in order to learn.

01:00:10 Ergodicity essentially means that you can recover

01:00:13 from your mistakes and that there are no traps

01:00:15 in the environment.

01:00:17 And if you make this assumption,

01:00:19 then essentially you can go back to a previous state,

01:00:22 go there a couple of times and then learn

01:00:24 the statistics and what the state is like,

01:00:29 and then in the long run perform well in this state.

01:00:32 But there are no fundamental problems.

01:00:35 But in real life, we know there can be one single action.

01:00:38 One second of being inattentive while driving a car fast

01:00:43 can ruin the rest of my life.

01:00:45 I can become quadriplegic or whatever.

01:00:47 So, and there’s no recovery anymore.

01:00:49 So, the real world is not ergodic, I always say.

01:00:52 There are traps and there are situations

01:00:53 which you cannot recover from.

01:00:55 And very little theory has been developed for this case.
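
A tiny illustration of the point about traps, with made-up states and actions: in the first transition system every state can be revisited, so mistakes are recoverable; adding an absorbing trap state breaks that, and one bad transition becomes permanent.

    # In the ergodic system every state can be revisited; the second system has
    # an absorbing trap, so a single bad transition is unrecoverable. The states
    # and actions are invented purely for illustration.

    ergodic = {
        "home": {"stay": "home", "go": "road"},
        "road": {"stay": "road", "go": "home"},
    }

    with_trap = {
        "home": {"stay": "home", "go": "road"},
        "road": {"stay": "road", "go": "home", "crash": "trap"},
        "trap": {"stay": "trap", "go": "trap", "crash": "trap"},  # absorbing: no way back
    }

    def reachable(transitions, start):
        """All states reachable from `start` under some action sequence."""
        seen, frontier = {start}, [start]
        while frontier:
            s = frontier.pop()
            for nxt in transitions[s].values():
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
        return seen

    print(reachable(ergodic, "road"))    # can always get back home
    print(reachable(with_trap, "trap"))  # from the trap, only the trap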

01:01:00 What about, what do you see in the context of AIXI

01:01:05 as the role of exploration?

01:01:07 Sort of, you mentioned in the real world

01:01:13 you can get into trouble when we make the wrong decisions

01:01:16 and really pay for it.

01:01:17 But exploration seems to be fundamentally important

01:01:20 for learning about this world, for gaining new knowledge.

01:01:23 So, is exploration baked in?

01:01:27 Another way to ask it,

01:01:29 what are the parameters of AIXI

01:01:34 that can be controlled?

01:01:36 Yeah, I say the good thing is that there are no parameters

01:01:38 to control.

01:01:40 Some other approaches have knobs to control.

01:01:43 And you can do that.

01:01:44 I mean, you can modify AIXI so that you have some knobs

01:01:46 to play with if you want to.

01:01:48 But the exploration is directly baked in.

01:01:53 And that comes from the Bayesian learning

01:01:56 and the longterm planning.

01:01:58 So these together already imply exploration.

01:02:04 You can nicely and explicitly prove that

01:02:08 for simple problems like so called bandit problems,

01:02:13 where you say, to give a real world example,

01:02:18 say you have two medical treatments, A and B,

01:02:20 you don’t know the effectiveness,

01:02:21 you try A a little bit, B a little bit,

01:02:23 but you don’t want to harm too many patients.

01:02:25 So you have to sort of trade off exploring.

01:02:29 And at some point you want to explore

01:02:31 and you can do the mathematics

01:02:34 and figure out the optimal strategy.

01:02:38 There are Bayesian agents,

01:02:39 there are also non Bayesian agents,

01:02:41 but it shows that this Bayesian framework

01:02:44 by taking a prior or possible worlds,

01:02:47 doing the Bayesian mixture,

01:02:48 then the Bayes optimal decision with longterm planning

01:02:50 that is important,

01:02:52 automatically implies exploration,

01:02:55 also to the proper extent,

01:02:57 not too much exploration and not too little.

01:02:59 These are very simple settings.

01:03:01 For the AIXI model, I was also able to prove

01:03:04 self optimizing theorems

01:03:06 or asymptotic optimality theorems,

01:03:07 although they’re only asymptotic, not finite time bounds.
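
The two-treatment example can be written down as a small finite-horizon Bayesian bandit and solved exactly; the sketch below assumes uniform Beta(1,1) priors and a horizon chosen just for illustration. Note that no exploration bonus appears anywhere: any tendency to try the lesser-known treatment comes purely out of the Bayes-optimal long-term planning.

    # Two treatments A and B with unknown success probabilities: a finite-horizon
    # Bayesian bandit with uniform Beta(1,1) priors, solved exactly by backward
    # induction over posterior counts.
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def value(sa, fa, sb, fb, steps_left):
        """Expected future successes when acting Bayes-optimally, given the
        (success, failure) counts observed so far for treatments A and B."""
        if steps_left == 0:
            return 0.0
        pa = (sa + 1) / (sa + fa + 2)  # posterior mean for A under Beta(1,1)
        pb = (sb + 1) / (sb + fb + 2)  # posterior mean for B
        qa = pa * (1 + value(sa + 1, fa, sb, fb, steps_left - 1)) \
            + (1 - pa) * value(sa, fa + 1, sb, fb, steps_left - 1)
        qb = pb * (1 + value(sa, fa, sb + 1, fb, steps_left - 1)) \
            + (1 - pb) * value(sa, fa, sb, fb + 1, steps_left - 1)
        return max(qa, qb)

    def q_values(sa, fa, sb, fb, steps_left):
        """Value of trying A first versus trying B first."""
        pa = (sa + 1) / (sa + fa + 2)
        pb = (sb + 1) / (sb + fb + 2)
        qa = pa * (1 + value(sa + 1, fa, sb, fb, steps_left - 1)) \
            + (1 - pa) * value(sa, fa + 1, sb, fb, steps_left - 1)
        qb = pb * (1 + value(sa, fa, sb + 1, fb, steps_left - 1)) \
            + (1 - pb) * value(sa, fa, sb, fb + 1, steps_left - 1)
        return qa, qb

    # A has been tried a lot (30 successes, 20 failures); B is untried. With many
    # patients still to treat, the value of information tends to favour trying B,
    # purely from the planning, with no hand-tuned exploration rate.
    qa, qb = q_values(sa=30, fa=20, sb=0, fb=0, steps_left=30)
    print("Q(A) =", round(qa, 3), " Q(B) =", round(qb, 3))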

01:03:10 So it seems like the planning is really important,

01:03:13 but the longterm part of the planning is really important.

01:03:15 And also, I mean, maybe a quick tangent,

01:03:18 how important do you think is removing

01:03:21 the Markov assumption and looking at the full history?

01:03:25 Sort of intuitively, of course, it’s important,

01:03:28 but is it like fundamentally transformative

01:03:30 to the entirety of the problem?

01:03:33 What’s your sense of it?

01:03:34 Like, cause we all, we make that assumption quite often.

01:03:37 It’s just throwing away the past.

01:03:40 No, I think it’s absolutely crucial.

01:03:42 The question is whether there’s a way to deal with it

01:03:47 in a more heuristic but still sufficiently good way.

01:03:52 So I have to come up with an example on the fly,

01:03:55 but you have some key event in your life,

01:03:59 long time ago in some city or something,

01:04:02 you realized that’s a really dangerous street or whatever.

01:04:05 And you want to remember that forever,

01:04:08 in case you come back there.

01:04:09 Kind of a selective kind of memory.

01:04:11 So you remember all the important events in the past,

01:04:15 but somehow selecting the important ones is…

01:04:17 That’s very hard.

01:04:18 And I’m not concerned about just storing the whole history.

01:04:21 Just, you can calculate, human life, say 30 or 100 years,

01:04:26 doesn’t matter, right?

01:04:28 How much data comes in through the vision system

01:04:31 and the auditory system, you compress it a little bit,

01:04:35 in this case, lossily and store it.

01:04:37 We will soon have the means of just storing it.

01:04:40 But you still need the selection for the planning part

01:04:44 and the compression for the understanding part.

01:04:47 The raw storage I’m really not concerned about.

01:04:50 And I think we should just store,

01:04:52 if you develop an agent,

01:04:54 preferably just store all the interaction history.

01:04:59 And then you build of course models on top of it

01:05:02 and you compress it and you are selective,

01:05:04 but occasionally you go back to the old data

01:05:08 and reanalyze it based on your new experience you have.

01:05:12 Sometimes you are in school,

01:05:13 you learn all these things you think is totally useless

01:05:16 and much later you realize,

01:05:18 oh, they were not so useless as you thought.

01:05:21 I’m looking at you, linear algebra.

01:05:24 Right.

01:05:25 So maybe let me ask about objective functions

01:05:27 because that rewards, it seems to be an important part.

01:05:33 The rewards are kind of given to the system.

01:05:38 For a lot of people,

01:05:39 the specification of the objective function

01:05:46 is a key part of intelligence.

01:05:48 The agent itself figuring out what is important.

01:05:52 What do you think about that?

01:05:54 Is it possible within the AIXI framework

01:05:58 to yourself discover the reward

01:06:01 based on which you should operate?

01:06:05 Okay, that will be a long answer.

01:06:07 So, and that is a very interesting question.

01:06:10 And I’m asked a lot about this question,

01:06:13 where do the rewards come from?

01:06:15 And that depends.

01:06:17 So, and then I give you now a couple of answers.

01:06:21 So if you want to build agents, now let’s start simple.

01:06:26 So let’s assume we want to build an agent

01:06:28 based on the AIXI model, which performs a particular task.

01:06:33 Let’s start with something super simple,

01:06:34 like, I mean, super simple, like playing chess,

01:06:37 or go or something, yeah.

01:06:38 Then you just, the reward is winning the game is plus one,

01:06:42 losing the game is minus one, done.

01:06:45 You apply this agent.

01:06:46 If you have enough compute, you let it self play

01:06:49 and it will learn the rules of the game,

01:06:50 will play perfect chess after some while, problem solved.

01:06:54 Okay, so if you have more complicated problems,

01:06:59 then you may believe that you have the right reward,

01:07:03 but it’s not.

01:07:04 So a nice, cute example is the elevator control

01:07:08 that is also in Rich Sutton’s book,

01:07:10 which is a great book, by the way.

01:07:13 So you control the elevator and you think,

01:07:15 well, maybe the reward should be coupled

01:07:17 to how long people wait in front of the elevator.

01:07:20 Long wait is bad.

01:07:21 You program it and you do it.

01:07:23 And what happens is the elevator eagerly picks up

01:07:25 all the people, but never drops them off.

01:07:28 So then you realize, oh, maybe the time in the elevator

01:07:33 also counts, so you minimize the sum, yeah?

01:07:36 And the elevator does that, but never picks up the people

01:07:39 in the 10th floor and the top floor

01:07:40 because in expectation, it’s not worth it.

01:07:42 Just let them stay.

01:07:43 Yeah.

01:07:44 Yeah.

01:07:44 Yeah.

01:07:45 So even in apparently simple problems,

01:07:49 you can make mistakes, yeah?

01:07:51 And that’s what in more serious contexts

01:07:55 AGI safety researchers consider.
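
The two reward choices from the elevator story can be written out as functions of a made-up state with per-person waiting and riding times; the point is only that each seemingly obvious choice admits the exploit described above.

    # Each "obvious" reward admits an exploit: the first is maximized by picking
    # everyone up and never dropping them off, the second by never serving
    # rarely used floors. The state format is invented for illustration.

    def reward_v1(state):
        # Penalize only time spent waiting in front of the elevator.
        return -sum(person["waiting_time"] for person in state["people"])

    def reward_v2(state):
        # Penalize waiting time plus time spent inside the elevator.
        return -sum(person["waiting_time"] + person["riding_time"]
                    for person in state["people"])

    example_state = {"people": [{"waiting_time": 40.0, "riding_time": 25.0},
                                {"waiting_time": 5.0, "riding_time": 60.0}]}
    print(reward_v1(example_state), reward_v2(example_state))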

01:07:58 So now let’s go back to general agents.

01:08:00 So assume you want to build an agent,

01:08:02 which is generally useful to humans, yeah?

01:08:05 So you have a household robot, yeah?

01:08:07 And it should do all kinds of tasks.

01:08:09 So in this case, the human should give the reward

01:08:13 on the fly.

01:08:14 I mean, maybe it’s pre trained in the factory

01:08:16 and that there’s some sort of internal reward

01:08:18 for the battery level or whatever, yeah?

01:08:19 But so it does the dishes badly, you punish the robot,

01:08:24 it does it good, you reward the robot

01:08:25 and then train it to a new task, yeah, like a child, right?

01:08:28 So you need the human in the loop.

01:08:31 If you want a system, which is useful to the human.

01:08:34 And as long as these agents stay subhuman level,

01:08:39 that should work reasonably well,

01:08:41 apart from these examples.

01:08:43 It becomes critical if they get to a human level.

01:08:45 It’s like with children, small children,

01:08:47 you have them reasonably well under control,

01:08:48 they become older, the reward technique

01:08:51 doesn’t work so well anymore.

01:08:54 So then finally, so this would be agents,

01:08:58 which are just, you could say slaves to the humans, yeah?

01:09:01 So if you are more ambitious and just say,

01:09:03 we want to build a new species of intelligent beings,

01:09:08 we put them on a new planet

01:09:09 and we want them to develop this planet or whatever.

01:09:12 So we don’t give them any reward.

01:09:15 So what could we do?

01:09:16 And you could try to come up with some reward functions

01:09:21 like it should maintain itself, the robot,

01:09:23 it should maybe multiply, build more robots, right?

01:09:28 And maybe all kinds of things which you find useful,

01:09:33 but that’s pretty hard, right?

01:09:34 What does self maintenance mean?

01:09:36 What does it mean to build a copy?

01:09:38 Should it be exact copy, an approximate copy?

01:09:40 And so that’s really hard,

01:09:42 but Laurent Orseau, also at DeepMind, developed a beautiful model.

01:09:48 So he just took the AIXI model

01:09:50 and coupled the rewards to information gain.

01:09:54 So he said the reward is proportional

01:09:57 to how much the agent had learned about the world.

01:10:00 And you can rigorously, formally, uniquely define that

01:10:03 in terms of KL divergences, okay?

01:10:05 So if you put that in, you get a completely autonomous agent.
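
A small sketch of the information-gain reward, under the assumption that it is measured as a KL divergence between successive posteriors: the agent keeps a posterior over a tiny, hand-picked set of candidate world models, and its reward for an observation is how much that posterior moves, i.e. how much it just learned.

    # Information-gain reward: Bayes-update a posterior over candidate models and
    # reward the KL divergence between the new and the old posterior. The models
    # and the data stream are illustrative only.
    import math

    MODELS = {"fair": 0.5, "mostly_ones": 0.9, "mostly_zeros": 0.1}

    def update(posterior, bit):
        """Bayes update of the posterior over MODELS after observing `bit`."""
        unnorm = {
            name: posterior[name] * (p if bit == 1 else 1.0 - p)
            for name, p in MODELS.items()
        }
        z = sum(unnorm.values())
        return {name: w / z for name, w in unnorm.items()}

    def kl(p, q):
        """KL divergence D(p || q) between two distributions over the models."""
        return sum(p[m] * math.log(p[m] / q[m]) for m in p if p[m] > 0)

    posterior = {name: 1.0 / len(MODELS) for name in MODELS}  # uniform prior
    for bit in [1, 1, 1, 0, 1, 1]:
        new_posterior = update(posterior, bit)
        reward = kl(new_posterior, posterior)  # information gained this step
        print(bit, round(reward, 4))
        posterior = new_posterior
    # Rewards shrink as the posterior settles: once the world is well understood,
    # there is little information left to gain under this measure.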

01:10:09 And actually, interestingly, for this agent,

01:10:11 we can prove much stronger results

01:10:13 than for the general agent, which is also nice.

01:10:16 And if you let this agent loose,

01:10:18 it will be in a sense, the optimal scientist.

01:10:20 It is absolutely curious to learn as much as possible

01:10:22 about the world.

01:10:24 And of course, it will also have

01:10:25 a lot of instrumental goals, right?

01:10:27 In order to learn, it needs to at least survive, right?

01:10:29 A dead agent is not good for anything.

01:10:31 So it needs to have self preservation.

01:10:33 And if building small helpers helps in acquiring more information,

01:10:38 it will do that, yeah?

01:10:39 If exploration, space exploration or whatever, is necessary,

01:10:43 right, to gather information, it will develop it.

01:10:45 So it has a lot of instrumental goals

01:10:48 following from this information gain.

01:10:51 And this agent is completely autonomous of us.

01:10:53 No rewards necessary anymore.

01:10:55 Yeah, of course, it could find a way

01:10:57 to game the concept of information

01:10:59 and get stuck in that library

01:11:04 that you mentioned beforehand

01:11:05 with a very large number of books.

01:11:08 The first agent had this problem.

01:11:10 It would get stuck in front of an old TV screen,

01:11:13 which just shows white noise.

01:11:14 Yeah, white noise, yeah.

01:11:16 But the second version can deal with at least stochasticity.

01:11:21 Well.

01:11:22 Yeah, what about curiosity?

01:11:23 This kind of word, curiosity, creativity,

01:11:27 is that kind of the reward function being

01:11:30 that of getting new information?

01:11:31 Is that similar to the idea of kind of injecting exploration

01:11:39 for its own sake inside the reward function?

01:11:41 Do you find this at all appealing, interesting?

01:11:44 I think that’s a nice definition.

01:11:46 Curiosity is rewards.

01:11:48 Sorry, curiosity is exploration for its own sake.

01:11:54 Yeah, I would accept that.

01:11:57 But most curiosity, well, in humans,

01:11:59 and especially in children,

01:12:01 is not just for its own sake,

01:12:03 but for actually learning about the environment

01:12:05 and for behaving better.

01:12:08 So I think most curiosity is tied in the end

01:12:13 towards performing better.

01:12:14 Well, okay, so if intelligent systems

01:12:17 need to have this reward function,

01:12:19 let me, you’re an intelligent system,

01:12:23 currently passing the Turing test quite effectively.

01:12:26 What’s the reward function

01:12:30 of our human intelligence existence?

01:12:33 What’s the reward function

01:12:35 that Marcus Hutter is operating under?

01:12:37 Okay, to the first question,

01:12:39 the biological reward function is to survive and to spread,

01:12:44 and very few humans sort of are able to overcome

01:12:48 this biological reward function.

01:12:50 But we live in a very nice world

01:12:54 where we have lots of spare time

01:12:56 and can still survive and spread,

01:12:57 so we can develop arbitrary other interests,

01:13:01 which is quite interesting.

01:13:03 On top of that.

01:13:04 On top of that, yeah.

01:13:06 But the survival and spreading sort of is,

01:13:09 I would say, the goal or the reward function of humans,

01:13:13 so that’s the core one.

01:13:15 I like how you avoided answering the second question,

01:13:17 which a good intelligence system would.

01:13:19 So my.

01:13:20 What’s your own meaning of life and the reward function?

01:13:24 My own meaning of life and reward function

01:13:26 is to find an AGI to build it.

01:13:31 Beautifully put.

01:13:32 Okay, let’s dissect AIXI even further.

01:13:34 So one of the assumptions is kind of infinity

01:13:37 keeps creeping up everywhere,

01:13:39 which, what are your thoughts

01:13:44 on kind of bounded rationality

01:13:46 and sort of the nature of our existence

01:13:50 and intelligence systems is that we’re operating

01:13:52 always under constraints, under limited time,

01:13:55 limited resources.

01:13:57 How does that, how do you think about that

01:13:59 within the AIXI framework,

01:14:01 within trying to create an AGI system

01:14:04 that operates under these constraints?

01:14:06 Yeah, that is one of the criticisms about AIXI,

01:14:09 that it ignores computation completely.

01:14:11 And some people believe that intelligence

01:14:13 is inherently tied to bounded resources.

01:14:19 What do you think on this one point?

01:14:21 Do you think it’s,

01:14:22 do you think the bounded resources

01:14:23 are fundamental to intelligence?

01:14:27 I would say that an intelligence notion,

01:14:31 which ignores computational limits is extremely useful.

01:14:35 A good intelligence notion,

01:14:37 which includes these resources would be even more useful,

01:14:40 but we don’t have that yet.

01:14:43 And so look at other fields outside of computer science,

01:14:48 computational aspects never play a fundamental role.

01:14:52 You develop biological models for cells,

01:14:54 something in physics, these theories,

01:14:56 I mean, become more and more crazy

01:14:58 and harder and harder to compute.

01:15:00 Well, in the end, of course,

01:15:01 we need to do something with this model,

01:15:02 but this is more a nuisance than a feature.

01:15:05 And I’m sometimes wondering if artificial intelligence

01:15:10 would not sit in a computer science department,

01:15:12 but in a philosophy department,

01:15:14 then this computational focus

01:15:16 would be probably significantly less.

01:15:18 I mean, think about the induction problem

01:15:19 is more in the philosophy department.

01:15:22 There’s virtually no paper that cares about

01:15:24 how long it takes to compute the answer.

01:15:26 That is completely secondary.

01:15:28 Of course, once we have figured out the first problem,

01:15:31 so intelligence without computational resources,

01:15:35 then the next and very good question is,

01:15:39 could we improve it by including computational resources,

01:15:42 but nobody was able to do that so far

01:15:45 in an even halfway satisfactory manner.

01:15:49 I like that, that in the long run,

01:15:51 the right department to belong to is philosophy.

01:15:55 That’s actually quite a deep idea,

01:15:58 or even to at least to think about

01:16:01 big picture philosophical questions,

01:16:03 big picture questions,

01:16:05 even in the computer science department.

01:16:07 But you’ve mentioned approximation.

01:16:10 Sort of, there’s a lot of infinity,

01:16:12 a lot of huge resources needed.

01:16:13 Are there approximations to AIXI,

01:16:16 within the AIXI framework, that are useful?

01:16:19 Yeah, we have developed a couple of approximations.

01:16:23 And what we do there is that

01:16:27 the Solomonoff induction part,

01:16:29 which was find the shortest program describing your data,

01:16:33 we just replace it by standard data compressors.

01:16:36 And the better compressors get,

01:16:39 the better this part will become.

01:16:41 We focus on a particular compressor

01:16:43 called context tree weighting,

01:16:44 which is pretty amazing, not so well known.

01:16:48 It has beautiful theoretical properties,

01:16:50 also works reasonably well in practice.

01:16:52 So we use that for the approximation of the induction

01:16:55 and the learning and the prediction part.

01:16:58 And for the planning part,

01:17:01 we essentially just took the ideas from computer Go

01:17:05 from 2006.

01:17:07 It was Csaba Szepesvári, also now at DeepMind,

01:17:11 who developed the so called UCT algorithm,

01:17:14 upper confidence bound for trees algorithm

01:17:17 on top of the Monte Carlo tree search.

01:17:19 So we approximate this planning part by sampling.

01:17:23 And it’s successful on some small toy problems.

01:17:29 We don’t want to lose the generality, right?

01:17:33 And that’s sort of the handicap, right?

01:17:34 If you want to be general, you have to give up something.

01:17:38 So, but this single agent was able to play small games

01:17:41 like Kuhn poker and Tic Tac Toe and even Pacman

01:17:49 in the same architecture, no change.

01:17:52 The agent doesn’t know the rules of the game,

01:17:54 really nothing, and learned all by itself by playing

01:17:57 with these environments.
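
The shape of the approximation described here, as a hedged sketch rather than the actual MC-AIXI-CTW code: a compressor-style predictor stands in for Solomonoff induction, and plain Monte Carlo sampling over futures stands in for the UCT search used in the real system. All classes and numbers below are placeholders.

    # Generic agent loop: a learned sequence model predicts percepts, a sampling
    # planner chooses actions, and the model is updated with what actually happened.
    import random

    class CompressorModel:
        """Stand-in for a CTW-style sequence predictor over percepts."""
        def update(self, action, percept):
            pass  # a real model would fold (action, percept) into its context tree
        def sample_percept(self, history, action):
            return random.choice([(0, 0.0), (1, 1.0)])  # (observation, reward)

    def rollout_value(model, history, depth, actions):
        """Estimate the value of `history` by one random rollout of length `depth`."""
        total = 0.0
        for _ in range(depth):
            a = random.choice(actions)
            obs, reward = model.sample_percept(history, a)
            history = history + [(a, obs)]
            total += reward
        return total

    def choose_action(model, history, actions, depth=10, samples=200):
        """Pick the action whose sampled futures look best; UCT would do this more
        cleverly by growing a search tree, but plain sampling keeps the sketch short."""
        def estimate(a):
            vals = []
            for _ in range(samples):
                obs, reward = model.sample_percept(history, a)
                vals.append(reward + rollout_value(model, history + [(a, obs)], depth - 1, actions))
            return sum(vals) / len(vals)
        return max(actions, key=estimate)

    model = CompressorModel()
    history = []
    for step in range(3):  # act, perceive, learn
        action = choose_action(model, history, actions=["left", "right"])
        percept = (random.randint(0, 1), random.random())  # stand-in environment
        model.update(action, percept)
        history.append((action, percept[0]))
        print(step, action)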

01:17:59 So Jürgen Schmidhuber proposed something called

01:18:03 Gödel Machines, which is a self improving program

01:18:06 that rewrites its own code.

01:18:10 Sort of mathematically, philosophically,

01:18:12 what’s the relationship in your eyes,

01:18:15 if you’re familiar with it,

01:18:16 between AIXI and the Gödel Machine?

01:18:18 Yeah, familiar with it.

01:18:19 He developed it while I was in his lab.

01:18:22 Yeah, so the Gödel Machine, to explain it briefly,

01:18:27 you give it a task.

01:18:28 It could be as simple a task as, you know,

01:18:30 finding prime factors of numbers, right?

01:18:32 You can formally write it down.

01:18:33 There’s a very slow algorithm to do that.

01:18:35 Just try all the factors, yeah.

01:18:37 Or play chess, right?

01:18:39 Optimally, you write the algorithm to minimax

01:18:41 to the end of the game.

01:18:42 So you write down what the Gödel Machine should do.

01:18:45 Then it will take part of its resources to run this program

01:18:50 and other part of its resources to improve this program.

01:18:54 And when it finds an improved version,

01:18:56 which provably computes the same answer.

01:19:00 So that’s the key part, yeah.

01:19:02 It needs to prove by itself that this change of program

01:19:05 still satisfies the original specification.

01:19:08 And if it does so, then it replaces the original program

01:19:11 by the improved program.

01:19:13 And by definition, it does the same job,

01:19:15 but just faster, okay?

01:19:17 And then, you know, it does this over and over.

01:19:19 And it’s developed in a way that all parts

01:19:24 of this Gödel Machine can self improve,

01:19:26 but it stays provably consistent

01:19:29 with the original specification.
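
A pseudocode-style sketch of the loop just described: part of the budget runs the current solver, part searches for a replacement that provably computes the same answers faster, and the swap happens only once a proof is found. The proof search is the enormously hard part and is left abstract; the function names are invented for illustration.

    def current_solver(task_input):
        """Initial, provably correct but slow solver for the task
        (e.g. trial division for factoring, or full minimax for a game)."""
        ...

    def search_for_proof(solver, budget):
        """Spend `budget` steps searching for (candidate_solver, proof) such that
        the proof shows candidate_solver computes the same answers as `solver`
        but faster. Returns None if nothing is found in time. Left abstract."""
        return None

    def godel_machine_loop(task_stream, improve_budget=1000):
        solver = current_solver
        for task_input in task_stream:
            # Part of the resources: do the actual work with the current solver.
            yield solver(task_input)
            # Other part: try to provably improve the solver itself.
            found = search_for_proof(solver, improve_budget)
            if found is not None:
                candidate, _proof = found
                # Swap only once equivalence with the original spec is proven.
                solver = candidate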

01:19:31 So from this perspective, it has nothing to do with AIXI.

01:19:36 But if you would now put AIXI as the starting axioms in,

01:19:40 it would run AIXI, but you know, that takes forever.

01:19:44 But then if it finds a provable speed up of AIXI,

01:19:48 it would replace it by this and this and this.

01:19:50 And maybe eventually it comes up with a model

01:19:52 which is still the AIXI model.

01:19:54 It cannot be, I mean, just for the knowledgeable reader,

01:19:59 AIXI is incomputable, and one can prove that therefore

01:20:03 there cannot be a computable exact algorithm computing it.

01:20:08 There needs to be some approximations

01:20:10 and this is not dealt with by the Gödel Machine.

01:20:11 So you have to do something about it.

01:20:13 But there’s the AIXItl model, which is finitely computable,

01:20:15 which we could put in.

01:20:16 Which part of AIXI is noncomputable?

01:20:19 The Solomonoff induction part.

01:20:20 The induction, okay, so.

01:20:22 But there are ways of getting computable approximations

01:20:26 of the AIXI model, so then it’s at least computable.

01:20:30 It is still way beyond any resources anybody will ever have,

01:20:33 but then the Gödel Machine could sort of improve it

01:20:35 further and further in an exact way.

01:20:37 So is it theoretically possible

01:20:41 that the Gödel Machine process could improve it?

01:20:45 Isn’t AIXI already optimal?

01:20:51 It is optimal in terms of the reward collected

01:20:56 over its interaction cycles,

01:20:59 but it takes infinite time to produce one action.

01:21:03 And the world continues whether you want it or not.

01:21:07 So the model is assuming you had an oracle,

01:21:09 which solved this problem,

01:21:11 and then in the next 100 milliseconds

01:21:12 or the reaction time you need gives the answer,

01:21:15 then AIXI is optimal.

01:21:18 It’s optimal also in the sense of learning efficiency

01:21:21 and data efficiency, but not in terms of computation time.

01:21:25 And then the Gödel Machine, in theory,

01:21:27 but probably not provably, could make it go faster.

01:21:31 Yes.

01:21:31 Okay, interesting.

01:21:34 Those two components are super interesting.

01:21:36 The sort of the perfect intelligence combined

01:21:39 with self improvement,

01:21:44 sort of provable self improvement

01:21:45 since you’re always getting the correct answer

01:21:48 and you’re improving.

01:21:50 Beautiful ideas.

01:21:51 Okay, so you’ve also mentioned that different kinds

01:21:55 of things in the chase of solving this reward,

01:21:59 sort of optimizing for the goal,

01:22:02 interesting human things could emerge.

01:22:04 So is there a place for consciousness within AIXI?

01:22:10 Where does, maybe you can comment,

01:22:13 because I suppose we humans are just another instantiation

01:22:17 of AIXI agents and we seem to have consciousness.

01:22:20 You say humans are an instantiation of an AIXI agent?

01:22:23 Yes.

01:22:24 Well, that would be amazing,

01:22:25 but I think that’s not true even for the smartest

01:22:27 and most rational humans.

01:22:29 I think maybe we are very crude approximations.

01:22:32 Interesting.

01:22:33 I mean, I tend to believe, again, I’m Russian,

01:22:35 so I tend to believe our flaws are part of the optimal.

01:22:41 So we tend to laugh off and criticize our flaws

01:22:45 and I tend to think that that’s actually close

01:22:49 to an optimal behavior.

01:22:50 Well, some flaws, if you think more carefully about it,

01:22:53 are actually not flaws, yeah,

01:22:54 but I think there are still enough flaws.

01:22:58 I don’t know.

01:23:00 It’s unclear.

01:23:00 As a student of history,

01:23:01 I think all the suffering that we’ve endured

01:23:05 as a civilization,

01:23:06 it’s possible that that’s the optimal amount of suffering

01:23:10 we need to endure to minimize longterm suffering.

01:23:15 That’s your Russian background, I think.

01:23:17 That’s the Russian.

01:23:18 Whether humans are or not instantiations of an iXe agent,

01:23:21 do you think there’s a consciousness

01:23:23 of something that could emerge

01:23:25 in a computational form or framework like iXe?

01:23:29 Let me also ask you a question.

01:23:31 Do you think I’m conscious?

01:23:36 Yeah, that’s a good question.

01:23:38 That tie is confusing me, but I think so.

01:23:44 You think that makes me unconscious

01:23:45 because it strangles me or?

01:23:47 If an agent were to solve the imitation game

01:23:49 posed by Turing,

01:23:50 I think it would be dressed similarly to you.

01:23:53 That because there’s a kind of flamboyant,

01:23:56 interesting, complex behavior pattern

01:24:01 that sells that you’re human and you’re conscious.

01:24:04 But why do you ask?

01:24:06 Was it a yes or was it a no?

01:24:07 Yes, I think you’re conscious, yes.

01:24:12 So, and you explained sort of somehow why,

01:24:16 but you infer that from my behavior, right?

01:24:18 You can never be sure about that.

01:24:20 And I think the same thing will happen

01:24:23 with any intelligent agent we develop

01:24:26 if it behaves in a way sufficiently close to humans

01:24:31 or maybe even not humans.

01:24:32 I mean, maybe a dog is also sometimes

01:24:34 a little bit self conscious, right?

01:24:35 So if it behaves in a way

01:24:38 where we attribute typically consciousness,

01:24:41 we would attribute consciousness

01:24:42 to these intelligent systems.

01:24:44 And to AIXI probably in particular.

01:24:47 That of course doesn’t answer the question

01:24:48 whether it’s really conscious.

01:24:50 And that’s the big hard problem of consciousness.

01:24:53 Maybe I’m a zombie.

01:24:55 I mean, not the movie zombie, but the philosophical zombie.

01:24:59 Is to you the display of consciousness

01:25:02 close enough to consciousness

01:25:05 from a perspective of AGI

01:25:06 that the distinction of the hard problem of consciousness

01:25:09 is not an interesting one?

01:25:11 I think we don’t have to worry

01:25:12 about the consciousness problem,

01:25:13 especially the hard problem for developing AGI.

01:25:16 I think, you know, we progress.

01:25:20 At some point we have solved all the technical problems

01:25:23 and this system will behave intelligently

01:25:25 and then super intelligent.

01:25:26 And this consciousness will emerge.

01:25:30 I mean, definitely it will display behavior

01:25:32 which we will interpret as conscious.

01:25:35 And then it’s a philosophical question.

01:25:38 Did this consciousness really emerge

01:25:39 or is it a zombie which just, you know, fakes everything?

01:25:43 We still don’t have to figure that out.

01:25:45 Although it may be interesting,

01:25:47 at least from a philosophical point of view,

01:25:48 it’s very interesting,

01:25:49 but it may also be sort of practically interesting.

01:25:53 You know, there’s some people saying,

01:25:54 if it’s just faking consciousness and feelings,

01:25:56 you know, then we don’t need to be concerned about,

01:25:58 you know, rights.

01:25:59 But if it’s real conscious and has feelings,

01:26:01 then we need to be concerned, yeah.

01:26:05 I can’t wait till the day

01:26:07 where AI systems exhibit consciousness

01:26:10 because it’ll truly be some of the hardest ethical questions

01:26:14 of what we do with that.

01:26:15 It is rather easy to build systems

01:26:18 to which people ascribe consciousness.

01:26:21 And I give you an analogy.

01:26:22 I mean, remember, maybe it was before you were born,

01:26:25 the Tamagotchi?

01:26:26 Yeah.

01:26:27 Freaking born.

01:26:28 How dare you, sir?

01:26:30 Why, that’s the, you’re young, right?

01:26:33 Yes, that’s good.

01:26:34 Thank you, thank you very much.

01:26:36 But I was also in the Soviet Union.

01:26:37 We didn’t have any of those fun things.

01:26:41 But you have heard about this Tamagotchi,

01:26:42 which was, you know, really, really primitive,

01:26:44 actually, for the time it was,

01:26:46 and, you know, you could raise, you know, this,

01:26:48 and kids got so attached to it

01:26:51 and, you know, didn’t want to let it die

01:26:53 and probably, if we would have asked, you know,

01:26:56 the children, do you think this Tamagotchi is conscious?

01:26:59 They would have said yes.

01:27:00 Half of them would have said yes, I would guess.

01:27:01 I think that’s kind of a beautiful thing, actually,

01:27:04 because that consciousness, ascribing consciousness,

01:27:08 seems to create a deeper connection.

01:27:10 Yeah.

01:27:11 Which is a powerful thing.

01:27:12 But we’ll have to be careful on the ethics side of that.

01:27:15 Well, let me ask about the AGI community broadly.

01:27:18 You kind of represent some of the most serious work on AGI,

01:27:22 at least from earlier on,

01:27:24 and DeepMind represents serious work on AGI these days.

01:27:29 But why, in your sense, is the AGI community so small

01:27:34 or has been so small until maybe DeepMind came along?

01:27:38 Like, why aren’t more people seriously working

01:27:41 on human level and superhuman level intelligence

01:27:45 from a formal perspective?

01:27:48 Okay, from a formal perspective,

01:27:49 that’s sort of an extra point.

01:27:53 So I think there are a couple of reasons.

01:27:54 I mean, AI came in waves, right?

01:27:56 You know, AI winters and AI summers,

01:27:58 and then there were big promises which were not fulfilled,

01:28:01 and people got disappointed.

01:28:05 And that narrow AI solving particular problems,

01:28:11 which seemed to require intelligence,

01:28:14 was always to some extent successful,

01:28:17 and there were improvements, small steps.

01:28:19 And if you build something which is useful for society

01:28:24 or industrially useful, then there’s a lot of funding.

01:28:26 So I guess it was in parts the money,

01:28:29 which drives people to develop a specific system

01:28:34 solving specific tasks.

01:28:36 But you would think that, at least in university,

01:28:39 you should be able to do ivory tower research.

01:28:43 And that was probably better a long time ago,

01:28:46 but even nowadays, there’s quite some pressure

01:28:48 of doing applied research or translational research,

01:28:52 and it’s harder to get grants as a theorist.

01:28:56 So that also drives people away.

01:28:59 It’s maybe also harder

01:29:01 attacking the general intelligence problem.

01:29:03 So I think enough people, I mean, maybe a small number

01:29:05 were still interested in formalizing intelligence

01:29:09 and thinking of general intelligence,

01:29:12 but not much came up, right?

01:29:17 Well, not much great stuff came up.

01:29:19 So what do you think,

01:29:21 we talked about the formal big light

01:29:24 at the end of the tunnel,

01:29:26 but from the engineering perspective,

01:29:27 what do you think it takes to build an AGI system?

01:29:30 Is that, and I don’t know if that’s a stupid question

01:29:33 or a distinct question

01:29:35 from everything we’ve been talking about with AIXI,

01:29:37 but what do you see as the steps that are necessary to take

01:29:41 to start to try to build something?

01:29:43 So you want a blueprint now,

01:29:44 and then you go off and do it?

01:29:46 That’s the whole point of this conversation,

01:29:48 trying to squeeze that in there.

01:29:49 Now, is there, I mean, what’s your intuition?

01:29:51 Is it in the robotics space

01:29:53 or something that has a body and tries to explore the world?

01:29:56 Is it in the reinforcement learning space,

01:29:58 like the efforts with AlphaZero and AlphaStar

01:30:01 that are kind of exploring how you can solve it through

01:30:04 in the simulation in the gaming world?

01:30:06 Is there stuff in sort of all the transformer work

01:30:11 and natural language processing,

01:30:13 sort of maybe attacking the open domain dialogue?

01:30:15 Like what, where do you see promising pathways?

01:30:21 Let me pick the embodiment maybe.

01:30:24 So embodiment is important, yes and no.

01:30:33 I don’t believe that we need a physical robot

01:30:38 walking or rolling around, interacting with the real world

01:30:42 in order to achieve AGI.

01:30:45 And I think it’s more of a distraction probably

01:30:50 than helpful, it’s sort of confusing the body with the mind.

01:30:54 For industrial applications or near term applications,

01:30:58 of course we need robots for all kinds of things,

01:31:01 but for solving the big problem, at least at this stage,

01:31:06 I think it’s not necessary.

01:31:08 But the answer is also yes,

01:31:10 that I think the most promising approach

01:31:13 is that you have an agent

01:31:15 and that can be a virtual agent in a computer

01:31:18 interacting with an environment,

01:31:20 possibly a 3D simulated environment

01:31:22 like in many computer games.

01:31:25 And you train and learn the agent,

01:31:29 even if you don’t intend to later put

01:31:33 this algorithm in a robot brain,

01:31:35 and you leave it forever in the virtual reality,

01:31:38 getting experience in an,

01:31:40 albeit just simulated, 3D world,

01:31:45 is possibly, and I say possibly,

01:31:47 important to understand things

01:31:51 on a similar level as humans do,

01:31:55 especially if the agent or primarily if the agent

01:31:58 needs to interact with the humans.

01:32:00 If you talk about objects on top of each other in space

01:32:02 and flying and cars and so on,

01:32:04 and the agent has no experience

01:32:06 with even virtual 3D worlds,

01:32:09 it’s probably hard to grasp.

01:32:12 So if you develop an abstract agent,

01:32:14 say we take the mathematical path

01:32:16 and we just want to build an agent

01:32:18 which can prove theorems

01:32:19 and becomes a better and better mathematician,

01:32:21 then this agent needs to be able to reason

01:32:24 in very abstract spaces

01:32:25 and then maybe sort of putting it into 3D environments,

01:32:28 simulated or not is even harmful.

01:32:30 It should sort of, you put it in, I don’t know,

01:32:33 an environment which it creates itself or so.

01:32:36 It seems like you have an interesting, rich,

01:32:38 complex trajectory through life

01:32:40 in terms of your journey of ideas.

01:32:42 So it’s interesting to ask what books,

01:32:45 technical, fiction, philosophical,

01:32:49 books, ideas, people had a transformative effect.

01:32:52 Books are most interesting

01:32:53 because maybe people could also read those books

01:32:57 and see if they could be inspired as well.

01:33:00 Yeah, luckily I asked books and not a singular book.

01:33:03 It’s very hard if I try to pin down one book.

01:33:08 And I can do that at the end.

01:33:10 So the most,

01:33:14 the books which were most transformative for me

01:33:16 or which I can most highly recommend

01:33:19 to people interested in AI.

01:33:21 Both perhaps.

01:33:22 Yeah, yeah, both, both, yeah, yeah.

01:33:25 I would always start with Russell and Norvig,

01:33:28 Artificial Intelligence, A Modern Approach.

01:33:30 That’s the AI Bible.

01:33:33 It’s an amazing book.

01:33:35 It’s very broad.

01:33:36 It covers all approaches to AI.

01:33:38 And even if you focused on one approach,

01:33:40 I think that is the minimum you should know

01:33:42 about the other approaches out there.

01:33:44 So that should be your first book.

01:33:46 Fourth edition should be coming out soon.

01:33:48 Oh, okay, interesting.

01:33:50 There’s a deep learning chapter now,

01:33:51 so there must be.

01:33:53 Written by Ian Goodfellow, okay.

01:33:55 And then the next book I would recommend,

01:33:59 The Reinforcement Learning Book by Sutton and Barto.

01:34:02 That’s a beautiful book.

01:34:04 If there’s any problem with the book,

01:34:06 it makes RL feel and look much easier than it actually is.

01:34:12 It’s a very gentle book.

01:34:14 It’s very nice to read, with exercises to do.

01:34:16 You can very quickly get some RL systems to run.

01:34:19 You know, very toy problems, but it’s a lot of fun.

01:34:22 And in a couple of days you feel you know what RL is about,

01:34:28 but it’s much harder than the book.

01:34:30 Yeah.

01:34:31 Oh, come on now, it’s an awesome book.

01:34:34 Yeah, it is, yeah.

01:34:36 And maybe, I mean, there’s so many books out there.

01:34:41 If you like the information theoretic approach,

01:34:43 then there’s Kolmogorov Complexity by Li and Vitányi,

01:34:46 but probably, you know, some short article is enough.

01:34:50 You don’t need to read a whole book,

01:34:52 but it’s a great book.

01:34:54 And if you have to mention one all time favorite book,

01:34:59 it’s of different flavor, that’s a book

01:35:01 which is used in the International Baccalaureate

01:35:04 for high school students in several countries.

01:35:08 That’s from Nicholas Alchin, Theory of Knowledge,

01:35:12 second edition or first, not the third, please.

01:35:16 The third one, they took out all the fun.

01:35:18 Okay.

01:35:20 So this asks all the interesting,

01:35:25 or to me, interesting philosophical questions

01:35:27 about how we acquire knowledge from all perspectives,

01:35:30 from math, from art, from physics,

01:35:33 and ask how can we know anything?

01:35:36 And the book is called Theory of Knowledge.

01:35:38 So is this almost like a philosophical exploration

01:35:40 of how we get knowledge from anything?

01:35:43 Yes, yeah, I mean, can religion tell us, you know,

01:35:45 about something about the world?

01:35:46 Can science tell us something about the world?

01:35:48 Can mathematics, or is it just playing with symbols?

01:35:51 And, you know, it’s open ended questions.

01:35:54 And, I mean, it’s for high school students,

01:35:56 so they have then resources from Hitchhiker’s Guide

01:35:58 to the Galaxy and from Star Wars

01:35:59 and The Chicken Crossed the Road, yeah.

01:36:01 And it’s fun to read, but it’s also quite deep.

01:36:07 If you could live one day of your life over again,

01:36:11 because it made you truly happy,

01:36:12 Or maybe like we said with the books,

01:36:14 it was truly transformative.

01:36:16 What day, what moment would you choose?

01:36:19 Did something pop into your mind?

01:36:22 Does it need to be a day in the past,

01:36:23 or can it be a day in the future?

01:36:25 Well, space time is an emergent phenomena,

01:36:27 so it’s all the same anyway.

01:36:30 Okay.

01:36:32 Okay, from the past.

01:36:34 You’re really good at saying from the future, I love it.

01:36:36 No, I will tell you from the future, okay.

01:36:39 So from the past, I would say

01:36:41 when I discovered my AIXI model.

01:36:43 I mean, it was not in one day,

01:36:45 but it was one moment where I realized

01:36:48 Kolmogorov complexity and didn’t even know that it existed,

01:36:53 but I discovered sort of this compression idea

01:36:55 myself, but immediately I knew I can’t be the first one,

01:36:58 but I had this idea.

01:37:00 And then I knew about sequential decision theory,

01:37:02 and I knew if I put it together, this is the right thing.

01:37:06 And yeah, still when I think back about this moment,

01:37:09 I’m super excited about it.

01:37:12 Was there any more detail and context to that moment?

01:37:16 Did an apple fall on your head?

01:37:20 So it was like, if you look at Ian Goodfellow

01:37:21 talking about GANs, there was beer involved.

01:37:25 Is there some more context of what sparked your thought,

01:37:30 or was it just?

01:37:31 No, it was much more mundane.

01:37:32 So I worked in this company.

01:37:34 So in this sense, the four and a half years

01:37:36 was not completely wasted.

01:37:39 And I worked on an image interpolation problem,

01:37:43 and I developed some quite neat new interpolation techniques

01:37:48 and they got patented, which happens quite often.

01:37:52 I went sort of overboard and thought about,

01:37:54 yeah, that’s pretty good, but it’s not the best.

01:37:56 So what is the best possible way of doing interpolation?

01:37:59 And then I thought, yeah, you want the simplest picture,

01:38:03 which is if you coarse grain it,

01:38:04 recovers your original picture.

01:38:06 And then I thought about the simplicity concept

01:38:08 more in quantitative terms,

01:38:11 and then everything developed.

01:38:15 And somehow that beautiful mix

01:38:17 of also being a physicist

01:38:18 and thinking about the big picture of it,

01:38:20 then led you to probably think big with AIXI.

01:38:24 So as a physicist, I was probably trained

01:38:26 not to always think in computational terms,

01:38:28 just ignore that and think about

01:38:30 the fundamental properties, which you want to have.

01:38:34 So what about if you could relive one day in the future?

01:38:36 What would that be?

01:38:39 When I solve the AGI problem.

01:38:43 In practice, so in theory,

01:38:45 I have solved it with the AIXI model, but in practice.

01:38:48 And then I ask the first question.

01:38:50 What would be the first question?

01:38:53 What’s the meaning of life?

01:38:55 I don’t think there’s a better way to end it.

01:38:58 Thank you so much for talking today.

01:38:59 It’s a huge honor to finally meet you.

01:39:01 Yeah, thank you too.

01:39:02 It was a pleasure of mine too.

01:39:33 And now let me leave you with some words of wisdom

01:39:35 from Albert Einstein.

01:39:38 The measure of intelligence is the ability to change.

01:39:42 Thank you for listening and hope to see you next time.