Vladimir Vapnik: Predicates, Invariants, and the Essence of Intelligence #71

Transcript

00:00:00 The following is a conversation with Vladimir Vapnik, part two, the second

00:00:05 time we spoke on the podcast.

00:00:07 He’s the coinventor of support vector machines, support vector clustering, VC

00:00:11 theory, and many foundational ideas in statistical learning.

00:00:14 He was born in the Soviet Union, worked at the Institute of Control Sciences

00:00:19 in Moscow, then in the US, worked at AT&T, NEC Labs, Facebook AI Research,

00:00:26 and now is a professor at Columbia University.

00:00:28 His work has been cited over 200,000 times.

00:00:32 The first time we spoke on the podcast was just over a year

00:00:35 ago, one of the early episodes.

00:00:38 This time we spoke after a lecture he gave titled “Complete Statistical Theory

00:00:42 of Learning” as part of the MIT series of lectures on deep learning

00:00:46 and AI that I organized.

00:00:49 I’ll release the video of the lecture in the next few days.

00:00:53 This podcast and lecture are independent from each other, so you don’t need

00:00:56 one to understand the other.

00:00:59 The lecture is quite technical and math heavy, so if you do watch both, I

00:01:04 recommend listening to this podcast first, since the podcast is

00:01:07 probably a bit more accessible.

00:01:10 This is the Artificial Intelligence Podcast.

00:01:13 If you enjoy it, subscribe on YouTube, give it five stars on Apple Podcasts,

00:01:17 support it on Patreon, or simply connect with me on Twitter

00:01:20 at Lex Fridman, spelled F R I D M A N.

00:01:23 As usual, I’ll do one or two minutes of ads now and never any ads in

00:01:27 the middle that can break the flow of the conversation.

00:01:30 I hope that works for you and doesn’t hurt the listening experience.

00:01:35 This show is presented by Cash App, the number one finance app in the app store.

00:01:39 When you get it, use code LexPodcast.

00:01:42 Cash App lets you send money to friends, buy Bitcoin, and invest in the

00:01:46 stock market with as little as $1.

00:01:48 Broker services are provided by Cash App Investing, a subsidiary of Square

00:01:52 and member SIPC. Since Cash App allows you to send and receive money

00:01:57 digitally, peer to peer, and security in all digital transactions is very important,

00:02:02 let me mention the PCI Data Security Standard, PCI DSS level one,

00:02:07 that Cash App is compliant with.

00:02:10 I’m a big fan of standards for safety and security and PCI DSS is a good

00:02:16 example of that, where a bunch of competitors got together and agreed

00:02:20 that there needs to be a global standard around the security of transactions.

00:02:24 Now we just need to do the same for autonomous vehicles

00:02:27 and AI systems in general.

00:02:30 So again, if you get Cash App from the app store or Google Play and use the code

00:02:34 LexPodcast, you get $10 and Cash App will also donate $10 to FIRST, one of my

00:02:40 favorite organizations that is helping to advance robotics and STEM education

00:02:45 for young people around the world.

00:02:46 And now here’s my conversation with Vladimir Vapnik.

00:02:52 You and I talked about Alan Turing yesterday a little bit and that he, as the

00:02:58 father of artificial intelligence, may have instilled in our field, an ethic

00:03:02 of engineering and not science, seeking more to build intelligence

00:03:06 rather than to understand it.

00:03:09 What do you think is the difference between these two paths of engineering

00:03:13 intelligence and the science of intelligence?

00:03:18 It’s a completely different story.

00:03:20 Engineering is imitation of human activity.

00:03:25 You have to make a device which behaves as humans behave, have all the functions

00:03:34 of humans.

00:03:36 It doesn’t matter how you do it,

00:03:41 but to understand what is intelligence about, it’s quite a different problem.

00:03:48 So I think, I believe that it’s somehow related to the predicate we talked

00:03:55 yesterday about, because look at the Vladimir Propp’s idea.

00:04:04 He just found 31 predicates, he called them units, which can explain

00:04:17 human behavior, at least in Russian tales.

00:04:20 You look at Russian tales and derive from that.

00:04:24 And then people realize that it’s more wide than in Russian tales.

00:04:29 It is in TV, in movie serials and so on and so on.

00:04:33 So you’re talking about Vladimir Propp, who in 1928 published a book,

00:04:39 Morphology of the Folktale, describing 31 predicates that have this kind of

00:04:46 sequential structure that a lot of the stories, narratives follow in Russian

00:04:53 folklore and in other contexts.

00:04:54 We’ll talk about it.

00:04:56 I’d like to talk about predicates in a focused way, but let me, if you allow

00:05:00 me to stay zoomed out on our friend, Alan Turing, and, you know, he inspired

00:05:06 a generation with the imitation game.

00:05:10 Yes.

00:05:11 Do you think if we can linger on that a little bit longer, do you think we can

00:05:17 learn, do you think learning to imitate intelligence can get us closer to the

00:05:22 science, to understanding intelligence?

00:05:24 So why do you think imitation is so far from understanding?

00:05:32 I think that it is different because you have different goals.

00:05:37 So your goal is to create something, something useful.

00:05:43 Yeah.

00:05:43 And that is great.

00:05:45 And you can see how much things was done and I believe that it will be done even

00:05:51 more, it’s self driving cars and also the business, it is great.

00:05:57 And it was inspired by Turing’s vision.

00:06:02 But understanding is very difficult.

00:06:05 It’s more or less philosophical category.

00:06:07 What means understand the world?

00:06:10 I believe in scheme which starts from Plato, that there exists world of ideas.

00:06:18 I believe that intelligence, it is world of ideas, but it is world of pure ideas.

00:06:24 And when you combine them with reality things, it creates, as in my case,

00:06:34 invariants, which is very specific.

00:06:37 And that’s, I believe, the combination of ideas in way to constructing invariants.

00:06:47 Constructing invariant is intelligence.

00:06:49 But first of all, predicate, if you know, predicate and hopefully

00:06:56 then not too much predicate exists.

00:07:00 For example, 31 predicate for human behavior, it is not a lot.

00:07:06 Vladimir Propp used 31, you can even call them predicate, 31

00:07:12 predicates to describe stories, narratives.

00:07:17 Do you think human behavior, how much of human behavior, how much of our

00:07:22 world, our universe, all the things that matter in our existence can be

00:07:28 summarized in predicates of the kind that Propp was working with?

00:07:32 I think that we have a lot of form of behavior, but I think that

00:07:38 predicate is much less because even in this example, which I gave you

00:07:43 yesterday, you saw that predicate can be, one predicate can construct many

00:07:55 different invariants depending on your data.

00:07:59 They’re applying to different data and they give different invariants.

00:08:04 So, but pure ideas, maybe not so much.

00:08:08 Not so many.

00:08:09 I don’t know about that, but my guess, I hope that’s why challenge

00:08:15 about digit recognition, how much you need.

00:08:19 I think we’ll talk about computer vision and 2D images a little bit

00:08:23 in your challenge.

00:08:24 That’s exactly about intelligence.

00:08:26 That’s exactly, that’s exactly about, no, that hopes to be exactly about

00:08:33 the spirit of intelligence in the simplest possible way.

00:08:37 Yeah, absolutely you should start the simplest way, otherwise you

00:08:40 will not be able to do it.

00:08:42 Well, there’s an open question whether starting at the MNIST digit

00:08:46 recognition is a step towards intelligence or it’s an entirely different thing.

00:08:52 I think that to beat records using say 100, 200 times less examples,

00:08:59 you need intelligence.

00:09:00 You need intelligence.

00:09:01 So let’s, because you use this term and it would be nice, I’d like to

00:09:05 ask simple, maybe even dumb questions.

00:09:09 Let’s start with a predicate.

00:09:12 In terms of terms and how you think about it, what is a predicate?

00:09:17 I don’t know.

00:09:18 I have a feeling formally they exist, but I believe that predicate for

00:09:26 2D images, one of them is symmetry.

00:09:31 Hold on a second.

00:09:32 Sorry.

00:09:32 Sorry, sorry to interrupt and pull you back.

00:09:36 At the simplest level, we’re not even, we’re not being profound currently.

00:09:40 A predicate is a statement of something that is true.

00:09:44 Yes.

00:09:46 Do you think of predicates as somehow probabilistic in nature or is this binary?

00:09:54 This is truly constraints of logical statements about the world.

00:09:59 In my definition, the simplest predicate is function.

00:10:03 Function, and you can use this function to make inner product that is predicate.

00:10:10 What’s the input and what’s the output of the function?

00:10:13 Input is X, something which is input in reality.

00:10:18 Say if you consider digit recognition, it pixel space input, but it is

00:10:25 function which in pixel space, but it can be any function from pixel space and you

00:10:36 choose, and I believe that there are several functions which is important for

00:10:43 understanding of images.

00:10:46 One of them is symmetry.

00:10:48 It’s not so simple construction as I described with the derivative, with all

00:10:53 this stuff, but another, I believe, I don’t know how many, is how well

00:10:59 structurized is picture.

00:11:03 Structurized?

00:11:04 Yeah.

00:11:04 What do you mean by structurized?

00:11:06 It is formal definition.

00:11:09 Say something heavy on the left corner, not so heavy in the middle and so on.

00:11:17 You describe in general concept of what you assume.

00:11:21 Concepts, some kind of universal concepts.

00:11:25 Yeah, but I don’t know how to formalize this.

00:11:29 Do you?

00:11:29 So this is the thing.

00:11:31 There’s a million ways we can talk about this.

00:11:33 I’ll keep bringing it up, but we humans have such concepts when we look at

00:11:40 digits, but it’s hard to put them, just like you’re saying now, it’s

00:11:44 hard to put them into words.

00:11:45 You know, that is example, when critics in music, trying to describe music,

00:11:55 they use predicate and not too many predicate, but in different combination,

00:12:02 but they have some special words for describing music and the same

00:12:10 should be for images, but maybe there are critics who understand essence

00:12:16 of what this image is about.

00:12:20 Do you think there exists critics who can summarize the essence of

00:12:26 images, human beings?

00:12:29 I hope so, yes, but that…

00:12:32 Explicitly state them on paper.

00:12:34 The fundamental question I’m asking is, do you think there exists a small

00:12:41 set of predicates that will summarize images?

00:12:45 It feels to our mind, like it does, that the concept of what makes a two

00:12:50 and a three and a four…

00:12:53 No, no, no, it’s not on this level.

00:12:58 It should not describe two, three, four.

00:13:01 It describes some construction, which allow you to create invariants.

00:13:08 And invariants, sorry to stick on this, but terminology.

00:13:12 Invariant, it is property of your image.

00:13:21 Say, I can say, looking on my image, it is more or less symmetric,

00:13:27 and I can give you value

00:13:33 of symmetry, say, level of symmetry, using this function which I gave

00:13:40 yesterday. And you can describe that your image has these characteristics

00:13:51 exactly in the way how musical critics describe music.

00:13:56 So, but this is invariant applied to specific data, to specific music,

00:14:05 to something.

00:14:07 I strongly believe in this Plato’s ideas that there exists world of predicate

00:14:14 and world of reality, and predicate and reality is somehow connected,

00:14:20 and you have to know that.

00:14:22 Let’s talk about Plato a little bit.

00:14:23 So you draw a line from Plato, to Hegel, to Wigner, to today.

00:14:30 So Plato has forms, the theory of forms.

00:14:35 So there’s a world of ideas and a world of things, as you talk about,

00:14:39 and there’s a connection.

00:14:40 And presumably the world of ideas is very small, and the world of things

00:14:45 is arbitrarily big, but they’re all what Plato calls them like, it’s a shadow.

00:14:52 The real world is a shadow from the world of forms.

00:14:55 Yeah, you have projection of a world of ideas.

00:14:58 Yeah, very poetic.

00:15:00 In reality, you can realize this projection using these invariants

00:15:07 because it is projection for own specific examples, which create specific features

00:15:13 of specific objects.

00:15:14 So the essence of intelligence is while only being able to observe

00:15:22 the world of things, try to come up with a world of ideas.

00:15:26 Exactly.

00:15:27 Like in this music story, intelligent musical critics knows all these words

00:15:33 and have a feeling about what they mean.

00:15:34 I feel like that’s a contradiction, intelligent music critics.

00:15:38 But I think music is to be enjoyed in all its forms.

00:15:47 The notion of critic, like a food critic.

00:15:49 No, I don’t want touch emotion.

00:15:51 That’s an interesting question.

00:15:53 Does emotion…

00:15:54 There’s certain elements of the human psychology, of the human experience,

00:15:59 which seem to almost contradict intelligence and reason.

00:16:04 Like emotion, like fear, like love, all of those things,

00:16:11 are those not connected in any way to the space of ideas?

00:16:16 That I don’t know.

00:16:18 I just want to be concentrate on very simple story, on digit recognition.

00:16:27 So you don’t think you have to love and fear death in order to recognize digits?

00:16:31 I don’t know.

00:16:33 Because it’s so complicated.

00:16:36 It involves a lot of stuff which I never considered.

00:16:41 But I know about digit recognition.

00:16:44 And I know that for digit recognition,

00:16:50 to get records from small number of observations, you need predicate.

00:16:59 But not special predicate for this problem.

00:17:03 But universal predicate, which understand world of images.

00:17:08 Of visual information.

00:17:09 Visual, yes.

00:17:11 But on the first step, they understand, say, world of handwritten digits,

00:17:18 or characters, or something simple.

00:17:21 So like you said, symmetry is an interesting one.

00:17:23 No, that’s what I think one of the predicate is related to symmetry.

00:17:28 The level of symmetry.

00:17:30 Okay, degree of symmetry.

00:17:32 So you think symmetry at the bottom is a universal notion,

00:17:37 and there’s degrees of a single kind of symmetry,

00:17:41 or is there many kinds of symmetries?

00:17:44 Many kinds of symmetries.

00:17:46 There is a symmetry, antisymmetry, say, letter S.

00:17:52 So it has vertical antisymmetry.

00:17:58 And it could be diagonal symmetry, vertical symmetry.

00:18:02 So when you cut vertically the letter S…

00:18:07 Yeah, then the upper part and lower part in different directions.

00:18:16 Inverted, along the Y axis.

00:18:18 But that’s just like one example of symmetry, right?

00:18:21 Isn’t there like…

00:18:21 Right, but there is a degree of symmetry.

00:18:26 If you play all this derivative stuff to do tangent distance,

00:18:35 whatever I describe, you can have a degree of symmetry.

00:18:40 And that is what describing reason of image.

00:18:45 It is the same as you will describe this image.

00:18:53 Think about digit S, it has antisymmetry.

00:18:57 Digit three is symmetric.

00:19:00 More or less, look for symmetry.

00:19:04 Do you think such concepts like symmetry,

00:19:07 predicates like symmetry, is it a hierarchical set of concepts?

00:19:14 Or are these independent, distinct predicates

00:19:20 that we want to discover as some set of…

00:19:23 No, there is an idea of symmetry.

00:19:25 And you can, this idea of symmetry, make very general.

00:19:34 Like degree of symmetry.

00:19:37 If degree of symmetry can be zero, no symmetry at all.

00:19:40 Or degree of symmetry, say, more or less symmetrical.

00:19:46 But you have one of these descriptions.

00:19:50 And symmetry can be different.

00:19:52 As I told, horizontal, vertical, diagonal,

00:19:56 and antisymmetry is also concept of symmetry.
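
To make the idea of a degree of symmetry concrete, here is a minimal Python sketch. It is an illustration only, not the construction from the lecture: the normalized inner product used as the score, the function names, and the choice to model the antisymmetry of the letter S as a 180-degree rotation are all assumptions.

```python
from typing import Callable, List

Image = List[List[float]]


def flip_left_right(img: Image) -> Image:
    """Mirror the image about its vertical axis."""
    return [row[::-1] for row in img]


def flip_up_down(img: Image) -> Image:
    """Mirror the image about its horizontal axis."""
    return img[::-1]


def rotate_180(img: Image) -> Image:
    """Rotate by 180 degrees (assumed stand-in for the S-type antisymmetry)."""
    return [row[::-1] for row in img[::-1]]


def inner(a: Image, b: Image) -> float:
    """Pixel-wise inner product of two images of equal shape."""
    return sum(p * q for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def degree_of_symmetry(img: Image, transform: Callable[[Image], Image]) -> float:
    """Score in [0, 1] for non-negative pixels: 1 means the image is
    unchanged by the transform, 0 means no overlap with its transformed copy."""
    energy = inner(img, img)
    if energy == 0:
        return 0.0
    return inner(img, transform(img)) / energy


# A crude 5x4 letter "S" (1 = ink, 0 = background); hypothetical test image.
LETTER_S: Image = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]

s_mirror = degree_of_symmetry(LETTER_S, flip_left_right)
s_rotation = degree_of_symmetry(LETTER_S, rotate_180)
```

For this toy image the rotation score is maximal while the mirror score is lower, which matches the remark that the letter S is antisymmetric rather than mirror symmetric.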

00:20:01 What about shape in general?

00:20:03 I mean, symmetry is a fascinating notion, but…

00:20:06 No, no, I’m talking about digit.

00:20:08 I would like to concentrate on all I would like to know,

00:20:12 predicate for digit recognition.

00:20:14 Yes, but symmetry is not enough for digit recognition, right?

00:20:19 It is not necessarily for digit recognition.

00:20:22 It helps to create invariant, which you can use

00:20:30 when you will have examples for digit recognition.

00:20:35 You have regular problem of digit recognition.

00:20:38 You have examples of the first class or second class.

00:20:41 Plus, you know that there exists concept of symmetry.

00:20:45 And you apply, when you’re looking for decision rule,

00:20:50 you will apply concept of symmetry,

00:20:55 of this level of symmetry, which you estimate from…

00:21:00 So let’s talk.

00:21:01 Everything comes from weak convergence.

00:21:06 What is convergence?

00:21:07 What is weak convergence?

00:21:09 What is strong convergence?

00:21:11 I’m sorry, I’m gonna do this to you.

00:21:13 What are we converging from and to?

00:21:16 You’re converging, you would like to have a function.

00:21:20 The function which, say, indicator function,

00:21:23 which indicate your digit five, for example.

00:21:29 A classification task.

00:21:31 Let’s talk only about classification.

00:21:33 So classification means you will say

00:21:36 whether this is a five or not,

00:21:38 or say which of the 10 digits it is.

00:21:40 Right, right.

00:21:42 I would like to have these functions.

00:21:46 Then, I have some examples.

00:21:56 I can consider property of these examples.

00:22:01 Say, symmetry.

00:22:02 And I can measure level of symmetry for every digit.

00:22:08 And then I can take average from my training data.

00:22:16 And I will consider only functions

00:22:20 of conditional probability,

00:22:24 which I’m looking for my decision rule.

00:22:27 Which applying to digits will give me the same average

00:22:38 as I observe on training data.
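
The condition described here, that a candidate rule should reproduce on the digits the same average of a predicate as is observed on the training data, can be sketched in a few lines of Python. This is a hedged illustration: the names `invariant_gap`, `psi`, and the toy one-dimensional data are assumptions, and the condition is written in the form (1/n) Σ ψ(x_i) f(x_i) ≈ (1/n) Σ ψ(x_i) y_i.

```python
from typing import Callable, Sequence


def invariant_gap(
    psi: Callable[[float], float],
    f: Callable[[float], float],
    xs: Sequence[float],
    ys: Sequence[float],
) -> float:
    """Absolute difference between the predicate average predicted by f
    and the predicate average observed on the labeled training data."""
    n = len(xs)
    predicted = sum(psi(x) * f(x) for x in xs) / n
    observed = sum(psi(x) * y for x, y in zip(xs, ys)) / n
    return abs(predicted - observed)


# Toy data (hypothetical): label is 1 for x >= 2.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.0, 1.0, 1.0]


def psi(x: float) -> float:
    """A simple predicate: the input value itself."""
    return x


def step(x: float) -> float:
    """Candidate rule that matches the labels."""
    return 1.0 if x >= 2.0 else 0.0


def constant_half(x: float) -> float:
    """Candidate rule that ignores the input."""
    return 0.5


gap_step = invariant_gap(psi, step, xs, ys)
gap_constant = invariant_gap(psi, constant_half, xs, ys)
```

A rule with a zero gap keeps the invariant and stays in the admissible set; a rule with a large gap is discarded.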

00:22:41 So, actually, this is different level

00:22:45 of description of what you want.

00:22:48 You want not just, you show not one digit.

00:22:54 You show, this predicate, show general property

00:22:59 of all digits which you have in mind.

00:23:03 If you have in mind digit three,

00:23:06 it gives you property of digit three.

00:23:10 And you select as admissible set of function,

00:23:13 only function, which keeps this property.

00:23:16 You will not consider other functions.

00:23:20 So, you immediately looking for smaller subset of function.

00:23:24 That’s what you mean by admissible functions.

00:23:27 Admissible function, exactly.

00:23:28 Which is still a pretty large,

00:23:30 for the number three, is a large.

00:23:32 It is pretty large, but if you have one predicate.

00:23:36 But according to, there is a strong and weak convergence.

00:23:42 Strong convergence is convergence in function.

00:23:46 You’re looking for the function on one function,

00:23:49 and you’re looking for another function.

00:23:51 And square difference from them should be small.

00:23:59 If you take difference in any points,

00:24:01 make a square, make an integral, and it should be small.

00:24:05 That is convergence in function.

00:24:08 Suppose you have some function, any function.

00:24:11 So, I would say, I say that some function

00:24:15 converge to this function.

00:24:17 If integral from square difference between them is small.

00:24:22 That’s the definition of strong convergence.

00:24:24 That definition of strong convergence.

00:24:25 Two functions, the integral, the difference, is small.

00:24:28 Yeah, it is convergence in functions.

00:24:31 Yeah.
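
In symbols, the strong convergence described above can be written as follows. The notation is an assumption made for illustration: f_n is the sequence of functions and f_0 is the target function.

```latex
\lim_{n \to \infty} \int \bigl( f_n(x) - f_0(x) \bigr)^2 \, dx = 0
```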

00:24:32 But you have different convergence in functionals.

00:24:36 You take any function, you take some function, phi,

00:24:41 and take inner product, this function, this f function.

00:24:46 f0 function, which you want to find.

00:24:50 And that gives you some value.

00:24:52 So, you say that set of functions converge

00:24:59 in inner product to this function,

00:25:03 if this value of inner product converge to value f0.

00:25:10 That is for one phi.

00:25:12 But weak convergence requires that it converge for any

00:25:16 function of Hilbert space.

00:25:20 If it converge for any function of Hilbert space,

00:25:24 then you will say that this is weak convergence.
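
Written out, the weak convergence just described reads as follows. The symbols are assumptions made for illustration: f_n is the sequence, f_0 the target, and φ ranges over the whole Hilbert space (here taken to be L_2).

```latex
\lim_{n \to \infty} \langle \phi, f_n \rangle = \langle \phi, f_0 \rangle
\quad \text{for all } \phi \in L_2,
\qquad \text{where } \langle \phi, f \rangle = \int \phi(x)\, f(x)\, dx .
```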

00:25:28 You can think that when you take integral,

00:25:32 that is integral property of function.

00:25:35 For example, if you will take sine or cosine,

00:25:39 it is coefficient of, say, Fourier expansion.

00:25:45 So, if it converge for all coefficients of Fourier

00:25:51 expansion, so under some condition,

00:25:54 it converge to function you’re looking for.

00:25:58 But weak convergence means any property.

00:26:02 Convergence not point wise, but integral property

00:26:07 of function.

00:26:09 So, weak convergence means integral property of functions.
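
A standard textbook example, not taken from the conversation, shows how the two notions differ: on the interval [0, 2π], sin(nx) converges weakly to zero, yet its squared distance to zero stays equal to π. The Python sketch below checks this numerically for one test function φ(x) = x only; weak convergence itself requires the property for every φ, so this is an illustration under stated assumptions rather than a proof. The helper names are hypothetical.

```python
import math
from typing import Callable

A, B = 0.0, 2.0 * math.pi  # integration interval


def integrate(g: Callable[[float], float], a: float, b: float, steps: int = 20000) -> float:
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def phi(x: float) -> float:
    """One fixed test function (an arbitrary choice)."""
    return x


def inner_with_phi(n: int) -> float:
    """Inner product <phi, sin(n x)> on [A, B]; analytically -2*pi/n."""
    return integrate(lambda x: phi(x) * math.sin(n * x), A, B)


def squared_distance_to_zero(n: int) -> float:
    """Integral of sin(n x)^2 on [A, B]; analytically pi for integer n >= 1."""
    return integrate(lambda x: math.sin(n * x) ** 2, A, B)


for n in (1, 10, 100):
    print(n, inner_with_phi(n), squared_distance_to_zero(n))
```

The inner product shrinks toward zero as n grows while the squared distance does not, so the sequence converges in the weak sense but not in the strong sense.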

00:26:13 When I’m talking about predicate,

00:26:16 I would like to formulate which integral properties

00:26:23 I would like to have for convergence.

00:26:27 So, and if I will take one predicate, a function

00:26:33 which I measure property, if I will use one predicate

00:26:39 and say, I will consider only function which give me

00:26:44 the same value as this predicate,

00:26:47 I selecting set of functions from functions

00:26:53 which is admissible in the sense that function which I’m

00:26:58 looking for in this set of functions

00:27:01 because I checking in training data, it gives the same.

00:27:08 Yeah, so it always has to be connected to the training

00:27:10 data in terms of?

00:27:12 Yeah, but property, you can know independent on training data.

00:27:18 And this guy, Propp, says that there is formal property,

00:27:24 31 property.

00:27:25 A fairy tale, a Russian fairy tale.

00:27:27 But Russian fairy tale is not so interesting.

00:27:30 More interesting that people apply this to movies,

00:27:34 to theater, to different things.

00:27:38 And the same works, they’re universal.

00:27:41 Well, so I would argue that there’s

00:27:44 a little bit of a difference between the kinds of things

00:27:48 that were applied to which are essentially stories

00:27:51 and digit recognition.

00:27:54 It is the same story.

00:27:55 You’re saying digits, there’s a story within the digit.

00:27:59 Yeah.

00:28:00 And so but my point is why I hope

00:28:04 that it possible to beat record using not 60,000,

00:28:11 but say 100 times less.

00:28:13 Because instead, you will give predicates.

00:28:17 And you will select your decision

00:28:21 not from wide set of functions, but from set of functions

00:28:25 which keeps this predicates.

00:28:28 But predicate is not related just to digit recognition.

00:28:32 Right.

00:28:33 Like in Plato’s case.

00:28:37 Do you think it’s possible to automatically discover

00:28:40 the predicates?

00:28:42 So you basically said that the essence of intelligence

00:28:46 is the discovery of good predicates.

00:28:49 Yeah.

00:28:51 Now, the natural question is that’s

00:28:55 what Einstein was good at doing in physics.

00:28:59 Can we make machines do these kinds

00:29:02 of discovery of good predicates?

00:29:04 Or is this ultimately a human endeavor?

00:29:07 That I don’t know.

00:29:09 I don’t think that machine can do.

00:29:11 Because according to theory about weak convergence,

00:29:18 any function from Hilbert space can be predicate.

00:29:23 So you have infinite number of predicates a priori.

00:29:27 And before, you don’t know which predicate is good and which.

00:29:32 But what Propp showed, and why people call it breakthrough, is

00:29:39 that there is not too many predicate

00:29:44 which cover most of situation happened in the world.

00:29:48 Right.

00:29:51 So there’s a sea of predicates.

00:29:54 And only a small amount

00:29:57 are useful for the kinds of things

00:29:58 that happen in the world.

00:30:01 I think that I would say only small part of predicate

00:30:07 very useful.

00:30:08 Useful all of them.

00:30:11 Only very few are what we should let’s call them

00:30:14 good predicates.

00:30:15 Very good predicates.

00:30:16 Very good predicates.

00:30:18 So can we linger on it?

00:30:20 What’s your intuition?

00:30:21 Why is it hard for a machine to discover good predicates?

00:30:27 Even in my talk described how to do predicate.

00:30:30 How to find new predicate.

00:30:32 I’m not sure that it is very good.

00:30:34 What did you propose in your talk?

00:30:36 No.

00:30:37 In my talk, I gave example for diabetes.

00:30:42 Diabetes, yeah.

00:30:43 When we achieve some percent.

00:30:46 So then we’re looking for area where

00:30:50 some sort of predicate, which I formulate,

00:30:54 does not keeps invariant.

00:31:03 So if it doesn’t keep, I retrain my data.

00:31:06 I select only function which keeps this invariant.

00:31:11 And when I did it, I improved my performance.

00:31:14 I can looking for this predicate.

00:31:16 I know technically how to do that.

00:31:19 And you can, of course, do it using machine.

00:31:25 But I’m not sure that we will construct the smartest

00:31:29 predicate.

00:31:30 But this is the, allow me to linger on it.

00:31:34 Because that’s the essence.

00:31:35 That’s the challenge.

00:31:36 That is artificial.

00:31:37 That’s the human level intelligence

00:31:40 that we seek is the discovery of these good predicates.

00:31:43 You’ve talked about deep learning as a way to,

00:31:47 the predicates they use and the functions are mediocre.

00:31:52 You can find better ones.

00:31:55 Let’s talk about deep learning.

00:31:57 Sure, let’s do it.

00:31:58 I know only Yann LeCun’s convolutional network.

00:32:04 And what else?

00:32:05 I don’t know.

00:32:05 And it’s a very simple convolution.

00:32:07 There’s not much else to know.

00:32:09 To pixel left and right.

00:32:10 I can do it like that with one predicate.

00:32:14 Convolution is a single predicate.

00:32:16 It’s single.

00:32:17 It’s single predicate.

00:32:21 Yes, but that’s it.

00:32:22 You know exactly.

00:32:23 You take the derivative for translation and predicate.

00:32:28 This should be kept.

00:32:31 So that’s a single predicate.

00:32:32 But humans discovered that one.

00:32:34 Or at least.

00:32:35 Not it.

00:32:36 That is a risk.

00:32:37 Not too many predicates.

00:32:38 And that is big story because Yann did it 25 years ago

00:32:43 and nothing so clear was added to deep network.

00:32:50 And then I don’t understand why we

00:32:55 should talk about deep network instead of talking

00:32:58 about piecewise linear functions which keeps this predicate.

00:33:02 Well, a counter argument is that maybe the amount

00:33:08 of predicates necessary to solve general intelligence,

00:33:14 say in the space of images, doing

00:33:16 efficient recognition of handwritten digits

00:33:20 is very small.

00:33:22 And so we shouldn’t be so obsessed about finding.

00:33:26 We’ll find other good predicates like convolution, for example.

00:33:30 There has been other advancements

00:33:33 like if you look at the work with attention,

00:33:37 there’s attention mechanisms, especially used

00:33:40 in natural language focusing the network’s ability

00:33:44 to learn at which part of the input to look at.

00:33:47 The thing is, there’s other things besides predicates

00:33:51 that are important for the actual engineering mechanism

00:33:55 of showing how much you can really

00:33:57 do given these predicates.

00:34:02 I mean, that’s essentially the work of deep learning

00:34:04 is constructing architectures that are able to be,

00:34:09 given the training data, to be able to converge

00:34:13 towards a function that can generalize well.

00:34:22 It’s an engineering problem.

00:34:24 Yeah, I understand.

00:34:26 But let’s talk not on emotional level,

00:34:29 but on a mathematical level.

00:34:31 You have set of piecewise linear functions.

00:34:36 It is all possible neural networks.

00:34:42 It’s just piecewise linear functions.

00:34:44 It’s many, many pieces.

00:34:45 Large number of piecewise linear functions.

00:34:47 Exactly.
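
The statement that a neural network is a piecewise linear function can be checked on a tiny example. The sketch below assumes ReLU activations (the claim does not hold for smooth activations such as the sigmoid) and uses hand-picked, hypothetical weights.

```python
def relu(z: float) -> float:
    """Rectified linear unit."""
    return z if z > 0.0 else 0.0


def tiny_network(x: float) -> float:
    """One hidden layer with two ReLU units; kinks sit at x = 1 and x = -1."""
    return 2.0 * relu(x - 1.0) - relu(x + 1.0) + 0.5


def is_affine_on(a: float, b: float) -> bool:
    """True if the midpoint value equals the average of the endpoint values,
    which holds whenever the function is a straight line on [a, b]."""
    mid = 0.5 * (a + b)
    return abs(tiny_network(mid) - 0.5 * (tiny_network(a) + tiny_network(b))) < 1e-12


inside_one_piece = is_affine_on(2.0, 4.0)      # no kink between 2 and 4
across_the_kinks = is_affine_on(-2.0, 2.0)     # both kinks lie inside
```

Between kinks the network is exactly a straight line; the pieces only change where a hidden unit switches on or off. Larger networks have many more pieces but the same structure.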

00:34:48 Very large.

00:34:49 Very large.

00:34:50 Almost feels like too large.

00:34:51 It’s still simpler than, say, convolution,

00:34:56 than reproducing kernel Hilbert space, which

00:34:59 have a Hilbert set of functions.

00:35:00 What’s Hilbert space?

00:35:02 It’s space with infinite number of coordinates,

00:35:07 say, or Fourier expansion, something like that.

00:35:11 So it’s much richer.

00:35:14 And when I’m talking about closed form solution,

00:35:17 I’m talking about this set of function,

00:35:20 not piecewise linear set, which is particular case of it

00:35:29 is small part.

00:35:31 So neural networks is a small part

00:35:32 of the space of functions you’re talking about.

00:35:35 Say, small set of functions.

00:35:39 Let me take that.

00:35:40 But it is fine.

00:35:42 It is fine.

00:35:42 I don’t want to discuss the small or big.

00:35:46 You take advantage.

00:35:47 So you have some set of functions.

00:35:51 So now, when you’re trying to create architecture,

00:35:55 you would like to create admissible set of functions,

00:35:58 which all your tricks to use not all functions,

00:36:03 but some subset of this set of functions.

00:36:07 Say, when you’re introducing convolutional net,

00:36:10 it is way to make this subset useful for you.

00:36:16 But from my point of view, convolutional,

00:36:19 it is something you want to keep some invariants,

00:36:24 say, translation invariants.
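
The link between convolution and translation can be illustrated with a small Python sketch. It shows the standard property that shifting the input shifts the convolution output by the same amount, so a global maximum over the output does not change. The circular boundary handling, the one-dimensional signal, and all names are assumptions chosen to keep the example exact; this is not the derivative-based predicate from the lecture.

```python
from typing import List


def circular_convolve(signal: List[int], kernel: List[int]) -> List[int]:
    """Slide the kernel over the signal, wrapping around at the ends."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal: List[int], k: int) -> List[int]:
    """Circularly shift the signal k positions to the right."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


signal = [0, 0, 1, 3, 1, 0, 0, 0]
kernel = [1, 2, 1]

response = circular_convolve(signal, kernel)
shifted_response = circular_convolve(shift(signal, 2), kernel)

# Shifting the input shifts the output by the same amount ...
outputs_shift_together = shifted_response == shift(response, 2)
# ... so a global maximum over positions is unchanged by translation.
pooled_value_unchanged = max(shifted_response) == max(response)
```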

00:36:27 But now, if you understand this and you cannot explain

00:36:35 on the level of ideas what neural network does,

00:36:41 you should agree that it is much better

00:36:44 to have a set of functions.

00:36:46 And they say, this set of functions should be admissible.

00:36:51 It must keep this invariant, this invariant,

00:36:53 and that invariant.

00:36:55 You know that as soon as you incorporate

00:36:58 new invariant, set of functions becomes smaller and smaller

00:37:01 and smaller.
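
How each added invariant narrows the set of functions can be seen on a toy finite class. The sketch below is an assumption-laden illustration: the candidate set (threshold rules on a line), the two predicates, and the toy data are all hypothetical, and a real admissible set would be a subset of a much richer function space.

```python
from typing import Callable, List, Sequence

Rule = Callable[[float], int]


def keeps_invariant(
    psi: Callable[[float], float],
    f: Rule,
    xs: Sequence[float],
    ys: Sequence[int],
    tol: float = 1e-9,
) -> bool:
    """True if f reproduces the training-data average of the predicate psi."""
    n = len(xs)
    predicted = sum(psi(x) * f(x) for x in xs) / n
    observed = sum(psi(x) * y for x, y in zip(xs, ys)) / n
    return abs(predicted - observed) <= tol


def rising(t: float) -> Rule:
    """Rule that outputs 1 for x >= t."""
    return lambda x: 1 if x >= t else 0


def falling(t: float) -> Rule:
    """Rule that outputs 1 for x < t."""
    return lambda x: 1 if x < t else 0


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]

candidates: List[Rule] = [rising(t) for t in range(7)] + [falling(t) for t in range(7)]
predicates = [lambda x: 1.0, lambda x: x]  # class frequency, then first moment

sizes = [len(candidates)]
admissible = candidates
for psi in predicates:
    admissible = [f for f in admissible if keeps_invariant(psi, f, xs, ys)]
    sizes.append(len(admissible))
```

Here the first predicate leaves two of the fourteen candidate rules and the second leaves one, mirroring the point that a few well-chosen predicates can shrink the search dramatically.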

00:37:02 But all the invariants are specified by you, the human.

00:37:06 Yeah, but what I hope that there is a standard predicate,

00:37:12 like Propp showed; that’s what I want

00:37:17 to find for digit recognition.

00:37:19 If we start, it is completely new area,

00:37:22 what is intelligence about on the level,

00:37:25 starting from Plato’s idea, what is world of ideas.

00:37:32 And I believe that is not too many.

00:37:36 But it is amusing that mathematicians doing something,

00:37:40 a neural network in general function,

00:37:44 but people from literature, from art, they use this all

00:37:48 the time.

00:37:49 That’s right.

00:37:50 Invariants saying, it is great how people describe music.

00:37:57 We should learn from that.

00:37:58 And something on this level.

00:38:02 But so why Vladimir Propp, who was just theoretical,

00:38:09 who studied theoretical literature, he found that.

00:38:12 You know what?

00:38:13 Let me throw that right back at you,

00:38:15 because there’s a little bit of a,

00:38:17 that’s less mathematical and more emotional, philosophical,

00:38:21 Vladimir Propp.

00:38:22 I mean, he wasn’t doing math.

00:38:24 No.

00:38:26 And you just said another emotional statement,

00:38:30 which is you believe that this Plato world of ideas is small.

00:38:35 I hope.

00:38:36 I hope.

00:38:38 Do you, what’s your intuition, though?

00:38:42 If we can linger on it.

00:38:44 You know, it is not just small or big.

00:38:48 I know exactly.

00:38:50 Then when I introducing some predicate,

00:38:56 I decrease set of functions.

00:38:59 But my goal to decrease set of function much.

00:39:04 By as much as possible.

00:39:05 By as much as possible.

00:39:07 Good predicate, which does this, then

00:39:11 I should choose next predicate, which decrease set

00:39:15 as much as possible.

00:39:17 So set of good predicate, it is such

00:39:21 that they decrease this amount of admissible function.

00:39:27 So if each good predicate significantly

00:39:30 reduces the set of admissible functions,

00:39:32 then there naturally should not be that many good predicates.

00:39:35 No, but if you reduce very well the VC dimension

00:39:43 of the function, of admissible set of function, it’s small.

00:39:46 And you need not too much training data to do well.

00:39:52 And VC dimension, by the way, is some measure of capacity

00:39:56 of this set of functions.

00:39:57 Right.

00:39:59 Roughly speaking, how many function in this set.

00:40:01 So you’re decreasing, decreasing.

00:40:03 And it makes easy for you to find function

00:40:08 you’re looking for.
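
A minimal sketch of the "measure of capacity" idea, under assumptions that are not part of the conversation: a five-point domain and two toy function classes (thresholds and intervals). VC dimension is not literally a count of functions; it is the size of the largest set of points on which the class can produce every possible 0/1 labeling (the set is then "shattered"). The helper names below are hypothetical.

```python
from itertools import combinations

def labelings(points, hypotheses):
    """All distinct 0/1 labelings of `points` that the class can produce."""
    return {tuple(h(x) for x in points) for h in hypotheses}

def is_shattered(points, hypotheses):
    """True if every one of the 2^k labelings of the k points is realized."""
    return len(labelings(points, hypotheses)) == 2 ** len(points)

def vc_dimension(domain, hypotheses):
    """Brute-force VC dimension of a finite class restricted to a finite domain."""
    d = 0
    for k in range(1, len(domain) + 1):
        if any(is_shattered(s, hypotheses) for s in combinations(domain, k)):
            d = k
        else:
            break  # if no k-point set is shattered, no larger set can be
    return d

domain = [0, 1, 2, 3, 4]

# Toy class 1 (assumption): thresholds, h_t(x) = 1 if x >= t.
thresholds = [lambda x, t=t: int(x >= t) for t in range(0, 6)]

# Toy class 2 (assumption): intervals, h_{a,b}(x) = 1 if a <= x <= b.
intervals = [
    lambda x, a=a, b=b: int(a <= x <= b)
    for a in range(0, 6)
    for b in range(-1, 5)
]

print(vc_dimension(domain, thresholds))  # thresholds cannot label (1, 0) on a pair
print(vc_dimension(domain, intervals))   # intervals cannot label (1, 0, 1) on a triple
```

The smaller class (thresholds) has the lower capacity, which is the sense in which shrinking the admissible set reduces how much training data is needed.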

00:40:10 But the most important part, to create good admissible set

00:40:14 of functions.

00:40:15 And it probably, there are many ways.

00:40:18 But the good predicates such that they can do that.

00:40:25 So for this duck, you should know a little bit about duck.

00:40:30 Because what are the three fundamental laws of ducks?

00:40:35 Looks like a duck, swims like a duck, and quacks like a duck.

00:40:38 You should know something about ducks to be able to.

00:40:41 Not necessarily.

00:40:42 Looks like, say, horse.

00:40:44 It’s also good.

00:40:46 So it’s not, it generalizes from ducks.

00:40:49 And talk like, and make sound like horse or something.

00:40:54 And run like horse, and moves like horse.

00:40:57 It is general, it is general predicate

00:41:02 that this applied to duck.

00:41:04 But for duck, you can say, play chess like duck.

00:41:09 You cannot say play chess like duck.

00:41:11 Why not?

00:41:12 So you’re saying you can, but that would not be a good.

00:41:15 No, you will not reduce a lot of functions.

00:41:18 You would not do, yeah, you would not

00:41:19 reduce the set of functions.

00:41:21 So you can, the story is formal story, mathematical story.

00:41:26 Is that you can use any function you want as a predicate.

00:41:31 But some of them are good, some of them are not,

00:41:33 because some of them reduce the admissible set

00:41:36 of functions a lot, and some of them do not.
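
The point that any function can serve as a predicate, but only some shrink the admissible set, can be sketched on a toy problem. Everything here is an assumption made for illustration: the six-point domain, the four training examples, the three predicates, and the invariant form itself, where a function f is kept only if the sum of psi(x_i) * f(x_i) over the training points equals the sum of psi(x_i) * y_i.

```python
from itertools import product

X = list(range(6))                         # toy domain (assumption)
train = [(0, 1), (1, 1), (4, 0), (5, 0)]   # toy training pairs (x, y)

def satisfies(f, psi):
    """Check the invariant sum psi(x_i) f(x_i) == sum psi(x_i) y_i on the sample."""
    lhs = sum(psi(x) * f[x] for x, _ in train)
    rhs = sum(psi(x) * y for x, y in train)
    return lhs == rhs

# Hypothetical predicates; the third turns out to add nothing new here.
predicates = [
    lambda x: 1,        # keeps the number of positives on the sample
    lambda x: x,        # keeps the position-weighted sum of positives
    lambda x: x % 2,    # keeps the number of positives at odd positions
]

# Start from every 0/1 function on the domain: 2^6 = 64 of them.
admissible = list(product([0, 1], repeat=len(X)))
sizes = [len(admissible)]
for psi in predicates:
    admissible = [f for f in admissible if satisfies(f, psi)]
    sizes.append(len(admissible))

print(sizes)  # [64, 24, 4, 4]
```

The first two predicates cut the set from 64 functions to 4. The third is implied by the first two on this sample, so it removes nothing, which plays the role of the "play chess like a duck" predicate in the exchange above.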

00:41:39 But the question is, and I’ll probably

00:41:41 keep asking this question, but how do we find such,

00:41:45 what’s your intuition?

00:41:47 Handwritten recognition.

00:41:49 How do we find the answer to your challenge?

00:41:52 Yeah, I understand it like that.

00:41:55 I understand what.

00:41:57 What defined?

00:41:59 What it means, a new predicate.

00:42:01 Yeah.

00:42:02 Like guy who understand music can say this word,

00:42:06 which he described when he listened to music.

00:42:09 He understand music.

00:42:11 He use not too many different, oh, you can do like Propp.

00:42:15 You can make collection.

00:42:17 What he talking about music, about this, about that.

00:42:20 It’s not too many different situation he described.

00:42:24 Because we mentioned Vladimir Propp a bunch.

00:42:26 Let me just mention, there’s a sequence of 31

00:42:33 structural notions that are common in stories.

00:42:36 And I think.

00:42:37 You call it units.

00:42:38 Units.

00:42:39 And I think they resonate.

00:42:40 I mean, it starts just to give an example,

00:42:43 absentation, a member of the hero’s community

00:42:46 or family leaves the security of the home environment.

00:42:48 Then it goes to the interdiction,

00:42:51 a forbidding edict or command is passed upon the hero.

00:42:54 Don’t go there.

00:42:55 Don’t do this.

00:42:56 The hero is warned against some action.

00:42:58 Then step three, violation of interdiction.

00:43:05 Break the rules, break out on your own.

00:43:07 Then reconnaissance.

00:43:09 The villain makes an effort to attain knowledge,

00:43:11 needed to fulfill their plan, and so on.

00:43:13 It goes on like this, ends in a wedding, number 31.

00:43:19 Happily ever after.

00:43:20 No, he just gave description of all situations.

00:43:26 He understands this world.

00:43:28 Of folktales.

00:43:29 Yeah, not folktales, but stories.

00:43:33 And these stories not in just folktales.

00:43:36 These stories in detective serials as well.

00:43:40 And probably in our lives.

00:43:42 We probably live.

00:43:43 Read this.

00:43:45 And then they wrote that this predicate is good

00:43:52 for different situation.

00:43:54 From movie, for theater.

00:43:57 By the way, there’s also criticism, right?

00:44:00 There’s another way to interpret narratives

00:44:03 from Claude Lévi-Strauss.

00:44:09 I don’t know.

00:44:10 I am not in this business.

00:44:12 No, I know, it’s theoretical literature,

00:44:14 but it’s looking at paradigms behind things.

00:44:15 It’s always the discussion, yeah.

00:44:20 But at least there is units.

00:44:23 It’s not too many units that can describe.

00:44:27 But this guy probably gives another units.

00:44:30 Or another way of…

00:44:31 Exactly, another set of units.

00:44:34 Another set of predicates.

00:44:35 It doesn’t matter how.

00:44:37 But they exist.

00:44:40 Probably.

00:44:40 My question is, whether given those units,

00:44:46 whether without our human brains to interpret these units,

00:44:50 they would still hold as much power as they have.

00:44:53 Meaning, are those units enough

00:44:56 when we give them to an alien species?

00:44:58 Let me ask you.

00:45:00 Do you understand digit images?

00:45:06 No, I don’t understand.

00:45:07 No, no, no.

00:45:08 When you can recognize these digit images,

00:45:11 it means that you understand.

00:45:13 Yes, exactly.

00:45:14 You understand characters, you understand…

00:45:17 No, no, no, no.

00:45:22 It’s the imitation versus understanding question,

00:45:25 because I don’t understand the mechanism

00:45:28 by which I understand.

00:45:29 No, no, no.

00:45:30 I’m not talking about, I’m talking about predicates.

00:45:32 You understand that it involves symmetry,

00:45:35 maybe structure, maybe something else.

00:45:37 I cannot formulate.

00:45:38 I just was able to find symmetries, degree of symmetries.

00:45:43 That’s really good.

00:45:44 So this is a good line.

00:45:47 I feel like I understand the basic elements

00:45:50 of what makes a good handwritten recognition system on my own.

00:45:54 Like symmetry connects with me.

00:45:56 It seems like that’s a very powerful predicate.

00:45:59 My question is, is there a lot more going on

00:46:02 that we’re not able to introspect?

00:46:04 Maybe I need to be able to understand

00:46:09 a huge amount in the world of ideas,

00:46:14 thousands of predicates, millions of predicates

00:46:18 in order to do handwritten recognition.

00:46:20 I don’t think so.

00:46:23 So both your hope and your intuition

00:46:26 are such that very few predicates are enough.

00:46:28 You’re using digits, you’re using examples as well.

00:46:33 Theory says that if you will use all possible functions

00:46:43 from Hilbert space, all possible predicate,

00:46:46 you don’t need training data.

00:46:49 You just will have admissible set of function

00:46:53 which contain one function.

00:46:56 Yes.

00:46:57 So the trade off is when you’re not using all predicates,

00:47:01 you’re only using a few good predicates

00:47:03 you need to have some training data.

00:47:05 Yes, exactly.

00:47:06 The more good predicates you have,

00:47:08 the less training data you need.

00:47:09 Exactly.

00:47:10 That is intelligence.

00:47:13 Still, okay, I’m gonna keep asking the same dumb question,

00:47:17 handwritten recognition to solve the challenge.

00:47:20 You kind of propose a challenge that says

00:47:21 we should be able to get state of the art MNIST error rates

00:47:27 by using very few, 60, maybe fewer examples per digit.

00:47:31 What kind of predicates do you think it will look like?

00:47:35 That is the challenge.

00:47:37 So people who will solve this problem,

00:47:39 they will answer.

00:47:41 Do you think they’ll be able to answer it

00:47:44 in a human explainable way?

00:47:47 They just need to write function, that’s it.

00:47:50 But so can that function be written, I guess,

00:47:54 by an automated reasoning system?

00:47:58 Whether we’re talking about a neural network

00:48:01 learning a particular function or another mechanism?

00:48:05 No, I’m not against neural network.

00:48:08 I’m against admissible set of function

00:48:11 which create neural network.

00:48:13 You did it by hand.

00:48:16 You don’t do it by invariants, by predicate, by reason.

00:48:24 But neural networks can then reverse,

00:48:26 do the reverse step of helping you find a function

00:48:29 that just, the task of a neural network

00:48:33 is to find a disentangled representation, for example,

00:48:38 that they call, is to find that one predicate function

00:48:42 that really captures some kind of essence.

00:48:45 One, not the entire essence, but one very useful essence

00:48:48 of this particular visual space.

00:48:52 Do you think that’s possible?

00:48:53 Listen, I’m grasping, hoping there’s an automated way

00:48:58 to find good predicates, right?

00:49:00 So the question is what are the mechanisms

00:49:03 of finding good predicates, ideas

00:49:05 that you think we should pursue?

00:49:08 A young grad student listening right now.

00:49:11 I gave example.

00:49:13 So find situation where predicate which you’re suggesting

00:49:23 don’t create invariant.

00:49:24 It’s like in physics.

00:49:28 Find situation where existing theory cannot explain it.

00:49:37 Find situation where the existing theory

00:49:39 can’t explain it.

00:49:40 So you’re finding contradictions.

00:49:42 Find contradiction, and then remove this contradiction.

00:49:46 But in my case, what means contradiction,

00:49:48 you find function which, if you will use this function,

00:49:53 you’re not keeping invariants.

00:49:56 This is really the process of discovering contradictions.

00:50:01 Yeah.

00:50:04 It is like in physics.

00:50:05 Find situation where you have contradiction

00:50:09 for one of the property, for one of the predicate.

00:50:15 Then include this predicate, making invariants,

00:50:19 and solve again this problem.

00:50:20 Now you don’t have contradiction.

00:50:22 But it is not the best way, probably, I don’t know,

00:50:30 to looking for predicate.

00:50:31 That’s just one way, okay.

00:50:33 That, no, no, it is brute force way.

00:50:35 The brute force way.

00:50:37 What about the ideas of what,

00:50:42 big umbrella term of symbolic AI?

00:50:45 There’s what in the 80s with expert systems,

00:50:48 sort of logic reasoning based systems.

00:50:52 Is there hope there to find some,

00:50:57 through sort of deductive reasoning,

00:51:00 to find good predicates?

00:51:05 I don’t think so.

00:51:08 I think that just logic is not enough.

00:51:12 It’s kind of a compelling notion, though.

00:51:14 You know, that when smart people sit in a room

00:51:17 and reason through things, it seems compelling.

00:51:20 And making our machines do the same is also compelling.

00:51:24 So, everything is very simple.

00:51:29 When you have infinite number of predicate,

00:51:34 you can choose the function you want.

00:51:38 You have invariants and you can choose the function you want.

00:51:41 But you have to have not too many invariants

00:51:51 to solve the problem.

00:51:56 So, and have from infinite number of function

00:51:59 to select finite number

00:52:04 and hopefully small number of functions,

00:52:08 which is good enough to extract small set

00:52:14 of admissible functions.

00:52:17 So, they will be admissible, it’s for sure,

00:52:19 because every function just decrease set of function

00:52:23 and leaving it admissible.

00:52:25 But it will be small.

00:52:27 But why do you think logic based systems don’t,

00:52:32 can’t help? Is it intuition, or not?

00:52:35 Because you should know reality.

00:52:37 You should know life.

00:52:39 This guy like Propp, he knows something.

00:52:44 And he tried to put in invariant his understanding.

00:52:49 That’s the human, yeah, but see,

00:52:51 you’re putting too much value into Vladimir Propp

00:52:56 knowing something.

00:52:57 No, it is, in the story, what means you know life?

00:53:04 What it means?

00:53:05 You know common sense.

00:53:07 No, no, you know something.

00:53:10 Common sense, it is some rules.

00:53:13 You think so?

00:53:14 Common sense is simply rules?

00:53:17 Common sense is every, it’s mortality,

00:53:21 it’s fear of death, it’s love, it’s spirituality,

00:53:27 it’s happiness and sadness.

00:53:30 All of it is tied up into understanding gravity,

00:53:34 which is what we think of as common sense.

00:53:36 I don’t really need to discuss so wide.

00:53:39 I want to discuss, understand digit recognition.

00:53:45 Anytime I bring up love and death,

00:53:47 you bring it back to digit recognition, I like it.

00:53:51 No, you know, it is doable because there is a challenge.

00:53:55 Yeah.

00:53:56 Which I see how to solve it.

00:53:59 If I will have a student concentrate on this work,

00:54:02 I will suggest something to solve.

00:54:04 You mean handwritten recognition?

00:54:07 Yeah, it’s a beautifully simple, elegant, and yet.

00:54:10 I think that I know invariants which will solve this.

00:54:13 You do?

00:54:14 I think so, yes.

00:54:15 But it is not universal, it is maybe,

00:54:21 I want some universal invariants

00:54:24 which are good not only for digit recognition,

00:54:27 for image understanding.

00:54:28 So let me ask, how hard do you think

00:54:34 is 2D image understanding?

00:54:38 So if we, we can kind of intuit handwritten recognition.

00:54:43 How big of a step, leap, journey is it from that?

00:54:49 If I gave you good, if I solved your challenge

00:54:51 for handwritten recognition,

00:54:53 how long would my journey then be from that

00:54:56 to understanding more general, natural images?

00:54:59 Immediately, you will understand this

00:55:01 as soon as you will make a record.

00:55:05 Because it is not for free.

00:55:07 As soon as you will create several invariants

00:55:13 which will help you to get the same performance

00:55:20 that the best neural net did using 100,

00:55:23 maybe more than 100, times fewer examples,

00:55:27 you have to have something smart to do that.

00:55:31 And you’re saying?

00:55:32 That is invariant, it is predicate.

00:55:35 Because you should put some idea how to do that.

00:55:39 But okay, let me just pause.

00:55:42 Maybe it’s a trivial point, maybe not.

00:55:44 But handwritten recognition feels like a 2D,

00:55:48 two dimensional problem.

00:55:50 And I wonder how much it is complicated by the fact

00:55:55 that most images are projection of a three dimensional world

00:56:00 onto a 2D plane.

00:56:03 It feels like for a three dimensional world,

00:56:05 we need to start understanding common sense

00:56:08 in order to understand an image.

00:56:11 It’s no longer visual shape and symmetry.

00:56:17 It’s having to start to understand concepts

00:56:19 of, understand life.

00:56:22 Yeah, you’re talking that there are different invariant,

00:56:27 different predicate, yeah.

00:56:28 And potentially much larger number.

00:56:32 You know, maybe, but let’s start from simple.

00:56:36 Yeah, but you said that it would be immediate.

00:56:38 No, you know, I cannot think about things

00:56:41 which I don’t understand.

00:56:43 This I understand, but I’m sure that I don’t understand

00:56:46 everything there.

00:56:48 Yeah, that’s the difference.

00:56:50 Do as simple as possible, but not simpler.

00:56:54 And that is exact case.

00:56:56 With handwritten.

00:56:57 With handwritten.

00:56:58 Yeah, but that’s the difference between you and I.

00:57:04 I welcome and enjoy thinking about things

00:57:07 I completely don’t understand.

00:57:09 Because to me, it’s a natural extension

00:57:12 without having solved handwritten recognition

00:57:15 to wonder how difficult is the next step

00:57:23 of understanding 2D, 3D images.

00:57:25 Because ultimately, while the science of intelligence

00:57:29 is fascinating, it’s also fascinating to see

00:57:31 how that maps to the engineering of intelligence.

00:57:34 And recognizing handwritten digits is not,

00:57:39 doesn’t help you, it might, it may not help you

00:57:43 with the problem of general intelligence.

00:57:46 We don’t know.

00:57:47 It’ll help you a little bit.

00:57:48 We don’t know how much.

00:57:49 It’s unclear.

00:57:50 Yeah.

00:57:51 It might very much.

00:57:52 But I would like to make a remark.

00:57:53 Yes.

00:57:54 I start not from very primitive problem,

00:57:58 make a challenge problem.

00:58:03 I start with very general problem, with Plato.

00:58:07 So you understand, and it comes from Plato

00:58:10 to digit recognition.

00:58:14 So you basically took Plato and the world

00:58:18 of forms and ideas and mapped and projected

00:58:22 into the clearest, simplest formulation

00:58:25 of that big world.

00:58:26 You know, I would say that I did not understand Plato

00:58:31 until recently, and until I consider

00:58:36 the convergence and then predicate,

00:58:40 and then, oh, this is what Plato told.

00:58:45 So.

00:58:46 Can you linger on that?

00:58:47 Like why, how do you think about this world of ideas

00:58:50 and world of things in Plato?

00:58:52 No, it is metaphor.

00:58:54 It is.

00:58:55 It’s a metaphor, for sure.

00:58:55 Yeah.

00:58:56 It’s a compelling, it’s a poetic

00:58:57 and a beautiful metaphor.

00:58:58 Yeah, yeah, yeah.

00:58:59 But what, can you?

00:59:00 But it is a way how you should try to understand

00:59:04 how to talk ideas in the world.

00:59:07 So from my point of view,

00:59:11 it is very clear, but it is a line.

00:59:14 All the time, people looking for that.

00:59:17 Say, Plato, then Hegel: whatever is reasonable, it exists,

00:59:24 whatever exists, it is reasonable.

00:59:26 I don’t know what he have in mind reasonable.

00:59:30 Right, these philosophers again,

00:59:31 their words. No, no, no, no, no, no, no.

00:59:33 It is next step, Wigner.

00:59:37 That mathematics understand something of reality.

00:59:40 It is the same Plato line.

00:59:43 And then it comes suddenly to Vladimir Propp.

00:59:48 Look, 31 ideas, 31 units, and this corrects everything.

00:59:54 There’s abstractions, ideas that represent our world.

00:59:59 Our world, and we should always try to reach into that.

01:00:03 Yeah, but you should make a projection on reality.

01:00:07 But understanding is, it is abstract ideas.

01:00:11 You have in your mind several abstract ideas

01:00:15 which you can apply to reality.

01:00:17 And reality in this case,

01:00:19 so if you look at machine learning as data.

01:00:21 This example, data.

01:00:22 Data.

01:00:24 Okay, let me put this on you

01:00:26 because I’m an emotional creature.

01:00:28 I’m not a mathematical creature like you.

01:00:30 I find compelling the idea,

01:00:33 forget the space, the sea of functions.

01:00:36 There’s also a sea of data in the world.

01:00:39 And I find compelling that there might be,

01:00:42 like you said, teacher,

01:00:44 small examples of data that are most useful

01:00:49 for discovering good,

01:00:53 whether it’s predicates or good functions,

01:00:55 that the selection of data may be a powerful journey,

01:01:00 a useful, you know, coming up with a mechanism

01:01:03 for selecting good data might be useful too.

01:01:07 Do you find this idea of finding the right data set

01:01:12 interesting at all?

01:01:14 Or do you kind of take the data set as a given?

01:01:17 I think that it is, you know, my theme is very simple.

01:01:22 You have huge set of functions.

01:01:25 If you will apply, and you have not too many data,

01:01:31 if you pick up function which describes this data,

01:01:37 you will do not very well.

01:01:41 You will.

01:01:42 Like randomly pick up.

01:01:42 Yeah, you will overfit.

01:01:43 Yeah, it will be overfitting.

01:01:46 So you should decrease set of function

01:01:50 from which you’re picking up one.

01:01:53 So you should go somehow to admissible set of function.

01:01:59 And this is what weak convergence is about.

01:02:03 So, but from another point of view,

01:02:08 to make admissible set of function,

01:02:13 you need just a DG, just function

01:02:15 which you will take in inner product,

01:02:19 which you will measure property of your function.

01:02:27 And that is how it works.

01:02:31 No, I get it, I get it, I understand it,

01:02:32 but do you, the reality is.

01:02:34 But let’s think about examples.

01:02:40 You have huge set of function,

01:02:41 and you have several examples.

01:02:44 If you just trying to keep, take function

01:02:50 which satisfies these examples, you still will overfit.

01:02:56 You need decrease, you need admissible set of function.

01:02:59 Absolutely, but what, say you have more data than functions.

01:03:06 So sort of consider the, I mean,

01:03:08 maybe not more data than functions,

01:03:09 because that’s impossible.

01:03:12 But what, I was trying to be poetic for a second.

01:03:15 I mean, you have a huge amount of data,

01:03:17 a huge amount of examples.

01:03:19 But amount of function can be even bigger.

01:03:22 It can get bigger, I understand.

01:03:24 Everything is.

01:03:25 There’s always a bigger boat.

01:03:27 Full Hilbert space.

01:03:29 I got you, but okay.

01:03:31 But you don’t find the world of data

01:03:35 to be an interesting optimization space.

01:03:38 Like the optimization should be in the space of functions.

01:03:45 Creating admissible set of functions.

01:03:47 Admissible set of functions.

01:03:48 No, you know, even from the classical VC theory,

01:03:54 from structural risk minimization,

01:03:56 you should organize function in the way

01:04:02 that they will be useful for you.

01:04:06 Right.

01:04:07 And that is admissible set.

01:04:10 The way you’re thinking about useful

01:04:13 is you’re given a small set of examples.

01:04:17 Useful small, small set of function

01:04:19 which contain function I’m looking for.

01:04:21 Yeah, but looking for based on

01:04:25 the empirical set of small examples.

01:04:27 Yeah, but that is another story.

01:04:29 I don’t touch it.

01:04:31 Because I believe that this small examples

01:04:35 is not too small.

01:04:37 Say 60 per class.

01:04:39 Law of large numbers works.

01:04:41 I don’t need uniform law.

01:04:43 The story is that in statistics there are two law.

01:04:46 Law of large numbers and uniform law of large numbers.

01:04:51 So I want to be in situation where I use

01:04:54 law of large numbers but not uniform law of large numbers.
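
A minimal sketch of the difference between the two laws, under a setup invented for illustration (labels are pure coin flips, so every classifier has true error 1/2). For a function fixed before looking at the labels, the empirical error is close to the true error, which is the ordinary law of large numbers. For a function chosen after looking at the labels from an unrestricted set, the empirical error can be far from the true error, which is where uniform convergence over the whole set would be needed.

```python
import random

rng = random.Random(0)                     # fixed seed for repeatability
n = 10_000
xs = list(range(n))
ys = [rng.randint(0, 1) for _ in xs]       # labels are independent coin flips

# On fresh coin-flip labels any classifier errs with probability 1/2.
true_risk = 0.5

def empirical_risk(f):
    """Fraction of training examples that f labels incorrectly."""
    return sum(f(x) != y for x, y in zip(xs, ys)) / n

# A function fixed in advance, without looking at the labels.
def fixed(x):
    return x % 2

# A function picked after seeing the labels, from the set of ALL functions:
# a lookup table that memorizes the sample.
table = dict(zip(xs, ys))
def memorizer(x):
    return table.get(x, 0)

gap_fixed = abs(empirical_risk(fixed) - true_risk)
gap_memorizer = abs(empirical_risk(memorizer) - true_risk)

print(f"fixed function gap:     {gap_fixed:.3f}")
print(f"memorizing function gap: {gap_memorizer:.3f}")
```

The memorizing function has zero training error and a gap of exactly 0.5, no matter how large n is. Restricting the choice to a small admissible set is one way to avoid depending on the uniform law.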

01:04:58 Right, so 60 is law of large, it’s large enough.

01:05:01 I hope, no, it still need some evaluations,

01:05:05 some bounds.

01:05:07 But the idea is the following that

01:05:11 if you trust that

01:05:15 say this average gives you something close to expectations

01:05:21 so you can talk about that, about this predicate.

01:05:26 And that is basis of human intelligence.

01:05:30 Good predicates is the,

01:05:32 the discovery of good predicates is the basis of human intelligence.

01:05:34 It is discovery of your understanding of the world.

01:05:39 Of your methodology of understanding world.

01:05:45 Because you have several function

01:05:47 which you will apply to reality.

01:05:51 Can you say that again?

01:05:52 So you’re…

01:05:54 You have several functions predicate.

01:05:58 But they’re abstract.

01:06:00 Yes.

01:06:01 Then you will apply them to reality, to your data.

01:06:04 And you will create in this way predicate.

01:06:07 Which is useful for your task.

01:06:11 But predicate are not related specifically to your task.

01:06:16 To this your task.

01:06:17 It is abstract functions.

01:06:20 Which being applied to…

01:06:23 Many tasks that you might be interested in.

01:06:25 It might be many tasks, I don’t know.

01:06:27 Or…

01:06:28 Different tasks.

01:06:29 Well they should be many tasks, right?

01:06:31 I believe like, like in Propp’s case.

01:06:35 It was for fairytales, but it’s happened everywhere.

01:06:40 Okay, so we talked about images a little bit.

01:06:42 But, can we talk about Noam Chomsky for a second?

01:06:49 No, I believe I…

01:06:52 I don’t know him very well.

01:06:54 Personally, well…

01:06:55 Not personally, I don’t know.

01:06:57 His ideas.

01:06:58 Well let me just say,

01:06:59 do you think language, human language,

01:07:02 is essential to expressing ideas?

01:07:05 As Noam Chomsky believes.

01:07:08 So like, language is at the core

01:07:10 of our formation of predicates.

01:07:13 The human language.

01:07:14 For me, language and all the story of language

01:07:18 is very complicated.

01:07:20 I don’t understand this.

01:07:22 And I am not…

01:07:24 I thought about…

01:07:25 Nobody does.

01:07:26 I am not ready to work on that.

01:07:28 Because it’s so huge.

01:07:30 It is not for me, and I believe not for our century.

01:07:35 The 21st century.

01:07:37 Not for 21st century.

01:07:39 You should learn something, a lot of stuff,

01:07:42 from simple task like digit recognition.

01:07:45 So you think, okay, you think digit recognition,

01:07:49 2D image, how would you more abstractly define

01:07:55 digit recognition?

01:07:56 It’s 2D image, symbol recognition, essentially.

01:08:03 I mean, I’m trying to get a sense,

01:08:08 sort of thinking about it now,

01:08:09 having worked with MNIST forever,

01:08:12 how small of a subset is this

01:08:16 of the general vision recognition problem

01:08:18 and the general intelligence problem?

01:08:21 Is it…

01:08:24 Yeah.

01:08:25 Is it a giant subset?

01:08:26 Is it not?

01:08:27 And how far away is language?

01:08:30 You know, let me refer to Einstein.

01:08:34 Take the simplest problem, as simple as possible,

01:08:38 but not simpler.

01:08:39 And this is challenge, this simple problem.

01:08:44 But it’s simple by idea, but not simple to get it.

01:08:50 When you will do this, you will find some predicate,

01:08:55 which helps it a bit.

01:08:57 Well, yeah, I mean, with Einstein, you can,

01:09:01 you look at general relativity,

01:09:04 but that doesn’t help you with quantum mechanics.

01:09:07 That’s another story.

01:09:08 You don’t have any universal instrument.

01:09:11 Yes, so I’m trying to wonder which space we’re in,

01:09:16 whether handwritten recognition is like general relativity,

01:09:21 and then language is like quantum mechanics.

01:09:23 So you’re still gonna have to do a lot of math

01:09:27 to universalize it.

01:09:28 But I’m trying to see,

01:09:35 so what’s your intuition why handwritten recognition

01:09:39 is easier than language?

01:09:42 Just, I think a lot of people would agree with that,

01:09:45 but if you could elucidate sort of the intuition of why.

01:09:50 I don’t know, no, I don’t think in this direction.

01:09:56 I just think in directions that this is problem,

01:10:00 which if we will solve it well,

01:10:07 we will create some abstract understanding of images.

01:10:18 Maybe not all images.

01:10:19 I would like to talk to guys who doing in real images

01:10:24 in Columbia University.

01:10:26 What kind of images, unreal?

01:10:28 Real images.

01:10:29 Real images.

01:10:30 Yeah, what they’re ready, is there a predicate,

01:10:33 what can be predicate?

01:10:35 I still symmetry will play role in real life images,

01:10:40 in any real life images, 2D images.

01:10:43 Let’s talk about 2D images.

01:10:46 Because that’s what we know.

01:10:52 A neural network was created for 2D images.

01:10:55 So the people I know in vision science, for example,

01:10:58 the people who study human vision,

01:11:01 they usually go to the world of symbols

01:11:04 and like handwritten recognition,

01:11:06 but not really, it’s other kinds of symbols

01:11:08 to study our visual perception system.

01:11:11 As far as I know, not much predicate type of thinking

01:11:15 is understood about our vision system.

01:11:17 They did not think in this direction.

01:11:19 They don’t, yeah, but how do you even begin

01:11:21 to think in that direction?

01:11:23 That’s a, I would like to discuss with them.

01:11:26 Yeah.

01:11:27 Because if we will be able to show that it is what working,

01:11:35 and theoretical scheme, it’s not so bad.

01:11:40 So the unfortunate, so if we compare to language,

01:11:43 language is like letters, finite set of letters,

01:11:46 and a finite set of ways you can put together those letters.

01:11:50 So it feels more amenable to kind of analysis.

01:11:53 With natural images, there is so many pixels.

01:11:58 No, no, no, letter, language is much, much more complicated.

01:12:03 It’s involved a lot of different stuff.

01:12:08 It’s not just understanding of very simple class of tasks.

01:12:15 I would like to see list of task with language involved.

01:12:19 Yes, so there’s a lot of nice benchmarks now

01:12:23 in natural language processing from the very trivial,

01:12:27 like understanding the elements of a sentence,

01:12:30 to question answering, to much more complicated

01:12:33 where you talk about open domain dialogue.

01:12:36 The natural question is whether handwritten recognition

01:12:39 is really the first step of understanding

01:12:42 visual information.

01:12:44 Right.

01:12:46 But even our records show that we go in the wrong direction

01:12:54 because we need 60,000 digits.

01:12:56 So even this first step, so forget about talking

01:12:59 about the full journey, this first step

01:13:01 should be taken in the right direction.

01:13:03 No, no, wrong direction because 60,000 is unacceptable.

01:13:07 No, I’m saying it should be taken in the right direction

01:13:11 because 60,000 is not acceptable.

01:13:13 If you can talk, it’s great, we have half percent of error.

01:13:18 And hopefully the step from doing handwritten recognition

01:13:22 using very few examples is a step towards what babies do

01:13:26 when they crawl and understand their physical environment.

01:13:30 I know you don’t know about babies.

01:13:31 If you will do from very small examples,

01:13:36 you will find principles which are different

01:13:40 from what we’re using now.

01:13:44 And so it’s more or less clear.

01:13:48 That means that you will use weak convergence,

01:13:52 not just strong convergence.

01:13:54 Do you think these principles

01:13:58 will naturally be human interpretable?

01:14:01 Oh, yeah.

01:14:02 So like when we’ll be able to explain them

01:14:04 and have a nice presentation to show

01:14:06 what those principles are, or are they very,

01:14:10 going to be very kind of abstract kinds of functions?

01:14:14 For example, I talked yesterday about symmetry.

01:14:17 Yes.

01:14:18 And I gave very simple examples.

01:14:20 The same will be like that.

01:14:22 You gave like a predicate of a basic for?

01:14:24 For symmetries.

01:14:25 Yes, for different symmetries and you have for?

01:14:29 Degree of symmetries, that is important.

01:14:31 Not just symmetry.

01:14:33 Not exists or doesn’t exist, but degree of symmetry.

01:14:38 Yeah, for handwritten recognition.

01:14:41 No, it’s not for handwritten, it’s for any images.

01:14:45 But I would like apply to handwritten.

01:14:47 Right, in theory it’s more general, okay, okay.
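
The idea of a degree of symmetry, rather than a yes/no symmetry test, can be illustrated with a small sketch. The score below is only an assumption made for illustration: it is not the predicate defined in the lecture, and the names `horizontal_mirror` and `symmetry_degree` are hypothetical. It compares an image with its left-right mirror and returns 1.0 for a perfectly symmetric image and smaller values as symmetry degrades (for non-negative pixel intensities).

```python
def horizontal_mirror(image):
    """Return the left-right mirror of a 2D list of pixel intensities."""
    return [list(reversed(row)) for row in image]


def symmetry_degree(image):
    """Hypothetical symmetry score (an illustrative assumption).

    Normalized overlap between the image and its mirror. For non-negative
    pixels the value lies in [0, 1], and 1.0 means the image equals its mirror.
    """
    mirrored = horizontal_mirror(image)
    overlap = sum(a * b
                  for row, mrow in zip(image, mirrored)
                  for a, b in zip(row, mrow))
    energy = sum(a * a for row in image for a in row)
    # An all-zero image is trivially equal to its mirror.
    return overlap / energy if energy else 1.0


symmetric = [[0, 1, 0],
             [1, 1, 1],
             [0, 1, 0]]
lopsided = [[1, 0, 0],
            [1, 0, 0],
            [1, 1, 0]]

print(symmetry_degree(symmetric))  # 1.0
print(symmetry_degree(lopsided))   # 0.25
```

Because the function works on any 2D array of intensities, it is not tied to handwritten digits, which matches the remark that the predicate is for any images. Vertical, diagonal, or antisymmetric variants would only change the mirroring step.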

01:14:55 So a lot of the things we’ve been talking about

01:14:58 falls, we’ve been talking about philosophy a little bit,

01:15:01 but also about mathematics and statistics.

01:15:05 A lot of it falls into this idea,

01:15:08 a universal idea of statistical theory of learning.

01:15:11 What is the most beautiful and sort of powerful

01:15:16 or essential idea you’ve come across,

01:15:19 even just for yourself personally in the world

01:15:22 of statistics or statistical theory of learning?

01:15:25 Probably uniform convergence, which we did

01:15:29 with Alexey Chervonenkis.

01:15:33 Can you describe uniform convergence?

01:15:36 You have law of large numbers.

01:15:40 So for any function, expectation of function,

01:15:44 average of function converged to expectation.

01:15:48 But if you have set of functions,

01:15:50 for any function it is true.

01:15:52 But it should converge simultaneously

01:15:55 for all set of functions.

01:15:59 And for learning, you need uniform convergence.

01:16:06 Just convergence is not enough.

01:16:11 Because when you pick up one which gives minimum,

01:16:16 you can pick up one function which does not converge

01:16:21 and it will give you the best answer for this function.

01:16:31 So you need uniform convergence to guarantee learning.

01:16:34 So learning does not rely on trivial law of large numbers,

01:16:40 it relies on uniform law.
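
The distinction between the ordinary law of large numbers and uniform convergence can be written out in standard notation. This is a sketch under assumed symbols that do not appear in the conversation: $\ell$ is the sample size, $x_1, \dots, x_\ell$ are independent samples, and $F$ is the set of functions.

```latex
% Law of large numbers: for ONE fixed function f, the sample average
% converges in probability to the expectation as the sample size \ell grows.
\forall \varepsilon > 0:\quad
\lim_{\ell \to \infty}
P\left\{ \left| \frac{1}{\ell}\sum_{i=1}^{\ell} f(x_i) - \mathbb{E} f(x) \right| > \varepsilon \right\} = 0

% Uniform convergence: the same must hold simultaneously for the whole set F,
% that is, for the worst-behaved function in the set.
\forall \varepsilon > 0:\quad
\lim_{\ell \to \infty}
P\left\{ \sup_{f \in F} \left| \frac{1}{\ell}\sum_{i=1}^{\ell} f(x_i) - \mathbb{E} f(x) \right| > \varepsilon \right\} = 0
```

The function chosen by minimizing the empirical average depends on the sample itself, so the first statement, which is about a function fixed in advance, says nothing about it. The supremum in the second statement covers whichever function ends up being picked.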

01:16:42 But idea of convergence exists in statistics for a long time.

01:16:51 But it is interesting that as I think about myself,

01:17:02 how stupid I was 50 years, I did not see weak convergence.

01:17:08 I work on strong convergence.

01:17:10 But now I think that most powerful is weak convergence.

01:17:15 Because it makes admissible set of functions.

01:17:18 And even in old proverbs,

01:17:22 when people try to understand recognition, about duck,

01:17:28 looks like a duck and so on, they use weak convergence.

01:17:32 People in language, they understand this.

01:17:34 But when we’re trying to create artificial intelligence,

01:17:42 we want to invent it in a different way.

01:17:46 We just consider strong convergence arguments.

01:17:50 So reducing the set of admissible functions,

01:17:52 you think there should be effort put into understanding

01:17:58 the properties of weak convergence?

01:18:01 You know, in classical mathematics, in Hilbert space,

01:18:07 there are only two ways,

01:18:08 two form of convergence, strong and weak.

01:18:14 Now we can use both.
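
For reference, the two modes of convergence in a Hilbert space $H$ have standard textbook definitions, sketched below. The symbols $f_n$, $f$, and $\phi$ are assumptions introduced here, and treating $\phi$ as playing the role of a predicate is only an interpretation in the spirit of the conversation.

```latex
% Strong convergence: the distance to the limit goes to zero in norm.
\lim_{n \to \infty} \| f_n - f \| = 0

% Weak convergence: the inner product with every fixed element \phi converges.
\lim_{n \to \infty} \langle f_n, \phi \rangle = \langle f, \phi \rangle
\quad \text{for all } \phi \in H

% In the space L_2 of square-integrable functions, the inner product is an integral:
\langle f, \phi \rangle = \int f(x)\,\phi(x)\,dx
```

Strong convergence implies weak convergence, but in infinite-dimensional spaces the converse does not hold in general.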

01:18:16 That means that we did everything.

01:18:21 And it so happened that when we use Hilbert space,

01:18:26 which is very rich space, space of continuous functions,

01:18:34 which have integrable square.

01:18:38 So we can apply weak and strong convergence for learning

01:18:42 and have closed form solution.

01:18:45 So for computationally simple.

01:18:47 For me, it is sign that it is right way.

01:18:51 Because you don’t need any heuristic here,

01:18:55 just do whatever you want.

01:18:59 But now the only what left is this concept

01:19:03 of what is predicate, but it is not statistics.

01:19:08 By the way, I like the fact that you think that heuristics

01:19:11 are a mess that should be removed from the system.

01:19:14 So closed form solution is the ultimate goal.

01:19:18 No, it so happened that when you’re using right instrument,

01:19:23 you have closed form solution.

01:19:28 Do you think intelligence, human level intelligence,

01:19:32 when we create it,

01:19:37 will have something like a closed form solution?

01:19:42 You know, now I’m looking on bounds,

01:19:46 which I gave bounds for convergence.

01:19:51 And when I’m looking for bounds,

01:19:53 I’m thinking what is the most appropriate kernel

01:19:59 for this bound would be.

01:20:02 So we know that in say,

01:20:05 all our businesses, we use radial basis function.

01:20:11 But looking on the bound,

01:20:13 I think that I start to understand that maybe

01:20:17 we need to make corrections to radial basis function

01:20:21 to be closer to work better for this bounds.

01:20:28 So I’m again trying to understand what type of kernel

01:20:33 have best approximation,

01:20:37 best fit to this bound.
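
For readers unfamiliar with the radial basis function kernel mentioned here, a minimal sketch of its usual Gaussian form follows. The parameter name `gamma` and the helper `gram_matrix` are assumptions for illustration. The corrections to the kernel hinted at in the conversation are not specified, so none are shown.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian RBF kernel: exp(-gamma * ||x - z||^2) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def gram_matrix(points, gamma=1.0):
    """Matrix of pairwise kernel values, the object a kernel method actually works with."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]


points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
for row in gram_matrix(points, gamma=0.5):
    print([round(value, 4) for value in row])
```

The kernel equals 1 for identical points and decays toward 0 as points move apart, with `gamma` controlling how quickly. That width is the kind of choice a bound-driven analysis could inform.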

01:20:43 Sure, so there’s a lot of interesting work

01:20:45 that could be done in discovering better functions

01:20:47 than radial basis functions for bounds you find.

01:20:53 It still comes from,

01:20:55 you’re looking to math and trying to understand what.

01:21:00 From your own mind, looking at the, I don’t know.

01:21:03 Then I’m trying to understand what will be good for that.

01:21:11 Yeah, but to me, there’s still a beauty.

01:21:14 Again, maybe I’m a descendant of Alan Turing to heuristics.

01:21:17 To me, ultimately, intelligence will be a mess of heuristics.

01:21:23 And that’s the engineering answer, I guess.

01:21:26 Absolutely.

01:21:27 When you’re doing say, self driving cars,

01:21:31 the great guy who will do this.

01:21:35 It doesn’t matter what theory behind that.

01:21:40 Who has a better feeling how to apply it.

01:21:43 But by the way, it is the same story about predicates.

01:21:50 Because you cannot create rule for,

01:21:53 situation is much more than you have rule for that.

01:21:56 But maybe you can have more abstract rule

01:22:04 than it will be less literal.

01:22:08 It is the same story about ideas

01:22:10 and ideas applied to specific cases.

01:22:16 But still you should reach.

01:22:17 You cannot avoid this.

01:22:18 Yes, of course.

01:22:19 But you should still reach for the ideas

01:22:21 to understand the science.

01:22:22 Okay, let me kind of ask, do you think neural networks

01:22:27 or functions can be made to reason?

01:22:34 So what do you think, we’ve been talking about intelligence,

01:22:37 but this idea of reasoning,

01:22:39 there’s an element of sequentially disassembling,

01:22:44 interpreting the images.

01:22:48 So when you think of handwritten recognition, we kind of think

01:22:54 that there’ll be a single, there’s an input and output.

01:22:56 There’s not a recurrence.

01:23:01 What do you think about sort of the idea of recurrence,

01:23:04 of going back to memory and thinking through this

01:23:06 sort of sequentially mangling the different representations

01:23:11 over and over until you arrive at a conclusion?

01:23:20 Or is ultimately all that can be wrapped up into a function?

01:23:23 No, you’re suggesting that let us use this type of algorithm.

01:23:29 When I started thinking, I first of all,

01:23:33 starting to understand what I want.

01:23:36 Can I write down what I want?

01:23:39 And then I’m trying to formalize.

01:23:45 And when I do that, I think I have to solve this problem.

01:23:52 And till now I did not see a situation where you need recurrence.

01:24:04 But do you observe human beings?

01:24:07 Yeah.

01:24:08 You try to, it’s the imitation question, right?

01:24:12 It seems that human beings reason

01:24:14 this kind of sequentially sort of,

01:24:20 does that inspire in you a thought that we need to add that

01:24:24 into our intelligence systems?

01:24:30 You’re saying, okay, I mean, you’ve kind of answered saying

01:24:34 until now I haven’t seen a need for it.

01:24:37 And so because of that, you don’t see a reason

01:24:40 to think about it.

01:24:41 You know, most of things I don’t understand.

01:24:45 In reasoning in human, it is for me too complicated.

01:24:52 For me, the most difficult part is to ask questions,

01:25:01 to good questions, how it works,

01:25:03 how people asking questions, I don’t know this.

01:25:11 You said that machine learning is not only

01:25:13 about technical things, speaking of questions,

01:25:16 but it’s also about philosophy.

01:25:19 So what role does philosophy play in machine learning?

01:25:23 We talked about Plato, but generally thinking

01:25:28 in this philosophical way, does it have,

01:25:32 how does philosophy and math fit together in your mind?

01:25:36 First ideas and then their implementation.

01:25:39 It’s like predicate, like say admissible set of functions.

01:25:48 It comes together, everything.

01:25:51 Because the first iteration of theory was done 50 years ago.

01:25:58 I told that, this is theory.

01:26:00 So everything’s there, if you have data you can,

01:26:04 and your set of function has not big capacity.

01:26:13 So low VC dimension, you can do that.

01:26:15 You can make structural risk minimization, control capacity.

01:26:21 But you was not able to make admissible set of function good.

01:26:26 Now when suddenly realize that we did not use

01:26:33 another idea of convergence, which we can,

01:26:39 everything comes together.

01:26:41 But those are mathematical notions.

01:26:43 Philosophy plays a role of simply saying

01:26:48 that we should be swimming in the space of ideas.

01:26:52 Let’s talk what is philosophy.

01:26:54 Philosophy means understanding of life.

01:26:58 So understanding of life, say people like Plato,

01:27:03 they understand on very high abstract level of life.

01:27:07 So, and whatever I doing,

01:27:12 just implementation of my understanding of life.

01:27:16 But every new step, it is very difficult.

01:27:21 For example, to find this idea

01:27:28 that we need weak convergence was not simple for me.

01:27:40 So that required thinking about life a little bit.

01:27:44 Hard to trace, but there was some thought process.

01:27:48 I’m working, I’m thinking about the same problem

01:27:52 for 50 years or more, and again, and again, and again.

01:28:00 I’m trying to be honest and that is very important.

01:28:02 Not to be very enthusiastic, but concentrate

01:28:06 on whatever we was not able to achieve, for example.

01:28:12 And understand why.

01:28:13 And now I understand that because I believe in math,

01:28:18 I believe in Wigner’s idea.

01:28:23 But now when I see that there are only two way

01:28:28 of convergence and we’re using both,

01:28:32 that means that we must do as well as people doing.

01:28:37 But now, exactly in philosophy

01:28:42 and what we know about predicate,

01:28:45 how we understand life, can we describe as a predicate.

01:28:51 I thought about that and that is more or less obvious

01:28:57 level of symmetry.

01:29:00 But next, I have a feeling,

01:29:05 it’s something about structures.

01:29:09 But I don’t know how to formulate,

01:29:11 how to measure measure of structure and all this stuff.

01:29:16 And the guy who will solve this challenge problem,

01:29:22 then when we were looking how he did it,

01:29:27 probably just only symmetry is not enough.

01:29:30 But something like symmetry will be there.

01:29:33 Structure will be there.

01:29:34 Oh yeah, absolutely.

01:29:35 Symmetry will be there and level of symmetry will be there.

01:29:40 And level of symmetry, antisymmetry, diagonal, vertical.

01:29:44 And I even don’t know how you can use

01:29:48 in different direction idea of symmetry, it’s very general.

01:29:52 But it will be there.

01:29:54 I think that people very sensitive to idea of symmetry.

01:29:58 But there are several ideas like symmetry.

01:30:04 As I would like to learn.

01:30:07 But you cannot learn just thinking about that.

01:30:11 You should do challenging problems

01:30:14 and then analyze them, why it was able to solve them.

01:30:20 And then you will see.

01:30:22 Very simple things, it’s not easy to find.

01:30:25 But even with talking about this every time.

01:30:32 I was surprised, I tried to understand.

01:30:36 These people describe in language

01:30:40 strong convergence mechanism for learning.

01:30:44 I did not see, I don’t know.

01:30:46 But weak convergence, this duck story

01:30:50 and story like that when you will explain to kid,

01:30:54 you will use weak convergence argument.

01:30:57 It looks like, it does like, it does that.

01:31:00 But when you try to formalize, you’re just ignoring this.

01:31:05 Why, why 50 years from start of machine learning?

01:31:10 And that’s the role of philosophy, thinking about life.

01:31:12 I think that maybe, I don’t know.

01:31:18 Maybe this is theory also, we should blame for that

01:31:22 because empirical risk minimization and all this stuff.

01:31:27 And if you read now textbooks,

01:31:30 they just about bound about empirical risk minimization.

01:31:34 They don’t looking for another problem like admissible set.

01:31:41 But on the topic of life, perhaps we,

01:31:47 you could talk in Russian for a little bit.

01:31:50 What’s your favorite memory from childhood?

01:31:53 What’s your favorite memory from childhood?

01:31:56 Oh, music.

01:31:59 How about, can you try to answer in Russian?

01:32:02 Music?

01:32:04 It was very cool when…

01:32:08 What kind of music?

01:32:09 Classic music.

01:32:11 What’s your favorite?

01:32:13 Well, different composers.

01:32:15 At first, it was Vivaldi, I was surprised that it was possible.

01:32:23 And then when I understood Bach, I was absolutely shocked.

01:32:29 By the way, from him I think that there is a predicate,

01:32:35 like a structure.

01:32:36 In Bach?

01:32:37 Well, of course.

01:32:38 Because you can just feel the structure.

01:32:42 And I don’t think that different elements of life

01:32:49 are very much divided, in the sense of predicates.

01:32:53 Everywhere structure, in painting structure,

01:32:56 in human relations structure.

01:32:59 Here’s how to find these high level predicates, it’s…

01:33:05 In Bach and in life, everything is connected.

01:33:08 Now that we’re talking about Bach,

01:33:14 let’s switch back to English,

01:33:15 because I like Beethoven and Chopin, so…

01:33:18 Well, Chopin, it’s another amusing story.

01:33:21 But Bach, if we talk about predicates,

01:33:23 Bach probably has the most sort of

01:33:29 well defined predicates that underlie it.

01:33:31 It is very interesting to read what critics

01:33:36 are writing about Bach, which words they’re using.

01:33:40 They’re trying to describe predicates.

01:33:43 And then Chopin, it is very different vocabulary,

01:33:52 very different predicates.

01:33:55 And I think that if you will make collection of that,

01:34:02 so maybe from this you can describe predicate

01:34:05 for digit recognition as well.

01:34:08 From Bach and Chopin.

01:34:10 No, no, no, not from Bach and Chopin.

01:34:12 From the critic interpretation of the music, yeah.

01:34:15 When they’re trying to explain you music, what they use.

01:34:22 As they use, they describe high level ideas

01:34:25 of Plato’s ideas, what is behind this music.

01:34:28 That’s brilliant.

01:34:29 So art is not self explanatory in some sense.

01:34:34 So you have to try to convert it into ideas.

01:34:39 It is ill-posed problem.

01:34:40 When you go from ideas to the representation,

01:34:46 it is easy way.

01:34:47 But when you’re trying to go back, it is ill-posed problem.

01:34:51 But nevertheless, I believe that when you’re looking

01:34:55 from that, even from art, you will be able to find

01:35:00 predicates for digit recognition.

01:35:02 That’s such a fascinating and powerful notion.

01:35:08 Do you ponder your own mortality?

01:35:11 Do you think about it?

01:35:12 Do you fear it?

01:35:13 Do you draw insight from it?

01:35:16 About mortality, no, yeah.

01:35:21 Are you afraid of death?

01:35:25 Not too much, not too much.

01:35:29 It is pity that I will not be able to do something

01:35:33 which I think I have a feeling to do that.

01:35:39 For example, I will be very happy to work with guys

01:35:48 theoretician from music to write this collection

01:35:52 of description, how they describe music,

01:35:55 how they use that predicate, and from art as well.

01:36:00 Then take what is in common and try to understand

01:36:04 predicate which is absolute for everything.

01:36:08 And then use that for visual recognition

01:36:10 and see if there is a connection.

01:36:12 Yeah, exactly.

01:36:13 Ah, there’s still time.

01:36:14 We got time.

01:36:16 Ha ha ha ha.

01:36:18 Yeah.

01:36:19 We got time.

01:36:20 It take years and years and years.

01:36:24 Yes, yeah, it’s a long way.

01:36:26 Well, see, you’ve got the patient mathematicians mind.

01:36:30 I think it could be done very quickly and very beautifully.

01:36:34 I think it’s a really elegant idea.

01:36:35 Yeah, but also.

01:36:36 Some of many.

01:36:37 Yeah, you know, the most time,

01:36:40 it is not to make this collection to understand

01:36:45 what is the common to think about that once again

01:36:48 and again and again.

01:36:49 Again and again and again, but I think sometimes,

01:36:52 especially just when you say this idea now,

01:36:55 even just putting together the collection

01:36:58 and looking at the different sets of data,

01:37:03 language, trying to interpret music,

01:37:05 criticize music, and images,

01:37:08 I think there’ll be sparks of ideas that’ll come.

01:37:10 Of course, again and again, you’ll come up with better ideas,

01:37:13 but even just that notion is a beautiful notion.

01:37:16 I even have some example.

01:37:19 Yes, so I have friend

01:37:25 who was specialist in Russian poetry.

01:37:30 She is professor of Russian poetry.

01:37:35 She did not write poems,

01:37:39 but she know a lot of stuff.

01:37:43 She make book, several books,

01:37:48 and one of them is a collection of Russian poetry.

01:37:54 She have images of Russian poetry.

01:37:57 She collect all images of Russian poetry.

01:38:00 And I ask her to do following.

01:38:05 You have MNIST digit recognition,

01:38:09 and we get 100 digits,

01:38:13 or maybe less than 100.

01:38:15 I don’t remember, maybe 50 digits.

01:38:18 And try from poetical point of view,

01:38:21 describe every image which she see,

01:38:25 using only words of images of Russian poetry.

01:38:31 And she did it.

01:38:34 And then we tried to,

01:38:41 I call it learning using privileged information.

01:38:43 I call it privileged information.

01:38:45 You have on two languages.

01:38:48 One language is just image of digit,

01:38:53 and another language, poetic description of this image.

01:38:57 And this is privileged information.

01:39:02 And there is an algorithm when you’re working

01:39:04 using privileged information, you’re doing better.

01:39:08 Much better, so.

01:39:10 So there’s something there.

01:39:11 Something there.

01:39:12 And there is a, in NEC,

01:39:16 she unfortunately died.

01:39:20 The collection of digits

01:39:24 and poetic descriptions of these digits.

01:39:29 Yeah.

01:39:30 So there’s something there in that poetic description.

01:39:32 But I think that there is a abstract ideas

01:39:38 on the Plato level of ideas.

01:39:40 Yeah, that they’re there.

01:39:42 That could be discovered.

01:39:43 And music seems to be a good entry point.

01:39:45 But as soon as we start with this challenge problem.

01:39:50 The challenge problem.

01:39:51 Listen.

01:39:52 It immediately connected to all this stuff.

01:39:55 Especially with your talk and this podcast,

01:39:58 and I’ll do whatever I can to advertise it.

01:40:00 It’s such a clean, beautiful Einstein like formulation

01:40:03 of the challenge before us.

01:40:05 Right.

01:40:06 Let me ask another absurd question.

01:40:09 We talked about mortality.

01:40:12 We talked about philosophy of life.

01:40:14 What do you think is the meaning of life?

01:40:17 What’s the predicate for mysterious existence here on earth?

01:40:29 I don’t know.

01:40:33 It’s very interesting how we have,

01:40:37 in Russia, I don’t know if you know the guy Strugatsky.

01:40:43 They are writing fiction.

01:40:46 They’re thinking about human, what’s going on.

01:40:51 And they have idea that there are developing

01:41:00 two type of people, common people and very smart people.

01:41:05 They just started.

01:41:06 And these two branches of people will go

01:41:10 in different direction very soon.

01:41:13 So that’s what they’re thinking about that.

01:41:18 So the purpose of life is to create two paths.

01:41:23 Two paths.

01:41:24 Of human societies.

01:41:25 Yes.

01:41:27 Simple people and more complicated people.

01:41:29 Which do you like best?

01:41:31 The simple people or the complicated ones?

01:41:34 I don’t know that it is just his fantasy,

01:41:38 but you know, every week we have guy

01:41:41 who is just a writer and also a theorist of literature.

01:41:51 And he explain how he understand literature

01:41:56 and human relationship.

01:41:58 How he see life.

01:42:00 And I understood that I’m just small kids

01:42:06 comparing to him.

01:42:09 He’s very smart guy in understanding life.

01:42:13 He knows this predicate.

01:42:15 He knows big blocks of life.

01:42:19 I am amazed every time when I listen to him.

01:42:24 And he just talking about literature.

01:42:27 And I think that I was surprised.

01:42:33 So the managers in big companies,

01:42:41 most of them are guys who study English language

01:42:48 and English literature.

01:42:51 So why?

01:42:52 Because they understand life.

01:42:54 They understand models.

01:42:57 And among them,

01:42:58 maybe many talented critics just analyzing this.

01:43:06 And this is big science like property.

01:43:10 This is blocks.

01:43:13 That’s very smart.

01:43:17 It amazes me that you are and continue to be humbled

01:43:21 by the brilliance of others.

01:43:22 I’m very modest about myself.

01:43:25 I see so smart guys around.

01:43:28 Well, let me be immodest for you.

01:43:31 You’re one of the greatest mathematicians,

01:43:33 statisticians of our time.

01:43:35 It’s truly an honor.

01:43:36 Thank you for talking again.

01:43:38 And let’s talk.

01:43:41 It is not.

01:43:43 I know my limits.

01:43:45 Let’s talk again when your challenge is taken on

01:43:49 and solved by grad student.

01:43:51 Especially when they use it.

01:43:55 It happens.

01:43:57 Maybe music will be involved.

01:43:58 Vladimir, thank you so much.

01:43:59 It’s been an honor. Thank you very much.

01:44:02 Thanks for listening to this conversation

01:44:04 with Vladimir Vapnik.

01:44:05 And thank you to our presenting sponsor, Cash App.

01:44:08 Download it, use code LexPodcast.

01:44:11 You’ll get $10 and $10 will go to FIRST,

01:44:14 an organization that inspires and educates young minds

01:44:17 to become science and technology innovators of tomorrow.

01:44:20 If you enjoy this podcast, subscribe on YouTube,

01:44:23 give us five stars on Apple Podcast,

01:44:25 support it on Patreon,

01:44:26 or simply connect with me on Twitter at Lex Fridman.

01:44:31 And now, let me leave you with some words

01:44:33 from Vladimir Vapnik.

01:44:35 When solving a problem of interest,

01:44:37 do not solve a more general problem

01:44:40 as an intermediate step.

01:44:43 Thank you for listening.

01:44:44 I hope to see you next time.