Transcript
00:00:00 The following is a conversation with Vladimir Vapnik, part two, the second
00:00:05 time we spoke on the podcast.
00:00:07 He’s the coinventor of support vector machines, support vector clustering, VC
00:00:11 theory, and many foundational ideas in statistical learning.
00:00:14 He was born in the Soviet Union, worked at the Institute of Control Sciences
00:00:19 in Moscow, then in the US, worked at AT&T, NEC Labs, Facebook AI Research,
00:00:26 and now is a professor at Columbia University.
00:00:28 His work has been cited over 200,000 times.
00:00:32 The first time we spoke on the podcast was just over a year
00:00:35 ago, one of the early episodes.
00:00:38 This time we spoke after a lecture he gave titled Complete Statistical Theory
00:00:42 of Learning, as part of the MIT series of lectures on deep learning
00:00:46 and AI that I organized.
00:00:49 I’ll release the video of the lecture in the next few days.
00:00:53 This podcast and lecture are independent from each other, so you don’t need
00:00:56 one to understand the other.
00:00:59 The lecture is quite technical and math heavy, so if you do watch both, I
00:01:04 recommend listening to this podcast first, since the podcast is
00:01:07 probably a bit more accessible.
00:01:10 This is the Artificial Intelligence Podcast.
00:01:13 If you enjoy it, subscribe on YouTube, give it five stars on Apple podcasts,
00:01:17 support it on Patreon, or simply connect with me on Twitter
00:01:20 at Lex Fridman, spelled F R I D M A N.
00:01:23 As usual, I’ll do one or two minutes of ads now and never any ads in
00:01:27 the middle that can break the flow of the conversation.
00:01:30 I hope that works for you and doesn’t hurt the listening experience.
00:01:35 This show is presented by Cash App, the number one finance app in the app store.
00:01:39 When you get it, use code LexPodcast.
00:01:42 Cash App lets you send money to friends, buy Bitcoin, and invest in the
00:01:46 stock market with as little as $1.
00:01:48 Broker services are provided by Cash App Investing, a subsidiary of Square
00:01:52 and member SIPC, since Cash App allows you to send and receive money
00:01:57 digitally, peer to peer, and security in all digital transactions is very important.
00:02:02 Let me mention the PCI data security standard, PCI DSS level one,
00:02:07 that Cash App is compliant with.
00:02:10 I’m a big fan of standards for safety and security and PCI DSS is a good
00:02:16 example of that, where a bunch of competitors got together and agreed
00:02:20 that there needs to be a global standard around the security of transactions.
00:02:24 Now we just need to do the same for autonomous vehicles
00:02:27 and AI systems in general.
00:02:30 So again, if you get Cash App from the app store or Google Play and use the code
00:02:34 LexPodcast, you get $10 and Cash App will also donate $10 to FIRST, one of my
00:02:40 favorite organizations that is helping to advance robotics and STEM education
00:02:45 for young people around the world.
00:02:46 And now here’s my conversation with Vladimir Vapnik.
00:02:52 You and I talked about Alan Turing yesterday a little bit and that he, as the
00:02:58 father of artificial intelligence, may have instilled in our field, an ethic
00:03:02 of engineering and not science, seeking more to build intelligence
00:03:06 rather than to understand it.
00:03:09 What do you think is the difference between these two paths of engineering
00:03:13 intelligence and the science of intelligence?
00:03:18 It’s a completely different story.
00:03:20 Engineering is imitation of human activity.
00:03:25 You have to make a device which behaves as humans behave, have all the functions
00:03:34 of humans.
00:03:36 It doesn’t matter how you do it, but to understand what is intelligence
00:03:41 about, it’s quite a different problem.
00:03:48 So I think, I believe that it’s somehow related to the predicate we talked
00:03:55 yesterday about, because look at the Vladimir Propp’s idea.
00:04:04 He just found 31 here, predicates, he called it units, which can explain
00:04:17 human behavior, at least in Russian tales.
00:04:20 You look at Russian tales and derive from that.
00:04:24 And then people realize that it’s more wide than in Russian tales.
00:04:29 It is in TV, in movie serials and so on and so on.
00:04:33 So you’re talking about Vladimir Propp, who in 1928 published a book,
00:04:39 Morphology of the Folktale, describing 31 predicates that have this kind of
00:04:46 sequential structure that a lot of the stories, narratives follow in Russian
00:04:53 folklore and in other contexts.
00:04:54 We’ll talk about it.
00:04:56 I’d like to talk about predicates in a focused way, but let me, if you allow
00:05:00 me to stay zoomed out on our friend, Alan Turing, and, you know, he inspired
00:05:06 a generation with the imitation game.
00:05:10 Yes.
00:05:11 Do you think if we can linger on that a little bit longer, do you think we can
00:05:17 learn, do you think learning to imitate intelligence can get us closer to the
00:05:22 science, to understanding intelligence?
00:05:24 So why do you think imitation is so far from understanding?
00:05:32 I think that it is different between you have different goals.
00:05:37 So your goal is to create something, something useful.
00:05:43 Yeah.
00:05:43 And that is great.
00:05:45 And you can see how much things was done and I believe that it will be done even
00:05:51 more, it’s self driving cars and also the business, it is great.
00:05:57 And it was inspired by Turing’s vision.
00:06:02 But understanding is very difficult.
00:06:05 It’s more or less philosophical category.
00:06:07 What means understand the world?
00:06:10 I believe in scheme which starts from Plato, that there exists world of ideas.
00:06:18 I believe that intelligence, it is world of ideas, but it is world of pure ideas.
00:06:24 And when you combine them with reality things, it creates, as in my case,
00:06:34 invariants, which is very specific.
00:06:37 And that’s, I believe, the combination of ideas in way to constructing invariants.
00:06:47 Constructing invariant is intelligence.
00:06:49 But first of all, predicate, if you know, predicate and hopefully
00:06:56 then not too much predicate exists.
00:07:00 For example, 31 predicate for human behavior, it is not a lot.
00:07:06 Vladimir Propp used 31, you can even call them predicate, 31
00:07:12 predicates to describe stories, narratives.
00:07:17 Do you think human behavior, how much of human behavior, how much of our
00:07:22 world, our universe, all the things that matter in our existence can be
00:07:28 summarized in predicates of the kind that Propp was working with?
00:07:32 I think that we have a lot of form of behavior, but I think that
00:07:38 predicate is much less because even in this example, which I gave you
00:07:43 yesterday, you saw that predicate can be, one predicate can construct many
00:07:55 different invariants depending on your data.
00:07:59 They’re applying to different data and they give different invariants.
00:08:04 So, but pure ideas, maybe not so much.
00:08:08 Not so many.
00:08:09 I don’t know about that, but my guess, I hope that’s why challenge
00:08:15 about digit recognition, how much you need.
00:08:19 I think we’ll talk about computer vision and 2D images a little bit
00:08:23 in your challenge.
00:08:24 That’s exactly about intelligence.
00:08:26 That’s exactly, that’s exactly about, no, that hopes to be exactly about
00:08:33 the spirit of intelligence in the simplest possible way.
00:08:37 Yeah, absolutely you should start the simplest way, otherwise you
00:08:40 will not be able to do it.
00:08:42 Well, there’s an open question whether starting at the MNIST digit
00:08:46 recognition is a step towards intelligence or it’s an entirely different thing.
00:08:52 I think that to beat records using say 100, 200 times less examples,
00:08:59 you need intelligence.
00:09:00 You need intelligence.
00:09:01 So let’s, because you use this term and it would be nice, I’d like to
00:09:05 ask simple, maybe even dumb questions.
00:09:09 Let’s start with a predicate.
00:09:12 In terms of terms and how you think about it, what is a predicate?
00:09:17 I don’t know.
00:09:18 I have a feeling formally they exist, but I believe that predicate for
00:09:26 2D images, one of them is symmetry.
00:09:31 Hold on a second.
00:09:32 Sorry.
00:09:32 Sorry, sorry to interrupt and pull you back.
00:09:36 At the simplest level, we’re not even, we’re not being profound currently.
00:09:40 A predicate is a statement of something that is true.
00:09:44 Yes.
00:09:46 Do you think of predicates as somehow probabilistic in nature or is this binary?
00:09:54 This is truly constraints of logical statements about the world.
00:09:59 In my definition, the simplest predicate is function.
00:10:03 Function, and you can use this function to make inner product that is predicate.
00:10:10 What’s the input and what’s the output of the function?
00:10:13 Input is X, something which is input in reality.
00:10:18 Say if you consider digit recognition, it pixel space input, but it is
00:10:25 function which in pixel space, but it can be any function from pixel space and you
00:10:36 choose, and I believe that there are several functions which is important for
00:10:43 understanding of images.
00:10:46 One of them is symmetry.
00:10:48 It’s not so simple construction as I described with the derivative, with all
00:10:53 this stuff, but another, I believe, I don’t know how many, is how well
00:10:59 structurized is picture.
00:11:03 Structurized?
00:11:04 Yeah.
00:11:04 What do you mean by structurized?
00:11:06 It is formal definition.
00:11:09 Say something heavy on the left corner, not so heavy in the middle and so on.
00:11:17 You describe in general concept of what you assume.
00:11:21 Concepts, some kind of universal concepts.
00:11:25 Yeah, but I don’t know how to formalize this.
00:11:29 Do you?
00:11:29 So this is the thing.
00:11:31 There’s a million ways we can talk about this.
00:11:33 I’ll keep bringing it up, but we humans have such concepts when we look at
00:11:40 digits, but it’s hard to put them, just like you’re saying now, it’s
00:11:44 hard to put them into words.
00:11:45 You know, that is example, when critics in music, trying to describe music,
00:11:55 they use predicate and not too many predicate, but in different combination,
00:12:02 but they have some special words for describing music and the same
00:12:10 should be for images, but maybe there are critics who understand essence
00:12:16 of what this image is about.
00:12:20 Do you think there exists critics who can summarize the essence of
00:12:26 images, human beings?
00:12:29 I hope so, yes, but that…
00:12:32 Explicitly state them on paper.
00:12:34 The fundamental question I’m asking is, do you think there exists a small
00:12:41 set of predicates that will summarize images?
00:12:45 It feels to our mind, like it does, that the concept of what makes a two
00:12:50 and a three and a four…
00:12:53 No, no, no, it’s not on this level.
00:12:58 It should not describe two, three, four.
00:13:01 It describes some construction, which allow you to create invariance.
00:13:08 And invariance, sorry to stick on this, but terminology.
00:13:12 Invariance, it is property of your image.
00:13:21 Say, I can say, looking on my image, it is more or less symmetric,
00:13:27 and I can give you value
00:13:33 of symmetry, say, level of symmetry, using this function which I gave
00:13:40 yesterday. And you can describe that your image has these characteristics
00:13:51 exactly in the way how musical critics describe music.
00:13:56 So, but this is invariant applied to specific data, to specific music,
00:14:05 to something.
00:14:07 I strongly believe in Plato’s idea that there exists world of predicate
00:14:14 and world of reality, and predicate and reality is somehow connected,
00:14:20 and you have to know that.
00:14:22 Let’s talk about Plato a little bit.
00:14:23 So you draw a line from Plato, to Hegel, to Wigner, to today.
00:14:30 So Plato has forms, the theory of forms.
00:14:35 So there’s a world of ideas and a world of things, as you talk about,
00:14:39 and there’s a connection.
00:14:40 And presumably the world of ideas is very small, and the world of things
00:14:45 is arbitrarily big, but they’re all what Plato calls them like, it’s a shadow.
00:14:52 The real world is a shadow from the world of forms.
00:14:55 Yeah, you have projection of a world of ideas.
00:14:58 Yeah, very poetic.
00:15:00 In reality, you can realize this projection using these invariants
00:15:07 because it is projection on specific examples, which create specific features
00:15:13 of specific objects.
00:15:14 So the essence of intelligence is while only being able to observe
00:15:22 the world of things, try to come up with a world of ideas.
00:15:26 Exactly.
00:15:27 Like in this music story, intelligent musical critics knows all these words
00:15:33 and have a feeling about what they mean.
00:15:34 I feel like that’s a contradiction, intelligent music critics.
00:15:38 But I think music is to be enjoyed in all its forms.
00:15:47 The notion of critic, like a food critic.
00:15:49 No, I don’t want touch emotion.
00:15:51 That’s an interesting question.
00:15:53 Does emotion…
00:15:54 There’s certain elements of the human psychology, of the human experience,
00:15:59 which seem to almost contradict intelligence and reason.
00:16:04 Like emotion, like fear, like love, all of those things,
00:16:11 are those not connected in any way to the space of ideas?
00:16:16 That I don’t know.
00:16:18 I just want to be concentrate on very simple story, on digit recognition.
00:16:27 So you don’t think you have to love and fear death in order to recognize digits?
00:16:31 I don’t know.
00:16:33 Because it’s so complicated.
00:16:36 It involves a lot of stuff which I never considered.
00:16:41 But I know about digit recognition.
00:16:44 And I know that for digit recognition,
00:16:50 to get records from small number of observations, you need predicate.
00:16:59 But not special predicate for this problem.
00:17:03 But universal predicate, which understand world of images.
00:17:08 Of visual information.
00:17:09 Visual, yes.
00:17:11 But on the first step, they understand, say, world of handwritten digits,
00:17:18 or characters, or something simple.
00:17:21 So like you said, symmetry is an interesting one.
00:17:23 No, that’s what I think one of the predicate is related to symmetry.
00:17:28 The level of symmetry.
00:17:30 Okay, degree of symmetry.
00:17:32 So you think symmetry at the bottom is a universal notion,
00:17:37 and there’s degrees of a single kind of symmetry,
00:17:41 or is there many kinds of symmetries?
00:17:44 Many kinds of symmetries.
00:17:46 There is a symmetry, antisymmetry, say, letter S.
00:17:52 So it has vertical antisymmetry.
00:17:58 And it could be diagonal symmetry, vertical symmetry.
00:18:02 So when you cut vertically the letter S…
00:18:07 Yeah, then the upper part and lower part in different directions.
00:18:16 Inverted, along the Y axis.
00:18:18 But that’s just like one example of symmetry, right?
00:18:21 Isn’t there like…
00:18:21 Right, but there is a degree of symmetry.
00:18:26 If you play all this derivative stuff to do tangent distance,
00:18:35 whatever I describe, you can have a degree of symmetry.
00:18:40 And that is what describing reason of image.
00:18:45 It is the same as you will describe this image.
00:18:53 Think about digit S, it has antisymmetry.
00:18:57 Digit three is symmetric.
00:19:00 More or less, look for symmetry.
00:19:04 Do you think such concepts like symmetry,
00:19:07 predicates like symmetry, is it a hierarchical set of concepts?
00:19:14 Or are these independent, distinct predicates
00:19:20 that we want to discover as some set of…
00:19:23 No, there is an idea of symmetry.
00:19:25 And you can, this idea of symmetry, make very general.
00:19:34 Like degree of symmetry.
00:19:37 If degree of symmetry can be zero, no symmetry at all.
00:19:40 Or degree of symmetry, say, more or less symmetrical.
00:19:46 But you have one of these descriptions.
00:19:50 And symmetry can be different.
00:19:52 As I told, horizontal, vertical, diagonal,
00:19:56 and antisymmetry is also concept of symmetry.
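Editor's sketch (not from the conversation): one simple way to picture a "degree of symmetry" is to compare an image with its mirror image and score how much they differ. The function name `symmetry_degree`, the scoring formula, and the reading of antisymmetry as a 180-degree rotation are assumptions made for illustration. This is not the tangent-distance construction mentioned in the talk.

```python
def symmetry_degree(image, kind="vertical"):
    """Return a score in [0, 1]; 1.0 means the image equals its transformed copy."""
    rows, cols = len(image), len(image[0])
    if kind == "vertical":        # mirror left-right across the vertical axis
        mirrored = [row[::-1] for row in image]
    elif kind == "horizontal":    # mirror top-bottom across the horizontal axis
        mirrored = image[::-1]
    elif kind == "point":         # rotate by 180 degrees (assumed reading of antisymmetry)
        mirrored = [row[::-1] for row in image[::-1]]
    else:
        raise ValueError(f"unknown symmetry kind: {kind}")
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    diff = sum(abs(image[r][c] - mirrored[r][c]) for r, c in cells)
    total = sum(abs(image[r][c]) + abs(mirrored[r][c]) for r, c in cells)
    return 1.0 if total == 0 else 1.0 - diff / total


# A crude 5x3 bitmap of the letter S, invented for illustration.
s_shape = [
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
]

for kind in ("vertical", "horizontal", "point"):
    print(kind, symmetry_degree(s_shape, kind))
```

Under this scoring the S bitmap is only partly symmetric under either mirror but fully symmetric under the 180-degree rotation, which matches the remark that the letter S has antisymmetry rather than plain symmetry.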
00:20:01 What about shape in general?
00:20:03 I mean, symmetry is a fascinating notion, but…
00:20:06 No, no, I’m talking about digit.
00:20:08 I would like to concentrate on all I would like to know,
00:20:12 predicate for digit recognition.
00:20:14 Yes, but symmetry is not enough for digit recognition, right?
00:20:19 It is not necessarily for digit recognition.
00:20:22 It helps to create invariant, which you can use
00:20:30 when you will have examples for digit recognition.
00:20:35 You have regular problem of digit recognition.
00:20:38 You have examples of the first class or second class.
00:20:41 Plus, you know that there exists concept of symmetry.
00:20:45 And you apply, when you’re looking for decision rule,
00:20:50 you will apply concept of symmetry,
00:20:55 of this level of symmetry, which you estimate from…
00:21:00 So let’s talk.
00:21:01 Everything comes from weak convergence.
00:21:06 What is convergence?
00:21:07 What is weak convergence?
00:21:09 What is strong convergence?
00:21:11 I’m sorry, I’m gonna do this to you.
00:21:13 What are we converging from and to?
00:21:16 You’re converging, you would like to have a function.
00:21:20 The function which, say, indicator function,
00:21:23 which indicate your digit five, for example.
00:21:29 A classification task.
00:21:31 Let’s talk only about classification.
00:21:33 So classification means you will say
00:21:36 whether this is a five or not,
00:21:38 or say which of the 10 digits it is.
00:21:40 Right, right.
00:21:42 I would like to have these functions.
00:21:46 Then, I have some examples.
00:21:56 I can consider property of these examples.
00:22:01 Say, symmetry.
00:22:02 And I can measure level of symmetry for every digit.
00:22:08 And then I can take average from my training data.
00:22:16 And I will consider only functions
00:22:20 of conditional probability,
00:22:24 which I’m looking for my decision rule.
00:22:27 Which applying to digits will give me the same average
00:22:38 as I observe on training data.
00:22:41 So, actually, this is different level
00:22:45 of description of what you want.
00:22:48 You want not just, you show not one digit.
00:22:54 You show, this predicate, show general property
00:22:59 of all digits which you have in mind.
00:23:03 If you have in mind digit three,
00:23:06 it gives you property of digit three.
00:23:10 And you select as admissible set of function,
00:23:13 only function, which keeps this property.
00:23:16 You will not consider other functions.
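Editor's sketch (not from the conversation): the condition described here can be written as one equation. The notation is assumed for illustration: \psi is the predicate (for example, a symmetry score), (x_i, y_i) for i = 1, ..., \ell are the training pairs with y_i in {0, 1}, and f(x) is a candidate for the conditional probability P(y = 1 | x). A function f is admissible when the predicate averaged against f matches the predicate averaged against the observed labels.

```latex
\frac{1}{\ell}\sum_{i=1}^{\ell} \psi(x_i)\, f(x_i)
\;\approx\;
\frac{1}{\ell}\sum_{i=1}^{\ell} \psi(x_i)\, y_i
```

Each additional predicate adds one more equation of this form, so each one removes candidate functions from consideration.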
00:23:20 So, you immediately looking for smaller subset of function.
00:23:24 That’s what you mean by admissible functions.
00:23:27 Admissible function, exactly.
00:23:28 Which is still a pretty large,
00:23:30 for the number three, is a large.
00:23:32 It is pretty large, but if you have one predicate.
00:23:36 But according to, there is a strong and weak convergence.
00:23:42 Strong convergence is convergence in function.
00:23:46 You’re looking for the function on one function,
00:23:49 and you’re looking for another function.
00:23:51 And square difference from them should be small.
00:23:59 If you take difference in any points,
00:24:01 make a square, make an integral, and it should be small.
00:24:05 That is convergence in function.
00:24:08 Suppose you have some function, any function.
00:24:11 So, I would say, I say that some function
00:24:15 converge to this function.
00:24:17 If integral from square difference between them is small.
00:24:22 That’s the definition of strong convergence.
00:24:24 That definition of strong convergence.
00:24:25 Two functions, the integral, the difference, is small.
00:24:28 Yeah, it is convergence in functions.
00:24:31 Yeah.
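Editor's sketch (not from the conversation): the definition just given can be written as follows. The notation is assumed for illustration: f_\ell is the \ell-th function in the sequence and f_0 is the target function.

```latex
f_\ell \to f_0 \ \text{strongly}
\quad\Longleftrightarrow\quad
\lim_{\ell \to \infty} \int \bigl( f_\ell(x) - f_0(x) \bigr)^2 \, dx = 0
```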
00:24:32 But you have different convergence in functionals.
00:24:36 You take any function, you take some function, phi,
00:24:41 and take inner product, this function, this f function.
00:24:46 f0 function, which you want to find.
00:24:50 And that gives you some value.
00:24:52 So, you say that set of functions converge
00:24:59 in inner product to this function,
00:25:03 if this value of inner product converge to value f0.
00:25:10 That is for one phi.
00:25:12 But weak convergence requires that it converge for any
00:25:16 function of Hilbert space.
00:25:20 If it converge for any function of Hilbert space,
00:25:24 then you will say that this is weak convergence.
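Editor's sketch (not from the conversation): in the same assumed notation, with \phi an arbitrary function of the Hilbert space and the inner product taken as an integral, weak convergence requires the inner products to converge for every \phi.

```latex
f_\ell \to f_0 \ \text{weakly}
\quad\Longleftrightarrow\quad
\lim_{\ell \to \infty} \int \phi(x)\, f_\ell(x)\, dx = \int \phi(x)\, f_0(x)\, dx
\quad \text{for every } \phi
```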
00:25:28 You can think that when you take integral,
00:25:32 that is integral property of function.
00:25:35 For example, if you will take sine or cosine,
00:25:39 it is coefficient of, say, Fourier expansion.
00:25:45 So, if it converge for all coefficients of Fourier
00:25:51 expansion, so under some condition,
00:25:54 it converge to function you’re looking for.
00:25:58 But weak convergence means any property.
00:26:02 Convergence not point wise, but integral property
00:26:07 of function.
00:26:09 So, weak convergence means integral property of functions.
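Editor's sketch (not from the conversation): a standard textbook example shows that weak convergence is a weaker requirement than strong convergence. The sequence f_n(x) = sin(n x) on [0, 2*pi] has inner products with a fixed test function that shrink toward zero, while its squared distance from the zero function stays the same. The test function phi(x) = x, the interval, and the midpoint-rule integrator are choices made for illustration. Checking a single phi does not prove weak convergence; it only illustrates it.

```python
import math

A, B = 0.0, 2.0 * math.pi


def integrate(g, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))


def f(n):
    """The n-th function of the sequence: f_n(x) = sin(n x)."""
    return lambda x: math.sin(n * x)


def phi(x):
    """One fixed test function, playing the role of a predicate."""
    return x


def inner_with_phi(n):
    """Inner product of phi and f_n; an integral property of f_n."""
    fn = f(n)
    return integrate(lambda x: phi(x) * fn(x), A, B)


def squared_distance_to_zero(n):
    """Integral of (f_n - 0)^2; the quantity that strong convergence controls."""
    fn = f(n)
    return integrate(lambda x: fn(x) ** 2, A, B)


for n in (1, 10, 100):
    print(n, round(inner_with_phi(n), 4), round(squared_distance_to_zero(n), 4))
```

The second column shrinks as n grows while the third stays fixed, so the sequence approaches zero in the integral (weak) sense without approaching it in the squared-difference (strong) sense.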
00:26:13 When I’m talking about predicate,
00:26:16 I would like to formulate which integral properties
00:26:23 I would like to have for convergence.
00:26:27 So, and if I will take one predicate, function
00:26:33 which I measure property, if I will use one predicate
00:26:39 and say, I will consider only function which give me
00:26:44 the same value as this predicate,
00:26:47 I selecting set of functions from functions
00:26:53 which is admissible in the sense that function which I’m
00:26:58 looking for in this set of functions
00:27:01 because I checking in training data, it gives the same.
00:27:08 Yeah, so it always has to be connected to the training
00:27:10 data in terms of?
00:27:12 Yeah, but property, you can know independent on training data.
00:27:18 And this guy, Propp, says that there is formal property,
00:27:24 31 property.
00:27:25 A fairy tale, a Russian fairy tale.
00:27:27 But Russian fairy tale is not so interesting.
00:27:30 More interesting that people apply this to movies,
00:27:34 to theater, to different things.
00:27:38 And the same works, they’re universal.
00:27:41 Well, so I would argue that there’s
00:27:44 a little bit of a difference between the kinds of things
00:27:48 that were applied to which are essentially stories
00:27:51 and digit recognition.
00:27:54 It is the same story.
00:27:55 You’re saying digits, there’s a story within the digit.
00:27:59 Yeah.
00:28:00 And so but my point is why I hope
00:28:04 that it possible to beat record using not 60,000,
00:28:11 but say 100 times less.
00:28:13 Because instead, you will give predicates.
00:28:17 And you will select your decision
00:28:21 not from wide set of functions, but from set of functions
00:28:25 which keeps this predicates.
00:28:28 But predicate is not related just to digit recognition.
00:28:32 Right.
00:28:33 Like in Plato’s case.
00:28:37 Do you think it’s possible to automatically discover
00:28:40 the predicates?
00:28:42 So you basically said that the essence of intelligence
00:28:46 is the discovery of good predicates.
00:28:49 Yeah.
00:28:51 Now, the natural question is that’s
00:28:55 what Einstein was good at doing in physics.
00:28:59 Can we make machines do these kinds
00:29:02 of discovery of good predicates?
00:29:04 Or is this ultimately a human endeavor?
00:29:07 That I don’t know.
00:29:09 I don’t think that machine can do.
00:29:11 Because according to theory about weak convergence,
00:29:18 any function from Hilbert space can be predicate.
00:29:23 So you have infinite number of predicate in upper.
00:29:27 And before, you don’t know which predicate is good and which.
00:29:32 But what Propp showed, and why people call it breakthrough,
00:29:39 that there is not too many predicate
00:29:44 which cover most of situation happened in the world.
00:29:48 Right.
00:29:51 So there’s a sea of predicates.
00:29:54 And only a small amount
00:29:57 are useful for the kinds of things
00:29:58 that happen in the world.
00:30:01 I think that I would say only small part of predicate
00:30:07 very useful.
00:30:08 Useful all of them.
00:30:11 Only very few are what we should let’s call them
00:30:14 good predicates.
00:30:15 Very good predicates.
00:30:16 Very good predicates.
00:30:18 So can we linger on it?
00:30:20 What’s your intuition?
00:30:21 Why is it hard for a machine to discover good predicates?
00:30:27 Even in my talk described how to do predicate.
00:30:30 How to find new predicate.
00:30:32 I’m not sure that it is very good.
00:30:34 What did you propose in your talk?
00:30:36 No.
00:30:37 In my talk, I gave example for diabetes.
00:30:42 Diabetes, yeah.
00:30:43 When we achieve some percent.
00:30:46 So then we’re looking for area where
00:30:50 some sort of predicate, which I formulate,
00:30:54 does not keeps invariant.
00:31:03 So if it doesn’t keep, I retrain my data.
00:31:06 I select only function which keeps this invariant.
00:31:11 And when I did it, I improved my performance.
00:31:14 I can looking for this predicate.
00:31:16 I know technically how to do that.
00:31:19 And you can, of course, do it using machine.
00:31:25 But I’m not sure that we will construct the smartest
00:31:29 predicate.
00:31:30 But this is the, allow me to linger on it.
00:31:34 Because that’s the essence.
00:31:35 That’s the challenge.
00:31:36 That is artificial.
00:31:37 That’s the human level intelligence
00:31:40 that we seek is the discovery of these good predicates.
00:31:43 You’ve talked about deep learning as a way to,
00:31:47 the predicates they use and the functions are mediocre.
00:31:52 You can find better ones.
00:31:55 Let’s talk about deep learning.
00:31:57 Sure, let’s do it.
00:31:58 I know only Yann LeCun’s convolutional network.
00:32:04 And what else?
00:32:05 I don’t know.
00:32:05 And it’s a very simple convolution.
00:32:07 There’s not much else to know.
00:32:09 To pixel left and right.
00:32:10 I can do it like that with one predicate.
00:32:14 Convolution is a single predicate.
00:32:16 It’s single.
00:32:17 It’s single predicate.
00:32:21 Yes, but that’s it.
00:32:22 You know exactly.
00:32:23 You take the derivative for translation and predicate.
00:32:28 This should be kept.
00:32:31 So that’s a single predicate.
00:32:32 But humans discovered that one.
00:32:34 Or at least.
00:32:35 Not it.
00:32:36 That is a risk.
00:32:37 Not too many predicates.
00:32:38 And that is big story because Yann did it 25 years ago
00:32:43 and nothing so clear was added to deep network.
00:32:50 And then I don’t understand why we
00:32:55 should talk about deep network instead of talking
00:32:58 about piecewise linear functions which keeps this predicate.
00:33:02 Well, a counter argument is that maybe the amount
00:33:08 of predicates necessary to solve general intelligence,
00:33:14 say in the space of images, doing
00:33:16 efficient recognition of handwritten digits
00:33:20 is very small.
00:33:22 And so we shouldn’t be so obsessed about finding.
00:33:26 We’ll find other good predicates like convolution, for example.
00:33:30 There has been other advancements
00:33:33 like if you look at the work with attention,
00:33:37 there’s attention mechanisms, especially used
00:33:40 in natural language focusing the network’s ability
00:33:44 to learn at which part of the input to look at.
00:33:47 The thing is, there’s other things besides predicates
00:33:51 that are important for the actual engineering mechanism
00:33:55 of showing how much you can really
00:33:57 do given these predicates.
00:34:02 I mean, that’s essentially the work of deep learning
00:34:04 is constructing architectures that are able to be,
00:34:09 given the training data, to be able to converge
00:34:13 towards a function that can generalize well.
00:34:22 It’s an engineering problem.
00:34:24 Yeah, I understand.
00:34:26 But let’s talk not on emotional level,
00:34:29 but on a mathematical level.
00:34:31 You have set of piecewise linear functions.
00:34:36 It is all possible neural networks.
00:34:42 It’s just piecewise linear functions.
00:34:44 It’s many, many pieces.
00:34:45 Large number of piecewise linear functions.
00:34:47 Exactly.
00:34:48 Very large.
00:34:49 Very large.
00:34:50 Almost feels like too large.
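Editor's sketch (not from the conversation): the claim that neural networks are piecewise linear functions holds for networks built from ReLU units, and it can be checked on a tiny example. The weights below, the names `net`, `kinks`, and `is_affine_on`, and the midpoint check are all hypothetical choices for illustration. The midpoint check is a necessary condition for a segment to be affine, not a proof.

```python
def relu(z):
    return max(0.0, z)


# Hypothetical one-hidden-layer network with scalar input.
# Each hidden unit is (input weight w, bias b, output weight a).
HIDDEN = [(1.0, 0.0, 2.0), (-1.0, 1.0, -1.5), (2.0, -3.0, 0.5)]
OUTPUT_BIAS = 0.25


def net(x):
    """Network output: a weighted sum of ReLU units plus a bias."""
    return OUTPUT_BIAS + sum(a * relu(w * x + b) for w, b, a in HIDDEN)


def kinks():
    """Inputs where some hidden unit switches on or off."""
    return sorted(-b / w for w, b, _ in HIDDEN)


def is_affine_on(lo, hi, tol=1e-9):
    """Midpoint check: on an affine piece, net(mid) is the average of the endpoints."""
    mid = 0.5 * (lo + hi)
    return abs(net(mid) - 0.5 * (net(lo) + net(hi))) < tol


print("kinks:", kinks())
print("between kinks:", is_affine_on(0.1, 0.9))
print("across a kink:", is_affine_on(-1.0, 1.0))
```

Between two neighbouring kinks the network is a straight line; the pieces only change where a unit switches on or off, so more units give more pieces.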
00:34:51 It’s still simpler than, say, convolution,
00:34:56 than reproducing kernel Hilbert space, which
00:34:59 have a Hilbert set of functions.
00:35:00 What’s Hilbert space?
00:35:02 It’s space with infinite number of coordinates,
00:35:07 say, or Fourier expansion, something like that.
00:35:11 So it’s much richer.
00:35:14 And when I’m talking about closed form solution,
00:35:17 I’m talking about this set of function,
00:35:20 not piecewise linear set, which is particular case of it
00:35:29 is small part.
00:35:31 So neural networks is a small part
00:35:32 of the space of functions you’re talking about.
00:35:35 Say, small set of functions.
00:35:39 Let me take that.
00:35:40 But it is fine.
00:35:42 It is fine.
00:35:42 I don’t want to discuss the small or big.
00:35:46 You take advantage.
00:35:47 So you have some set of functions.
00:35:51 So now, when you’re trying to create architecture,
00:35:55 you would like to create admissible set of functions,
00:35:58 which all your tricks to use not all functions,
00:36:03 but some subset of this set of functions.
00:36:07 Say, when you’re introducing convolutional net,
00:36:10 it is way to make this subset useful for you.
00:36:16 But from my point of view, convolutional,
00:36:19 it is something you want to keep some invariants,
00:36:24 say, translation invariants.
00:36:27 But now, if you understand this and you cannot explain
00:36:35 on the level of ideas what neural network does,
00:36:41 you should agree that it is much better
00:36:44 to have a set of functions.
00:36:46 And they say, this set of functions should be admissible.
00:36:51 It must keep this invariant, this invariant,
00:36:53 and that invariant.
00:36:55 You know that as soon as you incorporate
00:36:58 new invariant, set of functions becomes smaller and smaller
00:37:01 and smaller.
00:37:02 But all the invariants are specified by you, the human.
00:37:06 Yeah, but what I hope that there is a standard predicate,
00:37:12 like Propp showed, that’s what I want
00:37:17 to find for digit recognition.
00:37:19 If we start, it is completely new area,
00:37:22 what is intelligence about on the level,
00:37:25 starting from Plato’s idea, what is world of ideas.
00:37:32 And I believe that is not too many.
00:37:36 But it is amusing that mathematicians doing something,
00:37:40 a neural network in general function,
00:37:44 but people from literature, from art, they use this all
00:37:48 the time.
00:37:49 That’s right.
00:37:50 Invariants saying, it is great how people describe music.
00:37:57 We should learn from that.
00:37:58 And something on this level.
00:38:02 But so why Vladimir Propp, who was just theoretical,
00:38:09 who studied theoretical literature, he found that.
00:38:12 You know what?
00:38:13 Let me throw that right back at you,
00:38:15 because there’s a little bit of a,
00:38:17 that’s less mathematical and more emotional, philosophical,
00:38:21 Vladimir Propp.
00:38:22 I mean, he wasn’t doing math.
00:38:24 No.
00:38:26 And you just said another emotional statement,
00:38:30 which is you believe that this Plato world of ideas is small.
00:38:35 I hope.
00:38:36 I hope.
00:38:38 Do you, what’s your intuition, though?
00:38:42 If we can linger on it.
00:38:44 You know, it is not just small or big.
00:38:48 I know exactly.
00:38:50 Then when I introducing some predicate,
00:38:56 I decrease set of functions.
00:38:59 But my goal to decrease set of function much.
00:39:04 By as much as possible.
00:39:05 By as much as possible.
00:39:07 Good predicate, which does this, then
00:39:11 I should choose next predicate, which decrease set
00:39:15 as much as possible.
00:39:17 So set of good predicate, it is such
00:39:21 that they decrease this amount of admissible function.
00:39:27 So if each good predicate significantly
00:39:30 reduces the set of admissible functions,
00:39:32 then there naturally should not be that many good predicates.
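The shrinking effect described here can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not from the conversation: the candidate set is all 256 Boolean functions on three bits, the target is a made-up majority-vote rule, there are six labeled examples, and an invariant for a predicate psi is taken to mean sum_i psi(x_i) f(x_i) = sum_i psi(x_i) y_i over those examples.

```python
from itertools import product

points = list(product([0, 1], repeat=3))            # all 8 possible inputs
target = {x: int(sum(x) >= 2) for x in points}      # hypothetical "true" rule: majority vote
train = points[:6]                                   # six labeled examples
labels = [target[x] for x in train]

# Every Boolean function on 3 bits, stored as a truth table (256 candidates).
candidates = [dict(zip(points, bits)) for bits in product([0, 1], repeat=8)]

# Illustrative predicates; the names and choices are hypothetical.
predicates = {
    "constant": lambda x: 1,
    "first bit": lambda x: x[0],
    "second bit": lambda x: x[1],
    "third bit": lambda x: x[2],
}

def keeps_invariant(f, psi):
    """True if f reproduces the predicate's statistic measured on the labeled data."""
    lhs = sum(psi(x) * f[x] for x in train)
    rhs = sum(psi(x) * y for x, y in zip(train, labels))
    return lhs == rhs

admissible = candidates
sizes = [len(admissible)]
for name, psi in predicates.items():
    admissible = [f for f in admissible if keeps_invariant(f, psi)]
    sizes.append(len(admissible))
    print(f"after predicate '{name}': {len(admissible)} admissible functions")
```

Each invariant can only remove functions, never add them, and the target rule always survives because it satisfies every invariant computed from its own labels.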
00:39:35 No, but if you reduce very well the VC dimension
00:39:43 of the function, of admissible set of function, it’s small.
00:39:46 And you need not too much training data to do well.
00:39:52 And VC dimension, by the way, is some measure of capacity
00:39:56 of this set of functions.
00:39:57 Right.
00:39:59 Roughly speaking, how many function in this set.
00:40:01 So you’re decreasing, decreasing.
00:40:03 And it makes easy for you to find function
00:40:08 you’re looking for.
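As a rough illustration of capacity, VC dimension can be computed by brute force for very small function classes: it is the size of the largest set of points on which the class realizes every possible labeling. The sketch below assumes two hypothetical classes on the integer points 0..5 (thresholds and intervals); neither class is mentioned in the conversation.

```python
from itertools import combinations

def is_shattered(points, hypotheses):
    """True if the hypotheses produce all 2^k labelings of the k given points."""
    labelings = {tuple(h(x) for x in points) for h in hypotheses}
    return len(labelings) == 2 ** len(points)

def vc_dimension(domain, hypotheses, max_size):
    """Largest k <= max_size such that some k-point subset of domain is shattered."""
    best = 0
    for k in range(1, max_size + 1):
        if any(is_shattered(s, hypotheses) for s in combinations(domain, k)):
            best = k
        else:
            break  # if no k-point set is shattered, no larger set can be
    return best

domain = list(range(6))

# Hypothetical class 1: thresholds, h_t(x) = 1 if x >= t.
thresholds = [lambda x, t=t: int(x >= t) for t in range(7)]

# Hypothetical class 2: intervals, h_{a,b}(x) = 1 if a <= x <= b.
intervals = [lambda x, a=a, b=b: int(a <= x <= b)
             for a in domain for b in domain if a <= b]

print("thresholds:", vc_dimension(domain, thresholds, 3))
print("intervals:", vc_dimension(domain, intervals, 3))
```

Thresholds cannot label a left point 1 and a right point 0, and intervals cannot label the outer two of three points 1 while the middle one is 0, which is what bounds their capacity.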
00:40:10 But the most important part, to create good admissible set
00:40:14 of functions.
00:40:15 And it probably, there are many ways.
00:40:18 But the good predicates such that they can do that.
00:40:25 So for this duck, you should know a little bit about duck.
00:40:30 Because what are the three fundamental laws of ducks?
00:40:35 Looks like a duck, swims like a duck, and quacks like a duck.
00:40:38 You should know something about ducks to be able to.
00:40:41 Not necessarily.
00:40:42 Looks like, say, horse.
00:40:44 It’s also good.
00:40:46 So it’s not, it generalizes from ducks.
00:40:49 And talk like, and make sound like horse or something.
00:40:54 And run like horse, and moves like horse.
00:40:57 It is general, it is general predicate
00:41:02 that this applied to duck.
00:41:04 But for duck, you can say, play chess like duck.
00:41:09 You cannot say play chess like duck.
00:41:11 Why not?
00:41:12 So you’re saying you can, but that would not be a good.
00:41:15 No, you will not reduce a lot of functions.
00:41:18 You would not do, yeah, you would not
00:41:19 reduce the set of functions.
00:41:21 So you can, the story is formal story, mathematical story.
00:41:26 Is that you can use any function you want as a predicate.
00:41:31 But some of them are good, some of them are not,
00:41:33 because some of them reduce a lot of functions
00:41:36 to admissible set, and some of them do not.
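The difference between a good predicate and a useless one can be sketched in the same toy style. The assumptions here are illustrative only: 256 Boolean functions on three bits, a made-up majority-vote target, labels known on all eight points, and a "plays chess" style predicate modeled as one that is zero on every example.

```python
from itertools import product

points = list(product([0, 1], repeat=3))
target = {x: int(sum(x) >= 2) for x in points}      # hypothetical target rule
candidates = [dict(zip(points, bits)) for bits in product([0, 1], repeat=8)]

def admissible_count(psi):
    """Number of candidate functions that keep the invariant for predicate psi."""
    rhs = sum(psi(x) * target[x] for x in points)
    return sum(
        1 for f in candidates
        if sum(psi(x) * f[x] for x in points) == rhs
    )

useful = lambda x: x[0]     # varies across examples, so it carries information
useless = lambda x: 0       # never true for any example, like "plays chess" for ducks

useful_count = admissible_count(useful)
useless_count = admissible_count(useless)
print("useful predicate leaves:", useful_count)
print("useless predicate leaves:", useless_count)
```

A predicate that is the same for every example constrains nothing, so the admissible set stays at its full size, while a predicate that varies across the data removes most candidates.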
00:41:39 But the question is, and I’ll probably
00:41:41 keep asking this question, but how do we find such,
00:41:45 what’s your intuition?
00:41:47 Handwritten recognition.
00:41:49 How do we find the answer to your challenge?
00:41:52 Yeah, I understand it like that.
00:41:55 I understand what.
00:41:57 What defined?
00:41:59 What it means, a new predicate.
00:42:01 Yeah.
00:42:02 Like guy who understand music can say this word,
00:42:06 which he described when he listened to music.
00:42:09 He understand music.
00:42:11 He use not too many different, oh, you can do like Propp.
00:42:15 You can make collection.
00:42:17 What he talking about music, about this, about that.
00:42:20 It’s not too many different situation he described.
00:42:24 Because we mentioned Vladimir Propp a bunch.
00:42:26 Let me just mention, there’s a sequence of 31
00:42:33 structural notions that are common in stories.
00:42:36 And I think.
00:42:37 You call it units.
00:42:38 Units.
00:42:39 And I think they resonate.
00:42:40 I mean, it starts just to give an example,
00:42:43 absentation, a member of the hero’s community
00:42:46 or family leaves the security of the home environment.
00:42:48 Then it goes to the interdiction,
00:42:51 a forbidding edict or command is passed upon the hero.
00:42:54 Don’t go there.
00:42:55 Don’t do this.
00:42:56 The hero is warned against some action.
00:42:58 Then step three, violation of interdiction.
00:43:05 Break the rules, break out on your own.
00:43:07 Then reconnaissance.
00:43:09 The villain makes an effort to attain knowledge
00:43:11 needed to fulfill their plan, and so on.
00:43:13 It goes on like this, ends in a wedding, number 31.
00:43:19 Happily ever after.
00:43:20 No, he just gave description of all situations.
00:43:26 He understands this world.
00:43:28 Of folktales.
00:43:29 Yeah, not folktales, but stories.
00:43:33 And these stories not in just folktales.
00:43:36 These stories in detective serials as well.
00:43:40 And probably in our lives.
00:43:42 We probably live.
00:43:43 Read this.
00:43:45 And then they wrote that this predicate is good
00:43:52 for different situation.
00:43:54 From movie, for theater.
00:43:57 By the way, there’s also criticism, right?
00:44:00 There’s another way to interpret narratives
00:44:03 from Claude Lévi-Strauss.
00:44:09 I don’t know.
00:44:10 I am not in this business.
00:44:12 No, I know, it’s theoretical literature,
00:44:14 but it’s looking at paradigms behind things.
00:44:15 It’s always the discussion, yeah.
00:44:20 But at least there is units.
00:44:23 It’s not too many units that can describe.
00:44:27 But this guy probably gives another units.
00:44:30 Or another way of…
00:44:31 Exactly, another set of units.
00:44:34 Another set of predicates.
00:44:35 It doesn’t matter how.
00:44:37 But they exist.
00:44:40 Probably.
00:44:40 My question is, whether given those units,
00:44:46 whether without our human brains to interpret these units,
00:44:50 they would still hold as much power as they have.
00:44:53 Meaning, are those units enough
00:44:56 when we give them to an alien species?
00:44:58 Let me ask you.
00:45:00 Do you understand digit images?
00:45:06 No, I don’t understand.
00:45:07 No, no, no.
00:45:08 When you can recognize these digit images,
00:45:11 it means that you understand.
00:45:13 Yes, exactly.
00:45:14 You understand characters, you understand…
00:45:17 No, no, no, no.
00:45:22 It’s the imitation versus understanding question,
00:45:25 because I don’t understand the mechanism
00:45:28 by which I understand.
00:45:29 No, no, no.
00:45:30 I’m not talking about, I’m talking about predicates.
00:45:32 You understand that it involves symmetry,
00:45:35 maybe structure, maybe something else.
00:45:37 I cannot formulate.
00:45:38 I just was able to find symmetries, degree of symmetries.
00:45:43 That’s really good.
00:45:44 So this is a good line.
00:45:47 I feel like I understand the basic elements
00:45:50 of what makes a good handwritten recognition system on my own.
00:45:54 Like symmetry connects with me.
00:45:56 It seems like that’s a very powerful predicate.
00:45:59 My question is, is there a lot more going on
00:46:02 that we’re not able to introspect?
00:46:04 Maybe I need to be able to understand
00:46:09 a huge amount in the world of ideas,
00:46:14 thousands of predicates, millions of predicates
00:46:18 in order to do handwritten recognition.
00:46:20 I don’t think so.
00:46:23 So both your hope and your intuition
00:46:26 are such that very few predicates are enough.
00:46:28 You’re using digits, you’re using examples as well.
00:46:33 Theory says that if you will use all possible functions
00:46:43 from Hilbert space, all possible predicate,
00:46:46 you don’t need training data.
00:46:49 You just will have admissible set of function
00:46:53 which contain one function.
00:46:56 Yes.
00:46:57 So the trade off is when you’re not using all predicates,
00:47:01 you’re only using a few good predicates
00:47:03 you need to have some training data.
00:47:05 Yes, exactly.
00:47:06 The more good predicates you have,
00:47:08 the less training data you need.
00:47:09 Exactly.
00:47:10 That is intelligence.
00:47:13 Still, okay, I’m gonna keep asking the same dumb question,
00:47:17 handwritten recognition to solve the challenge.
00:47:20 You kind of propose a challenge that says
00:47:21 we should be able to get state of the art MNIST error rates
00:47:27 by using very few, 60, maybe fewer examples per digit.
00:47:31 What kind of predicates do you think it will look like?
00:47:35 That is the challenge.
00:47:37 So people who will solve this problem,
00:47:39 they will answer.
00:47:41 Do you think they’ll be able to answer it
00:47:44 in a human explainable way?
00:47:47 They just need to write function, that’s it.
00:47:50 But so can that function be written, I guess,
00:47:54 by an automated reasoning system?
00:47:58 Whether we’re talking about a neural network
00:48:01 learning a particular function or another mechanism?
00:48:05 No, I’m not against neural network.
00:48:08 I’m against admissible set of function
00:48:11 which create neural network.
00:48:13 You did it by hand.
00:48:16 You don’t do it by invariants, by predicate, by reason.
00:48:24 But neural networks can then reverse,
00:48:26 do the reverse step of helping you find a function
00:48:29 that just, the task of a neural network
00:48:33 is to find a disentangled representation, for example,
00:48:38 that they call, is to find that one predicate function
00:48:42 that really captures some kind of essence.
00:48:45 One, not the entire essence, but one very useful essence
00:48:48 of this particular visual space.
00:48:52 Do you think that’s possible?
00:48:53 Listen, I’m grasping, hoping there’s an automated way
00:48:58 to find good predicates, right?
00:49:00 So the question is what are the mechanisms
00:49:03 of finding good predicates, ideas
00:49:05 that you think we should pursue?
00:49:08 A young grad student listening right now.
00:49:11 I gave example.
00:49:13 So find situation where predicate which you’re suggesting
00:49:23 don’t create invariant.
00:49:24 It’s like in physics.
00:49:28 Find situation where existing theory cannot explain it.
00:49:37 Find situation where the existing theory
00:49:39 can’t explain it.
00:49:40 So you’re finding contradictions.
00:49:42 Find contradiction, and then remove this contradiction.
00:49:46 But in my case, what means contradiction,
00:49:48 you find function which, if you will use this function,
00:49:53 you’re not keeping invariants.
00:49:56 This is really the process of discovering contradictions.
00:50:01 Yeah.
00:50:04 It is like in physics.
00:50:05 Find situation where you have contradiction
00:50:09 for one of the property, for one of the predicate.
00:50:15 Then include this predicate, making invariants,
00:50:19 and solve again this problem.
00:50:20 Now you don’t have contradiction.
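The brute-force loop described here (solve, find a predicate whose invariant is violated, add it, solve again) can be sketched on a toy problem. All specifics are assumptions for illustration: the candidate set is the 256 Boolean functions on three bits, the target is a made-up AND rule whose labels stand in for the training data, "solving" just means taking the first function in the current admissible set, and the predicate pool is hypothetical.

```python
from itertools import product

points = list(product([0, 1], repeat=3))
target = {x: x[0] & x[1] for x in points}     # hypothetical labels: AND of first two bits
candidates = [dict(zip(points, bits)) for bits in product([0, 1], repeat=8)]

# Hypothetical pool of predicates to test for contradictions.
pool = {
    "constant": lambda x: 1,
    "bit0": lambda x: x[0],
    "bit1": lambda x: x[1],
    "bit2": lambda x: x[2],
    "bit0*bit1": lambda x: x[0] * x[1],
}

def violation(f, psi):
    """How far f is from keeping the invariant for psi on the labeled data."""
    return abs(sum(psi(x) * (f[x] - target[x]) for x in points))

admissible = candidates
used = []
while True:
    f = admissible[0]                          # "solve": take any admissible function
    worst = max(pool, key=lambda name: violation(f, pool[name]))
    if violation(f, pool[worst]) == 0:
        break                                  # no contradiction left in the pool
    used.append(worst)                         # include the contradicted predicate
    admissible = [g for g in admissible if violation(g, pool[worst]) == 0]

print("predicates added:", used)
print("admissible functions left:", len(admissible))
```

The loop stops when the current solution contradicts none of the predicates in the pool. That does not guarantee the solution equals the target, only that the pool can no longer tell them apart.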
00:50:22 But it is not the best way, probably, I don’t know,
00:50:30 to looking for predicate.
00:50:31 That’s just one way, okay.
00:50:33 That, no, no, it is brute force way.
00:50:35 The brute force way.
00:50:37 What about the ideas of what,
00:50:42 big umbrella term of symbolic AI?
00:50:45 There’s what in the 80s with expert systems,
00:50:48 sort of logic reasoning based systems.
00:50:52 Is there hope there to find some,
00:50:57 through sort of deductive reasoning,
00:51:00 to find good predicates?
00:51:05 I don’t think so.
00:51:08 I think that just logic is not enough.
00:51:12 It’s kind of a compelling notion, though.
00:51:14 You know, that when smart people sit in a room
00:51:17 and reason through things, it seems compelling.
00:51:20 And making our machines do the same is also compelling.
00:51:24 So, everything is very simple.
00:51:29 When you have infinite number of predicate,
00:51:34 you can choose the function you want.
00:51:38 You have invariants and you can choose the function you want.
00:51:41 But you have to have not too many invariants
00:51:51 to solve the problem.
00:51:56 So, and have from infinite number of function
00:51:59 to select finite number
00:52:04 and hopefully small number of functions,
00:52:08 which is good enough to extract small set
00:52:14 of admissible functions.
00:52:17 So, they will be admissible, it’s for sure,
00:52:19 because every function just decrease set of function
00:52:23 and leaving it admissible.
00:52:25 But it will be small.
00:52:27 But why do you think logic based systems don’t,
00:52:32 can’t help, intuition, not?
00:52:35 Because you should know reality.
00:52:37 You should know life.
00:52:39 This guy like Propp, he knows something.
00:52:44 And he tried to put in invariant his understanding.
00:52:49 That’s the human, yeah, but see,
00:52:51 you’re putting too much value into Vladimir Propp
00:52:56 knowing something.
00:52:57 No, it is, in the story, what means you know life?
00:53:04 What it means?
00:53:05 You know common sense.
00:53:07 No, no, you know something.
00:53:10 Common sense, it is some rules.
00:53:13 You think so?
00:53:14 Common sense is simply rules?
00:53:17 Common sense is every, it’s mortality,
00:53:21 it’s fear of death, it’s love, it’s spirituality,
00:53:27 it’s happiness and sadness.
00:53:30 All of it is tied up into understanding gravity,
00:53:34 which is what we think of as common sense.
00:53:36 I don’t really need to discuss so wide.
00:53:39 I want to discuss, understand digit recognition.
00:53:45 Anytime I bring up love and death,
00:53:47 you bring it back to digit recognition, I like it.
00:53:51 No, you know, it is doable because there is a challenge.
00:53:55 Yeah.
00:53:56 Which I see how to solve it.
00:53:59 If I will have a student concentrate on this work,
00:54:02 I will suggest something to solve.
00:54:04 You mean handwritten recognition?
00:54:07 Yeah, it’s a beautifully simple, elegant, and yet.
00:54:10 I think that I know invariants which will solve this.
00:54:13 You do?
00:54:14 I think so, yes.
00:54:15 But it is not universal, it is maybe,
00:54:21 I want some universal invariants
00:54:24 which are good not only for digit recognition,
00:54:27 for image understanding.
00:54:28 So let me ask, how hard do you think
00:54:34 is 2D image understanding?
00:54:38 So if we, we can kind of intuit handwritten recognition.
00:54:43 How big of a step, leap, journey is it from that?
00:54:49 If I gave you good, if I solved your challenge
00:54:51 for handwritten recognition,
00:54:53 how long would my journey then be from that
00:54:56 to understanding more general, natural images?
00:54:59 Immediately, you will understand this
00:55:01 as soon as you will make a record.
00:55:05 Because it is not for free.
00:55:07 As soon as you will create several invariants
00:55:13 which will help you to get the same performance
00:55:20 that the best neural net did using 100,
00:55:23 there might be more than 100 times less examples,
00:55:27 you have to have something smart to do that.
00:55:31 And you’re saying?
00:55:32 That is invariant, it is predicate.
00:55:35 Because you should put some idea how to do that.
00:55:39 But okay, let me just pause.
00:55:42 Maybe it’s a trivial point, maybe not.
00:55:44 But handwritten recognition feels like a 2D,
00:55:48 two dimensional problem.
00:55:50 And it seems like how much complicated is the fact
00:55:55 that most images are projection of a three dimensional world
00:56:00 onto a 2D plane.
00:56:03 It feels like for a three dimensional world,
00:56:05 we need to start understanding common sense
00:56:08 in order to understand an image.
00:56:11 It’s no longer visual shape and symmetry.
00:56:17 It’s having to start to understand concepts
00:56:19 of, understand life.
00:56:22 Yeah, you’re talking that there are different invariant,
00:56:27 different predicate, yeah.
00:56:28 And potentially much larger number.
00:56:32 You know, maybe, but let’s start from simple.
00:56:36 Yeah, but you said that it would be immediate.
00:56:38 No, you know, I cannot think about things
00:56:41 which I don’t understand.
00:56:43 This I understand, but I’m sure that I don’t understand
00:56:46 everything there.
00:56:48 Yeah, that’s the difference.
00:56:50 Do as simple as possible, but not simpler.
00:56:54 And that is exact case.
00:56:56 With handwritten.
00:56:57 With handwritten.
00:56:58 Yeah, but that’s the difference between you and I.
00:57:04 I welcome and enjoy thinking about things
00:57:07 I completely don’t understand.
00:57:09 Because to me, it’s a natural extension
00:57:12 without having solved handwritten recognition
00:57:15 to wonder how difficult is the next step
00:57:23 of understanding 2D, 3D images.
00:57:25 Because ultimately, while the science of intelligence
00:57:29 is fascinating, it’s also fascinating to see
00:57:31 how that maps to the engineering of intelligence.
00:57:34 And recognizing handwritten digits is not,
00:57:39 doesn’t help you, it might, it may not help you
00:57:43 with the problem of general intelligence.
00:57:46 We don’t know.
00:57:47 It’ll help you a little bit.
00:57:48 We don’t know how much.
00:57:49 It’s unclear.
00:57:49 It’s unclear.
00:57:50 Yeah.
00:57:51 It might very much.
00:57:52 But I would like to make a remark.
00:57:53 Yes.
00:57:54 I start not from very primitive problem,
00:57:58 make a challenge problem.
00:58:03 I start with very general problem, with Plato.
00:58:07 So you understand, and it comes from Plato
00:58:10 to digit recognition.
00:58:14 So you basically took Plato and the world
00:58:18 of forms and ideas and mapped and projected
00:58:22 into the clearest, simplest formulation
00:58:25 of that big world.
00:58:26 You know, I would say that I did not understand Plato
00:58:31 until recently, and until I consider
00:58:36 the convergence and then predicate,
00:58:40 and then, oh, this is what Plato told.
00:58:45 So.
00:58:46 Can you linger on that?
00:58:47 Like why, how do you think about this world of ideas
00:58:50 and world of things in Plato?
00:58:52 No, it is metaphor.
00:58:54 It is.
00:58:55 It’s a metaphor, for sure.
00:58:55 Yeah.
00:58:56 It’s a compelling, it’s a poetic
00:58:57 and a beautiful metaphor.
00:58:58 Yeah, yeah, yeah.
00:58:59 But what, can you?
00:59:00 But it is a way how you should try to understand
00:59:04 how to talk ideas in the world.
00:59:07 So from my point of view,
00:59:11 it is very clear, but it is a line.
00:59:14 All the time, people looking for that.
00:59:17 Say, Plato, then Hegel, whatever reasonable it exists,
00:59:24 whatever exists, it is reasonable.
00:59:26 I don’t know what he have in mind reasonable.
00:59:30 Right, these philosophers again,
00:59:31 their words. No, no, no, no, no, no, no.
00:59:33 It is next step, Wigner.
00:59:37 That mathematics understand something of reality.
00:59:40 It is the same Plato line.
00:59:43 And then it comes suddenly to Vladimir Propp.
00:59:48 Look, 31 ideas, 31 units, and this describes everything.
00:59:54 There’s abstractions, ideas that represent our world.
00:59:59 Our world, and we should always try to reach into that.
01:00:03 Yeah, but you should make a projection on reality.
01:00:07 But understanding is, it is abstract ideas.
01:00:11 You have in your mind several abstract ideas
01:00:15 which you can apply to reality.
01:00:17 And reality in this case,
01:00:19 so if you look at machine learning as data.
01:00:21 This example, data.
01:00:22 Data.
01:00:24 Okay, let me put this on you
01:00:26 because I’m an emotional creature.
01:00:28 I’m not a mathematical creature like you.
01:00:30 I find compelling the idea,
01:00:33 forget the space, the sea of functions.
01:00:36 There’s also a sea of data in the world.
01:00:39 And I find compelling that there might be,
01:00:42 like you said, teacher,
01:00:44 small examples of data that are most useful
01:00:49 for discovering good,
01:00:53 whether it’s predicates or good functions,
01:00:55 that the selection of data may be a powerful journey,
01:01:00 a useful, you know, coming up with a mechanism
01:01:03 for selecting good data might be useful too.
01:01:07 Do you find this idea of finding the right data set
01:01:12 interesting at all?
01:01:14 Or do you kind of take the data set as a given?
01:01:17 I think that it is, you know, my theme is very simple.
01:01:22 You have huge set of functions.
01:01:25 If you will apply, and you have not too many data,
01:01:31 if you pick up function which describes this data,
01:01:37 you will do not very well.
01:01:41 You will.
01:01:42 Like randomly pick up.
01:01:42 Yeah, you will overfit.
01:01:43 Yeah, it will be overfitting.
01:01:46 So you should decrease set of function
01:01:50 from which you’re picking up one.
01:01:53 So you should go somehow to admissible set of function.
01:01:59 And this is what weak convergence is about.
01:02:03 So, but from another point of view,
01:02:08 to make admissible set of function,
01:02:13 you need just a DG, just function
01:02:15 which you will take in inner product,
01:02:19 which you will measure property of your function.
01:02:27 And that is how it works.
01:02:31 No, I get it, I get it, I understand it,
01:02:32 but do you, the reality is.
01:02:34 But let’s think about examples.
01:02:40 You have huge set of function,
01:02:41 and you have several examples.
01:02:44 If you just trying to keep, take function
01:02:50 which satisfies these examples, you still will overfit.
01:02:56 You need decrease, you need admissible set of function.
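The overfitting argument can be checked on a toy case. The setup is an assumption for illustration only: all 256 Boolean functions on three bits, a made-up majority-vote target, four labeled examples, and four unseen points.

```python
from itertools import product

points = list(product([0, 1], repeat=3))
target = {x: int(sum(x) >= 2) for x in points}      # hypothetical target rule
train, unseen = points[:4], points[4:]
candidates = [dict(zip(points, bits)) for bits in product([0, 1], repeat=8)]

# Functions from the huge set that satisfy every training example exactly.
consistent = [f for f in candidates if all(f[x] == target[x] for x in train)]

# Accuracy of each such function on the points it has never seen.
accuracies = [
    sum(f[x] == target[x] for x in unseen) / len(unseen)
    for f in consistent
]
mean_accuracy = sum(accuracies) / len(accuracies)

print("functions that fit the examples:", len(consistent))   # 16
print("their mean accuracy on unseen points:", mean_accuracy)  # 0.5
```

Sixteen functions fit the four examples perfectly, and on average they are no better than coin flipping on the unseen points, because nothing has narrowed the set of functions beyond the examples themselves.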
01:02:59 Absolutely, but what, say you have more data than functions.
01:03:06 So sort of consider the, I mean,
01:03:08 maybe not more data than functions,
01:03:09 because that’s impossible.
01:03:12 But what, I was trying to be poetic for a second.
01:03:15 I mean, you have a huge amount of data,
01:03:17 a huge amount of examples.
01:03:19 But amount of function can be even bigger.
01:03:22 It can get bigger, I understand.
01:03:24 Everything is.
01:03:25 There’s always a bigger boat.
01:03:27 Full Hilbert space.
01:03:29 I got you, but okay.
01:03:31 But you don’t find the world of data
01:03:35 to be an interesting optimization space.
01:03:38 Like the optimization should be in the space of functions.
01:03:45 Creating admissible set of functions.
01:03:47 Admissible set of functions.
01:03:48 No, you know, even from the classical VC theory,
01:03:54 from structural risk minimization,
01:03:56 you should organize function in the way
01:04:02 that they will be useful for you.
01:04:06 Right.
01:04:07 And that is admissible set.
01:04:10 The way you’re thinking about useful
01:04:13 is you’re given a small set of examples.
01:04:17 Useful small, small set of function
01:04:19 which contain function I’m looking for.
01:04:21 Yeah, but looking for based on
01:04:25 the empirical set of small examples.
01:04:27 Yeah, but that is another story.
01:04:29 I don’t touch it.
01:04:31 Because I believe that this small examples
01:04:35 is not too small.
01:04:37 Say 60 per class.
01:04:39 Law of large numbers works.
01:04:41 I don’t need uniform law.
01:04:43 The story is that in statistics there are two law.
01:04:46 Law of large numbers and uniform law of large numbers.
01:04:51 So I want to be in situation where I use
01:04:54 law of large numbers but not uniform law of large numbers.
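For reference, the two laws can be written side by side. The notation is an assumption introduced here, not taken from the conversation: $x_1, \dots, x_\ell$ are i.i.d. examples, $f$ is a bounded function, and $\mathcal{F}$ is a set of such functions.

```latex
% Law of large numbers: holds for one fixed function f
\lim_{\ell \to \infty}
P\left\{ \left| \frac{1}{\ell} \sum_{i=1}^{\ell} f(x_i) - \mathbb{E} f(x) \right| > \varepsilon \right\} = 0
\quad \text{for every } \varepsilon > 0 .

% Uniform law of large numbers: holds simultaneously over the whole set F
\lim_{\ell \to \infty}
P\left\{ \sup_{f \in \mathcal{F}}
\left| \frac{1}{\ell} \sum_{i=1}^{\ell} f(x_i) - \mathbb{E} f(x) \right| > \varepsilon \right\} = 0
\quad \text{for every } \varepsilon > 0 .
```

The only difference is the supremum over $\mathcal{F}$. The first statement is about a function chosen before seeing the data, such as a fixed predicate; the second is what is needed when the function is chosen after seeing the data.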
01:04:58 Right, so 60 is law of large, it’s large enough.
01:05:01 I hope, no, it still need some evaluations,
01:05:05 some bounds.
01:05:07 But the idea is the following that
01:05:11 if you trust that
01:05:15 say this average gives you something close to expectations
01:05:21 so you can talk about that, about this predicate.
01:05:26 And that is basis of human intelligence.
01:05:30 Good predicates is the,
01:05:32 the discovery of good predicates is the basis of human intelligence.
01:05:34 It is discovery of your understanding world.
01:05:39 Of your methodology of understanding world.
01:05:45 Because you have several function
01:05:47 which you will apply to reality.
01:05:51 Can you say that again?
01:05:52 So you’re…
01:05:54 You have several functions predicate.
01:05:58 But they’re abstract.
01:06:00 Yes.
01:06:01 Then you will apply them to reality, to your data.
01:06:04 And you will create in this way predicate.
01:06:07 Which is useful for your task.
01:06:11 But predicate are not related specifically to your task.
01:06:16 To this your task.
01:06:17 It is abstract functions.
01:06:20 Which being applying, applied to…
01:06:23 Many tasks that you might be interested in.
01:06:25 It might be many tasks, I don’t know.
01:06:27 Or…
01:06:28 Different tasks.
01:06:29 Well they should be many tasks, right?
01:06:31 I believe like, like in Propp case.
01:06:35 It was for fairytales, but it’s happened everywhere.
01:06:40 Okay, so we talked about images a little bit.
01:06:42 But, can we talk about Noam Chomsky for a second?
01:06:49 No, I believe I…
01:06:52 I don’t know him very well.
01:06:54 Personally, well…
01:06:55 Not personally, I don’t know.
01:06:57 His ideas.
01:06:57 His ideas.
01:06:58 Well let me just say,
01:06:59 do you think language, human language,
01:07:02 is essential to expressing ideas?
01:07:05 As Noam Chomsky believes.
01:07:08 So like, language is at the core
01:07:10 of our formation of predicates.
01:07:13 The human language.
01:07:14 For me, language and all the story of language
01:07:18 is very complicated.
01:07:20 I don’t understand this.
01:07:22 And I am not…
01:07:24 I thought about…
01:07:25 Nobody does.
01:07:26 I am not ready to work on that.
01:07:28 Because it’s so huge.
01:07:30 It is not for me, and I believe not for our century.
01:07:35 The 21st century.
01:07:37 Not for 21st century.
01:07:39 You should learn something, a lot of stuff,
01:07:42 from simple task like digit recognition.
01:07:45 So you think, okay, you think digit recognition,
01:07:49 2D image, how would you more abstractly define
01:07:55 digit recognition?
01:07:56 It’s 2D image, symbol recognition, essentially.
01:08:03 I mean, I’m trying to get a sense,
01:08:08 sort of thinking about it now,
01:08:09 having worked with MNIST forever,
01:08:12 how small of a subset is this
01:08:16 of the general vision recognition problem
01:08:18 and the general intelligence problem?
01:08:21 Is it…
01:08:24 Yeah.
01:08:25 Is it a giant subset?
01:08:26 Is it not?
01:08:27 And how far away is language?
01:08:30 You know, let me refer to Einstein.
01:08:34 Take the simplest problem, as simple as possible,
01:08:38 but not simpler.
01:08:39 And this is challenge, this simple problem.
01:08:44 But it’s simple by idea, but not simple to get it.
01:08:50 When you will do this, you will find some predicate,
01:08:55 which helps it a bit.
01:08:57 Well, yeah, I mean, with Einstein, you can,
01:09:01 you look at general relativity,
01:09:04 but that doesn’t help you with quantum mechanics.
01:09:07 That’s another story.
01:09:08 You don’t have any universal instrument.
01:09:11 Yes, so I’m trying to wonder which space we’re in,
01:09:16 whether handwritten recognition is like general relativity,
01:09:21 and then language is like quantum mechanics.
01:09:23 So you’re still gonna have to do a lot of math
01:09:27 to universalize it.
01:09:28 But I’m trying to see,
01:09:35 so what’s your intuition why handwritten recognition
01:09:39 is easier than language?
01:09:42 Just, I think a lot of people would agree with that,
01:09:45 but if you could elucidate sort of the intuition of why.
01:09:50 I don’t know, no, I don’t think in this direction.
01:09:56 I just think in directions that this is problem,
01:10:00 which if we will solve it well,
01:10:07 we will create some abstract understanding of images.
01:10:18 Maybe not all images.
01:10:19 I would like to talk to guys who doing in real images
01:10:24 in Columbia University.
01:10:26 What kind of images, unreal?
01:10:28 Real images.
01:10:29 Real images.
01:10:30 Yeah, what they’re ready, is there a predicate,
01:10:33 what can be predicate?
01:10:35 I still symmetry will play role in real life images,
01:10:40 in any real life images, 2D images.
01:10:43 Let’s talk about 2D images.
01:10:46 Because that’s what we know.
01:10:52 A neural network was created for 2D images.
01:10:55 So the people I know in vision science, for example,
01:10:58 the people who study human vision,
01:11:01 that they usually go to the world of symbols
01:11:04 and like handwritten recognition,
01:11:06 but not really, it’s other kinds of symbols
01:11:08 to study our visual perception system.
01:11:11 As far as I know, not much predicate type of thinking
01:11:15 is understood about our vision system.
01:11:17 They did not think in this direction.
01:11:19 They don’t, yeah, but how do you even begin
01:11:21 to think in that direction?
01:11:23 That’s a, I would like to discuss with them.
01:11:26 Yeah.
01:11:27 Because if we will be able to show that it is what working,
01:11:35 and theoretical scheme, it’s not so bad.
01:11:40 So the unfortunate, so if we compare to language,
01:11:43 language is like letters, finite set of letters,
01:11:46 and a finite set of ways you can put together those letters.
01:11:50 So it feels more amenable to kind of analysis.
01:11:53 With natural images, there is so many pixels.
01:11:58 No, no, no, letter, language is much, much more complicated.
01:12:03 It’s involved a lot of different stuff.
01:12:08 It’s not just understanding of very simple class of tasks.
01:12:15 I would like to see list of task with language involved.
01:12:19 Yes, so there’s a lot of nice benchmarks now
01:12:23 in natural language processing from the very trivial,
01:12:27 like understanding the elements of a sentence,
01:12:30 to question answering, to much more complicated
01:12:33 where you talk about open domain dialogue.
01:12:36 The natural question is, with handwritten recognition,
01:12:39 is really the first step of understanding
01:12:42 visual information.
01:12:44 Right.
01:12:46 But even our records show that we go in the wrong direction
01:12:54 because we need 60,000 digits.
01:12:56 So even this first step, so forget about talking
01:12:59 about the full journey, this first step
01:13:01 should be taking in the right direction.
01:13:03 No, no, wrong direction because 60,000 is unacceptable.
01:13:07 No, I’m saying it should be taken in the right direction
01:13:11 because 60,000 is not acceptable.
01:13:13 If you can talk, it’s great, we have half percent of error.
01:13:18 And hopefully the step from doing handwritten recognition
01:13:22 using very few examples, the step towards what babies do
01:13:26 when they crawl and understand their physical environment.
01:13:30 I know you don’t know about babies.
01:13:31 If you will do from very small examples,
01:13:36 you will find principles which are different
01:13:40 from what we’re using now.
01:13:44 And so it’s more or less clear.
01:13:48 That means that you will use weak convergence,
01:13:52 not just strong convergence.
01:13:54 Do you think these principles
01:13:58 will naturally be human interpretable?
01:14:01 Oh, yeah.
01:14:02 So like when we’ll be able to explain them
01:14:04 and have a nice presentation to show
01:14:06 what those principles are, or are they very,
01:14:10 going to be very kind of abstract kinds of functions?
01:14:14 For example, I talked yesterday about symmetry.
01:14:17 Yes.
01:14:18 And I gave very simple examples.
01:14:20 The same will be like that.
01:14:22 You gave like a predicate of a basic for?
01:14:24 For symmetries.
01:14:25 Yes, for different symmetries and you have for?
01:14:29 Degree of symmetries, that is important.
01:14:31 Not just symmetry.
01:14:33 Not exists or doesn’t exist, degree of symmetry.
01:14:38 Yeah, for handwritten recognition.
01:14:41 No, it’s not for handwritten, it’s for any images.
01:14:45 But I would like apply to handwritten.
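One way to turn "degree of symmetry" into a number is sketched below. This is only one possible definition, assumed here for illustration; the formula used in the lecture is not given in the conversation. Images are small hypothetical binary grids, and the score is the fraction of pixels that agree with their left-right mirror image.

```python
def mirror_symmetry_degree(image):
    """Return a score in [0, 1]; 1.0 means the image equals its left-right mirror."""
    mismatches = 0
    total = 0
    for row in image:
        mirrored = row[::-1]
        mismatches += sum(abs(a - b) for a, b in zip(row, mirrored))
        total += len(row)
    return 1.0 - mismatches / total

# Hypothetical 5x5 glyphs, 1 = ink, 0 = background.
zero_like = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 1, 0],
]
seven_like = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

print("zero-like glyph:", mirror_symmetry_degree(zero_like))
print("seven-like glyph:", mirror_symmetry_degree(seven_like))
```

Because the score is graded and not a yes-or-no answer, it can serve as a real-valued predicate on any 2D image, not only on digits.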
01:14:47 Right, in theory it’s more general, okay, okay.
01:14:55 So a lot of the things we’ve been talking about
01:14:58 falls, we’ve been talking about philosophy a little bit,
01:15:01 but also about mathematics and statistics.
01:15:05 A lot of it falls into this idea,
01:15:08 a universal idea of statistical theory of learning.
01:15:11 What is the most beautiful and sort of powerful
01:15:16 or essential idea you’ve come across,
01:15:19 even just for yourself personally in the world
01:15:22 of statistics or statistical theory of learning?
01:15:25 Probably uniform convergence, which we did
01:15:29 with Alexei Chilvonenkis.
01:15:33 Can you describe universal convergence?
01:15:36 You have law of large numbers.
01:15:40 So for any function, expectation of function,
01:15:44 average of function converged to expectation.
01:15:48 But if you have set of functions,
01:15:50 for any function it is true.
01:15:52 But it should converge simultaneously
01:15:55 for all set of functions.
01:15:59 And for learning, you need uniform convergence.
01:16:06 Just convergence is not enough.
01:16:11 Because when you pick up one which gives minimum,
01:16:16 you can pick up one function which does not converge
01:16:21 and it will give you the best answer for this function.
01:16:31 So you need uniform convergence to guarantee learning.
01:16:34 So learning does not rely on trivial law of large numbers,
01:16:40 it relies on uniform law.
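The distinction drawn here can be written out as follows. This is a sketch in standard textbook notation, which is an assumption and not taken from the conversation: $\ell$ is the sample size, $\mathcal{F}$ the set of functions, $R(f)$ the expected risk, $R_{\mathrm{emp}}(f)$ its empirical average over the sample, $f_\ell$ the empirical risk minimizer, and $f^*$ a minimizer of $R$ over $\mathcal{F}$ (assumed to exist).

```latex
% Law of large numbers: convergence for one fixed function f.
\[
\forall \varepsilon > 0:\quad
P\left\{ \left| \frac{1}{\ell}\sum_{i=1}^{\ell} f(x_i) - \mathbb{E} f(x) \right| > \varepsilon \right\}
\xrightarrow[\ell \to \infty]{} 0 .
\]

% Uniform convergence: the same statement simultaneously for the whole set F.
\[
\forall \varepsilon > 0:\quad
P\left\{ \sup_{f \in \mathcal{F}}
\left| \frac{1}{\ell}\sum_{i=1}^{\ell} f(x_i) - \mathbb{E} f(x) \right| > \varepsilon \right\}
\xrightarrow[\ell \to \infty]{} 0 .
\]

% Why picking the empirical minimizer needs the uniform version:
% the middle bracket is <= 0 because f_ell minimizes R_emp over F.
\[
R(f_\ell) - R(f^*)
= \bigl[ R(f_\ell) - R_{\mathrm{emp}}(f_\ell) \bigr]
+ \bigl[ R_{\mathrm{emp}}(f_\ell) - R_{\mathrm{emp}}(f^*) \bigr]
+ \bigl[ R_{\mathrm{emp}}(f^*) - R(f^*) \bigr]
\le 2 \sup_{f \in \mathcal{F}} \bigl| R_{\mathrm{emp}}(f) - R(f) \bigr| .
\]
```

The last line shows the point being made: because $f_\ell$ is chosen after seeing the data, a guarantee for each fixed function is not enough, and the bound has to hold for the worst function in the set.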
01:16:42 But idea of convergence exists in statistics for a long time.
01:16:51 But it is interesting that as I think about myself,
01:17:02 how stupid I was 50 years, I did not see weak convergence.
01:17:08 I work on strong convergence.
01:17:10 But now I think that most powerful is weak convergence.
01:17:15 Because it makes admissible set of functions.
01:17:18 And even in all proverbs,
01:17:22 when people try to understand recognition about duck law,
01:17:28 looks like a duck and so on, they use weak convergence.
01:17:32 People in language, they understand this.
01:17:34 But when we’re trying to create artificial intelligence,
01:17:42 we want to invent it in different way.
01:17:46 We just consider strong convergence arguments.
01:17:50 So reducing the set of admissible functions,
01:17:52 you think there should be effort put into understanding
01:17:58 the properties of weak convergence?
01:18:01 You know, in classical mathematics, in Hilbert space,
01:18:07 there are only two ways,
01:18:08 two form of convergence, strong and weak.
01:18:14 Now we can use both.
01:18:16 That means that we did everything.
01:18:21 And it so happened that when we use Hilbert space,
01:18:26 which is very rich space, space of continuous functions,
01:18:34 which has integral and square.
01:18:38 So we can apply weak and strong convergence for learning
01:18:42 and have closed form solution.
01:18:45 So for computationally simple.
01:18:47 For me, it is sign that it is right way.
01:18:51 Because you don’t need any heuristic here,
01:18:55 just do whatever you want.
01:18:59 But now the only what left is this concept
01:19:03 of what is predicate, but it is not statistics.
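For reference, the two modes of convergence in a Hilbert space $H$ can be stated as follows. These are the standard definitions, added here as a sketch; the symbols $f_n$, $f$, $\varphi$ are notation introduced for this note. The phrase "integral and square" above plausibly refers to the space of square-integrable functions, though that reading is an assumption.

```latex
% Strong convergence: convergence in the norm of H.
\[
f_n \to f \ \text{strongly}
\quad\Longleftrightarrow\quad
\lim_{n \to \infty} \lVert f_n - f \rVert = 0 .
\]

% Weak convergence: convergence of inner products against every element of H.
\[
f_n \to f \ \text{weakly}
\quad\Longleftrightarrow\quad
\lim_{n \to \infty} \langle f_n, \varphi \rangle = \langle f, \varphi \rangle
\quad \text{for all } \varphi \in H .
\]

% Strong convergence implies weak convergence (Cauchy--Schwarz inequality).
\[
\bigl| \langle f_n - f, \varphi \rangle \bigr|
\le \lVert f_n - f \rVert \, \lVert \varphi \rVert .
\]
```

One way to read the role of predicates in this conversation, offered as an interpretation and not a quotation, is that each predicate plays the part of one test function $\varphi$: requiring agreement on a chosen handful of such functions restricts the admissible set without demanding closeness in norm.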
01:19:08 By the way, I like the fact that you think that heuristics
01:19:11 are a mess that should be removed from the system.
01:19:14 So closed form solution is the ultimate goal.
01:19:18 No, it so happened that when you’re using right instrument,
01:19:23 you have closed form solution.
01:19:28 Do you think intelligence, human level intelligence,
01:19:32 when we create it,
01:19:37 will have something like a closed form solution?
01:19:42 You know, now I’m looking on bounds,
01:19:46 which I gave bounds for convergence.
01:19:51 And when I’m looking for bounds,
01:19:53 I’m thinking what is the most appropriate kernel
01:19:59 for this bound would be.
01:20:02 So we know that in say,
01:20:05 all our businesses, we use radial basis function.
01:20:11 But looking on the bound,
01:20:13 I think that I start to understand that maybe
01:20:17 we need to make corrections to radial basis function
01:20:21 to be closer to work better for this bounds.
01:20:28 So I’m again trying to understand what type of kernel
01:20:33 have best approximation,
01:20:37 best fit to this bound.
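For readers unfamiliar with the kernel mentioned here, the following is a minimal sketch of the standard Gaussian radial basis function kernel, $K(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$. The function names and the parameter name `gamma` are choices made for this illustration; the corrections to the kernel that are alluded to above are not specified in the conversation and are not shown.

```python
import math
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def gram_matrix(points: Sequence[Sequence[float]], gamma: float = 1.0) -> List[List[float]]:
    """Matrix of pairwise kernel values, as used by kernel methods such as SVMs."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]


# Illustrative data only: three points in the plane.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = gram_matrix(points, gamma=0.5)
```

The kernel value is 1 for identical points and decays toward 0 as points move apart; `gamma` controls how fast it decays, which is the kind of shape choice the search for a better-fitting kernel would revisit.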
01:20:43 Sure, so there’s a lot of interesting work
01:20:45 that could be done in discovering better functions
01:20:47 than radial basis functions for bounds you find.
01:20:53 It still comes from,
01:20:55 you’re looking to math and trying to understand what.
01:21:00 From your own mind, looking at the, I don’t know.
01:21:03 Then I’m trying to understand what will be good for that.
01:21:11 Yeah, but to me, there’s still a beauty.
01:21:14 Again, maybe I’m a descendant of Alan Turing to heuristics.
01:21:17 To me, ultimately, intelligence will be a mess of heuristics.
01:21:23 And that’s the engineering answer, I guess.
01:21:26 Absolutely.
01:21:27 When you’re doing say, self driving cars,
01:21:31 the great guy who will do this.
01:21:35 It doesn’t matter what theory behind that.
01:21:40 Who has a better feeling how to apply it.
01:21:43 But by the way, it is the same story about predicates.
01:21:50 Because you cannot create rule for,
01:21:53 situation is much more than you have rule for that.
01:21:56 But maybe you can have more abstract rule
01:22:04 than it will be less literal.
01:22:08 It is the same story about ideas
01:22:10 and ideas applied to specific cases.
01:22:16 But still you should reach.
01:22:17 You cannot avoid this.
01:22:18 Yes, of course.
01:22:19 But you should still reach for the ideas
01:22:21 to understand the science.
01:22:22 Okay, let me kind of ask, do you think neural networks
01:22:27 or functions can be made to reason?
01:22:34 So what do you think, we’ve been talking about intelligence,
01:22:37 but this idea of reasoning,
01:22:39 there’s an element of sequentially disassembling,
01:22:44 interpreting the images.
01:22:48 So when you think of handwritten recognition, we kind of think
01:22:54 that there’ll be a single, there’s an input and output.
01:22:56 There’s not a recurrence.
01:23:01 What do you think about sort of the idea of recurrence,
01:23:04 of going back to memory and thinking through this
01:23:06 sort of sequentially mangling the different representations
01:23:11 over and over until you arrive at a conclusion?
01:23:20 Or is ultimately all that can be wrapped up into a function?
01:23:23 No, you’re suggesting that let us use this type of algorithm.
01:23:29 When I started thinking, I first of all,
01:23:33 starting to understand what I want.
01:23:36 Can I write down what I want?
01:23:39 And then I’m trying to formalize.
01:23:45 And when I do that, I think I have to solve this problem.
01:23:52 And till now I did not see a situation where you need recurrence.
01:24:04 But do you observe human beings?
01:24:07 Yeah.
01:24:08 You try to, it’s the imitation question, right?
01:24:12 It seems that human beings reason
01:24:14 this kind of sequentially sort of,
01:24:20 does that inspire in you a thought that we need to add that
01:24:24 into our intelligence systems?
01:24:30 You’re saying, okay, I mean, you’ve kind of answered saying
01:24:34 until now I haven’t seen a need for it.
01:24:37 And so because of that, you don’t see a reason
01:24:40 to think about it.
01:24:41 You know, most of things I don’t understand.
01:24:45 In reasoning in human, it is for me too complicated.
01:24:52 For me, the most difficult part is to ask questions,
01:25:01 to good questions, how it works,
01:25:03 how people asking questions, I don’t know this.
01:25:11 You said that machine learning is not only
01:25:13 about technical things, speaking of questions,
01:25:16 but it’s also about philosophy.
01:25:19 So what role does philosophy play in machine learning?
01:25:23 We talked about Plato, but generally thinking
01:25:28 in this philosophical way, does it have,
01:25:32 how does philosophy and math fit together in your mind?
01:25:36 First ideas and then their implementation.
01:25:39 It’s like predicate, like say admissible set of functions.
01:25:48 It comes together, everything.
01:25:51 Because the first iteration of theory was done 50 years ago.
01:25:58 I told that, this is theory.
01:26:00 So everything’s there, if you have data you can,
01:26:04 and your set of function has not big capacity.
01:26:13 So low VC dimension, you can do that.
01:26:15 You can make structural risk minimization, control capacity.
01:26:21 But you was not able to make admissible set of function good.
01:26:26 Now when suddenly realize that we did not use
01:26:33 another idea of convergence, which we can,
01:26:39 everything comes together.
01:26:41 But those are mathematical notions.
01:26:43 Philosophy plays a role of simply saying
01:26:48 that we should be swimming in the space of ideas.
01:26:52 Let’s talk what is philosophy.
01:26:54 Philosophy means understanding of life.
01:26:58 So understanding of life, say people like Plato,
01:27:03 they understand on very high abstract level of life.
01:27:07 So, and whatever I doing,
01:27:12 just implementation of my understanding of life.
01:27:16 But every new step, it is very difficult.
01:27:21 For example, to find this idea
01:27:28 that we need weak convergence was not simple for me.
01:27:40 So that required thinking about life a little bit.
01:27:44 Hard to trace, but there was some thought process.
01:27:48 I’m working, I’m thinking about the same problem
01:27:52 for 50 years or more, and again, and again, and again.
01:28:00 I’m trying to be honest and that is very important.
01:28:02 Not to be very enthusiastic, but concentrate
01:28:06 on whatever we was not able to achieve, for example.
01:28:12 And understand why.
01:28:13 And now I understand that because I believe in math,
01:28:18 I believe that in Wigner’s idea.
01:28:23 But now when I see that there are only two way
01:28:28 of convergence and we’re using both,
01:28:32 that means that we must do as well as people doing.
01:28:37 But now, exactly in philosophy
01:28:42 and what we know about predicate,
01:28:45 how we understand life, can we describe as a predicate.
01:28:51 I thought about that and that is more or less obvious
01:28:57 level of symmetry.
01:29:00 But next, I have a feeling,
01:29:05 it’s something about structures.
01:29:09 But I don’t know how to formulate,
01:29:11 how to measure measure of structure and all this stuff.
01:29:16 And the guy who will solve this challenge problem,
01:29:22 then when we were looking how he did it,
01:29:27 probably just only symmetry is not enough.
01:29:30 But something like symmetry will be there.
01:29:33 Structure will be there.
01:29:34 Oh yeah, absolutely.
01:29:35 Symmetry will be there and level of symmetry will be there.
01:29:40 And level of symmetry, antisymmetry, diagonal, vertical.
01:29:44 And I even don’t know how you can use
01:29:48 in different direction idea of symmetry, it’s very general.
01:29:52 But it will be there.
01:29:54 I think that people very sensitive to idea of symmetry.
01:29:58 But there are several ideas like symmetry.
01:30:04 As I would like to learn.
01:30:07 But you cannot learn just thinking about that.
01:30:11 You should do challenging problems
01:30:14 and then analyze them, why it was able to solve them.
01:30:20 And then you will see.
01:30:22 Very simple things, it’s not easy to find.
01:30:25 But even with talking about this every time.
01:30:32 I was surprised, I tried to understand.
01:30:36 These people describe in language
01:30:40 strong convergence mechanism for learning.
01:30:44 I did not see, I don’t know.
01:30:46 But weak convergence, this duck story
01:30:50 and story like that when you will explain to kid,
01:30:54 you will use weak convergence argument.
01:30:57 It looks like it does like it does that.
01:31:00 But when you try to formalize, you’re just ignoring this.
01:31:05 Why, why 50 years from start of machine learning?
01:31:10 And that’s the role of philosophy, thinking about life.
01:31:12 I think that maybe, I don’t know.
01:31:18 Maybe this is theory also, we should blame for that
01:31:22 because empirical risk minimization and all this stuff.
01:31:27 And if you read now textbooks,
01:31:30 they just about bound about empirical risk minimization.
01:31:34 They don’t looking for another problem like admissible set.
01:31:41 But on the topic of life, perhaps we,
01:31:47 you could talk in Russian for a little bit.
01:31:50 What’s your favorite memory from childhood?
01:31:53 What’s your favorite memory from childhood?
01:31:56 Oh, music.
01:31:59 How about, can you try to answer in Russian?
01:32:02 Music?
01:32:04 It was very cool when…
01:32:08 What kind of music?
01:32:09 Classic music.
01:32:11 What’s your favorite?
01:32:13 Well, different composers.
01:32:15 At first, it was Vivaldi, I was surprised that it was possible.
01:32:23 And then when I understood Bach, I was absolutely shocked.
01:32:29 By the way, from him I think that there is a predicate,
01:32:35 like a structure.
01:32:36 In Bach?
01:32:37 Well, of course.
01:32:38 Because you can just feel the structure.
01:32:42 And I don’t think that different elements of life
01:32:49 are very much divided, in the sense of predicates.
01:32:53 Everywhere structure, in painting structure,
01:32:56 in human relations structure.
01:32:59 Here’s how to find these high level predicates, it’s…
01:33:05 In Bach and in life, everything is connected.
01:33:08 Now that we’re talking about Bach,
01:33:14 let’s switch back to English,
01:33:15 because I like Beethoven and Chopin, so…
01:33:18 Well, Chopin, it’s another amusing story.
01:33:21 But Bach, if we talk about predicates,
01:33:23 Bach probably has the most sort of
01:33:29 well defined predicates that underlie it.
01:33:31 It is very interesting to read what critics
01:33:36 are writing about Bach, which words they’re using.
01:33:40 They’re trying to describe predicates.
01:33:43 And then Chopin, it is very different vocabulary,
01:33:52 very different predicates.
01:33:55 And I think that if you will make collection of that,
01:34:02 so maybe from this you can describe predicate
01:34:05 for digit recognition as well.
01:34:08 From Bach and Chopin.
01:34:10 No, no, no, not from Bach and Chopin.
01:34:12 From the critic interpretation of the music, yeah.
01:34:15 When they’re trying to explain you music, what they use.
01:34:22 As they use, they describe high level ideas
01:34:25 of Plato’s ideas, what behind this music.
01:34:28 That’s brilliant.
01:34:29 So art is not self explanatory in some sense.
01:34:34 So you have to try to convert it into ideas.
01:34:39 It is ill-posed problem.
01:34:40 When you go from ideas to the representation,
01:34:46 it is easy way.
01:34:47 But when you’re trying to go back, it is ill-posed problem.
01:34:51 But nevertheless, I believe that when you’re looking
01:34:55 from that, even from art, you will be able to find
01:35:00 predicates for digit recognition.
01:35:02 That’s such a fascinating and powerful notion.
01:35:08 Do you ponder your own mortality?
01:35:11 Do you think about it?
01:35:12 Do you fear it?
01:35:13 Do you draw insight from it?
01:35:16 About mortality, no, yeah.
01:35:21 Are you afraid of death?
01:35:25 Not too much, not too much.
01:35:29 It is pity that I will not be able to do something
01:35:33 which I think I have a feeling to do that.
01:35:39 For example, I will be very happy to work with guys
01:35:48 theoretician from music to write this collection
01:35:52 of description, how they describe music,
01:35:55 how they use that predicate, and from art as well.
01:36:00 Then take what is in common and try to understand
01:36:04 predicate which is absolute for everything.
01:36:08 And then use that for visual recognition
01:36:10 and see if there is a connection.
01:36:12 Yeah, exactly.
01:36:13 Ah, there’s still time.
01:36:14 We got time.
01:36:16 Ha ha ha ha.
01:36:18 Yeah.
01:36:19 We got time.
01:36:20 It take years and years and years.
01:36:24 Yes, yeah, it’s a long way.
01:36:26 Well, see, you’ve got the patient mathematician’s mind.
01:36:30 I think it could be done very quickly and very beautifully.
01:36:34 I think it’s a really elegant idea.
01:36:35 Yeah, but also.
01:36:36 Some of many.
01:36:37 Yeah, you know, the most time,
01:36:40 it is not to make this collection to understand
01:36:45 what is the common to think about that once again
01:36:48 and again and again.
01:36:49 Again and again and again, but I think sometimes,
01:36:52 especially just when you say this idea now,
01:36:55 even just putting together the collection
01:36:58 and looking at the different sets of data,
01:37:03 language, trying to interpret music,
01:37:05 criticize music, and images,
01:37:08 I think there’ll be sparks of ideas that’ll come.
01:37:10 Of course, again and again, you’ll come up with better ideas,
01:37:13 but even just that notion is a beautiful notion.
01:37:16 I even have some example.
01:37:19 Yes, so I have friend
01:37:25 who was specialist in Russian poetry.
01:37:30 She is professor of Russian poetry.
01:37:35 She did not write poems,
01:37:39 but she know a lot of stuff.
01:37:43 She make book, several books,
01:37:48 and one of them is a collection of Russian poetry.
01:37:54 She have images of Russian poetry.
01:37:57 She collect all images of Russian poetry.
01:38:00 And I ask her to do following.
01:38:05 You have MNIST, digit recognition,
01:38:09 and we get 100 digits,
01:38:13 or maybe less than 100.
01:38:15 I don’t remember, maybe 50 digits.
01:38:18 And try from poetical point of view,
01:38:21 describe every image which she see,
01:38:25 using only words of images of Russian poetry.
01:38:31 And she did it.
01:38:34 And then we tried to,
01:38:41 I call it learning using privileged information.
01:38:43 I call it privileged information.
01:38:45 You have on two languages.
01:38:48 One language is just image of digit,
01:38:53 and another language, poetic description of this image.
01:38:57 And this is privileged information.
01:39:02 And there is an algorithm when you’re working
01:39:04 using privileged information, you’re doing better.
01:39:08 Much better, so.
01:39:10 So there’s something there.
01:39:11 Something there.
01:39:12 And there is a, in NEC,
01:39:16 she unfortunately died.
01:39:20 The collection of digits
01:39:24 in poetic descriptions of these digits.
01:39:29 Yeah.
01:39:30 So there’s something there in that poetic description.
01:39:32 But I think that there is a abstract ideas
01:39:38 on the Plato level of ideas.
01:39:40 Yeah, that they’re there.
01:39:42 That could be discovered.
01:39:43 And music seems to be a good entry point.
01:39:45 But as soon as we start with this challenge problem.
01:39:50 The challenge problem.
01:39:51 Listen.
01:39:52 It immediately connected to all this stuff.
01:39:55 Especially with your talk and this podcast,
01:39:58 and I’ll do whatever I can to advertise it.
01:40:00 It’s such a clean, beautiful Einstein like formulation
01:40:03 of the challenge before us.
01:40:05 Right.
01:40:06 Let me ask another absurd question.
01:40:09 We talked about mortality.
01:40:12 We talked about philosophy of life.
01:40:14 What do you think is the meaning of life?
01:40:17 What’s the predicate for mysterious existence here on earth?
01:40:29 I don’t know.
01:40:33 It’s very interesting how we have,
01:40:37 in Russia, I don’t know if you know the guy Strugatsky.
01:40:43 They are writing fiction.
01:40:46 They’re thinking about human, what’s going on.
01:40:51 And they have idea that there are developing
01:41:00 two type of people, common people and very smart people.
01:41:05 They just started.
01:41:06 And these two branches of people will go
01:41:10 in different direction very soon.
01:41:13 So that’s what they’re thinking about that.
01:41:18 So the purpose of life is to create two paths.
01:41:23 Two paths.
01:41:24 Of human societies.
01:41:25 Yes.
01:41:27 Simple people and more complicated people.
01:41:29 Which do you like best?
01:41:31 The simple people or the complicated ones?
01:41:34 I don’t know that it is just his fantasy,
01:41:38 but you know, every week we have guy
01:41:41 who is just a writer and also a theorist of literature.
01:41:51 And he explain how he understand literature
01:41:56 and human relationship.
01:41:58 How he see life.
01:42:00 And I understood that I’m just small kids
01:42:06 comparing to him.
01:42:09 He’s very smart guy in understanding life.
01:42:13 He knows this predicate.
01:42:15 He knows big blocks of life.
01:42:19 I am amazed every time when I listen to him.
01:42:24 And he just talking about literature.
01:42:27 And I think that I was surprised.
01:42:33 So the managers in big companies,
01:42:41 most of them are guys who study English language
01:42:48 and English literature.
01:42:51 So why?
01:42:52 Because they understand life.
01:42:54 They understand models.
01:42:57 And among them,
01:42:58 maybe many talented critics just analyzing this.
01:43:06 And this is big science like property.
01:43:10 This is blocks.
01:43:13 That’s very smart.
01:43:17 It amazes me that you are and continue to be humbled
01:43:21 by the brilliance of others.
01:43:22 I’m very modest about myself.
01:43:25 I see so smart guys around.
01:43:28 Well, let me be immodest for you.
01:43:31 You’re one of the greatest mathematicians,
01:43:33 statisticians of our time.
01:43:35 It’s truly an honor.
01:43:36 Thank you for talking again.
01:43:38 And let’s talk.
01:43:41 It is not.
01:43:43 I know my limits.
01:43:45 Let’s talk again when your challenge is taken on
01:43:49 and solved by grad student.
01:43:51 Especially when they use it.
01:43:55 It happens.
01:43:57 Maybe music will be involved.
01:43:58 Vladimir, thank you so much.
01:43:59 It’s been an honor. Thank you very much.
01:44:02 Thanks for listening to this conversation
01:44:04 with Vladimir Vapnik.
01:44:05 And thank you to our presenting sponsor, Cash App.
01:44:08 Download it, use code LexPodcast.
01:44:11 You’ll get $10 and $10 will go to FIRST,
01:44:14 an organization that inspires and educates young minds
01:44:17 to become science and technology innovators of tomorrow.
01:44:20 If you enjoy this podcast, subscribe on YouTube,
01:44:23 give us five stars on Apple Podcasts,
01:44:25 support it on Patreon,
01:44:26 or simply connect with me on Twitter at Lex Fridman.
01:44:31 And now, let me leave you with some words
01:44:33 from Vladimir Vapnik.
01:44:35 When solving a problem of interest,
01:44:37 do not solve a more general problem
01:44:40 as an intermediate step.
01:44:43 Thank you for listening.
01:44:44 I hope to see you next time.