Yoshua Bengio: Deep Learning #4

Transcript

00:00:00 What difference between biological neural networks and artificial neural networks

00:00:04 is most mysterious, captivating, and profound for you?

00:00:11 First of all, there’s so much we don’t know about biological neural networks,

00:00:15 and that’s very mysterious and captivating because maybe it holds the key to improving

00:00:21 artificial neural networks. One of the things I studied recently is something

00:00:29 that we don’t know how biological neural networks do, but that would be really useful for artificial ones:

00:00:37 the ability to do credit assignment through very long time spans. There are things that

00:00:46 we can in principle do with artificial neural nets, but it’s not very convenient and it’s

00:00:50 not biologically plausible. And this mismatch, I think this kind of mismatch

00:00:55 may be an interesting thing to study to, A, understand better how brains might do these

00:01:02 things because we don’t have good corresponding theories with artificial neural nets, and B,

00:01:09 maybe provide new ideas that we could explore about things that brains do differently and that

00:01:18 we could incorporate in artificial neural nets. So let’s break credit assignment up a little bit.

00:01:23 Yes. So what, it’s a beautifully technical term, but it could incorporate so many things. So is it

00:01:30 more on the RNN memory side, that thinking like that, or is it something about knowledge, building

00:01:37 up common sense knowledge over time? Or is it more in the reinforcement learning sense that you’re

00:01:44 picking up rewards over time for a particular, to achieve a certain kind of goal? So I was thinking

00:01:50 more about the first two meanings whereby we store all kinds of memories, episodic memories

00:01:59 in our brain, which we can access later in order to help us both infer causes of things that we

00:02:10 are observing now and assign credit to decisions or interpretations we came up with a while ago

00:02:20 when those memories were stored. And then we can change the way we would have reacted or interpreted

00:02:29 things in the past, and now that’s credit assignment used for learning.

00:02:33 So in which way do you think artificial neural networks, the current LSTM, the current architectures

00:02:43 are not able to capture that? Presumably you’re thinking of the very long term?

00:02:50 Yes. So the current nets are doing a fairly good job for sequences with dozens or

00:02:58 say hundreds of time steps. And then it gets harder and harder and depending on what you have

00:03:04 to remember and so on, as you consider longer durations. Whereas humans seem to be able to

00:03:12 do credit assignment through essentially arbitrary time spans, like I could remember something I did last

00:03:16 year. And then now because I see some new evidence, I’m going to change my mind about the way I was

00:03:23 thinking last year. And hopefully not make the same mistake again.
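
To make the long-span credit assignment problem described above concrete, here is a minimal PyTorch sketch (a toy construction of my own, not something from the conversation): the class label is carried only by the very first time step, so the learning signal has to be propagated back through the entire sequence, and training tends to get markedly harder as the sequence length T grows.

```python
# Toy illustration of long-span credit assignment (my construction, not from the talk):
# the label is encoded only at t = 0, followed by T - 1 steps of noise, so the gradient
# must survive backpropagation through the whole sequence to reach the useful input.
import torch
import torch.nn as nn

T, batch, hidden = 200, 64, 128   # try larger T to see learning degrade

class RecallLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: (batch, T, hidden)
        return self.head(out[:, -1])   # classify from the final hidden state

model = RecallLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(1000):
    y = torch.randint(0, 2, (batch,))      # the class to be remembered
    x = torch.randn(batch, T, 1)           # distractor noise
    x[:, 0, 0] = y.float() * 2 - 1         # the only informative time step
    loss = loss_fn(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```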

00:03:30 I think a big part of that is probably forgetting. You’re only remembering the really important

00:03:36 things. It’s very efficient forgetting.

00:03:40 Yes. So there’s a selection of what we remember. And I think there are really cool connections to

00:03:46 higher level cognition here regarding consciousness, deciding and emotions,

00:03:52 so deciding what comes to consciousness and what gets stored in memory, which are not trivial either.

00:04:00 So you’ve been at the forefront there all along, showing some of the amazing things that neural

00:04:07 networks, deep neural networks can do in the field of artificial intelligence, just broadly

00:04:12 in all kinds of applications. But we can talk about that forever. But what, in your view,

00:04:19 because we’re thinking towards the future, is the weakest aspect of the way deep neural networks

00:04:23 represent the world? What is that? What, in your view, is missing?

00:04:29 So current state of the art neural nets trained on large quantities of images or texts

00:04:38 have some level of understanding of, you know, what explains those data sets, but it’s very

00:04:45 basic, it’s very low level. And it’s not nearly as robust and abstract and general

00:04:54 as our understanding. Okay, so that doesn’t tell us how to fix things. But I think it encourages

00:05:02 us to think about how we can maybe train our neural nets differently, so that they would

00:05:14 focus, for example, on causal explanation, something that we don’t do currently with neural

00:05:20 net training. Also, one thing I’ll talk about in my talk this afternoon is the fact that

00:05:27 instead of learning separately from images and videos on one hand and from texts on the other

00:05:33 hand, we need to do a better job of jointly learning about language and about the world

00:05:42 to which it refers. So that, you know, both sides can help each other. We need to have good world

00:05:50 models in our neural nets for them to really understand sentences, which talk about what’s

00:05:57 going on in the world. And I think we need language input to help provide clues about

00:06:06 what high level concepts like semantic concepts should be represented at the top levels of our

00:06:13 neural nets. In fact, there is evidence that the purely unsupervised learning of representations

00:06:21 doesn’t give rise to high level representations that are as powerful as the ones we’re getting

00:06:28 from supervised learning. And so the clues we’re getting just with the labels, not even sentences,

00:06:35 are already very, very high level. And I think that’s a very important thing to keep in mind.

00:06:42 It’s already very powerful. Do you think that’s an architecture challenge or is it a data set challenge?

00:06:49 Neither. I’m tempted to just end it there. Can you elaborate slightly?

00:07:02 Of course, data sets and architectures are something you want to always play with. But

00:07:06 I think the crucial thing is more the training objectives, the training frameworks. For example,

00:07:13 going from passive observation of data to more active agents, which

00:07:22 learn by intervening in the world, the relationships between causes and effects,

00:07:27 the sort of objective functions, which could be important to allow the highest level explanations

00:07:36 to rise from the learning, which I don’t think we have now, the kinds of objective functions,

00:07:43 which could be used to reward exploration, the right kind of exploration. So these kinds of

00:07:50 questions are neither in the data set nor in the architecture, but more in how we learn,

00:07:57 under what objectives and so on. Yeah, I’ve heard you mention in several contexts, the idea of sort

00:08:04 of the way children learn, they interact with objects in the world. And it seems fascinating

00:08:08 because in some sense, except with some cases in reinforcement learning, that idea

00:08:15 is not part of the learning process in artificial neural networks. So it’s almost like,

00:08:21 do you envision something like an objective function saying, you know what, if you

00:08:29 poke this object in this kind of way, it would be really helpful for me to further learn.

00:08:36 Right, right.

00:08:37 Sort of almost guiding some aspect of the learning.

00:08:40 Right, right, right. So I was talking to Rebecca Saxe just a few minutes ago,

00:08:43 and she was talking about lots and lots of evidence that infants seem to clearly pick

00:08:52 what interests them in a directed way. And so they’re not passive learners, they focus their

00:09:03 attention on aspects of the world which are most interesting, surprising in a non-trivial way.

00:09:10 That makes them change their theories of the world.

00:09:16 So that’s a fascinating view of the future progress. But on a more maybe boring question,

00:09:26 do you think going deeper and larger, so do you think just increasing the size of the things that

00:09:33 have been increasing a lot in the past few years, is going to be a big thing?

00:09:38 I think increasing the size of the things that have been increasing a lot in the past few years

00:09:44 will also make significant progress. So some of the representational issues that you mentioned,

00:09:51 they’re kind of shallow, in some sense.

00:09:54 Oh, shallow in the sense of abstraction.

00:09:58 In the sense of abstraction, they’re not getting some…

00:10:00 I don’t think that having more depth in the network, in the sense that instead of 100 layers

00:10:06 you’re going to have even more layers, is going to solve that. I don’t think so. Is that obvious to you?

00:10:11 Yes. What is clear to me is that engineers and companies and labs and grad students will continue

00:10:19 to tune architectures and explore all kinds of tweaks to make the current state of the art

00:10:25 ever so slightly better. But I don’t think that’s going to be nearly enough. I think we need

00:10:31 changes in the way that we’re considering learning to achieve the goal that these learners actually

00:10:39 understand in a deep way the environment in which they are, you know, observing and acting.

00:10:46 But I guess I was trying to ask a question that’s more interesting than just more layers.

00:10:53 It’s basically, once you figure out a way to learn through interacting, how many parameters

00:11:00 does it take to store that information? So I think our brain is quite a bit bigger than most neural networks.

00:11:07 Right, right. Oh, I see what you mean. Oh, I’m with you there. So I agree that in order to

00:11:14 build neural nets with the kind of broad knowledge of the world that typical adult humans have,

00:11:20 probably the kind of computing power we have now is going to be insufficient.

00:11:25 So the good news is there are hardware companies building neural net chips. And so

00:11:30 it’s going to get better. However, the good news in a way, which is also bad news,

00:11:37 is that even our state of the art, deep learning methods fail to learn models that understand

00:11:46 even very simple environments, like some grid worlds that we have built.

00:11:52 Even these fairly simple environments, I mean, of course, if you train them with enough examples,

00:11:56 eventually they get it. But it’s just like, instead of what humans might need just

00:12:03 dozens of examples, these things will need millions for very, very, very simple tasks.

00:12:10 And so I think there’s an opportunity for academics who don’t have the kind of computing

00:12:16 power that, say, Google has to do really important and exciting research to advance

00:12:23 the state of the art in training frameworks, learning models, agent learning in even simple

00:12:30 environments that are synthetic, that seem trivial, but that current machine learning still fails on.
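
For readers unfamiliar with the term, here is a minimal sketch of the kind of synthetic grid world being referred to (a generic toy of my own, not one of the specific environments built in Bengio’s group): the rule is trivial for a human to state after a few episodes, yet a learner trained from scratch on reward alone typically needs far more experience.

```python
# A tiny grid world: the agent starts at (0, 0) and must reach the goal in the corner.
import random

class GridWorld:
    def __init__(self, size=5):
        self.size = size
        self.reset()

    def reset(self):
        self.agent = (0, 0)
        self.goal = (self.size - 1, self.size - 1)
        return self.agent

    def step(self, action):
        # actions: 0 = up, 1 = down, 2 = left, 3 = right
        x, y = self.agent
        dx, dy = [(0, -1), (0, 1), (-1, 0), (1, 0)][action]
        self.agent = (min(max(x + dx, 0), self.size - 1),
                      min(max(y + dy, 0), self.size - 1))
        done = self.agent == self.goal
        reward = 1.0 if done else 0.0
        return self.agent, reward, done

env = GridWorld()
state = env.reset()
for _ in range(20):                      # a random-policy rollout
    state, reward, done = env.step(random.randrange(4))
    if done:
        break
```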

00:12:38 We talked about priors and common sense knowledge. It seems like

00:12:43 we humans take a lot of knowledge for granted. So what’s your view of these priors of forming

00:12:52 this broad view of the world, this accumulation of information and how we can teach neural networks

00:12:58 or learning systems to pick that knowledge up? So knowledge: for a while in artificial

00:13:05 intelligence, maybe in the 80s, there was a time when knowledge representation, knowledge

00:13:14 acquisition, expert systems, I mean, symbolic AI, was a view, an interesting problem set to

00:13:22 solve, and it was kind of put on hold a little bit, it seems like. Because it doesn’t work.

00:13:27 It doesn’t work. That’s right. But that’s right. But the goals of that remain important.

00:13:34 Yes. Remain important. And how do you think those goals can be addressed?

00:13:39 Right. So first of all, I believe that one reason why the classical expert systems approach failed

00:13:48 is because a lot of the knowledge we have, so you talked about common sense intuition,

00:13:56 there’s a lot of knowledge like this, which is not consciously accessible.

00:14:01 There are lots of decisions we’re taking that we can’t really explain, even if sometimes we make

00:14:05 up a story. And that knowledge is also necessary for machines to take good decisions. And that

00:14:15 knowledge is hard to codify in expert systems, rule based systems and classical AI formalism.

00:14:22 And there are other issues, of course, with the old AI, like not really good ways of handling

00:14:29 uncertainty. And, I would say, something more subtle, which we understand better now but I think still

00:14:37 isn’t enough in the minds of people. There’s something really powerful that comes from

00:14:43 distributed representations, the thing that really makes neural nets work so well.

00:14:49 And it’s hard to replicate that kind of power in a symbolic world. The knowledge in expert systems

00:14:58 and so on is nicely decomposed into like a bunch of rules. Whereas if you think about a neural net,

00:15:04 it’s the opposite. You have this big blob of parameters which work intensely together to

00:15:10 represent everything the network knows. And it’s not sufficiently factorized. It’s not

00:15:16 sufficiently factorized. And so I think this is one of the weaknesses of current neural nets,

00:15:24 that we have to take lessons from classical AI in order to bring in another kind of compositionality,

00:15:32 which is common in language, for example, and in these rules, but that isn’t so native to neural

00:15:38 nets. And on that line of thinking, disentangled representations. Yes. So let me connect with

00:15:48 disentangled representations, if you don’t mind. So for many years, I’ve thought,

00:15:55 and I still believe that it’s really important that we come up with learning algorithms,

00:16:00 either unsupervised or supervised or reinforcement, whatever, that build representations

00:16:06 in which the important factors, hopefully causal factors are nicely separated and easy to pick up

00:16:13 from the representation. So that’s the idea of disentangled representations. It says transform

00:16:18 the data into a space where everything becomes easy. We can maybe just learn with linear models

00:16:25 about the things we care about. And I still think this is important, but I think this is missing out

00:16:30 on a very important ingredient, which classical AI systems can remind us of.
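
As a concrete reading of the "transform the data into a space where everything becomes easy" idea above, here is a minimal sketch (my assumptions throughout: a plain autoencoder stands in for the unsupervised learner, and a linear probe stands in for the "linear models about the things we care about"):

```python
# Two-phase sketch: learn a representation without labels, then see how far a purely
# linear model on the frozen representation can go. (My toy setup, not Bengio's method.)
import torch
import torch.nn as nn

d_in, d_rep, n_classes = 784, 32, 10

encoder = nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(), nn.Linear(256, d_rep))
decoder = nn.Sequential(nn.Linear(d_rep, 256), nn.ReLU(), nn.Linear(256, d_in))

def pretrain(x, steps=1000):
    """Unsupervised phase: reconstruction only, no labels."""
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
    for _ in range(steps):
        loss = nn.functional.mse_loss(decoder(encoder(x)), x)
        opt.zero_grad(); loss.backward(); opt.step()

def linear_probe(x, y, steps=1000):
    """Supervised phase: a single linear layer on the frozen representation."""
    probe = nn.Linear(d_rep, n_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
    with torch.no_grad():
        z = encoder(x)                 # frozen features
    for _ in range(steps):
        loss = nn.functional.cross_entropy(probe(z), y)
        opt.zero_grad(); loss.backward(); opt.step()
    return probe
```

If the learned representation really did separate out the important factors, the linear probe on frozen features would be enough; how far that holds in practice is exactly the open question being discussed.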

00:16:38 So let’s say we have these disentangled representations. You still need to learn about

00:16:43 the relationships between the variables, those high level semantic variables. They’re not going

00:16:47 to be independent. I mean, this is like too much of an assumption. They’re going to have some

00:16:52 interesting relationships that allow us to predict things in the future, to explain what happened

00:16:56 in the past. The kind of knowledge about those relationships in a classical AI system

00:17:01 is encoded in the rules. Like a rule is just like a little piece of knowledge that says,

00:17:06 oh, I have these two, three, four variables that are linked in this interesting way,

00:17:10 then I can say something about one or two of them given a couple of others, right?

00:17:14 In addition to disentangling the elements of the representation, which are like the variables

00:17:22 in a rule based system, you also need to disentangle the mechanisms that relate those

00:17:31 variables to each other. So like the rules. So the rules are neatly separated. Like each rule is,

00:17:37 you know, living on its own. And when I change a rule because I’m learning, it doesn’t need to

00:17:43 break other rules. Whereas current neural nets, for example, are very sensitive to what’s called

00:17:48 catastrophic forgetting, where after I’ve learned some things and then I learn new things,

00:17:54 they can destroy the old things that I had learned, right? If the knowledge was better

00:17:59 factorized and separated, disentangled, then you would avoid a lot of that.
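
Catastrophic forgetting is easy to reproduce; the following minimal sketch (my own toy setup, with deliberately incompatible labelings to make the effect extreme) trains one network on task A, then on task B, with nothing protecting the weights that mattered for task A:

```python
# Sequential training with no protection of old knowledge: task-A accuracy
# is measured before and after training on task B. (A toy demonstration.)
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

def train_task(x, y, steps=500):
    for _ in range(steps):
        loss = ce(net(x), y)
        opt.zero_grad(); loss.backward(); opt.step()

def accuracy(x, y):
    with torch.no_grad():
        return (net(x).argmax(1) == y).float().mean().item()

# task A and task B: same inputs, incompatible labelings (an extreme case)
x = torch.randn(512, 20)
y_a = (x[:, 0] > 0).long()
y_b = (x[:, 1] > 0).long()

train_task(x, y_a)
acc_a_before = accuracy(x, y_a)   # high
train_task(x, y_b)
acc_a_after = accuracy(x, y_a)    # typically much lower: the old task was overwritten
```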

00:18:06 Now, you can’t do this in the sensory domain.

00:18:10 What do you mean by sensory domain?

00:18:13 Like in pixel space. But my idea is that when you project the data in the right semantic space,

00:18:18 it becomes possible to now represent this extra knowledge beyond the transformation from inputs

00:18:25 to representations, which is how representations act on each other and predict the future and so on

00:18:31 in a way that can be neatly disentangled. So now it’s the rules that are disentangled from each

00:18:37 other and not just the variables that are disentangled from each other.

00:18:40 And you draw a distinction between semantic space and pixel, like does there need to be

00:18:45 an architectural difference?

00:18:46 Well, yeah. So there’s the sensory space, like pixels, where everything is entangled.

00:18:52 The information, like the variables are completely interdependent in very complicated ways.

00:18:58 And also computation, like it’s not just the variables, it’s also how they are related to

00:19:03 each other that is all intertwined. But I’m hypothesizing that in the right high level

00:19:11 representation space, both the variables and how they relate to each other can be

00:19:16 disentangled. And that will provide a lot of generalization power.

00:19:20 Generalization power.

00:19:22 Yes.

00:19:22 The distribution of the test set is assumed to be the same as the distribution of the training set.

00:19:29 Right. This is where current machine learning is too weak. It doesn’t tell us anything,

00:19:35 it’s not able to tell us anything about how our neural nets, say, are going to generalize to

00:19:40 a new distribution. And, you know, people may think, well, but there’s nothing we can say

00:19:45 if we don’t know what the new distribution will be. The truth is humans are able to generalize

00:19:50 to new distributions.

00:19:52 Yeah. How are we able to do that?

00:19:54 Yeah. Because there is something, these new distributions, even though they could look

00:19:57 very different from the training distributions, they have things in common. So let me give you

00:20:02 a concrete example. You read a science fiction novel. The science fiction novel, maybe, you

00:20:07 know, brings you to some other planet where things look very different on the surface,

00:20:15 but it’s still the same laws of physics. And so you can read the book and you understand

00:20:20 what’s going on. So the distribution is very different. But because you can transport

00:20:27 a lot of the knowledge you had from Earth about the underlying cause and effect relationships

00:20:33 and physical mechanisms and all that, and maybe even social interactions, you can now

00:20:38 make sense of what is going on on this planet where, like, visually, for example,

00:20:42 things are totally different.

00:20:45 Taking that analogy further and distorting it, let’s enter a science fiction world of,

00:20:50 say, 2001: A Space Odyssey, with HAL, which is probably one of my favorite AI movies.

00:20:59 Me too.

00:21:00 And then there’s another one that a lot of people love that may be a little bit outside

00:21:05 of the AI community, which is Ex Machina. I don’t know if you’ve seen it.

00:21:10 Yes. Yes.

00:21:11 By the way, what are your views on that movie? Are you able to enjoy it?

00:21:16 There are things I like and things I hate.

00:21:21 So you could talk about that in the context of a question I want to ask, which is, there’s

00:21:26 quite a large community of people from different backgrounds, often outside of AI, who are concerned

00:21:32 about existential threat of artificial intelligence. You’ve seen this community

00:21:37 develop over time. You’ve seen it; you have a perspective. So what do you think is the best

00:21:42 way to talk about AI safety, to think about it, to have discourse about it within AI community

00:21:48 and outside and grounded in the fact that Ex Machina is one of the main sources of information

00:21:54 for the general public about AI?

00:21:56 So I think you’re putting it right. There’s a big difference between the sort of discussion

00:22:02 we ought to have within the AI community and the sort of discussion that really matters

00:22:07 in the general public. So I think the picture of Terminator and AI loose and killing people

00:22:17 and super intelligence that’s going to destroy us, whatever we try, isn’t really so useful

00:22:24 for the public discussion. Because for the public discussion, the things I believe really

00:22:30 matter are the short term and medium term, very likely negative impacts of AI on society,

00:22:37 whether it’s from security, like, you know, big brother scenarios with face recognition

00:22:43 or killer robots, or the impact on the job market, or concentration of power and discrimination,

00:22:50 all kinds of social issues, which could actually, some of them could really threaten democracy,

00:22:57 for example.

00:22:58 Just to clarify, when you said killer robots, you mean autonomous weapon systems.

00:23:04 Yes, that’s right.

00:23:06 So I think these short and medium term concerns should be important parts of the public debate.

00:23:13 Now, existential risk, for me is a very unlikely consideration, but still worth academic investigation

00:23:24 in the same way that you could say, should we study what could happen if a meteorite, you

00:23:30 know, came to Earth and destroyed it. So I think it’s very unlikely that this is going

00:23:33 to happen, or at least happen in a reasonable future. The sort of scenario of an AI getting loose

00:23:43 goes against my understanding of at least current machine learning and current neural

00:23:46 nets and so on. It’s not plausible to me. But of course, I don’t have a crystal ball

00:23:51 and who knows what AI will be 50 years from now. So I think it is worthwhile that scientists

00:23:55 study those problems. It’s just not a pressing question as far as I’m concerned.

00:23:59 So before I continue down that line, I have a few questions there. But what do you like

00:24:05 and not like about Ex Machina as a movie? Because I actually watched it for the second

00:24:09 time and enjoyed it. I hated it the first time, and I enjoyed it quite a bit more the

00:24:15 second time when I sort of learned to accept certain pieces of it, see it as a concept

00:24:23 movie. What was your experience? What were your thoughts?

00:24:26 So the negative is that the picture it paints of science is totally wrong. Science in general

00:24:36 and AI in particular. Science is not happening in some hidden place by some, you know, really

00:24:44 smart guy, one person. This is totally unrealistic. This is not how it happens. Even a team of

00:24:52 people in some isolated place will not make it. Science moves by small steps, thanks to

00:24:59 the collaboration and community of a large number of people interacting. And all the

00:25:10 scientists who are expert in their field kind of know what is going on, even in the industrial

00:25:14 labs. Information flows and leaks and so on. And the spirit of it is very different

00:25:21 from the way science is painted in this movie.

00:25:25 Yeah, let me ask on that point. It’s been the case to this point that kind of even if

00:25:32 the research happens inside Google or Facebook, inside companies, it still kind of comes out,

00:25:36 ideas come out. Do you think that will always be the case with AI? Is it possible to bottle

00:25:41 ideas to the point where there’s a set of breakthroughs that go completely undiscovered

00:25:47 by the general research community? Do you think that’s even possible?

00:25:52 It’s possible, but it’s unlikely. It’s not how it is done now. It’s not how I can foresee

00:25:59 it in the foreseeable future. But of course, I don’t have a crystal ball, and science isn’t

00:26:09 a crystal ball either. And so who knows? This is science fiction after all.

00:26:14 I think it’s ominous that the lights went off during that discussion.

00:26:21 So the problem, again, there’s one thing is the movie and you could imagine all kinds

00:26:25 of science fiction. The problem for me, maybe similar to the question about existential

00:26:30 risk, is that this kind of movie paints such a wrong picture of what actual science is

00:26:39 and how it actually works that it can have unfortunate effects on people’s understanding of current

00:26:45 science. And so that’s kind of sad.

00:26:50 There’s an important principle in research, which is diversity. So in other words, research

00:26:58 is exploration. Research is exploration in the space of ideas. And different people will

00:27:03 focus on different directions. And this is not just good, it’s essential. So I’m totally

00:27:09 fine with people exploring directions that are contrary to mine or look orthogonal to

00:27:16 mine. I am more than fine. I think it’s important. I and my friends don’t claim we have universal

00:27:24 truth, especially about what will happen in the future. Now that being

00:27:29 said, we have our intuitions and then we act according to where we think we

00:27:36 can be most useful and where society has the most to gain or to lose. We should have those

00:27:42 debates and not end up in a society where there’s only one voice and one way of thinking

00:27:49 so that research money is spread out.

00:27:53 So disagreement is a sign of good research, good science.

00:27:59 Yes.

00:28:00 The idea of bias in the human sense of bias. How do you think about instilling in machine

00:28:08 learning something that’s aligned with human values in terms of bias? We intuitively as

00:28:15 human beings have a concept of what bias means, of what fundamental respect for other human

00:28:21 beings means. But how do we instill that into machine learning systems, do you think?

00:28:26 So I think there are short term things that are already happening and then there are long

00:28:32 term things that we need to do. In the short term, there are techniques that have been

00:28:38 proposed and I think will continue to be improved and maybe alternatives will come up to take

00:28:44 data sets in which we know there is bias, we can measure it. Pretty much any data set

00:28:50 where humans are being observed taking decisions will have some sort of bias, discrimination

00:28:55 against particular groups and so on.

00:28:59 And we can use machine learning techniques to try to build predictors, classifiers that

00:29:04 are going to be less biased. We can do it, for example, using adversarial methods to

00:29:11 make our systems less sensitive to these variables we should not be sensitive to.
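
A minimal sketch of that adversarial idea (a generic construction under my own assumptions, not a specific published system): an auxiliary adversary tries to recover the sensitive attribute s from the learned representation, and the main model is penalized whenever the adversary succeeds, which pushes the representation to carry less information about s.

```python
# Adversarial debiasing sketch: predictor fits the target y while the encoder is
# trained to confuse an adversary that tries to recover the sensitive attribute s.
import torch
import torch.nn as nn

d_in, d_rep = 20, 16
encoder   = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, d_rep))
predictor = nn.Linear(d_rep, 2)   # predicts the actual target y
adversary = nn.Linear(d_rep, 2)   # tries to predict the sensitive attribute s

opt_main = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()), lr=1e-3)
opt_adv  = torch.optim.Adam(adversary.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()
lam = 1.0                         # strength of the fairness penalty (my knob)

def train_step(x, y, s):
    # 1) update the adversary so it gets as good as it can at recovering s
    z = encoder(x).detach()
    adv_loss = ce(adversary(z), s)
    opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()

    # 2) update encoder + predictor: fit y, while making the adversary's job harder
    z = encoder(x)
    main_loss = ce(predictor(z), y) - lam * ce(adversary(z), s)
    opt_main.zero_grad(); main_loss.backward(); opt_main.step()
```

The lam coefficient is where the cost mentioned just below shows up: the larger it is, the less the representation can rely on s or its correlates, usually at some price in raw accuracy.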

00:29:18 So these are clear, well defined ways of trying to address the problem. Maybe they have weaknesses

00:29:23 and more research is needed and so on. But I think in fact they are sufficiently mature

00:29:28 that governments should start regulating companies where it matters, say like insurance companies,

00:29:35 so that they use those techniques. Because those techniques will probably reduce the

00:29:40 bias but at a cost. For example, maybe their predictions will be less accurate and so companies

00:29:46 will not do it until you force them.

00:29:48 All right, so this is short term. Long term, I’m really interested in thinking how we can

00:29:56 instill moral values into computers. Obviously, this is not something we’ll achieve in the

00:30:01 next five or 10 years. How can we, you know, there’s already work in detecting emotions,

00:30:08 for example, in images, in sounds, in texts, and also studying how different agents interacting

00:30:19 in different ways may correspond to patterns of, say, injustice, which could trigger anger.

00:30:28 So these are things we can do in the medium term and eventually train computers to model,

00:30:37 for example, how humans react emotionally. I would say the simplest thing is unfair situations

00:30:46 which trigger anger. This is one of the most basic emotions that we share with other animals.

00:30:52 I think it’s quite feasible within the next few years that we can build systems that can

00:30:57 detect these kinds of things to the extent, unfortunately, that they understand enough

00:31:01 about the world around us, which is a long time away. But maybe we can initially do this

00:31:08 in virtual environments. So you can imagine a video game where agents interact in some

00:31:14 ways and then some situations trigger an emotion. I think we could train machines to detect

00:31:21 those situations and predict that the particular emotion will likely be felt if a human was

00:31:27 playing one of the characters.

00:31:29 You have shown excitement and done a lot of excellent work with unsupervised learning.

00:31:35 But there’s been a lot of success on the supervised learning side.

00:31:39 Yes, yes.

00:31:40 And one of the things I’m really passionate about is how humans and robots work together.

00:31:46 And in the context of supervised learning, that means the process of annotation. Do you

00:31:52 think about the problem of annotation, put in a more interesting way, as humans teaching

00:32:00 machines?

00:32:01 Yes.

00:32:02 Is there?

00:32:03 Yes. I think it’s an important subject. Reducing it to annotation may be useful for somebody

00:32:09 building a system tomorrow. But longer term, the process of teaching, I think, is something

00:32:16 that deserves a lot more attention from the machine learning community. So there are people

00:32:19 who have coined the term machine teaching. So what are good strategies for teaching a

00:32:24 learning agent? And can we design and train a system that is going to be a good teacher?

00:32:33 So in my group, we have a project called BabyAI, or the BabyAI game, where there is a game or scenario

00:32:42 where there’s a learning agent and a teaching agent. Presumably, the teaching agent would

00:32:48 eventually be a human. But we’re not there yet. And the role of the teacher is to use

00:32:57 its knowledge of the environment, which it can acquire in whatever way, even brute force,

00:33:04 to help the learner learn as quickly as possible. So the learner is going to try to learn by

00:33:10 itself, maybe using some exploration and whatever. But the teacher can choose, can have an influence

00:33:19 on the interaction with the learner, so as to guide the learner, maybe teach it the things

00:33:27 that the learner has most trouble with, or just at the boundary between what it knows

00:33:30 and doesn’t know, and so on. So there’s a tradition of these kinds of ideas from other

00:33:36 fields, like tutoring systems, for example, in AI. And of course, people in the humanities

00:33:45 have been thinking about these questions. But I think it’s time that machine learning

00:33:48 people look at this, because in the future, we’ll have more and more human machine interaction

00:33:55 with the human in the loop. And I think understanding how to make this work better, all the problems

00:34:01 around that are very interesting and not sufficiently addressed. You’ve done a lot of work with

00:34:06 language, too. What aspect of the traditionally formulated Turing test, a test of natural

00:34:14 language understanding and generation, is in your eyes the most difficult part of conversation?

00:34:19 What in your eyes is the hardest part of conversation to solve for machines? So I would say it’s

00:34:25 everything having to do with the non-linguistic knowledge, which you implicitly need in order

00:34:32 to make sense of sentences, things like the Winograd schemas. So these are sentences that are

00:34:37 semantically ambiguous, like the classic "The trophy doesn’t fit in the suitcase because it is too big," where resolving what "it" refers to takes world knowledge. In other words, you need to understand enough about the world

00:34:43 in order to really interpret properly those sentences. I think these are interesting challenges

00:34:49 for machine learning, because they point in the direction of building systems that both

00:34:57 understand how the world works and the causal relationships in the world and associate that

00:35:03 knowledge with how to express it in language, either for reading or writing.

00:35:12 You speak French?

00:35:13 Yes, it’s my mother tongue.

00:35:14 It’s one of the romance languages. Do you think passing the Turing test and all the

00:35:20 underlying challenges we just mentioned depend on language? Do you think it might be easier

00:35:24 in French than it is in English, or is it independent of language?

00:35:28 I think it’s independent of language. I would like to build systems that can use the same

00:35:37 principles, the same learning mechanisms to learn from human agents, whatever their language.

00:35:46 Well, certainly we humans can talk more beautifully and smoothly in poetry. Being Russian originally,

00:35:53 I know poetry in Russian can maybe convey complex ideas more easily than it can in English.

00:36:02 But maybe I’m showing my bias and some people could say that about French. But of course,

00:36:09 the point ultimately is that our human brain is able to utilize any of those languages

00:36:15 as tools to convey meaning.

00:36:18 Yeah, of course, there are differences between languages, and maybe some are slightly better

00:36:22 at some things, but in the grand scheme of things, where we’re trying to understand how

00:36:26 the brain works and language and so on, I think these differences are minute.

00:36:32 So you’ve lived perhaps through an AI winter of sorts?

00:36:38 Yes.

00:36:39 How did you stay warm and continue your research?

00:36:44 Stay warm with friends.

00:36:45 With friends. Okay, so it’s important to have friends. And what have you learned from the

00:36:51 experience?

00:36:53 Listen to your inner voice. Don’t, you know, just be trying to please the crowds and the

00:37:02 fashion. And if you have a strong intuition about something that is not contradicted by

00:37:10 actual evidence, go for it. I mean, it could be contradicted by people.

00:37:17 But not your own instinct, based on everything you’ve learned?

00:37:20 Of course, you have to adapt your beliefs when your experiments contradict those beliefs.

00:37:28 But otherwise, you have to stick to your beliefs. It’s what allowed me to go through those years.

00:37:35 It’s what allowed me to persist in directions that, you know, whatever other

00:37:42 people think, took time to mature and bear fruit.

00:37:48 So history of AI is marked with these, of course, it’s marked with technical breakthroughs,

00:37:54 but it’s also marked with these seminal events that capture the imagination of the community.

00:38:00 Most recent, I would say, AlphaGo beating the world champion human Go player was one

00:38:06 of those moments. What do you think the next such moment might be?

00:38:12 Okay, so first of all, I think that these so-called seminal events are overrated. As

00:38:22 I said, science really moves by small steps. Now what happens is you make one more small

00:38:30 step and it’s like the drop that, you know, that fills the bucket and then you have drastic

00:38:39 consequences because now you’re able to do something you were not able to do before.

00:38:43 Or now, say, the cost of building some device or solving a problem becomes cheaper than

00:38:49 what existed and you have a new market that opens up, right? So especially in the world

00:38:53 of commerce and applications, the impact of a small scientific progress could be huge.

00:39:03 But in the science itself, I think it’s very, very gradual.

00:39:07 And where are these steps being taken now? So there’s unsupervised learning.

00:39:13 So if I look at one trend that I like in my community, so for example, at Mila, my institute,

00:39:23 what are the two hottest topics? GANs and reinforcement learning. Even though in Montreal

00:39:31 in particular, reinforcement learning was something pretty much absent just two or three

00:39:37 years ago. So there’s really a big interest from students and there’s a big interest from

00:39:44 people like me. So I would say this is something where we’re going to see more progress, even

00:39:51 though it hasn’t yet provided much in terms of actual industrial fallout. Like even though

00:39:58 there’s AlphaGo, there’s no, like Google is not making money on this right now. But I

00:40:03 think over the long term, this is really, really important for many reasons.

00:40:08 So in other words, I would say reinforcement learning, or maybe more generally agent learning,

00:40:13 because it doesn’t have to be with rewards. It could be in all kinds of ways that an agent

00:40:17 is learning about its environment.

00:40:20 Now, reinforcement learning you’re excited about. Do you think GANs could provide something

00:40:28 at the moment? Well, GANs or other generative models, I believe, will be crucial ingredients

00:40:38 in building agents that can understand the world. A lot of the successes in reinforcement

00:40:45 learning in the past has been with policy gradient, where you just learn a policy, you

00:40:51 don’t actually learn a model of the world. But there are lots of issues with that. And

00:40:55 we don’t know how to do model based RL right now. But I think this is where we have to

00:41:00 go in order to build models that can generalize faster and better, like to new distributions

00:41:09 that capture, to some extent at least, the underlying causal mechanisms in the world.
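
To pin down the distinction being drawn here, a minimal sketch (my own assumptions, not a specific algorithm from the conversation) of the two losses involved: a REINFORCE-style policy-gradient objective, which never models the environment, next to the extra ingredient of model-based RL, a learned dynamics model that predicts the next state.

```python
# Policy-gradient loss (model-free) versus an additional learned world model p(s' | s, a).
import torch
import torch.nn as nn

obs_dim, n_actions = 8, 4
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))
world_model = nn.Sequential(nn.Linear(obs_dim + n_actions, 64), nn.Tanh(),
                            nn.Linear(64, obs_dim))   # predicts the next state

def policy_gradient_loss(states, actions, returns):
    """REINFORCE: push up log-probabilities of actions in proportion to their return."""
    logp = torch.log_softmax(policy(states), dim=-1)
    chosen = logp.gather(1, actions.unsqueeze(1)).squeeze(1)
    return -(chosen * returns).mean()

def model_loss(states, actions, next_states):
    """Model-based ingredient: learn to predict what the environment will do."""
    a_onehot = nn.functional.one_hot(actions, n_actions).float()
    pred = world_model(torch.cat([states, a_onehot], dim=-1))
    return nn.functional.mse_loss(pred, next_states)
```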

00:41:16 Last question. What made you fall in love with artificial intelligence? If you look

00:41:21 back, what was the first moment in your life when you were fascinated by either the human

00:41:28 mind or the artificial mind?

00:41:31 You know, when I was an adolescent, I was reading a lot. And then I started reading

00:41:35 science fiction.

00:41:36 There you go.

00:41:37 That’s it. That’s where I got hooked. And then, you know, I had one of the first personal

00:41:46 computers and I got hooked on programming. And so it just, you know,

00:41:52 Start with fiction and then make it a reality.

00:41:54 That’s right.

00:41:55 Yoshua, thank you so much for talking to me.

00:41:57 My pleasure.