Oriol Vinyals: DeepMind AlphaStar, StarCraft, and Language #20

Transcript

00:00:00 The following is a conversation with Oriol Vinyals.

00:00:03 He’s a senior research scientist at Google DeepMind,

00:00:05 and before that, he was at Google Brain and Berkeley.

00:00:09 His research has been cited over 39,000 times.

00:00:13 He’s truly one of the most brilliant and impactful minds

00:00:16 in the field of deep learning.

00:00:18 He’s behind some of the biggest papers and ideas in AI,

00:00:20 including sequence to sequence learning,

00:00:23 audio generation, image captioning,

00:00:25 neural machine translation,

00:00:27 and, of course, reinforcement learning.

00:00:29 He’s a lead researcher of the AlphaStar project,

00:00:32 creating an agent that defeated a top professional

00:00:35 at the game of StarCraft.

00:00:38 This conversation is part

00:00:39 of the Artificial Intelligence podcast.

00:00:41 If you enjoy it, subscribe on YouTube, iTunes,

00:00:44 or simply connect with me on Twitter at Lex Fridman,

00:00:48 spelled F R I D.

00:00:51 And now, here’s my conversation with Oriol Vinyals.

00:00:55 You spearheaded the DeepMind team behind AlphaStar

00:00:59 that recently beat a top professional player at StarCraft.

00:01:04 So you have an incredible wealth of work

00:01:07 in deep learning and a bunch of fields,

00:01:09 but let’s talk about StarCraft first.

00:01:11 Let’s go back to the very beginning,

00:01:13 even before AlphaStar, before DeepMind,

00:01:16 before deep learning first.

00:01:18 What came first for you,

00:01:21 a love for programming or a love for video games?

00:01:24 I think for me, it definitely came first

00:01:28 the drive to play video games.

00:01:31 I really liked computers.

00:01:35 I didn’t really code much, but what I would do is

00:01:38 I would just mess with the computer, break it and fix it.

00:01:42 That was the level of skills, I guess,

00:01:43 that I gained in my very early days,

00:01:46 I mean, when I was 10 or 11.

00:01:48 And then I really got into video games,

00:01:50 especially StarCraft, actually, the first version.

00:01:53 I spent most of my time

00:01:55 just playing kind of pseudo professionally,

00:01:57 as professionally as you could play back in 98 in Europe,

00:02:01 which was not a very mainstream scene

00:02:03 like what’s nowadays called esports.

00:02:05 Right, of course, in the 90s.

00:02:07 So how’d you get into StarCraft?

00:02:09 What was your favorite race?

00:02:11 How did you develop your skill?

00:02:15 What was your strategy?

00:02:16 All that kind of thing.

00:02:18 So as a player, I tended to try to play not many games,

00:02:21 not to kind of disclose the strategies

00:02:23 that I kind of developed.

00:02:25 And I like to play random, actually,

00:02:27 not in competitions, but just to…

00:02:30 I think in StarCraft, there’s three main races

00:02:33 and I found it very useful to play with all of them.

00:02:36 And so I would choose random many times,

00:02:38 even sometimes in tournaments,

00:02:40 to gain skill on the three races

00:02:42 because it’s not just how you play against someone:

00:02:45 if you understand a race because you’ve played it,

00:02:48 you also understand what’s annoying,

00:02:51 then when you’re on the other side,

00:02:52 what to do to annoy that person,

00:02:54 to try to gain advantages here and there and so on.

00:02:57 So I actually played random,

00:02:59 although I must say in terms of favorite race,

00:03:02 I really liked Zerg.

00:03:03 I was probably best at Zerg

00:03:05 and that’s probably what I tend to use

00:03:08 towards the end of my career before starting university.

00:03:11 So let’s step back a little bit.

00:03:13 Could you try to describe StarCraft

00:03:15 to people that may never have played video games,

00:03:18 especially the massively online variety like StarCraft?

00:03:22 So StarCraft is a real time strategy game.

00:03:25 And the way to think about StarCraft,

00:03:27 perhaps if you understand a bit chess,

00:03:30 is that there’s a board which is called map

00:03:34 or the map where people play against each other.

00:03:39 There’s obviously many ways you can play,

00:03:40 but the most interesting one is the one versus one setup

00:03:44 where you just play against someone else

00:03:47 or even the built in AI, right?

00:03:49 Blizzard put a system that can play the game

00:03:51 reasonably well if you don’t know how to play.

00:03:54 And then in this board, you have again,

00:03:57 pieces like in chess,

00:03:58 but these pieces are not there initially

00:04:01 like they are in chess.

00:04:02 You actually need to decide to gather resources

00:04:05 to decide which pieces to build.

00:04:07 So in a way you’re starting almost with no pieces.

00:04:10 You start gathering resources in StarCraft.

00:04:13 There’s minerals and gas that you can gather.

00:04:16 And then you must decide how much do you wanna focus

00:04:19 for instance, on gathering more resources

00:04:21 or starting to build units or pieces.

00:04:24 And then once you have enough pieces

00:04:27 or maybe like attack, a good attack composition,

00:04:32 then you go and attack the other side of the map.

00:04:35 And now the other main difference with chess

00:04:37 is that you don’t see the other side of the map.

00:04:39 So you’re not seeing the moves of the enemy.

00:04:43 It’s what we call partially observable.

00:04:45 So as a result, you must not only decide

00:04:48 trading off economy versus building your own units,

00:04:52 but you also must decide whether you wanna scout

00:04:54 to gather information, but also by scouting,

00:04:57 you might be giving away some information

00:04:59 that you might be hiding from the enemy.

00:05:01 So there’s a lot of complex decision making

00:05:04 all in real time.

00:05:06 There’s also unlike chess, this is not a turn based game.

00:05:10 You play basically all the time continuously

00:05:13 and thus some skill in terms of speed

00:05:16 and accuracy of clicking is also very important.

00:05:18 And people that train for this really play this game

00:05:21 at an amazing skill level.

00:05:23 I’ve seen many times these

00:05:25 and if you can witness this live,

00:05:27 it’s really, really impressive.

00:05:29 So in a way, it’s kind of a chess

00:05:31 where you don’t see the other side of the board,

00:05:33 you’re building your own pieces

00:05:35 and you also need to gather resources

00:05:37 to basically get some money to build other buildings,

00:05:40 pieces, technology and so on.

00:05:42 From the perspective of a human player,

00:05:45 the difference between that and chess

00:05:47 or maybe that and a game like turn based strategy

00:05:50 like Heroes of Might and Magic is that there’s an anxiety

00:05:55 because you have to make these decisions really quickly.

00:05:58 And if you are not actually aware of what decisions work,

00:06:04 it’s a very stressful balance.

00:06:06 Everything you describe is actually quite stressful,

00:06:08 difficult to balance for an amateur human player.

00:06:11 I don’t know if it gets easier at the professional level,

00:06:14 like if they’re fully aware of what they have to do,

00:06:16 but at the amateur level, there’s this anxiety.

00:06:19 Oh crap, I’m being attacked.

00:06:20 Oh crap, I have to build up resource.

00:06:22 Oh, I have to probably expand.

00:06:24 And all these, the time,

00:06:26 the real time strategy aspect is really stressful

00:06:29 and computationally I’m sure difficult.

00:06:31 We’ll get into it.

00:06:32 But for me, Battle.net,

00:06:35 so StarCraft was released in 98, 20 years ago,

00:06:42 which is hard to believe.

00:06:44 And Blizzard Battle.net with Diablo in 96 came out.

00:06:50 And to me, it might be a narrow perspective,

00:06:52 but it changed online gaming and perhaps society forever.

00:06:56 Yeah.

00:06:57 But I may have way too narrow a viewpoint,

00:07:00 but from your perspective,

00:07:02 can you talk about the history of gaming

00:07:05 over the past 20 years?

00:07:06 Is this, how transformational,

00:07:09 how important is this line of games?

00:07:12 Right, so I think I kind of was an active gamer

00:07:16 whilst this was developing, the internet, online gaming.

00:07:20 So for me, the way it came was I played other games,

00:07:24 strategy related, I played a bit of Command & Conquer,

00:07:27 and then I played Warcraft II, which is from Blizzard.

00:07:31 But at the time, I didn’t know,

00:07:32 I didn’t understand about what Blizzard was or anything.

00:07:35 Warcraft II was just a game,

00:07:36 which was actually very similar to StarCraft in many ways.

00:07:39 It’s also a real time strategy game

00:07:41 where there’s orcs and humans, so there’s only two races.

00:07:44 But it was offline.

00:07:46 And it was offline, right?

00:07:47 So I remember a friend of mine came to school,

00:07:51 say, oh, there’s this new cool game called StarCraft.

00:07:53 And I just said, oh, this sounds like

00:07:54 just a copy of Warcraft II, until I kind of installed it.

00:07:59 And at the time, I am from Spain,

00:08:01 so we didn’t have very good internet, right?

00:08:04 So there was, for us,

00:08:05 StarCraft became first kind of an offline experience

00:08:09 where you kind of start to play these missions, right?

00:08:12 You play against some sort of scripted things

00:08:15 to develop the story of the characters in the game.

00:08:18 And then later on, I start playing against the built in AI,

00:08:23 and I thought it was impossible to defeat it.

00:08:25 Then eventually you defeat one

00:08:27 and you can actually play against seven built in AIs

00:08:29 at the same time, which also felt impossible.

00:08:32 But actually, it’s not that hard to beat

00:08:34 seven built in AIs at once.

00:08:36 So once we achieved that, also we discovered that

00:08:40 we could play, as I said, internet wasn’t that great,

00:08:43 but we could play with the LAN, right?

00:08:45 Like basically against each other

00:08:47 if we were in the same place

00:08:49 because you could just connect machines with like cables,

00:08:51 right?

00:08:53 So we started playing in LAN mode

00:08:55 and as a group of friends,

00:08:58 and it was really, really like much more entertaining

00:09:00 than playing against AIs.

00:09:02 And later on, as internet was starting to develop

00:09:05 and being a bit faster and more reliable,

00:09:07 then it’s when I started experiencing Battle.net,

00:09:09 which is this amazing universe,

00:09:11 not only because of the fact

00:09:13 that you can play the game against anyone in the world,

00:09:16 but you can also get to know more people.

00:09:20 You just get exposed to now like this vast variety of,

00:09:23 it’s kind of a bit when the chats came about, right?

00:09:25 There was a chat system.

00:09:27 You could play against people,

00:09:29 but you could also chat with people,

00:09:30 not only about StarCraft, but about anything.

00:09:32 And that became a way of life for kind of two years.

00:09:36 And obviously then it became like kind of,

00:09:38 it exploded in me in that I started to play more seriously,

00:09:42 going to tournaments and so on and so forth.

00:09:44 Do you have a sense on a societal, sociological level,

00:09:49 what’s this whole part of society

00:09:52 that many of us are not aware of

00:09:53 and it’s a huge part of society, which is gamers.

00:09:56 I mean, every time I come across that in YouTube

00:10:00 or streaming sites, I mean,

00:10:03 there’s a huge number of people who play games religiously.

00:10:07 Do you have a sense of those folks,

00:10:08 especially now that you’ve returned to that realm

00:10:10 a little bit on the AI side?

00:10:12 Yeah, so in fact, even after StarCraft,

00:10:15 I actually played World of Warcraft,

00:10:17 which is maybe the main sort of online worlds

00:10:21 or in presence that you get to interact

00:10:23 with lots of people.

00:10:24 So I played that for a little bit.

00:10:26 It was to me, it was a bit less stressful than StarCraft

00:10:29 because winning was kind of a given.

00:10:30 You just put in this world

00:10:32 and you can always complete missions.

00:10:34 But I think it was actually the social aspect

00:10:38 of especially StarCraft first

00:10:40 and then games like World of Warcraft

00:10:43 really shaped me in a very interesting ways

00:10:46 because what you get to experience

00:10:48 is just people you wouldn’t usually interact with, right?

00:10:51 So even nowadays, I still have many Facebook friends

00:10:54 from the area where I played online

00:10:56 and their ways of thinking is even political.

00:11:00 They just, we don’t live in,

00:11:01 like we don’t interact in the real world,

00:11:03 but we were connected by basically fiber.

00:11:06 And that way I actually get to understand a bit better

00:11:10 that we live in a diverse world.

00:11:12 And these were just connections that were made by,

00:11:15 because, you know, I happened to go in a city

00:11:18 in a virtual city as a priest and I met this warrior

00:11:22 and we became friends

00:11:23 and then we start like playing together, right?

00:11:25 So I think it’s transformative

00:11:28 and more and more and more people are more aware of it.

00:11:31 I mean, it’s becoming quite mainstream,

00:11:33 but back in the day, as you were saying in 2000, 2005,

00:11:37 even it was very, still very strange thing to do,

00:11:42 especially in Europe.

00:11:44 I think there were exceptions like Korea, for instance,

00:11:47 it was amazing that everything happened so early

00:11:50 in terms of cybercafes, like if you go to Seoul,

00:11:54 it’s a city that back in the day,

00:11:57 StarCraft was kind of,

00:11:58 you could be a celebrity by playing StarCraft,

00:12:00 but this was like 99, 2000, right?

00:12:03 It’s not like recently.

00:12:04 So yeah, it’s quite interesting to look back

00:12:08 and yeah, I think it’s changing society.

00:12:10 The same way, of course, like technology

00:12:13 and social networks and so on are also transforming things.

00:12:16 And a quick tangent, let me ask,

00:12:18 you’re also one of the most productive people

00:12:20 in your particular chosen passion and path in life.

00:12:26 And yet you’re also appreciate and enjoy video games.

00:12:29 Do you think it’s possible to do,

00:12:32 to enjoy video games in moderation?

00:12:35 Someone told me that you could choose two out of three.

00:12:39 When I was playing video games,

00:12:41 you could choose having a girlfriend,

00:12:43 playing video games or studying.

00:12:46 And I think for the most part, it was relatively true.

00:12:50 These things do take time.

00:12:52 Games like StarCraft,

00:12:53 if you take the game pretty seriously

00:12:55 and you wanna study it,

00:12:56 then you obviously will dedicate more time to it.

00:12:59 And I definitely took gaming

00:13:01 and obviously studying very seriously.

00:13:03 I love learning science and et cetera.

00:13:08 So to me, especially when I started university undergrad,

00:13:13 I kind of stepped off StarCraft.

00:13:14 I actually fully stopped playing.

00:13:16 And then World of Warcraft was a bit more casual.

00:13:19 You could just connect online.

00:13:20 And I mean, it was fun.

00:13:22 But as I said, that was not as much time investment

00:13:26 as it was for me in StarCraft.

00:13:29 Okay, so let’s get into AlphaStar.

00:13:31 What are the, you’re behind the team.

00:13:35 So DeepMind has been working on StarCraft

00:13:37 and released a bunch of cool open source agents

00:13:39 and so on the past few years.

00:13:41 But AlphaStar really is the moment

00:13:43 where the first time you beat a world class player.

00:13:49 So what are the parameters of the challenge

00:13:51 in the way that AlphaStar took it on

00:13:53 and how did you and David

00:13:55 and the rest of the DeepMind team get into it?

00:13:58 Consider that you can even beat the best in the world

00:14:00 or top players.

00:14:02 I think it all started back in 2015.

00:14:08 Actually, I’m lying.

00:14:08 I think it was 2014 when DeepMind was acquired by Google.

00:14:14 And I at the time was at Google Brain,

00:14:15 which was in California, is still in California.

00:14:18 We had this summit where we got together, the two groups.

00:14:21 So Google Brain and Google DeepMind got together

00:14:24 and we gave a series of talks.

00:14:26 And given that they were doing

00:14:28 deep reinforcement learning for games,

00:14:30 I decided to bring up part of my past,

00:14:33 which I had developed at Berkeley,

00:14:35 like this thing which we call Berkeley OverMind,

00:14:37 which is really just a StarCraft one bot, right?

00:14:40 So I talked about that.

00:14:42 And I remember Demis just came to me and said,

00:14:44 well, maybe not now, it’s perhaps a bit too early,

00:14:47 but you should just come to DeepMind

00:14:48 and do this again with deep reinforcement learning, right?

00:14:53 And at the time it sounded very science fiction

00:14:56 for several reasons.

00:14:58 But then in 2016, when I actually moved to London

00:15:01 and joined DeepMind transferring from Brain,

00:15:04 it became apparent that because of the AlphaGo moment

00:15:08 and kind of Blizzard reaching out to us to say,

00:15:11 wait, like, do you want the next challenge?

00:15:13 And also me being full time at DeepMind,

00:15:15 so sort of kind of all these came together.

00:15:17 And then I went to Irvine in California,

00:15:20 to the Blizzard headquarters to just chat with them

00:15:23 and try to explain how would it all work

00:15:26 before you do anything.

00:15:27 And the approach has always been

00:15:30 about the learning perspective, right?

00:15:33 So in Berkeley, we did a lot of rule based conditioning

00:15:39 and if you have more than three units, then go attack.

00:15:42 And if the other has more units than me,

00:15:44 I retreat and so on and so forth.

00:15:46 And of course, the point of deep reinforcement learning,

00:15:48 deep learning, machine learning in general

00:15:50 is that all these should be learned behavior.

00:15:53 So that kind of was the DNA of the project

00:15:56 since its inception in 2016,

00:15:59 where we just didn’t even have an environment to work with.

00:16:02 And so that’s how it all started really.

00:16:05 So if you go back to that conversation with Demis

00:16:08 or even in your own head, how far away did you,

00:16:12 because we’re talking about Atari games,

00:16:14 we’re talking about Go, which is kind of,

00:16:16 if you’re honest about it, really far away from StarCraft.

00:16:20 In, well, now that you’ve beaten it,

00:16:22 maybe you could say it’s close,

00:16:23 but it’s much, it seems like StarCraft

00:16:25 is way harder than Go philosophically

00:16:29 and mathematically speaking.

00:16:30 So how far away did you think you were?

00:16:34 Do you think it’s 2019 and 18

00:16:36 you could be doing as well as you have?

00:16:37 Yeah, when I kind of thought about,

00:16:40 okay, I’m gonna dedicate a lot of my time

00:16:43 and focus on this.

00:16:44 And obviously I do a lot of different research

00:16:47 in deep learning.

00:16:48 So spending time on it, I mean,

00:16:50 I really had to kind of think

00:16:51 there’s gonna be something good happening out of this.

00:16:55 So really I thought, well, this sounds impossible.

00:16:58 And it probably is impossible to do the full thing,

00:17:01 like the full game where you play one versus one

00:17:06 and it’s only a neural network playing and so on.

00:17:09 So it really felt like,

00:17:10 I just didn’t even think it was possible.

00:17:13 But on the other hand,

00:17:14 I could see some stepping stones towards that goal.

00:17:18 Clearly you could define sub problems in StarCraft

00:17:21 and sort of dissect it a bit and say,

00:17:22 okay, here is a part of the game, here’s another part.

00:17:26 And also obviously the fact,

00:17:29 so this was really also critical to me,

00:17:31 the fact that we could access human replays, right?

00:17:34 So Blizzard was very kind.

00:17:35 And in fact, they open source these for the whole community

00:17:38 where you can just go

00:17:39 and it’s not every single StarCraft game ever played,

00:17:42 but it’s a lot of them you can just go and download.

00:17:45 And every day they will,

00:17:47 you can just query a data set and say,

00:17:48 well, give me all the games that were played today.

00:17:51 And given my kind of experience with language

00:17:55 and sequences and supervised learning,

00:17:57 I thought, well, that’s definitely gonna be very helpful

00:18:00 and something quite unique now,

00:18:02 because ever before we had such a large data set of replays,

00:18:08 of people playing the game at this scale

00:18:10 of such a complex video game, right?

00:18:12 So that to me was a precious resource.

00:18:15 And as soon as I knew that Blizzard

00:18:17 was able to kind of give this to the community,

00:18:20 I started to feel positive

00:18:22 about something non trivial happening.

00:18:24 But I also thought the full thing, like really no rules,

00:18:28 no single line of code that tries to say,

00:18:31 well, I mean, if you see this unit, build a detector,

00:18:33 all these, not having any of these specializations

00:18:36 seemed really, really, really difficult to me.

00:18:38 Intuitively.

00:18:39 I do also like that Blizzard was teasing

00:18:42 or even trolling you,

00:18:45 sort of almost, yeah, pulling you in

00:18:48 into this really difficult challenge.

00:18:50 Do they have any awareness?

00:18:51 What’s the interest from the perspective of Blizzard,

00:18:55 except just curiosity?

00:18:57 Yeah, I think Blizzard has really understood

00:18:59 and really bring forward this competitiveness

00:19:03 of esports in games.

00:19:04 The StarCraft really kind of sparked a lot of,

00:19:07 like something that almost was never seen,

00:19:10 especially as I was saying, back in Korea.

00:19:13 So they just probably thought,

00:19:16 well, this is such a pure one versus one setup

00:19:18 that it would be great to see

00:19:21 if something that can play Atari or Go

00:19:24 and then later on chess could even tackle

00:19:27 these kind of complex real time strategy game, right?

00:19:30 So for them, they wanted to see first,

00:19:33 obviously whether it was possible,

00:19:36 if the game they created was in a way solvable

00:19:39 to some extent.

00:19:40 And I think on the other hand,

00:19:42 they also are a pretty modern company that innovates a lot.

00:19:45 So just starting to understand AI for them

00:19:48 to how to bring AI into games

00:19:50 is not AI for games, but games for AI, right?

00:19:54 I mean, both ways I think can work.

00:19:56 And we obviously at DeepMind use games for AI, right?

00:20:00 To drive AI progress,

00:20:01 but Blizzard might actually be able to do

00:20:03 and many other companies to start to understand

00:20:06 and do the opposite.

00:20:06 So I think that is also something

00:20:08 they can get out of these.

00:20:09 And they definitely, we have brainstormed a lot

00:20:12 about these, right?

00:20:13 But one of the interesting things to me

00:20:15 about StarCraft and Diablo

00:20:17 and these games that Blizzard has created

00:20:19 is the task of balancing classes, for example.

00:20:23 Sort of making the game fair from the starting point

00:20:27 and then let skill determine the outcome.

00:20:30 Is there, I mean, can you first comment,

00:20:33 there’s three races, Zerg, Protoss and Terran.

00:20:36 I don’t know if I’ve ever said that out loud.

00:20:38 Is that how you pronounce it?

00:20:40 Terran?

00:20:40 Yeah, Terran.

00:20:41 Yeah.

00:20:44 Yeah, I don’t think I’ve ever in person interacted

00:20:46 with anybody about StarCraft, that’s funny.

00:20:49 So they seem to be pretty balanced.

00:20:51 I wonder if the AI, the work that you’re doing

00:20:56 with AlphaStar would help balance them even further.

00:20:59 Is that something you think about?

00:21:00 Is that something that Blizzard is thinking about?

00:21:03 Right, so balancing when you add a new unit

00:21:06 or a new spell type is obviously possible

00:21:09 given that you can always train or pre train at scale

00:21:13 some agent that might start using that in unintended ways.

00:21:16 But I think actually, if you understand

00:21:19 how StarCraft has kind of co evolved with players,

00:21:22 in a way, I think it’s actually very cool

00:21:24 the ways that many of the things and strategies

00:21:27 that people came up with, right?

00:21:28 So I think we’ve seen it over and over in StarCraft

00:21:32 that Blizzard comes up with maybe a new unit

00:21:35 and then some players get creative

00:21:37 and do something kind of unintentional

00:21:39 or something that Blizzard designers

00:21:40 that just simply didn’t test or think about.

00:21:43 And then after that becomes kind of mainstream

00:21:46 in the community, Blizzard patches the game

00:21:48 and then they kind of maybe weaken that strategy

00:21:51 or make it actually more interesting

00:21:53 but a bit more balanced.

00:21:55 So this kind of continual dialogue between players

00:21:57 and Blizzard is kind of what has defined them,

00:22:01 in most of their games, in StarCraft

00:22:04 but also in World of Warcraft, they would do that.

00:22:06 There are several classes and it would be not good

00:22:09 that everyone plays absolutely the same race and so on, right?

00:22:13 So I think they do care about balancing of course

00:22:17 and they do a fair amount of testing

00:22:19 but it’s also beautiful to also see

00:22:22 how players get creative anyways.

00:22:24 And I mean, whether AI can be more creative at this point,

00:22:27 I don’t think so, right?

00:22:28 I mean, it’s just sometimes something so amazing happens.

00:22:31 Like I remember back in the days,

00:22:33 like you have these drop ships that could drop Reavers

00:22:36 and that was actually not thought about

00:22:39 that you could drop this unit

00:22:41 that has this what’s called splash damage

00:22:43 that would basically eliminate

00:22:45 all the enemies workers at once.

00:22:47 No one thought that you could actually put them

00:22:50 in really early game, do that kind of damage

00:22:53 and then things change in the game.

00:22:55 But I don’t know, I think it’s quite an amazing

00:22:58 exploration process from both sides,

00:23:00 players and Blizzard alike.

00:23:01 Well, it’s almost like a reinforcement learning exploration

00:23:05 but the scale of humans that play Blizzard games

00:23:11 is almost on the scale of a large scale

00:23:13 DeepMind RL experiment.

00:23:15 I mean, if you look at the numbers,

00:23:17 I mean, you’re talking about, I don’t know how many games

00:23:19 but hundreds of thousands of games probably a month.

00:23:22 Yeah.

00:23:22 I mean, so it’s almost the same as running RL agents.

00:23:28 What aspect of the problem of StarCraft

00:23:31 do you think is the hardest?

00:23:32 Is it the, like you said, the imperfect information?

00:23:35 Is it the fact that they have to do long-term planning?

00:23:38 Is it the real time aspects?

00:23:40 We have to do stuff really quickly.

00:23:42 Is it the fact that a large action space

00:23:44 so you can do so many possible things?

00:23:47 Or is it, you know, in the game theoretic sense

00:23:51 there is no Nash equilibrium

00:23:52 or at least you don’t know what the optimal strategy is

00:23:54 because there’s way too many options.

00:23:56 Right.

00:23:57 Is there something that stands out as just like the hardest

00:23:59 the most annoying thing?

00:24:01 So when we sort of looked at the problem

00:24:04 and start to define like the parameters of it, right?

00:24:07 What are the observations?

00:24:08 What are the actions?

00:24:10 It became very apparent that, you know,

00:24:13 the very first barrier that one would hit in StarCraft

00:24:17 would be because of the action space being so large

00:24:20 and as not being able to search like you could in chess

00:24:24 or go even though the search space is vast.

00:24:28 The main problem that we identified

00:24:30 was that of exploration, right?

00:24:32 So without any sort of human knowledge or human prior,

00:24:36 if you think about StarCraft

00:24:38 and you know how deep reinforcement learning algorithms

00:24:40 work, which is essentially by issuing random actions

00:24:45 and hoping that they will get some wins sometimes

00:24:47 so they could learn.

00:24:49 So if you think of the action space in StarCraft

00:24:52 almost anything you can do in the early game is bad

00:24:55 because any action involves taking workers

00:24:58 which are mining minerals for free.

00:25:01 That’s something that the game does automatically

00:25:03 sends them to mine.

00:25:04 And you would immediately just take them out of mining

00:25:07 and send them around.

00:25:09 So just thinking how is it gonna be possible

00:25:13 to get to understand these concepts

00:25:16 but even more like expanding, right?

00:25:19 There’s these buildings you can place

00:25:21 in other locations in the map to gather more resources

00:25:24 but the location of the building is important

00:25:26 and you have to select a worker,

00:25:28 send it walking to that location, build the building,

00:25:32 wait for the building to be built

00:25:34 and then put extra workers there so they start mining.

00:25:37 That feels like impossible if you just randomly click

00:25:41 to produce that desirable state

00:25:44 that then you could hope to learn from

00:25:46 because eventually that may yield an extra win, right?

00:25:49 So for me, the exploration problem

00:25:51 and due to the action space

00:25:53 and the fact that there’s not really turns,

00:25:56 there’s so many turns because the game essentially

00:25:59 ticks 22 times per second.

00:26:02 I mean, that’s how they could discretize sort of time.

00:26:05 Obviously you always have to discretize time

00:26:07 but there’s no such thing as real time

00:26:09 but it’s really a lot of time steps

00:26:12 of things that could go wrong.

00:26:14 And that definitely felt a priori like the hardest.
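[Editor's note: the scale of the exploration problem described here can be made concrete with a toy back-of-the-envelope sketch. Only the ~22 steps per second figure comes from the conversation; the action count and sequence length are made-up illustrative numbers, not AlphaStar's real action space.]

```python
# Toy illustration of why uniform random exploration fails in StarCraft-like
# episodes: games are long, and useful plans need many correct steps in a row.

STEPS_PER_SECOND = 22   # the discretization rate mentioned in the conversation
game_minutes = 10
episode_steps = STEPS_PER_SECOND * 60 * game_minutes
print(episode_steps)  # 13200 decision points in a single 10-minute game

# Suppose a useful maneuver (select a worker, walk it out, place a building,
# return it to mining) needs one specific action at each of k consecutive
# steps, chosen among n possible actions. These numbers are hypothetical
# and far smaller than the real game's.
n_actions = 100
k = 5
p_random_success = 1 / n_actions ** k
print(p_random_success)  # about 1e-10: essentially never found by chance
```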

00:26:17 You mentioned many good ones.

00:26:19 I think partial observability

00:26:21 and the fact that there is no perfect strategy

00:26:23 because of the partial observability.

00:26:25 Those are very interesting problems.

00:26:26 We start seeing more and more now

00:26:28 in terms of as we solve the previous ones

00:26:31 but the core problem to me was exploration

00:26:34 and solving it has been basically kind of the focus

00:26:37 and how we saw the first breakthroughs.

00:26:39 So exploration in a multi hierarchical way.

00:26:43 So like 22 times a second exploration

00:26:46 has a very different meaning than it does

00:26:48 in terms of should I gather resources early

00:26:51 or should I wait or so on.

00:26:53 So how do you solve the long term?

00:26:56 Let’s talk about the internals of AlphaStar.

00:26:58 So first of all, how do you represent the state

00:27:02 of the game as an input?

00:27:05 How do you then do the long-term sequence modeling?

00:27:08 How do you build a policy?

00:27:10 What’s the architecture like?

00:27:12 So AlphaStar has obviously several components

00:27:16 but everything passes through what we call the policy

00:27:20 which is a neural network.

00:27:22 And that’s kind of the beauty of it.

00:27:24 There is, I could just now give you a neural network

00:27:27 and some weights.

00:27:28 And if you fed the right observations

00:27:30 and you understood the actions the same way we do

00:27:32 you would have basically the agent playing the game.

00:27:35 There’s absolutely nothing else needed

00:27:37 other than those weights that were trained.

00:27:40 Now, the first step is observing the game

00:27:43 and we’ve experimented with a few alternatives.

00:27:46 The one that we currently use mixes both spatial

00:27:50 sort of images that you would process from the game

00:27:53 that is the zoomed out version of the map

00:27:56 and also a zoomed in version of the camera

00:27:58 or the screen as we call it.

00:28:00 But also we give to the agent the list of units

00:28:04 that it sees more of as a set of objects

00:28:09 that it can operate on.

00:28:11 It’s not strictly necessary to use that.

00:28:14 And we have versions of the agent that play well

00:28:16 without this set view, which is a bit unlike

00:28:19 how humans perceive the game.

00:28:21 But it certainly helps a lot

00:28:23 because a very natural way to encode the game

00:28:26 is just to look at all the units that there are.

00:28:29 They have properties like health, position, type of unit

00:28:33 whether it’s my unit or the enemies.

00:28:36 And that sort of is kind of the summary

00:28:40 of the state of the game,

00:28:43 that list of units or set of units

00:28:45 that you see all the time.
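As a concrete illustration of the unit-list observation he describes, here is a minimal Python sketch: each visible unit becomes a small record with type, ownership, health, and position, and the observation is the set of those records turned into flat feature vectors. All field names and normalizations here are illustrative assumptions, not AlphaStar's actual schema.

```python
# Minimal sketch of a set-of-units observation (illustrative, not AlphaStar's).
from dataclasses import dataclass

@dataclass
class UnitObservation:
    unit_type: str      # e.g. "worker", "barracks", "marine"
    owner: str          # "self" or "enemy"
    health: int
    x: float
    y: float

def encode_units(units):
    """Turn the set of visible units into flat numeric feature vectors,
    one per unit: the form a set-based network would consume."""
    type_ids = {"worker": 0, "barracks": 1, "marine": 2}  # hypothetical vocab
    vectors = []
    for u in units:
        vectors.append([
            type_ids[u.unit_type],              # categorical type as an id
            1.0 if u.owner == "self" else 0.0,  # ownership flag
            u.health / 100.0,                   # health normalized to [0, 1]
            u.x, u.y,                           # position, usable as a positional encoding
        ])
    return vectors

obs = [
    UnitObservation("worker", "self", 45, 10.0, 12.0),
    UnitObservation("marine", "enemy", 100, 40.0, 7.5),
]
features = encode_units(obs)
```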

00:28:47 But that’s pretty close to the way humans see the game.

00:28:49 Why do you say it’s not, isn’t that,

00:28:51 you’re saying the exactness of it is not similar to humans?

00:28:55 The exactness of it is perhaps not the problem.

00:28:57 I guess maybe the problem if you look at it

00:28:59 from how actually humans play the game

00:29:02 is that they play with a mouse and a keyboard and a screen

00:29:05 and they don’t see sort of a structured object

00:29:08 with all the units.

00:29:09 What they see is what they see on the screen, right?

00:29:12 So.

00:29:13 Remember that there’s a, sorry to interrupt,

00:29:14 there’s a plot that you showed with camera base

00:29:16 where you do exactly that, right?

00:29:18 You move around and that seems to converge

00:29:21 to similar performance.

00:29:22 Yeah, I think that’s what I,

00:29:23 we’re kind of experimenting with what’s necessary or not,

00:29:26 but using the set.

00:29:28 So, actually, if you look at research in computer vision,

00:29:32 where it makes a lot of sense to treat images

00:29:35 as two dimensional arrays,

00:29:38 there’s actually a very nice paper from Facebook.

00:29:40 I think, I forgot who the authors are,

00:29:42 but I think it’s part of Kaiming He’s group.

00:29:46 And what they do is they take an image,

00:29:49 which is this two dimensional signal,

00:29:51 and they actually take pixel by pixel

00:29:54 and scramble the image as if it was just a list of pixels.

00:29:59 Crucially, they encode the position of the pixels

00:30:01 with the X, Y coordinates.

00:30:03 And this is just kind of a new architecture,

00:30:06 which we incidentally also use in StarCraft

00:30:08 called the Transformer,

00:30:09 which is a very popular paper from last year,

00:30:11 which yielded very nice result in machine translation.

00:30:15 And if you actually believe in this kind of,

00:30:18 oh, it’s actually a set of pixels,

00:30:20 as long as you encode X, Y, it’s okay,

00:30:22 then you could argue that the list of units that we see

00:30:26 is precisely that,

00:30:26 because we have each unit as a kind of pixel, if you will,

00:30:31 and then their X, Y coordinates.

00:30:33 So in that perspective, we, without knowing it,

00:30:36 we use the same architecture that was shown

00:30:38 to work very well on PASCAL and ImageNet and so on.
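The pixels-as-a-set idea he describes can be sketched in a few lines: treat the input as an unordered set of feature vectors, put each element's (x, y) coordinates into its features, and let self-attention relate every element to every other. This is a toy single-head attention written from scratch for illustration, not the actual Transformer or AlphaStar code.

```python
# Toy scaled dot-product self-attention over a set of feature vectors.
import math

def self_attention(elements):
    """One attention head where queries = keys = values = the inputs."""
    d = len(elements[0])
    out = []
    for q in elements:
        # similarity of this element to every element in the set
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in elements]
        m = max(scores)                       # subtract max for stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # attention-weighted mix of all elements in the set
        out.append([sum(w * v[i] for w, v in zip(weights, elements))
                    for i in range(d)])
    return out

# Elements are "pixels" given as (value, x, y): the order of the list no
# longer matters, because position lives in the features, not the ordering.
pixels = [[0.9, 0.0, 0.0], [0.1, 1.0, 0.0], [0.5, 0.0, 1.0]]
mixed = self_attention(pixels)
```

Because the operation is symmetric over the set, permuting the input simply permutes the output, which is exactly why a list of units (or scrambled pixels) with coordinates attached is a valid input representation.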

00:30:41 So the interesting thing here is putting it in that way

00:30:45 it starts to move it towards

00:30:46 the way you usually work with language.

00:30:49 So what, and especially with your expertise

00:30:52 and work in language,

00:30:55 it seems like there’s echoes of a lot of

00:30:58 the way you would work with natural language

00:31:00 in the way you’ve approached AlphaStar.

00:31:02 Right.

00:31:03 What’s, does that help

00:31:05 with the long-term sequence modeling there somehow?

00:31:08 Exactly, so now that we understand

00:31:10 what an observation for a given time step is,

00:31:13 we need to move on to say,

00:31:14 well, there’s going to be a sequence of such observations

00:31:17 and an agent will need to, given all that it’s seen,

00:31:21 not only the current time step, but all that it’s seen, why?

00:31:24 Because there is partial observability.

00:31:25 We must remember whether we saw a worker going somewhere,

00:31:29 for instance, right?

00:31:30 Because then there might be an expansion

00:31:31 on the top right of the map.

00:31:33 So given that, what you must then think about is

00:31:37 there is the problem of given all the observations,

00:31:40 you have to predict the next action.

00:31:42 And not only given all the observations,

00:31:44 but given all the observations

00:31:45 and given all the actions you’ve taken,

00:31:47 predict the next action.

00:31:49 And that sounds exactly like machine translation where,

00:31:53 and that’s exactly how kind of I saw the problem,

00:31:57 especially when you are given supervised data

00:31:59 or replays from humans,

00:32:01 because the problem is exactly the same.

00:32:03 You’re translating essentially a prefix of observations

00:32:07 and actions onto what’s going to happen next,

00:32:10 which is exactly how you would train a model to translate

00:32:12 or to generate language as well, right?

00:32:14 Do you have a certain prefix?

00:32:16 You must remember everything that comes in the past

00:32:18 because otherwise you might start having incoherent text.

00:32:22 And we’re using the same architectures, LSTMs

00:32:26 and transformers, operating across time

00:32:29 to kind of integrate all that’s happened in the past.

00:32:33 Those architectures that work so well in translation

00:32:35 or language modeling are exactly the same

00:32:38 as what the agent is using to issue actions in the game.

00:32:42 And the way we train it, moreover, for imitation,

00:32:44 which is step one of AlphaStar is,

00:32:47 take all the human experience and try to imitate it,

00:32:49 much like you try to imitate translators

00:32:52 that translated many pairs of sentences

00:32:55 from French to English say,

00:32:57 that sort of principle applies exactly the same.

00:33:00 It’s almost the same code, except that instead of words,

00:33:04 you have slightly more complicated objects,

00:33:06 which are the observations and the actions

00:33:08 are also a bit more complicated than a word.
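The translation analogy he draws can be made concrete: a replay is a sequence of (observation, action) steps, and imitation learning turns it into next-action prediction given the full prefix, exactly like next-word prediction in language modeling. The replay contents below are made up for illustration.

```python
# Turn a replay into (prefix -> next action) imitation-learning examples.
def make_imitation_examples(replay):
    """For each step t, the input is the full prefix of observations and
    actions before t, and the target is the action taken at step t."""
    examples = []
    for t in range(len(replay)):
        prefix = replay[:t]           # everything seen and done so far
        target_action = replay[t][1]  # the action the human took next
        examples.append((prefix, target_action))
    return examples

# A tiny fake replay: (observation, action) per time step.
replay = [
    ("obs_0", "build_worker"),
    ("obs_1", "move_camera"),
    ("obs_2", "attack"),
]
examples = make_imitation_examples(replay)
```

A sequence model (LSTM or transformer) trained with cross-entropy on these pairs is, structurally, the same setup as training a translation or language model on sentence prefixes.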

00:33:11 Is there a self play component then too?

00:33:13 So once you run out of imitation?

00:33:16 Right, so indeed you can bootstrap from human replays,

00:33:22 but then the agents you get are actually not as good

00:33:25 as the humans you imitated, right?

00:33:28 So how do we imitate?

00:33:30 Well, we take humans from 3000 MMR and higher.

00:33:34 3000 MMR is just a metric of human skill

00:33:37 and 3000 MMR might be like the 50th percentile, right?

00:33:41 So it’s just average human.

00:33:43 What’s that?

00:33:44 So maybe quick pause, MMR is a ranking scale,

00:33:47 the matchmaking rating for players.

00:33:50 So it’s 3000, I remember there’s like a master

00:33:52 and a grand master, what’s 3000?

00:33:54 So 3000 is pretty bad.

00:33:56 I think it’s kind of gold level.

00:33:58 It just sounds really good relative to chess, I think.

00:34:00 Oh yeah, yeah, no, the ratings,

00:34:02 the best in the world are at 7,000 MMR.

00:34:05 So 3000, it’s a bit like Elo indeed, right?

00:34:07 So 3,500 just allows us to not filter a lot of the data.

00:34:13 So we like to have a lot of data in deep learning

00:34:15 as you probably know.

00:34:17 So we take these kind of 3,500 and above,

00:34:20 but then we do a very interesting trick,

00:34:22 which is we tell the neural network

00:34:25 what level they are imitating.

00:34:27 So we say, this replay you’re gonna try to imitate

00:34:30 to predict the next action for all the actions

00:34:33 that you’re gonna see is a 4,000 MMR replay.

00:34:36 This one is a 6,000 MMR replay.

00:34:38 And what’s cool about this is then we take this policy

00:34:42 that is being trained from human,

00:34:44 and then we can ask it to play like a 3000 MMR player

00:34:47 by setting this conditioning, saying, well, okay,

00:34:49 play like a 3000 MMR player

00:34:51 or play like a 6,000 MMR player.

00:34:53 And you actually see how the policy behaves differently.

00:34:57 It gets worse economy if you play like a gold-level player,

00:35:01 it does less actions per minute,

00:35:03 which is the number of clicks or number of actions

00:35:05 that you will issue in a whole minute.

00:35:07 And it’s very interesting to see

00:35:09 that it kind of imitates the skill level quite well.
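The MMR-conditioning trick he describes can be sketched as follows: the replay's skill rating is appended to the policy's input during training, so at play time the same network can be asked to act like a 3000 MMR or a 6000 MMR player. The policy below is a toy stand-in, and the APM formula is a fabricated behavioral statistic purely to show the conditioning mechanism.

```python
# Toy stand-in for an MMR-conditioned policy (not a real network).
def conditioned_policy(observation_features, mmr):
    """Conditioning input = observation features plus a normalized
    skill rating; the output statistic scales with requested skill."""
    skill = mmr / 7000.0               # ~7000 MMR is roughly the top of the scale
    # fake behavioral statistic that grows with the requested skill level:
    actions_per_minute = 150 + 500 * skill
    return {"input": observation_features + [skill],
            "apm": actions_per_minute}

low = conditioned_policy([0.2, 0.7], mmr=3000)   # "play like a gold player"
high = conditioned_policy([0.2, 0.7], mmr=6000)  # "play like a top player"
```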

00:35:12 But if we ask it to play like a 6,000 MMR player,

00:35:15 we tested, of course, these policies to see how well they do.

00:35:18 They actually beat all the built in AIs

00:35:20 that Blizzard put in the game,

00:35:22 but they’re nowhere near 6,000 MMR players, right?

00:35:25 They might be maybe around gold level, platinum, perhaps.

00:35:29 So there’s still a lot of work to be done for the policy

00:35:32 to truly understand what it means to win.

00:35:35 So far, we only asked them, okay, here is the screen.

00:35:38 And that’s what’s happened on the game until this point.

00:35:41 What would the next action be if we ask a pro to now say,

00:35:46 oh, you’re gonna click here or here or there.

00:35:49 And the point is experiencing wins and losses

00:35:53 is very important to then start to refine.

00:35:56 Otherwise the policy can get loose,

00:35:58 can just go off policy as we call it.

00:36:00 That’s so interesting that you can at least hope eventually

00:36:03 to be able to control a policy

00:36:06 approximately to be at some MMR level.

00:36:10 That’s so interesting, especially given that you have

00:36:12 ground truth for a lot of these cases.

00:36:15 Can I ask you a personal question?

00:36:17 What’s your MMR?

00:36:19 Well, I haven’t played StarCraft II, so I am unranked,

00:36:23 which is the kind of lowest league.

00:36:26 So I used to play StarCraft, the first one.

00:36:29 But you haven’t seriously played StarCraft II.

00:36:32 So the best player we have at DeepMind is about 5,000 MMR,

00:36:37 which is high masters.

00:36:39 It’s not at grand master level.

00:36:42 Grand master level will be the top 200 players

00:36:44 in a certain region like Europe or America or Asia.

00:36:49 But for me, it would be hard to say.

00:36:51 I am very bad at the game.

00:36:53 I actually played AlphaStar a bit too late and it beat me.

00:36:56 I remember the whole team was, oh, Oriol, you should play.

00:36:59 And I was, oh, it looks like it’s not so good yet.

00:37:02 And then I remember I kind of got busy

00:37:04 and waited an extra week and I played

00:37:07 and it really beat me very badly.

00:37:09 Was that, I mean, how did that feel?

00:37:11 Isn’t that an amazing feeling?

00:37:12 That’s amazing, yeah.

00:37:13 I mean, obviously I tried my best

00:37:16 and I tried to also impress my,

00:37:18 because I actually played the first game.

00:37:19 So I’m still pretty good at micromanagement.

00:37:23 The problem is I just don’t understand StarCraft II.

00:37:25 I understand StarCraft.

00:37:27 And when I played StarCraft,

00:37:28 I probably was consistently like for a couple of years,

00:37:32 top 32 in Europe.

00:37:34 So I was decent, but at the time we didn’t have

00:37:37 this kind of MMR system as well established.

00:37:40 So it would be hard to know what it was back then.

00:37:43 So what’s the difference in interface

00:37:44 between AlphaStar and StarCraft

00:37:47 and a human player in StarCraft?

00:37:49 Is there any significant differences

00:37:52 between the way they both see the game?

00:37:54 I would say the way they see the game,

00:37:56 there’s a few things that are just very hard to simulate.

00:38:01 The main one perhaps, which is obvious in hindsight

00:38:05 is what’s called cloaked units, which are invisible units.

00:38:10 So in StarCraft, you can make some units

00:38:13 that require a particular kind of unit

00:38:16 to detect them.

00:38:18 So these units are invisible.

00:38:20 If you cannot detect them, you cannot target them.

00:38:22 So they would just destroy your buildings

00:38:25 or kill your workers.

00:38:27 But despite the fact you cannot target the unit,

00:38:31 there’s a shimmer that as a human you observe.

00:38:34 I mean, you need to train a little bit,

00:38:35 you need to pay attention,

00:38:37 but you would see this kind of space time distortion

00:38:41 and you would know, okay, there are, yeah.

00:38:44 Yeah, there’s like a wave thing.

00:38:46 Yeah, it’s called shimmer.

00:38:47 Space time distortion, I like it.

00:38:49 That’s really like, the Blizzard term is shimmer.

00:38:51 Shimmer, okay.

00:38:52 And so this shimmer, professional players

00:38:55 can actually see it immediately.

00:38:57 They understand it very well,

00:38:59 but it’s still something that requires

00:39:01 certain amount of attention

00:39:02 and it’s kind of a bit annoying to deal with.

00:39:05 Whereas for AlphaStar, in terms of vision,

00:39:08 it’s very hard for us to simulate sort of,

00:39:11 oh, are you looking at this pixel in the screen and so on?

00:39:14 So the only thing we can do is,

00:39:17 there is a unit that’s invisible over there.

00:39:19 So AlphaStar would know that immediately.

00:39:22 Obviously still obeys the rules.

00:39:24 You cannot attack the unit.

00:39:25 You must have a detector and so on,

00:39:27 but it’s kind of one of the main things

00:39:29 that it just doesn’t feel there’s a very proper way.

00:39:32 I mean, you could imagine, oh, you don’t have hypers.

00:39:35 Maybe you don’t know exactly where it is,

00:39:37 or sometimes you see it, sometimes you don’t,

00:39:39 but it’s just really, really complicated to get it

00:39:43 so that everyone would agree,

00:39:44 oh, that’s the best way to simulate this, right?

00:39:47 It seems like a perception problem.

00:39:49 It is a perception problem.

00:39:50 So the only problem is people, you ask,

00:39:54 oh, what’s the difference between

00:39:55 how humans perceive the game?

00:39:56 I would say they wouldn’t be able to tell a shimmer

00:39:59 immediately as it appears on the screen,

00:40:02 whereas AlphaStar in principle sees it very sharply, right?

00:40:05 It sees that the bit turned from zero to one,

00:40:08 meaning there’s now a unit there,

00:40:10 although you don’t know the unit,

00:40:11 or you know that you cannot attack it and so on.

00:40:15 So that from a vision standpoint,

00:40:18 that’s probably the most obvious one.

00:40:22 Then there are things humans cannot do perfectly,

00:40:25 even professionals, which is they might miss a detail,

00:40:28 or they might have not seen a unit.

00:40:30 And obviously as a computer,

00:40:32 if there’s a corner of the screen that turns green

00:40:35 because a unit enters the field of view,

00:40:37 that can go into the memory of the agent, the LSTM,

00:40:41 and persist there for a while,

00:40:42 and for however long is relevant, right?

00:40:45 And in terms of action,

00:40:47 it seems like the rate of action from AlphaStar

00:40:50 is comparative, if not slower than professional players,

00:40:54 but it’s more precise is what I read.

00:40:57 So that’s really probably the one that is causing us

00:41:01 more issues for a couple of reasons, right?

00:41:05 The first one is StarCraft has been an AI environment

00:41:08 for quite a few years.

00:41:09 In fact, I mean, I was participating

00:41:12 in the very first competition back in 2010.

00:41:15 And there’s really not been a kind of a very clear set

00:41:19 of rules for what the actions per minute,

00:41:22 the rate of actions that you can issue, should be.

00:41:24 And as a result, these agents or bots that people build

00:41:29 in a kind of almost very cool way,

00:41:31 they do like 20,000, 40,000 actions per minute.

00:41:35 Now, to put this in perspective,

00:41:37 a very good professional human

00:41:39 might do 300 to 800 actions per minute.

00:41:44 They might not be as precise.

00:41:45 That’s why the range is a bit tricky to identify exactly.

00:41:49 I mean, 300 actions per minute precisely

00:41:51 is probably realistic.

00:41:53 800 is probably not, but you see humans doing a lot of actions

00:41:56 because they warm up and they kind of select things

00:41:59 and spam and so on, just so that when they need it,

00:42:02 they have the accuracy.

00:42:04 So we came into this by not having kind of a standard way

00:42:09 to say, well, how do we measure whether an agent is

00:42:13 at human level or not?

00:42:15 On the other hand, we had a huge advantage,

00:42:18 which is because we do imitation learning,

00:42:21 agents turned out to act like humans

00:42:24 in terms of rate of actions, even

00:42:26 precisions and imprecisions of actions

00:42:28 in the supervised policy.

00:42:30 You could see all these.

00:42:31 You could see how agents like to spam click, to move here.

00:42:34 If you played especially Diablo, you would know what I mean.

00:42:37 I mean, you just like spam, oh, move here, move here,

00:42:39 move here.

00:42:40 You’re doing literally like maybe five actions

00:42:43 in two seconds, but these actions are not

00:42:45 very meaningful.

00:42:46 One would have sufficed.

00:42:48 So on the one hand, we start from this imitation policy

00:42:52 that is in the ballpark of the actions per minute of humans

00:42:55 because it’s actually statistically

00:42:57 trying to imitate humans.

00:42:58 So we see these very nicely in the curves

00:43:01 that we showed in the blog post.

00:43:02 There’s these actions per minute,

00:43:04 and the distribution looks very human like.

00:43:07 But then, of course, as self play kicks in,

00:43:10 and that’s the part we haven’t talked too much yet,

00:43:13 but of course, the agent must play against itself to improve,

00:43:17 then there’s almost no guarantees

00:43:19 that these actions will not become more precise

00:43:22 or even the rate of actions is going to increase over time.

00:43:26 So what we did, and this is probably

00:43:29 the first attempt that we thought was reasonable,

00:43:31 is we looked at the distribution of actions

00:43:33 for humans for certain windows of time.

00:43:36 And just to give a perspective, because I guess I mentioned

00:43:39 that some of these agents that are programmatic,

00:43:41 let’s call them.

00:43:42 They do 40,000 actions per minute.

00:43:44 Professionals, as I said, do 300 to 800.

00:43:47 So what we looked is we look at the distribution

00:43:49 over professional gamers, and we took reasonably high actions

00:43:53 per minute, but we kind of identified certain cutoffs

00:43:57 after which, even if the agent wanted to act,

00:44:00 these actions would be dropped.
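The cutoff mechanism he describes amounts to a sliding-window rate limiter: count the agent's actions inside a recent time window and drop any action that would exceed a cap derived from human APM statistics. The cap and window length below are illustrative numbers, not AlphaStar's actual settings.

```python
# Sliding-window action-rate limiter (illustrative cap and window).
from collections import deque

class ApmLimiter:
    def __init__(self, max_actions, window_seconds):
        self.max_actions = max_actions
        self.window = window_seconds
        self.times = deque()           # timestamps of recently issued actions

    def try_act(self, t):
        """Return True if an action at time t is allowed; otherwise the
        action is dropped, as described for over-the-cutoff actions."""
        while self.times and t - self.times[0] >= self.window:
            self.times.popleft()       # forget actions outside the window
        if len(self.times) < self.max_actions:
            self.times.append(t)
            return True
        return False

# At most 3 actions per 5-second window:
limiter = ApmLimiter(max_actions=3, window_seconds=5.0)
results = [limiter.try_act(t) for t in [0.0, 1.0, 2.0, 3.0, 6.0]]
# The action at t=3.0 is dropped; by t=6.0 old actions have expired.
```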

00:44:02 But the problem is this cutoff is probably set a bit too high.

00:44:07 And what ends up happening, even though the games,

00:44:10 and when we ask the professionals and the gamers,

00:44:12 by and large, they feel like it’s playing humanlike,

00:44:15 there are some agents that developed maybe slightly

00:44:20 too high APMs, which is actions per minute,

00:44:24 combined with the precision, which

00:44:27 made people start discussing a very interesting issue, which

00:44:30 is, should we have limited these?

00:44:32 Should we just let it lose and see what cool things

00:44:35 it can come up with?

00:44:37 Right?

00:44:37 Interesting.

00:44:38 So this is in itself an extremely interesting

00:44:41 question, but the same way that modeling the shimmer

00:44:44 would be so difficult, modeling absolutely all the details

00:44:47 about muscles and precision and tiredness of humans

00:44:51 would be quite difficult.

00:44:52 So we’re really here kind of innovating

00:44:56 in this sense of, OK, what could be maybe

00:44:58 the next iteration of putting more rules that

00:45:02 make the agents more humanlike in terms of restrictions?

00:45:06 Yeah, putting constraints that.

00:45:08 More constraints, yeah.

00:45:09 That’s really interesting.

00:45:10 That’s really innovative.

00:45:11 So one of the constraints you put on yourself,

00:45:15 or at least focused in, is on the Protoss race,

00:45:18 as far as I understand.

00:45:19 Can you tell me about the different races

00:45:21 and how they, so Protoss, Terran, and Zerg,

00:45:26 how do they compare?

00:45:27 How do they interact?

00:45:28 Why did you choose Protoss?

00:45:30 Yeah, in the dynamics of the game seen

00:45:34 from a strategic perspective.

00:45:35 So Protoss, so in StarCraft there are three races.

00:45:39 Indeed, in the demonstration, we saw only the Protoss race.

00:45:43 So maybe let’s start with that one.

00:45:45 Protoss is kind of the most technologically advanced race.

00:45:49 It has units that are expensive but powerful.

00:45:53 So in general, you want to kind of conserve your units

00:45:57 as you go attack.

00:45:59 And then you want to utilize these tactical advantages

00:46:03 of very fancy spells and so on and so forth.

00:46:07 And at the same time, they’re kind of,

00:46:11 people say they’re a bit easier to play perhaps.

00:46:15 But that I actually didn’t know.

00:46:17 I mean, I just talked now a lot to the players

00:46:20 that we work with, TLO and Mana, and they said, oh yeah,

00:46:23 Protoss is actually, people think,

00:46:24 is actually one of the easiest races.

00:46:26 So perhaps it’s the easiest, but that doesn’t

00:46:28 mean much; obviously professional players

00:46:32 excel at all three races.

00:46:34 And there’s never a race that dominates

00:46:37 for a very long time anyway.

00:46:38 So if you look at the top, I don’t know, 100 in the world,

00:46:41 is there one race that dominates that list?

00:46:44 It would be hard to know because it depends on the regions.

00:46:46 I think it’s pretty equal in terms of distribution.

00:46:50 And Blizzard wants it to be equal.

00:46:53 They wouldn’t want one race like Protoss

00:46:56 to not be represented in the top places.

00:46:59 So definitely, they try to keep it balanced.

00:47:03 So then maybe the opposite race of Protoss is Zerg.

00:47:07 Zerg is a race where you just kind of expand and take over

00:47:11 as many resources as you can, and they

00:47:14 have a very high capacity to regenerate their units.

00:47:17 So if you have an army, it’s not that valuable;

00:47:20 losing the whole army is not a big deal as Zerg

00:47:23 because you can then rebuild it.

00:47:25 And given that you generally accumulate

00:47:28 a huge bank of resources, Zergs typically

00:47:31 play by applying a lot of pressure,

00:47:34 maybe losing their whole army, but then rebuilding it

00:47:37 quickly.

00:47:37 So although, of course, every race, I mean, there’s never,

00:47:42 I mean, they’re pretty diverse.

00:47:43 I mean, there are some units in Zerg that

00:47:45 are technologically advanced, and they do

00:47:47 some very interesting spells.

00:47:48 And there’s some units in Protoss that are less valuable,

00:47:51 and you could lose a lot of them and rebuild them,

00:47:53 and it wouldn’t be a big deal.

00:47:55 All right, so maybe I’m missing out.

00:47:57 Maybe I’m going to say some dumb stuff, but summary

00:48:01 of strategy.

00:48:02 So first, there’s collection of a lot of resources.

00:48:05 That’s one option.

00:48:06 The other one is expanding, so building other bases.

00:48:11 Then the other is obviously building units

00:48:15 and attacking with those units.

00:48:17 And then I don’t know what else there is.

00:48:20 Maybe there’s the different timing of attacks,

00:48:24 like do I attack early, attack late?

00:48:26 What are the different strategies that emerged

00:48:28 that you’ve learned about?

00:48:29 I’ve read that a bunch of people are super happy

00:48:31 that you guys have, that AlphaStar apparently

00:48:34 has discovered that it’s really good to,

00:48:36 what is it, saturate?

00:48:38 Oh yeah, the mineral line.

00:48:39 Yeah, the mineral line.

00:48:41 Yeah, yeah.

00:48:42 And that’s for greedy amateur players like myself.

00:48:45 That’s always been a good strategy.

00:48:47 You just build up a lot of money,

00:48:49 and it just feels good to just accumulate and accumulate.

00:48:53 So thank you for discovering that and validating all of us.

00:48:56 But is there other strategies that you discovered

00:48:59 that are interesting, unique to this game?

00:49:01 Yeah, so if you look at the kind of,

00:49:05 not being a StarCraft II player,

00:49:06 but of course StarCraft and StarCraft II

00:49:08 and real time strategy games in general are very similar.

00:49:12 I would classify perhaps the openings of the game.

00:49:17 They’re very important.

00:49:18 And generally I would say there’s two kinds of openings.

00:49:21 One that’s a standard opening.

00:49:23 That’s generally how players find sort of a balance

00:49:28 between risk and economy and building some units early on

00:49:32 so that they could defend,

00:49:34 but they’re not too exposed basically,

00:49:36 but also expanding quite quickly.

00:49:38 So this would be kind of a standard opening.

00:49:41 And within a standard opening,

00:49:43 then what you do choose generally is

00:49:45 what technology are you aiming towards?

00:49:47 So there’s a bit of rock, paper, scissors

00:49:49 of you could go for spaceships

00:49:52 or you could go for invisible units

00:49:54 or you could go for, I don’t know,

00:49:55 like massive units that attack against certain kinds

00:49:58 of units, but they’re weak against others.

00:50:01 So standard openings themselves have some choices

00:50:05 like rock, paper, scissors style.

00:50:06 Of course, if you scout and you’re good

00:50:08 at guessing what the opponent is doing,

00:50:10 then you can play as an advantage

00:50:12 because if you know you’re gonna play rock,

00:50:13 I mean, I’m gonna play paper obviously.

00:50:15 So you can imagine that normal standard games

00:50:18 in StarCraft looks like a continuous rock, paper,

00:50:22 scissors game where you guess what the distribution

00:50:26 of rock, paper, and scissors is from the enemy

00:50:29 and reacting accordingly to try to beat it

00:50:32 or put the paper out before he kind of changes his mind

00:50:36 from rock to scissors,

00:50:38 and then you would be in a weak position.
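The "continuous rock-paper-scissors" view he describes can be written out in its simplest form: estimate a distribution over the opponent's strategy from what you've scouted, then pick the response with the best expected payoff against that belief. The belief numbers below are made up for illustration.

```python
# Best response to an estimated opponent strategy distribution.
PAYOFF = {  # payoff for (my move, opponent move): +1 win, -1 loss, 0 tie
    ("rock", "scissors"): 1, ("scissors", "paper"): 1, ("paper", "rock"): 1,
    ("scissors", "rock"): -1, ("paper", "scissors"): -1, ("rock", "paper"): -1,
    ("rock", "rock"): 0, ("paper", "paper"): 0, ("scissors", "scissors"): 0,
}

def best_response(opponent_distribution):
    """Choose the move maximizing expected payoff against the estimated
    distribution over the opponent's moves (the belief state)."""
    def expected(move):
        return sum(p * PAYOFF[(move, opp)]
                   for opp, p in opponent_distribution.items())
    return max(("rock", "paper", "scissors"), key=expected)

# Scouting suggests the opponent mostly "plays rock" (favors one opening):
belief = {"rock": 0.6, "paper": 0.3, "scissors": 0.1}
move = best_response(belief)  # expected payoffs favor "paper" here
```

Scouting, in this picture, is exactly the act of paying a cost to sharpen `belief` before committing to a response.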

00:50:39 So, sorry to pause on that.

00:50:41 I didn’t realize this element

00:50:42 because I know it’s true with poker.

00:50:44 I know I looked at Libratus.

00:50:48 So you’re also estimating trying to guess the distribution,

00:50:51 trying to better and better estimate the distribution

00:50:53 of what the opponent is likely to be doing.

00:50:55 Yeah, I mean, as a player,

00:50:56 you definitely wanna have a belief state

00:50:59 over what’s up on the other side of the map.

00:51:02 And when your belief state becomes inaccurate,

00:51:05 when you start having serious doubts about

00:51:07 whether he’s gonna play something that you must know,

00:51:10 that’s when you scout.

00:51:11 You wanna then gather information, right?

00:51:14 Is improving the accuracy of the belief

00:51:15 or improving the belief state part of the loss

00:51:19 that you’re trying to optimize?

00:51:20 Or is it just a side effect?

00:51:22 It’s implicit, but you could explicitly model it,

00:51:25 and it would be quite good at probably predicting

00:51:27 what’s on the other side of the map.

00:51:30 But so far, it’s all implicit.

00:51:32 There’s no additional reward for predicting the enemy.

00:51:36 So there’s these standard openings,

00:51:38 and then there’s what people call cheese,

00:51:41 which is very interesting.

00:51:42 And AlphaStar sometimes really likes this kind of cheese.

00:51:46 These cheeses, what they are is kind of an all in strategy.

00:51:50 You’re gonna do something sneaky.

00:51:53 You’re gonna hide your own buildings

00:51:56 close to the enemy base,

00:51:58 or you’re gonna go for hiding your technological buildings

00:52:01 so that you do invisible units

00:52:03 and the enemy just cannot react to detect it

00:52:06 and thus lose the game.

00:52:07 And there’s quite a few of these cheeses

00:52:10 and variants of them.

00:52:11 And there it’s where actually the belief state

00:52:14 becomes even more important.

00:52:16 Because if I scout your base and I see no buildings at all,

00:52:20 any human player knows something’s up.

00:52:22 They might know, well,

00:52:23 you’re hiding something close to my base.

00:52:25 Should I build suddenly a lot of units to defend?

00:52:28 Should I actually block my ramp with workers

00:52:31 so that you cannot come and destroy my base?

00:52:33 So there’s all this is happening

00:52:35 and defending against cheeses is extremely important.

00:52:39 And in the AlphaStar League,

00:52:40 many agents actually develop some cheesy strategies.

00:52:45 And in the games we saw against TLO and Mana,

00:52:48 two out of the 10 agents

00:52:49 were actually doing these kind of strategies

00:52:51 which are cheesy strategies.

00:52:53 And then there’s a variant of cheesy strategy

00:52:55 which is called all in.

00:52:57 So an all in strategy is not perhaps as drastic as,

00:53:00 oh, I’m gonna build cannons on your base

00:53:02 and then bring all my workers

00:53:03 and try to just disrupt your base and game over,

00:53:06 or GG as we say in StarCraft.

00:53:09 There’s these kind of very cool things

00:53:11 that you can align precisely at a certain time mark.

00:53:14 So for instance,

00:53:15 you can generate exactly a 10-unit composition

00:53:19 that is perfect, like five of this type,

00:53:21 five of this other type,

00:53:22 and align the upgrade

00:53:24 so that at four minutes and a half, let’s say,

00:53:27 you have these 10 units and the upgrade just finished.

00:53:30 And at that point, that army is really scary.

00:53:33 And unless the enemy really knows what’s going on,

00:53:36 if you push, you might then have an advantage

00:53:40 because maybe the enemy is doing something more standard,

00:53:42 it expanded too much, it developed too much economy,

00:53:45 and it traded off badly against having defenses,

00:53:49 and the enemy will lose.

00:53:51 But it’s called all in because if you don’t win,

00:53:53 then you’re gonna lose.

00:53:55 So you see players that do these kinds of strategies,

00:53:57 if they don’t succeed, game is not over.

00:54:00 I mean, they still have a base

00:54:01 and they still gathering minerals,

00:54:02 but they will just GG out of the game

00:54:04 because they know, well, game is over.

00:54:06 I gambled and I failed.

00:54:08 So if we start entering the game theoretic aspects

00:54:12 of the game, it’s really rich and it’s really,

00:54:15 that’s why it also makes it quite entertaining to watch.

00:54:17 Even if I don’t play, I still enjoy watching the game.

00:54:21 But the agents are trying to do this mostly implicitly.

00:54:26 But one element that we improved in self play

00:54:29 is creating the Alpha Star League.

00:54:31 And the Alpha Star League is not pure self play.

00:55:34 It’s trying to create different personalities of agents

00:54:37 so that some of them will become cheesy agents.

00:54:41 Some of them might become very economical, very greedy,

00:54:44 like getting all the resources,

00:54:46 but then being maybe early on, they’re gonna be weak,

00:54:48 but later on, they’re gonna be very strong.

00:54:51 And by creating these personalities of agents,

00:54:53 which sometimes it just happens naturally

00:54:55 that you can see kind of an evolution of agents

00:54:58 that given the previous generation,

00:55:00 they train against all of them

00:55:02 and then they generate kind of the perfect counter

00:55:04 to that distribution.

00:55:05 But these agents, you must have them in the populations

00:55:09 because if you don’t have them,

00:55:11 you’re not covered against these things.

00:55:13 You wanna create all sorts of opponents

00:55:17 that you will find in the wild.

00:55:18 So you can be exposed to these cheeses, early aggression,

00:55:23 later aggression, more expansions,

00:55:25 dropping units in your base from the side, all these things.

00:55:29 And pure self play is getting a bit stuck

00:55:32 at finding some subset of these, but not all of these.

00:55:36 So the Alpha Star League is a way

00:55:38 to kind of do an ensemble of agents

00:55:41 that they’re all playing in a league,

00:55:43 much like people play on Battle.net, right?

00:55:45 They play, you play against someone

00:55:47 who does a new cool strategy and you immediately,

00:55:50 oh my God, I wanna try it, I wanna play again.

00:55:53 And this to me was another critical part of the problem,

00:55:57 which was, can we create a Battle.net for agents?

00:56:01 And that’s kind of what the Alpha Star League really is.
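The league idea described above, a population of distinct "personalities" where each new generation is trained against the whole previous population rather than only its latest self, can be sketched in a toy way. Everything here is illustrative: the strategy names, the payoff numbers, and the best-response rule are stand-ins for AlphaStar's actual reinforcement-learning setup, chosen only to show how counters to a skewed population emerge.

```python
# Toy sketch of a league: a population of agent "personalities"
# (cheesy / greedy / standard). Each generation, the new agent is the
# best response ("perfect counter") to the whole current population.
# All names and payoff numbers below are hypothetical illustrations.

STRATEGIES = ["cheese", "greedy", "standard"]

# Hypothetical win probabilities: row strategy vs column strategy.
WIN_PROB = {
    ("cheese", "greedy"): 0.8,    # early aggression punishes greed
    ("greedy", "standard"): 0.7,  # extra economy beats standard play
    ("standard", "cheese"): 0.7,  # scouting + defense stops cheese
}

def win_prob(a, b):
    """Win probability of strategy a against strategy b."""
    if a == b:
        return 0.5
    if (a, b) in WIN_PROB:
        return WIN_PROB[(a, b)]
    return 1.0 - WIN_PROB[(b, a)]

def best_response(population):
    """Pick the strategy with the highest expected win rate against
    the current population distribution (the 'perfect counter')."""
    def expected(s):
        return sum(win_prob(s, p) for p in population) / len(population)
    return max(STRATEGIES, key=expected)

# Start with a cheese-heavy population; each generation appends the counter.
population = ["cheese", "cheese", "greedy"]
for _ in range(3):
    population.append(best_response(population))

print(population[3:])  # counters that evolved against the population
```

The point of keeping the cheesy and greedy agents around is visible even in this caricature: the best response changes as the population shifts, so removing a personality would leave the league uncovered against it.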

00:56:03 That’s fascinating.

00:56:04 And where they stick to their different strategies.

00:56:06 Yeah, wow, that’s really, really interesting.

00:56:09 But that said, you were fortunate enough

00:56:13 or just skilled enough to win five, zero.

00:56:17 And so how hard is it to win?

00:56:19 I mean, that’s not the goal.

00:56:20 I guess, I don’t know what the goal is.

00:56:21 The goal should be to win majority, not five, zero,

00:56:25 but how hard is it in general to win all matchups

00:56:29 on a one V one?

00:56:31 So that’s a very interesting question

00:56:33 because once you see Alpha Star and superficially

00:56:38 you think, well, okay, it won.

00:56:40 Like, if you sum all the games, it’s 10 to one, right?

00:56:42 It lost the game that it played with the camera interface.

00:56:46 You might think, well, that’s done, right?

00:56:48 It’s superhuman at the game.

00:56:50 And that’s not really the claim we really can make actually.

00:56:55 The claim is we beat a professional gamer

00:56:58 for the first time.

00:57:00 StarCraft has really been a thing

00:57:02 that has been going on for a few years,

00:57:04 but a moment like this had not occurred before yet.

00:57:09 But are these agents impossible to beat?

00:57:12 Absolutely not, right?

00:57:13 So that’s a bit what’s kind of the difference is

00:57:17 the agents play at grandmaster level.

00:57:19 They definitely understand the game enough

00:57:21 to play extremely well, but are they unbeatable?

00:57:24 Do they play perfect?

00:57:26 No, and actually in StarCraft,

00:57:29 because of these sneaky strategies,

00:57:32 it’s always possible that you might take a huge risk

00:57:34 sometimes, but you might get wins, right?

00:57:36 Out of this.

00:57:38 So I think that as a domain,

00:57:41 it still has a lot of opportunities,

00:57:43 not only because of course we wanna learn

00:57:45 with less experience, we would like to,

00:57:47 I mean, if I learned to play Protoss,

00:57:49 I can play Terran and learn it much quicker

00:57:52 than Alpha Star can, right?

00:57:53 So there are obvious interesting research challenges

00:57:56 as well, but even as the raw performance goes,

00:58:02 really the claim here can be we are at pro level

00:58:05 or at high grandmaster level,

00:58:08 but obviously the players also did not know what to expect,

00:58:13 right?

00:58:14 Their prior distribution was a bit off

00:58:15 because they played this kind of new like alien brain

00:58:19 as they like to say it, right?

00:58:21 And that’s what makes it exciting for them.

00:58:24 But also I think if you look at the games closely,

00:58:27 you see there were weaknesses in some points,

00:58:30 maybe Alpha Star did not scout,

00:58:32 or if it had invisible units going against

00:58:35 at certain points, it wouldn’t have known

00:58:37 and it would have been bad.

00:58:38 So there’s still quite a lot of work to do,

00:58:42 but it’s really a very exciting moment for us

00:58:44 to be seeing, wow, a single neural net on a GPU

00:58:48 is actually playing against these guys

00:58:50 who are amazing.

00:58:51 I mean, you have to see them play live.

00:58:52 They’re really, really amazing players.

00:58:55 Yeah, I’m sure there must be a guy in Poland

00:58:59 somewhere right now training his butt off

00:59:02 to make sure that this never happens again with Alpha Star.

00:59:05 So that’s really exciting in terms of Alpha Star

00:59:09 having some holes to exploit, which is great.

00:59:11 And then we build on top of each other

00:59:13 and it feels like StarCraft, unlike Go,

00:59:16 even if you win, it’s still not,

00:59:20 there’s so many different dimensions

00:59:23 in which you can explore.

00:59:24 So that’s really, really interesting.

00:59:25 Do you think there’s a ceiling to Alpha Star?

00:59:28 You’ve said that it hasn’t reached,

00:59:31 you know, this is a big,

00:59:32 wait, let me actually just pause for a second.

00:59:35 How did it feel to come here to this point,

00:59:40 to beat a top professional player?

00:59:42 Like that night, I mean, you know,

00:59:44 Olympic athletes have their gold medal, right?

00:59:47 This is your gold medal in a sense.

00:59:48 Sure, you’re cited a lot,

00:59:50 you’ve published a lot of prestigious papers, whatever,

00:59:53 but this is like a win.

00:59:55 How did it feel?

00:59:56 I mean, it was, for me, it was unbelievable

00:59:59 because first the win itself,

01:00:03 I mean, it was so exciting.

01:00:05 I mean, so looking back to those last days of 2018 really,

01:00:11 that’s when the games were played.

01:00:13 I’m sure I look back at that moment, I’ll say,

01:00:15 oh my God, I want to be in a project like that.

01:00:18 It’s like, I already feel the nostalgia of like,

01:00:21 yeah, that was huge in terms of the energy

01:00:24 and the team effort that went into it.

01:00:26 And so in that sense, as soon as it happened,

01:00:29 I already knew it was kind of,

01:00:31 I was losing it a little bit.

01:00:33 So it is almost like sad that it happened and oh my God,

01:00:36 but on the other hand, it also verifies the approach.

01:00:41 But to me also, there’s so many challenges

01:00:43 and interesting aspects of intelligence

01:00:46 that even though we can train a neural network

01:00:49 to play at the level of the best humans,

01:00:52 there’s still so many challenges.

01:00:54 So for me, it’s also like, well,

01:00:55 this is really an amazing achievement,

01:00:57 but I already was also thinking about next steps.

01:00:59 I mean, as I said, these Asians play Protoss versus Protoss,

01:01:04 but they should be able to play a different race

01:01:07 much quicker, right?

01:01:08 So that would be an amazing achievement.

01:01:10 Some people call this meta reinforcement learning,

01:01:13 meta learning and so on, right?

01:01:15 So there’s so many possibilities after that moment,

01:01:18 but the moment itself, it really felt great.

01:01:23 We had this bet, so I’m kind of a pessimist in general.

01:01:27 So I kind of sent an email to the team.

01:01:29 I said, okay, let’s against TLO first, right?

01:01:33 Like what’s gonna be the result?

01:01:35 And I really thought we would lose like five zero, right?

01:01:38 We had some calibration made against the 5,000 MMR player.

01:01:44 TLO was much stronger than that player,

01:01:47 even if he played Protoss, which is his off race.

01:01:51 But yeah, I was not imagining we would win.

01:01:53 So for me, that was just kind of a test run or something.

01:01:55 And then it really kind of, he was really surprised.

01:01:59 And unbelievably, we went to this bar to celebrate

01:02:04 and Dave tells me, well, why don’t we invite someone

01:02:08 who is a thousand MMR stronger in Protoss,

01:02:10 like actual Protoss player,

01:02:12 and that turned out being Mana, right?

01:02:16 And we had some drinks and I said, sure, why not?

01:02:19 But then I thought, well,

01:02:20 that’s really gonna be impossible to beat.

01:02:22 I mean, even because it’s so much ahead,

01:02:24 a thousand MMR is really like 99% probability

01:02:28 that Mana would beat TLO as Protoss versus Protoss, right?
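The "99% probability" figure for a 1000-point MMR gap lines up with the standard Elo logistic curve. StarCraft II's MMR is only roughly Elo-like, so this is an approximation rather than Blizzard's actual matchmaking math:

```python
def elo_win_prob(mmr_diff):
    """Expected win probability for the higher-rated player under the
    standard Elo logistic curve (400-point scale). StarCraft II MMR is
    only roughly Elo-like, so treat this as an approximation."""
    return 1.0 / (1.0 + 10 ** (-mmr_diff / 400))

# A 1000-MMR gap, roughly Mana vs TLO off-racing as Protoss:
print(round(elo_win_prob(1000), 3))  # 0.997, i.e. the "99%" quoted
```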

01:02:33 So we did that.

01:02:34 And to me, the second game was much more important,

01:02:38 even though a lot of uncertainty kind of disappeared

01:02:42 after we kind of beat TLO.

01:02:43 I mean, he is a professional player.

01:02:45 So that was kind of, oh,

01:02:46 but that’s really a very nice achievement.

01:02:49 But Mana really was at the top

01:02:51 and you could see he played much better,

01:02:53 but our agents got much better too.

01:02:55 So it’s like, ah, and then after the first game,

01:02:59 I said, if we take a single game,

01:03:00 at least we can say we beat a game.

01:03:02 I mean, even if we don’t beat the series,

01:03:04 for me, that was a huge relief.

01:03:06 And I mean, I remember hugging Demis.

01:03:09 And I mean, it was really like,

01:03:10 this moment for me will resonate forever as a researcher.

01:03:14 And I mean, as a person,

01:03:15 and yeah, it’s a really like great accomplishment.

01:03:18 And it was great also to be there with the team in the room.

01:03:21 I don’t know if you saw like this.

01:03:23 So it was really like.

01:03:24 I mean, from my perspective,

01:03:25 the other interesting thing is just like watching Kasparov,

01:03:29 watching Mana was also interesting

01:03:36 because he was kind of at a loss for words.

01:03:36 I mean, whenever you lose, I’ve done a lot of sports.

01:03:38 You sometimes say excuses, you look for reasons.

01:03:43 And he couldn’t really come up with reasons.

01:03:46 I mean, so with the off race for Protoss,

01:03:50 you could say, well, it felt awkward, it wasn’t,

01:03:52 but here it was just beaten.

01:03:55 And it was beautiful to look at a human being

01:03:57 being superseded by an AI system.

01:04:00 I mean, it’s a beautiful moment for researchers, so.

01:04:04 Yeah, for sure it was.

01:04:05 I mean, probably the highlight of my career so far

01:04:09 because of its uniqueness and coolness.

01:04:11 And I don’t know, I mean, it’s obviously, as you said,

01:04:14 you can look at papers, citations and so on,

01:04:16 but this really is like a testament

01:04:19 of the whole machine learning approach

01:04:22 and using games to advance technology.

01:04:24 I mean, it really was,

01:04:26 everything came together at that moment.

01:04:28 That’s really the summary.

01:04:29 Also on the other side, it’s a popularization of AI too,

01:04:34 because it’s just like traveling to the moon and so on.

01:04:38 I mean, this is where a very large community of people

01:04:41 that don’t really know AI,

01:04:43 they get to really interact with it.

01:04:45 Which is very important.

01:04:46 I mean, we must, you know,

01:04:48 writing papers helps our peers, researchers,

01:04:51 to understand what we’re doing.

01:04:52 But I think AI is becoming mature enough

01:04:55 that we must sort of try to explain what it is.

01:04:59 And perhaps through games is an obvious way

01:05:01 because these games always had built in AI.

01:05:03 So maybe everyone has experienced an AI playing a video game,

01:05:07 even if they don’t know,

01:05:08 because there’s always some scripted element

01:05:10 and some people might even call that AI already, right?

01:05:13 So what are other applications

01:05:16 of the approaches underlying AlphaStar

01:05:19 that you see happening?

01:05:20 There’s a lot of echoes of, you said,

01:05:22 transformer of language modeling and so on.

01:05:25 Have you already started thinking

01:05:27 where the breakthroughs in AlphaStar

01:05:30 get expanded to other applications?

01:05:32 Right, so I thought about a few things

01:05:34 for like kind of next month, next years.

01:05:38 The main thing I’m thinking about actually is what’s next

01:05:41 as a kind of a grand challenge.

01:05:43 Because for me, like we’ve seen Atari

01:05:47 and then there’s like the sort of three dimensional worlds

01:05:50 that we’ve seen also like pretty good performance

01:05:52 from these capture the flag agents

01:05:54 that also some people at DeepMind and elsewhere

01:05:56 are working on.

01:05:57 We’ve also seen some amazing results on like,

01:05:59 for instance, Dota 2, which is also a very complicated game.

01:06:03 So for me, like the main thing I’m thinking about

01:06:05 is what’s next in terms of challenge.

01:06:07 So as a researcher, I see sort of two tensions

01:06:12 between research and then applications or areas

01:06:16 or domains where you apply them.

01:06:18 So on the one hand,

01:06:20 because the application, StarCraft, is very hard,

01:06:23 we developed some techniques, some new research

01:06:25 that now we could look at elsewhere.

01:06:27 Like are there other applications where we can apply these?

01:06:30 And the obvious ones, absolutely.

01:06:32 You can think of feeding back to sort of the community

01:06:37 we took from, which was mostly sequence modeling

01:06:40 or natural language processing.

01:06:41 So we’ve developed and extended things from the transformer

01:06:46 and we use pointer networks.

01:06:48 We combine LSTM and transformers in interesting ways.

01:06:51 So that’s perhaps the kind of lowest hanging fruit

01:06:54 of feeding back to now a different field

01:06:57 of machine learning that’s not playing video games.

01:07:00 Let me go old school and jump to Mr. Alan Turing.

01:07:05 So the Turing test is a natural language test,

01:07:09 a conversational test.

01:07:11 What’s your thought of it as a test for intelligence?

01:07:15 Do you think it is a grand challenge

01:07:17 that’s worthy of undertaking?

01:07:18 Maybe if it is, would you reformulate it or phrase it

01:07:22 somehow differently?

01:07:23 Right, so I really love the Turing test

01:07:25 because I also like sequences and language understanding.

01:07:29 And in fact, some of the early work

01:07:32 we did in machine translation, we

01:07:33 tried to apply to kind of a neural chatbot, which obviously

01:07:38 would never pass the Turing test because it was very limited.

01:07:42 But it is a very fascinating idea

01:07:45 that you could really have an AI that

01:07:49 would be indistinguishable from humans in terms of asking

01:07:53 or conversing with it.

01:07:56 So I think the test itself seems very nice.

01:08:00 And it’s kind of well defined, actually,

01:08:02 like whether you pass it or not.

01:08:04 I think there’s quite a few rules

01:08:06 that feel pretty simple.

01:08:09 And I think they have these competitions every year.

01:08:14 Yes, there’s the Loebner Prize.

01:08:15 But I don’t know if you’ve seen the kind of bots

01:08:22 that emerge from that competition.

01:08:24 They’re not quite what you would expect.

01:08:27 So it feels like that there’s weaknesses with the way Turing

01:08:30 formulated it.

01:08:31 It feels like the definition

01:08:34 of a genuine, rich, fulfilling human conversation

01:08:39 needs to be something else.

01:08:41 Like the Alexa Prize, which I’m not as well familiar with,

01:08:44 has tried to define that more, I think,

01:08:46 by saying you have to continue keeping

01:08:48 a conversation for 30 minutes, something like that.

01:08:52 So basically forcing the agent not to just fool,

01:08:55 but to have an engaging conversation kind of thing.

01:09:02 Have you thought about this problem richly?

01:09:06 And if you have in general, how far away are we from?

01:09:10 You worked a lot on language understanding,

01:09:14 language generation, but the full dialogue,

01:09:16 the conversation, just sitting at the bar

01:09:19 having a couple of beers for an hour,

01:09:21 that kind of conversation.

01:09:22 Have you thought about it?

01:09:23 Yeah, so I think you touched here

01:09:25 on the critical point, which is feasibility.

01:09:28 So there’s a great essay by Hamming,

01:09:32 which describes sort of grand challenges of physics.

01:09:37 And he argues that, well, OK, for instance,

01:09:41 teleportation or time travel are great grand challenges

01:09:44 of physics, but there’s no line of attack.

01:09:46 We really don’t know or cannot kind of make any progress.

01:09:50 So that’s why most physicists and so on,

01:09:53 they don’t work on these in their PhDs

01:09:55 and as part of their careers.

01:09:57 So I see the Turing test, in the full Turing test,

01:10:00 as a bit still too early.

01:10:02 Like I think we’re, especially with the current trend

01:10:06 of deep learning language models,

01:10:10 we’ve seen some amazing examples.

01:10:11 I think GPT-2 being the most recent one, which

01:10:14 is very impressive.

01:10:15 But to understand to fully solve passing or fooling a human

01:10:21 to think that there’s a human on the other side,

01:10:23 I think we’re quite far.

01:10:24 So as a result, I don’t see myself

01:10:27 and I probably would not recommend people doing a PhD

01:10:30 on solving the Turing test because it just

01:10:32 feels it’s kind of too early or too hard of a problem.

01:10:35 Yeah, but that said, you said the exact same thing

01:10:37 about StarCraft about a few years ago.

01:10:40 Indeed.

01:10:41 To Demis.

01:10:41 So you’ll probably also be the person who passes

01:10:46 the Turing test in three years.

01:10:48 I mean, I think that, yeah.

01:10:50 So we have this on record.

01:10:52 This is nice.

01:10:52 It’s true.

01:10:53 I mean, it’s true that progress sometimes

01:10:56 is a bit unpredictable.

01:10:59 Even six months ago, I would not have predicted the level

01:11:02 that we see that these agents can deliver at grandmaster

01:11:06 level.

01:11:07 But I have worked on language enough.

01:11:10 And basically, my concern is not that something could happen,

01:11:13 a breakthrough could happen that would bring us to solving

01:11:16 or passing the Turing test; it’s that I just

01:11:19 think the statistical approach to it is not going to cut it.

01:11:24 So we need a breakthrough, which is great for the community.

01:11:28 But given that, I think there’s quite more uncertainty.

01:11:31 Whereas for StarCraft, I knew what the steps would

01:11:36 be to get us there.

01:11:38 I think it was clear that using the imitation learning part

01:11:41 and then using this Battle.net for agents

01:11:44 were going to be key.

01:11:45 And it turned out that this was the case.

01:11:48 And a little more was needed, but not much more.

01:11:51 For Turing test, I just don’t know

01:11:53 what the plan or execution plan would look like.

01:11:56 So that’s why I myself working on it as a grand challenge

01:12:00 is hard.

01:12:01 But there are quite a few sub challenges

01:12:03 that are related that you could say,

01:12:05 well, I mean, what if you create a great assistant

01:12:09 like Google already has, like the Google Assistant.

01:12:11 So can we make it better?

01:12:13 And can we make it fully neural and so on?

01:12:15 That I start to believe maybe we’re

01:12:17 reaching a point where we should attempt these challenges.

01:12:20 I like this conversation so much because it echoes very much

01:12:23 the StarCraft conversation.

01:12:24 It’s exactly how you approach StarCraft.

01:12:26 Let’s break it down into small pieces and solve those.

01:12:29 And you end up solving the whole game.

01:12:31 Great.

01:12:31 But that said, you’re behind some

01:12:34 of the biggest pieces of work in deep learning

01:12:37 in the last several years.

01:12:40 So you mentioned some limits.

01:12:42 What do you think of the current limits of deep learning?

01:12:44 And how do we overcome those limits?

01:12:47 So if I had to actually use a single word

01:12:50 to define the main challenge in deep learning,

01:12:53 it’s a challenge that probably has

01:12:55 been the challenge for many years.

01:12:56 And it’s that of generalization.

01:12:59 So what that means is that all that we’re doing

01:13:04 is fitting functions to data.

01:13:06 And when the data we see is not from the same distribution,

01:13:12 or even if there are some times that it

01:13:14 is very close to distribution, but because

01:13:17 of the way we train it with limited samples,

01:13:20 we then get to this stage where we just

01:13:23 don’t see generalization as much as we can generalize.

01:13:27 And I think adversarial examples are a clear example of this.

01:13:31 But if you study the machine learning literature,

01:13:34 the reason why SVMs became very popular

01:13:38 was because they

01:13:40 had some guarantees about generalization, which

01:13:42 is unseen data or out of distribution.

01:13:45 Or even within distribution, where you take an image and add

01:13:48 a bit of noise, these models fail.

01:13:51 So I think, really, I don’t see a lot of progress

01:13:56 on generalization in the strong generalization

01:14:00 sense of the word.

01:14:01 I think our neural networks, you can always

01:14:05 find designed examples that will make their outputs arbitrary,

01:14:11 which is not good because we humans would never

01:14:15 be fooled by these kind of images

01:14:17 or manipulation of the image.

01:14:19 And if you look at the mathematics,

01:14:21 you kind of understand this is a bunch of matrices

01:14:23 multiplied together.

01:14:26 There’s probably numerical instability

01:14:28 that you can just find corner cases.

01:14:30 So I think that’s really the underlying topic many times

01:14:35 we see when even at the grand stage of Turing test

01:14:40 generalization, if you start passing the Turing test,

01:14:44 should it be in English or should it be in any language?

01:14:48 As a human, if you ask something in a different language,

01:14:53 you actually will go and do some research

01:14:54 and try to translate it and so on.

01:14:57 Should the Turing test include that?

01:15:01 And it’s really a difficult problem

01:15:02 and very fascinating and very mysterious, actually.

01:15:05 Yeah, absolutely.

01:15:06 But do you think if you were to try to solve it,

01:15:10 can you not grow the size of data intelligently

01:15:14 in such a way that the distribution of your training

01:15:17 set does include the entirety of the testing set?

01:15:20 Is that one path?

01:15:21 The other path is totally a new methodology.

01:15:23 It’s not statistical.

01:15:24 So a path that has worked well, and it worked well

01:15:27 in StarCraft and in machine translation and in languages,

01:15:30 scaling up the data and the model.

01:15:32 And that’s kind of been maybe the only single formula that

01:15:38 still delivers today in deep learning, right?

01:15:40 It’s that data scale and model scale really

01:15:44 do more and more of the things that we thought,

01:15:47 oh, there’s no way it can generalize to these,

01:15:49 or there’s no way it can generalize to that.

01:15:51 But I don’t think fundamentally it will be solved with this.

01:15:54 And for instance, I’m really liking some style or approach

01:15:59 that would not only have neural networks,

01:16:02 but it would have programs or some discrete decision making,

01:16:06 because there is where I feel there’s a bit more.

01:16:10 I mean, the best example, I think, for understanding this

01:16:13 is I also worked a bit on, oh, we

01:16:16 can learn an algorithm with a neural network, right?

01:16:18 So you give it many examples, and it’s

01:16:20 going to sort the input numbers or something like that.

01:16:24 But really strong generalization is you give me some numbers

01:16:29 or you ask me to create an algorithm that sorts numbers.

01:16:32 And instead of creating a neural net, which will be fragile

01:16:34 because it’s going to go out of range at some point,

01:16:37 you’re going to give it numbers that are too large, too small,

01:16:40 and whatnot, if you just create a piece of code that

01:16:45 sorts the numbers, then you can prove

01:16:47 that that will generalize to absolutely all the possible

01:16:50 input you could give.
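The contrast drawn here, a learned sorter that breaks outside its training range versus a program that provably generalizes, can be made concrete. The "learned" sorter below is a deliberate caricature (a counting sort over a fixed range standing in for a model that only saw small numbers during training), not any real neural architecture:

```python
def learned_sorter(xs, max_seen=100):
    """Caricature of a model that only 'learned' inputs in its training
    range [0, max_seen): a counting sort over that range. Out-of-range
    values are silently dropped -- the fragility described above."""
    counts = [0] * max_seen
    for x in xs:
        if 0 <= x < max_seen:
            counts[x] += 1
    return [v for v in range(max_seen) for _ in range(counts[v])]

def program_sorter(xs):
    """A plain program: provably correct for any comparable input."""
    return sorted(xs)

in_range = [3, 1, 2]
out_of_range = [3, 1, 2, 10**6]  # value far outside the "training" range

print(learned_sorter(in_range))      # [1, 2, 3] -- looks solved
print(learned_sorter(out_of_range))  # [1, 2, 3] -- the 10**6 vanished
print(program_sorter(out_of_range))  # [1, 2, 3, 1000000]
```

Within its range the caricature is indistinguishable from the real thing, which is exactly why the failure mode only shows up under strong generalization tests.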

01:16:51 So I think the problem comes with some exciting prospects.

01:16:55 I mean, scale is a bit more boring, but it really works.

01:16:59 And then maybe programs and discrete abstractions

01:17:02 are a bit less developed.

01:17:04 But clearly, I think they’re quite exciting in terms

01:17:07 of future for the field.

01:17:09 Do you draw any insight wisdom from the 80s and expert

01:17:14 systems and symbolic systems, symbolic computing?

01:17:16 Do you ever go back to those reasoning, that kind of logic?

01:17:20 Do you think that might make a comeback?

01:17:23 You’ll have to dust off those books?

01:17:24 Yeah, I actually love actually adding more inductive biases.

01:17:31 To me, the problem really is, what are you trying to solve?

01:17:34 If what you’re trying to solve is so important that try

01:17:37 to solve it no matter what, then absolutely use rules,

01:17:42 use domain knowledge, and then use

01:17:45 a bit of the magic of machine learning

01:17:46 to empower to make the system as the best system that

01:17:50 will detect cancer or detect weather patterns, right?

01:17:56 Or in terms of StarCraft, it also was a very big challenge.

01:17:59 So I was definitely happy that if we

01:18:01 had to cut a corner here and there,

01:18:04 it could have been interesting to do.

01:18:07 And in fact, in StarCraft, we started

01:18:09 thinking about expert systems because it’s a very,

01:18:11 you know, you can define.

01:18:12 I mean, people actually build StarCraft bots by thinking

01:18:15 about those principles, like state machines and rule based.

01:18:20 And then you could think of combining

01:18:22 a bit of a rule based system, but that has also

01:18:25 neural networks incorporated to make it generalize a bit

01:18:28 better.

01:18:29 So absolutely, I mean, we should definitely

01:18:31 go back to those ideas.

01:18:32 And anything that makes the problem simpler,

01:18:35 as long as your problem is important, that’s OK.

01:18:37 And that’s research driving a very important problem.

01:18:41 And on the other hand, if you want to really focus

01:18:44 on the limits of reinforcement learning,

01:18:46 then of course, you must try not to look at imitation data

01:18:50 or to look for some rules of the domain that would help a lot

01:18:55 or even feature engineering, right?

01:18:56 So this is a tension that depending on what you do,

01:19:00 I think both ways are definitely fine.

01:19:03 And I would never not do one or the other

01:19:06 as long as what you’re doing is important

01:19:08 and needs to be solved, right?

01:19:10 Right, so there’s a bunch of different ideas

01:19:13 that you developed that I really enjoy.

01:19:16 But one is translating from image captioning,

01:19:22 translating from image to text, just another beautiful idea,

01:19:27 I think, that resonates throughout your work, actually.

01:19:33 So the underlying nature of reality

01:19:35 being language always, somehow.

01:19:38 So what’s the connection between images and text,

01:19:42 or rather the visual world and the world

01:19:44 of language in your view?

01:19:46 Right, so I think a piece of research that’s been central

01:19:51 to, I would say, even extending into StarCraft

01:19:54 is this idea of sequence to sequence learning,

01:19:57 which what we really meant by that

01:19:59 is that you can now really input anything

01:20:03 to a neural network as the input x.

01:20:06 And then the neural network will learn a function f

01:20:09 that will take x as an input and produce any output y.

01:20:12 And these x and y’s don’t need to be static or features,

01:20:19 like fixed vectors or anything like that.

01:20:22 It could be really sequences and now beyond data structures.

01:20:26 So that paradigm was tested in a very interesting way

01:20:31 when we moved from translating French to English

01:20:35 to translating an image to its caption.

01:20:37 But the beauty of it is that, really,

01:20:40 and that’s actually how it happened.

01:20:43 I changed a line of code in this thing that

01:20:45 was doing machine translation.

01:20:47 And I came the next day, and I saw

01:20:50 how it was producing captions that seemed like, oh my god,

01:20:54 this is really, really working.

01:20:55 And the principle is the same.

01:20:57 So I think I don’t see text, vision, speech, waveforms

01:21:04 as something different as long as you basically

01:21:09 learn a function that will vectorize these.

01:21:14 And then after we vectorize it, we

01:21:16 can then use transformers, LSTMs, whatever

01:21:20 the flavor of the month of the model is.

01:21:22 And then as long as we have enough supervised data,

01:21:25 really, this formula will work and will keep working,

01:21:30 I believe, to some extent.

01:21:31 Modulo these generalization issues that I mentioned before.
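The sequence-to-sequence contract being described, vectorize any input x, then decode any output sequence y, can be sketched as an interface. The encoders and decoder below are trivial stand-ins (real ones would be LSTMs, transformers, or CNNs), but they show why swapping the encoder is enough to turn a translation model into a captioning model:

```python
from typing import Callable, List, Sequence

Vector = List[float]

def make_seq2seq(encode: Callable[[Sequence], Vector],
                 decode: Callable[[Vector], List[str]]):
    """The seq2seq contract: any x that can be vectorized can be
    decoded into any output sequence y. Only the encoder changes
    between translation (text in) and captioning (image in)."""
    def f(x):
        return decode(encode(x))
    return f

# Hypothetical stand-ins; real encoders/decoders are neural networks.
def text_encoder(tokens):
    return [float(len(tokens))]          # e.g. an LSTM over words

def image_encoder(pixels):
    return [float(sum(pixels) % 7)]      # e.g. a CNN over pixels

def caption_decoder(v):
    return ["token"] * int(v[0])         # e.g. autoregressive decoding

translate = make_seq2seq(text_encoder, caption_decoder)
caption = make_seq2seq(image_encoder, caption_decoder)

print(translate(["hola", "mundo"]))  # same decoder on a text vector
print(caption([0, 1, 2, 3]))         # same decoder on an image vector
```

The "changed a line of code" anecdote above is this exact swap: the decoder never knows whether its vector came from a sentence or an image.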

01:21:35 But the task there is to vectorize,

01:21:36 so to form a representation that’s meaningful.

01:21:39 And your intuition now, having worked with all this media,

01:21:42 is that once you are able to form that representation,

01:21:46 you could basically take any things, any sequence.

01:21:51 Going back to StarCraft, is there

01:21:52 limits on the length so that we didn’t really

01:21:56 touch on the long term aspect?

01:21:59 How did you overcome the whole really long term

01:22:02 aspect of things here?

01:22:03 Is there some tricks?

01:22:05 So the main trick, so StarCraft, if you

01:22:08 look at absolutely every frame, you

01:22:10 might think it’s quite a long game.

01:22:12 So we would have to multiply 22 times 60 seconds per minute

01:22:18 times maybe at least 10 minutes per game on average.

01:22:21 So there are quite a few frames.

01:22:25 But the trick really was to only observe, in fact,

01:22:30 which might be seen as a limitation,

01:22:32 but it is also a computational advantage.

01:22:35 Only observe when you act.

01:22:37 And then what the neural network decides

01:22:40 is what is the gap going to be until the next action.

01:22:44 And if you look at most StarCraft games

01:22:48 that we have in the data set that Blizzard provided,

01:22:51 it turns out that most games are actually only,

01:22:56 I mean, it is still a long sequence,

01:22:58 but it’s maybe like 1,000 to 1,500 actions,

01:23:02 which if you start looking at LSTMs, large LSTMs,

01:23:07 transformers, it’s not that difficult, especially

01:23:12 if you have supervised learning.

01:23:14 If you had to do it with reinforcement learning,

01:23:16 the credit assignment problem, what

01:23:18 is it in this game that made you win?

01:23:19 That would be really difficult.

01:23:21 But thankfully, because of imitation learning,

01:23:24 we didn’t have to deal with these directly.

01:23:27 Although if we had to, we tried it.

01:23:29 And what happened is you just take all your workers

01:23:31 and attack with them.

01:23:33 And that is kind of obvious in retrospect

01:23:36 because you start trying random actions.

01:23:38 One of the actions will be a worker

01:23:40 that goes to the enemy base.

01:23:41 And because it’s self play, it’s not

01:23:43 going to know how to defend because it basically

01:23:45 doesn’t know almost anything.

01:23:47 And eventually, what you develop is this take all workers

01:23:50 and attack, because the credit assignment issue in RL

01:23:54 is really, really hard.

01:23:55 I do believe we could do better.

01:23:57 And that’s maybe a research challenge for the future.

01:24:01 But yeah, even in StarCraft, the sequences

01:24:04 are maybe 1,000, which I believe is

01:24:07 within the realm of what transformers can do.

01:24:10 Yeah, I guess the difference between StarCraft and Go

01:24:12 is in Go and Chess, stuff starts happening right away.

01:24:18 So there’s not, yeah, it’s pretty easy to self play.

01:24:22 Not easy, but to self play, it’s possible to develop

01:24:24 reasonable strategies quickly as opposed to StarCraft.

01:24:27 I mean, in Go, there’s only 400 actions.

01:24:30 But one action is what people would call the God action.

01:24:34 That would be if you had expanded the whole search

01:24:38 tree, that’s the best action if you did minimax

01:24:40 or whatever algorithm you would do if you

01:24:42 had the computational capacity.

01:24:44 But in StarCraft, 400 is minuscule.

01:24:48 Like in 400, you couldn’t even click

01:24:51 on the pixels around a unit.

01:24:53 So I think the problem there is in terms of action space size

01:24:58 is way harder.

01:25:01 And that search is impossible.
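To make the action-space gap concrete: the Go figure below is just the board size, and the StarCraft figure is an illustrative lower bound assuming one modest screen resolution, not the exact AlphaStar action space:

```python
# Go: at most one move per intersection on a 19x19 board.
go_actions = 19 * 19  # 361, roughly the "400" mentioned above

# StarCraft (illustrative): a single "click somewhere on screen"
# action type already dwarfs that, before counting unit abilities,
# selections, and build orders.
screen_w, screen_h = 84, 84          # an assumed low-res interface size
click_targets = screen_w * screen_h  # 7056 possible click locations

print(go_actions, click_targets)
assert click_targets > 10 * go_actions
```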

01:25:03 So there’s quite a few challenges indeed

01:25:06 that make this kind of a step up in terms of machine learning.

01:25:10 For humans, maybe playing StarCraft

01:25:13 seems more intuitive because it looks real.

01:25:16 I mean, the graphics and everything moves smoothly,

01:25:18 whereas I don’t know how to.

01:25:20 I mean, Go is a game that I would really need to study.

01:25:22 It feels quite complicated.

01:25:23 But for machines, kind of maybe it’s the reverse, yes.

01:25:27 Which shows you the gap actually between deep learning

01:25:30 and however the heck our brains work.

01:25:34 So you developed a lot of really interesting ideas.

01:25:36 It’s interesting to just ask, what’s

01:25:38 your process of developing new ideas?

01:25:41 Do you like brainstorming with others?

01:25:42 Do you like thinking alone?

01:25:44 Do you like, what was it, Ian Goodfellow said

01:25:49 he came up with GANs after a few beers.

01:25:52 He thinks beers are essential for coming up with new ideas.

01:25:55 We had beers to decide to play another game of StarCraft

01:25:59 after a week.

01:25:59 So it’s really similar to that story.

01:26:02 Actually, I explained this in a DeepMind retreat.

01:26:05 And I said, this is the same as the GAN story.

01:26:08 I mean, we were in a bar.

01:26:09 And we decided, let’s play a game next week.

01:26:10 And that’s what happened.

01:26:11 I feel like we’re giving the wrong message

01:26:13 to young undergrads.

01:26:15 Yeah, I know.

01:26:15 But in general, do you like brainstorming?

01:26:18 Do you like thinking alone, working stuff out?

01:26:20 So I think throughout the years, also, things changed.

01:26:23 So initially, I was very fortunate to be

01:26:29 with great minds like Jeff Hinton, Jeff Dean,

01:26:33 Ilya Sutskever.

01:26:34 I was really fortunate to join Brain at a very good time.

01:26:37 So at that point, ideas, I was just

01:26:41 brainstorming with my colleagues and learned a lot.

01:26:44 And keep learning is actually something

01:26:46 you should never stop doing.

01:26:48 So learning implies reading papers and also

01:26:51 discussing ideas with others.

01:26:53 It’s very hard at some point not to communicate,

01:26:56 whether that’s reading a paper from someone

01:26:59 or actually discussing.

01:27:00 So definitely, that communication aspect

01:27:04 needs to be there, whether it’s written or oral.

01:27:08 Nowadays, I’m also trying to be a bit more strategic

01:27:12 about what research to do.

01:27:15 So I was describing a little bit this tension

01:27:18 between research for the sake of research,

01:27:21 and then you have, on the other hand,

01:27:23 applications that can drive the research.

01:27:25 And honestly, the formula that has worked best for me

01:27:28 is just find a hard problem and then

01:27:32 try to see how research fits into it,

01:27:34 how it doesn’t fit into it, and then you must innovate.

01:27:37 So I think machine translation drove sequence to sequence.

01:27:43 Then maybe learning algorithms that had to,

01:27:47 combinatorial algorithms led to pointer networks.

01:27:50 StarCraft led to really scaling up imitation learning

01:27:55 and the AlphaStar League.

01:27:55 So that’s been a formula that I personally like.

01:27:58 But the other one is also valid.

01:28:00 And I’ve seen it succeed a lot of the times

01:28:02 where you just want to investigate model based

01:28:05 RL as a research topic.

01:28:08 And then you must start to think, well,

01:28:11 what are the tests?

01:28:12 How are you going to test these ideas?

01:28:14 You need a minimal environment to try things.

01:28:17 You need to read a lot of papers and so on.

01:28:19 And that’s also very fun to do and something

01:28:21 I’ve also done quite a few times,

01:28:24 both at Brain, at DeepMind, and obviously as a PhD.

01:28:28 So I think besides the ideas and discussions,

01:28:32 I think it’s important also because you start

01:28:35 sort of guiding not only your own goals,

01:28:40 but other people’s goals to the next breakthrough.

01:28:44 So you must really kind of understand this feasibility

01:28:48 also, as we were discussing before,

01:28:50 whether this domain is ready to be tackled or not.

01:28:53 And you don’t want to be too early.

01:28:55 You obviously don’t want to be too late.

01:28:57 So it’s really interesting, this strategic component

01:29:00 of research, which I think as a grad student,

01:29:03 I just had no idea.

01:29:05 I just read papers and discussed ideas.

01:29:07 And I think this has been maybe the major change.

01:29:09 And I recommend people kind of feed forward

01:29:13 to what success looks like and try to backtrack,

01:29:16 other than just kind of looking, oh, this looks cool.

01:29:18 This looks cool.

01:29:19 And then you do a bit of random work,

01:29:21 which sometimes you stumble upon some interesting things.

01:29:23 But in general, it’s also good to plan a bit.

01:29:27 Yeah, I like it.

01:29:28 Especially like your approach of taking a really hard problem,

01:29:31 stepping right in, and then being

01:29:33 super skeptical about being able to solve the problem.

01:29:37 I mean, there’s a balance of both, right?

01:29:40 There’s a silly optimism and a critical sort of skepticism

01:29:46 that’s good to balance, which is why

01:29:49 it’s good to have a team of people that balance that.

01:29:52 You don’t do that on your own.

01:29:53 You have both mentors that have seen,

01:29:56 or you obviously want to chat and discuss

01:29:59 whether it’s the right time.

01:30:00 I mean, Demis came in 2014.

01:30:03 And he said, maybe in a bit we’ll do StarCraft.

01:30:06 And maybe he knew.

01:30:08 And I’m just following his lead, which is great,

01:30:11 because he’s brilliant, right?

01:30:12 So these things are obviously quite important,

01:30:17 that you want to be surrounded by people who are diverse.

01:30:22 They have their knowledge.

01:30:23 It’s also important to, I mean,

01:30:26 I’ve learned a lot from people who actually have an idea

01:30:30 that I might not think is good.

01:30:32 But if I give them the space to try it,

01:30:34 I’ve been proven wrong many, many times as well.

01:30:37 So that’s great.

01:30:38 I think your colleagues are more important than yourself,

01:30:42 I think.

01:30:43 Sure.

01:30:44 Now let’s real quick talk about another impossible problem,

01:30:48 AGI.

01:30:49 Right.

01:30:50 What do you think it takes to build a system that’s

01:30:52 human level intelligence?

01:30:54 We talked a little bit about the Turing test, StarCraft.

01:30:56 All of these have echoes of general intelligence.

01:30:58 But if you think about just something

01:31:01 that you would sit back and say, wow,

01:31:03 this is really something that resembles

01:31:06 human level intelligence.

01:31:07 What do you think it takes to build that?

01:31:09 So I find that AGI oftentimes is maybe not very well defined.

01:31:17 So what I’m trying to then come up with for myself

01:31:20 is what a result would look like that would make you start

01:31:25 to believe that you have agents or neural nets that

01:31:28 no longer overfit to a single task,

01:31:31 but actually learn the skill of learning, so to speak.

01:31:37 And that actually is a field that I

01:31:40 am fascinated by, which is the learning to learn,

01:31:43 or meta learning, which is about no longer learning

01:31:47 about a single domain.

01:31:48 So you can think about the learning algorithm

01:31:51 itself is general.

01:31:52 So the same formula we applied for AlphaStar or StarCraft,

01:31:56 we can now apply to almost any video game,

01:31:59 or you could apply to many other problems and domains.

01:32:03 But the algorithm is what’s generalizing.

01:32:07 But the neural network, those weights

01:32:09 are useless even to play another race.

01:32:12 I train a network to play very well at Protoss versus Protoss.

01:32:15 I need to throw away those weights.

01:32:17 If I want to play now Terran versus Terran,

01:32:20 I would need to retrain a network from scratch

01:32:23 with the same algorithm.

01:32:24 That’s beautiful.

01:32:26 But the network itself will not be useful.

01:32:28 So I think if I see an approach that

01:32:32 can absorb or start solving new problems without the need

01:32:38 to kind of restart the process, I

01:32:40 think that, to me, would be a nice way

01:32:42 to define some form of AGI.

01:32:45 Again, I don’t know about the grandiose, like, AGI.

01:32:48 I mean, should the Turing test be solved before AGI?

01:32:50 I mean, I don’t know.

01:32:51 I think concretely, I would like to see clearly

01:32:54 that meta learning happen, meaning

01:32:57 that there is an architecture or a network that

01:33:01 as it sees a new problem or new data, solves it.

01:33:04 And to make it kind of a benchmark,

01:33:08 it should solve it at the same speed

01:33:09 that we do solve new problems.

01:33:11 When I define you a new object and you

01:33:13 have to recognize it, when you start playing a new game,

01:33:16 you played all the Atari games.

01:33:17 But now you play a new Atari game.

01:33:19 Well, you’re going to be pretty quickly pretty good

01:33:22 at the game.

01:33:22 So that’s perhaps what’s the domain

01:33:25 and what’s the exact benchmark is a bit difficult.

01:33:28 I think as a community, we might need

01:33:29 to do some work to define it.

01:33:32 But I think this first step, I could

01:33:34 see it happen relatively soon.

01:33:36 But then the whole what AGI means and so on,

01:33:40 I am a bit more confused about what

01:33:43 I think people mean different things.

01:33:44 There’s an emotional, psychological level

01:33:48 that like even the Turing test, passing the Turing test

01:33:53 is something that we just pass judgment on as human beings

01:33:55 what it means to be as a dog in AGI system.

01:34:03 Yeah.

01:34:04 What level, what does it mean, what does it mean?

01:34:07 But I like the generalization.

01:34:08 And maybe as a community, we converge

01:34:10 towards a group of domains that are sufficiently far away.

01:34:14 That would be really damn impressive

01:34:16 if it was able to generalize.

01:34:18 So perhaps not as close as Protoss and Zerg,

01:34:21 but like Wikipedia.

01:34:22 That would be a step.

01:34:23 Yeah, that would be a good step and then a really good step.

01:34:26 But then like from StarCraft to Wikipedia and back.

01:34:30 Yeah, that kind of thing.

01:34:31 And that feels also quite hard and far.

01:34:34 But I think as long as you put the benchmark out,

01:34:38 as we discovered, for instance, with ImageNet,

01:34:41 then tremendous progress can be had.

01:34:43 So I think maybe there’s a lack of benchmark,

01:34:46 but I’m sure we’ll find one and the community will then

01:34:49 work towards that.

01:34:52 And then beyond what AGI might mean or would imply,

01:34:56 I really am hopeful to see basically machine learning

01:35:01 or AI just scaling up and helping people

01:35:05 that might not have the resources to hire an assistant

01:35:08 or that might not even know what the weather is like.

01:35:13 So I think in terms of the positive impact of AI,

01:35:18 I think that’s maybe what we should also not lose focus.

01:35:22 The research community building AGI,

01:35:23 I mean, that’s a real nice goal.

01:35:25 But I think the way that DeepMind puts it is,

01:35:28 and then use it to solve everything else.

01:35:30 So I think we should parallelize.

01:35:33 Yeah, we shouldn’t forget about all the positive things

01:35:36 that are actually coming out of AI already

01:35:38 and are going to be coming out.

01:35:40 Right.

01:35:41 But on that note, let me ask relative

01:35:45 to popular perception, do you have

01:35:47 any worry about the existential threat

01:35:49 of artificial intelligence in the near or far future

01:35:53 that some people have?

01:35:55 I think in the near future, I’m skeptical.

01:35:58 So I hope I’m not wrong.

01:35:59 But I’m not concerned, but I appreciate efforts,

01:36:04 ongoing efforts, and even like whole research

01:36:07 field on AI safety emerging and in conferences and so on.

01:36:10 I think that’s great.

01:36:12 In the long term, I really hope we just

01:36:16 can simply have the benefits outweigh

01:36:19 the potential dangers.

01:36:20 I am hopeful for that.

01:36:23 But also, we must remain vigilant to monitor and assess

01:36:27 whether the tradeoffs are there and we also have enough lead

01:36:32 time to prevent or to redirect our efforts if need be.

01:36:37 But I’m quite optimistic about the technology

01:36:41 and definitely more fearful of other threats

01:36:45 in terms of planetary level at this point.

01:36:48 But obviously, that’s the one I have more power on.

01:36:52 So clearly, I do start thinking more and more about this.

01:36:56 And it’s grown in me actually to start reading more

01:37:00 about AI safety, which is a field that so far I have not

01:37:04 really contributed to.

01:37:05 But maybe there’s something to be done there as well.

01:37:07 I think it’s really important.

01:37:09 I talk about this with a few folks.

01:37:11 But it’s important to ask you and shove it in your head

01:37:14 because you’re at the leading edge of actually what

01:37:18 people are excited about in AI.

01:37:19 The work with AlphaStar, it’s arguably

01:37:22 at the very cutting edge of the kind of thing

01:37:25 that people are afraid of.

01:37:27 And so you speaking to that fact, that we’re actually

01:37:31 quite far away from the kind of thing

01:37:33 that people might be afraid of.

01:37:35 But it’s still worthwhile to think about.

01:37:38 And it’s also good that you’re not as worried

01:37:43 and you’re also open to thinking about it.

01:37:45 There’s two aspects.

01:37:46 I mean, me not being worried.

01:37:47 But obviously, we should prepare for things

01:37:53 that could go wrong, misuse of the technologies

01:37:56 as with any technologies.

01:37:58 So I think there’s always trade offs.

01:38:02 And as a society, we’ve kind of solved this to some extent

01:38:06 in the past.

01:38:07 So I’m hoping that by having the researchers

01:38:10 and the whole community brainstorm and come up

01:38:14 with interesting solutions to the new things that

01:38:16 will happen in the future, that we can still also push

01:38:20 the research to the avenue that I think

01:38:23 is kind of the greatest avenue, which is

01:38:25 to understand intelligence.

01:38:27 How are we doing what we’re doing?

01:38:29 And obviously, from a scientific standpoint,

01:38:32 that is kind of my personal drive of all the time

01:38:37 that I spend doing what I’m doing, really.

01:38:40 Where do you see the deep learning as a field heading?

01:38:42 Where do you think the next big breakthrough might be?

01:38:46 So I think deep learning, I discussed a little of this

01:38:49 before.

01:38:50 Deep learning has to be combined with some form

01:38:54 of discretization, program synthesis.

01:38:56 I think that’s kind of as a research in itself

01:38:59 is an interesting topic to expand and start

01:39:02 doing more research.

01:39:04 And then as kind of what will deep learning

01:39:07 enable to do in the future?

01:39:08 I don’t think that’s going to happen this year.

01:39:11 But also this idea of starting not to throw away all the weights,

01:39:16 this idea of learning to learn,

01:39:18 and really having these agents not having

01:39:23 to restart their weights.

01:39:24 And you can have an agent that is kind of solving or classifying

01:39:29 images on ImageNet, but also generating speech

01:39:32 if you ask it to generate some speech.

01:39:34 And it should really be kind of almost the same network,

01:39:39 but it might not be a neural network.

01:39:41 It might be a neural network with an optimization

01:39:44 algorithm attached to it.

01:39:45 But I think this idea of generalization to new tasks

01:39:49 is something for which we first must define good benchmarks.

01:39:52 But then I think that’s going to be exciting.

01:39:54 And I’m not sure how close we are.

01:39:56 But I think if you have a very limited domain,

01:40:00 I think we can start doing some progress.

01:40:02 And much like how we did a lot of programs in computer vision,

01:40:07 we should start thinking.

01:40:09 I really like a talk that Léon Bottou gave at ICML

01:40:12 a few years ago, which is this train test paradigm should

01:40:16 be broken.

01:40:17 We should stop thinking about a training set and a test set.

01:40:23 And these are closed things that are untouchable.

01:40:26 I think we should go beyond these.

01:40:28 And in meta learning, we call these the meta training

01:40:30 set and the meta test set, which is really thinking about,

01:40:35 if I know about ImageNet, why would that network not

01:40:39 work on MNIST, which is a much simpler problem?

01:40:41 But right now, it really doesn’t.

01:40:44 But it just feels wrong.
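One way to phrase that meta-train/meta-test split in code. This is a hypothetical sketch of the episode structure he describes, not any specific benchmark; the task names and integer "examples" are placeholders:

```python
import random

# Meta-learning reframes the split: meta-train and meta-test are sets
# of *tasks*, and each task contributes its own support/query examples.
meta_train_tasks = {"digits": list(range(100)), "letters": list(range(100, 200))}
meta_test_tasks = {"new_symbols": list(range(200, 300))}  # an unseen task

def sample_episode(task_examples, n_support=5, n_query=5):
    """One few-shot episode: adapt on the support set, then measure
    generalization on the query set from the same (new) task."""
    drawn = random.sample(task_examples, n_support + n_query)
    return drawn[:n_support], drawn[n_support:]

support, query = sample_episode(meta_test_tasks["new_symbols"])
print(len(support), len(query))  # 5 5
```

The point of the structure is exactly the ImageNet-to-MNIST complaint above: evaluation happens on tasks the learner never saw, not on held-out examples of the same task.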

01:40:46 So I think that’s kind of, on the application

01:40:50 or the benchmark side, where we probably

01:40:52 will see quite a bit more interest and progress

01:40:56 and hopefully people defining new and exciting challenges

01:41:00 really.

01:41:00 Do you have any hope or interest in knowledge graphs

01:41:04 within this context?

01:41:05 So this kind of constructing graph.

01:41:08 So going back to graphs.

01:41:10 Well, neural networks and graphs.

01:41:12 But I mean, a different kind of knowledge graph,

01:41:14 sort of like semantic graphs or those concepts.

01:41:18 Yeah.

01:41:18 So I think the idea of graphs is,

01:41:23 so I’ve been quite interested in sequences first and then

01:41:26 more interesting or different data structures like graphs.

01:41:29 And I’ve studied graph neural networks in the last three

01:41:33 years or so.

01:41:34 I found these models just very interesting

01:41:37 from a deep learning standpoint.

01:41:42 But then why do we want these models

01:41:45 and why would we use them?

01:41:47 What’s the application?

01:41:48 What’s kind of the killer application of graphs?

01:41:51 And perhaps if we could extract a knowledge graph

01:41:58 from Wikipedia automatically, that

01:42:01 would be interesting because then these graphs have

01:42:04 this very interesting structure that also is a bit more

01:42:07 compatible with this idea of programs and deep learning

01:42:11 kind of working together, jumping neighborhoods

01:42:14 and so on.

01:42:14 You could imagine defining some primitives

01:42:17 to go around graphs, right?

01:42:18 So I think I really like the idea of a knowledge graph.

01:42:23 And in fact, when we started or as part of the research

01:42:29 we did for StarCraft, I thought, wouldn’t it

01:42:31 be cool to give it the graph of all these buildings that

01:42:38 depend on each other, and units that have prerequisites

01:42:41 in order to be built.

01:42:42 And so this is information that the network

01:42:45 can learn and extract.

01:42:46 But it would have been great to see

01:42:50 or to think of really StarCraft as a giant graph that even

01:42:53 also as the game evolves, you start taking branches

01:42:57 and so on.

01:42:57 And we did a bit of research on these,

01:42:59 nothing too relevant, but I really like the idea.
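The tech tree he describes really is a dependency graph. A toy version follows; the building names are real StarCraft structures, but the exact prerequisite edges here are simplified assumptions for illustration:

```python
# Toy prerequisite graph for a few Protoss structures/units.
# Simplified: the real tech tree is far larger and has branches.
prerequisites = {
    "nexus": [],
    "gateway": ["nexus"],
    "cybernetics_core": ["gateway"],
    "stalker": ["cybernetics_core"],
}

def can_build(target, built, graph=prerequisites):
    """A unit is buildable once every (transitive) prerequisite exists.
    Recursion is safe here because the graph is acyclic."""
    return all(p in built and can_build(p, built, graph) for p in graph[target])

print(can_build("stalker", built={"nexus", "gateway", "cybernetics_core"}))  # True
print(can_build("stalker", built={"nexus"}))  # False
```

A network can learn this structure implicitly from replays, as he notes, but making the graph explicit is what would let graph-style primitives ("jump to neighbors") operate on it.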

01:43:04 And it has elements that are something

01:43:06 you also worked with in terms of visualizing your networks.

01:43:08 It has elements of having human interpretable,

01:43:13 being able to generate knowledge representations that

01:43:15 are human interpretable that maybe human experts can then

01:43:18 tweak or at least understand.

01:43:20 So there’s a lot of interesting aspect there.

01:43:22 And for me personally, I’m just a huge fan of Wikipedia.

01:43:25 And it’s a shame that our neural networks aren’t

01:43:29 taking advantage of all the structured knowledge that’s

01:43:31 on the web.

01:43:32 What’s next for you?

01:43:34 What’s next for DeepMind?

01:43:36 What are you excited about for AlphaStar?

01:43:39 Yeah, so I think the obvious next steps

01:43:43 would be to apply AlphaStar to other races.

01:43:48 I mean, that sort of shows that the algorithm works

01:43:51 because we wouldn’t want to have created by mistake something

01:43:56 in the architecture that happens to work for Protoss

01:43:58 but not for other races.

01:44:00 So as verification, I think that’s an obvious next step

01:44:03 that we are working on.

01:44:05 And then I would like to see so agents and players can

01:44:11 specialize on different skill sets that

01:44:13 allow them to be very good.

01:44:15 I think we’ve seen AlphaStar understanding very well

01:44:19 when to take battles and when not to.

01:44:22 Also very good at micromanagement

01:44:24 and moving the units around and so on.

01:44:27 And also very good at producing nonstop and trading off

01:44:30 economy with building units.

01:44:33 But I have not perhaps seen as much

01:44:36 as I would like this idea of the poker idea

01:44:39 that you mentioned, right?

01:44:40 I’m not sure StarCraft or AlphaStar

01:44:42 rather has developed a very deep understanding of what

01:44:47 the opponent is doing and reacting to that

01:44:50 and sort of trying to trick the player into doing something else,

01:44:54 things like that.

01:44:55 So this kind of reasoning, I would like to see more.

01:44:58 So I think purely from a research standpoint,

01:45:01 there’s perhaps also quite a few things

01:45:03 to be done there in the domain of StarCraft.

01:45:06 Yeah, in the domain of games, I’ve

01:45:08 seen some interesting work in even auctions,

01:45:11 manipulating other players, sort of forming a belief state

01:45:15 and just messing with people.

01:45:17 Yeah, it’s called theory of mind, I guess.

01:45:18 Theory of mind, yeah.

01:45:20 So it’s a fascinating.

01:45:21 Theory of mind and StarCraft, it’s like they’re

01:45:24 really made for each other.

01:45:26 So that would be very exciting to see those techniques apply

01:45:29 to StarCraft or perhaps StarCraft

01:45:32 driving new techniques, right?

01:45:33 As I said, this is always the tension between the two.

01:45:36 Well, Oriol, thank you so much for talking today.

01:45:38 Awesome.

01:45:39 It was great to be here.

01:45:40 Thanks.