Transcript
00:00:00 The following is a conversation with Oriol Vinyals.
00:00:03 He’s a senior research scientist at Google DeepMind,
00:00:05 and before that, he was at Google Brain and Berkeley.
00:00:09 His research has been cited over 39,000 times.
00:00:13 He’s truly one of the most brilliant and impactful minds
00:00:16 in the field of deep learning.
00:00:18 He’s behind some of the biggest papers and ideas in AI,
00:00:20 including sequence to sequence learning,
00:00:23 audio generation, image captioning,
00:00:25 neural machine translation,
00:00:27 and, of course, reinforcement learning.
00:00:29 He’s a lead researcher of the AlphaStar project,
00:00:32 creating an agent that defeated a top professional
00:00:35 at the game of StarCraft.
00:00:38 This conversation is part
00:00:39 of the Artificial Intelligence podcast.
00:00:41 If you enjoy it, subscribe on YouTube, iTunes,
00:00:44 or simply connect with me on Twitter at Lex Fridman,
00:00:48 spelled F R I D.
00:00:51 And now, here’s my conversation with Oriol Vinyals.
00:00:55 You spearheaded the DeepMind team behind AlphaStar
00:00:59 that recently beat a top professional player at StarCraft.
00:01:04 So you have an incredible wealth of work
00:01:07 in deep learning and a bunch of fields,
00:01:09 but let’s talk about StarCraft first.
00:01:11 Let’s go back to the very beginning,
00:01:13 even before AlphaStar, before DeepMind,
00:01:16 before deep learning first.
00:01:18 What came first for you,
00:01:21 a love for programming or a love for video games?
00:01:24 I think for me, it definitely came first
00:01:28 the drive to play video games.
00:01:31 I really liked computers.
00:01:35 I didn’t really code much, but what I would do is
00:01:38 I would just mess with the computer, break it and fix it.
00:01:42 That was the level of skills, I guess,
00:01:43 that I gained in my very early days,
00:01:46 I mean, when I was 10 or 11.
00:01:48 And then I really got into video games,
00:01:50 especially StarCraft, actually, the first version.
00:01:53 I spent most of my time
00:01:55 just playing kind of pseudo professionally,
00:01:57 as professionally as you could play back in 98 in Europe,
00:02:01 which was not a very mainstream scene,
00:02:03 like what’s nowadays called esports.
00:02:05 Right, of course, in the 90s.
00:02:07 So how’d you get into StarCraft?
00:02:09 What was your favorite race?
00:02:11 How did you develop your skill?
00:02:15 What was your strategy?
00:02:16 All that kind of thing.
00:02:18 So as a player, I tended to try to play not many games,
00:02:21 not to kind of disclose the strategies
00:02:23 that I kind of developed.
00:02:25 And I like to play random, actually,
00:02:27 not in competitions, but just to…
00:02:30 I think in StarCraft, there’s three main races
00:02:33 and I found it very useful to play with all of them.
00:02:36 And so I would choose random many times,
00:02:38 even sometimes in tournaments,
00:02:40 to gain skill on the three races
00:02:42 because it’s not only how you play against someone,
00:02:45 but also, if you understand the race because you played it,
00:02:48 you also understand what’s annoying,
00:02:51 then when you’re on the other side,
00:02:52 what to do to annoy that person,
00:02:54 to try to gain advantages here and there and so on.
00:02:57 So I actually played random,
00:02:59 although I must say in terms of favorite race,
00:03:02 I really liked Zerg.
00:03:03 I was probably best at Zerg
00:03:05 and that’s probably what I tend to use
00:03:08 towards the end of my career before starting university.
00:03:11 So let’s step back a little bit.
00:03:13 Could you try to describe StarCraft
00:03:15 to people that may never have played video games,
00:03:18 especially the massively online variety like StarCraft?
00:03:22 So StarCraft is a real time strategy game.
00:03:25 And the way to think about StarCraft,
00:03:27 perhaps if you understand a bit chess,
00:03:30 is that there’s a board, which is called the map,
00:03:34 where people play against each other.
00:03:39 There’s obviously many ways you can play,
00:03:40 but the most interesting one is the one versus one setup
00:03:44 where you just play against someone else
00:03:47 or even the built in AI, right?
00:03:49 Blizzard put a system that can play the game
00:03:51 reasonably well if you don’t know how to play.
00:03:54 And then in this board, you have again,
00:03:57 pieces like in chess,
00:03:58 but these pieces are not there initially
00:04:01 like they are in chess.
00:04:02 You actually need to decide to gather resources
00:04:05 to decide which pieces to build.
00:04:07 So in a way you’re starting almost with no pieces.
00:04:10 You start gathering resources in StarCraft.
00:04:13 There’s minerals and gas that you can gather.
00:04:16 And then you must decide how much do you wanna focus
00:04:19 for instance, on gathering more resources
00:04:21 or starting to build units or pieces.
00:04:24 And then once you have enough pieces
00:04:27 or maybe like attack, a good attack composition,
00:04:32 then you go and attack the other side of the map.
00:04:35 And now the other main difference with chess
00:04:37 is that you don’t see the other side of the map.
00:04:39 So you’re not seeing the moves of the enemy.
00:04:43 It’s what we call partially observable.
00:04:45 So as a result, you must not only decide
00:04:48 trading off economy versus building your own units,
00:04:52 but you also must decide whether you wanna scout
00:04:54 to gather information, but also by scouting,
00:04:57 you might be giving away some information
00:04:59 that you might be hiding from the enemy.
00:05:01 So there’s a lot of complex decision making
00:05:04 all in real time.
00:05:06 There’s also unlike chess, this is not a turn based game.
00:05:10 You play basically all the time continuously
00:05:13 and thus some skill in terms of speed
00:05:16 and accuracy of clicking is also very important.
00:05:18 And people that train for this really play this game
00:05:21 at an amazing skill level.
00:05:23 I’ve seen many times these
00:05:25 and if you can witness this live,
00:05:27 it’s really, really impressive.
00:05:29 So in a way, it’s kind of a chess
00:05:31 where you don’t see the other side of the board,
00:05:33 you’re building your own pieces
00:05:35 and you also need to gather resources
00:05:37 to basically get some money to build other buildings,
00:05:40 pieces, technology and so on.
00:05:42 From the perspective of a human player,
00:05:45 the difference between that and chess
00:05:47 or maybe that and a game like turn based strategy
00:05:50 like Heroes of Might and Magic is that there’s an anxiety
00:05:55 because you have to make these decisions really quickly.
00:05:58 And if you are not actually aware of what decisions work,
00:06:04 it’s a very stressful balance.
00:06:06 Everything you describe is actually quite stressful,
00:06:08 difficult to balance for an amateur human player.
00:06:11 I don’t know if it gets easier at the professional level,
00:06:14 like if they’re fully aware of what they have to do,
00:06:16 but at the amateur level, there’s this anxiety.
00:06:19 Oh crap, I’m being attacked.
00:06:20 Oh crap, I have to build up resource.
00:06:22 Oh, I have to probably expand.
00:06:24 And all these, the time,
00:06:26 the real time strategy aspect is really stressful
00:06:29 and computationally I’m sure difficult.
00:06:31 We’ll get into it.
00:06:32 But for me, Battle.net,
00:06:35 so StarCraft was released in 98, 20 years ago,
00:06:42 which is hard to believe.
00:06:44 And Blizzard’s Battle.net came out with Diablo in 96.
00:06:50 And to me, it might be a narrow perspective,
00:06:52 but it changed online gaming and perhaps society forever.
00:06:56 Yeah.
00:06:57 But I may have made way too narrow viewpoint,
00:07:00 but from your perspective,
00:07:02 can you talk about the history of gaming
00:07:05 over the past 20 years?
00:07:06 Is this, how transformational,
00:07:09 how important is this line of games?
00:07:12 Right, so I think I kind of was an active gamer
00:07:16 whilst this was developing, the internet, online gaming.
00:07:20 So for me, the way it came was I played other games,
00:07:24 strategy related, I played a bit of Command & Conquer,
00:07:27 and then I played Warcraft II, which is from Blizzard.
00:07:31 But at the time, I didn’t know,
00:07:32 I didn’t understand about what Blizzard was or anything.
00:07:35 Warcraft II was just a game,
00:07:36 which was actually very similar to StarCraft in many ways.
00:07:39 It’s also real time strategy game
00:07:41 where there’s orcs and humans, so there’s only two races.
00:07:44 But it was offline.
00:07:46 And it was offline, right?
00:07:47 So I remember a friend of mine came to school,
00:07:51 say, oh, there’s this new cool game called StarCraft.
00:07:53 And I just said, oh, this sounds like
00:07:54 just a copy of Warcraft II, until I kind of installed it.
00:07:59 And at the time, I am from Spain,
00:08:01 so we didn’t have very good internet, right?
00:08:04 So there was, for us,
00:08:05 StarCraft became first kind of an offline experience
00:08:09 where you kind of start to play these missions, right?
00:08:12 You play against some sort of scripted things
00:08:15 to develop the story of the characters in the game.
00:08:18 And then later on, I start playing against the built in AI,
00:08:23 and I thought it was impossible to defeat it.
00:08:25 Then eventually you defeat one
00:08:27 and you can actually play against seven built in AIs
00:08:29 at the same time, which also felt impossible.
00:08:32 But actually, it’s not that hard to beat
00:08:34 seven built in AIs at once.
00:08:36 So once we achieved that, also we discovered that
00:08:40 we could play, as I said, internet wasn’t that great,
00:08:43 but we could play with the LAN, right?
00:08:45 Like basically against each other
00:08:47 if we were in the same place
00:08:49 because you could just connect machines with like cables,
00:08:51 right?
00:08:53 So we started playing in LAN mode
00:08:55 and as a group of friends,
00:08:58 and it was really, really like much more entertaining
00:09:00 than playing against AIs.
00:09:02 And later on, as internet was starting to develop
00:09:05 and being a bit faster and more reliable,
00:09:07 then it’s when I started experiencing Battle.net,
00:09:09 which is this amazing universe,
00:09:11 not only because of the fact
00:09:13 that you can play the game against anyone in the world,
00:09:16 but you can also get to know more people.
00:09:20 You just get exposed to now like this vast variety of,
00:09:23 it’s kind of a bit when the chats came about, right?
00:09:25 There was a chat system.
00:09:27 You could play against people,
00:09:29 but you could also chat with people,
00:09:30 not only about StarCraft, but about anything.
00:09:32 And that became a way of life for kind of two years.
00:09:36 And obviously then it became like kind of,
00:09:38 it exploded in me in that I started to play more seriously,
00:09:42 going to tournaments and so on and so forth.
00:09:44 Do you have a sense on a societal, sociological level,
00:09:49 what’s this whole part of society
00:09:52 that many of us are not aware of
00:09:53 and it’s a huge part of society, which is gamers.
00:09:56 I mean, every time I come across that in YouTube
00:10:00 or streaming sites, I mean,
00:10:03 there’s a huge number of people who play games religiously.
00:10:07 Do you have a sense of those folks,
00:10:08 especially now that you’ve returned to that realm
00:10:10 a little bit on the AI side?
00:10:12 Yeah, so in fact, even after StarCraft,
00:10:15 I actually played World of Warcraft,
00:10:17 which is maybe the main sort of online worlds
00:10:21 or in presence that you get to interact
00:10:23 with lots of people.
00:10:24 So I played that for a little bit.
00:10:26 It was to me, it was a bit less stressful than StarCraft
00:10:29 because winning was kind of a given.
00:10:30 You just put in this world
00:10:32 and you can always complete missions.
00:10:34 But I think it was actually the social aspect
00:10:38 of especially StarCraft first
00:10:40 and then games like World of Warcraft
00:10:43 really shaped me in a very interesting ways
00:10:46 because what you get to experience
00:10:48 is just people you wouldn’t usually interact with, right?
00:10:51 So even nowadays, I still have many Facebook friends
00:10:54 from the area where I played online
00:10:56 and their ways of thinking is even political.
00:11:00 They just, we don’t live in,
00:11:01 like we don’t interact in the real world,
00:11:03 but we were connected by basically fiber.
00:11:06 And that way I actually get to understand a bit better
00:11:10 that we live in a diverse world.
00:11:12 And these were just connections that were made by,
00:11:15 because, you know, I happened to go in a city
00:11:18 in a virtual city as a priest and I met this warrior
00:11:22 and we became friends
00:11:23 and then we start like playing together, right?
00:11:25 So I think it’s transformative
00:11:28 and more and more and more people are more aware of it.
00:11:31 I mean, it’s becoming quite mainstream,
00:11:33 but back in the day, as you were saying in 2000, 2005,
00:11:37 even it was very, still very strange thing to do,
00:11:42 especially in Europe.
00:11:44 I think there were exceptions like Korea, for instance,
00:11:47 it was amazing that everything happened so early
00:11:50 in terms of cybercafes, like if you go to Seoul,
00:11:54 it’s a city that back in the day,
00:11:57 StarCraft was kind of,
00:11:58 you could be a celebrity by playing StarCraft,
00:12:00 but this was like 99, 2000, right?
00:12:03 It’s not like recently.
00:12:04 So yeah, it’s quite interesting to look back
00:12:08 and yeah, I think it’s changing society.
00:12:10 The same way, of course, like technology
00:12:13 and social networks and so on are also transforming things.
00:12:16 And a quick tangent, let me ask,
00:12:18 you’re also one of the most productive people
00:12:20 in your particular chosen passion and path in life.
00:12:26 And yet you also appreciate and enjoy video games.
00:12:29 Do you think it’s possible to do,
00:12:32 to enjoy video games in moderation?
00:12:35 Someone told me that you could choose two out of three.
00:12:39 When I was playing video games,
00:12:41 you could choose having a girlfriend,
00:12:43 playing video games or studying.
00:12:46 And I think for the most part, it was relatively true.
00:12:50 These things do take time.
00:12:52 Games like StarCraft,
00:12:53 if you take the game pretty seriously
00:12:55 and you wanna study it,
00:12:56 then you obviously will dedicate more time to it.
00:12:59 And I definitely took gaming
00:13:01 and obviously studying very seriously.
00:13:03 I love learning science and et cetera.
00:13:08 So to me, especially when I started university undergrad,
00:13:13 I kind of stepped away from StarCraft.
00:13:14 I actually fully stopped playing.
00:13:16 And then World of Warcraft was a bit more casual.
00:13:19 You could just connect online.
00:13:20 And I mean, it was fun.
00:13:22 But as I said, that was not as much time investment
00:13:26 as it was for me in StarCraft.
00:13:29 Okay, so let’s get into AlphaStar.
00:13:31 So you’re behind the team.
00:13:35 So DeepMind has been working on StarCraft
00:13:37 and released a bunch of cool open source agents
00:13:39 and so on over the past few years.
00:13:41 But AlphaStar really is the moment
00:13:43 where the first time you beat a world class player.
00:13:49 So what are the parameters of the challenge
00:13:51 in the way that AlphaStar took it on
00:13:53 and how did you and David
00:13:55 and the rest of the DeepMind team get into it?
00:13:58 Consider that you can even beat the best in the world
00:14:00 or top players.
00:14:02 I think it all started back in 2015.
00:14:08 Actually, I’m lying.
00:14:08 I think it was 2014 when DeepMind was acquired by Google.
00:14:14 And I at the time was at Google Brain,
00:14:15 which was in California, is still in California.
00:14:18 We had this summit where we got together, the two groups.
00:14:21 So Google Brain and Google DeepMind got together
00:14:24 and we gave a series of talks.
00:14:26 And given that they were doing
00:14:28 deep reinforcement learning for games,
00:14:30 I decided to bring up part of my past,
00:14:33 which I had developed at Berkeley,
00:14:35 like this thing which we call Berkeley OverMind,
00:14:37 which is really just a StarCraft one bot, right?
00:14:40 So I talked about that.
00:14:42 And I remember Demis just came to me and said,
00:14:44 well, maybe not now, it’s perhaps a bit too early,
00:14:47 but you should just come to DeepMind
00:14:48 and do this again with deep reinforcement learning, right?
00:14:53 And at the time it sounded very science fiction
00:14:56 for several reasons.
00:14:58 But then in 2016, when I actually moved to London
00:15:01 and joined DeepMind transferring from Brain,
00:15:04 it became apparent that because of the AlphaGo moment
00:15:08 and kind of Blizzard reaching out to us to say,
00:15:11 wait, like, do you want the next challenge?
00:15:13 And also me being full time at DeepMind,
00:15:15 so sort of kind of all these came together.
00:15:17 And then I went to Irvine in California,
00:15:20 to the Blizzard headquarters to just chat with them
00:15:23 and try to explain how would it all work
00:15:26 before you do anything.
00:15:27 And the approach has always been
00:15:30 about the learning perspective, right?
00:15:33 So in Berkeley, we did a lot of rule based conditioning
00:15:39 and if you have more than three units, then go attack.
00:15:42 And if the other has more units than me,
00:15:44 I retreat and so on and so forth.
00:15:46 And of course, the point of deep reinforcement learning,
00:15:48 deep learning, machine learning in general
00:15:50 is that all these should be learned behavior.
00:15:53 So that kind of was the DNA of the project
00:15:56 since its inception in 2016,
00:15:59 where we just didn’t even have an environment to work with.
00:16:02 And so that’s how it all started really.
00:16:05 So if you go back to that conversation with Demis
00:16:08 or even in your own head, how far away did you,
00:16:12 because we’re talking about Atari games,
00:16:14 we’re talking about Go, which is kind of,
00:16:16 if you’re honest about it, really far away from StarCraft.
00:16:20 In, well, now that you’ve beaten it,
00:16:22 maybe you could say it’s close,
00:16:23 but it’s much, it seems like StarCraft
00:16:25 is way harder than Go philosophically
00:16:29 and mathematically speaking.
00:16:30 So how far away did you think you were?
00:16:34 Did you think that in 2019 and ’18
00:16:36 you could be doing as well as you have?
00:16:37 Yeah, when I kind of thought about,
00:16:40 okay, I’m gonna dedicate a lot of my time
00:16:43 and focus on this.
00:16:44 And obviously I do a lot of different research
00:16:47 in deep learning.
00:16:48 So spending time on it, I mean,
00:16:50 I really had to kind of think
00:16:51 there’s gonna be something good happening out of this.
00:16:55 So really I thought, well, this sounds impossible.
00:16:58 And it probably is impossible to do the full thing,
00:17:01 like the full game where you play one versus one
00:17:06 and it’s only a neural network playing and so on.
00:17:09 So it really felt like,
00:17:10 I just didn’t even think it was possible.
00:17:13 But on the other hand,
00:17:14 I could see some stepping stones towards that goal.
00:17:18 Clearly you could define sub problems in StarCraft
00:17:21 and sort of dissect it a bit and say,
00:17:22 okay, here is a part of the game, here’s another part.
00:17:26 And also obviously the fact,
00:17:29 so this was really also critical to me,
00:17:31 the fact that we could access human replays, right?
00:17:34 So Blizzard was very kind.
00:17:35 And in fact, they open source these for the whole community
00:17:38 where you can just go
00:17:39 and it’s not every single StarCraft game ever played,
00:17:42 but it’s a lot of them you can just go and download.
00:17:45 And every day they will,
00:17:47 you can just query a data set and say,
00:17:48 well, give me all the games that were played today.
00:17:51 And given my kind of experience with language
00:17:55 and sequences and supervised learning,
00:17:57 I thought, well, that’s definitely gonna be very helpful
00:18:00 and something quite unique now,
00:18:02 because never before had we had such a large data set of replays,
00:18:08 of people playing the game at this scale
00:18:10 of such a complex video game, right?
00:18:12 So that to me was a precious resource.
00:18:15 And as soon as I knew that Blizzard
00:18:17 was able to kind of give this to the community,
00:18:20 I started to feel positive
00:18:22 about something non trivial happening.
00:18:24 But I also thought the full thing, like really no rules,
00:18:28 no single line of code that tries to say,
00:18:31 well, I mean, if you see this unit, build a detector,
00:18:33 all these, not having any of these specializations
00:18:36 seemed really, really, really difficult to me.
00:18:38 Intuitively.
00:18:39 I do also like that Blizzard was teasing
00:18:42 or even trolling you,
00:18:45 sort of almost, yeah, pulling you in
00:18:48 into this really difficult challenge.
00:18:50 Do they have any awareness?
00:18:51 What’s the interest from the perspective of Blizzard,
00:18:55 except just curiosity?
00:18:57 Yeah, I think Blizzard has really understood
00:18:59 and really bring forward this competitiveness
00:19:03 of esports in games.
00:19:04 The StarCraft really kind of sparked a lot of,
00:19:07 like something that almost was never seen,
00:19:10 especially as I was saying, back in Korea.
00:19:13 So they just probably thought,
00:19:16 well, this is such a pure one versus one setup
00:19:18 that it would be great to see
00:19:21 if something that can play Atari or Go
00:19:24 and then later on chess could even tackle
00:19:27 these kind of complex real time strategy game, right?
00:19:30 So for them, they wanted to see first,
00:19:33 obviously whether it was possible,
00:19:36 if the game they created was in a way solvable
00:19:39 to some extent.
00:19:40 And I think on the other hand,
00:19:42 they also are a pretty modern company that innovates a lot.
00:19:45 So just starting to understand AI for them
00:19:48 to how to bring AI into games
00:19:50 is not AI for games, but games for AI, right?
00:19:54 I mean, both ways I think can work.
00:19:56 And we obviously at DeepMind use games for AI, right?
00:20:00 To drive AI progress,
00:20:01 but Blizzard might actually be able to do
00:20:03 and many other companies to start to understand
00:20:06 and do the opposite.
00:20:06 So I think that is also something
00:20:08 they can get out of these.
00:20:09 And they definitely, we have brainstormed a lot
00:20:12 about these, right?
00:20:13 But one of the interesting things to me
00:20:15 about StarCraft and Diablo
00:20:17 and these games that Blizzard has created
00:20:19 is the task of balancing classes, for example.
00:20:23 Sort of making the game fair from the starting point
00:20:27 and then let skill determine the outcome.
00:20:30 Is there, I mean, can you first comment,
00:20:33 there’s three races, Zerg, Protoss and Terran.
00:20:36 I don’t know if I’ve ever said that out loud.
00:20:38 Is that how you pronounce it?
00:20:40 Terran?
00:20:40 Yeah, Terran.
00:20:41 Yeah.
00:20:44 Yeah, I don’t think I’ve ever in person interacted
00:20:46 with anybody about StarCraft, that’s funny.
00:20:49 So they seem to be pretty balanced.
00:20:51 I wonder if the AI, the work that you’re doing
00:20:56 with AlphaStar would help balance them even further.
00:20:59 Is that something you think about?
00:21:00 Is that something that Blizzard is thinking about?
00:21:03 Right, so balancing when you add a new unit
00:21:06 or a new spell type is obviously possible
00:21:09 given that you can always train or pre train at scale
00:21:13 some agent that might start using that in unintended ways.
00:21:16 But I think actually, if you understand
00:21:19 how StarCraft has kind of co evolved with players,
00:21:22 in a way, I think it’s actually very cool
00:21:24 the ways that many of the things and strategies
00:21:27 that people came up with, right?
00:21:28 So I think we’ve seen it over and over in StarCraft
00:21:32 that Blizzard comes up with maybe a new unit
00:21:35 and then some players get creative
00:21:37 and do something kind of unintentional
00:21:39 or something that Blizzard designers
00:21:40 that just simply didn’t test or think about.
00:21:43 And then after that becomes kind of mainstream
00:21:46 in the community, Blizzard patches the game
00:21:48 and then they kind of maybe weaken that strategy
00:21:51 or make it actually more interesting
00:21:53 but a bit more balanced.
00:21:55 So this kind of continual dialogue between players
00:21:57 and Blizzard is kind of what has defined them,
00:22:01 actually, in most of their games, in StarCraft
00:22:04 but also in World of Warcraft, they would do that.
00:22:06 There are several classes and it would be not good
00:22:09 that everyone plays absolutely the same race and so on, right?
00:22:13 So I think they do care about balancing of course
00:22:17 and they do a fair amount of testing
00:22:19 but it’s also beautiful to also see
00:22:22 how players get creative anyways.
00:22:24 And I mean, whether AI can be more creative at this point,
00:22:27 I don’t think so, right?
00:22:28 I mean, it’s just sometimes something so amazing happens.
00:22:31 Like I remember back in the days,
00:22:33 like you have these dropships that could drop the Reavers
00:22:36 and that was actually not thought about
00:22:39 that you could drop this unit
00:22:41 that has this what’s called splash damage
00:22:43 that would basically eliminate
00:22:45 all the enemy’s workers at once.
00:22:47 No one thought that you could actually put them
00:22:50 in really early game, do that kind of damage
00:22:53 and then things change in the game.
00:22:55 But I don’t know, I think it’s quite an amazing
00:22:58 exploration process from both sides,
00:23:00 players and Blizzard alike.
00:23:01 Well, it’s almost like a reinforcement learning exploration
00:23:05 but the scale of humans that play Blizzard games
00:23:11 is almost on the scale of a large scale
00:23:13 deep mind RL experiment.
00:23:15 I mean, if you look at the numbers,
00:23:17 I mean, you’re talking about, I don’t know how many games
00:23:19 but hundreds of thousands of games probably a month.
00:23:22 Yeah.
00:23:22 I mean, so it’s almost the same as running RL agents.
00:23:28 What aspect of the problem of Starcraft
00:23:31 do you think is the hardest?
00:23:32 Is it the, like you said, the imperfect information?
00:23:35 Is it the fact they have to do longterm planning?
00:23:38 Is it the real time aspects?
00:23:40 We have to do stuff really quickly.
00:23:42 Is it the fact that a large action space
00:23:44 so you can do so many possible things?
00:23:47 Or is it, you know, in the game theoretic sense
00:23:51 there is no Nash equilibrium
00:23:52 or at least you don’t know what the optimal strategy is
00:23:54 because there’s way too many options.
00:23:56 Right.
00:23:57 Is there something that stands out as just like the hardest
00:23:59 the most annoying thing?
00:24:01 So when we sort of looked at the problem
00:24:04 and start to define like the parameters of it, right?
00:24:07 What are the observations?
00:24:08 What are the actions?
00:24:10 It became very apparent that, you know,
00:24:13 the very first barrier that one would hit in Starcraft
00:24:17 would be because of the action space being so large
00:24:20 and as not being able to search like you could in chess
00:24:24 or go even though the search space is vast.
00:24:28 The main problem that we identified
00:24:30 was that of exploration, right?
00:24:32 So without any sort of human knowledge or human prior,
00:24:36 if you think about Starcraft
00:24:38 and you know how deep reinforcement learning algorithms
00:24:40 work, which is essentially by issuing random actions
00:24:45 and hoping that they will get some wins sometimes
00:24:47 so they could learn.
00:24:49 So if you think of the action space in Starcraft
00:24:52 almost anything you can do in the early game is bad
00:24:55 because any action involves taking workers
00:24:58 which are mining minerals for free.
00:25:01 That’s something that the game does automatically
00:25:03 sends them to mine.
00:25:04 And you would immediately just take them out of mining
00:25:07 and send them around.
00:25:09 So just thinking how is it gonna be possible
00:25:13 to get to understand these concepts
00:25:16 but even more like expanding, right?
00:25:19 There’s these buildings you can place
00:25:21 in other locations in the map to gather more resources
00:25:24 but the location of the building is important
00:25:26 and you have to select a worker,
00:25:28 send it walking to that location, build the building,
00:25:32 wait for the building to be built
00:25:34 and then put extra workers there so they start mining.
00:25:37 That feels like impossible if you just randomly click
00:25:41 to produce that state, desirable state
00:25:44 that then you could hope to learn from
00:25:46 because eventually that may yield an extra win, right?
00:25:49 So for me, the exploration problem
00:25:51 and due to the action space
00:25:53 and the fact that there’s not really turns,
00:25:56 there are so many turns because the game essentially
00:25:59 steps 22 times per second.
00:26:02 I mean, that’s how they could discretize sort of time.
00:26:05 Obviously you always have to discretize time
00:26:07 but there’s no such thing as real time
00:26:09 but it’s really a lot of time steps
00:26:12 of things that could go wrong.
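The scale of this exploration problem can be made concrete with a toy back-of-the-envelope calculation. All the numbers below are invented for illustration (the real StarCraft action space is structured, not a flat list), but they show why purely random clicking essentially never produces a useful multi-step build order:

```python
# Toy illustration of the exploration problem: assumed, made-up numbers.
GAME_STEPS_PER_SEC = 22     # the discretization rate mentioned above
N_ACTIONS = 10_000          # assumed size of a flattened action space
REQUIRED_STEPS = 5          # e.g. select worker, move, place building, ...

# Probability of issuing 5 specific actions in the right order,
# picking uniformly at random at each decision point:
p_sequence = (1 / N_ACTIONS) ** REQUIRED_STEPS
print(f"P(correct 5-step sequence) = {p_sequence:.1e}")  # 1.0e-20

# Expected time to stumble on that sequence once, even when
# deciding 22 times per second:
expected_seconds = (1 / p_sequence) * (REQUIRED_STEPS / GAME_STEPS_PER_SEC)
expected_years = expected_seconds / (3600 * 24 * 365)
print(f"Expected wait: about {expected_years:.0e} years")
```

The point of the sketch is only the order of magnitude: without human replays or some prior shaping the search, the reward signal for a behavior like expanding is effectively unreachable by random actions.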
00:26:14 And that definitely felt a priori like the hardest.
00:26:17 You mentioned many good ones.
00:26:19 I think partial observability
00:26:21 and the fact that there is no perfect strategy
00:26:23 because of the partial observability.
00:26:25 Those are very interesting problems.
00:26:26 We start seeing more and more now
00:26:28 in terms of as we solve the previous ones
00:26:31 but the core problem to me was exploration
00:26:34 and solving it has been basically kind of the focus
00:26:37 and how we saw the first breakthroughs.
00:26:39 So exploration in a multi hierarchical way.
00:26:43 So like 22 times a second exploration
00:26:46 has a very different meaning than it does
00:26:48 in terms of should I gather resources early
00:26:51 or should I wait or so on.
00:26:53 So how do you solve the longterm?
00:26:56 Let’s talk about the internals of AlphaStar.
00:26:58 So first of all, how do you represent the state
00:27:02 of the game as an input?
00:27:05 How do you then do the longterm sequence modeling?
00:27:08 How do you build a policy?
00:27:10 What’s the architecture like?
00:27:12 So AlphaStar has obviously several components
00:27:16 but everything passes through what we call the policy
00:27:20 which is a neural network.
00:27:22 And that’s kind of the beauty of it.
00:27:24 There is, I could just now give you a neural network
00:27:27 and some weights.
00:27:28 And if you fed the right observations
00:27:30 and you understood the actions the same way we do
00:27:32 you would have basically the agent playing the game.
00:27:35 There’s absolutely nothing else needed
00:27:37 other than those weights that were trained.
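The claim that the agent is "just a neural network and some weights" can be sketched minimally in NumPy. The shapes and the single hidden layer here are made up; the real AlphaStar policy is vastly larger and consumes structured spatial and unit observations, but the contract is the same: fixed weights map an observation to a distribution over actions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for "a neural network and some weights": one hidden layer.
# Dimensions are invented for illustration only.
OBS_DIM, HIDDEN, N_ACTIONS = 32, 64, 10
weights = {
    "w1": rng.normal(scale=0.1, size=(OBS_DIM, HIDDEN)),
    "w2": rng.normal(scale=0.1, size=(HIDDEN, N_ACTIONS)),
}

def policy(obs, w):
    """Map one observation to a probability distribution over actions."""
    hidden = np.tanh(obs @ w["w1"])
    logits = hidden @ w["w2"]
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()

obs = rng.normal(size=OBS_DIM)            # a fake game observation
probs = policy(obs, weights)
action = int(probs.argmax())              # act greedily, for the sketch
assert probs.shape == (N_ACTIONS,) and abs(probs.sum() - 1.0) < 1e-9
```

Playing the game is then just the loop observe → policy → act, repeated; everything interesting lives in how the weights were trained.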
00:27:40 Now, the first step is observing the game
00:27:43 and we’ve experimented with a few alternatives.
00:27:46 The one that we currently use mixes both spatial
00:27:50 sort of images that you would process from the game
00:27:53 that is the zoomed out version of the map
00:27:56 and also a zoomed in version of the camera
00:27:58 or the screen as we call it.
00:28:00 But also we give to the agent the list of units
00:28:04 that it sees, more as a set of objects
00:28:09 that it can operate on.
00:28:11 It’s not strictly necessary to use this.
00:28:14 And we have versions of the agent that play well
00:28:16 without this set view, which is a bit unlike
00:28:19 how humans perceive the game.
00:28:21 But it certainly helps a lot
00:28:23 because it’s a very natural way to encode the game
00:28:26 is by just looking at all the units that there are.
00:28:29 They have properties like health, position, type of unit
00:28:33 whether it’s my unit or the enemies.
00:28:36 And that sort of is kind of the summary
00:28:40 of the state of the game,
00:28:43 that list of units or set of units
00:28:45 that you see all the time.
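The unit-set observation he describes can be sketched in a few lines. This is a toy illustration, not AlphaStar's actual feature schema: the field names, feature order, and padding size are assumptions made up for clarity.

```python
def encode_unit(unit):
    """Turn one unit into a flat numeric feature vector:
    position, health, type, and ownership (mine vs. enemy)."""
    return [
        float(unit["x"]), float(unit["y"]),        # map position
        float(unit["health"]),                     # remaining hit points
        float(unit["type_id"]),                    # unit type as a numeric id
        1.0 if unit["owner"] == "self" else 0.0,   # ownership flag
    ]

def encode_state(units, max_units=8, feature_dim=5):
    """Encode a variable-sized set of units as a fixed-size padded list,
    so a neural network can consume it as one tensor."""
    vectors = [encode_unit(u) for u in units[:max_units]]
    while len(vectors) < max_units:
        vectors.append([0.0] * feature_dim)        # zero rows for absent units
    return vectors

units = [
    {"x": 10, "y": 20, "health": 45, "type_id": 3, "owner": "self"},
    {"x": 55, "y": 60, "health": 80, "type_id": 7, "owner": "enemy"},
]
state = encode_state(units)
```

The point is only that the state is a set of per-unit feature vectors rather than a rendered image.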
00:28:47 But that’s pretty close to the way humans see the game.
00:28:49 Why do you say it’s not, isn’t that,
00:28:51 you’re saying the exactness of it is not similar to humans?
00:28:55 The exactness of it is perhaps not the problem.
00:28:57 I guess maybe the problem if you look at it
00:28:59 from how actually humans play the game
00:29:02 is that they play with a mouse and a keyboard and a screen
00:29:05 and they don’t see sort of a structured object
00:29:08 with all the units.
00:29:09 What they see is what they see on the screen, right?
00:29:12 So.
00:29:13 Remember that there’s a, sorry to interrupt,
00:29:14 there’s a plot that you showed with camera base
00:29:16 where you do exactly that, right?
00:29:18 You move around and that seems to converge
00:29:21 to similar performance.
00:29:22 Yeah, I think that’s what I,
00:29:23 we’re kind of experimenting with what’s necessary or not,
00:29:26 but using the set.
00:29:28 So, actually, if you look at research in computer vision,
00:29:32 where it makes a lot of sense to treat images
00:29:35 as two dimensional arrays,
00:29:38 there’s actually a very nice paper from Facebook.
00:29:40 I think, I forgot who the authors are,
00:29:42 but I think it’s part of Kaiming’s group.
00:29:46 And what they do is they take an image,
00:29:49 which is this two dimensional signal,
00:29:51 and they actually take pixel by pixel
00:29:54 and scramble the image as if it was just a list of pixels.
00:29:59 Crucially, they encode the position of the pixels
00:30:01 with the X, Y coordinates.
00:30:03 And this is just kind of a new architecture,
00:30:06 which we incidentally also use in StarCraft
00:30:08 called the Transformer,
00:30:09 which is a very popular paper from last year,
00:30:11 which yielded very nice results in machine translation.
00:30:15 And if you actually believe in this kind of,
00:30:18 oh, it’s actually a set of pixels,
00:30:20 as long as you encode X, Y, it’s okay,
00:30:22 then you could argue that the list of units that we see
00:30:26 is precisely that,
00:30:26 because we have each unit as a kind of pixel, if you will,
00:30:31 and then their X, Y coordinates.
00:30:33 So in that perspective, we, without knowing it,
00:30:36 we use the same architecture that was shown
00:30:38 to work very well on Pascal and ImageNet and so on.
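The "image as a set of pixels with x,y attached" idea can be made concrete with a tiny self-attention sketch. This is a minimal NumPy toy (identity projections, one head, made-up inputs), not the paper's or AlphaStar's code; it just shows the key property that the output of attention over a set is the same regardless of the order you feed the elements in.

```python
import numpy as np

def softmax(z):
    """Numerically stable row-wise softmax."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(entities):
    """Single-head self-attention with identity Q/K/V projections,
    kept deliberately bare for clarity."""
    scores = entities @ entities.T / np.sqrt(entities.shape[1])
    return softmax(scores) @ entities

# Three "pixels"/units, each carrying (feature, x, y).
ents = np.array([[1.0, 0.0, 0.0],
                 [0.5, 1.0, 0.0],
                 [0.2, 0.0, 1.0]])
out = self_attention(ents)

# Scrambling the set order just scrambles the outputs the same way:
perm = [2, 0, 1]
out_perm = self_attention(ents[perm])
```

Because position lives inside each element's features rather than in array order, the model treats units (or pixels) as a genuine set.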
00:30:41 So the interesting thing here is putting it in that way
00:30:45 it starts to move it towards
00:30:46 the way you usually work with language.
00:30:49 So what, and especially with your expertise
00:30:52 and work in language,
00:30:55 it seems like there’s echoes of a lot of
00:30:58 the way you would work with natural language
00:31:00 in the way you’ve approached AlphaStar.
00:31:02 Right.
00:31:03 What’s, does that help
00:31:05 with the longterm sequence modeling there somehow?
00:31:08 Exactly, so now that we understand
00:31:10 what an observation for a given time step is,
00:31:13 we need to move on to say,
00:31:14 well, there’s going to be a sequence of such observations
00:31:17 and an agent will need to, given all that it’s seen,
00:31:21 not only the current time step, but all that it’s seen, why?
00:31:24 Because there is partial observability.
00:31:25 We must remember whether we saw a worker going somewhere,
00:31:29 for instance, right?
00:31:30 Because then there might be an expansion
00:31:31 on the top right of the map.
00:31:33 So given that, what you must then think about is
00:31:37 there is the problem of given all the observations,
00:31:40 you have to predict the next action.
00:31:42 And not only given all the observations,
00:31:44 but given all the observations
00:31:45 and given all the actions you’ve taken,
00:31:47 predict the next action.
00:31:49 And that sounds exactly like machine translation where,
00:31:53 and that’s exactly how kind of I saw the problem,
00:31:57 especially when you are given supervised data
00:31:59 or replays from humans,
00:32:01 because the problem is exactly the same.
00:32:03 You’re translating essentially a prefix of observations
00:32:07 and actions onto what’s going to happen next,
00:32:10 which is exactly how you would train a model to translate
00:32:12 or to generate language as well, right?
00:32:14 Do you have a certain prefix?
00:32:16 You must remember everything that comes in the past
00:31:18 because otherwise you might start having incoherent text.
00:32:22 And the same architectures we’re using LSTMs
00:32:26 and transformers to operate on across time
00:32:29 to kind of integrate all that’s happened in the past.
00:32:33 Those architectures that work so well in translation
00:32:35 or language modeling are exactly the same
00:32:38 than what the agent is using to issue actions in the game.
00:32:42 And the way we train it, moreover, for imitation,
00:32:44 which is step one of AlphaStar is,
00:32:47 take all the human experience and try to imitate it,
00:32:49 much like you try to imitate translators
00:32:52 that translated many pairs of sentences
00:32:55 from French to English say,
00:32:57 that sort of principle applies exactly the same.
00:33:00 It’s almost the same code, except that instead of words,
00:33:04 you have a slightly more complicated objects,
00:33:06 which are the observations and the actions
00:33:08 are also a bit more complicated than a word.
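The translation analogy he draws maps directly onto how a replay becomes training data: each time step yields one "predict the next action given everything seen and done so far" example. A plain-Python sketch, with made-up replay contents:

```python
def replay_to_examples(observations, actions):
    """Turn one replay into (history prefix -> next action) pairs,
    exactly like next-token prediction in language modeling."""
    examples, prefix = [], []
    for obs, act in zip(observations, actions):
        prefix.append(("obs", obs))
        # Predict this action from all observations AND actions so far.
        examples.append((tuple(prefix), act))
        prefix.append(("act", act))
    return examples

pairs = replay_to_examples(
    ["frame0", "frame1", "frame2"],
    ["build_worker", "move_camera", "attack"],
)
```

A sequence model (LSTM or transformer) trained on these pairs with cross-entropy is the imitation step described above, with observations and actions standing in for words.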
00:33:11 Is there a self play component then too?
00:33:13 So once you run out of imitation?
00:33:16 Right, so indeed you can bootstrap from human replays,
00:33:22 but then the agents you get are actually not as good
00:33:25 as the humans you imitated, right?
00:33:28 So how do we imitate?
00:33:30 Well, we take humans from 3000 MMR and higher.
00:33:34 3000 MMR is just a metric of human skill
00:33:37 and 3000 MMR might be like the 50th percentile, right?
00:33:41 So it’s just average human.
00:33:43 What’s that?
00:33:44 So maybe quick pause, MMR is a ranking scale,
00:33:47 the matchmaking rating for players.
00:33:50 So it’s 3000, I remember there’s like a master
00:33:52 and a grand master, what’s 3000?
00:33:54 So 3000 is pretty bad.
00:33:56 I think it’s kind of Gold level.
00:33:58 It just sounds really good relative to chess, I think.
00:34:00 Oh yeah, yeah, no, the ratings,
00:34:02 the best in the world are at 7,000 MMR.
00:34:05 So 3000, it’s a bit like Elo indeed, right?
00:34:07 So 3,500 just allows us to not filter a lot of the data.
00:34:13 So we like to have a lot of data in deep learning
00:34:15 as you probably know.
00:34:17 So we take these kind of 3,500 and above,
00:34:20 but then we do a very interesting trick,
00:34:22 which is we tell the neural network
00:34:25 what level they are imitating.
00:34:27 So we say, this replay you’re gonna try to imitate
00:34:30 to predict the next action for all the actions
00:34:33 that you’re gonna see is a 4,000 MMR replay.
00:34:36 This one is a 6,000 MMR replay.
00:34:38 And what’s cool about this is then we take this policy
00:34:42 that is being trained from human,
00:34:44 and then we can ask it to play like a 3000 MMR player
00:34:47 by setting a bit, saying, well, okay,
00:34:49 play like a 3000 MMR player
00:34:51 or play like a 6,000 MMR player.
00:34:53 And you actually see how the policy behaves differently.
00:34:57 It gets worse economy if you play like a Gold level player,
00:35:01 it does less actions per minute,
00:35:03 which is the number of clicks or number of actions
00:35:05 that you will issue in a whole minute.
00:35:07 And it’s very interesting to see
00:35:09 that it kind of imitates the skill level quite well.
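The conditioning trick he describes — telling the network what MMR the replay came from, then dialing that input at test time — can be sketched as an extra one-hot feature appended to the observation. The bucket edges and encoding below are illustrative assumptions, not AlphaStar's actual scheme:

```python
def mmr_to_bucket(mmr, edges=(3500, 4500, 5500)):
    """Map an MMR rating to a coarse skill bucket, returned one-hot.
    Edges are hypothetical cutoffs chosen for this example."""
    bucket = sum(mmr >= e for e in edges)   # 0 .. len(edges)
    one_hot = [0.0] * (len(edges) + 1)
    one_hot[bucket] = 1.0
    return one_hot

def conditioned_input(obs_features, mmr):
    """Concatenate observation features with the skill conditioning,
    so the same policy weights can imitate different skill levels."""
    return list(obs_features) + mmr_to_bucket(mmr)

x = conditioned_input([0.1, 0.2], 6000)
```

At inference you simply feed the bucket you want ("play like a 6,000 MMR player") instead of the replay's true rating.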
00:35:12 But if we ask it to play like a 6,000 MMR player,
00:35:15 we tested, of course, these policies to see how well they do.
00:35:18 They actually beat all the built in AIs
00:35:20 that Blizzard put in the game,
00:35:22 but they’re nowhere near 6,000 MMR players, right?
00:35:25 They might be maybe around Gold level, Platinum, perhaps.
00:35:29 So there’s still a lot of work to be done for the policy
00:35:32 to truly understand what it means to win.
00:35:35 So far, we only asked them, okay, here is the screen.
00:35:38 And that’s what’s happened on the game until this point.
00:35:41 What would the next action be if we ask a pro to now say,
00:35:46 oh, you’re gonna click here or here or there.
00:35:49 And the point is experiencing wins and losses
00:35:53 is very important to then start to refine.
00:35:56 Otherwise the policy can get loose,
00:35:58 can just go off policy as we call it.
00:36:00 That’s so interesting that you can at least hope eventually
00:36:03 to be able to control a policy
00:36:06 approximately to be at some MMR level.
00:36:10 That’s so interesting, especially given that you have
00:36:12 ground truth for a lot of these cases.
00:36:15 Can I ask you a personal question?
00:36:17 What’s your MMR?
00:36:19 Well, I haven’t played StarCraft II, so I am unranked,
00:36:23 which is the kind of lowest league.
00:36:26 So I used to play StarCraft, the first one.
00:36:29 But you haven’t seriously played StarCraft II.
00:36:32 So the best player we have at DeepMind is about 5,000 MMR,
00:36:37 which is high masters.
00:36:39 It’s not at grand master level.
00:36:42 Grand master level will be the top 200 players
00:36:44 in a certain region like Europe or America or Asia.
00:36:49 But for me, it would be hard to say.
00:36:51 I am very bad at the game.
00:36:53 I actually played AlphaStar a bit too late and it beat me.
00:36:56 I remember the whole team was, oh, Oriol, you should play.
00:36:59 And I was, oh, it looks like it’s not so good yet.
00:37:02 And then I remember I kind of got busy
00:37:04 and waited an extra week and I played
00:37:07 and it really beat me very badly.
00:37:09 Was that, I mean, how did that feel?
00:37:11 Isn’t that an amazing feeling?
00:37:12 That’s amazing, yeah.
00:37:13 I mean, obviously I tried my best
00:37:16 and I tried to also impress my,
00:37:18 because I actually played the first game.
00:37:19 So I’m still pretty good at micromanagement.
00:37:23 The problem is I just don’t understand StarCraft II.
00:37:25 I understand StarCraft.
00:37:27 And when I played StarCraft,
00:37:28 I probably was consistently like for a couple of years,
00:37:32 top 32 in Europe.
00:37:34 So I was decent, but at the time we didn’t have
00:37:37 this kind of MMR system as well established.
00:37:40 So it would be hard to know what it was back then.
00:37:43 So what’s the difference in interface
00:37:44 between AlphaStar and StarCraft
00:37:47 and a human player in StarCraft?
00:37:49 Is there any significant differences
00:37:52 between the way they both see the game?
00:37:54 I would say the way they see the game,
00:37:56 there’s a few things that are just very hard to simulate.
00:38:01 The main one perhaps, which is obvious in hindsight
00:38:05 is what’s called cloaked units, which are invisible units.
00:38:10 So in StarCraft, you can make some units
00:38:13 that require a particular kind of unit
00:38:16 to detect them.
00:38:18 So these units are invisible.
00:38:20 If you cannot detect them, you cannot target them.
00:38:22 So they would just destroy your buildings
00:38:25 or kill your workers.
00:38:27 But despite the fact you cannot target the unit,
00:38:31 there’s a shimmer that as a human you observe.
00:38:34 I mean, you need to train a little bit,
00:38:35 you need to pay attention,
00:38:37 but you would see this kind of space time distortion
00:38:41 and you would know, okay, there are, yeah.
00:38:44 Yeah, there’s like a wave thing.
00:38:46 Yeah, it’s called shimmer.
00:38:47 Space time distortion, I like it.
00:38:49 That’s really like, the Blizzard term is shimmer.
00:38:51 Shimmer, okay.
00:38:52 And so these shimmer professional players
00:38:55 actually can see it immediately.
00:38:57 They understand it very well,
00:38:59 but it’s still something that requires
00:39:01 certain amount of attention
00:39:02 and it’s kind of a bit annoying to deal with.
00:39:05 Whereas for AlphaStar, in terms of vision,
00:39:08 it’s very hard for us to simulate sort of,
00:39:11 oh, are you looking at this pixel in the screen and so on?
00:39:14 So the only thing we can do is,
00:39:17 there is a unit that’s invisible over there.
00:39:19 So AlphaStar would know that immediately.
00:39:22 Obviously still obeys the rules.
00:39:24 You cannot attack the unit.
00:39:25 You must have a detector and so on,
00:39:27 but it’s kind of one of the main things
00:39:29 that it just doesn’t feel there’s a very proper way.
00:39:32 I mean, you could imagine, oh, you don’t have hypers.
00:39:35 Maybe you don’t know exactly where it is,
00:39:37 or sometimes you see it, sometimes you don’t,
00:39:39 but it’s just really, really complicated to get it
00:39:43 so that everyone would agree,
00:39:44 oh, that’s the best way to simulate this, right?
00:39:47 It seems like a perception problem.
00:39:49 It is a perception problem.
00:39:50 So the only problem is people, you ask,
00:39:54 oh, what’s the difference between
00:39:55 how humans perceive the game?
00:39:56 I would say they wouldn’t be able to tell a shimmer
00:39:59 immediately as it appears on the screen,
00:40:02 whereas AlphaStar in principle sees it very sharply, right?
00:40:05 It sees that the bit turned from zero to one,
00:40:08 meaning there’s now a unit there,
00:40:10 although you don’t know the unit,
00:40:11 or you know that you cannot attack it and so on.
00:40:15 So that from a vision standpoint,
00:40:18 that probably is the one that is kind of the most obvious one.
00:40:22 Then there are things humans cannot do perfectly,
00:40:25 even professionals, which is they might miss a detail,
00:40:28 or they might have not seen a unit.
00:40:30 And obviously as a computer,
00:40:32 if there’s a corner of the screen that turns green
00:40:35 because a unit enters the field of view,
00:40:37 that can go into the memory of the agent, the LSTM,
00:40:41 and persist there for a while,
00:40:42 and for however long is relevant, right?
00:40:45 And in terms of action,
00:40:47 it seems like the rate of action from AlphaStar
00:40:50 is comparable to, if not slower than, professional players,
00:40:54 but it’s more precise, is what I read.
00:40:57 So that’s really probably the one that is causing us
00:41:01 more issues for a couple of reasons, right?
00:41:05 The first one is StarCraft has been an AI environment
00:41:08 for quite a few years.
00:41:09 In fact, I mean, I was participating
00:41:12 in the very first competition back in 2010.
00:41:15 And there’s really not been a kind of a very clear set
00:41:19 of rules for the actions per minute,
00:41:22 the rate of actions that you can issue.
00:41:24 And as a result, these agents or bots that people build
00:41:29 in a kind of almost very cool way,
00:41:31 they do like 20,000, 40,000 actions per minute.
00:41:35 Now, to put this in perspective,
00:41:37 a very good professional human
00:41:39 might do 300 to 800 actions per minute.
00:41:44 They might not be as precise.
00:41:45 That’s why the range is a bit tricky to identify exactly.
00:41:49 I mean, 300 actions per minute precisely
00:41:51 is probably realistic.
00:41:53 800 is probably not, but you see humans doing a lot of actions
00:41:56 because they warm up and they kind of select things
00:41:59 and spam and so on just so that when they need,
00:42:02 they have the accuracy.
00:42:04 So we came into this by not having kind of a standard way
00:42:09 to say, well, how do we measure whether an agent is
00:42:13 at human level or not?
00:42:15 On the other hand, we had a huge advantage,
00:42:18 which is because we do imitation learning,
00:42:21 agents turned out to act like humans
00:42:24 in terms of rate of actions, even
00:42:26 precisions and imprecisions of actions
00:42:28 in the supervised policy.
00:42:30 You could see all these.
00:42:31 You could see how agents like to spam click, to move here.
00:42:34 If you played especially Diablo, you would know what I mean.
00:42:37 I mean, you just like spam, oh, move here, move here,
00:42:39 move here.
00:42:40 You’re doing literally like maybe five actions
00:42:43 in two seconds, but these actions are not
00:42:45 very meaningful.
00:42:46 One would have sufficed.
00:42:48 So on the one hand, we start from this imitation policy
00:42:52 that is at the ballpark of the actions per minutes of humans
00:42:55 because it’s actually statistically
00:42:57 trying to imitate humans.
00:42:58 So we see these very nicely in the curves
00:43:01 that we showed in the blog post.
00:43:02 There’s these actions per minute,
00:43:04 and the distribution looks very human like.
00:43:07 But then, of course, as self play kicks in,
00:43:10 and that’s the part we haven’t talked too much yet,
00:43:13 but of course, the agent must play against itself to improve,
00:43:17 then there’s almost no guarantees
00:43:19 that these actions will not become more precise
00:43:22 or even the rate of actions is going to increase over time.
00:43:26 So what we did, and this is probably
00:43:29 the first attempt that we thought was reasonable,
00:43:31 is we looked at the distribution of actions
00:43:33 for humans for certain windows of time.
00:43:36 And just to give a perspective, because I guess I mentioned
00:43:39 that some of these agents that are programmatic,
00:43:41 let’s call them.
00:43:42 They do 40,000 actions per minute.
00:43:44 Professionals, as I said, do 300 to 800.
00:43:47 So what we looked is we look at the distribution
00:43:49 over professional gamers, and we took reasonably high actions
00:43:53 per minute, but we kind of identify certain cutoffs
00:43:57 after which, even if the agent wanted to act,
00:44:00 these actions would be dropped.
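The cutoff he describes — actions beyond a per-window limit are simply dropped — is essentially a sliding-window rate limiter. A minimal sketch; the window length and cap below are made-up numbers, not AlphaStar's published limits:

```python
from collections import deque

class ApmLimiter:
    """Drop actions that exceed a cap within a sliding time window."""

    def __init__(self, max_actions=3, window=10.0):
        self.max_actions = max_actions
        self.window = window      # window length in seconds
        self.times = deque()      # timestamps of recently accepted actions

    def try_act(self, t):
        """Return True if an action at time t is allowed, else drop it."""
        # Forget accepted actions that have fallen out of the window.
        while self.times and t - self.times[0] >= self.window:
            self.times.popleft()
        if len(self.times) < self.max_actions:
            self.times.append(t)
            return True
        return False

limiter = ApmLimiter(max_actions=3, window=10.0)
results = [limiter.try_act(t) for t in [0, 1, 2, 3, 11]]
```

With a cap of 3 actions per 10 seconds, the fourth rapid action is dropped, and capacity frees up again once old actions age out of the window.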
00:44:02 But the problem is this cutoff is probably set a bit too high.
00:44:07 And what ends up happening, even though the games,
00:44:10 and when we ask the professionals and the gamers,
00:44:12 by and large, they feel like it’s playing humanlike,
00:44:15 there are some agents that developed maybe slightly
00:44:20 too high APMs, which is actions per minute,
00:44:24 combined with the precision, which
00:44:27 made people start discussing a very interesting issue, which
00:44:30 is, should we have limited these?
00:44:32 Should we just let it lose and see what cool things
00:44:35 it can come up with?
00:44:37 Right?
00:44:37 Interesting.
00:44:38 So this is in itself an extremely interesting
00:44:41 question, but the same way that modeling the shimmer
00:44:44 would be so difficult, modeling absolutely all the details
00:44:47 about muscles and precision and tiredness of humans
00:44:51 would be quite difficult.
00:44:52 So we’re really here kind of innovating
00:44:56 in this sense of, OK, what could be maybe
00:44:58 the next iteration of putting more rules that
00:45:02 makes the agents more humanlike in terms of restrictions?
00:45:06 Yeah, putting constraints that.
00:45:08 More constraints, yeah.
00:45:09 That’s really interesting.
00:45:10 That’s really innovative.
00:45:11 So one of the constraints you put on yourself,
00:45:15 or at least focused in, is on the Protoss race,
00:45:18 as far as I understand.
00:45:19 Can you tell me about the different races
00:45:21 and how they, so Protoss, Terran, and Zerg,
00:45:26 how do they compare?
00:45:27 How do they interact?
00:45:28 Why did you choose Protoss?
00:45:30 Yeah, in the dynamics of the game seen
00:45:34 from a strategic perspective.
00:45:35 So Protoss, so in StarCraft there are three races.
00:45:39 Indeed, in the demonstration, we saw only the Protoss race.
00:45:43 So maybe let’s start with that one.
00:45:45 Protoss is kind of the most technologically advanced race.
00:45:49 It has units that are expensive but powerful.
00:45:53 So in general, you want to kind of conserve your units
00:45:57 as you go attack.
00:45:59 And then you want to utilize these tactical advantages
00:46:03 of very fancy spells and so on and so forth.
00:46:07 And at the same time, they’re kind of,
00:46:11 people say they’re a bit easier to play perhaps.
00:46:15 But that I actually didn’t know.
00:46:17 I mean, I just talked now a lot to the players
00:46:20 that we work with, TLO and Mana, and they said, oh yeah,
00:46:23 Protoss is actually, people think,
00:46:24 is actually one of the easiest races.
00:46:26 So perhaps the easier, that doesn’t
00:46:28 mean that it’s obviously professional players
00:46:32 excel at the three races.
00:46:34 And there’s never a race that dominates
00:46:37 for a very long time anyway.
00:46:38 So if you look at the top, I don’t know, 100 in the world,
00:46:41 is there one race that dominates that list?
00:46:44 It would be hard to know because it depends on the regions.
00:46:46 I think it’s pretty equal in terms of distribution.
00:46:50 And Blizzard wants it to be equal.
00:46:53 They wouldn’t want one race like Protoss
00:46:56 to not be representative in the top place.
00:46:59 So definitely, they tried it to be balanced.
00:47:03 So then maybe the opposite race of Protoss is Zerg.
00:47:07 Zerg is a race where you just kind of expand and take over
00:47:11 as many resources as you can, and they
00:47:14 have a very high capacity to regenerate their units.
00:47:17 So if you have an army, it’s not that valuable in terms
00:47:20 of losing the whole army is not a big deal as Zerg
00:47:23 because you can then rebuild it.
00:47:25 And given that you generally accumulate
00:47:28 a huge bank of resources, Zergs typically
00:47:31 play by applying a lot of pressure,
00:47:34 maybe losing their whole army, but then rebuilding it
00:47:37 quickly.
00:47:37 So although, of course, every race, I mean, there’s never,
00:47:42 I mean, they’re pretty diverse.
00:47:43 I mean, there are some units in Zerg that
00:47:45 are technologically advanced, and they do
00:47:47 some very interesting spells.
00:47:48 And there’s some units in Protoss that are less valuable,
00:47:51 and you could lose a lot of them and rebuild them,
00:47:53 and it wouldn’t be a big deal.
00:47:55 All right, so maybe I’m missing out.
00:47:57 Maybe I’m going to say some dumb stuff, but summary
00:48:01 of strategy.
00:48:02 So first, there’s collection of a lot of resources.
00:48:05 That’s one option.
00:48:06 The other one is expanding, so building other bases.
00:48:11 Then the other is obviously building units
00:48:15 and attacking with those units.
00:48:17 And then I don’t know what else there is.
00:48:20 Maybe there’s the different timing of attacks,
00:48:24 like do I attack early, attack late?
00:48:26 What are the different strategies that emerged
00:48:28 that you’ve learned about?
00:48:29 I’ve read that a bunch of people are super happy
00:48:31 that you guys have apparently, that Alpha Star apparently
00:48:34 has discovered that it’s really good to,
00:48:36 what is it, saturate?
00:48:38 Oh yeah, the mineral line.
00:48:39 Yeah, the mineral line.
00:48:41 Yeah, yeah.
00:48:42 And that’s for greedy amateur players like myself.
00:48:45 That’s always been a good strategy.
00:48:47 You just build up a lot of money,
00:48:49 and it just feels good to just accumulate and accumulate.
00:48:53 So thank you for discovering that and validating all of us.
00:48:56 But is there other strategies that you discovered
00:48:59 that are interesting, unique to this game?
00:49:01 Yeah, so if you look at the kind of,
00:49:05 not being a StarCraft II player,
00:49:06 but of course StarCraft and StarCraft II
00:49:08 and real time strategy games in general are very similar.
00:49:12 I would classify perhaps the openings of the game.
00:49:17 They’re very important.
00:49:18 And generally I would say there’s two kinds of openings.
00:49:21 One that’s a standard opening.
00:49:23 That’s generally how players find sort of a balance
00:49:28 between risk and economy and building some units early on
00:49:32 so that they could defend,
00:49:34 but they’re not too exposed basically,
00:49:36 but also expanding quite quickly.
00:49:38 So this would be kind of a standard opening.
00:49:41 And within a standard opening,
00:49:43 then what you do choose generally is
00:49:45 what technology are you aiming towards?
00:49:47 So there’s a bit of rock, paper, scissors
00:49:49 of you could go for spaceships
00:49:52 or you could go for invisible units
00:49:54 or you could go for, I don’t know,
00:49:55 like massive units that attack against certain kinds
00:49:58 of units, but they’re weak against others.
00:50:01 So standard openings themselves have some choices
00:50:05 like rock, paper, scissors style.
00:50:06 Of course, if you scout and you’re good
00:50:08 at guessing what the opponent is doing,
00:50:10 then you can play as an advantage
00:50:12 because if you know you’re gonna play rock,
00:50:13 I mean, I’m gonna play paper obviously.
00:50:15 So you can imagine that normal standard games
00:50:18 in StarCraft looks like a continuous rock, paper,
00:50:22 scissors game where you guess what the distribution
00:50:26 of rock, paper, and scissors is from the enemy
00:50:29 and reacting accordingly to try to beat it
00:50:32 or put the paper out before he kind of changes his mind
00:50:36 from rock to scissors,
00:50:38 and then you would be in a weak position.
00:50:39 So, sorry to pause on that.
00:50:41 I didn’t realize this element
00:50:42 because I know it’s true with poker.
00:50:44 I know I looked at Libratus.
00:50:48 So you’re also estimating trying to guess the distribution,
00:50:51 trying to better and better estimate the distribution
00:50:53 of what the opponent is likely to be doing.
00:50:55 Yeah, I mean, as a player,
00:50:56 you definitely wanna have a belief state
00:50:59 over what’s up on the other side of the map.
00:51:02 And when your belief state becomes inaccurate,
00:51:05 when you start having that serious doubts,
00:51:07 whether he’s gonna play something that you must know,
00:51:10 that’s when you scout.
00:51:11 You wanna then gather information, right?
00:51:14 Is improving the accuracy of the belief
00:51:15 or improving the belief state part of the loss
00:51:19 that you’re trying to optimize?
00:51:20 Or is it just a side effect?
00:51:22 It’s implicit, but you could explicitly model it,
00:51:25 and it would be quite good at probably predicting
00:51:27 what’s on the other side of the map.
00:51:30 But so far, it’s all implicit.
00:51:32 There’s no additional reward for predicting the enemy.
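The explicit belief modeling he mentions as a possibility — which AlphaStar does not do, since its belief state stays implicit in the LSTM — would look like a Bayesian update over opponent strategies given scouting observations. A toy sketch; the strategies, priors, and likelihoods are invented for illustration:

```python
PRIOR = {"standard": 0.7, "rush": 0.3}

# P(observation | strategy): scouting an empty base is strong
# evidence the opponent is up to something sneaky.
LIKELIHOOD = {
    ("empty_base", "standard"): 0.05,
    ("empty_base", "rush"): 0.8,
    ("normal_base", "standard"): 0.95,
    ("normal_base", "rush"): 0.2,
}

def update_belief(belief, observation):
    """One Bayes step: posterior is proportional to likelihood times prior."""
    post = {s: LIKELIHOOD[(observation, s)] * p for s, p in belief.items()}
    total = sum(post.values())
    return {s: p / total for s, p in post.items()}

belief = update_belief(PRIOR, "empty_base")
```

One empty-base scout flips the belief heavily toward the sneaky strategy, which is exactly the "something's up" reaction described below.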
00:51:36 So there’s these standard openings,
00:51:38 and then there’s what people call cheese,
00:51:41 which is very interesting.
00:51:42 And AlphaStar sometimes really likes this kind of cheese.
00:51:46 These cheeses, what they are is kind of an all in strategy.
00:51:50 You’re gonna do something sneaky.
00:51:53 You’re gonna hide your own buildings
00:51:56 close to the enemy base,
00:51:58 or you’re gonna go for hiding your technological buildings
00:52:01 so that you do invisible units
00:52:03 and the enemy just cannot react to detect it
00:52:06 and thus lose the game.
00:52:07 And there’s quite a few of these cheeses
00:52:10 and variants of them.
00:52:11 And there it’s where actually the belief state
00:52:14 becomes even more important.
00:52:16 Because if I scout your base and I see no buildings at all,
00:52:20 any human player knows something’s up.
00:52:22 They might know, well,
00:52:23 you’re hiding something close to my base.
00:52:25 Should I build suddenly a lot of units to defend?
00:52:28 Should I actually block my ramp with workers
00:52:31 so that you cannot come and destroy my base?
00:52:33 So there’s all this is happening
00:52:35 and defending against cheeses is extremely important.
00:52:39 And in the AlphaStar League,
00:52:40 many agents actually develop some cheesy strategies.
00:52:45 And in the games we saw against TLO and Mana,
00:52:48 two out of the 10 agents
00:52:49 were actually doing these kind of strategies
00:52:51 which are cheesy strategies.
00:52:53 And then there’s a variant of cheesy strategy
00:52:55 which is called all in.
00:52:57 So an all in strategy is not perhaps as drastic as,
00:53:00 oh, I’m gonna build cannons on your base
00:53:02 and then bring all my workers
00:53:03 and try to just disrupt your base and game over,
00:53:06 or GG as we say in StarCraft.
00:53:09 There’s these kind of very cool things
00:53:11 that you can align precisely at a certain time mark.
00:53:14 So for instance,
00:53:15 you can generate exactly a 10 unit composition
00:53:19 that is perfect, like five of this type,
00:53:21 five of this other type,
00:53:22 and align the upgrade
00:53:24 so that at four minutes and a half, let’s say,
00:53:27 you have these 10 units and the upgrade just finished.
00:53:30 And at that point, that army is really scary.
00:53:33 And unless the enemy really knows what’s going on,
00:53:36 if you push, you might then have an advantage
00:53:40 because maybe the enemy is doing something more standard,
00:53:42 it expanded too much, it developed too much economy,
00:53:45 and it traded off badly against having defenses,
00:53:49 and the enemy will lose.
00:53:51 But it’s called all in because if you don’t win,
00:53:53 then you’re gonna lose.
00:53:55 So you see players that do these kinds of strategies,
00:53:57 if they don’t succeed, game is not over.
00:54:00 I mean, they still have a base
00:54:01 and they still gathering minerals,
00:54:02 but they will just GG out of the game
00:54:04 because they know, well, game is over.
00:54:06 I gambled and I failed.
00:54:08 So if we start entering the game theoretic aspects
00:54:12 of the game, it’s really rich and it’s really,
00:54:15 that’s why it also makes it quite entertaining to watch.
00:54:17 Even if I don’t play, I still enjoy watching the game.
00:54:21 But the agents are trying to do this mostly implicitly.
00:54:26 But one element that we improved in self play
00:54:29 is creating the Alpha Star League.
00:54:31 And the Alpha Star League is not pure self play.
00:54:34 It’s trying to create a different personalities of agents
00:54:37 so that some of them will become cheesy agents.
00:54:41 Some of them might become very economical, very greedy,
00:54:44 like getting all the resources,
00:54:46 but then being maybe early on, they’re gonna be weak,
00:54:48 but later on, they’re gonna be very strong.
00:54:51 And by creating this personality of agents,
00:54:53 which sometimes it just happens naturally
00:54:55 that you can see kind of an evolution of agents
00:54:58 that given the previous generation,
00:55:00 they train against all of them
00:55:02 and then they generate kind of the perfect counter
00:55:04 to that distribution.
00:55:05 But these agents, you must have them in the populations
00:55:09 because if you don’t have them,
00:55:11 you’re not covered against these things.
00:55:13 You wanna create all sorts of the opponents
00:55:17 that you will find in the wild.
00:55:18 So you can be exposed to these cheeses, early aggression,
00:55:23 later aggression, more expansions,
00:55:25 dropping units in your base from the side, all these things.
00:55:29 And pure self play is getting a bit stuck
00:55:32 at finding some subset of these, but not all of these.
00:55:36 So the Alpha Star League is a way
00:55:38 to kind of do an ensemble of agents
00:55:41 that they’re all playing in a league,
00:55:43 much like people play on Battle.net, right?
00:55:45 They play, you play against someone
00:55:47 who does a new cool strategy and you immediately,
00:55:50 oh my God, I wanna try it, I wanna play again.
00:55:53 And this to me was another critical part of the problem,
00:55:57 which was, can we create a Battle.net for agents?
00:56:01 And that’s kind of what the Alpha Star League really is.
00:56:03 That’s fascinating.
00:56:04 And where they stick to their different strategies.
00:56:06 Yeah, wow, that’s really, really interesting.
00:56:09 But that said, you were fortunate enough
00:56:13 or just skilled enough to win five, zero.
00:56:17 And so how hard is it to win?
00:56:19 I mean, that’s not the goal.
00:56:20 I guess, I don’t know what the goal is.
00:56:21 The goal should be to win majority, not five, zero,
00:56:25 but how hard is it in general to win all matchups
00:56:29 on a one V one?
00:56:31 So that’s a very interesting question
00:56:33 because once you see Alpha Star, superficially
00:56:38 you think, well, okay, it won.
00:56:40 If you sum all the games, it's like 10 to 1, right?
00:56:42 It lost the game that it played with the camera interface.
00:56:46 You might think, well, that’s done, right?
00:56:48 It’s superhuman at the game.
00:56:50 And that’s not really the claim we really can make actually.
00:56:55 The claim is we beat a professional gamer
00:56:58 for the first time.
00:57:00 StarCraft has really been a thing
00:57:02 that has been going on for a few years,
00:57:04 but a moment like this had not occurred before yet.
00:57:09 But are these agents impossible to beat?
00:57:12 Absolutely not, right?
00:57:13 So that’s a bit what’s kind of the difference is
00:57:17 the agents play at grandmaster level.
00:57:19 They definitely understand the game enough
00:57:21 to play extremely well, but are they unbeatable?
00:57:24 Do they play perfect?
00:57:26 No, and actually in StarCraft,
00:57:29 because of these sneaky strategies,
00:57:32 it’s always possible that you might take a huge risk
00:57:34 sometimes, but you might get wins, right?
00:57:36 Out of this.
00:57:38 So I think that as a domain,
00:57:41 it still has a lot of opportunities,
00:57:43 not only because of course we wanna learn
00:57:45 with less experience, we would like to,
00:57:47 I mean, if I learned to play Protoss,
00:57:49 I can play Terran and learn it much quicker
00:57:52 than Alpha Star can, right?
00:57:53 So there are obvious interesting research challenges
00:57:56 as well, but even as the raw performance goes,
00:58:02 really the claim here can be we are at pro level
00:58:05 or at high grandmaster level,
00:58:08 but obviously the players also did not know what to expect,
00:58:13 right?
00:58:14 Their prior distribution was a bit off
00:58:15 because they played this kind of new like alien brain
00:58:19 as they like to say it, right?
00:58:21 And that’s what makes it exciting for them.
00:58:24 But also I think if you look at the games closely,
00:58:27 you see there were weaknesses in some points,
00:58:30 maybe Alpha Star did not scout,
00:58:32 or if it had invisible units going against
00:58:35 at certain points, it wouldn’t have known
00:58:37 and it would have been bad.
00:58:38 So there’s still quite a lot of work to do,
00:58:42 but it’s really a very exciting moment for us
00:58:44 to be seeing, wow, a single neural net on a GPU
00:58:48 is actually playing against these guys
00:58:50 who are amazing.
00:58:51 I mean, you have to see them play live.
00:58:52 They’re really, really amazing players.
00:58:55 Yeah, I’m sure there must be a guy in Poland
00:58:59 somewhere right now training his butt off
00:59:02 to make sure that this never happens again with Alpha Star.
00:59:05 So that’s really exciting in terms of Alpha Star
00:59:09 having some holes to exploit, which is great.
00:59:11 And then we build on top of each other
00:59:13 and it feels like StarCraft, unlike Go,
00:59:16 even if you win, it's still not,
00:59:20 there’s so many different dimensions
00:59:23 in which you can explore.
00:59:24 So that’s really, really interesting.
00:59:25 Do you think there’s a ceiling to Alpha Star?
00:59:28 You’ve said that it hasn’t reached,
00:59:31 you know, this is a big,
00:59:32 wait, let me actually just pause for a second.
00:59:35 How did it feel to come here to this point,
00:59:40 to beat a top professional player?
00:59:42 Like that night, I mean, you know,
00:59:44 Olympic athletes have their gold medal, right?
00:59:47 This is your gold medal in a sense.
00:59:48 Sure, you’re cited a lot,
00:59:50 you’ve published a lot of prestigious papers, whatever,
00:59:53 but this is like a win.
00:59:55 How did it feel?
00:59:56 I mean, it was, for me, it was unbelievable
00:59:59 because first the win itself,
01:00:03 I mean, it was so exciting.
01:00:05 I mean, so looking back to those last days of 2018 really,
01:00:11 that’s when the games were played.
01:00:13 I’m sure I look back at that moment, I’ll say,
01:00:15 oh my God, I want to be in a project like that.
01:00:18 It’s like, I already feel the nostalgia of like,
01:00:21 yeah, that was huge in terms of the energy
01:00:24 and the team effort that went into it.
01:00:26 And so in that sense, as soon as it happened,
01:00:29 I already knew it was kind of,
01:00:31 I was losing it a little bit.
01:00:33 So it is almost like sad that it happened and oh my God,
01:00:36 but on the other hand, it also verifies the approach.
01:00:41 But to me also, there’s so many challenges
01:00:43 and interesting aspects of intelligence
01:00:46 that even though we can train a neural network
01:00:49 to play at the level of the best humans,
01:00:52 there’s still so many challenges.
01:00:54 So for me, it’s also like, well,
01:00:55 this is really an amazing achievement,
01:00:57 but I already was also thinking about next steps.
01:00:59 I mean, as I said, these agents play Protoss versus Protoss,
01:01:04 but they should be able to play a different race
01:01:07 much quicker, right?
01:01:08 So that would be an amazing achievement.
01:01:10 Some people call this meta reinforcement learning,
01:01:13 meta learning and so on, right?
01:01:15 So there’s so many possibilities after that moment,
01:01:18 but the moment itself, it really felt great.
01:01:23 We had this bet, so I’m kind of a pessimist in general.
01:01:27 So I kind of send an email to the team.
01:01:29 I said, okay, let’s against TLO first, right?
01:01:33 Like what’s gonna be the result?
01:01:35 And I really thought we would lose like five zero, right?
01:01:38 We had some calibration made against the 5,000 MMR player.
01:01:44 TLO was much stronger than that player,
01:01:47 even if he played Protoss, which is his off race.
01:01:51 But yeah, I was not imagining we would win.
01:01:53 So for me, that was just kind of a test run or something.
01:01:55 And then it really kind of, he was really surprised.
01:01:59 And unbelievably, we went to this bar to celebrate
01:02:04 and Dave tells me, well, why don’t we invite someone
01:02:08 who is a thousand MMR stronger in Protoss,
01:02:10 like an actual Protoss player,
01:02:12 and that turned out to be Mana, right?
01:02:16 And we had some drinks and I said, sure, why not?
01:02:19 But then I thought, well,
01:02:20 that’s really gonna be impossible to beat.
01:02:22 I mean, even because it’s so much ahead,
01:02:24 a thousand MMR is really like 99% probability
01:02:28 that Mana would beat TLO as Protoss versus Protoss, right?
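The "99% probability" figure for a 1,000 MMR gap can be sanity-checked with an Elo-style calculation. This assumes SC2 MMR behaves like an Elo rating with the standard 400-point logistic scale; Blizzard's exact matchmaking formula is not public, so treat the numbers as a rough illustration.

```python
# Elo-style expected score for the higher-rated player, assuming
# MMR works like Elo with the usual 400-point logistic scale.

def win_probability(mmr_gap):
    """Probability the higher-rated player wins, given the rating gap."""
    return 1.0 / (1.0 + 10 ** (-mmr_gap / 400.0))

print(round(win_probability(1000), 3))  # 0.997
print(round(win_probability(0), 3))     # 0.5 (equal players)
```

Under that assumption a 1,000-point gap does indeed come out around 99.7%, consistent with the claim in the conversation.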
01:02:33 So we did that.
01:02:34 And to me, the second game was much more important,
01:02:38 even though a lot of uncertainty kind of disappeared
01:02:42 after we kind of beat TLO.
01:02:43 I mean, he is a professional player.
01:02:45 So that was kind of, oh,
01:02:46 but that’s really a very nice achievement.
01:02:49 But Mana really was at the top
01:02:51 and you could see he played much better,
01:02:53 but our agents got much better too.
01:02:55 So it’s like, ah, and then after the first game,
01:02:59 I said, if we take a single game,
01:03:00 at least we can say we won a game.
01:03:02 I mean, even if we don’t beat the series,
01:03:04 for me, that was a huge relief.
01:03:06 And I mean, I remember hugging Demis.
01:03:09 And I mean, it was really like,
01:03:10 this moment for me will resonate forever as a researcher.
01:03:14 And I mean, as a person,
01:03:15 and yeah, it’s a really like great accomplishment.
01:03:18 And it was great also to be there with the team in the room.
01:03:21 I don’t know if you saw like this.
01:03:23 So it was really like.
01:03:24 I mean, from my perspective,
01:03:25 the other interesting thing is just like watching Kasparov,
01:03:29 watching Mana was also interesting
01:03:33 because he didn’t, he has kind of a loss of words.
01:03:36 I mean, whenever you lose, I’ve done a lot of sports.
01:03:38 You sometimes say excuses, you look for reasons.
01:03:43 And he couldn’t really come up with reasons.
01:03:46 I mean, so with the off race for Protoss,
01:03:50 you could say, well, it felt awkward, it wasn’t,
01:03:52 but here it was just beaten.
01:03:55 And it was beautiful to look at a human being
01:03:57 being superseded by an AI system.
01:04:00 I mean, it’s a beautiful moment for researchers, so.
01:04:04 Yeah, for sure it was.
01:04:05 I mean, probably the highlight of my career so far
01:04:09 because of its uniqueness and coolness.
01:04:11 And I don’t know, I mean, it’s obviously, as you said,
01:04:14 you can look at papers, citations and so on,
01:04:16 but this really is like a testament
01:04:19 of the whole machine learning approach
01:04:22 and using games to advance technology.
01:04:24 I mean, it really was,
01:04:26 everything came together at that moment.
01:04:28 That’s really the summary.
01:04:29 Also on the other side, it’s a popularization of AI too,
01:04:34 because it’s just like traveling to the moon and so on.
01:04:38 I mean, this is where a very large community of people
01:04:41 that don’t really know AI,
01:04:43 they get to really interact with it.
01:04:45 Which is very important.
01:04:46 I mean, we must, you know,
01:04:48 writing papers helps our peers, researchers,
01:04:51 to understand what we’re doing.
01:04:52 But I think AI is becoming mature enough
01:04:55 that we must sort of try to explain what it is.
01:04:59 And perhaps through games is an obvious way
01:05:01 because these games always had built in AI.
01:05:03 So maybe everyone has experienced an AI playing a video game,
01:05:07 even if they don’t know,
01:05:08 because there’s always some scripted element
01:05:10 and some people might even call that AI already, right?
01:05:13 So what are other applications
01:05:16 of the approaches underlying AlphaStar
01:05:19 that you see happening?
01:05:20 There’s a lot of echoes of, you said,
01:05:22 transformer of language modeling and so on.
01:05:25 Have you already started thinking
01:05:27 where the breakthroughs in AlphaStar
01:05:30 get expanded to other applications?
01:05:32 Right, so I thought about a few things
01:05:34 for like kind of next month, next years.
01:05:38 The main thing I’m thinking about actually is what’s next
01:05:41 as a kind of a grand challenge.
01:05:43 Because for me, like we’ve seen Atari
01:05:47 and then there’s like the sort of three dimensional walls
01:05:50 that we’ve seen also like pretty good performance
01:05:52 from these capture the flag agents
01:05:54 that also some people at DeepMind and elsewhere
01:05:56 are working on.
01:05:57 We’ve also seen some amazing results on like,
01:05:59 for instance, Dota 2, which is also a very complicated game.
01:06:03 So for me, like the main thing I’m thinking about
01:06:05 is what’s next in terms of challenge.
01:06:07 So as a researcher, I see sort of two tensions
01:06:12 between research and then applications or areas
01:06:16 or domains where you apply them.
01:06:18 So on the one hand, we’ve done,
01:06:20 thanks to the application of StarCraft is very hard.
01:06:23 We developed some techniques, some new research
01:06:25 that now we could look at elsewhere.
01:06:27 Like are there other applications where we can apply these?
01:06:30 And the obvious ones, absolutely.
01:06:32 You can think of feeding back to sort of the community
01:06:37 we took from, which was mostly sequence modeling
01:06:40 or natural language processing.
01:06:41 So we’ve developed and extended things from the transformer
01:06:46 and we use pointer networks.
01:06:48 We combine LSTM and transformers in interesting ways.
01:06:51 So that’s perhaps the kind of lowest hanging fruit
01:06:54 of feeding back to now a different field
01:06:57 of machine learning that’s not playing video games.
01:07:00 Let me go old school and jump to Mr. Alan Turing.
01:07:05 So the Turing test is a natural language test,
01:07:09 a conversational test.
01:07:11 What’s your thought of it as a test for intelligence?
01:07:15 Do you think it is a grand challenge
01:07:17 that’s worthy of undertaking?
01:07:18 Maybe if it is, would you reformulate it or phrase it
01:07:22 somehow differently?
01:07:23 Right, so I really love the Turing test
01:07:25 because I also like sequences and language understanding.
01:07:29 And in fact, some of the early work
01:07:32 we did in machine translation, we
01:07:33 tried to apply to kind of a neural chatbot, which obviously
01:07:38 would never pass the Turing test because it was very limited.
01:07:42 But it is a very fascinating idea
01:07:45 that you could really have an AI that
01:07:49 would be indistinguishable from humans in terms of asking
01:07:53 or conversing with it.
01:07:56 So I think the test itself seems very nice.
01:08:00 And it’s kind of well defined, actually,
01:08:02 like the passing it or not.
01:08:04 I think there’s quite a few rules
01:08:06 that feel pretty simple.
01:08:09 And I think they have these competitions every year.
01:08:14 Yes, there’s the Lebner Prize.
01:08:15 But I don’t know if you’ve seen the kind of bots
01:08:22 that emerge from that competition.
01:08:24 They’re not quite as what you would.
01:08:27 So it feels like that there’s weaknesses with the way Turing
01:08:30 formulated it.
01:08:31 It needs to be that the definition
01:08:34 of a genuine, rich, fulfilling human conversation,
01:08:39 it needs to be something else.
01:08:41 Like the Alexa Prize, which I’m not as well familiar with,
01:08:44 has tried to define that more, I think,
01:08:46 by saying you have to continue keeping
01:08:48 a conversation for 30 minutes, something like that.
01:08:52 So basically forcing the agent not to just fool,
01:08:55 but to have an engaging conversation kind of thing.
01:09:02 Have you thought about this problem richly?
01:09:06 And if you have in general, how far away are we from?
01:09:10 You worked a lot on language understanding,
01:09:14 language generation, but the full dialogue,
01:09:16 the conversation, just sitting at the bar
01:09:19 having a couple of beers for an hour,
01:09:21 that kind of conversation.
01:09:22 Have you thought about it?
01:09:23 Yeah, so I think you touched here
01:09:25 on the critical point, which is feasibility.
01:09:28 So there’s a great essay by Hamming,
01:09:32 which describes sort of grand challenges of physics.
01:09:37 And he argues that, well, OK, for instance,
01:09:41 teleportation or time travel are great grand challenges
01:09:44 of physics, but there’s no attacks.
01:09:46 We really don’t know or cannot kind of make any progress.
01:09:50 So that’s why most physicists and so on,
01:09:53 they don’t work on these in their PhDs
01:09:55 and as part of their careers.
01:09:57 So I see the Turing test, the full Turing test,
01:10:00 as still a bit too early.
01:10:02 Like I think we’re, especially with the current trend
01:10:06 of deep learning language models,
01:10:10 we’ve seen some amazing examples.
01:10:11 I think GPT-2 being the most recent one, which
01:10:14 is very impressive.
01:10:15 But to fully solve passing, or fooling a human
01:10:21 to think that there’s a human on the other side,
01:10:23 I think we’re quite far.
01:10:24 So as a result, I don’t see myself
01:10:27 and I probably would not recommend people doing a PhD
01:10:30 on solving the Turing test because it just
01:10:32 feels it’s kind of too early or too hard of a problem.
01:10:35 Yeah, but that said, you said the exact same thing
01:10:37 about StarCraft about a few years ago.
01:10:40 Indeed.
01:10:41 To Demis.
01:10:41 So you’ll probably also be the person who passes
01:10:46 the Turing test in three years.
01:10:48 I mean, I think that, yeah.
01:10:50 So we have this on record.
01:10:52 This is nice.
01:10:52 It’s true.
01:10:53 I mean, it’s true that progress sometimes
01:10:56 is a bit unpredictable.
01:10:57 I really wouldn’t have not.
01:10:59 Even six months ago, I would not have predicted the level
01:11:02 that we see that these agents can deliver at grandmaster
01:11:06 level.
01:11:07 But I have worked on language enough.
01:11:10 And basically, my concern is not that a breakthrough
01:11:13 couldn't happen that would bring us to solving
01:11:16 or passing the Turing test; it's that I just
01:11:19 think the statistical approach to it is not going to cut it.
01:11:24 So we need a breakthrough, which is great for the community.
01:11:28 But given that, I think there’s quite more uncertainty.
01:11:31 Whereas for StarCraft, I knew what the steps would
01:11:36 be to get us there.
01:11:38 I think it was clear that using the imitation learning part
01:11:41 and then using this Battle.net for agents
01:11:44 were going to be key.
01:11:45 And it turned out that this was the case.
01:11:48 And a little more was needed, but not much more.
01:11:51 For Turing test, I just don’t know
01:11:53 what the plan or execution plan would look like.
01:11:56 So that’s why I myself working on it as a grand challenge
01:12:00 is hard.
01:12:01 But there are quite a few sub challenges
01:12:03 that are related that you could say,
01:12:05 well, I mean, what if you create a great assistant
01:12:09 like Google already has, like the Google Assistant.
01:12:11 So can we make it better?
01:12:13 And can we make it fully neural and so on?
01:12:15 That I start to believe maybe we’re
01:12:17 reaching a point where we should attempt these challenges.
01:12:20 I like this conversation so much because it echoes very much
01:12:23 the StarCraft conversation.
01:12:24 It’s exactly how you approach StarCraft.
01:12:26 Let’s break it down into small pieces and solve those.
01:12:29 And you end up solving the whole game.
01:12:31 Great.
01:12:31 But that said, you’re behind some
01:12:34 of the biggest pieces of work in deep learning
01:12:37 in the last several years.
01:12:40 So you mentioned some limits.
01:12:42 What do you think of the current limits of deep learning?
01:12:44 And how do we overcome those limits?
01:12:47 So if I had to actually use a single word
01:12:50 to define the main challenge in deep learning,
01:12:53 it’s a challenge that probably has
01:12:55 been the challenge for many years.
01:12:56 And it’s that of generalization.
01:12:59 So what that means is that all that we’re doing
01:13:04 is fitting functions to data.
01:13:06 And when the data we see is not from the same distribution,
01:13:12 or even if there are some times that it
01:13:14 is very close to distribution, but because
01:13:17 of the way we train it with limited samples,
01:13:20 we then get to this stage where we just
01:13:23 don’t see generalization as much as we can generalize.
01:13:27 And I think adversarial examples are a clear example of this.
01:13:31 But if you study machine learning and literature,
01:13:34 and the reason why SVMs became very popular
01:13:38 was because they
01:13:40 had some guarantees about generalization, which
01:13:42 is unseen data. But out of distribution,
01:13:45 or even within distribution where you take an image and add
01:13:48 a bit of noise, these models fail.
01:13:51 So I think, really, I don’t see a lot of progress
01:13:56 on generalization in the strong generalization
01:14:00 sense of the word.
01:14:01 I think our neural networks, you can always
01:14:05 find designed examples that will make their outputs arbitrary,
01:14:11 which is not good because we humans would never
01:14:15 be fooled by these kind of images
01:14:17 or manipulation of the image.
01:14:19 And if you look at the mathematics,
01:14:21 you kind of understand this is a bunch of matrices
01:14:23 multiplied together.
01:14:26 There’s probably numerics and instability
01:14:28 that you can just find corner cases.
01:14:30 So I think that’s really the underlying topic many times
01:14:35 we see when even at the grand stage of Turing test
01:14:40 generalization, if you start passing the Turing test,
01:14:44 should it be in English or should it be in any language?
01:14:48 As a human, if you ask something in a different language,
01:14:53 you actually will go and do some research
01:14:54 and try to translate it and so on.
01:14:57 Should the Turing test include that?
01:15:01 And it’s really a difficult problem
01:15:02 and very fascinating and very mysterious, actually.
01:15:05 Yeah, absolutely.
01:15:06 But do you think if you were to try to solve it,
01:15:10 can you not grow the size of data intelligently
01:15:14 in such a way that the distribution of your training
01:15:17 set does include the entirety of the testing set?
01:15:20 Is that one path?
01:15:21 The other path is totally a new methodology.
01:15:23 It’s not statistical.
01:15:24 So a path that has worked well, and it worked well
01:15:27 in StarCraft and in machine translation and in languages,
01:15:30 is scaling up the data and the model.
01:15:32 And that’s kind of been maybe the only single formula that
01:15:38 still delivers today in deep learning, right?
01:15:40 It’s that data scale and model scale really
01:15:44 do more and more of the things that we thought,
01:15:47 oh, there’s no way it can generalize to these,
01:15:49 or there’s no way it can generalize to that.
01:15:51 But I don’t think fundamentally it will be solved with this.
01:15:54 And for instance, I’m really liking some style or approach
01:15:59 that would not only have neural networks,
01:16:02 but it would have programs or some discrete decision making,
01:16:06 because there is where I feel there’s a bit more.
01:16:10 I mean, the best example, I think, for understanding this
01:16:13 is I also worked a bit on, oh, we
01:16:16 can learn an algorithm with a neural network, right?
01:16:18 So you give it many examples, and it’s
01:16:20 going to sort the input numbers or something like that.
01:16:24 But really strong generalization is you give me some numbers
01:16:29 or you ask me to create an algorithm that sorts numbers.
01:16:32 And instead of creating a neural net, which will be fragile
01:16:34 because it’s going to go out of range at some point,
01:16:37 you’re going to give it numbers that are too large, too small,
01:16:40 and whatnot, if you just create a piece of code that
01:16:45 sorts the numbers, then you can prove
01:16:47 that that will generalize to absolutely all the possible
01:16:50 input you could give.
01:16:51 So I think the problem comes with some exciting prospects.
01:16:55 I mean, scale is a bit more boring, but it really works.
01:16:59 And then maybe programs and discrete abstractions
01:17:02 are a bit less developed.
01:17:04 But clearly, I think they’re quite exciting in terms
01:17:07 of future for the field.
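His sorting example, that a fitted model is fragile outside the range it was trained on, while a program provably generalizes to every input, can be made concrete. The "fitted sorter" below is a stand-in, not a real neural network: its saturation outside the training range mimics how learned function approximators break on out-of-range inputs.

```python
# Toy contrast between a fitted model and a program, in the spirit of
# the sorting example above (purely illustrative; the saturation is a
# stand-in for how learned approximators fail out of distribution).

TRAIN_MAX = 100  # the "model" only ever saw numbers up to 100

def fitted_sorter(xs):
    """Behaves like sorting in-range, but saturates outside the
    range it was trained on, the way activations do."""
    return sorted(min(x, TRAIN_MAX) for x in xs)

def program_sorter(xs):
    """A program: provably correct for every possible input."""
    return sorted(xs)

in_range = [3, 1, 2]
out_of_range = [5, 1000, 7]

print(fitted_sorter(in_range) == program_sorter(in_range))          # True
print(fitted_sorter(out_of_range) == program_sorter(out_of_range))  # False
```

The program generalizes to absolutely all inputs; the fitted model silently corrupts anything beyond what it saw, which is his argument for combining neural networks with programs and discrete abstractions.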
01:17:09 Do you draw any insight or wisdom from the 80s and expert
01:17:14 systems and symbolic systems, symbolic computing?
01:17:16 Do you ever go back to those reasoning, that kind of logic?
01:17:20 Do you think that might make a comeback?
01:17:23 You’ll have to dust off those books?
01:17:24 Yeah, I actually love adding more inductive biases.
01:17:31 To me, the problem really is, what are you trying to solve?
01:17:34 If what you’re trying to solve is so important that try
01:17:37 to solve it no matter what, then absolutely use rules,
01:17:42 use domain knowledge, and then use
01:17:45 a bit of the magic of machine learning
01:17:46 to empower the system, to make it the best system that
01:17:50 will detect cancer or detect weather patterns, right?
01:17:56 Or in terms of StarCraft, it also was a very big challenge.
01:17:59 So I was definitely happy that if we
01:18:01 had to cut a corner here and there,
01:18:04 it could have been interesting to do.
01:18:07 And in fact, in StarCraft, we start
01:18:09 thinking about expert systems because it’s a very,
01:18:11 you know, you can define.
01:18:12 I mean, people actually build StarCraft bots by thinking
01:18:15 about those principles, like state machines and rule based.
01:18:20 And then you could think of combining
01:18:22 a bit of a rule based system, but that has also
01:18:25 neural networks incorporated to make it generalize a bit
01:18:28 better.
01:18:29 So absolutely, I mean, we should definitely
01:18:31 go back to those ideas.
01:18:32 And anything that makes the problem simpler,
01:18:35 as long as your problem is important, that’s OK.
01:18:37 And that’s research driving a very important problem.
01:18:41 And on the other hand, if you want to really focus
01:18:44 on the limits of reinforcement learning,
01:18:46 then of course, you must try not to look at imitation data
01:18:50 or to look for some rules of the domain that would help a lot
01:18:55 or even feature engineering, right?
01:18:56 So this is a tension that depending on what you do,
01:19:00 I think both ways are definitely fine.
01:19:03 And I would never not do one or the other
01:19:06 as long as what you’re doing is important
01:19:08 and needs to be solved, right?
01:19:10 Right, so there’s a bunch of different ideas
01:19:13 that you developed that I really enjoy.
01:19:16 But one is translating from image captioning,
01:19:22 translating from image to text, just another beautiful idea,
01:19:27 I think, that resonates throughout your work, actually.
01:19:33 So the underlying nature of reality
01:19:35 being language always, somehow.
01:19:38 So what’s the connection between images and text,
01:19:42 or rather the visual world and the world
01:19:44 of language in your view?
01:19:46 Right, so I think a piece of research that’s been central
01:19:51 to, I would say, even extending into StarCraft
01:19:54 is this idea of sequence to sequence learning,
01:19:57 which what we really meant by that
01:19:59 is that you can now really input anything
01:20:03 to a neural network as the input x.
01:20:06 And then the neural network will learn a function f
01:20:09 that will take x as an input and produce any output y.
01:20:12 And these x and y’s don’t need to be static or features,
01:20:19 like fixed vectors or anything like that.
01:20:22 It could be really sequences and now beyond data structures.
01:20:26 So that paradigm was tested in a very interesting way
01:20:31 when we moved from translating French to English
01:20:35 to translating an image to its caption.
01:20:37 But the beauty of it is that, really,
01:20:40 and that’s actually how it happened.
01:20:43 I changed a line of code in this thing that
01:20:45 was doing machine translation.
01:20:47 And I came the next day, and I saw
01:20:50 how it was producing captions that seemed like, oh my god,
01:20:54 this is really, really working.
01:20:55 And the principle is the same.
01:20:57 So I think I don’t see text, vision, speech, waveforms
01:21:04 as something different as long as you basically
01:21:09 learn a function that will vectorize these into a representation.
01:21:14 And then after we vectorize it, we
01:21:16 can then use transformers, LSTMs, whatever
01:21:20 the flavor of the month of the model is.
01:21:22 And then as long as we have enough supervised data,
01:21:25 really, this formula will work and will keep working,
01:21:30 I believe, to some extent.
01:21:31 Modulo these generalization issues that I mentioned before.
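The "vectorize, then model" recipe he describes, where text, vision, or speech are all mapped to vectors and the same sequence model consumes them, can be sketched as below. The toy encoders are hypothetical stand-ins invented for the example; real systems use learned embeddings and convolutional or transformer encoders.

```python
# Sketch of the sequence-to-sequence recipe: any modality is mapped
# into vectors of a shared dimension, and the same downstream model
# consumes them regardless of where they came from.

def embed_text(tokens, dim=4):
    # toy embedding: hash each token into a fixed-size vector
    return [[(hash(t) >> i) % 7 / 7.0 for i in range(dim)] for t in tokens]

def embed_image(patches, dim=4):
    # toy "vision encoder": average each patch, tiled into a vector
    return [[sum(p) / len(p)] * dim for p in patches]

def sequence_model(vectors):
    # stand-in for an LSTM/transformer: pools vectors into one state
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# The same model works on both modalities once they are vectorized.
text_state = sequence_model(embed_text(["a", "cat", "sat"]))
image_state = sequence_model(embed_image([[0.1, 0.2], [0.9, 0.8]]))
print(len(text_state), len(image_state))  # 4 4
```

The design point is that only the encoders are modality-specific; everything after vectorization is shared, which is why changing one line turned a translation system into an image captioner.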
01:21:35 But the task there is to vectorize,
01:21:36 so to form a representation that’s meaningful.
01:21:39 And your intuition now, having worked with all this media,
01:21:42 is that once you are able to form that representation,
01:21:46 you could basically take anything, any sequence.
01:21:51 Going back to StarCraft, is there
01:21:52 limits on the length so that we didn’t really
01:21:56 touch on the long term aspect?
01:21:59 How did you overcome the whole really long term
01:22:02 aspect of things here?
01:22:03 Is there some tricks?
01:22:05 So the main trick, so StarCraft, if you
01:22:08 look at absolutely every frame, you
01:22:10 might think it’s quite a long game.
01:22:12 So we would have to multiply 22 frames per second times 60 seconds per minute
01:22:18 times maybe at least 10 minutes per game on average.
01:22:21 So there are quite a few frames.
01:22:25 But the trick really was to only observe, in fact,
01:22:30 which might be seen as a limitation,
01:22:32 but it is also a computational advantage.
01:22:35 Only observe when you act.
01:22:37 And then what the neural network decides
01:22:40 is what is the gap going to be until the next action.
01:22:44 And if you look at most StarCraft games
01:22:48 that we have in the data set that Blizzard provided,
01:22:51 it turns out that most games are actually only,
01:22:56 I mean, it is still a long sequence,
01:22:58 but it’s maybe like 1,000 to 1,500 actions,
01:23:02 which if you start looking at LSTMs, large LSTMs,
01:23:07 transformers, it’s not that difficult, especially
01:23:12 if you have supervised learning.
01:23:14 If you had to do it with reinforcement learning,
01:23:16 the credit assignment problem, what
01:23:18 is it in this game that made you win?
01:23:19 That would be really difficult.
01:23:21 But thankfully, because of imitation learning,
01:23:24 we didn’t have to deal with these directly.
01:23:27 Although if we had to, we tried it.
01:23:29 And what happened is you just take all your workers
01:23:31 and attack with them.
01:23:33 And that is kind of obvious in retrospect
01:23:36 because you start trying random actions.
01:23:38 One of the actions will be a worker
01:23:40 that goes to the enemy base.
01:23:41 And because it’s self play, it’s not
01:23:43 going to know how to defend because it basically
01:23:45 doesn’t know almost anything.
01:23:47 And eventually, what you develop is this take all workers
01:23:50 and attack because the credit assignment issue in RL
01:23:54 is really, really hard.
01:23:55 I do believe we could do better.
01:23:57 And that’s maybe a research challenge for the future.
01:24:01 But yeah, even in StarCraft, the sequences
01:24:04 are maybe 1,000, which I believe is
01:24:07 within the realm of what transformers can do.
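The observe-when-you-act trick he describes, emitting an action together with the gap until the next action instead of processing every frame, can be put into rough numbers. The frame rate and action counts below come from the conversation; the "typical" actions-per-game figure is taken from his stated 1,000-1,500 range.

```python
# Back-of-the-envelope for the temporal abstraction in AlphaStar:
# ~22 frames/sec * 60 sec * 10 min of game time, observed every frame,
# versus only stepping when the agent acts.

FRAMES_PER_SEC = 22
GAME_MINUTES = 10
total_frames = FRAMES_PER_SEC * 60 * GAME_MINUTES
print(total_frames)  # 13200 frames if you observed every single one

# With observe-when-you-act, each step also predicts the delay until
# the next action, collapsing the sequence to roughly the action count.
actions_per_game = 1200  # within the ~1,000-1,500 range he cites
compression = total_frames / actions_per_game
print(round(compression))  # 11 (roughly an 11x shorter sequence)
```

That shorter sequence of ~1,000 steps is what puts the problem within reach of large LSTMs and transformers, especially under supervised imitation learning.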
01:24:10 Yeah, I guess the difference between StarCraft and Go
01:24:12 is in Go and Chess, stuff starts happening right away.
01:24:18 So there’s not, yeah, it’s pretty easy to self play.
01:24:22 Not easy, but to self play, it’s possible to develop
01:24:24 reasonable strategies quickly as opposed to StarCraft.
01:24:27 I mean, in Go, there’s only 400 actions.
01:24:30 But one action is what people would call the God action.
01:24:34 That would be if you had expanded the whole search
01:24:38 tree, that’s the best action if you did minimax
01:24:40 or whatever algorithm you would do if you
01:24:42 had the computational capacity.
01:24:44 But in StarCraft, 400 is minuscule.
01:24:48 Like in 400, you couldn’t even click
01:24:51 on the pixels around a unit.
01:24:53 So I think the problem there is in terms of action space size
01:24:58 is way harder.
01:25:01 And that search is impossible.
01:25:03 So there’s quite a few challenges indeed
01:25:06 that make this kind of a step up in terms of machine learning.
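The action-space comparison can be put in rough numbers. These are back-of-envelope assumptions (a 19x19 Go board; an 84x84 screen and roughly 500 base actions for StarCraft, which depends entirely on the agent's interface):

```python
# Rough, assumed numbers: Go has one action per board point plus a pass;
# in StarCraft, many of the (hundreds of) base actions also take a screen
# coordinate as an argument, so the per-step action space explodes.

go_actions = 19 * 19 + 1                  # 362, close to the ~400 quoted above
screen_targets = 84 * 84                  # coarse screen used in many RL setups
base_actions = 500                        # assumed count of StarCraft base actions
starcraft_actions = base_actions * screen_targets
ratio = starcraft_actions // go_actions   # thousands of times larger per step
```

Even with this deliberately coarse screen, the per-step action space is several orders of magnitude larger than Go's, which is why exhaustive search is off the table.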
01:25:10 For humans, maybe playing StarCraft
01:25:13 seems more intuitive because it looks real.
01:25:16 I mean, the graphics and everything moves smoothly,
01:25:18 whereas I don’t know how to.
01:25:20 I mean, Go is a game that I would really need to study.
01:25:22 It feels quite complicated.
01:25:23 But for machines, kind of maybe it’s the reverse, yes.
01:25:27 Which shows you the gap actually between deep learning
01:25:30 and however the heck our brains work.
01:25:34 So you developed a lot of really interesting ideas.
01:25:36 It’s interesting to just ask, what’s
01:25:38 your process of developing new ideas?
01:25:41 Do you like brainstorming with others?
01:25:42 Do you like thinking alone?
01:25:44 Do you like, what was it, Ian Goodfellow said
01:25:49 he came up with GANs after a few beers.
01:25:52 He thinks beers are essential for coming up with new ideas.
01:25:55 We had beers when we decided to play another game of StarCraft
01:25:59 a week later.
01:25:59 So it’s really similar to that story.
01:26:02 Actually, I explained this in a DeepMind retreat.
01:26:05 And I said, this is the same as the GAN story.
01:26:08 I mean, we were in a bar.
01:26:09 And we decided, let’s play a GAN next week.
01:26:10 And that’s what happened.
01:26:11 I feel like we’re giving the wrong message
01:26:13 to young undergrads.
01:26:15 Yeah, I know.
01:26:15 But in general, do you like brainstorming?
01:26:18 Do you like thinking alone, working stuff out?
01:26:20 So I think throughout the years, also, things changed.
01:26:23 So initially, I was very fortunate to be
01:26:29 with great minds like Geoffrey Hinton, Jeff Dean,
01:26:33 Ilya Sutskever.
01:26:34 I was really fortunate to join Brain at a very good time.
01:26:37 So at that point, ideas, I was just
01:26:41 brainstorming with my colleagues and learned a lot.
01:26:44 And keep learning is actually something
01:26:46 you should never stop doing.
01:26:48 So learning implies reading papers and also
01:26:51 discussing ideas with others.
01:26:53 It’s very hard at some point not to communicate,
01:26:56 whether that means reading a paper from someone
01:26:59 or actually discussing ideas.
01:27:00 So definitely, that communication aspect
01:27:04 needs to be there, whether it’s written or oral.
01:27:08 Nowadays, I’m also trying to be a bit more strategic
01:27:12 about what research to do.
01:27:15 So I was describing a little bit this tension
01:27:18 between research for the sake of research,
01:27:21 and then you have, on the other hand,
01:27:23 applications that can drive the research.
01:27:25 And honestly, the formula that has worked best for me
01:27:28 is just find a hard problem and then
01:27:32 try to see how research fits into it,
01:27:34 how it doesn’t fit into it, and then you must innovate.
01:27:37 So I think machine translation drove sequence to sequence.
01:27:43 Then maybe learning to solve
01:27:47 combinatorial problems led to pointer networks.
01:27:50 StarCraft led to really scaling up imitation learning
01:27:53 and the AlphaStarLeague.
01:27:55 So that’s been a formula that I personally like.
01:27:58 But the other one is also valid.
01:28:00 And I’ve seen it succeed a lot of the times
01:28:02 where you just want to investigate model based
01:28:05 RL as a research topic.
01:28:08 And then you must start to think, well,
01:28:12 how are you going to test these ideas?
01:28:14 You need a minimal environment to try things.
01:28:17 You need to read a lot of papers and so on.
01:28:19 And that’s also very fun to do and something
01:28:21 I’ve also done quite a few times,
01:28:24 both at Brain, at DeepMind, and obviously as a PhD.
01:28:28 So I think besides the ideas and discussions,
01:28:32 I think it’s important also because you start
01:28:35 sort of guiding not only your own goals,
01:28:40 but other people’s goals to the next breakthrough.
01:28:44 So you must really kind of understand this feasibility
01:28:48 also, as we were discussing before,
01:28:50 whether this domain is ready to be tackled or not.
01:28:53 And you don’t want to be too early.
01:28:55 You obviously don’t want to be too late.
01:28:57 So it’s really interesting, this strategic component
01:29:00 of research, which I think as a grad student,
01:29:03 I just had no idea.
01:29:05 I just read papers and discussed ideas.
01:29:07 And I think this has been maybe the major change.
01:29:09 And I recommend people kind of feed forward
01:29:13 to what success looks like and try to backtrack,
01:29:16 other than just kind of looking, oh, this looks cool.
01:29:18 This looks cool.
01:29:19 And then you do a bit of random work,
01:29:21 which sometimes you stumble upon some interesting things.
01:29:23 But in general, it’s also good to plan a bit.
01:29:27 Yeah, I like it.
01:29:28 Especially like your approach of taking a really hard problem,
01:29:31 stepping right in, and then being
01:29:33 super skeptical about being able to solve the problem.
01:29:37 I mean, there’s a balance of both, right?
01:29:40 There’s a silly optimism and a critical sort of skepticism
01:29:46 that’s good to balance, which is why
01:29:49 it’s good to have a team of people that balance that.
01:29:52 You don’t do that on your own.
01:29:53 You have both mentors that have seen,
01:29:56 or you obviously want to chat and discuss
01:29:59 whether it’s the right time.
01:30:00 I mean, Demis came in 2014.
01:30:03 And he said, maybe in a bit we’ll do StarCraft.
01:30:06 And maybe he knew.
01:30:08 And I’m just following his lead, which is great,
01:30:11 because he’s brilliant, right?
01:30:12 So these things are obviously quite important,
01:30:17 that you want to be surrounded by people who are diverse.
01:30:22 They have their knowledge.
01:30:23 It’s also important to, I mean,
01:30:26 I’ve learned a lot from people who actually have an idea
01:30:30 that I might not think is good.
01:30:32 But if I give them the space to try it,
01:30:34 I’ve been proven wrong many, many times as well.
01:30:37 So that’s great.
01:30:38 I think your colleagues are more important than yourself,
01:30:42 I think.
01:30:43 Sure.
01:30:44 Now let’s real quick talk about another impossible problem,
01:30:48 AGI.
01:30:49 Right.
01:30:50 What do you think it takes to build a system that’s
01:30:52 human level intelligence?
01:30:54 We talked a little bit about the Turing test, StarCraft.
01:30:56 All of these have echoes of general intelligence.
01:30:58 But if you think about just something
01:31:01 that you would sit back and say, wow,
01:31:03 this is really something that resembles
01:31:06 human level intelligence.
01:31:07 What do you think it takes to build that?
01:31:09 So I find that AGI oftentimes is maybe not very well defined.
01:31:17 So what I’m then trying to come up with for myself
01:31:20 is what a result would look like that would make you start
01:31:25 to believe that you have agents or neural nets that
01:31:28 no longer overfit to a single task,
01:31:31 but actually learn the skill of learning, so to speak.
01:31:37 And that actually is a field that I
01:31:40 am fascinated by, which is the learning to learn,
01:31:43 or meta learning, which is about no longer learning
01:31:47 about a single domain.
01:31:48 So you can think about the learning algorithm
01:31:51 itself is general.
01:31:52 So the same formula we applied for AlphaStar or StarCraft,
01:31:56 we can now apply to almost any video game,
01:31:59 or you could apply to many other problems and domains.
01:32:03 But the algorithm is what’s generalizing.
01:32:07 But the neural network, those weights
01:32:09 are useless even to play another race.
01:32:12 I train a network to play very well at Protoss versus Protoss.
01:32:15 I need to throw away those weights.
01:32:17 If I want to play now Terran versus Terran,
01:32:20 I would need to retrain a network from scratch
01:32:23 with the same algorithm.
01:32:24 That’s beautiful.
01:32:26 But the network itself will not be useful.
01:32:28 So I think if I see an approach that
01:32:32 can absorb or start solving new problems without the need
01:32:38 to kind of restart the process, I
01:32:40 think that, to me, would be a nice way
01:32:42 to define some form of AGI.
01:32:45 Again, I don’t know about the grandiose, like, AGI.
01:32:48 I mean, should Turing tests be solved before AGI?
01:32:50 I mean, I don’t know.
01:32:51 I think concretely, I would like to see clearly
01:32:54 that meta learning happen, meaning
01:32:57 that there is an architecture or a network that,
01:33:01 as it sees a new problem or new data, solves it.
01:33:04 And to make it kind of a benchmark,
01:33:08 it should solve it at the same speed
01:33:09 that we do solve new problems.
01:33:11 When I define you a new object and you
01:33:13 have to recognize it, when you start playing a new game,
01:33:16 you played all the Atari games.
01:33:17 But now you play a new Atari game.
01:33:19 Well, you’re going to be pretty quickly pretty good
01:33:22 at the game.
01:33:22 So perhaps what the domain is
01:33:25 and what the exact benchmark is, is a bit difficult.
01:33:28 I think as a community, we might need
01:33:29 to do some work to define it.
01:33:32 But I think this first step, I could
01:33:34 see it happen relatively soon.
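One existing shape of such a benchmark is few-shot classification: a fixed adaptation procedure that solves brand-new tasks from a handful of examples, without restarting training from scratch. A toy sketch using nearest-centroid classification (the core idea behind prototypical networks); every task and data point below is synthetic:

```python
import random

# Toy sketch of a fixed "learning algorithm" that adapts to brand-new
# tasks from a few examples, in the spirit of meta-learning: nearest-
# centroid classification. Task definitions and data are all made up.

random.seed(1)

def make_task(class_means, shots=5, queries=20, noise=0.5):
    """Sample a few-shot task: labelled support examples plus query points."""
    support, query = [], []
    for label, mu in enumerate(class_means):
        for _ in range(shots):
            support.append(([m + random.gauss(0, noise) for m in mu], label))
        for _ in range(queries):
            query.append(([m + random.gauss(0, noise) for m in mu], label))
    return support, query

def adapt_and_predict(support, xs):
    """The reusable procedure: build one centroid per class from the
    support set, then label each query point by its nearest centroid."""
    groups = {}
    for x, y in support:
        groups.setdefault(y, []).append(x)
    centroids = {y: [sum(c) / len(c) for c in zip(*pts)]
                 for y, pts in groups.items()}
    d2 = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(centroids, key=lambda y: d2(x, centroids[y])) for x in xs]

# Two unrelated "new tasks"; the same procedure handles both, no retraining.
accs = []
for means in ([[0, 0], [4, 4]], [[-3, 5], [5, -1], [0, 9]]):
    support, query = make_task(means)
    preds = adapt_and_predict(support, [x for x, _ in query])
    accs.append(sum(p == y for p, (_, y) in zip(preds, query)) / len(query))
```

The point of the sketch is the contrast drawn above: here it is the procedure that generalizes across tasks, not a single set of trained weights.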
01:33:36 But then the whole what AGI means and so on,
01:33:40 I am a bit more confused about what
01:33:43 I think people mean different things.
01:33:44 There’s an emotional, psychological level
01:33:48 that like even the Turing test, passing the Turing test
01:33:53 is something that we just pass judgment on as human beings
01:33:55 what it means to be a good AGI system.
01:34:03 Yeah.
01:34:04 What level, what does it mean, what does it mean?
01:34:07 But I like the generalization.
01:34:08 And maybe as a community, we converge
01:34:10 towards a group of domains that are sufficiently far away.
01:34:14 That would be really damn impressive
01:34:16 if it was able to generalize.
01:34:18 So perhaps not as close as Protoss and Zerg,
01:34:21 but like Wikipedia.
01:34:22 That would be a step.
01:34:23 Yeah, that would be a good step and then a really good step.
01:34:26 But then like from StarCraft to Wikipedia and back.
01:34:30 Yeah, that kind of thing.
01:34:31 And that feels also quite hard and far.
01:34:34 But I think as long as you put the benchmark out,
01:34:38 as we discovered, for instance, with ImageNet,
01:34:41 then tremendous progress can be had.
01:34:43 So I think maybe there’s a lack of benchmark,
01:34:46 but I’m sure we’ll find one and the community will then
01:34:49 work towards that.
01:34:52 And then beyond what AGI might mean or would imply,
01:34:56 I really am hopeful to see basically machine learning
01:35:01 or AI just scaling up and helping people
01:35:05 that might not have the resources to hire an assistant
01:35:08 or that might not even know what the weather is like.
01:35:13 So I think in terms of the positive impact of AI,
01:35:18 I think that’s maybe what we should also not lose focus on.
01:35:22 The research community building AGI,
01:35:23 I mean, that’s a real nice goal.
01:35:25 But I think the way that DeepMind puts it is,
01:35:28 solve intelligence, and then use it to solve everything else.
01:35:30 So I think we should parallelize.
01:35:33 Yeah, we shouldn’t forget about all the positive things
01:35:36 that are actually coming out of AI already
01:35:38 and are going to be coming out.
01:35:40 Right.
01:35:41 But on that note, let me ask relative
01:35:45 to popular perception, do you have
01:35:47 any worry about the existential threat
01:35:49 of artificial intelligence in the near or far future
01:35:53 that some people have?
01:35:55 I think in the near future, I’m skeptical.
01:35:58 So I hope I’m not wrong.
01:35:59 But I’m not concerned, but I appreciate efforts,
01:36:04 ongoing efforts, and even like whole research
01:36:07 field on AI safety emerging and in conferences and so on.
01:36:10 I think that’s great.
01:36:12 In the long term, I really hope we just
01:36:16 can simply have the benefits outweigh
01:36:19 the potential dangers.
01:36:20 I am hopeful for that.
01:36:23 But also, we must remain vigilant to monitor and assess
01:36:27 whether the tradeoffs are there, and that we also have enough
01:36:32 lead time to prevent or to redirect our efforts if need be.
01:36:37 But I’m quite optimistic about the technology
01:36:41 and definitely more fearful of other threats
01:36:45 in terms of planetary level at this point.
01:36:48 But obviously, that’s the one I have more power on.
01:36:52 So clearly, I do start thinking more and more about this.
01:36:56 And it’s grown in me actually to start reading more
01:37:00 about AI safety, which is a field that so far I have not
01:37:04 really contributed to.
01:37:05 But maybe there’s something to be done there as well.
01:37:07 I think it’s really important.
01:37:09 I talk about this with a few folks.
01:37:11 But it’s important to ask you and shove it in your head
01:37:14 because you’re at the leading edge of actually what
01:37:18 people are excited about in AI.
01:37:19 The work with AlphaStar, it’s arguably
01:37:22 at the very cutting edge of the kind of thing
01:37:25 that people are afraid of.
01:37:27 And so you speaking to that fact, and that we’re actually
01:37:31 quite far away from the kind of thing
01:37:33 that people might be afraid of.
01:37:35 But it’s still worthwhile to think about.
01:37:38 And it’s also good that you’re not as worried
01:37:43 and you’re also open to thinking about it.
01:37:45 There’s two aspects.
01:37:46 I mean, me not being worried.
01:37:47 But obviously, we should prepare for things
01:37:53 that could go wrong, misuse of the technologies
01:37:56 as with any technologies.
01:37:58 So I think there’s always trade offs.
01:38:02 And as a society, we’ve kind of solved this to some extent
01:38:06 in the past.
01:38:07 So I’m hoping that by having the researchers
01:38:10 and the whole community brainstorm and come up
01:38:14 with interesting solutions to the new things that
01:38:16 will happen in the future, that we can still also push
01:38:20 the research to the avenue that I think
01:38:23 is kind of the greatest avenue, which is
01:38:25 to understand intelligence.
01:38:27 How are we doing what we’re doing?
01:38:29 And obviously, from a scientific standpoint,
01:38:32 that is kind of my personal drive of all the time
01:38:37 that I spend doing what I’m doing, really.
01:38:40 Where do you see the deep learning as a field heading?
01:38:42 Where do you think the next big breakthrough might be?
01:38:46 So I think deep learning, I discussed a little of this
01:38:49 before.
01:38:50 Deep learning has to be combined with some form
01:38:54 of discretization, program synthesis.
01:38:56 I think that’s kind of as a research in itself
01:38:59 is an interesting topic to expand and start
01:39:02 doing more research.
01:39:04 And then as kind of what will deep learning
01:39:07 enable to do in the future?
01:39:08 I don’t think that’s going to happen this year.
01:39:11 But also this idea of starting not to throw away all the weights,
01:39:16 that this idea of learning to learn
01:39:18 and really having these agents not having
01:39:23 to restart their weights.
01:39:24 And you can have an agent that is kind of solving or classifying
01:39:29 images on ImageNet, but also generating speech
01:39:32 if you ask it to generate some speech.
01:39:34 And it should really be kind of almost the same network,
01:39:39 but it might not be a neural network.
01:39:41 It might be a neural network with an optimization
01:39:44 algorithm attached to it.
01:39:45 But I think this idea of generalization to new tasks
01:39:49 is something that we first must define good benchmarks.
01:39:52 But then I think that’s going to be exciting.
01:39:54 And I’m not sure how close we are.
01:39:56 But I think if you have a very limited domain,
01:40:00 I think we can start doing some progress.
01:40:02 And much like how we made a lot of progress in computer vision,
01:40:07 we should start thinking.
01:40:09 I really like a talk that Léon Bottou gave at ICML
01:40:12 a few years ago, which is this train test paradigm should
01:40:16 be broken.
01:40:17 We should stop thinking about a training set and a test set.
01:40:23 And these are closed things that are untouchable.
01:40:26 I think we should go beyond these.
01:40:28 And in meta learning, we call these the meta training
01:40:30 set and the meta test set, which is really thinking about,
01:40:35 if I know about ImageNet, why would that network not
01:40:39 work on MNIST, which is a much simpler problem?
01:40:41 But right now, it really doesn’t.
01:40:44 But it just feels wrong.
01:40:46 So I think that’s kind of the, on the application
01:40:50 or the benchmark sites, we probably
01:40:52 will see quite a few more interest and progress
01:40:56 and hopefully people defining new and exciting challenges
01:41:00 really.
01:41:00 Do you have any hope or interest in knowledge graphs
01:41:04 within this context?
01:41:05 So this kind of constructing graph.
01:41:08 So going back to graphs.
01:41:10 Well, neural networks and graphs.
01:41:12 But I mean, a different kind of knowledge graph,
01:41:14 sort of like semantic graphs or those concepts.
01:41:18 Yeah.
01:41:18 So I think the idea of graphs is,
01:41:23 so I’ve been quite interested in sequences first and then
01:41:26 more interesting or different data structures like graphs.
01:41:29 And I’ve studied graph neural networks in the last three
01:41:33 years or so.
01:41:34 I found these models just very interesting
01:41:37 from a deep learning standpoint.
01:41:42 But then why do we want these models
01:41:45 and why would we use them?
01:41:47 What’s the application?
01:41:48 What’s kind of the killer application of graphs?
01:41:51 And perhaps if we could extract a knowledge graph
01:41:58 from Wikipedia automatically, that
01:42:01 would be interesting because then these graphs have
01:42:04 this very interesting structure that also is a bit more
01:42:07 compatible with this idea of programs and deep learning
01:42:11 kind of working together, jumping neighborhoods
01:42:14 and so on.
01:42:14 You could imagine defining some primitives
01:42:17 to go around graphs, right?
01:42:18 So I think I really like the idea of a knowledge graph.
01:42:23 And in fact, when we started or as part of the research
01:42:29 we did for StarCraft, I thought, wouldn’t it
01:42:31 be cool to give it the graph of all these buildings that
01:42:38 depend on each other, and units that have prerequisites
01:42:41 to be built.
01:42:42 And so this is information that the network
01:42:45 can learn and extract.
01:42:46 But it would have been great to see
01:42:50 or to think of really StarCraft as a giant graph that even
01:42:53 also as the game evolves, you start taking branches
01:42:57 and so on.
01:42:57 And we did a bit of research on these,
01:42:59 nothing too relevant, but I really like the idea.
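The tech-tree idea can be sketched as one round of message passing, the basic operation of the graph neural networks mentioned here. This is only a sketch: the prerequisite edges are a hypothetical fragment of a Protoss build tree, node features are toy scalars, and the aggregation is a fixed mean rather than a learned function:

```python
# One round of message passing over a made-up prerequisite graph.
# In a real graph neural network the per-node update would be a learned
# transformation; here it is just a mean, to show the information flow.

parents = {  # hypothetical build prerequisites: unit/building -> requirements
    "gateway": ["nexus"],
    "cybernetics_core": ["gateway"],
    "stalker": ["gateway", "cybernetics_core"],
}
features = {"nexus": 1.0, "gateway": 2.0, "cybernetics_core": 3.0, "stalker": 4.0}

def message_pass(feats, parents):
    """Each node averages its own feature with its prerequisites' features."""
    out = {}
    for node, f in feats.items():
        incoming = [feats[p] for p in parents.get(node, [])]
        out[node] = (f + sum(incoming)) / (1 + len(incoming))
    return out

h = message_pass(features, parents)
# 'stalker' has now absorbed information from both of its prerequisites;
# stacking more rounds propagates information along longer dependency chains.
```

Primitives like this, composed over a knowledge graph, are one way the "jumping neighborhoods" mentioned above could be realized.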
01:43:04 And it has elements that are something
01:43:06 you also worked with in terms of visualizing your networks.
01:43:08 It has elements of having human interpretable,
01:43:13 being able to generate knowledge representations that
01:43:15 are human interpretable that maybe human experts can then
01:43:18 tweak or at least understand.
01:43:20 So there’s a lot of interesting aspect there.
01:43:22 And for me personally, I’m just a huge fan of Wikipedia.
01:43:25 And it’s a shame that our neural networks aren’t
01:43:29 taking advantage of all the structured knowledge that’s
01:43:31 on the web.
01:43:32 What’s next for you?
01:43:34 What’s next for DeepMind?
01:43:36 What are you excited about for AlphaStar?
01:43:39 Yeah, so I think the obvious next steps
01:43:43 would be to apply AlphaStar to other races.
01:43:48 I mean, that sort of shows that the algorithm works
01:43:51 because we wouldn’t want to have created by mistake something
01:43:56 in the architecture that happens to work for Protoss
01:43:58 but not for other races.
01:44:00 So as verification, I think that’s an obvious next step
01:44:03 that we are working on.
01:44:05 And then I would like to see so agents and players can
01:44:11 specialize on different skill sets that
01:44:13 allow them to be very good.
01:44:15 I think we’ve seen AlphaStar understanding very well
01:44:19 when to take battles and when not to.
01:44:22 Also very good at micromanagement
01:44:24 and moving the units around and so on.
01:44:27 And also very good at producing nonstop and trading off
01:44:30 economy with building units.
01:44:33 But I have not perhaps seen as much
01:44:36 as I would like this idea of the poker idea
01:44:39 that you mentioned, right?
01:44:40 I’m not sure StarCraft or AlphaStar
01:44:42 rather has developed a very deep understanding of what
01:44:47 the opponent is doing and reacting to that
01:44:50 and sort of trying to trick the player
01:44:54 into doing something else.
01:44:55 So this kind of reasoning, I would like to see more.
01:44:58 So I think purely from a research standpoint,
01:45:01 there’s perhaps also quite a few things
01:45:03 to be done there in the domain of StarCraft.
01:45:06 Yeah, in the domain of games, I’ve
01:45:08 seen some interesting work in even auctions,
01:45:11 manipulating other players, sort of forming a belief state
01:45:15 and just messing with people.
01:45:17 Yeah, it’s called theory of mind, I guess.
01:45:18 Theory of mind, yeah.
01:45:20 So it’s a fascinating.
01:45:21 Theory of mind and StarCraft are kind of
01:45:24 really made for each other.
01:45:26 So that would be very exciting to see those techniques apply
01:45:29 to StarCraft or perhaps StarCraft
01:45:32 driving new techniques, right?
01:45:33 As I said, this is always the tension between the two.
01:45:36 Well, Oriol, thank you so much for talking today.
01:45:38 Awesome.
01:45:39 It was great to be here.
01:45:40 Thanks.