Transcript
00:00:00 The following is a conversation with Tuomas Sandholm.
00:00:03 He’s a professor at CMU and co-creator of Libratus,
00:00:06 which is the first AI system to beat top human players
00:00:09 in the game of Heads Up No Limit Texas Holdem.
00:00:13 He has published over 450 papers
00:00:15 on game theory and machine learning,
00:00:17 including a best paper in 2017 at NIPS,
00:00:21 now renamed to NeurIPS,
00:00:23 which is where I caught up with him for this conversation.
00:00:27 His research and companies have had wide reaching impact
00:00:30 in the real world,
00:00:32 especially because he and his group
00:00:34 not only propose new ideas,
00:00:36 but also build systems to prove that these ideas work
00:00:40 in the real world.
00:00:42 This conversation is part of the MIT course
00:00:44 on artificial general intelligence
00:00:46 and the artificial intelligence podcast.
00:00:49 If you enjoy it, subscribe on YouTube, iTunes,
00:00:52 or simply connect with me on Twitter
00:00:54 at Lex Fridman, spelled F R I D.
00:00:58 And now here’s my conversation with Tuomas Sandholm.
00:01:03 Can you describe at the high level
00:01:06 the game of poker, Texas Holdem, Heads Up Texas Holdem
00:01:09 for people who might not be familiar with this card game?
00:01:13 Yeah, happy to.
00:01:14 So Heads Up No Limit Texas Holdem
00:01:16 has really emerged in the AI community
00:01:18 as a main benchmark for testing these
00:01:21 application independent algorithms
00:01:23 for imperfect information game solving.
00:01:26 And this is a game that’s actually played by humans.
00:01:30 You don’t see that much on TV or casinos
00:01:33 because well, for various reasons,
00:01:36 but you do see it in some expert level casinos
00:01:40 and you see it in the best poker movies of all time.
00:01:43 It’s actually an event in the World Series of Poker,
00:01:45 but mostly it’s played online
00:01:48 and typically for pretty big sums of money.
00:01:50 And this is a game that usually only experts play.
00:01:54 So if you go to your home game on a Friday night,
00:01:58 it probably is not gonna be Heads Up No Limit Texas Holdem.
00:02:01 It might be No Limit Texas Holdem in some cases,
00:02:04 but typically for a big group and it’s not as competitive.
00:02:08 Whereas Heads Up means it’s two players.
00:02:10 So it’s really like me against you.
00:02:13 Am I better or are you better?
00:02:14 Much like chess or go in that sense,
00:02:17 but an imperfect information game,
00:02:19 which makes it much harder because I have to deal
00:02:21 with issues of you knowing things that I don’t know
00:02:25 and I know things that you don’t know
00:02:27 instead of pieces being nicely laid on the board
00:02:29 for both of us to see.
00:02:31 So in Texas Holdem, there’s two cards
00:02:34 that you only see that belong to you.
00:02:37 Yeah. And then
00:02:38 they gradually lay out some cards
00:02:40 that add up overall to five cards that everybody can see.
00:02:44 Yeah. So the imperfect nature
00:02:45 of the information is the two cards
00:02:47 that you’re holding in your hand.
00:02:48 Up front, yeah.
00:02:49 So as you said, you first get two cards
00:02:51 in private each and then there’s a betting round.
00:02:55 Then you get three cards in public on the table.
00:02:58 Then there’s a betting round.
00:02:59 Then you get the fourth card in public on the table.
00:03:01 There’s a betting round.
00:03:02 Then you get the fifth card on the table.
00:03:04 There’s a betting round.
00:03:05 So there’s a total of four betting rounds
00:03:07 and four tranches of information revelation if you will.
00:03:11 Only the first tranche is private
00:03:14 and then it’s public from there.
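For readers who want the structure pinned down, here is a minimal sketch in Python of the deal-and-bet sequence just described (betting logic and hand evaluation omitted; this is an illustration, not anyone’s engine):

```python
# Minimal sketch of the Heads Up No Limit Texas Holdem structure:
# four betting rounds, one private tranche of cards, then public cards.
import random

def betting_round(name):
    pass  # bet / call / raise / fold logic omitted in this sketch

def play_hand():
    deck = list(range(52))
    random.shuffle(deck)
    private = {"p1": [deck.pop(), deck.pop()],   # two hole cards each,
               "p2": [deck.pop(), deck.pop()]}   # seen only by that player
    public = []
    betting_round("preflop")                     # betting round 1
    public += [deck.pop() for _ in range(3)]     # flop: three public cards
    betting_round("flop")                        # betting round 2
    public.append(deck.pop())                    # turn: fourth public card
    betting_round("turn")                        # betting round 3
    public.append(deck.pop())                    # river: fifth public card
    betting_round("river")                       # betting round 4
    return private, public

play_hand()
```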
00:03:16 And this is probably by far the most popular game in AI
00:03:24 and just the general public
00:03:26 in terms of imperfect information.
00:03:28 So that’s probably the most popular spectator game
00:03:32 to watch, right?
00:03:33 So, which is why it’s a super exciting game to tackle.
00:03:37 So it’s on the order of chess, I would say,
00:03:40 in terms of popularity, in terms of AI setting it
00:03:43 as the bar of what is intelligence.
00:03:46 So in 2017, Libratus, how do you pronounce it?
00:03:50 Libratus.
00:03:51 Libratus.
00:03:52 Libratus beats.
00:03:52 A little Latin there.
00:03:54 A little bit of Latin.
00:03:55 Libratus beat four expert human players.
00:04:01 Can you describe that event?
00:04:03 What you learned from it?
00:04:04 What was it like?
00:04:04 What was the process in general
00:04:06 for people who have not read the papers and the study?
00:04:09 Yeah, so the event was that we invited
00:04:12 four of the top 10 players,
00:04:14 who are specialist players in Heads Up No Limit
00:04:17 Texas Holdem, which is very important
00:04:19 because this game is actually quite different
00:04:21 than the multiplayer version.
00:04:23 We brought them in to Pittsburgh
00:04:25 to play at the Rivers Casino for 20 days.
00:04:28 We wanted to get 120,000 hands in
00:04:31 because we wanted to get statistical significance.
00:04:36 So it’s a lot of hands for humans to play,
00:04:39 even for these top pros who play fairly quickly normally.
00:04:42 So we couldn’t just have one of them play so many hands.
00:04:46 20 days, they were playing basically morning to evening.
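As a rough illustration of why the hand count has to be so large (the numbers below are assumptions for illustration, not figures from the match): the standard error of a measured win rate shrinks only as one over the square root of the number of hands.

```python
# Back-of-the-envelope: standard error of a measured win rate after n
# hands. The per-hand standard deviation (20 big blinds, assuming some
# variance reduction) and the edge (15 bb per 100 hands) are made-up
# illustrative numbers, not figures from the Libratus match.
import math

std_per_hand = 20.0   # big blinds; assumed
edge_per_100 = 15.0   # big blinds per 100 hands; assumed

for hands in (1_000, 10_000, 120_000):
    se_per_100 = std_per_hand / math.sqrt(hands) * 100
    print(f"{hands:>7} hands: std error {se_per_100:5.1f} bb/100, "
          f"z = {edge_per_100 / se_per_100:.1f}")
# Only at ~120,000 hands does the assumed edge clear ~2.5 standard errors.
```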
00:04:50 And I raised $200,000 as a little incentive for them to play.
00:04:55 And the setting was such that they didn’t all get $50,000.
00:05:01 We actually paid them out
00:05:02 based on how they did against the AI each.
00:05:05 So they had an incentive to play as hard as they could,
00:05:09 whether they’re way ahead or way behind
00:05:11 or right at the mark of beating the AI.
00:05:13 And you don’t make any money, unfortunately.
00:05:16 Right, no, we can’t make any money.
00:05:17 So originally, a couple of years earlier,
00:05:20 I actually explored whether we could actually play for money
00:05:24 because that would be, of course, interesting as well,
00:05:28 to play against the top people for money.
00:05:29 But the Pennsylvania Gaming Board said no, so we couldn’t.
00:05:33 So this is much like an exhibit,
00:05:36 like for a musician or a boxer or something like that.
00:05:39 Nevertheless, they were keeping track of the money
00:05:41 and the AI was up close to $2 million, I think.
00:05:48 So if it had been for real money, if you were able to earn money,
00:05:51 that would have been a quite impressive and inspiring achievement.
00:05:55 Just a few details, what were the players looking at?
00:05:59 Were they behind a computer?
00:06:00 What was the interface like?
00:06:02 Yes, they were playing much like they normally do.
00:06:05 These top players, when they play this game,
00:06:07 they play mostly online.
00:06:08 So they’re used to playing through a UI.
00:06:11 And they did the same thing here.
00:06:13 So there was this layout.
00:06:14 You could imagine there’s a table on a screen.
00:06:17 There’s the human sitting there,
00:06:20 and then there’s the AI sitting there.
00:06:21 And the screen shows everything that’s happening.
00:06:24 The cards coming out and shows the bets being made.
00:06:27 And we also had the betting history for the human.
00:06:29 So if the human forgot what had happened in the hand so far,
00:06:33 they could actually reference back and so forth.
00:06:37 Is there a reason they were given access
00:06:39 to the betting history?
00:06:41 Well, we just, it didn’t really matter.
00:06:45 They wouldn’t have forgotten anyway.
00:06:47 These are top quality people.
00:06:48 But we just wanted to put it out there
00:06:51 so it’s not a question of the human forgetting
00:06:53 and the AI somehow trying to get advantage
00:06:55 of better memory.
00:06:56 So what was that like?
00:06:57 I mean, that was an incredible accomplishment.
00:06:59 So what did it feel like before the event?
00:07:02 Did you have doubt, hope?
00:07:05 Where was your confidence at?
00:07:08 Yeah, that’s great.
00:07:09 So great question.
00:07:10 So 18 months earlier, I had organized a similar brains
00:07:14 versus AI competition with a previous AI called Claudico
00:07:17 and we couldn’t beat the humans.
00:07:20 So this time around, it was only 18 months later.
00:07:23 And I knew that this new AI, Libratus, was way stronger,
00:07:27 but it’s hard to say how you’ll do against the top humans
00:07:31 before you try.
00:07:32 So I thought we had about a 50-50 shot.
00:07:35 And the international betting sites put us
00:07:38 as a four to one or five to one underdog.
00:07:41 So it’s kind of interesting that people really believe
00:07:44 in people over AI.
00:07:48 People don’t just overbelieve in themselves,
00:07:50 but they have overconfidence in other people as well
00:07:53 compared to the performance of AI.
00:07:55 And yeah, so we were a four to one or five to one underdog.
00:07:59 And even after three days of beating the humans in a row,
00:08:02 we were still 50-50 on the international betting sites.
00:08:06 Do you think there’s something special and magical
00:08:09 about poker and the way people think about it,
00:08:12 in the sense you have,
00:08:14 I mean, even in chess, there’s no Hollywood movies.
00:08:17 Poker is the star of many movies.
00:08:21 And there’s this feeling that certain human facial
00:08:26 expressions and body language, eye movement,
00:08:30 all these tells are critical to poker.
00:08:33 Like you can look into somebody’s soul
00:08:35 and understand their betting strategy and so on.
00:08:37 So possibly,
00:08:41 do you think that is why people have a confidence
00:08:43 that humans will outperform?
00:08:45 Because AI systems cannot, in this construct,
00:08:48 perceive these kinds of tells.
00:08:51 They’re only looking at betting patterns
00:08:53 and nothing else, betting patterns and statistics.
00:08:58 So what’s more important to you
00:09:02 if you step back on human players, human versus human?
00:09:06 What’s the role of these tells,
00:09:08 of these ideas that we romanticize?
00:09:11 Yeah, so I’ll split it into two parts.
00:09:15 So one is why do humans trust humans more than AI
00:09:20 and have overconfidence in humans?
00:09:22 I think that’s not really related to the tell question.
00:09:25 It’s just that they’ve seen these top players,
00:09:28 how good they are, and they’re really fantastic.
00:09:31 So it’s just hard to believe that an AI could beat them.
00:09:36 So I think that’s where that comes from.
00:09:37 And that’s actually maybe a more general lesson about AI.
00:09:40 That until you’ve seen it overperform a human,
00:09:43 it’s hard to believe that it could.
00:09:45 But then the tells, a lot of these top players,
00:09:50 they’re so good at hiding tells
00:09:52 that among the top players,
00:09:56 it’s actually not really worth it
00:09:59 for them to invest a lot of effort
00:10:01 trying to find tells in each other
00:10:03 because they’re so good at hiding them.
00:10:05 So yes, at the kind of Friday evening game,
00:10:09 tells are gonna be a huge thing.
00:10:11 You can read other people.
00:10:13 And if you’re a good reader,
00:10:14 you’ll read them like an open book.
00:10:16 But at the top levels of poker now,
00:10:18 the tells become a much smaller and smaller aspect
00:10:21 of the game as you go to the top levels.
00:10:24 The amount of strategies, the amount of possible actions
00:10:28 is very large, 10 to the power of 100 plus.
00:10:35 So, I’ve read a few of the related papers,
00:10:37 it has to form some abstractions
00:10:42 of various hands and actions.
00:10:44 So what kind of abstractions are effective
00:10:47 for the game of poker?
00:10:49 Yeah, so you’re exactly right.
00:10:50 So when you go from a game tree that’s 10 to the 161,
00:10:55 especially in an imperfect information game,
00:10:58 it’s way too large to solve directly,
00:11:00 even with our fastest equilibrium finding algorithms.
00:11:03 So you wanna abstract it first.
00:11:07 And abstraction in games is much trickier
00:11:10 than abstraction in MDPs or other single agent settings.
00:11:15 Because you have these abstraction pathologies
00:11:17 that if I have a finer grained abstraction,
00:11:19 the strategy that I can get from that for the real game
00:11:23 might actually be worse than the strategy
00:11:25 I can get from the coarse grained abstraction.
00:11:27 So you have to be very careful.
00:11:28 Now the kinds of abstractions, just to zoom out,
00:11:31 we’re talking about, there’s the hands abstractions
00:11:34 and then there’s betting strategies.
00:11:37 Yeah, betting actions, yeah.
00:11:38 Betting actions.
00:11:39 So there’s information abstraction,
00:11:41 to talk about general games, information abstraction,
00:11:44 which is the abstraction of what chance does.
00:11:47 And this would be the cards in the case of poker.
00:11:50 And then there’s action abstraction,
00:11:52 which is abstracting the actions of the actual players,
00:11:57 which would be bets in the case of poker.
00:11:59 Yourself and the other players?
00:12:01 Yes, yourself and other players.
00:12:03 And for information abstraction,
00:12:08 we were completely automated.
00:12:11 So these are algorithms,
00:12:13 but they do what we call potential aware abstraction,
00:12:16 where we don’t just look at the value of the hand,
00:12:19 but also how it might materialize
00:12:20 into good or bad hands over time.
00:12:22 And it’s a certain kind of bottom-up process
00:12:25 with integer programming and clustering
00:12:27 and various aspects of how you build this abstraction.
00:12:31 And then in the action abstraction,
00:12:34 there it’s largely based on how humans and other AIs
00:12:40 have played this game in the past.
00:12:42 But in the beginning,
00:12:43 we actually used an automated action abstraction technology,
00:12:47 which is provably convergent
00:12:51 that it finds the optimal combination of bet sizes,
00:12:54 but it’s not very scalable.
00:12:55 So we couldn’t use it for the whole game,
00:12:57 but we use it for the first couple of betting actions.
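To make “potential-aware” concrete, here is a toy sketch (my construction, not the actual Libratus pipeline): two hands can have nearly the same average equity right now but very different futures, and bucketing on the distribution over future equities, rather than on the mean, keeps them apart. The real system used clustering with more refined distances in a bottom-up pass over the rounds, as described above.

```python
# Toy illustration of potential-aware information abstraction: compare
# hands by their *distribution* of equities after the next public card,
# not just their current average equity. All numbers are made up.
import numpy as np

bins = np.linspace(0.05, 0.95, 10)  # future-equity bin centers

# Histograms over equity after the next card (hypothetical hands):
draw        = np.array([5, 4, 0, 0, 0, 0, 0, 2, 4, 5]) / 20  # bimodal
medium_made = np.array([0, 0, 0, 2, 8, 8, 2, 0, 0, 0]) / 20  # stays medium

print("mean equities:", draw @ bins, medium_made @ bins)  # nearly equal
print("L1 distance between distributions:",
      np.abs(draw - medium_made).sum())   # large -> different buckets
```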
00:12:59 So what’s more important, the strength of the hand,
00:13:03 so the information abstraction, or how you play them,
00:13:09 the actions? You know,
00:13:11 the romanticized notion, again,
00:13:13 is that it doesn’t matter what hands you have,
00:13:15 that the actions, the betting, may be the way you win
00:13:19 no matter what hands you have.
00:13:20 Yeah, so that’s why you have to play a lot of hands
00:13:23 so that the role of luck gets smaller.
00:13:26 So you could otherwise get lucky and get some good hands
00:13:29 and then you’re gonna win the match.
00:13:31 Even with thousands of hands, you can get lucky
00:13:35 because there’s so much variance
00:13:36 in No Limit Texas Holdem because if we both go all in,
00:13:40 there’s a huge amount of variance, so there are these
00:13:43 massive swings in No Limit Texas Holdem.
00:13:47 So that’s why you have to play not just thousands,
00:13:50 but over 100,000 hands to get statistical significance.
00:13:55 So let me ask this question another way.
00:13:57 If you didn’t even look at your hands,
00:14:02 but they didn’t know that, the opponents didn’t know that,
00:14:04 how well would you be able to do?
00:14:06 Oh, that’s a good question.
00:14:07 There’s actually, I heard this story
00:14:09 that there’s this Norwegian female poker player
00:14:11 called Annette Obrestad who’s actually won a tournament
00:14:15 by doing exactly that, but that would be extremely rare.
00:14:18 So you cannot really play well that way.
00:14:23 Okay, so the hands do have some role to play, okay.
00:14:27 So Libratus, as far as I understand,
00:14:33 does not use learning methods, deep learning.
00:14:35 Is there room for learning?
00:14:40 There’s no reason why Libratus couldn’t combine
00:14:44 with an AlphaGo type approach for estimating
00:14:46 quality with a function estimator.
00:14:49 What are your thoughts on this,
00:14:52 maybe as compared to another algorithm
00:14:54 which I’m not that familiar with, DeepStack,
00:14:56 the engine that does use deep learning,
00:14:59 that it’s unclear how well it does,
00:15:01 but nevertheless uses deep learning.
00:15:03 So what are your thoughts about learning methods
00:15:05 to aid in the way that Libratus plays in the game of poker?
00:15:09 Yeah, so as you said,
00:15:10 Libratus did not use learning methods
00:15:13 and played very well without them.
00:15:15 Since then, actually here at NeurIPS,
00:15:17 we have a couple of papers on things
00:15:20 that do use learning techniques.
00:15:22 Excellent.
00:15:24 And deep learning in particular.
00:15:26 And sort of the way you’re talking about
00:15:29 where it’s learning an evaluation function,
00:15:33 but in imperfect information games,
00:15:37 unlike let’s say in Go or now also in chess and shogi,
00:15:42 it’s not sufficient to learn an evaluation for a state
00:15:47 because the value of an information set
00:15:52 depends not only on the exact state,
00:15:55 but it also depends on both players beliefs.
00:15:59 Like if I have a bad hand,
00:16:01 I’m much better off if the opponent thinks I have a good hand
00:16:04 and vice versa.
00:16:05 If I have a good hand,
00:16:06 I’m much better off if the opponent believes
00:16:09 I have a bad hand.
00:16:11 So the value of a state is not just a function of the cards.
00:16:15 It depends on, if you will, the path of play,
00:16:19 but only to the extent that it’s captured
00:16:22 in the belief distributions.
00:16:23 So that’s why it’s not as simple
00:16:26 as it is in perfect information games.
00:16:29 And I don’t wanna say it’s simple there either.
00:16:31 It’s of course very complicated computationally there too,
00:16:34 but at least conceptually, it’s very straightforward.
00:16:36 There’s a state, there’s an evaluation function.
00:16:38 You can try to learn it.
00:16:39 Here, you have to do something more.
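In symbols (a schematic, not notation from any particular paper): in a perfect-information game one can learn a value per state, whereas here the leaf evaluation needs the public state together with both players’ belief distributions over private hands.

```latex
% Perfect information: a value per state suffices.
V : \mathcal{S} \to \mathbb{R}
% Imperfect information: the value also depends on both players'
% beliefs over private hands, as induced by the path of play.
V : \mathcal{S}_{\mathrm{public}} \times \Delta(\mathcal{H}_1) \times \Delta(\mathcal{H}_2) \to \mathbb{R}
```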
00:16:43 And what we do is in one of these papers,
00:16:47 we’re looking at where we allow the opponent
00:16:50 to actually take different strategies
00:16:53 at the leaf of the search tree, if you will.
00:16:56 And that is a different way of doing it.
00:16:59 And it doesn’t assume therefore a particular way
00:17:02 that the opponent plays,
00:17:04 but it allows the opponent to choose
00:17:05 from a set of different continuation strategies.
00:17:09 And that forces us to not be too optimistic
00:17:13 in a look ahead search.
00:17:15 And that’s one way you can do sound look ahead search
00:17:19 in imperfect information games,
00:17:21 which is very difficult.
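A hedged sketch of the idea just described (my simplification, not the paper’s exact algorithm): at a depth-limit leaf, instead of assuming one fixed opponent continuation, let the opponent pick from a set of continuation strategies and credit yourself only with the worst case, so the lookahead cannot be over-optimistic.

```python
# Sketch: worst-case leaf evaluation over a set of opponent
# continuation strategies (names and values below are hypothetical).
def leaf_value(state, continuation_strategies, value_vs):
    # value_vs(state, strategy): our estimated payoff if play continues
    # from `state` with the opponent following `strategy`.
    return min(value_vs(state, s) for s in continuation_strategies)

values = {"aggressive": 3.0, "passive": 5.0, "balanced": 1.5}
print(leaf_value("leaf", values, lambda _, s: values[s]))  # 1.5
```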
00:17:23 And you were asking about DeepStack.
00:17:26 What they did, it was very different than what we do,
00:17:29 either in Libratus or in this new work.
00:17:32 They were randomly generating various situations
00:17:35 in the game.
00:17:36 Then they were doing the look ahead
00:17:38 from there to the end of the game,
00:17:39 as if that was the start of a different game.
00:17:42 And then they were using deep learning
00:17:44 to learn those values of those states,
00:17:47 but the states were not just the physical states.
00:17:50 They include belief distributions.
00:17:52 When you talk about look ahead for DeepStack
00:17:56 or with Libratus, does it mean,
00:17:59 considering every possibility that the game can evolve,
00:18:02 are we talking about extremely,
00:18:04 sort of this exponentially growth of a tree?
00:18:06 Yes, so we’re talking about exactly that.
00:18:11 Much like you do in alpha beta search
00:18:14 or Monte Carlo tree search, but with different techniques.
00:18:17 So there’s a different search algorithm.
00:18:19 And then we have to deal with the leaves differently.
00:18:21 So if you think about what Libratus did,
00:18:24 we didn’t have to worry about this
00:18:25 because we only did it at the end of the game.
00:18:28 So we would always terminate into a real situation
00:18:32 and we would know what the payout is.
00:18:34 It didn’t do these depth limited lookaheads,
00:18:36 but now in this new paper, which is called depth limited,
00:18:40 I think it’s called depth limited search
00:18:42 for imperfect information games,
00:18:43 we can actually do sound depth limited lookahead.
00:18:47 So we can actually start to do the look ahead
00:18:49 from the beginning of the game on,
00:18:51 because that’s too complicated to do
00:18:53 for this whole long game.
00:18:54 So in Libratus, we were just doing it for the end.
00:18:57 So, on the other side, this belief distribution,
00:19:00 is it explicitly modeled what kind of beliefs
00:19:05 the opponent might have?
00:19:07 Yeah, it is explicitly modeled, but it’s not assumed.
00:19:11 The beliefs are actually output, not input.
00:19:15 Of course, the starting beliefs are input,
00:19:18 but they just fall from the rules of the game
00:19:20 because we know that the dealer deals uniformly
00:19:23 from the deck, so I know that every pair of cards
00:19:27 that you might have is equally likely.
00:19:30 I know that for a fact, that just follows
00:19:32 from the rules of the game.
00:19:33 Of course, except the two cards that I have,
00:19:35 I know you don’t have those.
00:19:36 Yeah.
00:19:37 You have to take that into account.
00:19:38 That’s called card removal and that’s very important.
00:19:40 Is the dealing always coming from a single deck
00:19:43 in Heads Up, so you can assume.
00:19:45 Single deck, so you know that if I have the ace of spades,
00:19:50 I know you don’t have an ace of spades.
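The uniform prior with card removal is simple combinatorics; a quick check:

```python
# Card removal: before any betting, the opponent's hole cards are
# uniform over the 50 cards I cannot see.
from math import comb

print(comb(52, 2))      # 1326 two-card hands from a full deck
print(comb(50, 2))      # 1225 once my own two cards are excluded
print(1 / comb(50, 2))  # prior probability of any specific opponent hand
```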
00:19:53 Great, so in the beginning, your belief is basically
00:19:56 the fact that it’s a fair dealing of hands,
00:19:59 but how do you start to adjust that belief?
00:20:02 Well, that’s where this beauty of game theory comes.
00:20:06 So Nash equilibrium, which John Nash introduced in 1950,
00:20:10 introduces what rational play is
00:20:13 when you have more than one player.
00:20:16 And these are pairs of strategies
00:20:18 where strategies are contingency plans,
00:20:20 one for each player.
00:20:22 So that neither player wants to deviate
00:20:25 to a different strategy,
00:20:26 given that the other doesn’t deviate.
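Formally, in standard notation, a pair of strategies (σ₁*, σ₂*) is a Nash equilibrium when neither player gains by deviating unilaterally:

```latex
u_1(\sigma_1^*, \sigma_2^*) \ge u_1(\sigma_1, \sigma_2^*) \quad \forall \sigma_1,
\qquad
u_2(\sigma_1^*, \sigma_2^*) \ge u_2(\sigma_1^*, \sigma_2) \quad \forall \sigma_2 .
```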
00:20:29 But as a side effect, you get the beliefs from Bayes’ rule.
00:20:33 So in these imperfect information games,
00:20:38 Nash equilibrium doesn’t just define strategies.
00:20:41 It also defines beliefs for both of us
00:20:44 and defines beliefs for each state.
00:20:48 So at each state, there’s what’s called an information set.
00:20:53 At each information set in the game,
00:20:55 there’s a set of different states that we might be in,
00:20:59 but I don’t know which one we’re in.
00:21:01 Nash equilibrium tells me exactly
00:21:03 what is the probability distribution
00:21:05 over those real world states in my mind.
00:21:08 How does Nash equilibrium give you that distribution?
00:21:11 So why?
00:21:12 I’ll do a simple example.
00:21:13 So you know the game Rock, Paper, Scissors?
00:21:16 So we can draw it as player one moves first
00:21:20 and then player two moves.
00:21:21 But of course, it’s important that player two
00:21:24 doesn’t know what player one moved,
00:21:26 otherwise player two would win every time.
00:21:28 So we can draw that as an information set
00:21:30 where player one makes one of three moves first,
00:21:33 and then there’s an information set for player two.
00:21:36 So player two doesn’t know which of those nodes
00:21:39 the world is in.
00:21:41 But once we know the strategy for player one,
00:21:44 Nash equilibrium will say that you play one third Rock,
00:21:47 one third Paper, one third Scissors.
00:21:49 From that, I can derive my beliefs on the information set,
00:21:52 that they’re one third, one third, one third.
00:21:54 So Bayes gives you that.
00:21:56 Bayes gives you.
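Written out for the Rock, Paper, Scissors example, Bayes’ rule at player two’s information set gives, under uniform equilibrium play by player one:

```latex
P(\text{Rock} \mid \text{info set reached})
 = \frac{\sigma_1(\text{Rock})}
        {\sigma_1(\text{Rock}) + \sigma_1(\text{Paper}) + \sigma_1(\text{Scissors})}
 = \frac{1/3}{1} = \frac{1}{3}.
```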
00:21:57 But is that specific to a particular player,
00:21:59 or is it something you quickly update
00:22:03 with the specific player?
00:22:05 No, the game theory isn’t really player specific.
00:22:08 So that’s also why we don’t need any data.
00:22:11 We don’t need any history
00:22:12 how these particular humans played in the past
00:22:14 or how any AI or human had played before.
00:22:17 It’s all about rationality.
00:22:20 So the AI just thinks about
00:22:22 what would a rational opponent do?
00:22:24 And what would I do if I am rational?
00:22:28 And that’s the idea of game theory.
00:22:31 So it’s really a data free, opponent free approach.
00:22:35 So it comes from the design of the game
00:22:37 as opposed to the design of the player.
00:22:40 Exactly, there’s no opponent modeling per se.
00:22:43 I mean, we’ve done some work on combining opponent modeling
00:22:45 with game theory so you can exploit weak players even more,
00:22:48 but that’s another strand.
00:22:50 And in Libratus, we didn’t turn that on.
00:22:52 So I decided that these players are too good.
00:22:55 And when you start to exploit an opponent,
00:22:58 you typically open yourself up to exploitation.
00:23:01 And these guys have so few holes to exploit
00:23:04 and they’re world’s leading experts in counter exploitation.
00:23:06 So I decided that we’re not gonna turn that stuff on.
00:23:09 Actually, I saw a few of your papers exploiting opponents.
00:23:12 It sounded very interesting to explore.
00:23:15 Do you think there’s room for exploitation
00:23:17 generally outside of Libratus?
00:23:19 Is there a subject or people differences
00:23:24 that could be exploited, maybe not just in poker,
00:23:27 but in general interactions and negotiations,
00:23:30 all these other domains that you’re considering?
00:23:33 Yeah, definitely.
00:23:34 We’ve done some work on that.
00:23:35 And I really like the hybrid approach there, too.
00:23:39 So you figure out what would a rational opponent do.
00:23:43 And by the way, that’s safe in these zero sum games,
00:23:46 two player zero sum games,
00:23:47 because if the opponent does something irrational,
00:23:49 yes, it might throw off my beliefs,
00:23:53 but the amount that the player can gain
00:23:55 by throwing off my belief is always less
00:23:59 than they lose by playing poorly.
00:24:01 So it’s safe.
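The safety property being described is the maximin guarantee of two-player zero-sum games: a Nash (equivalently, maximin) strategy σ₁* secures the game value v no matter how the opponent plays,

```latex
\min_{\sigma_2} u_1(\sigma_1^*, \sigma_2) \;=\; v \;=\;
\max_{\sigma_1} \min_{\sigma_2} u_1(\sigma_1, \sigma_2),
```

so irrational play by the opponent can only raise, never lower, our expected payoff relative to v.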
00:24:03 But still, if somebody’s weak as a player,
00:24:06 you might wanna play differently to exploit them more.
00:24:10 So you can think about it this way,
00:24:12 a game theoretic strategy is unbeatable,
00:24:15 but it doesn’t maximally beat the other opponent.
00:24:19 So the winnings per hand might be better
00:24:22 with a different strategy.
00:24:24 And the hybrid is that you start
00:24:25 from a game theoretic approach.
00:24:27 And then as you gain data about the opponent
00:24:30 in certain parts of the game tree,
00:24:32 then in those parts of the game tree,
00:24:34 you start to tweak your strategy more and more
00:24:37 towards exploitation while still staying fairly close
00:24:40 to the game theoretic strategy
00:24:42 so as to not open yourself up to exploitation too much.
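A minimal sketch of that hybrid idea (my illustration, with made-up parameters, not the published method): shade from the equilibrium strategy toward a best response as data on the opponent accumulates in a given part of the game tree, with a cap on how far you drift.

```python
# Sketch: blend an (approximate) Nash strategy with an exploitative
# one; the blend weight grows with observations but is capped so we
# never drift too far from the unbeatable baseline. Parameters are
# hypothetical.
def hybrid_action_probs(nash_probs, exploit_probs, n_observations,
                        max_shift=0.2, n_needed=500):
    alpha = max_shift * min(1.0, n_observations / n_needed)
    return [(1 - alpha) * p + alpha * q
            for p, q in zip(nash_probs, exploit_probs)]

print(hybrid_action_probs([1/3, 1/3, 1/3], [1.0, 0.0, 0.0], 250))
# -> [0.4, 0.3, 0.3]: a mild tilt toward the exploitative action
```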
00:24:46 How do you do that?
00:24:48 Do you try to vary up strategies, make it unpredictable?
00:24:53 It’s like, what is it, tit for tat strategies
00:24:57 in Prisoner’s Dilemma or?
00:25:00 Well, that’s a repeated game.
00:25:03 Repeated games.
00:25:04 Simple Prisoner’s Dilemma, repeated games.
00:25:06 But even there, there’s no proof that says
00:25:08 that that’s the best thing.
00:25:10 But experimentally, it actually does well.
00:25:13 So what kind of games are there, first of all?
00:25:15 I don’t know if this is something
00:25:17 that you could just summarize.
00:25:18 There’s perfect information games
00:25:20 where all the information’s on the table.
00:25:22 There is imperfect information games.
00:25:25 There’s repeated games that you play over and over.
00:25:28 There’s zero sum games.
00:25:31 There’s non zero sum games.
00:25:34 And then there’s a really important distinction
00:25:37 you’re making, two player versus more players.
00:25:40 So what are, what other games are there?
00:25:44 And what’s the difference, for example,
00:25:46 with this two player game versus more players?
00:25:50 What are the key differences in your view?
00:25:51 So let me start from the basics.
00:25:54 So a repeated game is a game where the same exact game
00:25:59 is played over and over.
00:26:01 In these extensive form games, where it’s,
00:26:05 think about tree form, maybe with these information sets
00:26:08 to represent incomplete information,
00:26:11 you can have kind of repetitive interactions.
00:26:14 Even repeated games are a special case of that, by the way.
00:26:17 But the game doesn’t have to be exactly the same.
00:26:21 It’s like in sourcing auctions.
00:26:23 Yes, we’re gonna see the same supply base year to year,
00:26:26 but what I’m buying is a little different every time.
00:26:28 And the supply base is a little different every time
00:26:31 and so on.
00:26:31 So it’s not really repeated.
00:26:33 So to find a purely repeated game
00:26:35 is actually very rare in the world.
00:26:37 So they’re really a very coarse model of what’s going on.
00:26:42 Then if you move up from just repeated,
00:26:46 simple repeated matrix games,
00:26:49 not all the way to extensive form games,
00:26:50 but in between, they’re stochastic games,
00:26:53 where, you know, there’s these,
00:26:57 you think about it like these little matrix games.
00:27:00 And when you take an action and your opponent takes an action,
00:27:04 they determine not which next state I’m going to,
00:27:07 next game I’m going to,
00:27:09 but the distribution over next games
00:27:11 where I might be going to.
00:27:13 So that’s the stochastic game.
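In the usual formal shorthand, a two-player stochastic game is a tuple

```latex
G = \langle S,\; A_1,\; A_2,\; P,\; r_1,\; r_2 \rangle,
\qquad P(s' \mid s, a_1, a_2),
```

where the joint action at state s induces a distribution P over the next state, the next “little matrix game,” rather than a single deterministic successor.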
00:27:15 So it’s like matrix games, repeated games, stochastic games,
00:27:19 extensive form games.
00:27:20 That is from less to more general.
00:27:23 And poker is an example of the last one.
00:27:26 So it’s really in the most general setting.
00:27:29 Extensive form games.
00:27:30 And that’s kind of what the AI community has been working on
00:27:34 and being benchmarked on
00:27:36 with this Heads Up No Limit Texas Holdem.
00:27:38 Can you describe extensive form games?
00:27:39 What’s the model here?
00:27:41 Yeah, so if you’re familiar with the tree form,
00:27:44 so it’s really the tree form.
00:27:45 Like in chess, there’s a search tree.
00:27:47 Versus a matrix.
00:27:48 Versus a matrix, yeah.
00:27:50 And the matrix is called the matrix form
00:27:53 or bi matrix form or normal form game.
00:27:55 And here you have the tree form.
00:27:57 So you can actually do certain types of reasoning there
00:28:00 that you lose the information when you go to normal form.
00:28:04 There’s a certain form of equivalence.
00:28:07 Like if you go from the tree form and you say that
00:28:08 every possible contingency plan is a strategy.
00:28:12 Then I can actually go back to the normal form,
00:28:15 but I lose some information from the lack of sequentiality.
00:28:18 Then the multiplayer versus two player distinction
00:28:21 is an important one.
00:28:22 So two player games in zero sum
00:28:27 are conceptually easier and computationally easier.
00:28:32 They’re still huge like this one,
00:28:36 but they’re conceptually easier and computationally easier
00:28:39 in that conceptually, you don’t have to worry about
00:28:42 which equilibrium is the other guy going to play
00:28:45 when there are multiple,
00:28:46 because any equilibrium strategy is a best response
00:28:49 to any other equilibrium strategy.
00:28:52 So I can play a different equilibrium from you
00:28:54 and we’ll still get the right values of the game.
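That interchangeability property can be written compactly:

```latex
(\sigma_1, \sigma_2),\; (\sigma_1', \sigma_2') \text{ equilibria}
\;\Longrightarrow\;
(\sigma_1, \sigma_2'),\; (\sigma_1', \sigma_2) \text{ equilibria, all with the same value } v.
```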
00:28:57 That falls apart even with two players
00:28:59 when you have general sum games.
00:29:01 Even without cooperation just in general.
00:29:03 Even without cooperation.
00:29:04 So there’s a big gap from two player zero sum
00:29:07 to two player general sum or even to three player zero sum.
00:29:11 That’s a big gap, at least in theory.
00:29:14 Can you maybe non mathematically provide the intuition
00:29:18 why it all falls apart with three or more players?
00:29:22 It seems like you should still be able to have
00:29:24 a Nash equilibrium that’s instructive, that holds.
00:29:31 Okay, so it is true that all finite games
00:29:36 have a Nash equilibrium.
00:29:38 So this is what John Nash actually proved.
00:29:41 So they do have a Nash equilibrium.
00:29:42 That’s not the problem.
00:29:43 The problem is that there can be many.
00:29:46 And then there’s a question of which equilibrium to select.
00:29:50 So, and if you select your strategy
00:29:52 from a different equilibrium and I select mine,
00:29:57 then what does that mean?
00:29:59 And in these non zero sum games,
00:30:02 we may lose some joint benefit
00:30:05 by being just simply stupid.
00:30:07 We could actually both be better off
00:30:08 if we did something else.
00:30:09 And in three player, you get other problems
00:30:11 also like collusion.
00:30:13 Like maybe you and I can gang up on a third player
00:30:16 and we can do radically better by colluding.
00:30:19 So there are lots of issues that come up there.
00:30:22 So Noam Brown, the student you work with on this
00:30:25 has mentioned, I looked through the AMA on Reddit.
00:30:29 He mentioned that the ability of poker players
00:30:31 to collaborate would make the game harder.
00:30:33 He was asked the question of,
00:30:35 how would you make the game of poker,
00:30:37 or both of you were asked the question,
00:30:39 how would you make the game of poker
00:30:41 beyond being solvable by current AI methods?
00:30:47 And he said that there’s not many ways
00:30:50 of making poker more difficult,
00:30:53 but a collaboration or cooperation between players
00:30:57 would make it extremely difficult.
00:30:59 So can you provide the intuition behind why that is,
00:31:03 if you agree with that idea?
00:31:05 Yeah, so I’ve done a lot of work on coalitional games
00:31:10 and we actually have a paper here
00:31:11 with my other student Gabriele Farina
00:31:13 and some other collaborators at NIPS on that.
00:31:16 Actually just came back from the poster session
00:31:18 where we presented this.
00:31:19 But so when you have a collusion, it’s a different problem.
00:31:23 And it typically gets even harder then.
00:31:27 Even the game representations,
00:31:29 some of the game representations don’t really allow
00:31:33 good computation.
00:31:34 So we actually introduced a new game representation
00:31:37 for that.
00:31:38 Is that kind of cooperation part of the model?
00:31:42 Do you have information
00:31:44 about the fact that other players are cooperating,
00:31:47 or is it just this chaos where nothing is known?
00:31:50 So there’s some things unknown.
00:31:52 Can you give an example of a collusion type game
00:31:55 or is it usually?
00:31:56 So like bridge.
00:31:58 So think about bridge.
00:31:59 It’s like when you and I are on a team,
00:32:02 our payoffs are the same.
00:32:04 The problem is that we can’t talk.
00:32:06 So when I get my cards, I can’t whisper to you
00:32:09 what my cards are.
00:32:10 That would not be allowed.
00:32:12 So we have to somehow coordinate our strategies
00:32:16 ahead of time and only ahead of time.
00:32:19 And then there’s certain signals we can talk about,
00:32:22 but they have to be such that the other team
00:32:25 also understands them.
00:32:26 So that’s an example where the coordination
00:32:30 is already built into the rules of the game.
00:32:33 But in many other situations like auctions
00:32:35 or negotiations or diplomatic relationships, poker,
00:32:40 it’s not really built in, but it still can be very helpful
00:32:44 for the colluders.
00:32:45 I’ve read you write somewhere that
00:32:48 in negotiations you come to the table with a prior,
00:32:52 like a strategy of what you’re willing to do
00:32:56 and not willing to do, those kinds of things.
00:32:58 So how do you start to now moving away from poker,
00:33:01 moving beyond poker into other applications
00:33:04 like negotiations, how do you start applying this
00:33:07 to other domains, even real world domains
00:33:11 that you’ve worked on?
00:33:12 Yeah, I actually have two startup companies
00:33:14 doing exactly that.
00:33:15 One is called Strategic Machine,
00:33:17 and that’s for kind of business applications,
00:33:20 gaming, sports, all sorts of things like that.
00:33:22 Any applications of this to business and to sports
00:33:27 and to gaming, to various types of things
00:33:32 in finance, electricity markets and so on.
00:33:34 And the other is called Strategy Robot,
00:33:36 where we are taking these to military security,
00:33:40 cyber security and intelligence applications.
00:33:43 I think you worked a little bit in,
00:33:48 how do you put it, advertisement,
00:33:51 sort of suggesting ads, auctions.
00:33:55 That’s another company, Optimized Markets.
00:33:57 But that’s much more about a combinatorial market
00:34:00 and optimization based technology.
00:34:02 That’s not using these game theoretic reasoning technologies.
00:34:06 I see, okay. So at a high level,
00:34:11 how do you think about our ability to use
00:34:15 game theoretic concepts to model human behavior?
00:34:18 Do you think human behavior is amenable
00:34:21 to this kind of modeling outside of the poker games,
00:34:24 and where have you seen it done successfully in your work?
00:34:27 I’m not sure the goal really is modeling humans.
00:34:33 Like for example, if I’m playing a zero sum game,
00:34:36 I don’t really care that the opponent
00:34:39 is actually following my model of rational behavior,
00:34:42 because if they’re not, that’s even better for me.
00:34:46 Right, so see with the opponents in games,
00:34:51 the prerequisite is that you formalize
00:34:56 the interaction in some way
00:34:57 that can be amenable to analysis.
00:35:01 And you’ve done this amazing work with mechanism design,
00:35:04 designing games that have certain outcomes.
00:35:10 But, so I’ll tell you an example
00:35:12 from my world of autonomous vehicles, right?
00:35:15 We’re studying pedestrians,
00:35:17 and pedestrians and cars negotiate
00:35:20 in this nonverbal communication.
00:35:22 There’s this weird game dance of tension
00:35:25 where pedestrians are basically saying,
00:35:27 I trust that you won’t kill me,
00:35:28 and so as a jaywalker, I will step onto the road
00:35:31 even though I’m breaking the law, and there’s this tension.
00:35:34 And the question is, we really don’t know
00:35:36 how to model that well in trying to model intent.
00:35:40 And so people sometimes bring up ideas
00:35:43 of game theory and so on.
00:35:44 Do you think that aspect of human behavior
00:35:49 can use these kinds of imperfect information approaches,
00:35:53 modeling, how do you start to attack a problem like that
00:35:57 when you don’t even know how to design the game
00:36:00 to describe the situation in order to solve it?
00:36:04 Okay, so I haven’t really thought about jaywalking,
00:36:06 but one thing that I think could be a good application
00:36:10 in autonomous vehicles is the following.
00:36:13 So let’s say that you have fleets of autonomous cars
00:36:16 operating by different companies.
00:36:18 So maybe here’s the Waymo fleet and here’s the Uber fleet.
00:36:22 If you think about the rules of the road,
00:36:24 they define certain legal rules,
00:36:26 but that still leaves a huge strategy space open.
00:36:30 Like as a simple example, when cars merge,
00:36:32 how humans merge, they slow down and look at each other
00:36:36 and try to merge.
00:36:39 Wouldn’t it be better if these situations
00:36:40 would already be prenegotiated
00:36:43 so we can actually merge at full speed
00:36:45 and we know that this is the situation,
00:36:47 this is how we do it, and it’s all gonna be faster.
00:36:50 But there are way too many situations to negotiate manually.
00:36:54 So you could use automated negotiation,
00:36:56 this is the idea at least,
00:36:57 you could use automated negotiation
00:36:59 to negotiate all of these situations
00:37:02 or many of them in advance.
00:37:04 And of course it might be that,
00:37:05 hey, maybe you’re not gonna always let me go first.
00:37:09 Maybe you said, okay, well, in these situations,
00:37:11 I’ll let you go first, but in exchange,
00:37:13 you’re gonna let me go first
00:37:14 in these other situations.
00:37:17 So it’s this huge combinatorial negotiation.
00:37:20 And do you think there’s room in that example of merging
00:37:24 to model this whole situation
00:37:25 as an imperfect information game
00:37:27 or do you really want to consider it to be a perfect information game?
00:37:30 No, that’s a good question, yeah.
00:37:32 That’s a good question.
00:37:33 Do you pay the price of assuming
00:37:37 that you don’t know everything?
00:37:39 Yeah, I don’t know.
00:37:40 It’s certainly much easier.
00:37:42 Games with perfect information are much easier.
00:37:45 So if you can get away with it, you should.
00:37:49 But if the real situation is of imperfect information,
00:37:52 then you’re gonna have to deal with imperfect information.
00:37:55 Great, so what lessons have you learned
00:37:58 from the Annual Computer Poker Competition?
00:38:00 An incredible accomplishment of AI.
00:38:03 You look at the history of Deep Blue, AlphaGo,
00:38:07 these kind of moments when AI stepped up
00:38:10 in an engineering effort and a scientific effort combined
00:38:13 to beat the best of human players.
00:38:16 So what do you take away from this whole experience?
00:38:19 What have you learned about designing AI systems
00:38:22 that play these kinds of games?
00:38:23 And what does that mean for AI in general,
00:38:28 for the future of AI development?
00:38:30 Yeah, so that’s a good question.
00:38:32 So there’s so much to say about it.
00:38:35 I do like this type of performance oriented research.
00:38:39 Although in my group, we go all the way from like idea
00:38:42 to theory, to experiments, to big system building,
00:38:44 to commercialization, so we span that spectrum.
00:38:47 But I think that in a lot of situations in AI,
00:38:51 you really have to build the big systems
00:38:53 and evaluate them at scale
00:38:55 before you know what works and doesn’t.
00:38:57 And we’ve seen that in the computational
00:39:00 game theory community, that there are a lot of techniques
00:39:02 that look good in the small,
00:39:05 but then they cease to look good in the large.
00:39:07 And we’ve also seen that there are a lot of techniques
00:39:10 that look superior in theory.
00:39:13 And I really mean in terms of convergence rates:
00:39:16 first order methods have better convergence rates
00:39:18 than the CFR based algorithms,
00:39:20 yet the CFR based algorithms are the fastest in practice.
00:39:24 So it really tells me that you have to test this in reality.
00:39:28 The theory isn’t tight enough, if you will,
00:39:30 to tell you which algorithms are better than the others.
00:39:34 And you have to look at these things in the large,
00:39:38 because any sort of projections you do from the small
00:39:41 can at least in this domain be very misleading.
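As a taste of what a CFR-style method looks like in practice, here is a toy regret-matching loop (the core update inside counterfactual regret minimization) solving Rock, Paper, Scissors by self-play; the average strategy converges toward the uniform equilibrium. This is an illustration only, not Libratus’s solver.

```python
# Regret matching on Rock, Paper, Scissors via self-play: the average
# strategy approaches the (1/3, 1/3, 1/3) Nash equilibrium.
import random

ACTIONS = 3  # rock, paper, scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row player's payoff

def strategy_from_regrets(regrets):
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1 / ACTIONS] * ACTIONS

regrets = [0.0] * ACTIONS
strategy_sum = [0.0] * ACTIONS
for _ in range(100_000):
    strat = strategy_from_regrets(regrets)
    strategy_sum = [s + p for s, p in zip(strategy_sum, strat)]
    me = random.choices(range(ACTIONS), weights=strat)[0]
    opp = random.choices(range(ACTIONS), weights=strat)[0]  # self-play
    for a in range(ACTIONS):  # regret: how much better action a would have done
        regrets[a] += PAYOFF[a][opp] - PAYOFF[me][opp]
print([round(s / sum(strategy_sum), 3) for s in strategy_sum])
```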
00:39:43 So that’s kind of from a science
00:39:46 and engineering perspective, from a personal perspective,
00:39:49 it’s been just a wild experience
00:39:51 in that with the first poker competition,
00:39:54 the first brains versus AI,
00:39:57 man machine poker competition that we organized.
00:39:59 There had been, by the way, for other poker games,
00:40:01 there had been previous competitions,
00:40:03 but this was for Heads Up No Limit, this was the first.
00:40:06 And I probably became the most hated person
00:40:09 in the world of poker.
00:40:10 And I didn’t mean to.
00:40:12 Why is that?
00:40:13 For cracking the game, or something.
00:40:15 Yeah, a lot of people felt that it was a real threat
00:40:20 to the whole game, the whole existence of the game.
00:40:22 If AI becomes better than humans,
00:40:26 people would be scared to play poker
00:40:28 because there are these superhuman AIs running around
00:40:30 taking their money and all of that.
00:40:32 So I just, it’s just really aggressive.
00:40:36 The comments were super aggressive.
00:40:37 I got everything just short of death threats.
00:40:40 Do you think the same was true for chess?
00:40:44 Because right now they just completed
00:40:45 the world championships in chess,
00:40:47 and humans just started ignoring the fact
00:40:49 that there’s AI systems now that outperform humans
00:40:52 and they still enjoy the game, it’s still a beautiful game.
00:40:55 That’s what I think.
00:40:56 And I think the same thing happens in poker.
00:40:58 And so I didn’t think of myself
00:41:01 as somebody who was gonna kill the game,
00:41:02 and I don’t think I did.
00:41:03 I’ve really learned to love this game.
00:41:05 I wasn’t a poker player before,
00:41:06 but learned so many nuances about it from these AIs,
00:41:10 and they’ve really changed how the game is played,
00:41:12 by the way.
00:41:13 So they have these very Martian ways of playing poker,
00:41:16 and the top humans are now incorporating
00:41:18 those types of strategies into their own play.
00:41:21 So if anything, to me, our work has made poker
00:41:26 a richer, more interesting game for humans to play,
00:41:29 not something that is gonna steer humans
00:41:32 away from it entirely.
00:41:34 Just a quick comment on something you said,
00:41:35 which, if I may say so,
00:41:39 is a little bit rare in academia sometimes.
00:41:42 It’s pretty brave to put your ideas to the test
00:41:45 in the way you described,
00:41:47 saying that sometimes good ideas don’t work
00:41:49 when you actually try to apply them at scale.
00:41:52 So where does that come from?
00:41:54 I mean, if you could do advice for people,
00:41:58 what drives you in that sense?
00:42:00 Were you always this way?
00:42:02 I mean, it takes a brave person.
00:42:04 I guess is what I’m saying, to test their ideas
00:42:06 and to see if this thing actually works
00:42:08 against human top human players and so on.
00:42:11 Yeah, I don’t know about brave,
00:42:12 but it takes a lot of work.
00:42:15 It takes a lot of work and a lot of time
00:42:18 to organize, to make something big
00:42:20 and to organize an event and stuff like that.
00:42:22 And what drives you in that effort?
00:42:24 Because you could still, I would argue,
00:42:26 get a best paper award at NIPS as you did in 17
00:42:30 without doing this.
00:42:31 That’s right, yes.
00:42:32 And so in general, I believe it’s very important
00:42:37 to do things in the real world and at scale.
00:42:41 And that’s really where, if you will,
00:42:46 the proof of the pudding is, that’s where it is.
00:42:48 In this particular case,
00:42:50 it was kind of a competition between different groups
00:42:55 and for many years as to who can be the first one
00:42:59 to beat the top humans at Heads Up No Limit Texas Holdem.
00:43:02 So it became kind of like a competition who can get there.
00:43:09 Yeah, so a little friendly competition
00:43:11 could do wonders for progress.
00:43:14 Yes, absolutely.
00:43:16 So the topic of mechanism design,
00:43:19 which is really interesting, also kind of new to me,
00:43:22 except as an observer of, I don’t know, politics and any,
00:43:25 I’m an observer of mechanisms,
00:43:27 but you write in your paper an automated mechanism design
00:43:31 that I quickly read.
00:43:34 So mechanism design is designing the rules of the game
00:43:37 so you get a certain desirable outcome.
00:43:40 And you have this work on doing so in an automatic fashion
00:43:44 as opposed to fine tuning it.
00:43:46 So what have you learned from those efforts?
00:43:50 If you look, say, I don’t know,
00:43:52 at complex systems like our political system,
00:43:56 can we design our political system
00:43:58 to have, in an automated fashion,
00:44:01 to have outcomes that we want?
00:44:03 Can we design something like traffic lights to be smart
00:44:09 where it gets outcomes that we want?
00:44:11 So what are the lessons that you draw from that work?
00:44:14 Yeah, so I still very much believe
00:44:17 in the automated mechanism design direction.
00:44:19 Yes.
00:44:20 But it’s not a panacea.
00:44:23 There are impossibility results in mechanism design
00:44:26 saying that there is no mechanism that accomplishes
00:44:30 objective X in class C.
00:44:33 So it’s not going to,
00:44:36 there’s no way using any mechanism design tools,
00:44:39 manual or automated,
00:44:41 to do certain things in mechanism design.
00:44:42 Can you describe that again?
00:44:43 So meaning it’s impossible to achieve that?
00:44:47 Yeah, yeah.
00:44:48 And it’s unlikely.
00:44:50 Impossible.
00:44:51 Impossible.
00:44:52 So these are not statements about human ingenuity
00:44:55 who might come up with something smart.
00:44:57 These are proofs that if you want to accomplish
00:44:59 properties X in class C,
00:45:02 that is not doable with any mechanism.
00:45:04 The good thing about automated mechanism design
00:45:07 is that we’re not really designing for a class.
00:45:10 We’re designing for specific settings at a time.
00:45:14 So even if there’s an impossibility result
00:45:16 for the whole class,
00:45:18 it just doesn’t mean that all of the cases
00:45:21 in the class are impossible.
00:45:22 It just means that some of the cases are impossible.
00:45:25 So we can actually carve these islands of possibility
00:45:28 within these known impossible classes.
00:45:30 And we’ve actually done that.
00:45:31 So one of the famous results in mechanism design
00:45:35 is the Myerson-Satterthwaite theorem
00:45:37 by Roger Myerson and Mark Satterthwaite from 1983.
00:45:41 It’s an impossibility of efficient trade
00:45:43 under imperfect information.
00:45:45 We show that you can, in many settings,
00:45:48 avoid that and get efficient trade anyway.
00:45:51 Depending on how they design the game, okay.
00:45:54 Depending how you design the game.
00:45:55 And of course, it doesn’t in any way
00:46:00 contradict the impossibility result.
00:46:01 The impossibility result is still there,
00:46:03 but it just finds spots within this impossible class
00:46:08 where in those spots, you don’t have the impossibility.
00:46:12 Sorry if I’m going a bit philosophical,
00:46:14 but what lessons do you draw towards,
00:46:17 like I mentioned, politics or human interaction
00:46:20 and designing mechanisms for outside of just
00:46:24 these kinds of trading or auctioning
00:46:26 or purely formal games or human interaction,
00:46:33 like a political system?
00:46:34 How, do you think it’s applicable to, yeah, politics
00:46:39 or to business, to negotiations, these kinds of things,
00:46:46 designing rules that have certain outcomes?
00:46:49 Yeah, yeah, I do think so.
00:46:51 Have you seen that successfully done?
00:46:54 They haven’t really, oh, you mean mechanism design
00:46:56 or automated mechanism design?
00:46:57 Automated mechanism design.
00:46:59 So mechanism design itself
00:47:01 has had fairly limited success so far.
00:47:06 There are certain cases,
00:47:07 but most of the real world situations
00:47:10 are actually not sound from a mechanism design perspective,
00:47:14 even in those cases where they’ve been designed
00:47:16 by very knowledgeable mechanism design people,
00:47:20 the people are typically just taking some insights
00:47:22 from the theory and applying those insights
00:47:25 into the real world,
00:47:26 rather than applying the mechanisms directly.
00:47:29 So one famous example is the FCC spectrum auctions.
00:47:33 So I’ve also had a small role in that
00:47:36 and very good economists have been working,
00:47:40 excellent economists have been working on that
00:47:42 with no game theory,
00:47:44 yet the rules that are designed in practice there,
00:47:47 they’re such that bidding truthfully
00:47:49 is not the best strategy.
00:47:51 Usually mechanism design,
00:47:52 we try to make things easy for the participants.
00:47:56 So telling the truth is the best strategy,
00:47:58 but even in those very high stakes auctions
00:48:01 where you have tens of billions of dollars
00:48:03 worth of spectrum being auctioned,
00:48:06 truth telling is not the best strategy.
00:48:08 And by the way,
00:48:10 nobody knows even a single optimal bidding strategy
00:48:12 for those auctions.
00:48:14 What’s the challenge of coming up with an optimal strategy,
00:48:15 because there’s a lot of players and there’s imperfect information?
00:48:18 It’s not so much that a lot of players,
00:48:20 but many items for sale,
00:48:22 and these mechanisms are such that even with just two items
00:48:26 or one item, bidding truthfully
00:48:28 wouldn’t be the best strategy.
00:48:31 If you look at the history of AI,
00:48:34 it’s marked by seminal events.
00:48:37 AlphaGo beating a world champion human Go player,
00:48:40 I would put Libratus winning the Heads Up No Limit Holdem
00:48:43 as one of such event.
00:48:45 Thank you.
00:48:46 And what do you think is the next such event,
00:48:52 whether it’s in your life or in the broader AI community
00:48:56 that you think might be out there
00:48:59 that would surprise the world?
00:49:01 So that’s a great question,
00:49:02 and I don’t really know the answer.
00:49:04 In terms of game solving,
00:49:07 Heads Up No Limit Texas Holdem
00:49:08 really was the one remaining widely agreed upon benchmark.
00:49:14 So that was the big milestone.
00:49:15 Now, are there other things?
00:49:17 Yeah, certainly there are,
00:49:18 but there’s not one that the community
00:49:21 has kind of focused on.
00:49:22 So what could be other things?
00:49:25 There are groups working on StarCraft.
00:49:27 There are groups working on Dota 2.
00:49:29 These are video games.
00:49:31 Or you could have like Diplomacy or Hanabi,
00:49:36 things like that.
00:49:37 These are like recreational games,
00:49:38 but none of them are really acknowledged
00:49:42 as kind of the main next challenge problem,
00:49:45 like chess or Go or Heads Up No Limit Texas Holdem was.
00:49:50 So I don’t really know in the game solving space
00:49:52 what is or what will be the next benchmark.
00:49:55 I kind of hope that there will be a next benchmark
00:49:57 because really the different groups
00:49:59 working on the same problem
00:50:01 really drove these application independent techniques
00:50:05 forward very quickly over 10 years.
00:50:07 Do you think there’s an open problem
00:50:09 that excites you that you start moving away
00:50:11 from games into real world games,
00:50:15 like say the stock market trading?
00:50:17 Yeah, so that’s kind of how I am.
00:50:19 So I am probably not going to work
00:50:23 as hard on these recreational benchmarks.
00:50:27 I’m doing two startups on game solving technology,
00:50:30 Strategic Machine and Strategy Robot,
00:50:32 and we’re really interested
00:50:34 in pushing this stuff into practice.
00:50:36 What do you think would be really
00:50:43 a powerful result that would be surprising
00:50:45 that would be, if you can say,
00:50:49 I mean, five years, 10 years from now,
00:50:53 something that statistically you would say
00:50:56 is not very likely,
00:50:57 but if there’s a breakthrough, would achieve?
00:51:01 Yeah, so I think that overall,
00:51:03 we’re in a very different situation in game theory
00:51:09 than we are in, let’s say, machine learning.
00:51:11 So in machine learning, it’s a fairly mature technology
00:51:14 and it’s very broadly applied
00:51:16 and proven success in the real world.
00:51:19 In game solving, there are almost no applications yet.
00:51:24 We have just become superhuman,
00:51:26 which machine learning you could argue happened in the 90s,
00:51:29 if not earlier,
00:51:30 and at least on supervised learning,
00:51:32 certain complex supervised learning applications.
00:51:36 Now, I think the next challenge problem,
00:51:39 I know you’re not asking about it this way,
00:51:40 you’re asking about the technology breakthrough,
00:51:42 but I think that big, big breakthrough
00:51:44 is to be able to show that, hey,
00:51:46 maybe most of, let’s say, military planning
00:51:48 or most of business strategy
00:51:50 will actually be done strategically
00:51:52 using computational game theory.
00:51:54 That’s what I would like to see
00:51:55 as the next five or 10 year goal.
00:51:57 Maybe you can explain to me again,
00:51:59 forgive me if this is an obvious question,
00:52:01 but machine learning methods,
00:52:04 neural networks suffer from not being transparent,
00:52:07 not being explainable.
00:52:09 Game theoretic methods, Nash equilibria,
00:52:12 when you see the different solutions,
00:52:15 say when you talk about military operations,
00:52:19 once you see the strategies,
00:52:21 do they make sense, are they explainable,
00:52:23 or do they suffer from the same problems
00:52:25 as neural networks do?
00:52:27 So that’s a good question.
00:52:28 I would say a little bit yes and no.
00:52:31 And what I mean by that is that
00:52:34 these game theoretic strategies,
00:52:36 let’s say, Nash equilibrium,
00:52:38 it has provable properties.
00:52:40 So it’s unlike, let’s say, deep learning
00:52:42 where you kind of cross your fingers,
00:52:44 hopefully it’ll work.
00:52:45 And then after the fact, when you have the weights,
00:52:47 you’re still crossing your fingers
00:52:48 and hoping it will work.
00:52:51 Here, you know that the solution quality is there.
00:52:55 There’s provable solution quality guarantees.
00:52:58 Now, that doesn’t necessarily mean
00:53:00 that the strategies are human understandable.
00:53:03 That’s a whole other problem.
00:53:04 So I think that deep learning and computational game theory
00:53:08 are in the same boat in that sense,
00:53:10 that both are difficult to understand.
00:53:13 But at least the game theoretic techniques,
00:53:15 they have these guarantees of solution quality.
00:53:19 So do you see business operations, strategic operations,
00:53:22 or even military in the future being
00:53:26 at least the strong candidates
00:53:28 being proposed by automated systems?
00:53:32 Do you see that?
00:53:34 Yeah, I do, I do.
00:53:35 But that’s more of a belief than a substantiated fact.
00:53:39 Depending on where you land in optimism or pessimism,
00:53:42 to me, that’s a really exciting future,
00:53:45 especially if there’s provable things
00:53:48 in terms of optimality.
00:53:50 So looking into the future,
00:53:54 there are a few folks worried,
00:53:58 especially as you look at the game of poker,
00:54:01 which is probably one of the last benchmarks
00:54:03 in terms of games being solved,
00:54:05 they worry about the future
00:54:07 and the existential threats of artificial intelligence,
00:54:10 the negative impact, in whatever form, on society.
00:54:13 Is that something that concerns you as much,
00:54:17 or are you more optimistic about the positive impacts of AI?
00:54:21 Oh, I am much more optimistic about the positive impacts.
00:54:24 So just in my own work, what we’ve done so far,
00:54:27 we run the nationwide kidney exchange.
00:54:29 Hundreds of people are walking around alive today,
00:54:32 who otherwise wouldn’t be.
00:54:34 And it’s increased employment.
00:54:36 You have a lot of people now running kidney exchanges
00:54:39 and at the transplant centers,
00:54:42 interacting with the kidney exchange.
00:54:45 You have extra surgeons, nurses, anesthesiologists,
00:54:49 hospitals, all of that.
00:54:51 So employment is increasing from that
00:54:53 and the world is becoming a better place.
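For context on what a kidney exchange computes: incompatible patient-donor pairs form a directed compatibility graph, and the exchange repeatedly selects disjoint short donation cycles. A toy sketch with an invented graph; the greedy packing at the end is a stand-in for the integer program real exchanges solve:

```python
from itertools import permutations

# Hypothetical compatibility graph: an edge u -> v means pair u's
# donor can give to pair v's patient. Data is invented.
edges = {
    1: {2},
    2: {1, 3},
    3: {4},
    4: {3},
}

def find_cycles(edges, max_len=3):
    """Enumerate short donation cycles (surgeries run simultaneously,
    so real exchanges cap cycle length, typically at 2 or 3)."""
    cycles = set()
    nodes = list(edges)
    for L in range(2, max_len + 1):
        for perm in permutations(nodes, L):
            # keep one canonical rotation per cycle, check every edge
            if min(perm) == perm[0] and all(
                    perm[(i + 1) % L] in edges[perm[i]] for i in range(L)):
                cycles.add(perm)
    return [list(c) for c in cycles]

def best_packing(cycles):
    """Greedy stand-in for the optimization: choose vertex-disjoint
    cycles to maximize the number of transplants."""
    chosen, used = [], set()
    for c in sorted(cycles, key=len, reverse=True):
        if used.isdisjoint(c):
            chosen.append(c)
            used.update(c)
    return chosen

print(best_packing(find_cycles(edges)))  # e.g. [[1, 2], [3, 4]]
```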
00:54:55 Another example is combinatorial sourcing auctions.
00:54:59 We did 800 large scale combinatorial sourcing auctions
00:55:04 from 2001 to 2010 in a previous startup of mine
00:55:08 called CombineNet.
00:55:09 And we increased the supply chain efficiency
00:55:13 on that $60 billion of spend by 12.6%.
00:55:18 So that’s over $6 billion of efficiency improvement
00:55:21 in the world.
00:55:22 And this is not like shifting value
00:55:23 from somebody to somebody else,
00:55:25 just efficiency improvement, like in trucking,
00:55:28 less empty driving, so there’s less waste,
00:55:31 less carbon footprint and so on.
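The computational core of a combinatorial auction is the winner determination problem: select a set of package bids, no two of which share an item, to maximize total value (or minimize cost, on the sourcing side). A brute force sketch over invented bids; real instances are NP-hard and are solved with integer programming:

```python
from itertools import combinations

# Hypothetical package bids: (bidder, set of items, bid value).
bids = [
    ("A", {"lane1", "lane2"}, 10),
    ("B", {"lane2", "lane3"}, 8),
    ("C", {"lane1"}, 4),
    ("D", {"lane3"}, 5),
]

def winner_determination(bids):
    """Brute force: try every subset of bids and keep the best
    item-disjoint one. Exponential, for illustration only."""
    best_value, best_set = 0, ()
    for r in range(1, len(bids) + 1):
        for subset in combinations(bids, r):
            items = [it for _, s, _ in subset for it in s]
            if len(items) == len(set(items)):  # no item sold twice
                value = sum(v for _, _, v in subset)
                if value > best_value:
                    best_value, best_set = value, subset
    return best_value, best_set

value, winners = winner_determination(bids)
print(value, [b[0] for b in winners])  # 15, ['A', 'D']
```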
00:55:33 So a huge positive impact in the near term,
00:55:36 but sort of to stay in it for a little longer,
00:55:40 because I think game theory has a role to play here.
00:55:43 Oh, let me actually come back on that as one thing.
00:55:45 I think AI is also going to make the world much safer.
00:55:49 So that’s another aspect that often gets overlooked.
00:55:53 Well, let me ask this question.
00:55:54 Maybe you can speak to the safer aspect.
00:55:56 So I talked to Max Tegmark and Stuart Russell,
00:55:59 who are very concerned about existential threats of AI.
00:56:02 And often the concern is about value misalignment.
00:56:06 So AI systems basically working,
00:56:11 operating towards goals that are not the same
00:56:14 as those of human civilization, of human beings.
00:56:17 So it seems like game theory has a role to play there
00:56:24 to make sure the values are aligned with human beings.
00:56:27 I don’t know if that’s how you think about it.
00:56:29 If not, how do you think AI might help with this problem?
00:56:35 How do you think AI might make the world safer?
00:56:39 Yeah, I think this value misalignment
00:56:43 is a fairly theoretical worry.
00:56:46 And I haven’t really seen it,
00:56:49 because I do a lot of real applications,
00:56:51 and I don’t see it anywhere.
00:56:53 The closest I’ve seen to it
00:56:55 was really the following type of mental exercise,
00:56:57 where I had this argument in the late eighties
00:57:00 when we were building
00:57:01 these transportation optimization systems.
00:57:03 And somebody had heard that it’s a good idea
00:57:05 to have high utilization of assets.
00:57:08 So they told me, hey, why don’t you put that as objective?
00:57:11 And we didn’t even put it as an objective
00:57:14 because I just showed him that,
00:57:16 if you had that as your objective,
00:57:18 the solution would be to load your trucks full
00:57:20 and drive in circles.
00:57:21 Nothing would ever get delivered.
00:57:23 You’d have a hundred percent utilization.
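The failure mode is easy to reproduce in a toy model: if the objective is utilization alone, the degenerate plan wins, and the fix is to score what you actually want, deliveries. All numbers below are invented:

```python
# Toy transportation plans: (hours trucks are loaded, total hours, deliveries).
plans = {
    "drive_in_circles": (100, 100, 0),   # always full, nothing arrives
    "real_schedule":    (70, 100, 40),
}

def utilization_only(plan):
    loaded, total, _ = plan
    return loaded / total

def deliveries_first(plan, weight=1.0):
    # One sensible fix: make deliveries the objective and treat
    # utilization as, at most, a secondary tie-breaking term.
    loaded, total, delivered = plan
    return delivered + weight * (loaded / total)

print(max(plans, key=lambda p: utilization_only(plans[p])))   # drive_in_circles
print(max(plans, key=lambda p: deliveries_first(plans[p])))   # real_schedule
```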
00:57:25 So yeah, I know this phenomenon.
00:57:27 I’ve known this for over 30 years,
00:57:29 but I’ve never seen it actually be a problem in reality.
00:57:33 And yes, if you have the wrong objective,
00:57:35 the AI will optimize that to the hilt
00:57:37 and it’s gonna hurt more than some human
00:57:39 who’s kind of trying to solve it in a half baked way
00:57:43 with some human insight too.
00:57:45 But I just haven’t seen that materialize in practice.
00:57:49 There’s this gap that you’ve actually put your finger on
00:57:52 very clearly just now between theory and reality.
00:57:57 That’s very difficult to put into words, I think.
00:57:59 It’s the gap between what you can theoretically imagine,
00:58:03 the worst possible case or even, yeah, I mean bad cases,
00:58:08 and what usually happens in reality.
00:58:10 So for example, to me,
00:58:11 maybe it’s something you can comment on;
00:58:15 I myself grew up in the Soviet Union.
00:58:19 There’s currently 10,000 nuclear weapons in the world.
00:58:22 And for many decades,
00:58:24 it’s theoretically surprising to me
00:58:28 that nuclear war has not broken out.
00:58:30 Do you think about this aspect
00:58:33 from a game theoretic perspective in general,
00:58:36 why is that true?
00:58:38 Why in theory you could see
00:58:40 how things would go terribly wrong
00:58:42 and yet somehow they have not?
00:58:44 Yeah, how do you think about it?
00:58:45 So I do think about that a lot.
00:58:47 I think the two biggest threats that we’re facing as mankind,
00:58:50 one is climate change and the other is nuclear war.
00:58:53 So those are my main two worries that I worry about.
00:58:57 And I’ve thought about trying to do something
00:59:01 about climate change twice.
00:59:02 Actually, for two of my startups,
00:59:05 I’ve actually commissioned studies
00:59:06 of what we could do on those things.
00:59:09 And we didn’t really find a sweet spot,
00:59:11 but I’m still keeping an eye out on that.
00:59:12 If there’s something where we could actually
00:59:15 provide a market solution, an optimization solution,
00:59:17 or some other technology solution to those problems.
00:59:20 Right now, like for example,
00:59:23 pollution credit markets were what we were looking at then.
00:59:26 And it was much more the lack of political will
00:59:30 why those markets were not so successful,
00:59:32 rather than bad market design.
00:59:34 So I could go in and make a better market design,
00:59:37 but that wouldn’t really move the needle
00:59:38 on the world very much if there’s no political will.
00:59:41 And in the US, the market,
00:59:43 at least the Chicago market was just shut down and so on.
00:59:47 So then it doesn’t really help
00:59:48 how great your market design was.
00:59:51 And then there’s the nuclear side.
00:59:53 Global warming is a more slowly encroaching problem.
01:00:00 Nuclear weapons have been here.
01:00:03 It’s an obvious problem that’s just been sitting there.
01:00:05 So how do you think about,
01:00:07 what is the mechanism design there
01:00:09 that just made everything seem stable?
01:00:12 And are you still extremely worried?
01:00:14 I am still extremely worried.
01:00:16 So you probably know the simple game theory of MAD,
01:00:20 mutually assured destruction,
01:00:23 and it doesn’t require any computation.
01:00:27 With small matrices, you can actually convince yourself
01:00:28 that the game is such that nobody wants to initiate.
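The small-matrix argument goes roughly as follows, with invented payoffs: once any first strike guarantees retaliation, initiating is never a better response than not initiating.

```python
import numpy as np

# Illustrative payoffs under mutually assured destruction.
# Actions: 0 = don't initiate, 1 = initiate. Any first strike
# triggers retaliation, so every profile with an initiation is
# catastrophic for both sides. Numbers are invented.
row_payoff = np.array([[   0, -100],   # row doesn't initiate
                       [-100, -100]])  # row initiates

for col_action in (0, 1):
    best = int(row_payoff[:, col_action].argmax())
    print(f"if column plays {col_action}, row's best response is {best}")
# Row never strictly prefers to initiate; the game is symmetric,
# so (don't, don't) is an equilibrium of this coarse-grained model.
```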
01:00:31 Yeah, that’s a very coarse grained analysis.
01:00:34 And it really works in a situation where
01:00:36 you have two superpowers or a small number of superpowers.
01:00:40 Now things are very different.
01:00:41 You have smaller nukes,
01:00:43 so the threshold of initiating is lower,
01:00:47 and you have smaller countries and non-state actors
01:00:51 who may get a nuke and so on.
01:00:53 So I think it’s riskier now than it was maybe ever before.
01:00:58 And what idea or application of AI,
01:01:03 you’ve talked about it a little bit,
01:01:04 but what is the most exciting to you right now?
01:01:07 I mean, you’re here at NIPS, NeurIPS.
01:01:10 Now you have a few excellent pieces of work,
01:01:14 but what are you thinking into the future
01:01:16 with several companies you’re doing?
01:01:17 What’s the most exciting thing or one of the exciting things?
01:01:21 The number one thing for me right now
01:01:23 is coming up with these scalable techniques
01:01:26 for game solving and applying them into the real world.
01:01:30 I’m still very interested in market design as well.
01:01:33 And we’re doing that at Optimized Markets,
01:01:35 but number one for me right now
01:01:37 is Strategic Machine and Strategy Robot,
01:01:40 getting that technology out there
01:01:41 and seeing, as you’re in the trenches doing applications,
01:01:45 what needs to be actually filled,
01:01:47 what technology gaps still need to be filled.
01:01:49 So it’s so hard to just put your feet on the table
01:01:52 and imagine what needs to be done.
01:01:53 But when you’re actually doing real applications,
01:01:56 the applications tell you what needs to be done.
01:01:59 And I really enjoy that interaction.
01:02:00 Is it a challenging process to apply
01:02:04 some of the state of the art techniques you’re working on
01:02:07 and having the various players in industry
01:02:14 or the military or people who could really benefit from it
01:02:17 actually use it?
01:02:19 What’s that process like?
01:02:21 In autonomous vehicles, you work with automotive companies,
01:02:23 and they in many ways are a little bit old fashioned.
01:02:28 It’s difficult.
01:02:29 They really want to use this technology.
01:02:31 It clearly will have a significant benefit,
01:02:34 but the systems aren’t quite in place
01:02:37 to easily have them integrated in terms of data,
01:02:41 in terms of compute, in terms of all these kinds of things.
01:02:43 So is that one of the bigger challenges that you’re facing
01:02:48 and how do you tackle that challenge?
01:02:50 Yeah, I think that’s always a challenge.
01:02:52 That’s the kind of slowness and inertia, really,
01:02:55 of let’s do things the way we’ve always done them.
01:02:57 You just have to find the internal champions
01:03:00 at the customer who understand that,
01:03:02 hey, things can’t be the same way in the future.
01:03:04 Otherwise bad things are going to happen.
01:03:06 And in autonomous vehicles,
01:03:08 it’s actually very interesting
01:03:09 that the car makers are doing that
01:03:11 and they’re very traditional,
01:03:12 but at the same time you have tech companies
01:03:14 who have nothing to do with cars or transportation
01:03:17 like Google and Baidu really pushing on autonomous cars.
01:03:21 I find that fascinating.
01:03:23 Clearly you’re super excited
01:03:25 about actually these ideas having an impact in the world.
01:03:29 In terms of the technology, in terms of ideas and research,
01:03:32 are there directions that you’re also excited about?
01:03:36 Whether that’s on some of the approaches you talked about
01:03:40 for the imperfect information games,
01:03:42 whether it’s applying deep learning
01:03:44 to some of these problems,
01:03:45 is there something that you’re excited
01:03:46 in the research side of things?
01:03:48 Yeah, yeah, lots of different things
01:03:51 in game solving.
01:03:53 So solving even bigger games,
01:03:56 games where you have more hidden actions,
01:03:59 where the player actions are hidden as well.
01:04:02 Poker is a game where really the chance actions are hidden
01:04:05 or some of them are hidden,
01:04:07 but the player actions are public.
01:04:11 Multiplayer games of various sorts,
01:04:14 collusion, opponent exploitation,
01:04:18 and even longer games.
01:04:21 So games that basically go forever,
01:04:23 but they’re not repeated.
01:04:24 So extensive form games that go forever.
01:04:27 What would that even look like?
01:04:30 How do you represent that?
01:04:31 How do you solve that?
01:04:32 What’s an example of a game like that?
01:04:33 Or is this some of the stochastic games
01:04:35 that you mentioned?
01:04:36 Let’s say business strategy.
01:04:37 So it’s not just modeling like a particular interaction,
01:04:40 but thinking about the business from here to eternity.
01:04:44 Or let’s say military strategy.
01:04:49 So it’s not like war is gonna go away.
01:04:51 How do you think about military strategy
01:04:54 that’s gonna go forever?
01:04:56 How do you even model that?
01:04:58 How do you know whether a move was good
01:05:01 that somebody made and so on?
01:05:05 So that’s kind of one direction.
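One classical way to make a game that runs forever well posed is Shapley's discounted stochastic game model: each state holds a small matrix game whose entries fold in the discounted value of the successor states, and value iteration contracts to a fixed point. A sketch with an invented two-state game; this is the textbook model, not the new representation being asked about:

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value of a two player zero-sum matrix game via an LP."""
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0  # maximize v
    res = linprog(c,
                  A_ub=np.hstack([-A.T, np.ones((n, 1))]), b_ub=np.zeros(n),
                  A_eq=np.hstack([np.ones((1, m)), np.zeros((1, 1))]),
                  b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[-1]

# Invented two-state game: reward[s][i][j] is the row player's payoff,
# trans[s][i][j] the (deterministic, for simplicity) next state.
reward = np.array([[[1.0, -1.0], [-1.0, 1.0]],
                   [[0.0,  2.0], [ 2.0, 0.0]]])
trans = np.array([[[0, 1], [1, 0]],
                  [[1, 1], [0, 0]]])
gamma = 0.9  # discounting is what makes the infinite horizon well posed

# Shapley's value iteration: each state's value is the value of a
# matrix game whose entries include the discounted continuation value.
V = np.zeros(2)
for _ in range(200):  # the update is a gamma-contraction, so it converges
    V = np.array([matrix_game_value(reward[s] + gamma * V[trans[s]])
                  for s in range(2)])
print(V.round(3))
```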
01:05:06 I’m also very interested in learning
01:05:09 much more scalable techniques for integer programming.
01:05:13 So we had an ICML paper this summer on that.
01:05:16 The first automated algorithm configuration paper
01:05:20 that has theoretical generalization guarantees.
01:05:23 So if I see this many training examples
01:05:26 and I tune my algorithm in this way,
01:05:28 it’s going to have good performance
01:05:30 on the real distribution, which I’ve not seen.
01:05:33 So, which is kind of interesting
01:05:34 that algorithm configuration has been going on now
01:05:37 for at least 17 years seriously.
01:05:41 And there has not been any generalization theory before.
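The shape of such a guarantee, roughly: sample training instances from the application's distribution, pick the configuration with the best average performance on them, and bound how many samples are needed for that choice to carry over to unseen instances. A toy sketch in which the solver parameter and the runtime model are both invented:

```python
import random

random.seed(0)

# Hypothetical tunable parameter of an integer-programming solver,
# e.g. how aggressively to branch. The runtime model below is made up.
CONFIGS = [0.1, 0.3, 0.5, 0.7, 0.9]

def runtime(config, instance_hardness):
    # Invented stand-in for actually running the solver on one instance.
    return instance_hardness * (1.0 + (config - 0.6) ** 2) + random.gauss(0, 0.1)

def configure(num_training_instances):
    """Empirical risk minimization over configurations: average the
    runtime on sampled training instances and keep the best config.
    Generalization theory asks how large the sample must be for this
    choice to be near-optimal on the unseen distribution."""
    instances = [random.expovariate(1.0) for _ in range(num_training_instances)]
    def avg(cfg):
        return sum(runtime(cfg, h) for h in instances) / len(instances)
    return min(CONFIGS, key=avg)

print(configure(5))     # small sample: may pick a poor config
print(configure(500))   # large sample: reliably near the true optimum
```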
01:05:45 Well, this is really exciting
01:05:47 and it’s a huge honor to talk to you.
01:05:49 Thank you so much, Tuomas.
01:05:51 Thank you for bringing Libratus to the world
01:05:52 and all the great work you’re doing.
01:05:54 Well, thank you very much.
01:05:55 It’s been fun.
01:05:55 No more questions.