Transcript
00:00:00 The following is a conversation with Tuomas Sandholm.
00:00:03 He’s a professor at CMU and co-creator of Libratus,
00:00:06 which is the first AI system to beat top human players
00:00:09 in the game of Heads Up No Limit Texas Holdem.
00:00:13 He has published over 450 papers
00:00:15 on game theory and machine learning,
00:00:17 including a best paper in 2017 at NIPS,
00:00:21 now renamed to NeurIPS,
00:00:23 which is where I caught up with him for this conversation.
00:00:27 His research and companies have had wide reaching impact
00:00:30 in the real world,
00:00:32 especially because he and his group
00:00:34 not only propose new ideas,
00:00:36 but also build systems to prove that these ideas work
00:00:40 in the real world.
00:00:42 This conversation is part of the MIT course
00:00:44 on artificial general intelligence
00:00:46 and the artificial intelligence podcast.
00:00:49 If you enjoy it, subscribe on YouTube, iTunes,
00:00:52 or simply connect with me on Twitter
00:00:54 at Lex Fridman, spelled F R I D.
00:00:58 And now here’s my conversation with Tuomas Sandholm.
00:01:03 Can you describe at the high level
00:01:06 the game of poker, Texas Holdem, Heads Up Texas Holdem
00:01:09 for people who might not be familiar with this card game?
00:01:13 Yeah, happy to.
00:01:14 So Heads Up No Limit Texas Holdem
00:01:16 has really emerged in the AI community
00:01:18 as a main benchmark for testing these
00:01:21 application independent algorithms
00:01:23 for imperfect information game solving.
00:01:26 And this is a game that’s actually played by humans.
00:01:30 You don’t see that much on TV or casinos
00:01:33 because well, for various reasons,
00:01:36 but you do see it in some expert level casinos
00:01:40 and you see it in the best poker movies of all time.
00:01:43 It’s actually an event in the World Series of Poker,
00:01:45 but mostly it’s played online
00:01:48 and typically for pretty big sums of money.
00:01:50 And this is a game that usually only experts play.
00:01:54 So if you go to your home game on a Friday night,
00:01:58 it probably is not gonna be Heads Up No Limit Texas Holdem.
00:02:01 It might be No Limit Texas Holdem in some cases,
00:02:04 but typically for a big group and it’s not as competitive.
00:02:08 Whereas Heads Up means it’s two players.
00:02:10 So it’s really like me against you.
00:02:13 Am I better or are you better?
00:02:14 Much like chess or go in that sense,
00:02:17 but an imperfect information game,
00:02:19 which makes it much harder because I have to deal
00:02:21 with issues of you knowing things that I don’t know
00:02:25 and I know things that you don’t know
00:02:27 instead of pieces being nicely laid on the board
00:02:29 for both of us to see.
00:02:31 So in Texas Holdem, there’s two cards
00:02:34 that you only see that belong to you.
00:02:37 Yeah. And then
00:02:38 they gradually lay out some cards
00:02:40 that add up overall to five cards that everybody can see.
00:02:44 Yeah. So the imperfect nature
00:02:45 of the information is the two cards
00:02:47 that you’re holding in your hand.
00:02:48 Up front, yeah.
00:02:49 So as you said, you first get two cards
00:02:51 in private each and then there’s a betting round.
00:02:55 Then you get three cards in public on the table.
00:02:58 Then there’s a betting round.
00:02:59 Then you get the fourth card in public on the table.
00:03:01 There’s a betting round.
00:03:02 Then you get the fifth card on the table.
00:03:04 There’s a betting round.
00:03:05 So there’s a total of four betting rounds
00:03:07 and four tranches of information revelation if you will.
00:03:11 Only the first tranche is private
00:03:14 and then it’s public from there.
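For readers who want the structure pinned down, here is a minimal sketch in Python of the deal-and-bet sequence just described (betting logic and hand evaluation omitted; this is an illustration, not anyone’s engine):

```python
# Minimal sketch of the Heads Up No Limit Texas Holdem structure:
# four betting rounds, one private tranche of cards, then public cards.
import random

def betting_round(name):
    pass  # bet / call / raise / fold logic omitted in this sketch

def play_hand():
    deck = list(range(52))
    random.shuffle(deck)
    private = {"p1": [deck.pop(), deck.pop()],   # two hole cards each,
               "p2": [deck.pop(), deck.pop()]}   # seen only by that player
    public = []
    betting_round("preflop")                     # betting round 1
    public += [deck.pop() for _ in range(3)]     # flop: three public cards
    betting_round("flop")                        # betting round 2
    public.append(deck.pop())                    # turn: fourth public card
    betting_round("turn")                        # betting round 3
    public.append(deck.pop())                    # river: fifth public card
    betting_round("river")                       # betting round 4
    return private, public

play_hand()
```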
00:03:16 And this is probably by far the most popular game in AI
00:03:24 and just the general public
00:03:26 in terms of imperfect information.
00:03:28 So that’s probably the most popular spectator game
00:03:32 to watch, right?
00:03:33 So, which is why it’s a super exciting game to tackle.
00:03:37 So it’s on the order of chess, I would say,
00:03:40 in terms of popularity, in terms of AI setting it
00:03:43 as the bar of what is intelligence.
00:03:46 So in 2017, Libratus, how do you pronounce it?
00:03:50 Libratus.
00:03:51 Libratus.
00:03:52 Libratus beats.
00:03:52 A little Latin there.
00:03:54 A little bit of Latin.
00:03:55 Libratus beat four expert human players.
00:04:01 Can you describe that event?
00:04:03 What you learned from it?
00:04:04 What was it like?
00:04:04 What was the process in general
00:04:06 for people who have not read the papers and the study?
00:04:09 Yeah, so the event was that we invited
00:04:12 four of the top 10 players,
00:04:14 who are specialist players in Heads Up No Limit
00:04:17 Texas Holdem, which is very important
00:04:19 because this game is actually quite different
00:04:21 than the multiplayer version.
00:04:23 We brought them in to Pittsburgh
00:04:25 to play at the Rivers Casino for 20 days.
00:04:28 We wanted to get 120,000 hands in
00:04:31 because we wanted to get statistical significance.
00:04:36 So it’s a lot of hands for humans to play,
00:04:39 even for these top pros who play fairly quickly normally.
00:04:42 So we couldn’t just have one of them play so many hands.
00:04:46 20 days, they were playing basically morning to evening.
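As a rough illustration of why the hand count has to be so large (the numbers below are assumptions for illustration, not figures from the match): the standard error of a measured win rate shrinks only as one over the square root of the number of hands.

```python
# Back-of-the-envelope: standard error of a measured win rate after n
# hands. The per-hand standard deviation (20 big blinds, assuming some
# variance reduction) and the edge (15 bb per 100 hands) are made-up
# illustrative numbers, not figures from the Libratus match.
import math

std_per_hand = 20.0   # big blinds; assumed
edge_per_100 = 15.0   # big blinds per 100 hands; assumed

for hands in (1_000, 10_000, 120_000):
    se_per_100 = std_per_hand / math.sqrt(hands) * 100
    print(f"{hands:>7} hands: std error {se_per_100:5.1f} bb/100, "
          f"z = {edge_per_100 / se_per_100:.1f}")
# Only at ~120,000 hands does the assumed edge clear ~2.5 standard errors.
```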
00:04:50 And I raised $200,000 as a little incentive for them to play.
00:04:55 And the setting was such that they didn’t all get $50,000.
00:05:01 We actually paid them out
00:05:02 based on how they did against the AI each.
00:05:05 So they had an incentive to play as hard as they could,
00:05:09 whether they’re way ahead or way behind
00:05:11 or right at the mark of beating the AI.
00:05:13 And you don’t make any money, unfortunately.
00:05:16 Right, no, we can’t make any money.
00:05:17 So originally, a couple of years earlier,
00:05:20 I actually explored whether we could actually play for money
00:05:24 because that would be, of course, interesting as well,
00:05:28 to play against the top people for money.
00:05:29 But the Pennsylvania Gaming Board said no, so we couldn’t.
00:05:33 So this is much like an exhibit,
00:05:36 like for a musician or a boxer or something like that.
00:05:39 Nevertheless, they were keeping track of the money
00:05:41 and the AI was up close to $2 million, I think.
00:05:48 So if it had been for real money, if you were able to earn money,
00:05:51 that would have been a quite impressive and inspiring achievement.
00:05:55 Just a few details, what were the players looking at?
00:05:59 Were they behind a computer?
00:06:00 What was the interface like?
00:06:02 Yes, they were playing much like they normally do.
00:06:05 These top players, when they play this game,
00:06:07 they play mostly online.
00:06:08 So they’re used to playing through a UI.
00:06:11 And they did the same thing here.
00:06:13 So there was this layout.
00:06:14 You could imagine there’s a table on a screen.
00:06:17 There’s the human sitting there,
00:06:20 and then there’s the AI sitting there.
00:06:21 And the screen shows everything that’s happening.
00:06:24 The cards coming out and shows the bets being made.
00:06:27 And we also had the betting history for the human.
00:06:29 So if the human forgot what had happened in the hand so far,
00:06:33 they could actually reference back and so forth.
00:06:37 Is there a reason they were given access
00:06:39 to the betting history?
00:06:41 Well, we just, it didn’t really matter.
00:06:45 They wouldn’t have forgotten anyway.
00:06:47 These are top quality people.
00:06:48 But we just wanted to put it out there
00:06:51 so it’s not a question of the human forgetting
00:06:53 and the AI somehow trying to get advantage
00:06:55 of better memory.
00:06:56 So what was that like?
00:06:57 I mean, that was an incredible accomplishment.
00:06:59 So what did it feel like before the event?
00:07:02 Did you have doubt, hope?
00:07:05 Where was your confidence at?
00:07:08 Yeah, that’s great.
00:07:09 So great question.
00:07:10 So 18 months earlier, I had organized a similar brains
00:07:14 versus AI competition with a previous AI called Claudico
00:07:17 and we couldn’t beat the humans.
00:07:20 So this time around, it was only 18 months later.
00:07:23 And I knew that this new AI, Libratus, was way stronger,
00:07:27 but it’s hard to say how you’ll do against the top humans
00:07:31 before you try.
00:07:32 So I thought we had about a 50-50 shot.
00:07:35 And the international betting sites put us
00:07:38 as a four to one or five to one underdog.
00:07:41 So it’s kind of interesting that people really believe
00:07:44 in people over AI.
00:07:48 People don’t just overbelieve in themselves,
00:07:50 but they have overconfidence in other people as well
00:07:53 compared to the performance of AI.
00:07:55 And yeah, so we were a four to one or five to one underdog.
00:07:59 And even after three days of beating the humans in a row,
00:08:02 we were still 50-50 on the international betting sites.
00:08:06 Do you think there’s something special and magical
00:08:09 about poker and the way people think about it,
00:08:12 in the sense you have,
00:08:14 I mean, even in chess, there’s no Hollywood movies.
00:08:17 Poker is the star of many movies.
00:08:21 And there’s this feeling that certain human facial
00:08:26 expressions and body language, eye movement,
00:08:30 all these tells are critical to poker.
00:08:33 Like you can look into somebody’s soul
00:08:35 and understand their betting strategy and so on.
00:08:37 So possibly,
00:08:41 do you think that is why people have a confidence
00:08:43 that humans will outperform?
00:08:45 Because AI systems cannot, in this construct,
00:08:48 perceive these kinds of tells.
00:08:51 They’re only looking at betting patterns
00:08:53 and nothing else, betting patterns and statistics.
00:08:58 So what’s more important to you
00:09:02 if you step back on human players, human versus human?
00:09:06 What’s the role of these tells,
00:09:08 of these ideas that we romanticize?
00:09:11 Yeah, so I’ll split it into two parts.
00:09:15 So one is why do humans trust humans more than AI
00:09:20 and have overconfidence in humans?
00:09:22 I think that’s not really related to the tell question.
00:09:25 It’s just that they’ve seen these top players,
00:09:28 how good they are, and they’re really fantastic.
00:09:31 So it’s just hard to believe that an AI could beat them.
00:09:36 So I think that’s where that comes from.
00:09:37 And that’s actually maybe a more general lesson about AI.
00:09:40 That until you’ve seen it overperform a human,
00:09:43 it’s hard to believe that it could.
00:09:45 But then the tells, a lot of these top players,
00:09:50 they’re so good at hiding tells
00:09:52 that among the top players,
00:09:56 it’s actually not really worth it
00:09:59 for them to invest a lot of effort
00:10:01 trying to find tells in each other
00:10:03 because they’re so good at hiding them.
00:10:05 So yes, at the kind of Friday evening game,
00:10:09 tells are gonna be a huge thing.
00:10:11 You can read other people.
00:10:13 And if you’re a good reader,
00:10:14 you’ll read them like an open book.
00:10:16 But at the top levels of poker now,
00:10:18 the tells become a much smaller and smaller aspect
00:10:21 of the game as you go to the top levels.
00:10:24 The amount of strategies, the amount of possible actions
00:10:28 is very large, 10 to the power of 100 plus.
00:10:35 So, I’ve read a few of the related papers,
00:10:37 it has to form some abstractions
00:10:42 of various hands and actions.
00:10:44 So what kind of abstractions are effective
00:10:47 for the game of poker?
00:10:49 Yeah, so you’re exactly right.
00:10:50 So when you go from a game tree that’s 10 to the 161,
00:10:55 especially in an imperfect information game,
00:10:58 it’s way too large to solve directly,
00:11:00 even with our fastest equilibrium finding algorithms.
00:11:03 So you wanna abstract it first.
00:11:07 And abstraction in games is much trickier
00:11:10 than abstraction in MDPs or other single agent settings.
00:11:15 Because you have these abstraction pathologies
00:11:17 that if I have a finer grained abstraction,
00:11:19 the strategy that I can get from that for the real game
00:11:23 might actually be worse than the strategy
00:11:25 I can get from the coarse grained abstraction.
00:11:27 So you have to be very careful.
00:11:28 Now the kinds of abstractions, just to zoom out,
00:11:31 we’re talking about, there’s the hands abstractions
00:11:34 and then there’s betting strategies.
00:11:37 Yeah, betting actions, yeah.
00:11:38 Betting actions.
00:11:39 So there’s information abstraction,
00:11:41 to talk about general games, information abstraction,
00:11:44 which is the abstraction of what chance does.
00:11:47 And this would be the cards in the case of poker.
00:11:50 And then there’s action abstraction,
00:11:52 which is abstracting the actions of the actual players,
00:11:57 which would be bets in the case of poker.
00:11:59 Yourself and the other players?
00:12:01 Yes, yourself and other players.
00:12:03 And for information abstraction,
00:12:08 we were completely automated.
00:12:11 So these are algorithms,
00:12:13 but they do what we call potential aware abstraction,
00:12:16 where we don’t just look at the value of the hand,
00:12:19 but also how it might materialize
00:12:20 into good or bad hands over time.
00:12:22 And it’s a certain kind of bottom-up process
00:12:25 with integer programming and clustering
00:12:27 and various aspects of how you build this abstraction.
00:12:31 And then in the action abstraction,
00:12:34 there it’s largely based on how humans and other AIs
00:12:40 have played this game in the past.
00:12:42 But in the beginning,
00:12:43 we actually used an automated action abstraction technology,
00:12:47 which is provably convergent
00:12:51 that it finds the optimal combination of bet sizes,
00:12:54 but it’s not very scalable.
00:12:55 So we couldn’t use it for the whole game,
00:12:57 but we use it for the first couple of betting actions.
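To make “potential-aware” concrete, here is a toy sketch (my construction, not the actual Libratus pipeline): two hands can have nearly the same average equity right now but very different futures, and bucketing on the distribution over future equities, rather than on the mean, keeps them apart. The real system used clustering with more refined distances in a bottom-up pass over the rounds, as described above.

```python
# Toy illustration of potential-aware information abstraction: compare
# hands by their *distribution* of equities after the next public card,
# not just their current average equity. All numbers are made up.
import numpy as np

bins = np.linspace(0.05, 0.95, 10)  # future-equity bin centers

# Histograms over equity after the next card (hypothetical hands):
draw        = np.array([5, 4, 0, 0, 0, 0, 0, 2, 4, 5]) / 20  # bimodal
medium_made = np.array([0, 0, 0, 2, 8, 8, 2, 0, 0, 0]) / 20  # stays medium

print("mean equities:", draw @ bins, medium_made @ bins)  # nearly equal
print("L1 distance between distributions:",
      np.abs(draw - medium_made).sum())   # large -> different buckets
```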
00:12:59 So what’s more important, the strength of the hand,
00:13:03 so the information abstraction, or how you play them,
00:13:09 the actions? You know,
00:13:11 the romanticized notion, again,
00:13:13 is that it doesn’t matter what hands you have,
00:13:15 that the actions, the betting, may be the way you win
00:13:19 no matter what hands you have.
00:13:20 Yeah, so that’s why you have to play a lot of hands
00:13:23 so that the role of luck gets smaller.
00:13:26 So you could otherwise get lucky and get some good hands
00:13:29 and then you’re gonna win the match.
00:13:31 Even with thousands of hands, you can get lucky
00:13:35 because there’s so much variance
00:13:36 in No Limit Texas Holdem because if we both go all in,
00:13:40 there’s a huge amount of variance, so there are these
00:13:43 massive swings in No Limit Texas Holdem.
00:13:47 So that’s why you have to play not just thousands,
00:13:50 but over 100,000 hands to get statistical significance.
00:13:55 So let me ask this question another way.
00:13:57 If you didn’t even look at your hands,
00:14:02 but they didn’t know that, the opponents didn’t know that,
00:14:04 how well would you be able to do?
00:14:06 Oh, that’s a good question.
00:14:07 There’s actually, I heard this story
00:14:09 that there’s this Norwegian female poker player
00:14:11 called Annette Obrestad who’s actually won a tournament
00:14:15 by doing exactly that, but that would be extremely rare.
00:14:18 So you cannot really play well that way.
00:14:23 Okay, so the hands do have some role to play, okay.
00:14:27 So Libratus, as far as I understand,
00:14:33 does not use learning methods, deep learning.
00:14:35 Is there room for learning?
00:14:40 There’s no reason why Libratus couldn’t combine
00:14:44 with an AlphaGo type approach for estimating
00:14:46 quality with a function estimator.
00:14:49 What are your thoughts on this,
00:14:52 maybe as compared to another algorithm
00:14:54 which I’m not that familiar with, DeepStack,
00:14:56 the engine that does use deep learning,
00:14:59 that it’s unclear how well it does,
00:15:01 but nevertheless uses deep learning.
00:15:03 So what are your thoughts about learning methods
00:15:05 to aid in the way that Libratus plays in the game of poker?
00:15:09 Yeah, so as you said,
00:15:10 Libratus did not use learning methods
00:15:13 and played very well without them.
00:15:15 Since then, actually here at NeurIPS,
00:15:17 we have a couple of papers on things
00:15:20 that do use learning techniques.
00:15:22 Excellent.
00:15:24 And deep learning in particular.
00:15:26 And sort of the way you’re talking about
00:15:29 where it’s learning an evaluation function,
00:15:33 but in imperfect information games,
00:15:37 unlike let’s say in Go or now also in chess and shogi,
00:15:42 it’s not sufficient to learn an evaluation for a state
00:15:47 because the value of an information set
00:15:52 depends not only on the exact state,
00:15:55 but it also depends on both players beliefs.
00:15:59 Like if I have a bad hand,
00:16:01 I’m much better off if the opponent thinks I have a good hand
00:16:04 and vice versa.
00:16:05 If I have a good hand,
00:16:06 I’m much better off if the opponent believes
00:16:09 I have a bad hand.
00:16:11 So the value of a state is not just a function of the cards.
00:16:15 It depends on, if you will, the path of play,
00:16:19 but only to the extent that it’s captured
00:16:22 in the belief distributions.
00:16:23 So that’s why it’s not as simple
00:16:26 as it is in perfect information games.
00:16:29 And I don’t wanna say it’s simple there either.
00:16:31 It’s of course very complicated computationally there too,
00:16:34 but at least conceptually, it’s very straightforward.
00:16:36 There’s a state, there’s an evaluation function.
00:16:38 You can try to learn it.
00:16:39 Here, you have to do something more.
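In symbols (a schematic, not notation from any particular paper): in a perfect-information game one can learn a value per state, whereas here the leaf evaluation needs the public state together with both players’ belief distributions over private hands.

```latex
% Perfect information: a value per state suffices.
V : \mathcal{S} \to \mathbb{R}
% Imperfect information: the value also depends on both players'
% beliefs over private hands, as induced by the path of play.
V : \mathcal{S}_{\mathrm{public}} \times \Delta(\mathcal{H}_1) \times \Delta(\mathcal{H}_2) \to \mathbb{R}
```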
00:16:43 And what we do is in one of these papers,
00:16:47 we’re looking at where we allow the opponent
00:16:50 to actually take different strategies
00:16:53 at the leaf of the search tree, if you will.
00:16:56 And that is a different way of doing it.
00:16:59 And it doesn’t assume therefore a particular way
00:17:02 that the opponent plays,
00:17:04 but it allows the opponent to choose
00:17:05 from a set of different continuation strategies.
00:17:09 And that forces us to not be too optimistic
00:17:13 in a look ahead search.
00:17:15 And that’s one way you can do sound look ahead search
00:17:19 in imperfect information games,
00:17:21 which is very difficult.
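A hedged sketch of the idea just described (my simplification, not the paper’s exact algorithm): at a depth-limit leaf, instead of assuming one fixed opponent continuation, let the opponent pick from a set of continuation strategies and credit yourself only with the worst case, so the lookahead cannot be over-optimistic.

```python
# Sketch: worst-case leaf evaluation over a set of opponent
# continuation strategies (names and values below are hypothetical).
def leaf_value(state, continuation_strategies, value_vs):
    # value_vs(state, strategy): our estimated payoff if play continues
    # from `state` with the opponent following `strategy`.
    return min(value_vs(state, s) for s in continuation_strategies)

values = {"aggressive": 3.0, "passive": 5.0, "balanced": 1.5}
print(leaf_value("leaf", values, lambda _, s: values[s]))  # 1.5
```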
00:17:23 And you were asking about DeepStack.
00:17:26 What they did, it was very different than what we do,
00:17:29 either in Libratus or in this new work.
00:17:32 They were randomly generating various situations
00:17:35 in the game.
00:17:36 Then they were doing the look ahead
00:17:38 from there to the end of the game,
00:17:39 as if that was the start of a different game.
00:17:42 And then they were using deep learning
00:17:44 to learn those values of those states,
00:17:47 but the states were not just the physical states.
00:17:50 They include belief distributions.
00:17:52 When you talk about look ahead for DeepStack
00:17:56 or with Libratus, does it mean,
00:17:59 considering every possibility that the game can evolve,
00:18:02 are we talking about extremely,
00:18:04 sort of this exponentially growth of a tree?
00:18:06 Yes, so we’re talking about exactly that.
00:18:11 Much like you do in alpha beta search
00:18:14 or Monte Carlo tree search, but with different techniques.
00:18:17 So there’s a different search algorithm.
00:18:19 And then we have to deal with the leaves differently.
00:18:21 So if you think about what Libratus did,
00:18:24 we didn’t have to worry about this
00:18:25 because we only did it at the end of the game.
00:18:28 So we would always terminate into a real situation
00:18:32 and we would know what the payout is.
00:18:34 It didn’t do these depth limited lookaheads,
00:18:36 but now in this new paper, which is called depth limited,
00:18:40 I think it’s called depth limited search
00:18:42 for imperfect information games,
00:18:43 we can actually do sound depth limited lookahead.
00:18:47 So we can actually start to do the look ahead
00:18:49 from the beginning of the game on,
00:18:51 because that’s too complicated to do
00:18:53 for this whole long game.
00:18:54 So in Libratus, we were just doing it for the end.
00:18:57 So, on the other side, this belief distribution,
00:19:00 is it explicitly modeled what kind of beliefs
00:19:05 the opponent might have?
00:19:07 Yeah, it is explicitly modeled, but it’s not assumed.
00:19:11 The beliefs are actually output, not input.
00:19:15 Of course, the starting beliefs are input,
00:19:18 but they just fall from the rules of the game
00:19:20 because we know that the dealer deals uniformly
00:19:23 from the deck, so I know that every pair of cards
00:19:27 that you might have is equally likely.
00:19:30 I know that for a fact, that just follows
00:19:32 from the rules of the game.
00:19:33 Of course, except the two cards that I have,
00:19:35 I know you don’t have those.
00:19:36 Yeah.
00:19:37 You have to take that into account.
00:19:38 That’s called card removal and that’s very important.
00:19:40 Is the dealing always coming from a single deck
00:19:43 in Heads Up, so you can assume.
00:19:45 Single deck, so you know that if I have the ace of spades,
00:19:50 I know you don’t have an ace of spades.
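The uniform prior with card removal is simple combinatorics; a quick check:

```python
# Card removal: before any betting, the opponent's hole cards are
# uniform over the 50 cards I cannot see.
from math import comb

print(comb(52, 2))      # 1326 two-card hands from a full deck
print(comb(50, 2))      # 1225 once my own two cards are excluded
print(1 / comb(50, 2))  # prior probability of any specific opponent hand
```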
00:19:53 Great, so in the beginning, your belief is basically
00:19:56 the fact that it’s a fair dealing of hands,
00:19:59 but how do you start to adjust that belief?
00:20:02 Well, that’s where this beauty of game theory comes.
00:20:06 So Nash equilibrium, which John Nash introduced in 1950,
00:20:10 introduces what rational play is
00:20:13 when you have more than one player.
00:20:16 And these are pairs of strategies
00:20:18 where strategies are contingency plans,
00:20:20 one for each player.
00:20:22 So that neither player wants to deviate
00:20:25 to a different strategy,
00:20:26 given that the other doesn’t deviate.
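Formally, in standard notation, a pair of strategies (σ₁*, σ₂*) is a Nash equilibrium when neither player gains by deviating unilaterally:

```latex
u_1(\sigma_1^*, \sigma_2^*) \ge u_1(\sigma_1, \sigma_2^*) \quad \forall \sigma_1,
\qquad
u_2(\sigma_1^*, \sigma_2^*) \ge u_2(\sigma_1^*, \sigma_2) \quad \forall \sigma_2 .
```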
00:20:29 But as a side effect, you get the beliefs from Bayes’ rule.
00:20:33 So in these imperfect information games,
00:20:38 Nash equilibrium doesn’t just define strategies.
00:20:41 It also defines beliefs for both of us
00:20:44 and defines beliefs for each state.
00:20:48 So at each state, there’s what’s called an information set.
00:20:53 At each information set in the game,
00:20:55 there’s a set of different states that we might be in,
00:20:59 but I don’t know which one we’re in.
00:21:01 Nash equilibrium tells me exactly
00:21:03 what is the probability distribution
00:21:05 over those real world states in my mind.
00:21:08 How does Nash equilibrium give you that distribution?
00:21:11 So why?
00:21:12 I’ll do a simple example.
00:21:13 So you know the game Rock, Paper, Scissors?
00:21:16 So we can draw it as player one moves first
00:21:20 and then player two moves.
00:21:21 But of course, it’s important that player two
00:21:24 doesn’t know what player one moved,
00:21:26 otherwise player two would win every time.
00:21:28 So we can draw that as an information set
00:21:30 where player one makes one of three moves first,
00:21:33 and then there’s an information set for player two.
00:21:36 So player two doesn’t know which of those nodes
00:21:39 the world is in.
00:21:41 But once we know the strategy for player one,
00:21:44 Nash equilibrium will say that you play one third Rock,
00:21:47 one third Paper, one third Scissors.
00:21:49 From that, I can derive my beliefs on the information set,
00:21:52 that they’re one third, one third, one third.
00:21:54 So Bayes gives you that.
00:21:56 Bayes gives you.
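Written out for the Rock, Paper, Scissors example, Bayes’ rule at player two’s information set gives, under uniform equilibrium play by player one:

```latex
P(\text{Rock} \mid \text{info set reached})
 = \frac{\sigma_1(\text{Rock})}
        {\sigma_1(\text{Rock}) + \sigma_1(\text{Paper}) + \sigma_1(\text{Scissors})}
 = \frac{1/3}{1} = \frac{1}{3}.
```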
00:21:57 But is that specific to a particular player,
00:21:59 or is it something you quickly update
00:22:03 with the specific player?
00:22:05 No, the game theory isn’t really player specific.
00:22:08 So that’s also why we don’t need any data.
00:22:11 We don’t need any history
00:22:12 how these particular humans played in the past
00:22:14 or how any AI or human had played before.
00:22:17 It’s all about rationality.
00:22:20 So the AI just thinks about
00:22:22 what would a rational opponent do?
00:22:24 And what would I do if I am rational?
00:22:28 And that’s the idea of game theory.
00:22:31 So it’s really a data free, opponent free approach.
00:22:35 So it comes from the design of the game
00:22:37 as opposed to the design of the player.
00:22:40 Exactly, there’s no opponent modeling per se.
00:22:43 I mean, we’ve done some work on combining opponent modeling
00:22:45 with game theory so you can exploit weak players even more,
00:22:48 but that’s another strand.
00:22:50 And in Libratus, we didn’t turn that on.
00:22:52 So I decided that these players are too good.
00:22:55 And when you start to exploit an opponent,
00:22:58 you typically open yourself up to exploitation.
00:23:01 And these guys have so few holes to exploit
00:23:04 and they’re world’s leading experts in counter exploitation.
00:23:06 So I decided that we’re not gonna turn that stuff on.
00:23:09 Actually, I saw a few of your papers exploiting opponents.
00:23:12 It sounded very interesting to explore.
00:23:15 Do you think there’s room for exploitation
00:23:17 generally outside of Libratus?
00:23:19 Is there a subject or people differences
00:23:24 that could be exploited, maybe not just in poker,
00:23:27 but in general interactions and negotiations,
00:23:30 all these other domains that you’re considering?
00:23:33 Yeah, definitely.
00:23:34 We’ve done some work on that.
00:23:35 And I really like the hybrid approach there, too.
00:23:39 So you figure out what would a rational opponent do.
00:23:43 And by the way, that’s safe in these zero sum games,
00:23:46 two player zero sum games,
00:23:47 because if the opponent does something irrational,
00:23:49 yes, it might throw off my beliefs,
00:23:53 but the amount that the player can gain
00:23:55 by throwing off my belief is always less
00:23:59 than they lose by playing poorly.
00:24:01 So it’s safe.
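The safety property being described is the maximin guarantee of two-player zero-sum games: a Nash (equivalently, maximin) strategy σ₁* secures the game value v no matter how the opponent plays,

```latex
\min_{\sigma_2} u_1(\sigma_1^*, \sigma_2) \;=\; v \;=\;
\max_{\sigma_1} \min_{\sigma_2} u_1(\sigma_1, \sigma_2),
```

so irrational play by the opponent can only raise, never lower, our expected payoff relative to v.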
00:24:03 But still, if somebody’s weak as a player,
00:24:06 you might wanna play differently to exploit them more.
00:24:10 So you can think about it this way,
00:24:12 a game theoretic strategy is unbeatable,
00:24:15 but it doesn’t maximally beat the other opponent.
00:24:19 So the winnings per hand might be better
00:24:22 with a different strategy.
00:24:24 And the hybrid is that you start
00:24:25 from a game theoretic approach.
00:24:27 And then as you gain data about the opponent
00:24:30 in certain parts of the game tree,
00:24:32 then in those parts of the game tree,
00:24:34 you start to tweak your strategy more and more
00:24:37 towards exploitation while still staying fairly close
00:24:40 to the game theoretic strategy
00:24:42 so as to not open yourself up to exploitation too much.
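A minimal sketch of that hybrid idea (my illustration, with made-up parameters, not the published method): shade from the equilibrium strategy toward a best response as data on the opponent accumulates in a given part of the game tree, with a cap on how far you drift.

```python
# Sketch: blend an (approximate) Nash strategy with an exploitative
# one; the blend weight grows with observations but is capped so we
# never drift too far from the unbeatable baseline. Parameters are
# hypothetical.
def hybrid_action_probs(nash_probs, exploit_probs, n_observations,
                        max_shift=0.2, n_needed=500):
    alpha = max_shift * min(1.0, n_observations / n_needed)
    return [(1 - alpha) * p + alpha * q
            for p, q in zip(nash_probs, exploit_probs)]

print(hybrid_action_probs([1/3, 1/3, 1/3], [1.0, 0.0, 0.0], 250))
# -> [0.4, 0.3, 0.3]: a mild tilt toward the exploitative action
```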
00:24:46 How do you do that?
00:24:48 Do you try to vary up strategies, make it unpredictable?
00:24:53 It’s like, what is it, tit for tat strategies
00:24:57 in Prisoner’s Dilemma or?
00:25:00 Well, that’s a repeated game.
00:25:03 Repeated games.
00:25:04 Simple Prisoner’s Dilemma, repeated games.
00:25:06 But even there, there’s no proof that says
00:25:08 that that’s the best thing.
00:25:10 But experimentally, it actually does well.
00:25:13 So what kind of games are there, first of all?
00:25:15 I don’t know if this is something
00:25:17 that you could just summarize.
00:25:18 There’s perfect information games
00:25:20 where all the information’s on the table.
00:25:22 There is imperfect information games.
00:25:25 There’s repeated games that you play over and over.
00:25:28 There’s zero sum games.
00:25:31 There’s non zero sum games.
00:25:34 And then there’s a really important distinction
00:25:37 you’re making, two player versus more players.
00:25:40 So what are, what other games are there?
00:25:44 And what’s the difference, for example,
00:25:46 with this two player game versus more players?
00:25:50 What are the key differences in your view?
00:25:51 So let me start from the basics.
00:25:54 So a repeated game is a game where the same exact game
00:25:59 is played over and over.
00:26:01 In these extensive form games, where it’s,
00:26:05 think about tree form, maybe with these information sets
00:26:08 to represent incomplete information,
00:26:11 you can have kind of repetitive interactions.
00:26:14 Even repeated games are a special case of that, by the way.
00:26:17 But the game doesn’t have to be exactly the same.
00:26:21 It’s like in sourcing auctions.
00:26:23 Yes, we’re gonna see the same supply base year to year,
00:26:26 but what I’m buying is a little different every time.
00:26:28 And the supply base is a little different every time
00:26:31 and so on.
00:26:31 So it’s not really repeated.
00:26:33 So to find a purely repeated game
00:26:35 is actually very rare in the world.
00:26:37 So they’re really a very coarse model of what’s going on.
00:26:42 Then if you move up from just repeated,
00:26:46 simple repeated matrix games,
00:26:49 not all the way to extensive form games,
00:26:50 but in between, they’re stochastic games,
00:26:53 where, you know, there’s these,
00:26:57 you think about it like these little matrix games.
00:27:00 And when you take an action and your opponent takes an action,
00:27:04 they determine not which next state I’m going to,
00:27:07 next game I’m going to,
00:27:09 but the distribution over next games
00:27:11 where I might be going to.
00:27:13 So that’s the stochastic game.
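In the usual formal shorthand, a two-player stochastic game is a tuple

```latex
G = \langle S,\; A_1,\; A_2,\; P,\; r_1,\; r_2 \rangle,
\qquad P(s' \mid s, a_1, a_2),
```

where the joint action at state s induces a distribution P over the next state, the next “little matrix game,” rather than a single deterministic successor.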
00:27:15 So it’s like matrix games, repeated games, stochastic games,
00:27:19 extensive form games.
00:27:20 That is from less to more general.
00:27:23 And poker is an example of the last one.
00:27:26 So it’s really in the most general setting.
00:27:29 Extensive form games.
00:27:30 And that’s kind of what the AI community has been working on
00:27:34 and being benchmarked on
00:27:36 with this Heads Up No Limit Texas Holdem.
00:27:38 Can you describe extensive form games?
00:27:39 What’s the model here?
00:27:41 Yeah, so if you’re familiar with the tree form,
00:27:44 so it’s really the tree form.
00:27:45 Like in chess, there’s a search tree.
00:27:47 Versus a matrix.
00:27:48 Versus a matrix, yeah.
00:27:50 And the matrix is called the matrix form
00:27:53 or bi matrix form or normal form game.
00:27:55 And here you have the tree form.
00:27:57 So you can actually do certain types of reasoning there
00:28:00 that you lose the information when you go to normal form.
00:28:04 There’s a certain form of equivalence.
00:28:07 Like if you go from the tree form and you say that
00:28:08 every possible contingency plan is a strategy.
00:28:12 Then I can actually go back to the normal form,
00:28:15 but I lose some information from the lack of sequentiality.
00:28:18 Then the multiplayer versus two player distinction
00:28:21 is an important one.
00:28:22 So two player games in zero sum
00:28:27 are conceptually easier and computationally easier.
00:28:32 They’re still huge like this one,
00:28:36 but they’re conceptually easier and computationally easier
00:28:39 in that conceptually, you don’t have to worry about
00:28:42 which equilibrium is the other guy going to play
00:28:45 when there are multiple,
00:28:46 because any equilibrium strategy is a best response
00:28:49 to any other equilibrium strategy.
00:28:52 So I can play a different equilibrium from you
00:28:54 and we’ll still get the right values of the game.
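That interchangeability property can be written compactly:

```latex
(\sigma_1, \sigma_2),\; (\sigma_1', \sigma_2') \text{ equilibria}
\;\Longrightarrow\;
(\sigma_1, \sigma_2'),\; (\sigma_1', \sigma_2) \text{ equilibria, all with the same value } v.
```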
00:28:57 That falls apart even with two players
00:28:59 when you have general sum games.
00:29:01 Even without cooperation just in general.
00:29:03 Even without cooperation.
00:29:04 So there’s a big gap from two player zero sum
00:29:07 to two player general sum or even to three player zero sum.
00:29:11 That’s a big gap, at least in theory.
00:29:14 Can you maybe non mathematically provide the intuition
00:29:18 why it all falls apart with three or more players?
00:29:22 It seems like you should still be able to have
00:29:24 a Nash equilibrium that’s instructive, that holds.
00:29:31 Okay, so it is true that all finite games
00:29:36 have a Nash equilibrium.
00:29:38 So this is what John Nash actually proved.
00:29:41 So they do have a Nash equilibrium.
00:29:42 That’s not the problem.
00:29:43 The problem is that there can be many.
00:29:46 And then there’s a question of which equilibrium to select.
00:29:50 So, and if you select your strategy
00:29:52 from a different equilibrium and I select mine,
00:29:57 then what does that mean?
00:29:59 And in these non zero sum games,
00:30:02 we may lose some joint benefit
00:30:05 by being just simply stupid.
00:30:07 We could actually both be better off
00:30:08 if we did something else.
00:30:09 And in three player, you get other problems
00:30:11 also like collusion.
00:30:13 Like maybe you and I can gang up on a third player
00:30:16 and we can do radically better by colluding.
00:30:19 So there are lots of issues that come up there.
00:30:22 So Noam Brown, the student you work with on this
00:30:25 has mentioned, I looked through the AMA on Reddit.
00:30:29 He mentioned that the ability of poker players
00:30:31 to collaborate would make the game harder.
00:30:33 He was asked the question of,
00:30:35 how would you make the game of poker,
00:30:37 or both of you were asked the question,
00:30:39 how would you make the game of poker
00:30:41 beyond being solvable by current AI methods?
00:30:47 And he said that there’s not many ways
00:30:50 of making poker more difficult,
00:30:53 but a collaboration or cooperation between players
00:30:57 would make it extremely difficult.
00:30:59 So can you provide the intuition behind why that is,
00:31:03 if you agree with that idea?
00:31:05 Yeah, so I’ve done a lot of work on coalitional games
00:31:10 and we actually have a paper here
00:31:11 with my other student Gabriele Farina
00:31:13 and some other collaborators at NIPS on that.
00:31:16 Actually just came back from the poster session
00:31:18 where we presented this.
00:31:19 But so when you have a collusion, it’s a different problem.
00:31:23 And it typically gets even harder then.
00:31:27 Even the game representations,
00:31:29 some of the game representations don’t really allow
00:31:33 good computation.
00:31:34 So we actually introduced a new game representation
00:31:37 for that.
00:31:38 Is that kind of cooperation part of the model?
00:31:42 Do you have information
00:31:44 about the fact that other players are cooperating,
00:31:47 or is it just this chaos where nothing is known?
00:31:50 So there’s some things unknown.
00:31:52 Can you give an example of a collusion type game
00:31:55 or is it usually?
00:31:56 So like bridge.
00:31:58 So think about bridge.
00:31:59 It’s like when you and I are on a team,
00:32:02 our payoffs are the same.
00:32:04 The problem is that we can’t talk.
00:32:06 So when I get my cards, I can’t whisper to you
00:32:09 what my cards are.
00:32:10 That would not be allowed.
00:32:12 So we have to somehow coordinate our strategies
00:32:16 ahead of time and only ahead of time.
00:32:19 And then there’s certain signals we can talk about,
00:32:22 but they have to be such that the other team
00:32:25 also understands them.
00:32:26 So that’s an example where the coordination
00:32:30 is already built into the rules of the game.
00:32:33 But in many other situations like auctions
00:32:35 or negotiations or diplomatic relationships, poker,
00:32:40 it’s not really built in, but it still can be very helpful
00:32:44 for the colluders.
00:32:45 I’ve read you write somewhere that
00:32:48 in negotiations you come to the table with a prior,
00:32:52 like a strategy of what you’re willing to do
00:32:56 and not willing to do, those kinds of things.
00:32:58 So how do you start to now moving away from poker,
00:33:01 moving beyond poker into other applications
00:33:04 like negotiations, how do you start applying this
00:33:07 to other domains, even real world domains
00:33:11 that you’ve worked on?
00:33:12 Yeah, I actually have two startup companies
00:33:14 doing exactly that.
00:33:15 One is called Strategic Machine,
00:33:17 and that’s for kind of business applications,
00:33:20 gaming, sports, all sorts of things like that.
00:33:22 Any applications of this to business and to sports
00:33:27 and to gaming, to various types of things
00:33:32 in finance, electricity markets and so on.
00:33:34 And the other is called Strategy Robot,
00:33:36 where we are taking these to military security,
00:33:40 cyber security and intelligence applications.
00:33:43 I think you worked a little bit in,
00:33:48 how do you put it, advertisement,
00:33:51 sort of suggesting ads, auctions.
00:33:55 That’s another company, Optimized Markets.
00:33:57 But that’s much more about a combinatorial market
00:34:00 and optimization based technology.
00:34:02 That’s not using these game theoretic reasoning technologies.
00:34:06 I see, okay. So at a high level,
00:34:11 how do you think about our ability to use
00:34:15 game theoretic concepts to model human behavior?
00:34:18 Do you think human behavior is amenable
00:34:21 to this kind of modeling outside of the poker games,
00:34:24 and where have you seen it done successfully in your work?
00:34:27 I’m not sure the goal really is modeling humans.
00:34:33 Like for example, if I’m playing a zero sum game,
00:34:36 I don’t really care that the opponent
00:34:39 is actually following my model of rational behavior,
00:34:42 because if they’re not, that’s even better for me.
00:34:46 Right, so see with the opponents in games,
00:34:51 the prerequisite is that you formalize
00:34:56 the interaction in some way
00:34:57 that can be amenable to analysis.
00:35:01 And you’ve done this amazing work with mechanism design,
00:35:04 designing games that have certain outcomes.
00:35:10 But, so I’ll tell you an example
00:35:12 from my world of autonomous vehicles, right?
00:35:15 We’re studying pedestrians,
00:35:17 and pedestrians and cars negotiate
00:35:20 in this nonverbal communication.
00:35:22 There’s this weird game dance of tension
00:35:25 where pedestrians are basically saying,
00:35:27 I trust that you won’t kill me,
00:35:28 and so as a jaywalker, I will step onto the road
00:35:31 even though I’m breaking the law, and there’s this tension.
00:35:34 And the question is, we really don’t know
00:35:36 how to model that well in trying to model intent.
00:35:40 And so people sometimes bring up ideas
00:35:43 of game theory and so on.
00:35:44 Do you think that aspect of human behavior
00:35:49 can use these kinds of imperfect information approaches,
00:35:53 modeling, how do you start to attack a problem like that
00:35:57 when you don’t even know how to design the game
00:36:00 to describe the situation in order to solve it?
00:36:04 Okay, so I haven’t really thought about jaywalking,
00:36:06 but one thing that I think could be a good application
00:36:10 in autonomous vehicles is the following.
00:36:13 So let’s say that you have fleets of autonomous cars
00:36:16 operating by different companies.
00:36:18 So maybe here’s the Waymo fleet and here’s the Uber fleet.
00:36:22 If you think about the rules of the road,
00:36:24 they define certain legal rules,
00:36:26 but that still leaves a huge strategy space open.
00:36:30 Like as a simple example, when cars merge,
00:36:32 how humans merge, they slow down and look at each other
00:36:36 and try to merge.
00:36:39 Wouldn’t it be better if these situations
00:36:40 would already be prenegotiated
00:36:43 so we can actually merge at full speed
00:36:45 and we know that this is the situation,
00:36:47 this is how we do it, and it’s all gonna be faster.
00:36:50 But there are way too many situations to negotiate manually.
00:36:54 So you could use automated negotiation,
00:36:56 this is the idea at least,
00:36:57 you could use automated negotiation
00:36:59 to negotiate all of these situations
00:37:02 or many of them in advance.
00:37:04 And of course it might be that,
00:37:05 hey, maybe you’re not gonna always let me go first.
00:37:09 Maybe you said, okay, well, in these situations,
00:37:11 I’ll let you go first, but in exchange,
00:37:13 you’re gonna let me go first
00:37:14 in these other situations.
00:37:17 So it’s this huge combinatorial negotiation.
00:37:20 And do you think there’s room in that example of merging
00:37:24 to model this whole situation
00:37:25 as an imperfect information game
00:37:27 or do you really want to consider it to be a perfect information game?
00:37:30 No, that’s a good question, yeah.
00:37:32 That’s a good question.
00:37:33 Do you pay the price of assuming
00:37:37 that you don’t know everything?
00:37:39 Yeah, I don’t know.
00:37:40 It’s certainly much easier.
00:37:42 Games with perfect information are much easier.
00:37:45 So if you can get away with it, you should.
00:37:49 But if the real situation is of imperfect information,
00:37:52 then you’re gonna have to deal with imperfect information.
00:37:55 Great, so what lessons have you learned
00:37:58 from the Annual Computer Poker Competition?
00:38:00 An incredible accomplishment of AI.
00:38:03 You look at the history of Deep Blue, AlphaGo,
00:38:07 these kind of moments when AI stepped up
00:38:10 in an engineering effort and a scientific effort combined
00:38:13 to beat the best of human players.
00:38:16 So what do you take away from this whole experience?
00:38:19 What have you learned about designing AI systems
00:38:22 that play these kinds of games?
00:38:23 And what does that mean for AI in general,
00:38:28 for the future of AI development?
00:38:30 Yeah, so that’s a good question.
00:38:32 So there’s so much to say about it.
00:38:35 I do like this type of performance oriented research.
00:38:39 Although in my group, we go all the way from like idea
00:38:42 to theory, to experiments, to big system building,
00:38:44 to commercialization, so we span that spectrum.
00:38:47 But I think that in a lot of situations in AI,
00:38:51 you really have to build the big systems
00:38:53 and evaluate them at scale
00:38:55 before you know what works and doesn’t.
00:38:57 And we’ve seen that in the computational
00:39:00 game theory community, that there are a lot of techniques
00:39:02 that look good in the small,
00:39:05 but then they cease to look good in the large.
00:39:07 And we’ve also seen that there are a lot of techniques
00:39:10 that look superior in theory.
00:39:13 And I really mean in terms of convergence rates:
00:39:16 first order methods have better convergence rates
00:39:18 than the CFR based algorithms,
00:39:20 yet the CFR based algorithms are the fastest in practice.
00:39:24 So it really tells me that you have to test this in reality.
00:39:28 The theory isn’t tight enough, if you will,
00:39:30 to tell you which algorithms are better than the others.
00:39:34 And you have to look at these things in the large,
00:39:38 because any sort of projections you do from the small
00:39:41 can at least in this domain be very misleading.
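As a taste of what a CFR-style method looks like in practice, here is a toy regret-matching loop (the core update inside counterfactual regret minimization) solving Rock, Paper, Scissors by self-play; the average strategy converges toward the uniform equilibrium. This is an illustration only, not Libratus’s solver.

```python
# Regret matching on Rock, Paper, Scissors via self-play: the average
# strategy approaches the (1/3, 1/3, 1/3) Nash equilibrium.
import random

ACTIONS = 3  # rock, paper, scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row player's payoff

def strategy_from_regrets(regrets):
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1 / ACTIONS] * ACTIONS

regrets = [0.0] * ACTIONS
strategy_sum = [0.0] * ACTIONS
for _ in range(100_000):
    strat = strategy_from_regrets(regrets)
    strategy_sum = [s + p for s, p in zip(strategy_sum, strat)]
    me = random.choices(range(ACTIONS), weights=strat)[0]
    opp = random.choices(range(ACTIONS), weights=strat)[0]  # self-play
    for a in range(ACTIONS):  # regret: how much better action a would have done
        regrets[a] += PAYOFF[a][opp] - PAYOFF[me][opp]
print([round(s / sum(strategy_sum), 3) for s in strategy_sum])
```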
00:39:43 So that’s kind of from a science
00:39:46 and engineering perspective, from a personal perspective,
00:39:49 it’s been just a wild experience
00:39:51 in that with the first poker competition,
00:39:54 the first brains versus AI,
00:39:57 man machine poker competition that we organized.
00:39:59 There had been, by the way, for other poker games,
00:40:01 there had been previous competitions,
00:40:03 but this was for Heads Up No Limit, this was the first.
00:40:06 And I probably became the most hated person
00:40:09 in the world of poker.
00:40:10 And I didn’t mean to.
00:40:12 Why is that?
00:40:13 For cracking the game, or something.
00:40:15 Yeah, a lot of people felt that it was a real threat
00:40:20 to the whole game, the whole existence of the game.
00:40:22 If AI becomes better than humans,
00:40:26 people would be scared to play poker
00:40:28 because there are these superhuman AIs running around
00:40:30 taking their money and all of that.
00:40:32 So I just, it’s just really aggressive.
00:40:36 The comments were super aggressive.
00:40:37 I got everything just short of death threats.
00:40:40 Do you think the same was true for chess?
00:40:44 Because right now they just completed
00:40:45 the world championships in chess,
00:40:47 and humans just started ignoring the fact
00:40:49 that there’s AI systems now that outperform humans
00:40:52 and they still enjoy the game, it’s still a beautiful game.
00:40:55 That’s what I think.
00:40:56 And I think the same thing happens in poker.
00:40:58 And so I didn’t think of myself
00:41:01 as somebody who was gonna kill the game,
00:41:02 and I don’t think I did.
00:41:03 I’ve really learned to love this game.
00:41:05 I wasn’t a poker player before,
00:41:06 but learned so many nuances about it from these AIs,
00:41:10 and they’ve really changed how the game is played,
00:41:12 by the way.
00:41:13 So they have these very Martian ways of playing poker,
00:41:16 and the top humans are now incorporating
00:41:18 those types of strategies into their own play.
00:41:21 So if anything, to me, our work has made poker
00:41:26 a richer, more interesting game for humans to play,
00:41:29 not something that is gonna steer humans
00:41:32 away from it entirely.
00:41:34 Just a quick comment on something you said,
00:41:35 which, if I may say so,
00:41:39 is a little bit rare in academia sometimes.
00:41:42 It’s pretty brave to put your ideas to the test
00:41:45 in the way you described,
00:41:47 saying that sometimes good ideas don’t work
00:41:49 when you actually try to apply them at scale.
00:41:52 So where does that come from?
00:41:54 I mean, if you could do advice for people,
00:41:58 what drives you in that sense?
00:42:00 Were you always this way?
00:42:02 I mean, it takes a brave person.
00:42:04 I guess is what I’m saying, to test their ideas
00:42:06 and to see if this thing actually works
00:42:08 against human top human players and so on.
00:42:11 Yeah, I don’t know about brave,
00:42:12 but it takes a lot of work.
00:42:15 It takes a lot of work and a lot of time
00:42:18 to organize, to make something big
00:42:20 and to organize an event and stuff like that.
00:42:22 And what drives you in that effort?
00:42:24 Because you could still, I would argue,
00:42:26 get a best paper award at NIPS as you did in 17
00:42:30 without doing this.
00:42:31 That’s right, yes.
00:42:32 And so in general, I believe it’s very important
00:42:37 to do things in the real world and at scale.
00:42:41 And that’s really where, if you will,
00:42:46 the proof of the pudding is, that’s where it is.
00:42:48 In this particular case,
00:42:50 it was kind of a competition between different groups
00:42:55 and for many years as to who can be the first one
00:42:59 to beat the top humans at Heads Up No Limit Texas Holdem.
00:43:02 So it became kind of like a competition who can get there.
00:43:09 Yeah, so a little friendly competition
00:43:11 could do wonders for progress.
00:43:14 Yes, absolutely.
00:43:16 So the topic of mechanism design,
00:43:19 which is really interesting, also kind of new to me,
00:43:22 except as an observer of, I don’t know, politics and any,
00:43:25 I’m an observer of mechanisms,
00:43:27 but you write in your paper an automated mechanism design
00:43:31 that I quickly read.
00:43:34 So mechanism design is designing the rules of the game
00:43:37 so you get a certain desirable outcome.
00:43:40 And you have this work on doing so in an automatic fashion
00:43:44 as opposed to fine tuning it.
00:43:46 So what have you learned from those efforts?
00:43:50 If you look, say, I don’t know,
00:43:52 at complex systems like our political system,
00:43:56 can we design our political system
00:43:58 to have, in an automated fashion,
00:44:01 to have outcomes that we want?
00:44:03 Can we design something like traffic lights to be smart
00:44:09 where it gets outcomes that we want?
00:44:11 So what are the lessons that you draw from that work?
00:44:14 Yeah, so I still very much believe
00:44:17 in the automated mechanism design direction.
00:44:19 Yes.
00:44:20 But it’s not a panacea.
00:44:23 There are impossibility results in mechanism design
00:44:26 saying that there is no mechanism that accomplishes
00:44:30 objective X in class C.
00:44:33 So it’s not going to,
00:44:36 there’s no way using any mechanism design tools,
00:44:39 manual or automated,
00:44:41 to do certain things in mechanism design.
00:44:42 Can you describe that again?
00:44:43 So meaning it’s impossible to achieve that?
00:44:47 Yeah, yeah.
00:44:48 And it’s unlikely.
00:44:50 Impossible.
00:44:51 Impossible.
00:44:52 So these are not statements about human ingenuity
00:44:55 who might come up with something smart.
00:44:57 These are proofs that if you want to accomplish
00:44:59 properties X in class C,
00:45:02 that is not doable with any mechanism.
00:45:04 The good thing about automated mechanism design
00:45:07 is that we’re not really designing for a class.
00:45:10 We’re designing for specific settings at a time.
00:45:14 So even if there’s an impossibility result
00:45:16 for the whole class,
00:45:18 it just doesn’t mean that all of the cases
00:45:21 in the class are impossible.
00:45:22 It just means that some of the cases are impossible.
00:45:25 So we can actually carve these islands of possibility
00:45:28 within these known impossible classes.
00:45:30 And we’ve actually done that.
00:45:31 So one of the famous results in mechanism design
00:45:35 is the Myerson-Satterthwaite theorem
00:45:37 by Roger Myerson and Mark Satterthwaite from 1983.
00:45:41 It’s an impossibility of efficient trade
00:45:43 under imperfect information.
00:45:45 We show that you can, in many settings,
00:45:48 avoid that and get efficient trade anyway.
00:45:51 Depending on how they design the game, okay.
00:45:54 Depending how you design the game.
00:45:55 And of course, it doesn’t in any way
00:46:00 contradict the impossibility result.
00:46:01 The impossibility result is still there,
00:46:03 but it just finds spots within this impossible class
00:46:08 where in those spots, you don’t have the impossibility.
00:46:12 Sorry if I’m going a bit philosophical,
00:46:14 but what lessons do you draw towards,
00:46:17 like I mentioned, politics or human interaction
00:46:20 and designing mechanisms for outside of just
00:46:24 these kinds of trading or auctioning
00:46:26 or purely formal games or human interaction,
00:46:33 like a political system?
00:46:34 How, do you think it’s applicable to, yeah, politics
00:46:39 or to business, to negotiations, these kinds of things,
00:46:46 designing rules that have certain outcomes?
00:46:49 Yeah, yeah, I do think so.
00:46:51 Have you seen that successfully done?
00:46:54 They haven’t really, oh, you mean mechanism design
00:46:56 or automated mechanism design?
00:46:57 Automated mechanism design.
00:46:59 So mechanism design itself
00:47:01 has had fairly limited success so far.
00:47:06 There are certain cases,
00:47:07 but most of the real world situations
00:47:10 are actually not sound from a mechanism design perspective,
00:47:14 even in those cases where they’ve been designed
00:47:16 by very knowledgeable mechanism design people,
00:47:20 the people are typically just taking some insights
00:47:22 from the theory and applying those insights
00:47:25 into the real world,
00:47:26 rather than applying the mechanisms directly.
00:47:29 So one famous example is the FCC spectrum auctions.
00:47:33 So I’ve also had a small role in that
00:47:36 and very good economists have been working,
00:47:40 excellent economists have been working on that
00:47:42 with no game theory,
00:47:44 yet the rules that are designed in practice there,
00:47:47 they’re such that bidding truthfully
00:47:49 is not the best strategy.
00:47:51 Usually mechanism design,
00:47:52 we try to make things easy for the participants.
00:47:56 So telling the truth is the best strategy,
00:47:58 but even in those very high stakes auctions
00:48:01 where you have tens of billions of dollars
00:48:03 worth of spectrum being auctioned,
00:48:06 truth telling is not the best strategy.
00:48:08 And by the way,
00:48:10 nobody knows even a single optimal bidding strategy
00:48:12 for those auctions.
00:48:14 What’s the challenge of coming up with an optimal strategy,
00:48:15 because there’s a lot of players and there’s imperfect information?
00:48:18 It’s not so much that a lot of players,
00:48:20 but many items for sale,
00:48:22 and these mechanisms are such that even with just two items
00:48:26 or one item, bidding truthfully
00:48:28 wouldn’t be the best strategy.
00:48:31 If you look at the history of AI,
00:48:34 it’s marked by seminal events.
00:48:37 AlphaGo beating a world champion human Go player,
00:48:40 I would put Libratus winning the Heads Up No Limit Holdem
00:48:43 as one of such event.
00:48:45 Thank you.
00:48:46 And what do you think is the next such event,
00:48:52 whether it’s in your life or in the broader AI community
00:48:56 that you think might be out there
00:48:59 that would surprise the world?
00:49:01 So that’s a great question,
00:49:02 and I don’t really know the answer.
00:49:04 In terms of game solving,
00:49:07 Heads Up No Limit Texas Holdem
00:49:08 really was the one remaining widely agreed upon benchmark.
00:49:14 So that was the big milestone.
00:49:15 Now, are there other things?
00:49:17 Yeah, certainly there are,
00:49:18 but there’s not one that the community
00:49:21 has kind of focused on.
00:49:22 So what could be other things?
00:49:25 There are groups working on StarCraft.
00:49:27 There are groups working on Dota 2.
00:49:29 These are video games.
00:49:31 Or you could have like Diplomacy or Hanabi,
00:49:36 things like that.
00:49:37 These are like recreational games,
00:49:38 but none of them are really acknowledged
00:49:42 as kind of the main next challenge problem,
00:49:45 like chess or Go or Heads Up No Limit Texas Holdem was.
00:49:50 So I don’t really know in the game solving space
00:49:52 what is or what will be the next benchmark.
00:49:55 I kind of hope that there will be a next benchmark
00:49:57 because really the different groups
00:49:59 working on the same problem
00:50:01 really drove these application independent techniques
00:50:05 forward very quickly over 10 years.
00:50:07 Do you think there’s an open problem
00:50:09 that excites you that you start moving away
00:50:11 from games into real world games,
00:50:15 like say the stock market trading?
00:50:17 Yeah, so that’s kind of how I am.
00:50:19 So I am probably not going to work
00:50:23 as hard on these recreational benchmarks.
00:50:27 I’m doing two startups on game solving technology,
00:50:30 Strategic Machine and Strategy Robot,
00:50:32 and we’re really interested
00:50:34 in pushing this stuff into practice.
00:50:36 What do you think would be really
00:50:43 a powerful result that would be surprising
00:50:45 that would be, if you can say,
00:50:49 I mean, five years, 10 years from now,
00:50:53 something that statistically you would say
00:50:56 is not very likely,
00:50:57 but if there’s a breakthrough, would achieve?
00:51:01 Yeah, so I think that overall,
00:51:03 we’re in a very different situation in game theory
00:51:09 than we are in, let’s say, machine learning.
00:51:11 So in machine learning, it’s a fairly mature technology
00:51:14 and it’s very broadly applied
00:51:16 and proven success in the real world.
00:51:19 In game solving, there are almost no applications yet.
00:51:24 We have just become superhuman,
00:51:26 which machine learning you could argue happened in the 90s,
00:51:29 if not earlier,
00:51:30 and at least on supervised learning,
00:51:32 certain complex supervised learning applications.
00:51:36 Now, I think the next challenge problem,
00:51:39 I know you’re not asking about it this way,
00:51:40 you’re asking about the technology breakthrough,
00:51:42 but I think that big, big breakthrough
00:51:44 is to be able to show that, hey,
00:51:46 maybe most of, let’s say, military planning
00:51:48 or most of business strategy
00:51:50 will actually be done strategically
00:51:52 using computational game theory.
00:51:54 That’s what I would like to see
00:51:55 as the next five or 10 year goal.
00:51:57 Maybe you can explain to me again,
00:51:59 forgive me if this is an obvious question,
00:52:01 but machine learning methods,
00:52:04 neural networks suffer from not being transparent,
00:52:07 not being explainable.
00:52:09 Game theoretic methods, Nash equilibria,
00:52:12 when you see the different solutions,
00:52:15 say when you talk about military operations,
00:52:19 once you see the strategies,
00:52:21 do they make sense, are they explainable,
00:52:23 or do they suffer from the same problems
00:52:25 as neural networks do?
00:52:27 So that’s a good question.
00:52:28 I would say a little bit yes and no.
00:52:31 And what I mean by that is that
00:52:34 these game theoretic strategies,
00:52:36 let’s say, Nash equilibrium,
00:52:38 it has provable properties.
00:52:40 So it’s unlike, let’s say, deep learning
00:52:42 where you kind of cross your fingers,
00:52:44 hopefully it’ll work.
00:52:45 And then after the fact, when you have the weights,
00:52:47 you’re still crossing your fingers
00:52:48 and hoping it will work.
00:52:51 Here, you know that the solution quality is there.
00:52:55 There’s provable solution quality guarantees.
00:52:58 Now, that doesn’t necessarily mean
00:53:00 that the strategies are human understandable.
00:53:03 That’s a whole other problem.
00:53:04 So I think that deep learning and computational game theory
00:53:08 are in the same boat in that sense,
00:53:10 that both are difficult to understand.
00:53:13 But at least the game theoretic techniques,
00:53:15 they have these guarantees of solution quality.
00:53:19 So do you see business operations, strategic operations,
00:53:22 or even military in the future being
00:53:26 at least the strong candidates
00:53:28 being proposed by automated systems?
00:53:32 Do you see that?
00:53:34 Yeah, I do, I do.
00:53:35 But that’s more of a belief than a substantiated fact.
00:53:39 Depending on where you land in optimism or pessimism,
00:53:42 to me, that’s a really exciting future,
00:53:45 especially if there’s provable things
00:53:48 in terms of optimality.
00:53:50 So looking into the future,
00:53:54 there are a few folks worried,
00:53:58 especially as you look at the game of poker,
00:54:01 which is probably one of the last benchmarks
00:54:03 in terms of games being solved,
00:54:05 they worry about the future
00:54:07 and the existential threats of artificial intelligence,
00:54:10 the negative impact, in whatever form, on society.
00:54:13 Is that something that concerns you as much,
00:54:17 or are you more optimistic about the positive impacts of AI?
00:54:21 Oh, I am much more optimistic about the positive impacts.
00:54:24 So just in my own work, what we’ve done so far,
00:54:27 we run the nationwide kidney exchange.
00:54:29 Hundreds of people are walking around alive today,
00:54:32 who otherwise wouldn’t be.
00:54:34 And it’s increased employment.
00:54:36 You have a lot of people now running kidney exchanges
00:54:39 and at the transplant centers,
00:54:42 interacting with the kidney exchange.
00:54:45 You have extra surgeons, nurses, anesthesiologists,
00:54:49 hospitals, all of that.
00:54:51 So employment is increasing from that
00:54:53 and the world is becoming a better place.
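For context on what a kidney exchange computes: incompatible patient-donor pairs form a directed compatibility graph, and the exchange repeatedly selects disjoint short donation cycles. A toy sketch with an invented graph; the greedy packing at the end is a stand-in for the integer program real exchanges solve:

```python
from itertools import permutations

# Hypothetical compatibility graph: an edge u -> v means pair u's
# donor can give to pair v's patient. Data is invented.
edges = {
    1: {2},
    2: {1, 3},
    3: {4},
    4: {3},
}

def find_cycles(edges, max_len=3):
    """Enumerate short donation cycles (surgeries run simultaneously,
    so real exchanges cap cycle length, typically at 2 or 3)."""
    cycles = set()
    nodes = list(edges)
    for L in range(2, max_len + 1):
        for perm in permutations(nodes, L):
            # keep one canonical rotation per cycle, check every edge
            if min(perm) == perm[0] and all(
                    perm[(i + 1) % L] in edges[perm[i]] for i in range(L)):
                cycles.add(perm)
    return [list(c) for c in cycles]

def best_packing(cycles):
    """Greedy stand-in for the optimization: choose vertex-disjoint
    cycles to maximize the number of transplants."""
    chosen, used = [], set()
    for c in sorted(cycles, key=len, reverse=True):
        if used.isdisjoint(c):
            chosen.append(c)
            used.update(c)
    return chosen

print(best_packing(find_cycles(edges)))  # e.g. [[1, 2], [3, 4]]
```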
00:54:55 Another example is combinatorial sourcing auctions.
00:54:59 We did 800 large scale combinatorial sourcing auctions
00:55:04 from 2001 to 2010 in a previous startup of mine
00:55:08 called CombineNet.
00:55:09 And we increased the supply chain efficiency
00:55:13 on that $60 billion of spend by 12.6%.
00:55:18 So that’s over $6 billion of efficiency improvement
00:55:21 in the world.
00:55:22 And this is not like shifting value
00:55:23 from somebody to somebody else,
00:55:25 just efficiency improvement, like in trucking,
00:55:28 less empty driving, so there’s less waste,
00:55:31 less carbon footprint and so on.
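The computational core of a combinatorial auction is the winner determination problem: select a set of package bids, no two of which share an item, to maximize total value (or minimize cost, on the sourcing side). A brute force sketch over invented bids; real instances are NP-hard and are solved with integer programming:

```python
from itertools import combinations

# Hypothetical package bids: (bidder, set of items, bid value).
bids = [
    ("A", {"lane1", "lane2"}, 10),
    ("B", {"lane2", "lane3"}, 8),
    ("C", {"lane1"}, 4),
    ("D", {"lane3"}, 5),
]

def winner_determination(bids):
    """Brute force: try every subset of bids and keep the best
    item-disjoint one. Exponential, for illustration only."""
    best_value, best_set = 0, ()
    for r in range(1, len(bids) + 1):
        for subset in combinations(bids, r):
            items = [it for _, s, _ in subset for it in s]
            if len(items) == len(set(items)):  # no item sold twice
                value = sum(v for _, _, v in subset)
                if value > best_value:
                    best_value, best_set = value, subset
    return best_value, best_set

value, winners = winner_determination(bids)
print(value, [b[0] for b in winners])  # 15, ['A', 'D']
```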
00:55:33 So a huge positive impact in the near term,
00:55:36 but sort of to stay in it for a little longer,
00:55:40 because I think game theory has a role to play here.
00:55:43 Oh, let me actually come back on that as one thing.
00:55:45 I think AI is also going to make the world much safer.
00:55:49 So that’s another aspect that often gets overlooked.
00:55:53 Well, let me ask this question.
00:55:54 Maybe you can speak to the safer aspect.
00:55:56 So I talked to Max Tegmark and Stuart Russell,
00:55:59 who are very concerned about existential threats of AI.
00:56:02 And often the concern is about value misalignment.
00:56:06 So AI systems basically working,
00:56:11 operating towards goals that are not the same
00:56:14 as those of human civilization, of human beings.
00:56:17 So it seems like game theory has a role to play there
00:56:24 to make sure the values are aligned with human beings.
00:56:27 I don’t know if that’s how you think about it.
00:56:29 If not, how do you think AI might help with this problem?
00:56:35 How do you think AI might make the world safer?
00:56:39 Yeah, I think this value misalignment
00:56:43 is a fairly theoretical worry.
00:56:46 And I haven’t really seen it,
00:56:49 because I do a lot of real applications,
00:56:51 and I don’t see it anywhere.
00:56:53 The closest I’ve seen to it
00:56:55 was really the following type of mental exercise,
00:56:57 where I had this argument in the late eighties
00:57:00 when we were building
00:57:01 these transportation optimization systems.
00:57:03 And somebody had heard that it’s a good idea
00:57:05 to have high utilization of assets.
00:57:08 So they told me, hey, why don’t you put that as objective?
00:57:11 And we didn’t even put it as an objective
00:57:14 because I just showed him that,
00:57:16 if you had that as your objective,
00:57:18 the solution would be to load your trucks full
00:57:20 and drive in circles.
00:57:21 Nothing would ever get delivered.
00:57:23 You’d have a hundred percent utilization.
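The failure mode is easy to reproduce in a toy model: if the objective is utilization alone, the degenerate plan wins, and the fix is to score what you actually want, deliveries. All numbers below are invented:

```python
# Toy transportation plans: (hours trucks are loaded, total hours, deliveries).
plans = {
    "drive_in_circles": (100, 100, 0),   # always full, nothing arrives
    "real_schedule":    (70, 100, 40),
}

def utilization_only(plan):
    loaded, total, _ = plan
    return loaded / total

def deliveries_first(plan, weight=1.0):
    # One sensible fix: make deliveries the objective and treat
    # utilization as, at most, a secondary tie-breaking term.
    loaded, total, delivered = plan
    return delivered + weight * (loaded / total)

print(max(plans, key=lambda p: utilization_only(plans[p])))   # drive_in_circles
print(max(plans, key=lambda p: deliveries_first(plans[p])))   # real_schedule
```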
00:57:25 So yeah, I know this phenomenon.
00:57:27 I’ve known this for over 30 years,
00:57:29 but I’ve never seen it actually be a problem in reality.
00:57:33 And yes, if you have the wrong objective,
00:57:35 the AI will optimize that to the hilt
00:57:37 and it’s gonna hurt more than some human
00:57:39 who’s kind of trying to solve it in a half baked way
00:57:43 with some human insight too.
00:57:45 But I just haven’t seen that materialize in practice.
00:57:49 There’s this gap that you’ve actually put your finger on
00:57:52 very clearly just now between theory and reality.
00:57:57 That’s very difficult to put into words, I think.
00:57:59 It’s the gap between what you can theoretically imagine,
00:58:03 the worst possible case or even, yeah, I mean bad cases,
00:58:08 and what usually happens in reality.
00:58:10 So for example, to me,
00:58:11 maybe it’s something you can comment on;
00:58:15 I myself grew up in the Soviet Union.
00:58:19 There’s currently 10,000 nuclear weapons in the world.
00:58:22 And for many decades,
00:58:24 it’s theoretically surprising to me
00:58:28 that nuclear war has not broken out.
00:58:30 Do you think about this aspect
00:58:33 from a game theoretic perspective in general,
00:58:36 why is that true?
00:58:38 Why in theory you could see
00:58:40 how things would go terribly wrong
00:58:42 and yet somehow they have not?
00:58:44 Yeah, how do you think about it?
00:58:45 So I do think about that a lot.
00:58:47 I think the two biggest threats that we’re facing as mankind,
00:58:50 one is climate change and the other is nuclear war.
00:58:53 So those are my main two worries that I worry about.
00:58:57 And I’ve thought about trying to do something
00:59:01 about climate change twice.
00:59:02 Actually, for two of my startups,
00:59:05 I’ve actually commissioned studies
00:59:06 of what we could do on those things.
00:59:09 And we didn’t really find a sweet spot,
00:59:11 but I’m still keeping an eye out on that.
00:59:12 If there’s something where we could actually
00:59:15 provide a market solution, an optimization solution,
00:59:17 or some other technology solution to those problems.
00:59:20 Right now, like for example,
00:59:23 pollution credit markets were what we were looking at then.
00:59:26 And it was much more the lack of political will
00:59:30 why those markets were not so successful,
00:59:32 rather than bad market design.
00:59:34 So I could go in and make a better market design,
00:59:37 but that wouldn’t really move the needle
00:59:38 on the world very much if there’s no political will.
00:59:41 And in the US, the market,
00:59:43 at least the Chicago market was just shut down and so on.
00:59:47 So then it doesn’t really help
00:59:48 how great your market design was.
00:59:51 And then there’s the nuclear side.
00:59:53 Global warming is a more slowly encroaching problem.
01:00:00 Nuclear weapons have been here.
01:00:03 It’s an obvious problem that’s just been sitting there.
01:00:05 So how do you think about,
01:00:07 what is the mechanism design there
01:00:09 that just made everything seem stable?
01:00:12 And are you still extremely worried?
01:00:14 I am still extremely worried.
01:00:16 So you probably know the simple game theory of MAD,
01:00:20 mutually assured destruction,
01:00:23 and it doesn’t require any computation.
01:00:27 With small matrices, you can actually convince yourself
01:00:28 that the game is such that nobody wants to initiate.
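The small-matrix argument goes roughly as follows, with invented payoffs: once any first strike guarantees retaliation, initiating is never a better response than not initiating.

```python
import numpy as np

# Illustrative payoffs under mutually assured destruction.
# Actions: 0 = don't initiate, 1 = initiate. Any first strike
# triggers retaliation, so every profile with an initiation is
# catastrophic for both sides. Numbers are invented.
row_payoff = np.array([[   0, -100],   # row doesn't initiate
                       [-100, -100]])  # row initiates

for col_action in (0, 1):
    best = int(row_payoff[:, col_action].argmax())
    print(f"if column plays {col_action}, row's best response is {best}")
# Row never strictly prefers to initiate; the game is symmetric,
# so (don't, don't) is an equilibrium of this coarse-grained model.
```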
01:00:31 Yeah, that’s a very coarse grained analysis.
01:00:34 And it really works in a situation where
01:00:36 you have two superpowers or a small number of superpowers.
01:00:40 Now things are very different.
01:00:41 You have smaller nukes,
01:00:43 so the threshold of initiating is lower,
01:00:47 and you have smaller countries and non-state actors
01:00:51 who may get a nuke and so on.
01:00:53 So I think it’s riskier now than it was maybe ever before.
01:00:58 And what idea or application of AI,
01:01:03 you’ve talked about it a little bit,
01:01:04 but what is the most exciting to you right now?
01:01:07 I mean, you’re here at NIPS, NeurIPS.
01:01:10 Now you have a few excellent pieces of work,
01:01:14 but what are you thinking into the future
01:01:16 with several companies you’re doing?
01:01:17 What’s the most exciting thing or one of the exciting things?
01:01:21 The number one thing for me right now
01:01:23 is coming up with these scalable techniques
01:01:26 for game solving and applying them into the real world.
01:01:30 I’m still very interested in market design as well.
01:01:33 And we’re doing that at Optimized Markets,
01:01:35 but number one for me right now
01:01:37 is Strategic Machine and Strategy Robot,
01:01:40 getting that technology out there
01:01:41 and seeing, as you’re in the trenches doing applications,
01:01:45 what needs to be actually filled,
01:01:47 what technology gaps still need to be filled.
01:01:49 So it’s so hard to just put your feet on the table
01:01:52 and imagine what needs to be done.
01:01:53 But when you’re actually doing real applications,
01:01:56 the applications tell you what needs to be done.
01:01:59 And I really enjoy that interaction.
01:02:00 Is it a challenging process to apply
01:02:04 some of the state of the art techniques you’re working on
01:02:07 and having the various players in industry
01:02:14 or the military or people who could really benefit from it
01:02:17 actually use it?
01:02:19 What’s that process like?
01:02:21 In autonomous vehicles, you work with automotive companies,
01:02:23 and they in many ways are a little bit old fashioned.
01:02:28 It’s difficult.
01:02:29 They really want to use this technology.
01:02:31 It clearly will have a significant benefit,
01:02:34 but the systems aren’t quite in place
01:02:37 to easily have them integrated in terms of data,
01:02:41 in terms of compute, in terms of all these kinds of things.
01:02:43 So is that one of the bigger challenges that you’re facing
01:02:48 and how do you tackle that challenge?
01:02:50 Yeah, I think that’s always a challenge.
01:02:52 That’s the kind of slowness and inertia, really,
01:02:55 of let’s do things the way we’ve always done them.
01:02:57 You just have to find the internal champions
01:03:00 at the customer who understand that,
01:03:02 hey, things can’t be the same way in the future.
01:03:04 Otherwise bad things are going to happen.
01:03:06 And in autonomous vehicles,
01:03:08 it’s actually very interesting
01:03:09 that the car makers are doing that
01:03:11 and they’re very traditional,
01:03:12 but at the same time you have tech companies
01:03:14 who have nothing to do with cars or transportation
01:03:17 like Google and Baidu really pushing on autonomous cars.
01:03:21 I find that fascinating.
01:03:23 Clearly you’re super excited
01:03:25 about actually these ideas having an impact in the world.
01:03:29 In terms of the technology, in terms of ideas and research,
01:03:32 are there directions that you’re also excited about?
01:03:36 Whether that’s on some of the approaches you talked about
01:03:40 for the imperfect information games,
01:03:42 whether it’s applying deep learning
01:03:44 to some of these problems,
01:03:45 is there something that you’re excited
01:03:46 in the research side of things?
01:03:48 Yeah, yeah, lots of different things
01:03:51 in game solving.
01:03:53 So solving even bigger games,
01:03:56 games where you have more hidden actions,
01:03:59 where the player actions are hidden as well.
01:04:02 Poker is a game where really the chance actions are hidden
01:04:05 or some of them are hidden,
01:04:07 but the player actions are public.
01:04:11 Multiplayer games of various sorts,
01:04:14 collusion, opponent exploitation,
01:04:18 and even longer games.
01:04:21 So games that basically go forever,
01:04:23 but they’re not repeated.
01:04:24 So extensive form games that go forever.
01:04:27 What would that even look like?
01:04:30 How do you represent that?
01:04:31 How do you solve that?
01:04:32 What’s an example of a game like that?
01:04:33 Or is this some of the stochastic games
01:04:35 that you mentioned?
01:04:36 Let’s say business strategy.
01:04:37 So it’s not just modeling like a particular interaction,
01:04:40 but thinking about the business from here to eternity.
01:04:44 Or let’s say military strategy.
01:04:49 So it’s not like war is gonna go away.
01:04:51 How do you think about military strategy
01:04:54 that’s gonna go forever?
01:04:56 How do you even model that?
01:04:58 How do you know whether a move was good
01:05:01 that somebody made and so on?
01:05:05 So that’s kind of one direction.
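One classical way to make a game that runs forever well posed is Shapley's discounted stochastic game model: each state holds a small matrix game whose entries fold in the discounted value of the successor states, and value iteration contracts to a fixed point. A sketch with an invented two-state game; this is the textbook model, not the new representation being asked about:

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value of a two player zero-sum matrix game via an LP."""
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0  # maximize v
    res = linprog(c,
                  A_ub=np.hstack([-A.T, np.ones((n, 1))]), b_ub=np.zeros(n),
                  A_eq=np.hstack([np.ones((1, m)), np.zeros((1, 1))]),
                  b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[-1]

# Invented two-state game: reward[s][i][j] is the row player's payoff,
# trans[s][i][j] the (deterministic, for simplicity) next state.
reward = np.array([[[1.0, -1.0], [-1.0, 1.0]],
                   [[0.0,  2.0], [ 2.0, 0.0]]])
trans = np.array([[[0, 1], [1, 0]],
                  [[1, 1], [0, 0]]])
gamma = 0.9  # discounting is what makes the infinite horizon well posed

# Shapley's value iteration: each state's value is the value of a
# matrix game whose entries include the discounted continuation value.
V = np.zeros(2)
for _ in range(200):  # the update is a gamma-contraction, so it converges
    V = np.array([matrix_game_value(reward[s] + gamma * V[trans[s]])
                  for s in range(2)])
print(V.round(3))
```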
01:05:06 I’m also very interested in learning
01:05:09 much more scalable techniques for integer programming.
01:05:13 So we had an ICML paper this summer on that.
01:05:16 The first automated algorithm configuration paper
01:05:20 that has theoretical generalization guarantees.
01:05:23 So if I see this many training examples
01:05:26 and I tune my algorithm in this way,
01:05:28 it’s going to have good performance
01:05:30 on the real distribution, which I’ve not seen.
01:05:33 So, which is kind of interesting
01:05:34 that algorithm configuration has been going on now
01:05:37 for at least 17 years seriously.
01:05:41 And there has not been any generalization theory before.
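The shape of such a guarantee, roughly: sample training instances from the application's distribution, pick the configuration with the best average performance on them, and bound how many samples are needed for that choice to carry over to unseen instances. A toy sketch in which the solver parameter and the runtime model are both invented:

```python
import random

random.seed(0)

# Hypothetical tunable parameter of an integer-programming solver,
# e.g. how aggressively to branch. The runtime model below is made up.
CONFIGS = [0.1, 0.3, 0.5, 0.7, 0.9]

def runtime(config, instance_hardness):
    # Invented stand-in for actually running the solver on one instance.
    return instance_hardness * (1.0 + (config - 0.6) ** 2) + random.gauss(0, 0.1)

def configure(num_training_instances):
    """Empirical risk minimization over configurations: average the
    runtime on sampled training instances and keep the best config.
    Generalization theory asks how large the sample must be for this
    choice to be near-optimal on the unseen distribution."""
    instances = [random.expovariate(1.0) for _ in range(num_training_instances)]
    def avg(cfg):
        return sum(runtime(cfg, h) for h in instances) / len(instances)
    return min(CONFIGS, key=avg)

print(configure(5))     # small sample: may pick a poor config
print(configure(500))   # large sample: reliably near the true optimum
```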
01:05:45 Well, this is really exciting
01:05:47 and it’s a huge honor to talk to you.
01:05:49 Thank you so much, Tuomas.
01:05:51 Thank you for bringing Libratus to the world
01:05:52 and all the great work you’re doing.
01:05:54 Well, thank you very much.
01:05:55 It’s been fun.
01:05:55 No more questions.