What are people referring to when they talk about "game-theory optimal"?

basse · Oct 5, 2014

I wasn't sure where to put this thread; it pertains to both cash and tournament poker, really (but I think it's easier to talk about in terms of cash games). If this is the wrong place, I would appreciate it if someone could move it to the right one.

I see a lot of people mention GTO or "game-theory optimal" in discussions here and elsewhere. What are people referring to, exactly, when they say this?

I am a researcher in computational game theory, and I don't think these concepts are well-defined, for mainly two reasons:

1. In games with more than two players, a strategy can only really be optimal with respect to a fixed set of strategies of all other players. At any given table, play is clearly not in equilibrium, so this represents serious problems for calling any strategy GTO.

2. In two-player zero-sum games (such as HU), it is clearer. Here, you can play a least exploitable strategy (technically you could also do this above, but that would probably be absolutely horrible, possibly consisting of folding almost everything), and it is arguably the optimal strategy. However, no such strategy is known, at least not for the poker games that people are discussing.

So I charge you, cardschat, with convincing me that the use of GTO as a discussion argument is sensible. I suspect that people might just be using it to mean something other than what it would technically mean in game theory.

Mase31683 · Oct 5, 2014

Cliff notes:

GTO is playing in a manner that is not exploitable.

A good example would be Nash equilibrium charts for shoving heads up.

/Cliff notes, begin tldr but I think you'll appreciate this.

Understanding Game Theory and Hold’em, by Bryce Paradis and Douglas Zare

Game theory has become a popular, if somewhat misunderstood, topic for hold’em discussion. This article is intended to give you a fundamental understanding of what game theory optimal strategy is, how it works, and what its impact is on hold’em play. Before we begin on the article proper, however, we will start by reviewing some key definitions. These definitions are not necessarily the same as those used by all others.

Optimal Exploitive Strategy: A strategy which yields the highest possible EV against your opponent’s strategy. For example, if in a game of rock-paper-scissors your opponent’s strategy is to choose rock every single time your optimal exploitive strategy is to pick paper every single time. The same is true if your opponent’s strategy is rock 50%, paper 25%, and scissors 25%.
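The two rock-paper-scissors cases above can be checked with a minimal Python sketch. The scoring (+1 for a win, -1 for a loss, 0 for a tie) and all function names are assumptions made for illustration; they are not from the article.

```python
# Assumed scoring: +1 for a win, -1 for a loss, 0 for a tie.
MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(ours, theirs):
    """Our result for a single round."""
    if ours == theirs:
        return 0
    return 1 if BEATS[ours] == theirs else -1

def pure_ev(ours, opponent_mix):
    """EV of always playing `ours` against a fixed opponent mix."""
    return sum(prob * payoff(ours, theirs) for theirs, prob in opponent_mix.items())

def best_response(opponent_mix):
    """The pure move with the highest EV against the given mix."""
    return max(MOVES, key=lambda move: pure_ev(move, opponent_mix))

always_rock = {"rock": 1.0, "paper": 0.0, "scissors": 0.0}
rock_heavy = {"rock": 0.5, "paper": 0.25, "scissors": 0.25}

for mix in (always_rock, rock_heavy):
    print(best_response(mix), {move: pure_ev(move, mix) for move in MOVES})
```

Under these assumptions paper is the best response to both opponents: it earns 1 per round against rock-only and 0.25 per round against the 50/25/25 mix.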

Suboptimal Strategy: A strategy which performs worse than an optimal exploitive strategy. For example, if your opponent’s strategy is to choose rock every single time, choosing paper 50% and rock 50% is still a winning strategy. The EV of the paper-and-rock strategy, however, is less than that of the paper-only strategy. Therefore the paper-and-rock strategy is suboptimal.

Game Theory Optimal (GTO): A strategy that yields the highest possible EV (or: “is optimal”) if your opponent always chooses the best possible counter-strategy. In a game of rock-paper-scissors the GTO strategy is to choose randomly from an equal distribution of paper, scissors, and rocks. If you play rock less often than paper, you will have less than ½ equity against an all scissors strategy. Similarly, you must play paper at least as often as you play scissors, and scissors at least as often as you play rock. As a result, you must play paper, scissors, and rocks with equal frequency to guarantee ½ equity against all strategies. So long as your opponent always chooses the optimal counter-strategy to whatever strategy you choose no strategy on your part can have a higher EV than this.
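The "guarantee ½ equity" argument can be illustrated with a short Python sketch. It assumes equity means 1 for a win, ½ for a tie, and 0 for a loss, and it checks only the three pure counter-strategies, which is enough here because the opponent's best counter to a fixed mix can always be taken to be a pure move. The names `equity` and `worst_case_equity`, and the example mix `paper_heavy`, are illustrative only.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def equity(ours, theirs):
    """Assumed equity: 1 for a win, 1/2 for a tie, 0 for a loss."""
    if ours == theirs:
        return Fraction(1, 2)
    return Fraction(1) if BEATS[ours] == theirs else Fraction(0)

def worst_case_equity(our_mix):
    """Equity of `our_mix` against the opponent's best pure counter-strategy."""
    return min(
        sum(prob * equity(ours, theirs) for ours, prob in our_mix.items())
        for theirs in MOVES
    )

uniform = {move: Fraction(1, 3) for move in MOVES}
# Hypothetical mix that plays rock less often than paper.
paper_heavy = {
    "rock": Fraction(1, 5),
    "paper": Fraction(1, 2),
    "scissors": Fraction(3, 10),
}

print(worst_case_equity(uniform))      # 1/2
print(worst_case_equity(paper_heavy))  # 7/20
```

The paper-heavy mix drops to 7/20 because an all-scissors opponent beats the paper more often than the rock can make up for, which is the inequality the paragraph describes.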

Exploitive Strategy: Any strategy which has a higher EV than GTO strategy against a particular opponent.

Exploitable Strategy: A strategy which has less EV against some exploitive strategies than GTO strategy. All non-GTO strategies are exploitable.

When analyzing optimal, exploitive strategies, we treat an opponent’s strategy as a known. For example: “my opponent always chooses rock.” In reality, our opponent’s strategy is an unknown, and we often act on assumptions and observations in order to determine what we will treat our opponent’s strategy as. To determine a GTO strategy, we assume that our opponent always chooses the optimally exploitive counter to whichever strategy we try, rather than playing a fixed strategy.

Hold’em is a much more complicated game than rock-paper-scissors, and until the game is solved by computers no one will ever play against an opponent who always chooses a GTO (or: “unexploitable”) strategy. This is an important point, as a GTO strategy is not necessarily the strategy with the highest possible EV. For example, if our opponent’s strategy is rock-only then the GTO strategy of choosing randomly from an equal distribution of paper, scissors, and rocks has less EV than that of the paper-only strategy.

GTO play, however, still plays an important role in hold’em strategy. Even though a GTO strategy may have less EV than an exploitive strategy, understanding what the GTO strategy is and being able to identify how your opponents’ strategies deviate from it can help you to better exploit your opponents. Further, understanding GTO strategy can also allow you to create balanced strategies which are difficult to exploit. These strategies can be used as a defense against tough opponents looking for an exploitive edge.

In hold’em, as in many simple games such as rock-paper-scissors, a GTO strategy is often identifiable by finding an indifference point. What this means is that the GTO strategy will often distribute your actions in such a way that your opponent is indifferent to choosing between two actions. As a result your strategy is unexploitable.

Although hold’em has not been solved, many half-street and full-street mini-games which model real hold’em situations have been solved. By understanding where the indifference points lie in different hold’em scenarios, you can identify your opponent’s deviations from GTO play and exploit your opponent maximally. At its most basic conceptual level hold’em is still a very simple game: rather than playing with a distribution of paper, scissors and rocks we play with a distribution of bluffs and not-bluffs. By understanding even just the simplest mini-games you can greatly improve your play.

A common example of a half-street game would be one where we either hold hands that always win, or always lose if we see a showdown, and can either bet or check, and our opponent may only call or fold. If he calls, there is a showdown. This is often analogous to a river-betting scenario in real hold’em play where our opponent’s range is narrow and ours is polarized. By solving the mini-game we can see that the GTO strategy is to bluff an amount proportionate to the price we are laying our opponent on his call. For example, if we bet $1 into a $2 pot we are laying 3:1 by betting, and the GTO strategy is to bluff 25% of the time that we bet. Our opponent will be indifferent to calling or folding. As a result, we know that if we deviate from this strategy our opponent can exploit us by either always calling if we bluff more, or always folding if we bluff less.
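The 25% figure follows from the caller's indifference condition. Here is a minimal sketch of that arithmetic, assuming the half-street model above (the bettor holds either a sure winner or a sure loser, and the caller can only call or fold). The function names are hypothetical.

```python
from fractions import Fraction

def gto_bluff_fraction(bet, pot):
    """Fraction of our bets that should be bluffs so the caller is indifferent.

    The caller risks `bet` to win `pot + bet`, so calling breaks even when
    bluff_fraction * (pot + bet) == (1 - bluff_fraction) * bet.
    """
    bet, pot = Fraction(bet), Fraction(pot)
    return bet / (pot + 2 * bet)

def call_ev(bluff_fraction, bet, pot):
    """Caller's EV of calling, measured against folding (which is worth 0)."""
    return bluff_fraction * (pot + bet) - (1 - bluff_fraction) * bet

print(gto_bluff_fraction(1, 2))          # 1/4
print(call_ev(Fraction(1, 4), 1, 2))     # 0: indifferent between calling and folding
print(call_ev(Fraction(1, 2), 1, 2))     # 1: we bluff too much, so always calling wins
print(call_ev(Fraction(1, 10), 1, 2))    # -3/5: we bluff too little, so always folding is better
```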

Conversely, in this scenario the pot is laying us 2:1 on our bluffs, and so we become indifferent to betting or checking with our bluffs if our opponent calls 67% of the time. This is our opponent’s GTO strategy. If our opponent deviates from this strategy we can exploit him by always bluffing if he calls less, or by never bluffing if he calls more.
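The caller's side works the same way. This sketch assumes the same mini-game, with a checked bluff worth 0 because it always loses at showdown. The function names are again hypothetical.

```python
from fractions import Fraction

def gto_call_frequency(bet, pot):
    """Calling frequency that makes the bettor indifferent to bluffing.

    A bluff wins `pot` when the opponent folds and loses `bet` when called,
    so it breaks even when (1 - call_freq) * pot == call_freq * bet.
    """
    bet, pot = Fraction(bet), Fraction(pot)
    return pot / (pot + bet)

def bluff_ev(call_frequency, bet, pot):
    """Bettor's EV of bluffing, measured against checking (which is worth 0)."""
    return (1 - call_frequency) * pot - call_frequency * bet

print(gto_call_frequency(1, 2))        # 2/3, i.e. about 67%
print(bluff_ev(Fraction(2, 3), 1, 2))  # 0: indifferent between bluffing and checking
print(bluff_ev(Fraction(1, 2), 1, 2))  # 1/2: he calls too little, so always bluff
print(bluff_ev(Fraction(3, 4), 1, 2))  # -1/4: he calls too much, so never bluff
```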

If our opponent deviates from GTO strategy in the previous example, the optimal exploitive strategies of either always folding or always bluffing have higher EV than any exploitive strategies which involve bluffing or folding less than 100% of the time. Weak opponents are weak not only because they choose exploitable strategies so often, but because we can also make such large deviations from indifference points without them adapting to exploit us.

Not all GTO decisions involve finding an indifference point. For example, say we are playing a variant of rock-paper-scissors where there is a fourth option to choose dynamite, which beats everything. The GTO strategy is to choose dynamite-only. Your opponent, however, may still select a dominated strategy by choosing either paper, scissors, or rock. Similar circumstances arise in hold’em, for example, when the nuts is such a large portion of our total range that we are unable to bluff often enough to make our opponent indifferent to calling or folding.
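The dynamite variant can be checked mechanically. The sketch below assumes that dynamite against dynamite is a tie (the article does not say) and uses +1/-1/0 scoring. `weakly_dominates` is a hypothetical helper name.

```python
MOVES = ("rock", "paper", "scissors", "dynamite")
BEATS = {
    "rock": {"scissors"},
    "paper": {"rock"},
    "scissors": {"paper"},
    "dynamite": {"rock", "paper", "scissors"},
}

def payoff(ours, theirs):
    """+1 for a win, -1 for a loss, 0 for a tie (dynamite vs dynamite assumed a tie)."""
    if theirs in BEATS[ours]:
        return 1
    if ours in BEATS[theirs]:
        return -1
    return 0

def weakly_dominates(a, b):
    """True if `a` never does worse than `b` and does strictly better at least once."""
    diffs = [payoff(a, theirs) - payoff(b, theirs) for theirs in MOVES]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)

dominated = [m for m in MOVES if m != "dynamite" and weakly_dominates("dynamite", m)]
print(dominated)  # ['rock', 'paper', 'scissors']
```

No indifference point is involved: every other move is dominated, so there is nothing to mix between.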

What this means is that while a GTO strategy can never be exploited, and can therefore never be a losing strategy in hold’em (if there is no rake), your opponents can still make dominated strategy decisions which will cause them to lose, and you to win. Therefore, while GTO strategies in hold’em are often suboptimal, the prospect of these “invincible strategies” still holds some exciting implications for a savvy student of game theory, particularly at the highest levels of play.

A tough opponent is only tough, after all, because he or she makes far fewer suboptimal strategy decisions than a soft opponent. An extraordinarily tough opponent will have an extremely refined capacity for dynamic play. If you choose a strategy of rock-only, he or she will quickly recognize it and choose paper-only, and so on. Such players will quickly identify trends in your play, or even make pre-emptive assumptions about your play, which may allow them to exploit your non-GTO strategies with unnerving frequency and accuracy.

It is appealing to think that by selecting a GTO strategy, our opponents could only lose. However, even the strongest opponents have exploitive (and therefore potentially-exploited) strategies in their play, and hold’em is, after all, a game of incomplete information. If you are playing against an extremely tough opponent who you know uses a strategy analogous to paper 33%, scissors 20%, and rock 47%, you would be foolish to attempt a strategy of paper-only. By definition of your opponent’s toughness, your opponent will quickly adapt to exploit you. By understanding where the indifference points lie, however, and by making small deviations from them, you can still play exploitatively. Even the toughest, most cut-throat opponents are not clairvoyant, after all, and if you elect an exploitive strategy of paper 40%, scissors 30%, rock 30% how are they to know?
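To put numbers on the closing example, here is a small sketch comparing three strategies against the quoted opponent mix (paper 33%, scissors 20%, rock 47%). It assumes +1/-1/0 scoring per round. The helper `mixed_ev` is a name made up for illustration.

```python
from fractions import Fraction

BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(ours, theirs):
    """+1 for a win, -1 for a loss, 0 for a tie."""
    if ours == theirs:
        return 0
    return 1 if BEATS[ours] == theirs else -1

def mixed_ev(our_mix, their_mix):
    """EV per round when both players randomize independently."""
    return sum(
        p_ours * p_theirs * payoff(ours, theirs)
        for ours, p_ours in our_mix.items()
        for theirs, p_theirs in their_mix.items()
    )

opponent = {"paper": Fraction(33, 100), "scissors": Fraction(20, 100), "rock": Fraction(47, 100)}
uniform = {"paper": Fraction(1, 3), "scissors": Fraction(1, 3), "rock": Fraction(1, 3)}
small_deviation = {"paper": Fraction(40, 100), "scissors": Fraction(30, 100), "rock": Fraction(30, 100)}
paper_only = {"paper": Fraction(1), "scissors": Fraction(0), "rock": Fraction(0)}

print(mixed_ev(uniform, opponent))          # 0
print(mixed_ev(small_deviation, opponent))  # 27/1000
print(mixed_ev(paper_only, opponent))       # 27/100
```

Against this fixed mix the 40/30/30 deviation earns a tenth of what paper-only would, which is the price paid for being much harder to detect and counter.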

basse · Oct 6, 2014

Mase31683 said:
Cliff notes:

GTO is playing in a manner that is not exploitable.

A good example would be Nash equilibrium charts for shoving heads up.

/Cliff notes, begin tldr but I think you'll appreciate this.

In hold’em, as in many simple games such as rock-paper-scissors, a GTO strategy is often identifiable by finding an indifference point. What this means is that the GTO strategy will often distribute your actions in such a way that your opponent is indifferent to choosing between two actions. As a result your strategy is unexploitable.

Although hold’em has not been solved, many half-street and full-street mini-games which model real hold’em situations have been solved. By understanding where the indifference points lie in different hold’em scenarios, you can identify your opponent’s deviations from GTO play and exploit your opponent maximally. At its most basic conceptual level hold’em is still a very simple game: rather than playing with a distribution of paper, scissors and rocks we play with a distribution of bluffs and not-bluffs. By understanding even just the simplest mini-games you can greatly improve your play.

A common example of a half-street game would be one where we either hold hands that always win, or always lose if we see a showdown, and can either bet or check, and our opponent may only call or fold. If he calls, there is a showdown. This is often analogous to a river-betting scenario in real hold’em play where our opponent’s range is narrow and ours is polarized. By solving the mini-game we can see that the GTO strategy is to bluff an amount proportionate to the price we are laying our opponent on his call. For example, if we bet $1 into a $2 pot we are laying 3:1 by betting, and the GTO strategy is to bluff 25% of the time that we bet. Our opponent will be indifferent to calling or folding. As a result, we know that if we deviate from this strategy our opponent can exploit us by either always calling if we bluff more, or always folding if we bluff less.

Conversely, in this scenario the pot is laying us 2:1 on our bluffs, and so we become indifferent to betting or checking with our bluffs if our opponent calls 67% of the time. This is our opponent’s GTO strategy. If our opponent deviates from this strategy we can exploit him by always bluffing if he calls less, or by never bluffing if he calls more.

How are these Nash equilibrium charts for shoving heads-up generated? I don't see how they could possibly be guaranteed to be unexploitable. We don't know any Nash equilibria of 2p NL hold'em.

Thanks for the article, it's an interesting read, and not TLDR at all

It certainly takes a somewhat more rigorous approach to what GTO means. I still have a few concerns on it though. Note that these are more theoretical than practical, I think what is written above is very useful in a practical sense, and probably the only semi-sound way to use game theory to inform play, for the time being. I still think it would be nice if the authors acknowledged these caveats, though.

Specifically, my contention is with what I left in the quote. It seems to essentially imply that, while we can't solve 2-player NLHE, we can solve "subgames" of it. However, the notion of a subgame, in a game-theoretic sense, does not apply to any subset of poker. This is because all game positions are interleaved with each other because the players don't know each other's hand. When solving isolated hands like is suggested here, you might open yourself up to exploitation by an opponent who realizes that you solved hands in isolation, and who can then exploit you by manipulating the isolated hand scenarios so that they don't have the structure you solved for. This would make such strategies non-GTO.

GWU73 · Oct 12, 2014

GTO is really only good for off-the-table study and developing your ranges. It will help you identify profitable situations that occur regularly, identify weak players, and simplify decision making. I recommend NOT trying to play GTO, but actually researching game theory. Harvard University has a wonderful lecture / class on game theory available on YouTube.

S3mper · Oct 12, 2014

When you post stuff like that, make sure you also post the link where you found it.

Here it is; just remember next time =P

http://forumserver.twoplustwo.com/94/stoxpoker-com/understanding-game-theory-holdem-245479/

basse · Oct 12, 2014

GWU73 said:
GTO is really only good for off-the-table study and developing your ranges. It will help you identify profitable situations that occur regularly, identify weak players, and simplify decision making. I recommend NOT trying to play GTO, but actually researching game theory. Harvard University has a wonderful lecture / class on game theory available on YouTube.

I don't think you read my post

I do research in game theory, and know it very well. Poker is the part where I need experience. I'm basically just complaining about all these game-theory optimal phrases being thrown around, because it's technically not correct. My impression is that game-theory optimal (when poker people talk about it) has just become slang for strategies that are Nash equilibria in these restricted games, which does not imply that they are Nash equilibria in the full games.

Faydar09 · Oct 1, 2015

some great reading here, thanks

BogdanStark · Oct 6, 2015

I understand you, dude! GTO and the other theories players use are quite difficult and rather pointless in online poker, especially at micro limits.
I think every poker player tries to combine poker theory, strategy, and whatever knowledge they have to maximise their winnings and their poker experience.
As for me, I play table by table, strategy by strategy. What I mean is that each table has its own atmosphere and status, and each player has a personal status, so my strategy differs from one opponent to the next.

touchmytallalla · Oct 7, 2015

I think a lot of people try to play GTO and say they are playing it, but few of them really understand it, and no one knows the true GTO strategy. IMHO exploitative play is more profitable.
