July 30, 2020

Facebook Develops New ‘ReBeL’ Poker AI That Outperforms Libratus


Facebook developers have created a general artificial intelligence framework known as Recursive Belief-based Learning (ReBeL) that has proven itself by excelling at a game that has long been difficult for AI programs: Texas Hold’em poker.

The ReBeL poker AI outperformed Libratus, which defeated a team of professional poker players in 2017. (Image: Carnegie Mellon University)

The ReBeL framework implements new concepts that allow it to better handle the hidden-information aspects of poker, and even to outperform a previous superhuman poker AI, Libratus.

‘Public Belief States’ Aid in Self-Play Learning

In recent years, AI systems have shown an incredible ability to crack a variety of complex games. DeepMind’s AlphaZero program was able to teach itself chess, shogi (Japanese chess), and Go from just the basic rules of each, using self-play to reach new heights in all three games in a matter of hours.

Libratus also used self-play to learn heads-up No-Limit Hold’em. ReBeL does the same, but incorporates a new notion of what constitutes a “game state,” allowing the AI to better understand hidden information games during self-play.

ReBeL considers information about the visible game state, like the known cards and bet sizing, as well as the range of hands the opponent might have. It also considers each player’s “belief” about the state they are in, similar to how a human might consider whether an opponent thinks they are ahead or behind in a hand.

To do so, ReBeL trains two AI models through self-play reinforcement learning: a value network and a policy network. The AI then operates on what researchers call public belief states, or PBSs. In a perfect-information game like chess, the game state alone is enough to make perfect decisions. In poker it is not, so a PBS combines the game state with factors such as the policies of both players to form a complete, probabilistic model of the actions a player might take and how they might turn out.
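As a rough illustration of the “belief” part of a public belief state, the sketch below applies Bayes’ rule to a toy range of opponent hands after a public action is observed. It is not Facebook’s implementation: the hand categories, the probabilities, and the function name update_belief are all made up for the example.

```python
# Toy sketch of a belief update, assuming a three-category hand range.
# All names and numbers here are illustrative, not taken from ReBeL.

def update_belief(prior, policy, action):
    """Return the posterior over hidden hands after seeing a public action.

    prior:  dict mapping hand -> probability the opponent holds it
    policy: dict mapping hand -> dict of action -> probability of that action
    action: the action that was publicly observed
    """
    # Bayes' rule: P(hand | action) is proportional to P(hand) * P(action | hand)
    unnormalized = {hand: prior[hand] * policy[hand][action] for hand in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observed action has zero probability under the policy")
    return {hand: weight / total for hand, weight in unnormalized.items()}


# Hypothetical starting range and betting policy for the opponent.
prior = {"strong": 0.25, "medium": 0.50, "weak": 0.25}
policy = {
    "strong": {"bet": 0.8, "check": 0.2},
    "medium": {"bet": 0.2, "check": 0.8},
    "weak": {"bet": 0.4, "check": 0.6},  # weak hands sometimes bluff
}

posterior = update_belief(prior, policy, "bet")
for hand, prob in posterior.items():
    print(f"{hand}: {prob:.2f}")
```

In this toy case a bet shifts weight toward strong hands, because the assumed policy bets those hands most often. Note that the update depends on the policy as well as the cards, which is why a PBS has to account for both players’ strategies.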

According to researchers, ReBeL has excelled at imperfect-information games thanks to this approach. The Facebook team conducted experiments in which ReBeL played two-player versions of Hold’em, Turn Endgame Hold’em (a simplified version of the game with no raises on the first two betting rounds), and Liar’s Dice.

ReBeL Outperforms Libratus Against Human Foe

The result is an AI you wouldn’t want to face across the virtual felt. ReBeL defeated heads-up specialist Dong Kim by 165 thousandths of a big blind per hand over a 7,500-hand match. That’s higher than the 147 thousandths of a big blind per hand by which Libratus defeated four human players in 2017. Even that comparison may undersell the improvement: Libratus beat Dong Kim himself by only an estimated 29 thousandths of a big blind per hand in that 2017 match.
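To put those win rates in more familiar terms, the short calculation below converts ReBeL’s per-hand edge into total big blinds over the 7,500-hand match and compares the two results against Dong Kim. The function name total_big_blinds is hypothetical, and the numbers are only the ones quoted above.

```python
# Back-of-the-envelope arithmetic using the win rates quoted in the article.
# Rates are in thousandths of a big blind per hand.

def total_big_blinds(rate_thousandths, hands):
    """Convert a per-hand win rate into total big blinds won."""
    return rate_thousandths * hands / 1000


REBEL_VS_KIM = 165     # ReBeL against Dong Kim
LIBRATUS_VS_KIM = 29   # Libratus against Dong Kim (estimated)
HANDS_PLAYED = 7_500   # length of the ReBeL match

rebel_total = total_big_blinds(REBEL_VS_KIM, HANDS_PLAYED)
rate_ratio = REBEL_VS_KIM / LIBRATUS_VS_KIM

print(f"ReBeL total over the match: {rebel_total} big blinds")
print(f"ReBeL win rate vs. Libratus win rate against Kim: {rate_ratio:.1f}x")
```

That works out to 1,237.5 big blinds over the match, and a per-hand rate against Kim between five and six times the estimate for Libratus.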

If you’re worried that you might run into an opponent running ReBeL online, researchers have taken precautions against that happening.

“The most immediate risk posed by this work is its potential for cheating in recreational games such as poker,” the team wrote in its paper. “Partly for this reason, we have decided not to release the code for poker.”

The team did release its open-source implementation for Liar’s Dice to aid future research. The developers believe that ReBeL could help develop better general equilibrium-finding algorithms with applications in auctions, negotiations, cybersecurity, and self-driving vehicles, among other areas.