How This Poker-Playing Computer Beat the Best Human Players

February 1, 2017 10:49 AM EST

When Tuomas Sandholm began studying poker to research artificial intelligence 12 years ago, he never imagined that a computer would be able to defeat the best human players. “At least not in my lifetime,” he says.

But Sandholm, a computer science professor at Carnegie Mellon University, along with doctoral student Noam Brown, developed AI software capable of doing just that. The program, called Libratus, successfully defeated four professional poker players in a 20-day competition that ended on Jan. 30. After playing 120,000 hands of heads-up, no-limit Texas Hold’em, Libratus was ahead of its human challengers by more than $1.7 million in chips.

“I didn’t expect that we would win by this much,” says Sandholm. “I thought we had a 50-50 chance.”

Games have long served as tools for training artificial intelligence and measuring new breakthroughs. Google DeepMind’s AlphaGo software made headlines last year after it defeated legendary player Lee Sedol in the ancient and highly complex Chinese game of Go. IBM’s Watson, which is now being used for everything from diagnosing diseases to aiding in online shopping, is still best known for beating Jeopardy! champs Ken Jennings and Brad Rutter in 2011. And who could forget when IBM’s Deep Blue defeated then-world chess champion Garry Kasparov in 1997?

What makes poker different from a game like chess or Go is the level of uncertainty involved. Chess and Go players can view the entire board, including their opponent’s pieces. Poker players can’t see all of the elements in the game: there’s no way to tell which cards an adversary is holding, other than by reading players’ “tells.” Conquering games like poker, known as “imperfect information” games, opens up new possibilities for computers in the future, says Sandholm.

Sandholm spoke with TIME about how he developed Libratus and the factors that contributed to its victory. What follows is a transcript of our conversation that has been edited for length and clarity.

You’ve been developing artificial intelligence systems specifically for playing poker over the past 12 years. What were the breakthroughs that enabled Libratus to be so successful this time?

SANDHOLM: There are really three pieces of the architecture, and each one has important advancements over the corresponding module in our prior systems. One is the strategy computation ahead of time, using algorithms that are game-independent, meaning they’re not about poker. The second module is the endgame solving. During the game, the computer will think about how to refine its strategy.

The third piece is the continual improvement of its own strategy in the background. So, based on what holes the opponent found in our strategy, the AI will automatically see which of those holes have been the biggest and the most frequently exploited. And then overnight on a supercomputer, it will compute patches to those pieces of the strategy, and they’re automatically glued into the main strategy.

AI has become incredibly advanced, but it still can’t communicate as well as humans. Given that, how did you teach Libratus to bluff?

Bluffing is not really programmed in. The algorithm for solving these games just comes up with the strategy, and the strategy includes bluffing. Given the input rules of the game, the algorithm will already output a strategy, and that strategy does involve bluffing. And it also involves understanding the opponent’s bluffing.
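To make the bluffing point concrete, here is a minimal sketch, not Libratus’s code: textbook counterfactual regret minimization (CFR) run on Kuhn poker, a three-card toy game (jack, queen, king) commonly used to test these algorithms. Nothing in the program tells either player to bluff; the only inputs are the rules and the payoffs. The choice of CFR and of Kuhn poker is an assumption made for illustration, since the interview does not say which solver variant Libratus uses. If the sketch behaves as expected, the second player’s average strategy should bet a jack after a check (a pure bluff) about one-third of the time and call a bet with a queen about one-third of the time, which corresponds to the “understanding the opponent’s bluffing” half of the answer.

```python
from itertools import permutations

PASS, BET = 0, 1  # "p" = check or fold, "b" = bet or call


class InfoSet:
    """Regret and strategy accumulators for one information set."""

    def __init__(self):
        self.regret_sum = [0.0, 0.0]
        self.pending = [0.0, 0.0]  # regret gathered during the current iteration
        self.strategy_sum = [0.0, 0.0]

    def current_strategy(self):
        # Regret matching: play each action in proportion to its positive regret.
        pos = [max(r, 0.0) for r in self.regret_sum]
        total = sum(pos)
        return [p / total for p in pos] if total > 0 else [0.5, 0.5]

    def average_strategy(self):
        total = sum(self.strategy_sum)
        return [s / total for s in self.strategy_sum] if total > 0 else [0.5, 0.5]


def terminal_utility(cards, history):
    """Payoff for the player to move at a finished hand, or None if play continues."""
    if len(history) < 2:
        return None
    player = len(history) % 2
    higher = cards[player] > cards[1 - player]
    if history[-1] == "p":
        if history == "pp":
            return 1 if higher else -1  # both checked: showdown for the ante
        return 1  # the opponent folded to a bet
    if history[-2:] == "bb":
        return 2 if higher else -2  # bet and call: showdown for two chips
    return None


def cfr(nodes, cards, history, reach):
    """Expected utility for the player to move; reach = [p0 reach, p1 reach]."""
    util = terminal_utility(cards, history)
    if util is not None:
        return util
    player = len(history) % 2
    key = f"{cards[player]}{history}"  # the card this player sees plus the betting so far
    if key not in nodes:
        nodes[key] = InfoSet()
    node = nodes[key]
    strategy = node.current_strategy()
    # The average strategy is weighted by the acting player's own reach probability.
    for a in (PASS, BET):
        node.strategy_sum[a] += reach[player] * strategy[a]
    action_utils = [0.0, 0.0]
    node_util = 0.0
    for a, label in ((PASS, "p"), (BET, "b")):
        next_reach = list(reach)
        next_reach[player] *= strategy[a]
        # Zero-sum game: the child's value is from the opponent's view, so negate it.
        action_utils[a] = -cfr(nodes, cards, history + label, next_reach)
        node_util += strategy[a] * action_utils[a]
    # Counterfactual regret is weighted by the opponent's reach probability.
    for a in (PASS, BET):
        node.pending[a] += reach[1 - player] * (action_utils[a] - node_util)
    return node_util


def train(iterations):
    """Vanilla CFR: traverse all six deals, then apply that iteration's regrets."""
    nodes = {}
    deals = list(permutations((1, 2, 3), 2))  # 1 = jack, 2 = queen, 3 = king
    for _ in range(iterations):
        for cards in deals:
            cfr(nodes, cards, "", [1.0, 1.0])
        for node in nodes.values():
            for a in (PASS, BET):
                node.regret_sum[a] += node.pending[a]
                node.pending[a] = 0.0
    return {key: node.average_strategy() for key, node in nodes.items()}
```

After `strategy = train(10_000)`, `strategy["1p"][BET]` (jack, facing a check) and `strategy["2b"][BET]` (queen, facing a bet) should both land close to 1/3. The information-set keys, a card number followed by the betting history, are a naming convention invented for this sketch.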

How does this differ from the algorithms you’ve used in the past? Your previous AI, Claudico, wasn’t able to win as many chips as human poker professionals when it competed in 2015.

It’s a combination of these three modules we talked about. Each one has new algorithms. Using the new algorithms in any two of them, but the old algorithm in the third, would not have done the trick. So all of the new algorithms in all three modules were necessary.

Can you go into more detail about how these new algorithms work?

The main benefit [of the first module] is that it can solve the game faster, meaning we can solve larger abstractions. In the second module, we were doing what’s called “nested endgame solving.” Instead of just solving the endgame once, we are solving it every time the opponent makes a move in the endgame. So we can actually take the opponent’s bet sizes into account. And we do what we call safe endgame solving, [which is] taking into account the opponent’s mistakes so far.

And in the last module, unlike learning to exploit opponents as other people’s learning systems have done, including ours in the past, we are actually letting opponents’ actions tell us where our biggest holes are. And then we are automatically algorithmically fixing those holes in our own strategy. So instead of trying to learn to exploit the opponent, we are learning to patch our own strategy to become less exploitable.
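The self-improvement loop can be pictured, very loosely, as a ranking problem. The sketch below is hypothetical throughout: it assumes, purely for illustration, that a “hole” is an opponent bet size (as a fraction of the pot) that the precomputed strategy does not cover, and the function names, the rounding to 1% of the pot, the 0.1 tolerance, and the count-times-distance score are all invented. It keeps the observed sizes that are far from any covered size and ranks them by how often they were used and how far off they were, echoing the “biggest and the most frequently exploited” criterion. In the real system each patch is a fresh strategy computation on a supercomputer; here `patch` merely adds the size to the covered list.

```python
from collections import Counter


def find_holes(observed_bets, abstraction_sizes, tolerance=0.1, top_k=3):
    """Hypothetical ranking of opponent bet sizes the strategy covers worst.

    observed_bets: opponent bets as fractions of the pot.
    abstraction_sizes: bet fractions the precomputed strategy already handles.
    """
    counts = Counter()
    gaps = {}
    for bet in observed_bets:
        size = round(bet, 2)  # assumed bucketing to the nearest 1% of the pot
        gap = min(abs(size - s) for s in abstraction_sizes)
        if gap > tolerance:  # sizes within the tolerance count as already covered
            counts[size] += 1
            gaps[size] = gap
    # Score each hole by how often it was used times how far it is from coverage.
    ranked = sorted(counts, key=lambda s: counts[s] * gaps[s], reverse=True)
    return ranked[:top_k]


def patch(abstraction_sizes, holes):
    """Stand-in for the overnight computation: add the hole sizes to coverage."""
    return sorted(set(abstraction_sizes) | set(holes))
```

For example, with covered sizes `[0.5, 1.0, 2.0]`, an opponent who repeatedly bets a third of the pot produces `0.33` as the top-ranked hole, ahead of a single 2.5x overbet. Weighting frequency by distance is one simple way to combine the two criteria; the interview does not say how Libratus weighs them.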

We’ve seen AI defeat renowned human players in games like Go, chess, Jeopardy!, and now Texas Hold’em. What’s an example of a game that’s still too complex for a computer to master?

Well, heads-up, no-limit Texas Hold’em was really the last frontier of the games on which AI research has been done seriously. And by seriously I mean for many decades. So, Othello, checkers, chess, and heads-up limit Texas Hold’em, those are really games where the best AI had already surpassed the best human. No-limit had remained elusive for years, and now we have actually achieved superhuman performance on that game. That said, of course there are a lot of games where AI is not as good as humans because those games have not been studied yet.

Tell me about how this type of technology can be used outside of board games.

I’ve been working on poker for 12 years and have been doing research in automated negotiation for 27 years. So I don’t view poker as an application; poker has emerged as the benchmark in the AI community for testing these types of algorithms for solving imperfect information games. These algorithms work for any imperfect information game.

And by game, I don’t mean recreational. I mean these games can be very high stakes, like business-to-business negotiations, military strategy planning, cybersecurity, finance, medical treatment planning of certain kinds. These are really for a host of applications, really any situation that can be modeled theoretically as a game. Now that we’ve shown that the best AI’s ability to do strategic reasoning in an imperfect information setting has surpassed that of the best humans, there’s really a strong reason for companies to start using this kind of AI support in their interactions.

