Poker social media has been buzzing about the latest Kaggle challenge, which pitted some of the best and most popular LLMs against each other in a tournament featuring three games: chess, werewolf, and, of course, poker.

The challenge itself involved Large Language Models (LLMs) such as ChatGPT, Gemini, Grok, and Claude. However, in the process of building hype for the tournament, the "LLM" label was swapped for the much broader "AI", creating a somewhat misleading narrative.

Doug Polk is one of the experts who were invited to participate in the challenge and analyze hands played by the LLMs, and he’s so far published two videos, with the third and final one still to come. In these videos, Polk repeatedly uses the term “Artificial Intelligence,” which is a misnomer, whether by accident or design.

What we've learned from the challenge, to be completely blunt, is that LLMs are really bad at poker. They struggle to read their hole cards and the board, and even to follow the basic rules of Texas Hold'em; flushes seem to be a particular pain point.

To those who have been following similar challenges, this came as no surprise. LLM chess matches are just hilarious, featuring illegal moves, pieces conjured onto the board out of nowhere, and more.

It’s entertaining content, and when presented by someone who has a natural sense of humor, which Polk certainly does, it’s fun to watch.

However, he has said many times (semi-jokingly) that if this is what AI can do, we have nothing to worry about. Whether you worry about AI or not, that is simply a red herring. The Kaggle challenge is not representative of AI at large; it shows a small subset of it (LLMs) dropped into an environment they were never developed for.

Simply put, if you’re using an excellent tool to do a job it’s not meant to do, you’ll be disappointed. That doesn’t make the tool bad, though. In fact, we could go as far as to say that such an application provides almost zero relevant information about the tool.

AI Knows Poker Pretty Well

Poker is a complex game, and we may still be some way from the point where AI routinely crushes human players, but the situation is nowhere near as bad as you might think watching these LLMs spew chips like there's no tomorrow.

We know that computers have been superior to humans in chess for a while now, and this is no longer disputed by anyone. Even the best chess players in the world refuse to play AI because it’s pointless. They can’t win.

With the rise of poker solvers and the development of GTO strategy, we now have very good, near-perfect mathematical models for Hold'em. A player capable of reproducing this strategy flawlessly would be effectively unbeatable in the long run.
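To give a sense of the kind of math this strategy rests on, here is a minimal sketch of two textbook GTO quantities: the minimum defense frequency a caller needs so pure bluffs can't print money, and the share of bluffs a polarized bettor can include for a given bet size. The function names and numbers are illustrative, not taken from any solver.

```python
def minimum_defense_frequency(pot, bet):
    """Fraction of the time the caller must continue so that a
    pure bluff cannot profit outright: MDF = pot / (pot + bet)."""
    return pot / (pot + bet)

def bluff_frequency(pot, bet):
    """Fraction of a polarized betting range that can be bluffs while
    keeping the caller indifferent: bet / (pot + 2 * bet)."""
    return bet / (pot + 2 * bet)

# For a pot-sized bet: defend half your range, and up to a third
# of the bettor's range can be bluffs.
print(round(minimum_defense_frequency(100, 100), 2))  # 0.5
print(round(bluff_frequency(100, 100), 2))  # 0.33
```

These two numbers cover a single node; a full solver output assigns mixed frequencies like these to every hand in every spot, which is exactly what makes the strategy impossible for a human to memorize.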

Of course, it's impossible for a human to memorize every node across the virtually countless spots, but not for an AI, which is why poker bots are getting harder and harder to play against. And the more fully the game is solved, the better the AI will become at it.

Yet, Grok, ChatGPT, Gemini, and the rest of them seem completely oblivious and play some strange strategies that rely on dubious and jumbled data sets. How can that be?

The Red Herring

In everyday discourse, it’s become quite common to refer to everything that looks like it has the ability to “think” on its own as AI. In reality, though, LLMs are very bad at “thinking” or analyzing things. That’s no surprise, as they were not built for that purpose.

As the word "language" in the name suggests, they represent a subset of the larger AI field whose primary purpose is interpreting text inputs and generating credible, fluent text outputs. This makes them solid (albeit not perfect) assistants when you need to research a particular topic and write about it.

However, their ability to further process the output they produce and use it in an actionable way falls short, which is something we see time and time again.

If you've seen Doug's videos, you'll recognize this pattern. The LLMs try to explain their actions based on the information they've gathered, and if you just skim it, it looks solid. The language sounds right, they sound "smart", but when you dig into the actual explanation, you realize it makes no sense or, even worse, contradicts itself from one sentence to the next.

LLMs are not using solvers or crunching numbers. Their knowledge is based on what they picked up from various sources across the internet, and that kind of diffuse learning doesn't translate into precise play in specific spots.

For example, one reason these LLMs are so aggressive with trash hands is probably selection bias in their training data: the hands people post and discuss at length are the memorable ones, the big bluffs and the wild pots. Nobody writes a forum thread about opening 7-4 offsuit and folding to a large 3-bet; there isn't much to dissect there.

Calling a Spade a Spade

Computers and AI have changed and will continue to change poker. Real-time assistance (RTA) tools and bots are real threats that need to be addressed. Is it a doomsday scenario? No, I don't think so, but it is something to be very aware of.

Laughing at LLMs as they struggle to figure out that you need five cards of the same suit to make a flush doesn’t change that. They are not and have never been a threat to online poker. Unless you want to use them as your personal coach and take their advice at face value, in which case, good luck.

The whole thing may be funny and entertaining, but it’s not surprising. You can use Microsoft Word to edit images, too, but we all know the end results of that endeavor leave a lot to be desired. At the same time, just because Word sucks at editing images, it doesn’t mean we don’t have other, very powerful software to perform those tasks.

The bottom line is that these types of matchups are funny because of misplaced expectations. And I enjoy a good laugh as much as the next person, so no issues there. But don’t read too much into it, as there is much more to AI than LLMs.