Learning Binary Decision Trees from the Poker Hand Data Set

Greetings, my blog readers!

What a year this has been so far! I am finally finding some time and energy to write new posts on my blog. I am really looking forward to blogging more about machine learning and data science!

In this post I will share with you my approach to building a model to learn to recognize a poker hand. This problem and the training and testing data are available from the UCI machine learning repository here. My approach to this task is based on recognizing the following about the data:

  1. The training set of all positive examples is incomplete.
  2. It is possible to express all positive examples as a set of if/else/then rules.
  3. Given the training data and a classifier capable of learning the high-order rules, we should be able to achieve a 100% accuracy on test data. Thus the fact that the set of positive examples is incomplete does not matter.
  4. To evaluate our classifier we can stick with a simple accuracy ratio since our target accuracy is 100%.

The poker hand data set

There are 10 classes in the poker data set. Class 0 represents “nothing in hand” and is the largest class. The set of positive examples can be extended if we generate an exhaustive set of combinations representing a class. For example, we can fully specify the Flush by listing all possibilities of five cards of the same suit. We would need to list \frac{13!}{5!(13-5)!}\cdot 4 -40=5108 combinations, where minus 40 comes from subtracting the 40 possibilities of a straight flush.

I am not going to list all possibilities. Instead, I will train several binary decision tree models. Each tree will recognize one class vs. the rest. What do you think will happen if we present the raw training data to a decision tree model? Take, for example, scikit-learn DecisionTreeClassifier. Because our data is numeric, the tree will contain branches and nodes that do not represent meaningful rules that can be translated to categorical data. In other words, the decision tree model will try to locally optimize the splitting criteria and fail to see the big picture. If we want to use the decision tree model, we need to add more features. Specifically, we need to add features that will let the tree to spot the higher order separations that exist in this data.

Engineering more features

So, what features do we need to add? We need to add binary features indicating the Flush, 4 of a kind, 3 of a kind, a pair, full house, etc. Classes 6 and 8 can be learned by combining the features we are going to introduce. One can say that features engineering for this problem is cheating as it defeats the purpose of the task. I agree with this completely, and the main lesson of this post is that when presented with such data as the poker hand data set, the best choice of the classifier must take into account the higher-order abstracts. If adding new features is not possible, then using an un/supervised approach to first discover such constructs may be required.

Python code

Let’s take a look at a possible solution in Python (2.7). You can download it from my github space here. The “poker_hand.py” contains the solution. I have provided enough comments to make it clear what I am doing at each step. Please let me know if you can think of a more elegant way to concatenate_all_ranks and suits. The code has an ability to graph the trees using graphiz. Here is an example of a tree to pick-out the Royal Flush:

Royal Flush tree
Royal Flush tree

The code achieves 100% accuracy. No wonder! We told the tree what to learn…