In the next two sections, we will go through a simple and easy-to-interpret machine learning algorithm: the decision tree. It is a powerful algorithm that forms the foundation of widely used and more complex algorithms such as Random Forest, XGBoost, and LightGBM. One notable feature of decision trees is that they can solve both regression and classification problems. In this section, we will cover solving regression tasks with a regression tree. We will explain the mathematics behind the algorithm, followed by complete Python code applying a regression tree to a real-world example.

Mathematics

Structure

First of all, let's quickly go through the terms related to decision trees.

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/b4900f5c-23de-4887-aa0a-5a89fcb9b42e/RegressionTreeDefinition.png

Root Node: The top-most node of a decision tree. It has no node above it and represents the entire data set.

Decision Node: A node used to make a decision; it has multiple branches.

Leaf Node: The output of the decisions; it contains no further branches. In a regression tree, the prediction is the average target value of the leaf node. In a classification tree, the prediction is the mode of the leaf node. Please refer to LINK.

This raises a question: how can we split the data in the decision nodes? What do we base the decisions on?

Approach

Reduction in variance is the mathematical approach used to split data in the decision nodes of regression trees. It uses variance as the measure for deciding which feature a decision node is split on. For classification problems, there are different criteria, which will be introduced in the next blog.

$Variance = \dfrac{\sum(X - \mu) ^2}{N}$
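As a quick sanity check, the formula above (the population variance, i.e. the mean squared deviation from the mean) can be computed directly; the sample values below are made up for illustration:

```python
import numpy as np

# Target values in a node (illustrative values)
y = np.array([3.0, 5.0, 7.0, 9.0])

# Variance = sum((X - mu)^2) / N, the mean squared deviation from the mean
variance = np.mean((y - y.mean()) ** 2)  # equivalent to np.var(y)
print(variance)  # 5.0
```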

Here are the steps for using reduction in variance:

  1. At the decision node, try one feature and one value as the condition to split the data into left and right nodes
  2. Calculate the variance of each node and combine them into a weighted variance
  3. Loop over all feature and value combinations and select the one with the lowest weighted variance
  4. Repeat steps 1-3 until the tree reaches the pre-selected maximum tree depth or the pre-selected minimum leaf size
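Steps 1-3 above can be sketched in a few lines of NumPy. This is a minimal, illustrative implementation (the function names `weighted_variance` and `best_split` and the toy data are my own, not from a library), showing how one split is chosen:

```python
import numpy as np

def weighted_variance(y_left, y_right):
    """Step 2: combine the two child-node variances, weighted by node size."""
    n = len(y_left) + len(y_right)
    return (len(y_left) * np.var(y_left) + len(y_right) * np.var(y_right)) / n

def best_split(X, y):
    """Steps 1 and 3: loop over every feature/value combination and
    return the split (feature index, threshold) with the lowest weighted variance."""
    best_feature, best_threshold, best_score = None, None, np.inf
    for j in range(X.shape[1]):
        for threshold in np.unique(X[:, j]):
            left_mask = X[:, j] <= threshold
            if left_mask.all() or not left_mask.any():
                continue  # skip splits that leave one node empty
            score = weighted_variance(y[left_mask], y[~left_mask])
            if score < best_score:
                best_feature, best_threshold, best_score = j, threshold, score
    return best_feature, best_threshold, best_score

# Toy data: one feature whose target jumps between two groups
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([1.0, 1.0, 1.0, 9.0, 9.0, 9.0])
print(best_split(X, y))  # (0, 3.0, 0.0): splitting at 3.0 leaves zero variance
```

In a full tree, step 4 would apply `best_split` recursively to each child node until a stopping condition (maximum depth or minimum leaf size) is hit.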

Hyperparameters

Maximum tree depth and minimum leaf size are the hyperparameters we can tune to optimize performance. Maximum tree depth controls the depth of the tree (also known as the number of layers in the tree). The deeper we allow the tree to grow, the more complex the model becomes, because it will have more splits. Minimum leaf size controls the minimum number of samples in a leaf node; the smaller the size, the more complex the model becomes.
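These two hyperparameters map directly onto scikit-learn's `DecisionTreeRegressor` as `max_depth` and `min_samples_leaf`. A minimal sketch (the toy sine data here is made up for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data: fit a sine curve with trees of different complexity
X = np.arange(20, dtype=float).reshape(-1, 1)
y = np.sin(X).ravel()

# Shallow tree with large leaves: fewer splits, simpler model
simple = DecisionTreeRegressor(max_depth=2, min_samples_leaf=5).fit(X, y)

# Deeper tree with tiny leaves: more splits, more complex model
complex_tree = DecisionTreeRegressor(max_depth=8, min_samples_leaf=1).fit(X, y)

print(simple.get_depth(), complex_tree.get_depth())
```

Tuning these values (e.g. via cross-validation) trades off underfitting against overfitting.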