In a previous post [LINK], we discussed how to implement a decision tree to solve a regression problem. Today we introduce another decision tree algorithm, the classification tree, which solves classification problems. This post follows a similar structure: we first walk through the mathematics behind classification trees, then present complete Python code that applies a classification tree to a real-world example.
Please refer to the regression tree page for the structure: LINK
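The Gini index is one of the mathematical approaches used to split data at the decision nodes of a classification tree. It measures how mixed the class proportions are within a node, and this measure is used to decide which feature the decision node is split on.
$Gini = 1 - \sum_{i=1}^n (p_i)^2$
Here $p_i$ is the proportion of samples in the node that belong to class $i$, and $n$ is the number of classes. A node containing a single class has a Gini index of 0; the more evenly the classes are mixed, the higher the value.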
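As a rough illustration of the formula above, here is a minimal sketch in plain Python. The function names `gini_index` and `weighted_gini` are made up for this example and do not come from any library. The second function assumes the common convention of scoring a candidate split by the size-weighted average of the Gini index of its two child nodes, where a lower score means a purer split.

```python
from collections import Counter


def gini_index(labels):
    """Gini index of one node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention chosen for this sketch: an empty node is pure
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def weighted_gini(left_labels, right_labels):
    """Size-weighted average Gini index of the two children of a split."""
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n) * gini_index(left_labels) + (
        len(right_labels) / n
    ) * gini_index(right_labels)


# Toy labels, invented for illustration only
parent = ["setosa"] * 3 + ["versicolor"] * 3
print(gini_index(parent))                    # evenly mixed two-class node
print(weighted_gini(parent[:3], parent[3:])) # split that separates the classes
```

With these toy labels, the evenly mixed parent node has a Gini index of 0.5, and the split that separates the two classes perfectly has a weighted Gini index of 0.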
Here are the steps to split a node using the Gini index:
Please refer to the regression tree page for the structure: LINK
In this example, we will use a dataset from the scikit-learn package.
Download from the scikit-learn package: Link
This dataset contains information about different iris plants.
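If scikit-learn is installed, one way to get this dataset without a separate download is the bundled `load_iris` helper. The sketch below assumes that route. The variable names `X` and `y` are just a convention for this example.

```python
from sklearn.datasets import load_iris

# Load the iris dataset that ships with scikit-learn
iris = load_iris()
X, y = iris.data, iris.target  # feature matrix and integer class labels

print(X.shape)             # (number of samples, number of features)
print(iris.feature_names)  # names of the input variables
print(iris.target_names)   # names of the iris species (the classes)
```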
Input variables