Anatomy of Decision Tree Algorithm-AI-php.cn

Home

Technology peripherals

Anatomy of Decision Tree Algorithm

王林

Apr 12, 2023 pm 10:01 PM

machine learningdecision treesupervised learning

Translator| Zhao Qingyu

Reviewer| Sun Shujuan

Foreword

In machine learning, classification has two stages , are the learning stage and the prediction stage respectively. In the learning phase, a model is built based on the given training data; in the prediction phase, the model is used to predict the response given the data. Decision trees are one of the easiest classification algorithms to understand and explain.

In machine learning, classification has two stages, namely the learning stage and the prediction stage. In the learning phase, a model is built based on the given training data; in the prediction phase, the model is used to predict the response given the data. Decision trees are one of the easiest classification algorithms to understand and explain.

Decision tree algorithm

The decision tree algorithm is one of the supervised learning algorithms. Unlike other supervised learning algorithms, the decision tree algorithm can be used to solve both regression and classification problems.

The purpose of using a decision tree is to create a training model that predicts the class or value of a target variable by learning simple decision rules inferred from previous data (training data).

In a decision tree, we start from the root of the tree to predict the class label of a record. We compare the value of the root attribute with the recorded attribute, and based on the comparison, we follow the branch corresponding to this value and jump to the next node.

Types of decision trees

Based on the types of target variables we have, we can divide trees into two types:

1. Categorical variable decision tree: Yes A decision tree of a categorical target variable is called a categorical variable decision tree.

2. Continuous variable decision tree: The target variable of the decision tree is continuous, so it is called a continuous variable decision tree.

Example: Suppose we have a problem of predicting whether a customer will pay a renewal premium to an insurance company. A customer's income is an important variable here, but insurance companies don't have income details for all customers. Now that we know this is an important variable, we can then build a decision tree to predict a customer's revenue based on occupation, product, and various other variables. In this case, we predict that the target variable is continuous.

1. Root node (root node): It represents the entire member or sample, which will be further divided into two or more of the same type collection.

2. Splitting): The process of splitting a node into two or more child nodes.

3. Decision Node: When a child node splits into more child nodes, it is called a decision node.

4. Leaf/Terminal Node: A node that cannot be split is called a leaf or terminal node.

5. Pruning: The process in which we delete the child nodes of a decision node is called pruning. Construction can also be seen as the reverse process of separation.

6. Branch/Sub-Tree: A subpart of the entire tree is called a branch or subtree.

7. Parent and Child Node: A node that can be split into child nodes is called a parent node, and a child node is a child node of the parent node.

Anatomy of Decision Tree Algorithm

The decision tree classifies samples in descending order from the root to the leaf/terminal nodes, which provide the classification method of the sample. Each node in the tree acts as a test case for a certain attribute, and each descending direction from the node corresponds to a possible answer to the test case. This process is recursive in nature and is treated the same for each subtree rooted at a new node.

Assumptions made when creating decision trees

The following are some assumptions we make when using decision trees:

●First, take the entire training set as the root.

●It is best that the feature values can be classified. If these values are continuous, they can be discretized before building the model.

●Records are recursively distributed based on attribute values.

●By using some statistical methods, the corresponding attributes are placed at the root node of the tree or the internal nodes of the tree in order.

The decision tree follows the sum of products expression form. Sum of Products (SOP) is also known as disjunctive normal form. For a class, each branch from the root of the tree to a leaf node with the same class is a conjunction of values, and the different branches ending in the class constitute a disjunction.

The main challenge in the implementation process of decision tree is to determine the attributes of the root node and each level node. This problem is the attribute selection problem. There are currently different attribute selection methods to select the attributes of nodes at each level.

How do decision trees work?

The separation characteristics of decision-making seriously affect the accuracy of the tree. The decision-making criteria of classification trees and regression trees are different.

Decision trees use a variety of algorithms to decide to split a node into two or more child nodes. The creation of child nodes increases the homogeneity of the child nodes. In other words, the purity of the node is increased relative to the target variable. The decision tree separates nodes on all available variables and then selects nodes that can produce many isomorphic child nodes for splitting.

The algorithm is selected based on the type of target variable. Next, let’s take a look at some algorithms used in decision trees:

ID3→(Extension of D3)

C4.5→(Successor of ID3)

CART→(Classification and Regression Tree)

CHAID→(Chi-square automatic interaction detection performs multi-level separation when calculating the classification tree)

MARS →(Multiple adaptive regression splines)

ID3 algorithm uses a top-down greedy search method to build a decision tree through the possible branch space without backtracking. Greedy algorithms, as the name suggests, always make what seems to be the best choice at a certain moment.

ID3 algorithm steps:

1. It uses the original set S as the root node.

2. During each iteration of the algorithm, iterate over the unused attributes in the set S and calculate the entropy (H) and information gain (IG) of the attribute.

3. Then select the attribute with the smallest entropy or the largest information gain.

4. Then separate the set S using the selected attributes to generate subsets of the data.

5. The algorithm continues to iterate over each subset, considering only attributes that have never been selected before on each iteration.

Attribute selection method

If the data set contains N attributes, then deciding which attribute to place at the root node or at different levels of the tree as an internal node is a complex step. The problem cannot be solved by randomly selecting any node as the root node. If we adopt a random approach, we may get worse results.

To solve this attribute selection problem, researchers have designed some solutions. They suggested using the following criteria:

Entropy
Information Gain
Gini Index
Gain Rate
Variance Reduction
Chi-square

Calculate the value of each attribute using these criteria, then sort the values and place the attributes in the tree in order, that is, the attributes with high values placed at the root position.

When using information gain as the criterion, we assume that the attributes are categorical, while for the Gini index, we assume that the attributes are continuous.

1. Entropy

Entropy is a measure of the randomness of the information being processed. The higher the entropy value, the harder it is to draw any conclusions from the information. Tossing a coin is an example of behavior that provides random information.

Anatomy of Decision Tree Algorithm

As can be seen from the above figure, when the probability is 0 or 1, the entropy H(X) is zero. Entropy is greatest when the probability is 0.5 because it projects complete randomness in the data.

The rule followed by ID3 is: a branch with an entropy of 0 is a leaf node, and a branch with an entropy greater than 0 needs to be further separated.

The mathematical entropy of a single attribute is expressed as follows:

Anatomy of Decision Tree Algorithm

#where S represents the current state, Pi represents the probability of event i in state S or i in state S node Class percentage.

The mathematical entropy of multiple attributes is expressed as follows:

Anatomy of Decision Tree Algorithm

where T represents the current state and X represents the selected attribute

2. Information Gain

Information gain (IG) is a statistical attribute used to measure the effectiveness of separate training for a given attribute based on the target class. Building a decision tree is the process of finding an attribute that returns the highest information gain and the lowest entropy.

Anatomy of Decision Tree Algorithm

#The increase in information is the decrease in entropy. It calculates the entropy difference before separation and the average entropy difference after separation of the data set based on the given attribute value. The ID3 decision tree algorithm uses the information gain method.

IG is expressed mathematically as follows:

Anatomy of Decision Tree Algorithm

With a simpler approach, we can draw this conclusion:

Anatomy of Decision Tree Algorithm

where before is the data set before splitting, K is the number of subsets generated by splitting, and (j, after) is the subset j after splitting.

3. Gini Index

You can think of the Gini index as a cost function used to evaluate separation in a data set. It is calculated by subtracting the sum of squared probabilities for each class from 1. It favors the case of larger partitions and is easier to implement, while information gain favors the case of smaller partitions with different values.

Anatomy of Decision Tree Algorithm

The Gini index is inseparable from the categorical target variable "success" or "failure". It only performs binary separation. The higher the Gini coefficient, the higher the degree of inequality and the stronger the heterogeneity.

The steps to calculate the Gini index separation are as follows:

Calculate the Gini coefficient of the child node, using the above success (p) and The formula for failure (q) is (p² q²).
Calculate the Gini coefficient index of the separation using the weighted Gini score of each node of the separation.

CART (Classification and Regression Tree) uses the Gini index method to create separation points.

4. Gain rate

Information gain tends to select attributes with a large number of values as root nodes. This means it prefers properties with a large number of different values.

C4.5 is an improved method of ID3, which uses gain ratio, which is a modification of the information gain to reduce its bias, which is usually the best choice method. Gain rate overcomes the problem of information gain by taking into account the number of branches before doing the splitting. It corrects for information gain by taking into account separate intrinsic information.

Suppose we have a dataset that contains users and their movie genre preferences based on variables such as gender, age group, class, etc. With the help of information gain you will separate in "Gender" (assuming it has the highest information gain), now the variables "Age Group" and "Rating" may be equally important, with the help of gain ratio we can choose Properties that are separated in the next layer.

Anatomy of Decision Tree Algorithm

where before is the data set before separation, K is the number of subsets generated by separation, (j, after) is the subset after separation j.

5. Variance reduction

Variance reduction is an algorithm used for continuous target variables (regression problems). The algorithm uses the standard variance formula to choose the best separation. Choose the separation with lower variance as the criterion for separating the population:

Anatomy of Decision Tree Algorithm

is the mean, X is the actual value, and n is the number of values.

Steps to calculate variance:

Calculate the variance of each node.
Calculate the variance of each separation and use it as the weighted average of the variance of each node.

6. Chi-square

CHAID is the abbreviation of Chi-squared Automatic Interaction Detector. This is one of the older tree classification methods. Find the statistically significant difference between a child node and its parent node. We measure it by the sum of squared differences between the observed and expected frequencies of the target variable.

It works with the categorical target variable "Success" or "Failure". It can perform two or more separations. The higher the chi-square value, the more statistically significant the difference between the child node and the parent node. It generates a tree called CHAID.

In mathematics, Chi-squared is expressed as:

Anatomy of Decision Tree Algorithm

The steps to calculate Chi-squared are as follows:

Calculate the chi-square for a single node by calculating the deviation for success and failure
Use separate successes and failures for each node The sum of all chi-squares calculates the separated chi-square

How to avoid/fight against overfitting of the decision tree?

There is a decision tree Common problem, especially with a tree full of columns. Sometimes it looks like the tree memorized the training data set. If a decision tree had no constraints, it would give you 100% accuracy on the training data set, because in the worst case, it would end up producing one leaf for every observation. Therefore, this affects accuracy when predicting samples that are not part of the training set.

Here I introduce two methods to eliminate overfitting, namely pruning decision trees and random forests.

1. Pruning the decision tree

The separation process will produce a fully grown tree until the stopping criterion is reached. However, mature trees are likely to overfit the data, resulting in poor accuracy on unseen data.

Anatomy of Decision Tree Algorithm

In pruning, you prune branches of the tree, that is, delete decision nodes starting from leaf nodes so that the overall accuracy is not disturbed. This is done by splitting the actual training set into two sets: training data set D and validation data set V, preparing the decision tree with the separated training data set D, and then continuing to prune the tree accordingly to optimize the validation data set V accuracy.

Anatomy of Decision Tree Algorithm

In the above image, the "Age" attribute on the left side of the tree has been pruned because it is more important on the right side of the tree, thus eliminating overfitting.

2. Random Forest

Random forest is an example of ensemble learning (Ensemble Learning). We combine multiple machine learning algorithms to obtain better prediction performance. Because the training data set is randomly sampled when building the tree, and random subsets of features are considered when separating nodes, we call this method random.

A technique called bagging is used to create a collection of trees in which multiple training sets are generated by replacement.

Bagging technology uses random sampling to divide the data set into N samples. Then, a single learning algorithm is used to build the model on all samples. The predictions are then combined through parallel voting or averaging.

Anatomy of Decision Tree Algorithm

Which is better, a linear model or a tree-based model?

The question depends on the type of problem you are trying to solve.

1. If the relationship between the dependent variable and the independent variable can be well modeled by a linear model, linear regression will be better than the tree-based model.

2. If there is a highly nonlinear and complex relationship between the dependent variable and the independent variable, the tree model will be better than the classic regression method.

3. If you need to build a model that is easy to understand, a decision tree model is always better than a linear model. The decision tree model is easier to understand than linear regression!

Using Scikit-learn to build a decision tree classifier

The data I used is from https://drive.google.com/open? For the supermarket related data downloaded by id=1x1KglkvJxNn8C8kzeV96YePFnCUzXhBS, first use the following code to load all basic libraries:

import numpy as np
import matplotlib.pyplot as plt 
import pandas as pd

After that, we use the following method to load the data set. It includes 5 attributes, user id, gender, age, estimated salary and purchase status.

data = pd.read_csv('/Users/ML/DecisionTree/Social.csv')
data.head()

Anatomy of Decision Tree Algorithm

Figure 1 Dataset

We only combine age and prediction Estimated salary is used as our independent variable X because other characteristics such as gender and user ID are irrelevant and have no impact on a person's purchasing power, and y is the dependent variable.

feature_cols = ['Age','EstimatedSalary' ]X = data.iloc[:,[2,3]].values
y = data.iloc[:,4].values

The next step is to separate the data set into training and test sets.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test =train_test_split(X,y,test_size = 0.25, random_state= 0)

Next perform feature scaling

#feature scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

Fit the model to the decision tree in the classifier.

from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier()
classifier = classifier.fit(X_train,y_train)

Make predictions and check accuracy.

#prediction
y_pred = classifier.predict(X_test)#Accuracy
from sklearn import metricsprint('Accuracy Score:', metrics.accuracy_score(y_test,y_pred))

The decision tree classifier has an accuracy of 91%.

Confusion Matrix

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)Output:
array([[64,4],
 [ 2, 30]])

This means that there are 6 observations classified as errors.

First, let’s visualize the model predictions

from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:,0].min()-1, stop= X_set[:,0].max()+1, step = 0.01),np.arange(start = X_set[:,1].min()-1, stop= X_set[:,1].max()+1, step = 0.01))
plt.contourf(X1,X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape), alpha=0.75, cmap = ListedColormap(("red","green")))plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())for i,j in enumerate(np.unique(y_set)):
plt.scatter(X_set[y_set==j,0],X_set[y_set==j,1], c = ListedColormap(("red","green"))(i),label = j)
plt.title("Decision Tree(Test set)")
plt.xlabel("Age")
plt.ylabel("Estimated Salary")
plt.legend()
plt.show()

Anatomy of Decision Tree Algorithm ##Next, imagine this tree

Next you can use Scikit-learn's export_graphviz function to display the tree in a Jupyter notebook. In order to draw the tree, we need to install Graphviz and pydotplus using the following command:

conda install python-graphviz
pip install pydotplus

export_graphviz function converts the decision tree classifier to a point file, and pydotplus converts the point file to png or in The form displayed on Jupyter is implemented as follows:

from sklearn.tree import export_graphviz
from sklearn.externals.six import StringIO
from IPython.display import Image
import pydotplusdot_data = StringIO()
export_graphviz(classifier, out_file=dot_data,
filled=True, rounded=True,
special_characters=True,feature_names = feature_cols,class_names=['0','1'])
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())

在决策树形图中，每个内部节点都有一个分离数据的决策规则。Gini代表基尼系数，它代表了节点的纯度。当一个节点的所有记录都属于同一个类时，您可以说它是纯节点，这种节点称为叶节点。

在这里，生成的树是未修剪的。这棵未经修剪的树不容易理解。在下一节中，我会通过修剪的方式来优化树。

随后优化决策树分类器

criteria: 该选项默认配置是Gini，我们可以通过该项选择合适的属性选择方法，该参数允许我们使用different-different属性选择方式。支持的标准包含基尼指数的“基尼”和信息增益的“熵”。

splitter: 该选项默认配置是" best "，我们可以通过该参数选择合适的分离策略。支持的策略包含“best”(最佳分离)和“random”(最佳随机分离)。

max_depth：默认配置是None，我们可以通过该参数设置树的最大深度。若设置为None，则节点将展开，直到所有叶子包含的样本小于min_samples_split。最大深度值越高，过拟合越严重，反之，过拟合将不严重。

在Scikit-learn中，只有通过预剪枝来优化决策树分类器。树的最大深度可以用作预剪枝的控制变量。

# Create Decision Tree classifer object
classifier = DecisionTreeClassifier(criterion="entropy", max_depth=3)# Train Decision Tree Classifer
classifier = classifier.fit(X_train,y_train)#Predict the response for test dataset
y_pred = classifier.predict(X_test)# Model Accuracy, how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

至此分类率提高到94%，相对之前的模型来说，其准确率更高。现在让我们再次可视化优化后的修剪后的决策树。

dot_data = StringIO()
export_graphviz(classifier, out_file=dot_data,
filled=True, rounded=True,
special_characters=True, feature_names = feature_cols,class_names=['0','1'])
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())

Anatomy of Decision Tree Algorithm

上图是经过修剪后的模型，相对之前的决策树模型图来说，其更简单、更容易解释和理解。

总结

在本文中，我们讨论了很多关于决策树的细节，它的工作方式，属性选择措施，如信息增益，增益比和基尼指数，决策树模型的建立，可视化，并使用Python Scikit-learn包评估和优化决策树性能，这就是这篇文章的全部内容，希望你们能喜欢它。

译者介绍

赵青窕，51CTO社区编辑，从事多年驱动开发。

原文标题：Decision Tree Algorithm, Explained，作者：Nagesh Singh Chauhan

The above is the detailed content of Anatomy of Decision Tree Algorithm. For more information, please follow other related articles on the PHP Chinese website!

Statement

This article is reproduced at:51CTO.COM. If there is any infringement, please contact admin@php.cn delete

From Friction To Flow: How AI Is Reshaping Legal WorkMay 09, 2025 am 11:29 AM

The legal tech revolution is gaining momentum, pushing legal professionals to actively embrace AI solutions. Passive resistance is no longer a viable option for those aiming to stay competitive. Why is Technology Adoption Crucial? Legal professional

This Is What AI Thinks Of You And Knows About YouMay 09, 2025 am 11:24 AM

Many assume interactions with AI are anonymous, a stark contrast to human communication. However, AI actively profiles users during every chat. Every prompt, every word, is analyzed and categorized. Let's explore this critical aspect of the AI revo

7 Steps To Building A Thriving, AI-Ready Corporate CultureMay 09, 2025 am 11:23 AM

A successful artificial intelligence strategy cannot be separated from strong corporate culture support. As Peter Drucker said, business operations depend on people, and so does the success of artificial intelligence. For organizations that actively embrace artificial intelligence, building a corporate culture that adapts to AI is crucial, and it even determines the success or failure of AI strategies. West Monroe recently released a practical guide to building a thriving AI-friendly corporate culture, and here are some key points: 1. Clarify the success model of AI: First of all, we must have a clear vision of how AI can empower business. An ideal AI operation culture can achieve a natural integration of work processes between humans and AI systems. AI is good at certain tasks, while humans are good at creativity and judgment

Netflix New Scroll, Meta AI's Game Changers, Neuralink Valued At $8.5 BillionMay 09, 2025 am 11:22 AM

Meta upgrades AI assistant application, and the era of wearable AI is coming! The app, designed to compete with ChatGPT, offers standard AI features such as text, voice interaction, image generation and web search, but has now added geolocation capabilities for the first time. This means that Meta AI knows where you are and what you are viewing when answering your question. It uses your interests, location, profile and activity information to provide the latest situational information that was not possible before. The app also supports real-time translation, which completely changed the AI experience on Ray-Ban glasses and greatly improved its usefulness. The imposition of tariffs on foreign films is a naked exercise of power over the media and culture. If implemented, this will accelerate toward AI and virtual production

Take These Steps Today To Protect Yourself Against AI CybercrimeMay 09, 2025 am 11:19 AM

Artificial intelligence is revolutionizing the field of cybercrime, which forces us to learn new defensive skills. Cyber criminals are increasingly using powerful artificial intelligence technologies such as deep forgery and intelligent cyberattacks to fraud and destruction at an unprecedented scale. It is reported that 87% of global businesses have been targeted for AI cybercrime over the past year. So, how can we avoid becoming victims of this wave of smart crimes? Let’s explore how to identify risks and take protective measures at the individual and organizational level. How cybercriminals use artificial intelligence As technology advances, criminals are constantly looking for new ways to attack individuals, businesses and governments. The widespread use of artificial intelligence may be the latest aspect, but its potential harm is unprecedented. In particular, artificial intelligence

A Symbiotic Dance: Navigating Loops Of Artificial And Natural PerceptionMay 09, 2025 am 11:13 AM

The intricate relationship between artificial intelligence (AI) and human intelligence (NI) is best understood as a feedback loop. Humans create AI, training it on data generated by human activity to enhance or replicate human capabilities. This AI

AI's Biggest Secret — Creators Don't Understand It, Experts SplitMay 09, 2025 am 11:09 AM

Anthropic's recent statement, highlighting the lack of understanding surrounding cutting-edge AI models, has sparked a heated debate among experts. Is this opacity a genuine technological crisis, or simply a temporary hurdle on the path to more soph

Bulbul-V2 by Sarvam AI: India's Best TTS ModelMay 09, 2025 am 10:52 AM

India is a diverse country with a rich tapestry of languages, making seamless communication across regions a persistent challenge. However, Sarvam’s Bulbul-V2 is helping to bridge this gap with its advanced text-to-speech (TTS) t

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

How to fix KB5055523 fails to install in Windows 11?

4 weeks agoByDDD

How to fix KB5055518 fails to install in Windows 10?

4 weeks agoByDDD

Roblox: Grow A Garden - Complete Mutation Guide

3 weeks agoByDDD

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

How to fix KB5055612 fails to install in Windows 10?

3 weeks agoByDDD

Hot Tools

Dreamweaver Mac version

Visual web development tools

SAP NetWeaver Server Adapter for Eclipse

Integrate Eclipse with SAP NetWeaver application server.

SublimeText3 Chinese version

Chinese version, very easy to use

MantisBT

Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

DVWA

Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software

Hot Topics

1664

1422

1316

1268

1241

Anatomy of Decision Tree Algorithm

Foreword

Decision tree algorithm

Types of decision trees

Important terms related to decision trees

Assumptions made when creating decision trees

How do decision trees work?

ID3 algorithm steps:

Attribute selection method

1. Entropy

2. Information Gain

3. Gini Index

4. Gain rate

5. Variance reduction

6. Chi-square

How to avoid/fight against overfitting of the decision tree?

1. Pruning the decision tree

Next you can use Scikit-learn's export_graphviz function to display the tree in a Jupyter notebook. In order to draw the tree, we need to install Graphviz and pydotplus using the following command:

随后优化决策树分类器

总结

译者介绍

Hot AI Tools

Undresser.AI Undress

AI Clothes Remover

Undress AI Tool

Clothoff.io

Video Face Swap

Hot Article

Hot Tools

Dreamweaver Mac version

SAP NetWeaver Server Adapter for Eclipse

SublimeText3 Chinese version

MantisBT

DVWA

Hot Topics