Introduction to Decision Trees
A Decision Tree is one of the most popular and easy-to-understand algorithms in machine learning. Much as humans make decisions by asking a series of yes-or-no questions, a decision tree arrives at a prediction one condition at a time.
They split data into branches based on conditions until a final decision (or prediction) is made. Because of their simplicity, decision trees are often the first choice for classification and regression tasks.
How Do Decision Trees Work?
Imagine you want to decide if a person will play tennis today based on weather conditions.
If it's sunny, check the humidity.
If humidity is high → Don't play.
If humidity is normal → Play.
If it's rainy, check wind conditions, and so on.
This step-by-step process can be visualized as a tree, where each internal node represents a question (feature), and each branch represents a possible answer. The final nodes (leaf nodes) represent the prediction.
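To make this concrete, here's a minimal sketch of the tennis example written as plain Python conditionals. The rules and feature values below are illustrative assumptions, not learned from data; a real decision tree induces rules like these automatically:

def will_play_tennis(outlook, humidity, wind):
    # Hand-written rules mirroring the example above
    if outlook == "sunny":
        return "Don't Play" if humidity == "high" else "Play"
    if outlook == "rainy":
        return "Don't Play" if wind == "strong" else "Play"
    return "Play"  # e.g., overcast days

print(will_play_tennis("sunny", "high", "weak"))    # Don't Play
print(will_play_tennis("rainy", "normal", "weak"))  # Play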
Key Concepts in Decision Trees
Root Node: The first question or feature that splits the dataset.
Decision Node: A point where the dataset is split further based on conditions.
Leaf Node: The final output or decision (e.g., "Play" or "Don't Play").
Splitting: Dividing a dataset into subsets using a feature.
Pruning: Reducing the size of a tree to avoid overfitting.
Gini Impurity & Entropy: Metrics that measure how mixed the classes are at a node, used to decide the best split (see the sketch after this list).
Information Gain: The reduction in entropy (or impurity) achieved by a split.
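To see how these metrics guide a split, the small sketch below computes entropy and the information gain of splitting the classic 14-day play-tennis labels by outlook. The class counts follow the textbook example; treat them as illustrative:

from collections import Counter
import math

def entropy(labels):
    # H = -sum(p * log2(p)) over the class proportions p
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

parent = ["Play"] * 9 + ["No"] * 5    # labels before the split
sunny = ["Play"] * 2 + ["No"] * 3     # children after splitting on outlook
rainy = ["Play"] * 3 + ["No"] * 2
overcast = ["Play"] * 4

# Information gain = parent entropy - weighted average child entropy
children = [sunny, rainy, overcast]
weighted = sum(len(c) / len(parent) * entropy(c) for c in children)
print("Information gain:", entropy(parent) - weighted)  # ~0.247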
Types of Decision Trees
Classification Tree: Predicts a discrete class label (e.g., "Play" or "Don't Play").
Regression Tree: Predicts a continuous numeric value (e.g., a price or temperature).
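Classification is demonstrated in the next section; for regression, here is a brief sketch using scikit-learn's DecisionTreeRegressor on synthetic data (the data, depth, and query point are arbitrary choices for illustration):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)      # 80 points in [0, 5)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)   # noisy sine target

reg = DecisionTreeRegressor(max_depth=3)
reg.fit(X, y)
print(reg.predict([[2.5]]))  # piecewise-constant estimate near sin(2.5)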
Decision Tree in Python (Scikit-learn Example)
Here's a simple Python example of using a Decision Tree Classifier:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
import matplotlib.pyplot as plt

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Train Decision Tree model (entropy criterion, depth capped at 3)
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3)
clf.fit(X, y)

# Visualize the fitted tree
tree.plot_tree(clf, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.show()
This code loads the famous Iris dataset, trains a decision tree, and visualizes it.
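To estimate how well the tree generalizes, a natural follow-up is to evaluate on held-out data. This sketch continues from the variables above; the 70/30 split and random seed are arbitrary choices:

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hold out 30% of the data to estimate generalization accuracy
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))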
Advantages of Decision Trees
Easy to understand and interpret
No need for feature scaling (such as normalization or standardization)
Works with both numerical and categorical data
Can handle missing values (in some implementations)
Provides clear visualization
Disadvantages of Decision Trees
Prone to overfitting if not pruned properly (see the pruning sketch after this list)
Sensitive to small changes in the data, which can produce a very different tree
Can be biased toward features with many distinct levels
May not be as accurate as ensemble methods like Random Forests
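A standard remedy for overfitting is cost-complexity pruning, which scikit-learn exposes through the ccp_alpha parameter. Here is a rough sketch continuing from the train/test split above; the alpha value is illustrative, and in practice you would pick it by cross-validation:

# Candidate pruning strengths along the cost-complexity path
path = clf.cost_complexity_pruning_path(X_train, y_train)
print(path.ccp_alphas)

# Refit with a nonzero alpha to get a smaller, less overfit tree
pruned = DecisionTreeClassifier(ccp_alpha=0.01).fit(X_train, y_train)
print("Pruned tree depth:", pruned.get_depth())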
Real-World Applications
Healthcare: Diagnosing diseases based on symptoms
Finance: Predicting loan approvals or credit risks
E-commerce: Recommending products based on customer behavior
Marketing: Customer segmentation for targeted campaigns
Sports: Predicting outcomes based on player stats
Conclusion
Decision trees are simple yet powerful machine learning models. They form the foundation for more advanced methods like Random Forests and Gradient Boosted Trees.
If you are just starting with ML, decision trees are a great place to begin: easy to visualize, simple to code, and useful for both beginners and professionals.