a) See the following figure for the ID3 decision tree:
b) Only the disjunction of conjunctions for Martians was required:
$\begin{aligned}
&(\text { Legs }=3) \vee \\
&(\text { Legs }=2 \wedge \text { Green }=\text { Yes } \wedge \text { Height }=\text { Tall }) \vee \\
&(\text { Legs }=2 \wedge \text { Green }=\text { No } \wedge \text { Height }=\text { Short } \wedge \text { Smelly }=\text { Yes })
\end{aligned}$
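As a sanity check, this rule translates directly into Python. A minimal sketch, assuming the same value encodings as the table in Step 1 below ("Y"/"N" for Green and Smelly, "S"/"T" for Height):

```python
def is_martian(green, legs, height, smelly):
    """Return True when an example satisfies the Martian concept from (b)."""
    return (
        legs == 3
        or (legs == 2 and green == "Y" and height == "T")
        or (legs == 2 and green == "N" and height == "S" and smelly == "Y")
    )

print(is_martian(green="N", legs=3, height="S", smelly="Y"))  # True (example 1)
```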
Python Code
Step 1: Organize the Dataset
Our data has the following features and values:
- Species: Target variable (M or H)
- Features:
- Green: \( N \) or \( Y \)
- Legs: \( 2 \) or \( 3 \)
- Height: \( S \) or \( T \)
- Smelly: \( N \) or \( Y \)
| Index | Species | Green | Legs | Height | Smelly |
|-------|---------|-------|------|--------|--------|
| 1     | M       | N     | 3    | S      | Y      |
| 2     | M       | Y     | 2    | T      | N      |
| 3     | M       | Y     | 3    | T      | N      |
| 4     | M       | N     | 2    | S      | Y      |
| 5     | M       | Y     | 3    | T      | N      |
| 6     | H       | N     | 2    | T      | Y      |
| 7     | H       | N     | 2    | S      | N      |
| 8     | H       | N     | 2    | T      | N      |
| 9     | H       | Y     | 2    | S      | N      |
| 10    | H       | N     | 2    | T      | Y      |
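This table can be written out as a plain list of dicts (one convenient layout among several; a pandas DataFrame would serve equally well), which the sketches in the later steps reuse:

```python
# Each dict mirrors one row of the table: the Species target plus four features.
dataset = [
    {"Species": "M", "Green": "N", "Legs": 3, "Height": "S", "Smelly": "Y"},
    {"Species": "M", "Green": "Y", "Legs": 2, "Height": "T", "Smelly": "N"},
    {"Species": "M", "Green": "Y", "Legs": 3, "Height": "T", "Smelly": "N"},
    {"Species": "M", "Green": "N", "Legs": 2, "Height": "S", "Smelly": "Y"},
    {"Species": "M", "Green": "Y", "Legs": 3, "Height": "T", "Smelly": "N"},
    {"Species": "H", "Green": "N", "Legs": 2, "Height": "T", "Smelly": "Y"},
    {"Species": "H", "Green": "N", "Legs": 2, "Height": "S", "Smelly": "N"},
    {"Species": "H", "Green": "N", "Legs": 2, "Height": "T", "Smelly": "N"},
    {"Species": "H", "Green": "Y", "Legs": 2, "Height": "S", "Smelly": "N"},
    {"Species": "H", "Green": "N", "Legs": 2, "Height": "T", "Smelly": "Y"},
]
```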
Step 2: Calculate the Initial Entropy for the Target Variable (Species)
We start by calculating the entropy of the target variable, Species, which has two classes: M (Martian) and H (Human).
Total Counts
- Martians (M): 5
- Humans (H): 5
- Total: 10
Entropy Formula
The entropy \( E \) for a binary classification is calculated as:
$$ E = -p_+ \log_2(p_+) - p_- \log_2(p_-) $$
Where:
- \( p_+ \): Probability of positive class (M)
- \( p_- \): Probability of negative class (H)
Calculation
$$ p(M) = \frac{5}{10} = 0.5 $$
$$ p(H) = \frac{5}{10} = 0.5 $$
$$ E(Species) = -0.5 \cdot \log_2(0.5) - 0.5 \cdot \log_2(0.5) $$
$$ = -0.5 \cdot (-1) - 0.5 \cdot (-1) $$
$$ = 1.0 $$
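The same computation in Python, reusing the `dataset` list from Step 1:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

species = [row["Species"] for row in dataset]
print(entropy(species))  # 1.0 -- the 5/5 split is maximally uncertain
```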
Step 3: Calculate Entropy and Information Gain for Each Feature
We’ll calculate the entropy for each feature split and determine the information gain.
Feature: Green
Green can be either Y or N.
For Green = Y:
- Martians (M): 3
- Humans (H): 1
- Total: 4
Entropy:
$$ E(Green = Y) = -\left(\frac{3}{4}\right) \log_2\left(\frac{3}{4}\right) - \left(\frac{1}{4}\right) \log_2\left(\frac{1}{4}\right) $$
$$ = -0.75 \cdot \log_2(0.75) - 0.25 \cdot \log_2(0.25) $$
$$ = -0.75 \cdot (-0.415) - 0.25 \cdot (-2) $$
$$ = 0.311 + 0.5 = 0.811 $$
For Green = N:
- Martians (M): 2
- Humans (H): 4
- Total: 6
Entropy:
$$ E(Green = N) = -\left(\frac{2}{6}\right) \log_2\left(\frac{2}{6}\right) - \left(\frac{4}{6}\right) \log_2\left(\frac{4}{6}\right) $$
$$ = -0.333 \cdot \log_2(0.333) - 0.667 \cdot \log_2(0.667) $$
$$ = -0.333 \cdot (-1.585) - 0.667 \cdot (-0.585) $$
$$ = 0.528 + 0.390 = 0.918 $$
Weighted Entropy for Green
$$ E(Green) = \frac{4}{10} \cdot 0.811 + \frac{6}{10} \cdot 0.918 $$
$$ \approx 0.324 + 0.551 = 0.875 $$
Information Gain for Green
$$ IG(Species, Green) = E(Species) - E(Green) $$
$$ = 1.0 - 0.875 = 0.125 $$
The same procedure gives the entropy and information gain for the remaining features (Legs, Height, and Smelly); the sketch below automates it for all four.
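Rather than repeating the arithmetic by hand, a short helper can compute the gain for every feature at once, reusing the `entropy` function and `dataset` list defined above:

```python
def information_gain(rows, feature):
    """Parent entropy minus the size-weighted entropy of each feature-value subset."""
    parent = entropy([row["Species"] for row in rows])
    total = len(rows)
    weighted = 0.0
    for value in {row[feature] for row in rows}:
        subset = [row["Species"] for row in rows if row[feature] == value]
        weighted += len(subset) / total * entropy(subset)
    return parent - weighted

# ID3 picks the feature with the highest gain as the next split;
# Legs has the largest gain here, matching the root of the tree in (a).
for feature in ("Green", "Legs", "Height", "Smelly"):
    print(f"{feature}: {information_gain(dataset, feature):.4f}")
```

At full precision, Green comes out at 0.1245, agreeing with the rounded 0.125 obtained by hand above.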