a) See the following figure for the ID3 decision tree:

b) Only the disjunction of conjunctions for Martians was required:
(Legs = 3) ∨ (Legs = 2 ∧ Green = Yes ∧ Height = Tall) ∨ (Legs = 2 ∧ Green = No ∧ Height = Short ∧ Smelly = Yes)
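As a sanity check, this rule can be expressed as a small Python predicate and tested against individual rows of the training table in Step 1 below. This is only an illustrative sketch; the `is_martian` name and the dict-based row encoding are choices made here, not part of the original answer.

```python
def is_martian(x):
    """Rule from part (b):
    (Legs = 3)
    or (Legs = 2 and Green = Yes and Height = Tall)
    or (Legs = 2 and Green = No and Height = Short and Smelly = Yes)
    """
    return (
        x["Legs"] == 3
        or (x["Legs"] == 2 and x["Green"] == "Y" and x["Height"] == "T")
        or (x["Legs"] == 2 and x["Green"] == "N"
            and x["Height"] == "S" and x["Smelly"] == "Y")
    )

# Spot checks against two rows of the Step 1 table (value encodings assumed: Y/N, S/T):
print(is_martian({"Green": "N", "Legs": 3, "Height": "S", "Smelly": "Y"}))  # row 1, a Martian -> True
print(is_martian({"Green": "Y", "Legs": 2, "Height": "S", "Smelly": "N"}))  # row 9, a Human -> False
```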
Python Code
Step 1: Organize the Dataset
Our data has the following features and values:
- Species: target variable, M (Martian) or H (Human)
- Features:
  - Green: N or Y
  - Legs: 2 or 3
  - Height: S (Short) or T (Tall)
  - Smelly: N or Y
| Index | Species | Green | Legs | Height | Smelly |
|-------|---------|-------|------|--------|--------|
| 1     | M       | N     | 3    | S      | Y      |
| 2     | M       | Y     | 2    | T      | N      |
| 3     | M       | Y     | 3    | T      | N      |
| 4     | M       | N     | 2    | S      | Y      |
| 5     | M       | Y     | 3    | T      | N      |
| 6     | H       | N     | 2    | T      | Y      |
| 7     | H       | N     | 2    | S      | N      |
| 8     | H       | N     | 2    | T      | N      |
| 9     | H       | Y     | 2    | S      | N      |
| 10    | H       | N     | 2    | T      | Y      |
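One convenient way to hold this table in Python is a list of dicts, one per row; this layout (and the variable name `data`) is just an illustrative choice, not something prescribed by the exercise.

```python
from collections import Counter

# The ten training examples from the table above, one dict per row.
data = [
    {"Species": "M", "Green": "N", "Legs": 3, "Height": "S", "Smelly": "Y"},
    {"Species": "M", "Green": "Y", "Legs": 2, "Height": "T", "Smelly": "N"},
    {"Species": "M", "Green": "Y", "Legs": 3, "Height": "T", "Smelly": "N"},
    {"Species": "M", "Green": "N", "Legs": 2, "Height": "S", "Smelly": "Y"},
    {"Species": "M", "Green": "Y", "Legs": 3, "Height": "T", "Smelly": "N"},
    {"Species": "H", "Green": "N", "Legs": 2, "Height": "T", "Smelly": "Y"},
    {"Species": "H", "Green": "N", "Legs": 2, "Height": "S", "Smelly": "N"},
    {"Species": "H", "Green": "N", "Legs": 2, "Height": "T", "Smelly": "N"},
    {"Species": "H", "Green": "Y", "Legs": 2, "Height": "S", "Smelly": "N"},
    {"Species": "H", "Green": "N", "Legs": 2, "Height": "T", "Smelly": "Y"},
]

# Class balance used in Step 2: 5 Martians, 5 Humans.
print(Counter(row["Species"] for row in data))  # Counter({'M': 5, 'H': 5})
```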
Step 2: Calculate the Initial Entropy for the Target Variable (Species)
We start by calculating the entropy of the target variable, Species, which has two classes: M (Martian) and H (Human).
Total Counts
- Martians (M): 5
- Humans (H): 5
- Total: 10
Entropy Formula
The entropy E for a binary classification is calculated as:
E = −p+ · log2(p+) − p− · log2(p−)
Where:
- p+: Probability of positive class (M)
- p−: Probability of negative class (H)
Calculation
p(M) = 5/10 = 0.5
p(H) = 5/10 = 0.5
E(Species) = −0.5·log2(0.5) − 0.5·log2(0.5)
= −0.5·(−1) − 0.5·(−1)
= 1.0
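The same value falls out of a small helper function. This is a sketch; the `entropy` helper below is defined here for illustration rather than taken from any particular library.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

# Five Martians and five Humans, as counted above.
species = ["M"] * 5 + ["H"] * 5
print(entropy(species))  # 1.0
```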
Step 3: Calculate Entropy and Information Gain for Each Feature
We’ll calculate the entropy for each feature split and determine the information gain.
Feature: Green
Green can be either Y or N.
For Green = Y:
- Martians (M): 3
- Humans (H): 1
- Total: 4
Entropy:
E(Green = Y) = −(3/4)·log2(3/4) − (1/4)·log2(1/4)
= −0.75·log2(0.75) − 0.25·log2(0.25)
= −0.75·(−0.415) − 0.25·(−2)
= 0.311 + 0.5 = 0.811
For Green = N:
- Martians (M): 2
- Humans (H): 4
- Total: 6
Entropy:
E(Green = N) = −(2/6)·log2(2/6) − (4/6)·log2(4/6)
= −0.333·log2(0.333) − 0.667·log2(0.667)
= −0.333·(−1.585) − 0.667·(−0.585)
= 0.528 + 0.390 = 0.918
Weighted Entropy for Green
E(Green) = (4/10)·0.811 + (6/10)·0.918
= 0.3244 + 0.5508 ≈ 0.875
Information Gain for Green
IG(Species, Green) = E(Species) − E(Green)
= 1.0 − 0.875 = 0.125
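The same calculation in Python, working directly from the per-branch class counts above (3 M / 1 H for Green = Y, 2 M / 4 H for Green = N); the helper names here are illustrative choices.

```python
import math

def branch_entropy(m, h):
    """Entropy of a branch containing m Martians and h Humans."""
    n = m + h
    return -sum(c / n * math.log2(c / n) for c in (m, h) if c)

e_green_y = branch_entropy(3, 1)                    # ≈ 0.811
e_green_n = branch_entropy(2, 4)                    # ≈ 0.918
e_green = 4 / 10 * e_green_y + 6 / 10 * e_green_n   # ≈ 0.875
ig_green = 1.0 - e_green                            # ≈ 0.125
print(round(ig_green, 3))
```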
Repeat this process to calculate the entropy and information gain for the remaining features (Legs, Height, and Smelly) in the same way.
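A sketch that automates this for all four features is shown below; it carries its own compact copy of the Step 1 table, and the `entropy` / `info_gain` helpers are illustrative names rather than a prescribed API. Whichever feature yields the highest gain is chosen as the root split, and ID3 then recurses on each branch, which is how the tree in part (a) is built.

```python
import math
from collections import Counter

# Compact copy of the Step 1 table: (Species, Green, Legs, Height, Smelly) per row.
rows = [
    ("M", "N", 3, "S", "Y"), ("M", "Y", 2, "T", "N"), ("M", "Y", 3, "T", "N"),
    ("M", "N", 2, "S", "Y"), ("M", "Y", 3, "T", "N"), ("H", "N", 2, "T", "Y"),
    ("H", "N", 2, "S", "N"), ("H", "N", 2, "T", "N"), ("H", "Y", 2, "S", "N"),
    ("H", "N", 2, "T", "Y"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, col):
    """Information gain from splitting `rows` on the column at index `col`."""
    labels = [r[0] for r in rows]
    n = len(rows)
    remainder = 0.0
    for value in set(r[col] for r in rows):
        branch = [r[0] for r in rows if r[col] == value]
        remainder += len(branch) / n * entropy(branch)
    return entropy(labels) - remainder

# Columns 1..4 hold Green, Legs, Height, and Smelly respectively.
for name, col in [("Green", 1), ("Legs", 2), ("Height", 3), ("Smelly", 4)]:
    print(f"IG(Species, {name}) = {info_gain(rows, col):.3f}")
```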