
NASA wants to be able to discriminate between Martians (M) and Humans (H) based on the
following characteristics: Green ∈ {N, Y}, Legs ∈ {2, 3}, Height ∈ {S, T}, Smelly ∈ {N, Y}.
The available training data is reproduced in the table under Step 1 of the answer below.

a) Greedily learn a decision tree using the ID3 algorithm and draw the tree.
b) Write the learned concept for Martian as a set of conjunctive rules (e.g., if (green=Y
and legs=2 and height=T and smelly=N), then Martian; else if ... then Martian; ...; else
Human).

  

1 Answer

Best answer

a) The original answer refers to a figure for the ID3 decision tree, which is not reproduced here. Reconstructed from the rules in part (b), the tree is:

Legs?
├── 3 → Martian
└── 2 → Green?
        ├── Y → Height?
        │       ├── T → Martian
        │       └── S → Human
        └── N → Height?
                ├── T → Human
                └── S → Smelly?
                        ├── Y → Martian
                        └── N → Human

b) Only the disjunction of conjunctions for Martians was required:

$\begin{aligned}
&(\text { Legs }=3) \vee \\
&(\text { Legs }=2 \wedge \text { Green }=\text { Yes } \wedge \text { Height }=\text { Tall }) \vee \\
&(\text { Legs }=2 \wedge \text { Green }=\text { No } \wedge \text { Height }=\text { Short } \wedge \text { Smelly }=\text { Yes })
\end{aligned}$
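
These rules translate directly into code. A minimal sketch (the function name `classify` and the single-record signature are our choices, not part of the original answer):

```python
def classify(green: str, legs: int, height: str, smelly: str) -> str:
    """Apply the learned ID3 rules; anything not matched falls through to Human."""
    if legs == 3:
        return "M"
    if legs == 2 and green == "Y" and height == "T":
        return "M"
    if legs == 2 and green == "N" and height == "S" and smelly == "Y":
        return "M"
    return "H"

# Spot-check against training rows 2 (a Martian) and 7 (a Human):
assert classify(green="Y", legs=2, height="T", smelly="N") == "M"
assert classify(green="N", legs=2, height="S", smelly="N") == "H"
```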

Python Code

Step 1: Organize the Dataset

Our data has the following features and values:

  • Species: Target variable (M or H)
  • Features:
    • Green: \( N \) or \( Y \)
    • Legs: \( 2 \) or \( 3 \)
    • Height: \( S \) or \( T \)
    • Smelly: \( N \) or \( Y \)

Index  Species  Green  Legs  Height  Smelly
1      M        N      3     S       Y
2      M        Y      2     T       N
3      M        Y      3     T       N
4      M        N      2     S       Y
5      M        Y      3     T       N
6      H        N      2     T       Y
7      H        N      2     S       N
8      H        N      2     T       N
9      H        Y      2     S       N
10     H        N      2     T       Y
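
Assuming pandas is available, the table can be encoded as a DataFrame for the calculations that follow (the variable name `data` is our choice):

```python
import pandas as pd

# The ten training rows from the table above, one column per attribute.
data = pd.DataFrame({
    "Species": ["M", "M", "M", "M", "M", "H", "H", "H", "H", "H"],
    "Green":   ["N", "Y", "Y", "N", "Y", "N", "N", "N", "Y", "N"],
    "Legs":    [3, 2, 3, 2, 3, 2, 2, 2, 2, 2],
    "Height":  ["S", "T", "T", "S", "T", "T", "S", "T", "S", "T"],
    "Smelly":  ["Y", "N", "N", "Y", "N", "Y", "N", "N", "N", "Y"],
})
print(data)
```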

Step 2: Calculate the Initial Entropy for the Target Variable (Species)

We start by calculating the entropy of the target variable, Species, which has two classes: M (Martian) and H (Human).

Total Counts

  • Martians (M): 5
  • Humans (H): 5
  • Total: 10

Entropy Formula

The entropy \( E \) for a binary classification is calculated as:

$$ E = -p_+ \log_2(p_+) - p_- \log_2(p_-) $$

Where:

  • \( p_+ \): Probability of positive class (M)
  • \( p_- \): Probability of negative class (H)

Calculation

$$ p(M) = \frac{5}{10} = 0.5 $$

$$ p(H) = \frac{5}{10} = 0.5 $$

$$ E(Species) = -0.5 \cdot \log_2(0.5) - 0.5 \cdot \log_2(0.5) $$

$$ = -0.5 \cdot (-1) - 0.5 \cdot (-1) $$

$$ = 1.0 $$
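
This hand calculation is easy to verify in code. A small helper (the function name `entropy` is ours; it reuses the `data` DataFrame from Step 1):

```python
import math
from collections import Counter

def entropy(labels) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

print(entropy(data["Species"]))  # 1.0 for the 5 M / 5 H split
```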

Step 3: Calculate Entropy and Information Gain for Each Feature

We’ll calculate the entropy for each feature split and determine the information gain.

Feature: Green

Green can be either Y or N.

For Green = Y:

  • Martians (M): 3
  • Humans (H): 1
  • Total: 4

Entropy:

$$ E(Green = Y) = -\left(\frac{3}{4}\right) \log_2\left(\frac{3}{4}\right) - \left(\frac{1}{4}\right) \log_2\left(\frac{1}{4}\right) $$

$$ = -0.75 \cdot \log_2(0.75) - 0.25 \cdot \log_2(0.25) $$

$$ = -0.75 \cdot (-0.415) - 0.25 \cdot (-2) $$

$$ = 0.311 + 0.5 = 0.811 $$

For Green = N:

  • Martians (M): 2
  • Humans (H): 4
  • Total: 6

Entropy:

$$ E(Green = N) = -\left(\frac{2}{6}\right) \log_2\left(\frac{2}{6}\right) - \left(\frac{4}{6}\right) \log_2\left(\frac{4}{6}\right) $$

$$ = -0.333 \cdot \log_2(0.333) - 0.667 \cdot \log_2(0.667) $$

$$ = -0.333 \cdot (-1.585) - 0.667 \cdot (-0.585) $$

$$ = 0.528 + 0.390 = 0.918 $$

Weighted Entropy for Green

$$ E(Green) = \frac{4}{10} \cdot 0.8113 + \frac{6}{10} \cdot 0.9183 $$

$$ = 0.3245 + 0.5510 = 0.8755 $$

Information Gain for Green

$$ IG(Species, Green) = E(Species) - E(Green) $$

$$ = 1.0 - 0.8755 = 0.1245 $$

Repeating this process for the remaining features gives IG(Legs) ≈ 0.3958, IG(Height) = 0, and IG(Smelly) = 0. Legs has the highest information gain, so ID3 chooses it for the root split and then recurses on each branch with the remaining features, producing the tree in part (a).
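
The whole comparison can be automated. A sketch reusing `data` and `entropy` from the snippets above (`information_gain` is our name for the helper):

```python
def information_gain(df, feature, target="Species") -> float:
    """E(target) minus the weighted entropy after splitting df on feature."""
    total = len(df)
    split_entropy = sum(
        (len(subset) / total) * entropy(subset[target])
        for _, subset in df.groupby(feature)
    )
    return entropy(df[target]) - split_entropy

for feature in ["Green", "Legs", "Height", "Smelly"]:
    print(f"IG({feature}) = {information_gain(data, feature):.4f}")
# Expected output:
#   IG(Green) = 0.1245
#   IG(Legs) = 0.3958
#   IG(Height) = 0.0000
#   IG(Smelly) = 0.0000
```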

...