0 votes
2.3k views
asked in Machine Learning by (116k points)  

NASA wants to be able to discriminate between Martians (M) and Humans (H) based on the
following characteristics: Green ∈{N, Y }, Legs ∈{2, 3}, Height ∈{S, T}, Smelly ∈{N, Y }.
Our available training data is as follows:

Index  Species  Green  Legs  Height  Smelly
1      M        N      3     S       Y
2      M        Y      2     T       N
3      M        Y      3     T       N
4      M        N      2     S       Y
5      M        Y      3     T       N
6      H        N      2     T       Y
7      H        N      2     S       N
8      H        N      2     T       N
9      H        Y      2     S       N
10     H        N      2     T       Y

a) Greedily learn a decision tree using the ID3 algorithm and draw the tree.
b) Write the learned concept for Martian as a set of conjunctive rules (e.g., if (green=Y
and legs=2 and height=T and smelly=N), then Martian; else if ... then Martian; ...; else
Human).

  

1 Answer

0 votes
answered by (116k points)  
Best answer

a) The ID3 decision tree learned from the data is shown below. Legs gives the highest information gain at the root (the Legs = 3 branch is pure), and the Legs = 2 branch is then split on Green, Height, and Smelly:

Legs?
├─ 3 → Martian
└─ 2 → Green?
       ├─ Y → Height?
       │      ├─ T → Martian
       │      └─ S → Human
       └─ N → Height?
              ├─ S → Smelly?
              │      ├─ Y → Martian
              │      └─ N → Human
              └─ T → Human

b) Only the disjunction of conjunctions for Martians was required:

(Legs = 3) ∨ (Legs = 2 ∧ Green = Yes ∧ Height = Tall) ∨ (Legs = 2 ∧ Green = No ∧ Height = Short ∧ Smelly = Yes)
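
As a quick sanity check, these rules translate directly into Python. The classify helper and the dict keys below are my own naming, not part of the original answer:

```python
def classify(sample: dict) -> str:
    """Apply the learned conjunctive rules for Martian; default to Human."""
    if sample['Legs'] == 3:
        return 'M'
    if sample['Legs'] == 2 and sample['Green'] == 'Y' and sample['Height'] == 'T':
        return 'M'
    if (sample['Legs'] == 2 and sample['Green'] == 'N'
            and sample['Height'] == 'S' and sample['Smelly'] == 'Y'):
        return 'M'
    return 'H'

# Training row 2 (Green=Y, Legs=2, Height=T, Smelly=N) is a Martian:
print(classify({'Green': 'Y', 'Legs': 2, 'Height': 'T', 'Smelly': 'N'}))  # M
```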

Python Code

Step 1: Organize the Dataset

Our data has the following features and values:

  • Species: Target variable (M or H)
  • Features:
    • Green: N or Y
    • Legs: 2 or 3
    • Height: S or T
    • Smelly: N or Y
Index  Species  Green  Legs  Height  Smelly
1      M        N      3     S       Y
2      M        Y      2     T       N
3      M        Y      3     T       N
4      M        N      2     S       Y
5      M        Y      3     T       N
6      H        N      2     T       Y
7      H        N      2     S       N
8      H        N      2     T       N
9      H        Y      2     S       N
10     H        N      2     T       Y
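
For the snippets in the following steps, one way to encode this table in Python is as a list of (features, label) pairs. The DATA name and the dict-based layout are assumptions of mine, not part of the original post:

```python
# Each row of the table becomes a (feature-dict, label) pair.
DATA = [
    ({'Green': 'N', 'Legs': 3, 'Height': 'S', 'Smelly': 'Y'}, 'M'),
    ({'Green': 'Y', 'Legs': 2, 'Height': 'T', 'Smelly': 'N'}, 'M'),
    ({'Green': 'Y', 'Legs': 3, 'Height': 'T', 'Smelly': 'N'}, 'M'),
    ({'Green': 'N', 'Legs': 2, 'Height': 'S', 'Smelly': 'Y'}, 'M'),
    ({'Green': 'Y', 'Legs': 3, 'Height': 'T', 'Smelly': 'N'}, 'M'),
    ({'Green': 'N', 'Legs': 2, 'Height': 'T', 'Smelly': 'Y'}, 'H'),
    ({'Green': 'N', 'Legs': 2, 'Height': 'S', 'Smelly': 'N'}, 'H'),
    ({'Green': 'N', 'Legs': 2, 'Height': 'T', 'Smelly': 'N'}, 'H'),
    ({'Green': 'Y', 'Legs': 2, 'Height': 'S', 'Smelly': 'N'}, 'H'),
    ({'Green': 'N', 'Legs': 2, 'Height': 'T', 'Smelly': 'Y'}, 'H'),
]
```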

Step 2: Calculate the Initial Entropy for the Target Variable (Species)

We start by calculating the entropy of the target variable, Species, which has two classes: M (Martian) and H (Human).

Total Counts

  • Martians (M): 5
  • Humans (H): 5
  • Total: 10

Entropy Formula

The entropy E for a binary classification is calculated as:

$E = -p_{+}\log_2(p_{+}) - p_{-}\log_2(p_{-})$

Where:

  • $p_{+}$: probability of the positive class (M)
  • $p_{-}$: probability of the negative class (H)

Calculation

$p(M) = \frac{5}{10} = 0.5$

$p(H) = \frac{5}{10} = 0.5$

$E(\text{Species}) = -0.5\log_2(0.5) - 0.5\log_2(0.5)$

$= -0.5(-1) - 0.5(-1)$

$= 1.0$
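
As a cross-check, here is a small Python helper for the same calculation. It reuses the DATA list from Step 1, and the entropy function name is my own choice:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

species = [label for _, label in DATA]
print(entropy(species))  # 1.0 for the balanced 5 M / 5 H split
```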

Step 3: Calculate Entropy and Information Gain for Each Feature

We’ll calculate the entropy for each feature split and determine the information gain.

Feature: Green

Green can be either Y or N.

For Green = Y:

  • Martians (M): 3
  • Humans (H): 1
  • Total: 4

Entropy:

$E(\text{Green}=Y) = -\frac{3}{4}\log_2\left(\frac{3}{4}\right) - \frac{1}{4}\log_2\left(\frac{1}{4}\right)$

$= -0.75(-0.415) - 0.25(-2)$

$= 0.311 + 0.5 = 0.811$

For Green = N:

  • Martians (M): 2
  • Humans (H): 4
  • Total: 6

Entropy:

$E(\text{Green}=N) = -\frac{2}{6}\log_2\left(\frac{2}{6}\right) - \frac{4}{6}\log_2\left(\frac{4}{6}\right)$

$= -0.333(-1.585) - 0.667(-0.585)$

$= 0.528 + 0.390 = 0.918$
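
Both branch entropies can be verified with the entropy helper from Step 2:

```python
print(round(entropy(['M', 'M', 'M', 'H']), 3))            # Green = Y -> 0.811
print(round(entropy(['M', 'M', 'H', 'H', 'H', 'H']), 3))  # Green = N -> 0.918
```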

Weighted Entropy for Green

$E(\text{Green}) = \frac{4}{10}(0.811) + \frac{6}{10}(0.918)$

$= 0.3245 + 0.5510 = 0.8755$

Information Gain for Green

$IG(\text{Species}, \text{Green}) = E(\text{Species}) - E(\text{Green})$

$= 1.0 - 0.8755 = 0.1245$

Repeating this process for Legs, Height, and Smelly shows that Legs has by far the highest information gain, which is why it sits at the root of the tree in part a).
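
Rather than grinding through the remaining arithmetic by hand, the sketch below automates it, reusing entropy and DATA from the earlier snippets (information_gain is my own helper name, not from the original post):

```python
def information_gain(data, feature):
    """Information gain from splitting `data` ((features, label) pairs) on `feature`."""
    labels = [label for _, label in data]
    total = len(data)
    remainder = 0.0
    for value in {features[feature] for features, _ in data}:
        subset = [label for features, label in data if features[feature] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

for feature in ['Green', 'Legs', 'Height', 'Smelly']:
    print(feature, round(information_gain(DATA, feature), 4))
# Green 0.1245, Legs 0.3958, Height 0.0, Smelly 0.0
# Legs has the highest gain, so ID3 places it at the root.
```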

...