
NASA wants to be able to discriminate between Martians (M) and Humans (H) based on the
following characteristics: Green ∈ {N, Y}, Legs ∈ {2, 3}, Height ∈ {S, T}, Smelly ∈ {N, Y}.
Our available training data is as follows (the ten-example table is reproduced in Step 1 of the answer below):

a) Greedily learn a decision tree using the ID3 algorithm and draw the tree.
b) Write the learned concept for Martian as a set of conjunctive rules (e.g., if (green=Y
and legs=2 and height=T and smelly=N), then Martian; else if ... then Martian; ...; else
Human).


1 Answer

Best answer

a) See the following figure for the ID3 decision tree:
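The original figure is not reproduced in this copy; a plain-text sketch of the learned tree, reconstructed from the rules given in part (b), is:

```
Legs?
├── 3 → Martian
└── 2 → Green?
    ├── Y → Height?
    │   ├── T → Martian
    │   └── S → Human
    └── N → Height?
        ├── T → Human
        └── S → Smelly?
            ├── Y → Martian
            └── N → Human
```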

b) Only the disjunction of conjunctions for Martians was required:

$\begin{aligned}
&(\text{Legs} = 3)\ \vee \\
&(\text{Legs} = 2 \wedge \text{Green} = \text{Yes} \wedge \text{Height} = \text{Tall})\ \vee \\
&(\text{Legs} = 2 \wedge \text{Green} = \text{No} \wedge \text{Height} = \text{Short} \wedge \text{Smelly} = \text{Yes})
\end{aligned}$
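These rules translate directly into code. A minimal sketch (the function name `classify` and the string encodings are my assumptions, not part of the original answer):

```python
def classify(green: str, legs: int, height: str, smelly: str) -> str:
    """Apply the learned concept; returns "M" (Martian) or "H" (Human).

    Encodings follow the question: green/smelly in {"N", "Y"},
    legs in {2, 3}, height in {"S", "T"}.
    """
    if legs == 3:
        return "M"  # (Legs = 3)
    if green == "Y" and height == "T":
        return "M"  # (Legs = 2 and Green = Yes and Height = Tall)
    if green == "N" and height == "S" and smelly == "Y":
        return "M"  # (Legs = 2 and Green = No and Height = Short and Smelly = Yes)
    return "H"      # anything else is Human
```

Applied to the ten training rows below, this function reproduces every Species label.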

Python Code

Step 1: Organize the Dataset

Our data has the following features and values:

  • Species: Target variable (M or H)
  • Features:
    • Green: \( N \) or \( Y \)
    • Legs: \( 2 \) or \( 3 \)
    • Height: \( S \) or \( T \)
    • Smelly: \( N \) or \( Y \)
| Index | Species | Green | Legs | Height | Smelly |
|-------|---------|-------|------|--------|--------|
| 1     | M       | N     | 3    | S      | Y      |
| 2     | M       | Y     | 2    | T      | N      |
| 3     | M       | Y     | 3    | T      | N      |
| 4     | M       | N     | 2    | S      | Y      |
| 5     | M       | Y     | 3    | T      | N      |
| 6     | H       | N     | 2    | T      | Y      |
| 7     | H       | N     | 2    | S      | N      |
| 8     | H       | N     | 2    | T      | N      |
| 9     | H       | Y     | 2    | S      | N      |
| 10    | H       | N     | 2    | T      | Y      |
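A sketch of the same table in Python, as a plain list of dicts (the variable name `data` is mine; the later snippets reuse it):

```python
# Training data from the table above; one dict per example.
data = [
    {"Species": "M", "Green": "N", "Legs": 3, "Height": "S", "Smelly": "Y"},
    {"Species": "M", "Green": "Y", "Legs": 2, "Height": "T", "Smelly": "N"},
    {"Species": "M", "Green": "Y", "Legs": 3, "Height": "T", "Smelly": "N"},
    {"Species": "M", "Green": "N", "Legs": 2, "Height": "S", "Smelly": "Y"},
    {"Species": "M", "Green": "Y", "Legs": 3, "Height": "T", "Smelly": "N"},
    {"Species": "H", "Green": "N", "Legs": 2, "Height": "T", "Smelly": "Y"},
    {"Species": "H", "Green": "N", "Legs": 2, "Height": "S", "Smelly": "N"},
    {"Species": "H", "Green": "N", "Legs": 2, "Height": "T", "Smelly": "N"},
    {"Species": "H", "Green": "Y", "Legs": 2, "Height": "S", "Smelly": "N"},
    {"Species": "H", "Green": "N", "Legs": 2, "Height": "T", "Smelly": "Y"},
]
```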

Step 2: Calculate the Initial Entropy for the Target Variable (Species)

We start by calculating the entropy of the target variable, Species, which has two classes: M (Martian) and H (Human).

Total Counts

  • Martians (M): 5
  • Humans (H): 5
  • Total: 10

Entropy Formula

The entropy \( E \) for a binary classification is calculated as:

$$ E = -p_+ \log_2(p_+) - p_- \log_2(p_-) $$

Where:

  • \( p_+ \): Probability of positive class (M)
  • \( p_- \): Probability of negative class (H)

Calculation

$$ p(M) = \frac{5}{10} = 0.5 $$

$$ p(H) = \frac{5}{10} = 0.5 $$

$$ E(Species) = -0.5 \cdot \log_2(0.5) - 0.5 \cdot \log_2(0.5) $$

$$ = -0.5 \cdot (-1) - 0.5 \cdot (-1) $$

$$ = 1.0 $$
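The same computation in Python, reusing the `data` list from Step 1 (the helper name `entropy` is an assumption for these sketches):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

print(entropy([row["Species"] for row in data]))  # 1.0 (5 M vs. 5 H)
```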

Step 3: Calculate Entropy and Information Gain for Each Feature

We’ll calculate the entropy for each feature split and determine the information gain.

Feature: Green

Green can be either Y or N.

For Green = Y:

  • Martians (M): 3
  • Humans (H): 1
  • Total: 4

Entropy:

$$ E(Green = Y) = -\left(\frac{3}{4}\right) \log_2\left(\frac{3}{4}\right) - \left(\frac{1}{4}\right) \log_2\left(\frac{1}{4}\right) $$

$$ = -0.75 \cdot \log_2(0.75) - 0.25 \cdot \log_2(0.25) $$

$$ = -0.75 \cdot (-0.415) - 0.25 \cdot (-2) $$

$$ = 0.311 + 0.5 = 0.811 $$

For Green = N:

  • Martians (M): 2
  • Humans (H): 4
  • Total: 6

Entropy:

$$ E(Green = N) = -\left(\frac{2}{6}\right) \log_2\left(\frac{2}{6}\right) - \left(\frac{4}{6}\right) \log_2\left(\frac{4}{6}\right) $$

$$ = -0.333 \cdot \log_2(0.333) - 0.667 \cdot \log_2(0.667) $$

$$ = -0.333 \cdot (-1.585) - 0.667 \cdot (-0.585) $$

$$ = 0.528 + 0.390 = 0.918 $$

Weighted Entropy for Green

$$ E(Green) = \frac{4}{10} \cdot 0.811 + \frac{6}{10} \cdot 0.918 $$

$$ = 0.3245 + 0.5510 = 0.8755 $$

Information Gain for Green

$$ IG(Species, Green) = E(Species) - E(Green) $$

$$ = 1.0 - 0.8755 = 0.1245 $$

Continuing this process for the remaining features gives IG(Species, Legs) ≈ 0.396 and IG(Species, Height) = IG(Species, Smelly) = 0. Legs therefore has the highest information gain and becomes the root of the tree, as the sketch below confirms.
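Putting it together in Python, building on the `data` and `entropy` definitions above (`information_gain` is my naming, not a library function):

```python
def information_gain(rows, feature, target="Species"):
    """E(target) minus the weighted entropy after splitting on `feature`."""
    gain = entropy([row[target] for row in rows])
    total = len(rows)
    for value in {row[feature] for row in rows}:
        subset = [row[target] for row in rows if row[feature] == value]
        gain -= len(subset) / total * entropy(subset)
    return gain

for feature in ("Green", "Legs", "Height", "Smelly"):
    print(feature, round(information_gain(data, feature), 4))
# Green 0.1245, Legs 0.3958, Height 0.0, Smelly 0.0:
# Legs wins, so ID3 places it at the root, matching the tree in part (a).
```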

