You have a decision tree algorithm and you are trying to figure out which attribu
Question
You have a decision tree algorithm and you are trying to figure out which attribute is the best to test on first. You are using the “information gain” metric. You are given a set of 128 examples, with 64 positively labeled and 64 negatively labeled. There are three attributes: Home Owner, In Debt, and Rich.
For 64 examples, Home Owner is true. The Home Owner=true examples are 1/4 negative and 3/4 positive.
For 96 examples, In Debt is true. Of the In Debt=true examples, 1/2 are positive and 1/2 are negative.
For 32 examples, Rich is true. Of the Rich=true examples, 3/4 are positive and 1/4 are negative.
(1) What is the entropy of the initial set of examples?
(2) What is the information gain of splitting on the Home Owner attribute as the root node?
(3) What is the information gain of splitting on the In Debt attribute as the root node?
(4) What is the information gain of splitting on the Rich attribute as the root node?
(5) Which attribute do you split on?
Explanation / Answer
The entropy of a distribution P with probabilities (p1, p2, ..., pn) is defined as
I(p1, ..., pn) = -(p1*log2(p1) + p2*log2(p2) + ... + pn*log2(pn))
The logarithm is taken base 2, so entropy is measured in bits.
The information content of a single signal state s is the negative logarithm of its probability:
I(p(s)) = -log2(p(s))
Information gain is the difference in entropy before asking the question and the (weighted) entropy after.
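The two definitions above can be sketched in a few lines of Python. This is only an illustration, not part of the original answer: the function names `entropy` and `information_gain` are hypothetical, and base-2 logarithms are assumed.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(parent_probs, branches):
    """Entropy before the split minus the weighted entropy after it.

    branches: list of (weight, probs) pairs, where weight is the fraction
    of examples that fall into that branch.
    """
    remainder = sum(w * entropy(probs) for w, probs in branches)
    return entropy(parent_probs) - remainder

# A fair coin carries one bit of entropy.
print(entropy([0.5, 0.5]))
# A split whose branches look just like the parent yields no gain.
print(information_gain([0.5, 0.5], [(0.5, [0.5, 0.5]), (0.5, [0.5, 0.5])]))
```

Passing each branch as a (weight, distribution) pair keeps the weighting explicit, which is the part most often forgotten when computing the entropy after a split by hand.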
Base-2 logarithms used below:
log2(1/4) = -2.0      log2(5/12) = -1.263
log2(1/2) = -1.0      log2(7/12) = -0.778
log2(3/4) = -0.415
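These values can be checked quickly. The sketch below assumes base-2 logarithms, which is the base that reproduces the numbers listed.

```python
import math

# Recompute the tabulated logarithms in base 2, rounded to three decimals.
for num, den in [(1, 4), (1, 2), (3, 4), (5, 12), (7, 12)]:
    print(f"log2({num}/{den}) = {math.log2(num / den):.3f}")
```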
(a) Entropy = I(1/2,1/2)
= - ( 0.5*-1 + 0.5*-1 )
= 1
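(b) Information gain of splitting on the Home Owner attribute as the root node
Home Owner=true covers 64 of the 128 examples (48 positive, 16 negative), so Home Owner=false covers the other 64 (16 positive, 48 negative). The weighted entropy after the split is
= 0.5 * I(1/4,3/4) + 0.5 * I(3/4,1/4)
= I(1/4,3/4)
= 2*.25 + .415*.75
= 0.81125
So the information gain is 1 - 0.81125 = 0.18875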
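The question only describes the Home Owner=true branch, so the false branch has to be derived by subtracting from the totals. A minimal sketch of that step, assuming the counts implied by the problem statement (the helper name `entropy_from_counts` is hypothetical):

```python
import math

def entropy_from_counts(pos, neg):
    """Entropy in bits of a branch holding pos positive and neg negative examples."""
    total = pos + neg
    return -sum(c / total * math.log2(c / total) for c in (pos, neg) if c > 0)

total_pos, total_neg = 64, 64        # the full example set
true_pos, true_neg = 48, 16          # Home Owner=true: 3/4 positive, 1/4 negative
false_pos = total_pos - true_pos     # positives left over for Home Owner=false
false_neg = total_neg - true_neg     # negatives left over for Home Owner=false

n = total_pos + total_neg
remainder = ((true_pos + true_neg) / n * entropy_from_counts(true_pos, true_neg)
             + (false_pos + false_neg) / n * entropy_from_counts(false_pos, false_neg))
gain = entropy_from_counts(total_pos, total_neg) - remainder
print(f"remainder = {remainder:.4f}, gain = {gain:.4f}")
```

At full precision the gain is about 0.1887; the hand calculation lands on 0.18875 because it uses log2(3/4) rounded to -0.415.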
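(c) Information gain of splitting on the In Debt attribute as the root node
In Debt=true covers 96 of the 128 examples (48 positive, 48 negative), so In Debt=false covers the other 32 (16 positive, 16 negative). The weighted entropy after the split is
= .75 * I(1/2,1/2) + .25 * I(1/2,1/2)
= I(1/2,1/2)
= 1
So the information gain is 1 - 1 = 0.0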
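(d) Information gain of splitting on the Rich attribute as the root node
Rich=true covers 32 of the 128 examples (24 positive, 8 negative), so Rich=false covers the other 96 (40 positive, 56 negative), i.e. 5/12 positive and 7/12 negative. The weighted entropy after the split is
= .25*I(1/4,3/4) + .75*I(5/12, 7/12)
= .25*0.81125 + .75*(.417*1.263 + .583*.778)
= .203 + .735 = .938
So the information gain is 1 - .938 = .062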
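(e) Home Owner, since it has the largest information gain (0.18875, versus 0.0 for In Debt and about 0.06 for Rich).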
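The whole comparison can be reproduced with a short script. This is a sketch under stated assumptions: the true-branch counts are derived from the fractions in the question, and the names `entropy_from_counts` and `gain` are hypothetical helpers rather than any library's API.

```python
import math

def entropy_from_counts(pos, neg):
    """Entropy in bits of a branch holding pos positive and neg negative examples."""
    total = pos + neg
    return -sum(c / total * math.log2(c / total) for c in (pos, neg) if c > 0)

def gain(true_pos, true_neg, total_pos=64, total_neg=64):
    """Information gain of a boolean attribute, given its true-branch counts."""
    n = total_pos + total_neg
    # The false branch is whatever remains after removing the true branch.
    branches = [(true_pos, true_neg),
                (total_pos - true_pos, total_neg - true_neg)]
    remainder = sum((p + q) / n * entropy_from_counts(p, q) for p, q in branches)
    return entropy_from_counts(total_pos, total_neg) - remainder

# True-branch counts (positive, negative) taken from the problem statement.
attributes = {
    "Home Owner": (48, 16),   # 64 examples, 3/4 positive
    "In Debt": (48, 48),      # 96 examples, 1/2 positive
    "Rich": (24, 8),          # 32 examples, 3/4 positive
}
gains = {name: gain(*counts) for name, counts in attributes.items()}
for name, g in gains.items():
    print(f"{name}: {g:.4f}")
best = max(gains, key=gains.get)
print("split on:", best)
```

Computed at full precision, the gains come out to roughly 0.189, 0 and 0.062 respectively; hand arithmetic with rounded logarithms can differ in the third decimal place, but the ranking is unaffected.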