


Question

You have a decision tree algorithm and you are trying to figure out which attribute is the best to test on first. You are using the “information gain” metric. You are given a set of 128 examples, with 64 positively labeled and 64 negatively labeled. There are three attributes: Home Owner, In Debt, and Rich. For 64 examples, Home Owner is true. The Home Owner=true examples are 1/4 negative and 3/4 positive. For 96 examples, In Debt is true. Of the In Debt=true examples, 1/2 are positive and 1/2 are negative. For 32 examples, Rich is true. Of the Rich=true examples, 3/4 are positive and 1/4 are negative.

(1) What is the entropy of the initial set of examples?
(2) What is the information gain of splitting on the Home Owner attribute as the root node?
(3) What is the information gain of splitting on the In Debt attribute as the root node?
(4) What is the information gain of splitting on the Rich attribute as the root node?
(5) Which attribute do you split on?

Explanation / Answer

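The entropy of a distribution P with probabilities (p1, p2, p3, ..., pn) is defined as I(p1, p2, ..., pn) = -(p1*log2(p1) + p2*log2(p2) + ... + pn*log2(pn)). With base-2 logarithms the entropy is measured in bits.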

The mathematical definition of the information content of a signal state is the negative logarithm of the signal state probability:

I(p(s)) = -log2(p(s))

Information gain is the difference in entropy before asking the question and the (weighted) entropy after.
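
The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original answer: the function names entropy and information_gain and the (positives, negatives) count format are assumptions made for this example.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    # Terms with p == 0 are skipped: the limit of p*log2(p) as p -> 0 is 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(parent_counts, child_counts):
    """Entropy before the split minus the size-weighted entropy after it.

    parent_counts: (positives, negatives) before the split.
    child_counts: list of (positives, negatives), one tuple per branch.
    """
    def dist(counts):
        total = sum(counts)
        return [c / total for c in counts]

    n = sum(parent_counts)
    after = sum(sum(c) / n * entropy(dist(c)) for c in child_counts)
    return entropy(dist(parent_counts)) - after

print(entropy([0.5, 0.5]))  # 1.0
```

Passing counts rather than probabilities keeps the branch weights (branch size divided by parent size) inside the function, so they cannot drift out of sync with the class fractions.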

All logarithms below are base 2:

log2(1/4) = -2.0   log2(5/12) = -1.263

log2(1/2) = -1.0   log2(7/12) = -0.778

log2(3/4) = -0.415
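
The values in the table above are base-2 logarithms rounded to three decimal places. As a quick check, they can be reproduced with Python's math.log2; the variable names here are only illustrative.

```python
import math

fractions = [(1, 4), (1, 2), (3, 4), (5, 12), (7, 12)]
# Round to three decimals to match the precision used in the table.
table = {f"{n}/{d}": round(math.log2(n / d), 3) for n, d in fractions}

for label, value in table.items():
    print(f"log2({label}) = {value}")
```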

(a) Entropy = I(1/2,1/2)

    = -(0.5*(-1) + 0.5*(-1))

    = 1

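(b) Home Owner=true covers 64 examples (48 positive, 16 negative), so Home Owner=false covers the other 64 (16 positive, 48 negative). The weighted entropy after splitting on the Home Owner attribute as the root node

    = 0.5 * I(3/4,1/4) + 0.5 * I(1/4,3/4)

    = I(1/4,3/4)

    = 2*.25 + .415*.75

    = 0.81125

So the information gain is 1 - 0.81125 = 0.18875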
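
The Home Owner calculation can also be reproduced from the raw counts. The sketch below is illustrative only (the helper entropy and the variable names are not from the original answer). It derives the Home Owner=false counts by subtraction and keeps the logarithms unrounded, so the last digits differ slightly from the hand calculation, which rounds log2(3/4) to -0.415.

```python
import math

def entropy(pos, neg):
    """Entropy in bits of a two-class node given its class counts."""
    total = pos + neg
    return -sum(c / total * math.log2(c / total) for c in (pos, neg) if c > 0)

total_pos, total_neg = 64, 64

# Home Owner = true: 64 examples, 3/4 positive and 1/4 negative.
true_pos, true_neg = 48, 16
# Home Owner = false: whatever remains of each class (16 positive, 48 negative).
false_pos, false_neg = total_pos - true_pos, total_neg - true_neg

n = total_pos + total_neg
after = ((true_pos + true_neg) / n * entropy(true_pos, true_neg)
         + (false_pos + false_neg) / n * entropy(false_pos, false_neg))
gain = entropy(total_pos, total_neg) - after

print(f"entropy after split = {after:.4f}")  # 0.8113
print(f"information gain    = {gain:.4f}")   # 0.1887
```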

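(c) In Debt=true covers 96 examples (48 positive, 48 negative), so In Debt=false covers the other 32 (16 positive, 16 negative). The weighted entropy after splitting on the In Debt attribute as the root node

    = .75 * I(1/2,1/2) + .25 * I(1/2,1/2)

    = I(1/2,1/2)

    = 1

So the information gain is 1 - 1 = 0.0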

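(d) Rich=true covers 32 examples (24 positive, 8 negative), so Rich=false covers the other 96 (40 positive, 56 negative), that is, 5/12 positive and 7/12 negative. The weighted entropy after splitting on the Rich attribute as the root node

    = .25 * I(3/4,1/4) + .75 * I(5/12,7/12)

    = .25*0.81125 + .75*(.417*1.263 + .583*.778)

    = .203 + .735 = .938

So the information gain is 1 - .938 = .062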

(e) Home Owner, because it gives the largest information gain of the three attributes.
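
The final choice can be checked by computing the gain for all three attributes and taking the maximum. This is a hedged sketch: the helper names entropy and gain and the dictionary true_counts are made up for illustration, and the (positive, negative) counts are derived from the fractions given in the question.

```python
import math

def entropy(pos, neg):
    """Entropy in bits of a two-class node given its class counts."""
    total = pos + neg
    return -sum(c / total * math.log2(c / total) for c in (pos, neg) if c > 0)

def gain(total, true_branch):
    """Information gain of a boolean split, given (pos, neg) counts."""
    # The false branch holds whatever the true branch does not.
    false_branch = (total[0] - true_branch[0], total[1] - true_branch[1])
    n = sum(total)
    after = sum(sum(b) / n * entropy(*b) for b in (true_branch, false_branch))
    return entropy(*total) - after

total = (64, 64)
# (positive, negative) counts on the attribute=true branch.
true_counts = {
    "Home Owner": (48, 16),
    "In Debt": (48, 48),
    "Rich": (24, 8),
}
gains = {name: gain(total, counts) for name, counts in true_counts.items()}
best = max(gains, key=gains.get)
print(best)  # Home Owner
```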
