Question
11. Below we see a set of 20 points and a decision tree for classifying the points.

[Figure: scatter plot of the 20 (Age, Salary) points together with the decision tree used to classify them.]

To be precise, the 20 points represent (Age, Salary) pairs of people who do or do not buy House. Age is the x-axis and Salary is the y-axis. Those that do buy are represented by green points, and those that do not by gold points.

The 10 points of House buyers are: (23,40), (25,120), (29,97), (33,22), (35,63), (42,52), (44,40), (55,63), (55,30), and (64,37).

The 10 points of those that do not buy House are: (28,145), (38,115), (43,83), (51,130), (50,90), (50,60), (49,30), (55,118), (63,88), and (65,140).

Some of these points are correctly classified by the decision tree and some are not. Determine the classification of each point, and then indicate which of the points in the list below is misclassified.

a. (63, 88)
b. (55, 63)
c. (29, 97)
d. (49, 30)

12. Consider the process of building a decision-tree-based classifier using entropy as the measure of impurity of a tree node that represents a subset of the training examples. A node is split into partitions, represented by its child nodes, based on the values of a selected attribute. The goodness of the attribute for the split, referred to as the information gain of the attribute, is the difference between the impurity of the parent node and the weighted sum of the impurities of the child nodes.

Explanation / Answer
(11)
As per the given decision tree:

Condition 1: buys House if (Age < 40 and Salary >= 100) or (Age >= 40 and Salary >= 50).
Condition 2: does not buy House if (Age < 40 and Salary < 100) or (Age >= 40 and Salary < 50).

Point a. (63, 88) falls under Condition 1, so it is predicted to buy; in fact it does not buy.
Point b. (55, 63) falls under Condition 1, so it is predicted to buy; in fact it buys.
Point c. (29, 97) falls under Condition 2, so it is predicted not to buy; in fact it buys.
Point d. (49, 30) falls under Condition 2, so it is predicted not to buy; in fact it does not buy.

Hence, Point a. (63, 88) and Point c. (29, 97) are misclassified.
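As a sanity check, the two conditions and the four candidate points can be encoded directly. This is a minimal sketch: the decision rule is the one read off in the explanation above, and the "actual" labels come from the buyer/non-buyer lists in the question.

```python
def predicts_buy(age, salary):
    # Decision rule as stated in the explanation:
    # buy if (Age < 40 and Salary >= 100) or (Age >= 40 and Salary >= 50)
    return salary >= 100 if age < 40 else salary >= 50

# The four candidate points, with their actual class from the lists in the question
options = {"a": ((63, 88), False),   # actual: does not buy
           "b": ((55, 63), True),    # actual: buys
           "c": ((29, 97), True),    # actual: buys
           "d": ((49, 30), False)}   # actual: does not buy

for letter, (point, actually_buys) in options.items():
    predicted = predicts_buy(*point)
    status = "correct" if predicted == actually_buys else "MISCLASSIFIED"
    print(letter, point, status)
```

Running this flags exactly points a and c, agreeing with the conclusion above.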
(12)
A1            A2        C1    C2
Big Data      Male       2    10
Database      Male       5    40
Data Mining   Male       7    35
Text Mining   Male      10     4
Big Data      Female    20    16
Database      Female     8     5
Data Mining   Female    28    29
Text Mining   Female    20    20
As per the given data and the definition of information gain:

Option d: Splitting on either attribute, A1 or A2, reduces the impurity existing in the given training set at least to some extent; that is, both splits have positive information gain.
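The answer can be verified numerically. The sketch below computes the entropy of the full training set and the information gain of splitting on A1 or A2, assuming (as the table suggests) that C1 and C2 are the counts of the two classes in each (A1, A2) cell.

```python
from math import log2

def entropy(counts):
    """Entropy of a class-count distribution, in bits."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# (A1, A2, C1, C2) rows from the table above
rows = [
    ("Big Data",    "Male",    2, 10),
    ("Database",    "Male",    5, 40),
    ("Data Mining", "Male",    7, 35),
    ("Text Mining", "Male",   10,  4),
    ("Big Data",    "Female", 20, 16),
    ("Database",    "Female",  8,  5),
    ("Data Mining", "Female", 28, 29),
    ("Text Mining", "Female", 20, 20),
]

def info_gain(attr_index):
    """Information gain of splitting on attribute attr_index (0 = A1, 1 = A2)."""
    total_c1 = sum(r[2] for r in rows)
    total_c2 = sum(r[3] for r in rows)
    n = total_c1 + total_c2
    parent = entropy([total_c1, total_c2])
    # Aggregate class counts per attribute value (one child node per value)
    children = {}
    for r in rows:
        c1, c2 = children.get(r[attr_index], (0, 0))
        children[r[attr_index]] = (c1 + r[2], c2 + r[3])
    weighted = sum((c1 + c2) / n * entropy([c1, c2])
                   for c1, c2 in children.values())
    return parent - weighted

print(f"Gain(A1) = {info_gain(0):.4f}")
print(f"Gain(A2) = {info_gain(1):.4f}")
```

Both gains come out positive (roughly 0.04 bits for A1 and 0.07 bits for A2 against a parent entropy of about 0.96 bits), confirming that either split reduces the impurity at least to some extent.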