Question
We will build a naïve Bayes classifier based on the below weather dataset in order to determine whether to play golf or not. There are four categorical attributes (outlook, temperature, humidity, windy) and one binary target (play).
outlook    temperature  humidity  windy  play
sunny      hot          high      false  no
sunny      hot          high      true   no
overcast   hot          high      false  yes
rainy      mild         high      false  yes
rainy      cool         normal    false  yes
rainy      cool         normal    true   no
overcast   cool         normal    true   yes
sunny      mild         high      false  no
sunny      cool         normal    false  yes
rainy      mild         normal    false  yes
sunny      mild         normal    true   yes
overcast   mild         high      true   yes
overcast   hot          normal    false  yes
rainy      mild         high      true   no
a) For a day when it is sunny, hot, windy with high humidity, please use naïve Bayes to predict if we should play golf or not.
b) Does the prediction agree with the classification provided in the training data set?
c) For a day that is overcast, hot, and not windy with normal humidity, apply Laplace smoothing to predict whether one should play golf.
d) Using the same training dataset (as in Problem 6) for golf playing, determine which attribute, OUTLOOK or HUMIDITY, is the better first-level split when constructing a decision tree with the Gini index.
Explanation / Answer
Answer a)
The attribute values are:
outlook: sunny, overcast, rainy
temperature: hot, mild, cool
humidity: high, normal
windy: true, false
Class priors (9 of the 14 days are "yes" and 5 are "no"):
P(yes) = 9/14 , P(no) = 5/14
Conditional probabilities, estimated as relative frequencies within each class:
For the 1st attribute, outlook
P(sunny | yes) = 2/9 , P(overcast | yes) = 4/9 , P(rainy | yes) = 3/9
P(sunny | no) = 3/5 , P(overcast | no) = 0/5 , P(rainy | no) = 2/5
For the 2nd attribute, temperature
P(hot | yes) = 2/9 , P(mild | yes) = 4/9 , P(cool | yes) = 3/9
P(hot | no) = 2/5 , P(mild | no) = 2/5 , P(cool | no) = 1/5
For the 3rd attribute, humidity
P(high | yes) = 3/9 , P(normal | yes) = 6/9
P(high | no) = 4/5 , P(normal | no) = 1/5
For the 4th attribute, windy
P(true | yes) = 3/9 , P(false | yes) = 6/9
P(true | no) = 3/5 , P(false | no) = 2/5
Now consider the day that is sunny, hot, highly humid and windy, say D = <sunny, hot, high, true>.
P(D | yes) × P(yes) = P(sunny | yes) × P(hot | yes) × P(high | yes) × P(true | yes) × P(yes)
= 2/9 × 2/9 × 3/9 × 3/9 × 9/14 ≈ 0.003527
P(D | no) × P(no) = P(sunny | no) × P(hot | no) × P(high | no) × P(true | no) × P(no)
= 3/5 × 2/5 × 4/5 × 3/5 × 5/14 ≈ 0.041143
Since P(D | no) × P(no) > P(D | yes) × P(yes), the classifier predicts "no": we should not play golf.
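The hand calculation for part a) can be checked with a short script. This is a minimal sketch, not a reference implementation: the names DATA and naive_bayes_scores are made up for illustration, and the estimates are plain relative frequencies with no smoothing, as in the answer above.

```python
from collections import Counter
from fractions import Fraction

# Weather dataset from the question: (outlook, temperature, humidity, windy, play).
DATA = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def naive_bayes_scores(rows, query):
    """Return {class: P(query | class) * P(class)} from relative frequencies."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, n_label in class_counts.items():
        score = Fraction(n_label, len(rows))  # prior P(class)
        for i, value in enumerate(query):
            matches = sum(
                1 for row in rows if row[-1] == label and row[i] == value
            )
            score *= Fraction(matches, n_label)  # P(value | class)
        scores[label] = score
    return scores


# Query day from part a): sunny, hot, high humidity, windy.
scores = naive_bayes_scores(DATA, ("sunny", "hot", "high", "true"))
prediction = max(scores, key=scores.get)

for label, score in scores.items():
    print(label, score, float(score))
print("prediction:", prediction)
```

Exact fractions are used so the result can be compared directly with the hand-computed products: 2/567 (about 0.003527) for "yes" and 36/875 (about 0.041143) for "no".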
Answer b)
Yes, the prediction agrees with the training data set. The instance (sunny, hot, high, true) appears there as the second record with play = no, which matches the predicted class.
Answer c)
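No worked answer is given for this part. One possible way to approach it is sketched below, under stated assumptions: add-one (Laplace) smoothing is applied to each conditional probability as (count + 1) / (class count + number of distinct values of the attribute), and the class priors are left unsmoothed. Some textbooks smooth the priors as well, so the exact numbers depend on that convention. The names DATA and laplace_scores are made up for illustration.

```python
from collections import Counter
from fractions import Fraction

# Weather dataset from the question: (outlook, temperature, humidity, windy, play).
DATA = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def laplace_scores(rows, query, alpha=1):
    """Return {class: smoothed P(query | class) * P(class)}.

    Assumption: only the conditional probabilities are smoothed;
    the class prior is the plain relative frequency.
    """
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, n_label in class_counts.items():
        score = Fraction(n_label, len(rows))  # unsmoothed prior P(class)
        for i, value in enumerate(query):
            k = len({row[i] for row in rows})  # distinct values of attribute i
            matches = sum(
                1 for row in rows if row[-1] == label and row[i] == value
            )
            # Add-alpha estimate of P(value | class).
            score *= Fraction(matches + alpha, n_label + alpha * k)
        scores[label] = score
    return scores


# Query day from part c): overcast, hot, normal humidity, not windy.
smoothed = laplace_scores(DATA, ("overcast", "hot", "normal", "false"))
smoothed_prediction = max(smoothed, key=smoothed.get)

for label, score in smoothed.items():
    print(label, score, float(score))
print("prediction:", smoothed_prediction)
```

Without smoothing, P(overcast | no) = 0/5 forces the "no" score to zero. With add-one smoothing under the assumptions above, the scores are 105/3872 (about 0.0271) for "yes" and 45/21952 (about 0.0020) for "no", so this sketch predicts "yes".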