Question

Data Mining and Knowledge Discovery in Databases

Given a database table containing weather data as follows:

Outlook    Temperature  Humidity  Windy  Class: Play
Sunny      Hot          High      False  No
Sunny      Hot          High      True   No
Overcast   Hot          High      False  Yes
Rainy      Mild         High      False  Yes
Rainy      Cool         Normal    False  Yes
Rainy      Cool         Normal    True   No
Overcast   Cool         Normal    True   Yes
Sunny      Mild         High      False  No
Sunny      Cool         Normal    False  Yes
Rainy      Mild         Normal    False  Yes
Sunny      Mild         Normal    True   Yes
Overcast   Mild         High      True   Yes
Overcast   Hot          Normal    False  Yes
Rainy      Mild         High      True   No

C. For the weather database table given in B, predict a class label for each unknown sample below using the naïve Bayesian classification approach (20 points).

The unknown samples to be classified are:

(Outlook = ‘Sunny’, Temperature = ‘Mild’, Humidity = ‘High’, Windy = ‘False’)

(Outlook = ‘Sunny’, Temperature = ‘Hot’, Humidity = ‘High’, Windy = ‘False’)

Explanation / Answer

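Naive Bayes scores each class as P(class) * P(outlook | class) * P(temperature | class) * P(humidity | class) * P(windy | class) and predicts the class with the larger score. The table has 14 rows: 9 with Play = Yes and 5 with Play = No. Below, n is the number of rows in the class and n_c is the number of those rows that also have the given attribute value.

Sample 1: (Outlook = ‘Sunny’, Temperature = ‘Mild’, Humidity = ‘High’, Windy = ‘False’)

Play = Yes
--------------
P(Yes) = 9/14 = 0.643

sunny
n = 9 (rows with Play = Yes)
n_c = 2 (sunny rows with Play = Yes)
P(sunny | Yes) = 2/9 = 0.222

temperature mild
n = 9
n_c = 4
P(mild | Yes) = 4/9 = 0.444

humidity high
n = 9
n_c = 3
P(high | Yes) = 3/9 = 0.333

windy false
n = 9
n_c = 6
P(false | Yes) = 6/9 = 0.667

Score(Yes) = (9/14) * (2/9) * (4/9) * (3/9) * (6/9) = 0.0141

Play = No
--------------
P(No) = 5/14 = 0.357

sunny
n = 5 (rows with Play = No)
n_c = 3
P(sunny | No) = 3/5 = 0.6

temperature mild
n = 5
n_c = 2
P(mild | No) = 2/5 = 0.4

humidity high
n = 5
n_c = 4
P(high | No) = 4/5 = 0.8

windy false
n = 5
n_c = 2
P(false | No) = 2/5 = 0.4

Score(No) = (5/14) * (3/5) * (2/5) * (4/5) * (2/5) = 0.0274

Since 0.0274 > 0.0141, the predicted class for (Outlook = ‘Sunny’, Temperature = ‘Mild’, Humidity = ‘High’, Windy = ‘False’) is Play = No.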
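
As a cross-check for the first sample, here is a minimal Python sketch that counts the frequencies directly from the weather table and multiplies them in the usual naive Bayes way, P(class) times the product of P(value | class). The names ROWS and class_scores are made up for this illustration, and no smoothing is applied, which is an assumption that works here only because none of the needed counts is zero.

```python
from collections import Counter

# Weather table from the question: (Outlook, Temperature, Humidity, Windy, Play)
ROWS = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]


def class_scores(sample):
    """Return P(class) * product of P(value | class) for each class label."""
    class_counts = Counter(row[-1] for row in ROWS)
    scores = {}
    for label, n in class_counts.items():
        score = n / len(ROWS)  # prior P(class)
        for i, value in enumerate(sample):
            # n_c: rows of this class whose i-th attribute equals the sample's value
            n_c = sum(1 for row in ROWS if row[-1] == label and row[i] == value)
            score *= n_c / n  # relative-frequency estimate of P(value | class)
        scores[label] = score
    return scores


scores = class_scores(("Sunny", "Mild", "High", "False"))
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

The class with the larger score is the prediction. The scores are not probabilities by themselves; dividing each by their sum would give the posterior for each class.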

---------------------------------------------------------------------------------------------------------------------------

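Sample 2: (Outlook = ‘Sunny’, Temperature = ‘Hot’, Humidity = ‘High’, Windy = ‘False’)

Here n is the number of rows in the class (9 for Play = Yes, 5 for Play = No, out of 14) and n_c is the number of those rows that also have the given attribute value. Each class is scored as P(class) times the product of P(value | class).

Play = Yes
--------------
P(Yes) = 9/14 = 0.643

sunny
n = 9
n_c = 2
P(sunny | Yes) = 2/9 = 0.222

temperature hot
n = 9
n_c = 2
P(hot | Yes) = 2/9 = 0.222

humidity high
n = 9
n_c = 3
P(high | Yes) = 3/9 = 0.333

windy false
n = 9
n_c = 6
P(false | Yes) = 6/9 = 0.667

Score(Yes) = (9/14) * (2/9) * (2/9) * (3/9) * (6/9) = 0.0071

Play = No
--------------
P(No) = 5/14 = 0.357

sunny
n = 5
n_c = 3
P(sunny | No) = 3/5 = 0.6

temperature hot
n = 5
n_c = 2
P(hot | No) = 2/5 = 0.4

humidity high
n = 5
n_c = 4
P(high | No) = 4/5 = 0.8

windy false
n = 5
n_c = 2
P(false | No) = 2/5 = 0.4

Score(No) = (5/14) * (3/5) * (2/5) * (4/5) * (2/5) = 0.0274

Since 0.0274 > 0.0071, the predicted class for (Outlook = ‘Sunny’, Temperature = ‘Hot’, Humidity = ‘High’, Windy = ‘False’) is Play = No.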
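
To compare the two samples on a common scale, the class scores can be normalized so they sum to 1. The sketch below does this with exact fractions, using counts read off the weather table (9 Yes rows, 5 No rows, and the per-class counts for Sunny, Mild, Hot, High, and False). The names PRIOR, COND, and posterior are hypothetical and exist only for this illustration.

```python
from fractions import Fraction as F

# Class priors: 9 "Yes" rows and 5 "No" rows out of 14.
PRIOR = {"Yes": F(9, 14), "No": F(5, 14)}

# P(value | class) for the attribute values that occur in the two samples,
# estimated as (rows in the class with that value) / (rows in the class).
COND = {
    "Yes": {"Sunny": F(2, 9), "Mild": F(4, 9), "Hot": F(2, 9),
            "High": F(3, 9), "False": F(6, 9)},
    "No": {"Sunny": F(3, 5), "Mild": F(2, 5), "Hot": F(2, 5),
           "High": F(4, 5), "False": F(2, 5)},
}


def posterior(sample):
    """Return normalized naive Bayes posteriors for one sample."""
    scores = {}
    for label, prior in PRIOR.items():
        score = prior
        for value in sample:
            score *= COND[label][value]
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


sample_1 = ("Sunny", "Mild", "High", "False")
sample_2 = ("Sunny", "Hot", "High", "False")
for sample in (sample_1, sample_2):
    post = posterior(sample)
    print(sample, {label: round(float(p), 3) for label, p in post.items()})
```

Under these counts, Play = No comes out at roughly 0.66 for the first sample and roughly 0.80 for the second, so both samples are classified as No.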