Question
Q2.a. The 48 observations in the data set were partitioned with the 3-means (k-means, k = 3) clustering method, and the following information is available:
CLUSTER CENTERS
Original coordinates
Cluster      Rating         Box office (2015 dollars)
Cluster 1    17.08333333    36.78625
Cluster 2    29.2           162.2546667
Cluster 3    63             50.37222222
Normalized coordinates
Cluster      Rating          Box office (2015 dollars)
Cluster 1    -0.59356616     -0.647117648
Cluster 2    -0.013367708    1.297329632
Cluster 3    1.605122609     -0.436568991
DISTANCE BETWEEN THE CENTERS
Original coordinates
             Cluster 1      Cluster 2      Cluster 3
Cluster 1    0              126.0521209    47.88443295
Cluster 2    126.0521209    0              116.8765219
Cluster 3    47.88443295    116.8765219    0
Normalized coordinates
             Cluster 1      Cluster 2      Cluster 3
Cluster 1    0              2.029163736    2.208746939
Cluster 2    2.029163736    0              2.371901208
Cluster 3    2.208746939    2.371901208    0
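One way to see where the distance tables come from is to recompute them from the cluster centers. The Python sketch below assumes (the question does not say so explicitly) that the distance between two centers is the ordinary Euclidean distance over the two variables, rating and box office; the helper name center_distances is illustrative only.

```python
import math

# Cluster centers copied from the CLUSTER CENTERS tables: (rating, box office)
original = {
    1: (17.08333333, 36.78625),
    2: (29.2, 162.2546667),
    3: (63.0, 50.37222222),
}
normalized = {
    1: (-0.59356616, -0.647117648),
    2: (-0.013367708, 1.297329632),
    3: (1.605122609, -0.436568991),
}


def center_distances(centers):
    """Return {(i, j): Euclidean distance} for every pair of centers with i < j."""
    ids = sorted(centers)
    return {
        (i, j): math.dist(centers[i], centers[j])  # math.dist needs Python 3.8+
        for k, i in enumerate(ids)
        for j in ids[k + 1:]
    }


for label, centers in (("Original", original), ("Normalized", normalized)):
    for (i, j), d in center_distances(centers).items():
        print(f"{label}: cluster {i} to cluster {j} = {d:.4f}")
```

With the centers copied from the tables, the results should match the reported distances to about four decimal places. Because Euclidean distance is symmetric, the entry for clusters 1 and 3 must equal the entry for clusters 3 and 1; in normalized coordinates the recomputed value is 2.2087.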
DATA ANALYSIS
Original coordinates
Cluster      # Obs.    Avg. distance
Cluster 1    24        23.05251209
Cluster 2    15        32.08746692
Cluster 3    9         25.68957845
Overall      48        26.37038542
Normalized coordinates
Cluster      # Obs.    Avg. distance
Cluster 1    24        0.572324172
Cluster 2    15        0.744118115
Cluster 3    9         0.71100553
Overall      48        0.652012534
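The Overall row appears to equal the average of the three cluster values weighted by each cluster's number of observations. This is an interpretation of the table, not something the question states; the sketch below (with a hypothetical helper, overall_average) tests it for both coordinate systems.

```python
# (number of observations, average distance) per cluster, copied from the DATA ANALYSIS tables
original = {1: (24, 23.05251209), 2: (15, 32.08746692), 3: (9, 25.68957845)}
normalized = {1: (24, 0.572324172), 2: (15, 0.744118115), 3: (9, 0.71100553)}


def overall_average(summary):
    """Observation-weighted mean of the per-cluster average distances."""
    total_obs = sum(n for n, _ in summary.values())
    return sum(n * avg for n, avg in summary.values()) / total_obs


print(f"Original:   {overall_average(original):.4f}")
print(f"Normalized: {overall_average(normalized):.4f}")
```

For the original coordinates this gives (24 × 23.0525 + 15 × 32.0875 + 9 × 25.6896) / 48 ≈ 26.3704, in line with the Overall row.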
1. The most homogeneous cluster is:
a. Cluster 1
b. Cluster 2
c. Cluster 3
2. The most heterogeneous cluster is:
a. Cluster 1
b. Cluster 2
c. Cluster 3
3. The two clusters closest to each other are:
a. Cluster 1 and 2
b. Cluster 1 and 3
c. Cluster 2 and 3
4. The two clusters most distinct from each other are:
a. Cluster 1 and 2
b. Cluster 1 and 3
c. Cluster 2 and 3
Q2.b. A company specializes in developing software that tracks individuals' web browsing history. Using XLMiner, the top association rule was found, and the following information is available about this rule:
Confidence %: ? (asked in question 1)
Antecedent (A): CNN
Consequent (C): Weatherchannel
Support for A: 1693
Support for C: 1837
Support for A and C: 867
Lift Ratio: ? (asked in question 2)
1. The confidence of the top rule is:
a. 75.00%
b. 65.25%
c. 51.21%
d. 49.55%
2. Assuming that the total number of transactions is 20000, the lift ratio of the top rule is:
a. 5.57
b. 22.30
c. 2.79
d. 1.64
DO QUESTION 2b
Explanation / Answer
Q2.b.1) Confidence
The confidence of a rule X --> Y is the number of transactions that contain both X and Y (the support of X ∪ Y) divided by the number of transactions that contain X.
So the confidence is 867/1693 = 0.5121 = 51.21%, hence (c).
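The same arithmetic as a small Python function. The function name confidence and its parameter names are illustrative; the counts are the ones given in the rule table (A = CNN, C = Weatherchannel).

```python
def confidence(support_a_and_c: int, support_a: int) -> float:
    """Confidence of rule A --> C: share of transactions containing A that also contain C."""
    return support_a_and_c / support_a


# Counts from the top-rule table
conf = confidence(867, 1693)
print(f"Confidence = {conf:.4f} ({conf:.2%})")
```

It prints Confidence = 0.5121 (51.21%).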
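Q2.b.2) Lift ratio
The lift of a rule X --> Y compares how often X and Y occur together with how often they would occur together if they were independent:
lift(X --> Y) = (sup(X ∪ Y)/N) / ((sup(X)/N) × (sup(Y)/N)),
where sup(·) is the number of transactions containing the itemset and N is the total number of transactions.
Substituting the values with N = 20000:
lift = (867/20000) / ((1693/20000) × (1837/20000)) = 0.04335 / (0.08465 × 0.09185) ≈ 5.575, which corresponds to option (a), 5.57; hence A.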
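A sketch of the lift calculation in Python, assuming the supports in the table are raw transaction counts and N = 20000 as stated in the question. The function name lift is illustrative.

```python
def lift(support_a_and_c: int, support_a: int, support_c: int, n_transactions: int) -> float:
    """Lift of rule A --> C: P(A and C) / (P(A) * P(C)), estimating each P as count / N."""
    p_ac = support_a_and_c / n_transactions
    p_a = support_a / n_transactions
    p_c = support_c / n_transactions
    return p_ac / (p_a * p_c)


ratio = lift(867, 1693, 1837, 20_000)
print(f"Lift ratio = {ratio:.4f}")
```

It prints Lift ratio = 5.5755. Dividing the confidence by the consequent's support proportion, (867/1693) / (1837/20000), gives the same value, since lift = confidence / P(C).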