Question

Q2.a. The 48 observations in a data set were partitioned by the 3-means clustering method, and the following information is available:

CLUSTER CENTERS

Original coordinates

Cluster      rating         Box office (2015 dollars)
Cluster 1    17.08333333    36.78625
Cluster 2    29.2           162.2546667
Cluster 3    63             50.37222222

Normalized coordinates

Cluster      rating          Box office (2015 dollars)
Cluster 1    -0.59356616     -0.647117648
Cluster 2    -0.013367708    1.297329632
Cluster 3    1.605122609     -0.436568991

DISTANCE BETWEEN THE CENTERS

Original coordinates

             Cluster 1      Cluster 2      Cluster 3
Cluster 1    0              126.0521209    47.88443295
Cluster 2    126.0521209    0              116.8765219
Cluster 3    47.88443295    116.8765219    0

DISTANCE BETWEEN THE CENTERS

Normalized coordinates

             Cluster 1      Cluster 2      Cluster 3
Cluster 1    0              2.029163736    2.208746939
Cluster 2    2.029163736    0              2.371901208
Cluster 3    2.208746939    2.371901208    0

DATA ANALYSIS

Original coordinates

Cluster      #Obs    Avg. Dist
Cluster 1    24      23.05251209
Cluster 2    15      32.08746692
Cluster 3    9       25.68957845
Overall      48      26.37038542

Normalized coordinates

Cluster      #Obs    Avg. Dist
Cluster 1    24      0.572324172
Cluster 2    15      0.744118115
Cluster 3    9       0.71100553
Overall      48      0.652012534

1. The most homogeneous cluster is:

a. Cluster 1

b. Cluster 2

c. Cluster 3

2. The most heterogeneous cluster is:

a. Cluster 1

b. Cluster 2

c. Cluster 3

3. The clusters closest to each other are:

a. Cluster 1 and 2

b. Cluster 1 and 3

c. Cluster 2 and 3

4. The clusters most distinct from each other are:

a. Cluster 1 and 2

b. Cluster 1 and 3

c. Cluster 2 and 3

Q2.b. A company specializes in the development of software that tracks the web browsing history of individuals. Using XLMiner, the top association rule was found, and the following information is available about this rule:

Confidence %    Antecedent (A)    Consequent (C)    Support for A    Support for C    Support for A and C    Lift Ratio
(not given)     CNN               Weatherchannel    1693             1837             867                    (not given)

1. The confidence of the top rule is

a. 75.00%

b. 65.25%

c. 51.21%

d. 49.55%

2. Assuming that the total number of transactions is 20000, the lift ratio of the top rule is

a. 5.57

b. 22.30

c. 2.79

d. 1.64

Explanation / Answer

1. The most homogeneous cluster is:
a. Cluster 1
b. Cluster 2
c. Cluster 3


Answer: a. Cluster 1. It has the smallest average distance to its cluster center, so its observations are the most tightly grouped:

Cluster      #Obs    Avg. Dist (normalized)
Cluster 1    24      0.572324172

2. The most heterogeneous cluster is:
a. Cluster 1
b. Cluster 2
c. Cluster 3

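Answer: b. Cluster 2. It has the largest average distance to its cluster center, so its observations are the most spread out:

Cluster      #Obs    Avg. Dist (normalized)
Cluster 2    15      0.744118115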
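
The comparison behind parts 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the values are copied from the normalized DATA ANALYSIS table in the question, and the names `avg_dist`, `most_homogeneous`, and `most_heterogeneous` are made up for this example.

```python
# Average distance to the cluster center (normalized coordinates),
# copied from the DATA ANALYSIS table in the question.
avg_dist = {
    "Cluster 1": 0.572324172,
    "Cluster 2": 0.744118115,
    "Cluster 3": 0.71100553,
}

# Smallest average distance: the tightest (most homogeneous) cluster.
most_homogeneous = min(avg_dist, key=avg_dist.get)

# Largest average distance: the most spread out (most heterogeneous) cluster.
most_heterogeneous = max(avg_dist, key=avg_dist.get)

print(most_homogeneous)    # Cluster 1
print(most_heterogeneous)  # Cluster 2
```

The same ranking of Cluster 1 as tightest and Cluster 2 as most spread out also holds for the average distances in original coordinates (23.05, 32.09, 25.69).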

3. The clusters closest to each other are:
a. Cluster 1 and 2
b. Cluster 1 and 3
c. Cluster 2 and 3


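Answer: a. Cluster 1 and Cluster 2.
In the normalized table of distances between the centers, the lowest value is 2.029163736, which is between Cluster 1 and Cluster 2. (In the original-coordinate table the lowest value is instead 47.88443295, between Cluster 1 and Cluster 3, so the answer depends on which table is used.)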

4. The clusters most distinct from each other are:
a. Cluster 1 and 2
b. Cluster 1 and 3
c. Cluster 2 and 3


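Answer: c. Cluster 2 and Cluster 3. In the normalized table, the distance between their centers is the highest, 2.371901208. (In the original-coordinate table the highest value is instead 126.0521209, between Cluster 1 and Cluster 2.)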
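
As a check on parts 3 and 4, the distances between centers can be recomputed from the CLUSTER CENTERS tables. The sketch below assumes plain Euclidean distance, which reproduces the tabulated values; the helper `pairwise` and the variable names are illustrative and not part of the original output.

```python
import math
from itertools import combinations

# Cluster centers copied from the question: (rating, box office).
centers_norm = {
    1: (-0.59356616, -0.647117648),
    2: (-0.013367708, 1.297329632),
    3: (1.605122609, -0.436568991),
}
centers_orig = {
    1: (17.08333333, 36.78625),
    2: (29.2, 162.2546667),
    3: (63.0, 50.37222222),
}

def pairwise(centers):
    """Euclidean distance between every pair of cluster centers."""
    return {
        (a, b): math.dist(centers[a], centers[b])
        for a, b in combinations(sorted(centers), 2)
    }

d_norm = pairwise(centers_norm)
d_orig = pairwise(centers_orig)

# Closest and farthest pairs in each coordinate system.
closest_norm = min(d_norm, key=d_norm.get)    # (1, 2)
farthest_norm = max(d_norm, key=d_norm.get)   # (2, 3)
closest_orig = min(d_orig, key=d_orig.get)    # (1, 3)
farthest_orig = max(d_orig, key=d_orig.get)   # (1, 2)

for pair, dist in d_norm.items():
    print(pair, round(dist, 6))
```

The two coordinate systems rank the pairs differently because box office values span a much wider numeric range than ratings and dominate the unnormalized distance. That is presumably why the answers above read from the normalized table.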

Please note that we can answer only 4 subparts of a question at a time, as per the answering guidelines.
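
Q2.b is not worked through in the answer above. As a minimal sketch, assuming the usual definitions (confidence = support of A and C divided by support of A; lift = confidence divided by the benchmark confidence, support of C over the total number of transactions), the two quantities can be computed from the counts in the question. The variable names are illustrative.

```python
# Counts copied from the Q2.b table; the total of 20000 is given in part 2.
support_a = 1693       # transactions containing the antecedent (CNN)
support_c = 1837       # transactions containing the consequent (Weatherchannel)
support_ac = 867       # transactions containing both
n_transactions = 20000

# Confidence: share of antecedent transactions that also contain the consequent.
confidence = support_ac / support_a

# Benchmark confidence: how often the consequent appears overall.
benchmark = support_c / n_transactions

# Lift: how much more likely the consequent is, given the antecedent.
lift = confidence / benchmark

print(f"confidence = {confidence:.2%}")  # confidence = 51.21%
print(f"lift = {lift:.3f}")              # lift = 5.575
```

Under these assumptions the confidence matches option c (51.21%), and the lift of about 5.575 is closest to option a (5.57).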