Question

Josephine Mater works for a market research firm that specializes in the food industry. She currently is analyzing Trader Joe's, a national chain of specialty grocery stores. Specifically, Josephine would like to gain insight on Trader Joe's future expansion plans (which are closely guarded by the company). Josephine knows that Trader Joe's replenishes its inventory at its retail stores with frequent trucking shipments from its distribution centers. The file TraderJoes contains data on the location of Trader Joe's retail stores. To keep costs low, retail stores are typically located near a distribution center. Josephine would like to use k-means clustering to estimate the location and number of Trader Joe's distribution centers (information on Trader Joe's distribution centers is not publicly disclosed).

How large must k be so that the average distance to each cluster centroid is less than 8 distance units as measured in the original (nonnormalized) coordinates? Be sure to Normalize Input Data and specify 50 iterations and 10 random starts in Step 2 of the XLMiner k-Means Clustering procedure.

Explanation / Answer

We cannot provide solutions using paid software such as XLMiner. Instead, we provide a solution using the open-source language R. The complete R script is as follows:

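# read the store locations into an R data frame
# (forward slashes avoid R treating "\U", "\D", "\C" in the path as escape sequences)
data.df <- read.csv("C:/Users/586645/Downloads/Chegg/store.csv", header = TRUE)
str(data.df)

# drop the variables not used in clustering (columns 1 and 4), keeping Longitude and Latitude
data.df <- data.df[, -c(1, 4)]
# normalise the data (z-scores: subtract each column's mean, divide by its standard deviation)
data.df <- scale(data.df)

# perform k-means with 5 clusters, 10 random starts, and at most 50 iterations
set.seed(20)
storeCluster <- kmeans(data.df, 5, nstart = 10, iter.max = 50)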
storeCluster
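To make the nstart = 10 and iter.max = 50 settings concrete, here is a minimal sketch of Lloyd-style k-means in R. Each random start picks k observations as initial centroids, then alternates between assigning every point to its nearest centroid and moving each centroid to the mean of its points, for at most iter_max rounds. The start with the smallest total within-cluster sum of squares is kept. The helper names lloyd_kmeans and sq_dist are hypothetical, and this is an illustration only: R's kmeans() uses the Hartigan-Wong algorithm by default, and XLMiner's internal algorithm is not described in the question.

```r
# Squared Euclidean distance from every row of x to every centroid (n x k matrix)
sq_dist <- function(x, centers) {
  matrix(sapply(seq_len(nrow(centers)),
                function(j) colSums((t(x) - centers[j, ])^2)),
         nrow = nrow(x))
}

# Hypothetical Lloyd-style k-means with an iteration cap and several random starts
lloyd_kmeans <- function(x, k, iter_max = 50, n_start = 10) {
  x <- as.matrix(x)
  best <- NULL
  for (s in seq_len(n_start)) {
    # Random start: k distinct observations become the initial centroids
    centers <- x[sample(nrow(x), k), , drop = FALSE]
    for (it in seq_len(iter_max)) {
      # Assignment step: each point joins its nearest centroid
      cluster <- max.col(-sq_dist(x, centers), ties.method = "first")
      # Update step: each centroid moves to the mean of its points
      # (a centroid that loses all its points stays where it was)
      new_centers <- centers
      for (j in seq_len(k)) {
        if (any(cluster == j)) {
          new_centers[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
        }
      }
      converged <- all(abs(new_centers - centers) < 1e-10)
      centers <- new_centers
      if (converged) break
    }
    # Score this start by its total within-cluster sum of squares
    d2 <- sq_dist(x, centers)
    cluster <- max.col(-d2, ties.method = "first")
    wss <- sum(d2[cbind(seq_len(nrow(x)), cluster)])
    # Keep the best start seen so far
    if (is.null(best) || wss < best$tot.withinss) {
      best <- list(cluster = cluster, centers = centers, tot.withinss = wss)
    }
  }
  best
}

# Usage on toy data: two well-separated blobs of 20 points each, normalized first
set.seed(1)
pts <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 10), ncol = 2))
fit <- lloyd_kmeans(scale(pts), k = 2, iter_max = 50, n_start = 10)
table(fit$cluster)
```

Running several starts guards against a single unlucky initialization; the iteration cap simply bounds how long each start may run before it is scored.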

The output is:

> storeCluster
K-means clustering with 5 clusters of sizes 26, 111, 1, 152, 44

Cluster means:
Longitude Latitude
1 -0.9930847 1.4708951
2 1.0065202 0.6230539
3 1.4276869 -11.3209316
4 -0.8009789 -0.3053598
5 0.7822173 -1.1287872

Clustering vector:
[1] 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 4 1 4 4 4 4 1 4 1 4 1 4 4 1 1 4 4 4 4 4 1 4 1 4 4
[41] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[81] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[121] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 4 4 4 4 4 4 4 4
[161] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 2 5 2 4 5 5 5 5 5 2 2 2
[201] 2 2 2 2 5 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 5 2 2 2 2 2 2 2 2 5 2 5 5 5 5
[241] 2 2 2 2 2 2 2 2 5 5 5 5 5 5 2 5 2 5 5 5 5 5 5 5 5 5 5 5 2 5 2 2 2 2 2 2 2 2 2 2
[281] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[321] 2 2 3 2 2 2 2 2 2 2 2 2 2 5

Within cluster sum of squares by cluster:
[1] 1.791132 22.998673 0.000000 21.732823 57.163643
(between_SS / total_SS = 84.4 %)
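The between_SS / total_SS figure can be checked by hand from the printed within-cluster sums of squares. Because scale() leaves each column with a sum of squared deviations equal to n - 1, the total sum of squares for the two normalized coordinates of the 334 stores (the printed cluster sizes add up to 334) is 2 * 333 = 666, assuming every store has both coordinates. A short arithmetic sketch:

```r
# Within-cluster sums of squares copied from the k = 5 output
wss_k5 <- c(1.791132, 22.998673, 0.000000, 21.732823, 57.163643)
n <- 26 + 111 + 1 + 152 + 44   # cluster sizes from the output: 334 stores
p <- 2                          # normalized Longitude and Latitude
totss <- p * (n - 1)            # each z-scored column contributes n - 1
ratio <- 1 - sum(wss_k5) / totss
round(100 * ratio, 1)           # 84.4, matching the printed between_SS / total_SS
```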

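The goodness of classification (between_SS / total_SS) is 84.4%; however, if we increase the number of clusters to 10, this value rises to 96.6%. Hence we take the number of clusters to be 10 in this case. Note that this ratio is computed on the normalized data, so it is not the same criterion as the question's requirement that the average distance to each cluster centroid be less than 8 units in the original (nonnormalized) coordinates.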
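To apply the question's own criterion, one can loop over k, cluster the normalized coordinates with 10 random starts and at most 50 iterations, and measure each store's Euclidean distance to its cluster centroid in the original coordinates, stopping at the first k whose average falls below 8. This is a sketch under stated assumptions: the helper smallest_k is hypothetical, "average distance" is taken as the mean over all stores (XLMiner's report may define it differently, for example per cluster), and because the TraderJoes file is not reproduced here, the demonstration uses synthetic points and does not give the answer for the real data.

```r
# Hypothetical helper: first k whose mean point-to-centroid distance,
# measured in the ORIGINAL (nonnormalized) coordinates, is below `threshold`.
# Assumption: "average distance" = mean Euclidean distance over all points.
smallest_k <- function(coords, threshold = 8, k_max = 30, seed = 20) {
  coords <- as.matrix(coords)
  z <- scale(coords)                          # cluster on normalized inputs
  for (k in seq_len(k_max)) {
    set.seed(seed)                            # reproducible random starts for each k
    fit <- kmeans(z, centers = k, nstart = 10, iter.max = 50)
    # Centroids in original units: cluster means of the original coordinates
    sizes <- as.vector(table(fit$cluster))
    centers <- rowsum(coords, fit$cluster) / sizes
    # Distance from each point to its own centroid, in original units
    d <- sqrt(rowSums((coords - centers[fit$cluster, , drop = FALSE])^2))
    # For a per-cluster reading, use max(tapply(d, fit$cluster, mean)) instead
    avg_dist <- mean(d)
    if (avg_dist < threshold) return(list(k = k, avg_dist = avg_dist))
  }
  NULL                                        # no k up to k_max met the threshold
}

# Usage on synthetic points (NOT the TraderJoes data): three loose groups
set.seed(1)
demo <- rbind(cbind(rnorm(50, -120, 3), rnorm(50, 36, 3)),
              cbind(rnorm(50,  -90, 3), rnorm(50, 40, 3)),
              cbind(rnorm(50,  -75, 3), rnorm(50, 41, 3)))
res <- smallest_k(demo, threshold = 8)
res
```

Computing the centroids as cluster means of the original coordinates gives the same points as un-normalizing the kmeans() centroids, because normalization is an affine transformation of each column.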

> storeCluster <- kmeans(data.df, 10, nstart = 10,iter=50)
> storeCluster
K-means clustering with 10 clusters of sizes 28, 52, 17, 1, 1, 47, 29, 34, 99, 26

Cluster means:
Longitude Latitude
1 0.6729979 -1.3901477
2 -0.9205086 0.1011377
3 0.5521286 -0.5513041
4 7.2748687 -1.7256028
5 1.4276869 -11.3209316
6 1.3233326 0.6866643
7 0.9795524 0.2768490
8 0.5967975 0.8546012
9 -0.7488970 -0.5203826
10 -0.9930847 1.4708951

Clustering vector:
[1] 10 10 10 10 10 10 10 10 2 10 10 10 10 10 10 2 10 2 2 2 2 10 2 10 2 10
[27] 2 2 10 10 2 2 2 2 2 10 2 10 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[53] 2 2 2 2 2 2 2 2 10 2 2 2 9 9 9 9 9 2 2 2 2 9 2 9 9 9
[79] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
[105] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
[131] 9 9 9 9 10 10 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 10 2 2 2 9
[157] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 2 2 2 1 1 1 1 1
[183] 3 3 3 3 3 3 8 3 8 3 1 1 1 1 1 8 8 8 8 8 8 8 1 8 7 7
[209] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 3 8 7 7 7 7 8 8 7
[235] 3 7 3 3 1 3 7 7 3 8 8 7 8 7 1 1 1 1 3 1 8 1 6 1 1 1
[261] 3 1 1 1 1 1 1 1 7 3 7 7 6 7 6 6 7 7 7 7 7 7 7 7 7 7
[287] 7 7 6 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
[313] 6 6 6 6 6 6 6 6 6 6 5 6 6 6 6 6 6 6 6 6 6 4

Within cluster sum of squares by cluster:
[1] 5.534844 2.997878 2.388979 0.000000 0.000000 1.586215 2.107288 2.662288 3.409009
[10] 1.791132
(between_SS / total_SS = 96.6 %)
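The cluster means printed above are in normalized units. Since the goal is to estimate where the distribution centers are, it helps to map the centroids back to longitude and latitude. scale() stores the column means and standard deviations it used in the "scaled:center" and "scaled:scale" attributes, so each normalized centroid converts back as z * sd + mean. A minimal sketch on a few made-up coordinates (hypothetical values, not the TraderJoes data):

```r
# Made-up store coordinates (hypothetical, for illustration only)
orig <- cbind(Longitude = c(-122.4, -122.3, -118.2, -118.3, -73.9, -74.0),
              Latitude  = c(  37.8,   37.7,   34.0,   34.1,  40.7,  40.8))
z <- scale(orig)                                # normalize: (x - mean) / sd per column
set.seed(20)
fit <- kmeans(z, centers = 3, nstart = 10, iter.max = 50)

# Undo the normalization on the centroids: multiply by sd, then add back the mean
ctr <- attr(z, "scaled:center")
sds <- attr(z, "scaled:scale")
centers_orig <- sweep(sweep(fit$centers, 2, sds, "*"), 2, ctr, "+")
centers_orig
```

Applied to the answer's own objects, the same idea would read sweep(sweep(storeCluster$centers, 2, attr(data.df, "scaled:scale"), "*"), 2, attr(data.df, "scaled:center"), "+"), since data.df holds the output of scale(). Distances measured from these unscaled centroids are in the original coordinate units, which is what the question's 8-unit threshold refers to.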