


Question

Run the following few cells, which plot each of the datasets in their low-dimensional coordinates found by the PCA algorithm.

In [12]: plot_lowdim(A_data, A_colors, "Dataset A")
[Figure: "Dataset A" scatterplot of the 2-D PCA coordinates]

In [13]: plot_lowdim(B_data, B_colors, "Dataset B")
[Figure: "Dataset B" scatterplot of the 2-D PCA coordinates]

In [14]: plot_lowdim(C_data, C_colors, "Dataset C")
[Figure 1: "Dataset C" scatterplot of the 2-D PCA coordinates]

How many dimensions are each of the datasets projected onto?

YOUR ANSWER HERE

For each dataset, explain whether or not PCA works well, and why.

YOUR ANSWER HERE
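
For reference, a minimal sketch of what a plot_lowdim-style helper might look like; the course's actual implementation is not shown, so the function body below is an assumption.

    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    def plot_lowdim(data, colors, title):
        # Hypothetical stand-in for the notebook's helper: project the data onto
        # its first two principal components and scatter-plot the result.
        low_dim = PCA(n_components=2).fit_transform(data)
        plt.scatter(low_dim[:, 0], low_dim[:, 1], c=colors)
        plt.title(title)
        plt.xlabel("PC 1")
        plt.ylabel("PC 2")
        plt.show()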

Explanation / Answer

1) Each dataset is projected onto 2 dimensions. That is what PCA is doing here: it takes the data from its original high-dimensional space down to a 2-dimensional plot along the first two principal components.
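
A quick way to confirm this, as a sketch using a synthetic stand-in array (the real datasets are not reproduced here):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))          # stand-in for one of the 10-dimensional datasets

    projected = PCA(n_components=2).fit_transform(X)
    print(projected.shape)                  # (200, 2): every point ends up with 2 coordinates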

2) PCA does not work well for Dataset C. Its classes overlap in the projected coordinates, so the different classes cannot be separated. The underlying data does not appear to follow a joint normal distribution.

For Dataset B, PCA works very well: all of the classes are clearly differentiated and form distinct clusters. In Dataset B, many of the points are outliers, and these are easy to identify in the PCA projection.

The first principal component is the straight line through the data that explains the most variance; that is, it follows the longest dimension of the data. The second principal component cuts through the data perpendicular to the first, fitting the variance (errors) left over by the first. The third component would fit the errors remaining after the first and second principal components, and so forth.
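
As a rough numerical illustration of that idea on synthetic data: centering the data and taking its SVD gives the principal directions in order of decreasing explained variance, and each direction is perpendicular to the previous ones.

    import numpy as np

    rng = np.random.default_rng(1)
    # Synthetic data whose spread shrinks from one axis to the next.
    X = rng.normal(size=(500, 3)) * np.array([5.0, 2.0, 0.5])

    Xc = X - X.mean(axis=0)                 # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

    explained_var = S**2 / (len(X) - 1)     # variance captured by each principal component
    print(explained_var)                    # decreasing: PC1 explains the most, then PC2, ...
    print(np.round(Vt @ Vt.T, 6))           # identity matrix: components are mutually perpendicular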

Looking at Dataset A, we can tell the variance of the underlying data across all classes is essentially zero: the projected values span only a tiny range (around 10^-3 on the vertical axis).
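
As a sketch of how that could be checked numerically (the real A_data array is not reproduced here, so a stand-in with one dominant direction is used):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(2)
    # Stand-in for Dataset A: points that barely vary at all beyond a single direction.
    A_data = np.column_stack([rng.normal(size=300), rng.normal(scale=1e-3, size=(300, 4))])

    low_dim_A = PCA(n_components=2).fit_transform(A_data)
    print(low_dim_A.var(axis=0))   # second value is tiny, matching the flat vertical spread in the plot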