Question
2. In this problem we will perform principal component analysis on the sepal and petal measurements of the first 50 flowers in the Iris data, i.e., iris[1:50,1:4].
(a) Center the data matrix and denote the centered data matrix by X. Find the covariance matrix Sx of X.
(b) Report the eigenvectors of Sx and the variance of X along each eigenvector direction.
(c) Calculate the principal components of the first two flowers.
(d) Make a scatter plot of the first two principal components of the data. Do you see any correlation between the two principal components?
(e) If we wish to keep at least 85% of the total variance, how many principal components do we need to keep?
Explanation / Answer
data("iris")
dim(iris)
new_iris<-iris[1:50,1:4]
dim(new_iris)
View(new_iris)
##a###
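# center each column by subtracting its mean, using 'colMeans()'
centerColMean <- function(x) {
  xcenter <- colMeans(x)
  # repeat each column mean nrow(x) times so it lines up column by column
  x - rep(xcenter, rep.int(nrow(x), ncol(x)))
}
# apply it
X <- centerColMean(new_iris)
View(X)
# Covariance matrix of the centered data
Sx <- cov(X)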
View(Sx)
#____________________________________________________________
##b##
X<-as.matrix(X)
Sx<-as.matrix(Sx)
X_eigen<-eigen(Sx)
X_eigen
# eigenvectors of Sx (one per column)
X_eigen$vectors
# The eigenvector with the largest eigenvalue is the direction along which the
# data set has the maximum variance. For a covariance matrix Sx, the variance of
# X along each eigenvector direction is the corresponding eigenvalue.
X_eigen$values
#______________________________________________________________________________
##c##
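# Fit PCA on all 50 flowers, then read off the principal component
# scores (projections onto the principal directions) of the first two
pca <- prcomp(X)
pca
pca$x[1:2, ]
plot(pca)  # scree plot of the component variances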
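Parts (d) and (e) are not covered by the code above. The following is a minimal sketch of one way to approach them, assuming `prcomp()` is run on the same 50 rows with its default settings (columns centered, not rescaled). The names `Xc`, `pca_fit`, `scores`, `pc_cor`, `var_explained` and `k` are illustrative and not part of the original answer.

```r
# Sketch for parts (d) and (e); all variable names here are illustrative.
data("iris")
Xc <- scale(as.matrix(iris[1:50, 1:4]), center = TRUE, scale = FALSE)
pca_fit <- prcomp(Xc)       # default scale. = FALSE, so variances stay in covariance units
scores <- pca_fit$x         # 50 x 4 matrix of principal component scores

## (d) scatter plot of the first two principal components
plot(scores[, 1], scores[, 2], xlab = "PC1", ylab = "PC2",
     main = "First two principal components")
pc_cor <- cor(scores[, 1], scores[, 2])
pc_cor                      # zero up to rounding error

## (e) smallest number of components keeping at least 85% of the total variance
var_explained <- cumsum(pca_fit$sdev^2) / sum(pca_fit$sdev^2)
var_explained               # cumulative proportion of variance
k <- which(var_explained >= 0.85)[1]
k
```

The scatter plot should show no linear trend, because principal component scores are uncorrelated by construction. `k` is the number of components to report for (e).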