Question
T/F Questions. Justify your answer.
(a) In case of PCA, the first principal component is the direction of maximum variability of the data.
(b) Output of a clustering algorithm depends heavily on initial cluster center assignments.
(c) Outliers have no effect on k-means clustering results.
(d) In case of hierarchical clustering, the number of clusters required needs to be specified up front.
(e) PCA requires the label information to find the principal component directions.
(f) K-means clustering allows us to see clusters at different levels of granularity/resolution.
(g) Dimension reduction is very similar to feature extraction.
(h) Linear dimension reduction corresponds to finding a good projection matrix.
(i) PCA is essentially a change of co-ordinate system.
(j) Linear Discriminant Analysis is an unsupervised dimension reduction technique.
(k) Dimension reduction, when viewed as matrix factorization, helps to reveal latent aspects
of data.
(l) Kernel PCA is a supervised learning technique.
(m) Kernel PCA assumes data is "centered" in the feature space. If data is not centered, kernel PCA cannot be applied at all.
(n) A Gram matrix is a negative definite matrix.
(o) Any kernel function must satisfy Mercer's condition.
Explanation / Answer
o) Any kernel function must satisfy Mercer's condition
In 1995, Cortes and Vapnik published the first paper on SVM [1]. SVM is actually a linear classifier; that is, it classifies between two sets of points by constructing a line that separates these two classes. This is fine as long as the two sets of points can actually be separated by a line, but what if that is not possible?
Cortes and Vapnik instead did the following: they mapped all their points through a non-linear function and then used SVM in this transformed space. The observation was that if the non-linear map sends the two sets of points to a space where they can be separated by a line, then SVM can be used in this transformed space instead of the original space (see the sketch below). But how do we find a non-linear map that would help separate the two sets of points?
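A minimal sketch of this idea, assuming NumPy and scikit-learn are available (the dataset and the hand-picked feature map are illustrative choices of mine, not from the original answer):

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import LinearSVC

# Two concentric circles: not separable by a line in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A hand-picked non-linear map phi(x1, x2) = (x1, x2, x1^2 + x2^2).
# In the lifted 3-D space the inner and outer circles differ in the third
# coordinate, so a plane (i.e. a linear classifier) can separate them.
Z = np.c_[X, (X ** 2).sum(axis=1)]

print(LinearSVC(max_iter=10000).fit(X, y).score(X, y))  # poor in 2-D
print(LinearSVC(max_iter=10000).fit(Z, y).score(Z, y))  # near 1.0 after the map
```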
Let Φ : X → F be the non-linear map described in the earlier paragraph, where X is the space the input points come from and F is the transformed (feature) space. The next observation made by the authors was that, for SVM to work, they do not need to know Φ explicitly; they only need to know ⟨Φ(x_i), Φ(x_j)⟩, that is, the dot product of the transformed points. So, instead of working with Φ, they can work with a function K : X × X → R, where K takes two points as input and returns a real value representing ⟨Φ(x_i), Φ(x_j)⟩. However, can any arbitrary choice of K guarantee the existence of such a Φ? [1] cites papers showing that Φ exists only when K is positive definite.
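A small sketch of the kernel-versus-explicit-map equivalence for the degree-2 homogeneous polynomial kernel on R^2 (the function names phi and K below are mine, chosen to mirror the notation above):

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 homogeneous polynomial kernel
    on R^2: phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def K(x, y):
    """Kernel evaluated directly in the input space: K(x, y) = (x . y)^2."""
    return np.dot(x, y) ** 2

xi = np.array([1.0, 2.0])
xj = np.array([3.0, -1.0])

# The kernel value equals the dot product of the mapped points, so an
# algorithm that only needs <phi(xi), phi(xj)> never has to compute phi.
print(K(xi, xj), np.dot(phi(xi), phi(xj)))  # both print 1.0
```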
Thus K being positive definite (that is, satisfying Mercer's condition) guarantees the existence of an underlying map Φ such that K(x_i, x_j) = ⟨Φ(x_i), Φ(x_j)⟩. This allows us to select kernels whose underlying map Φ could even be infinite-dimensional (example: Gaussian kernels). This still does not guarantee separation of the points, which is why we have soft-margin SVMs. But being able to run SVM in an infinite-dimensional space is still quite powerful, as it covers a vast range of function classes over the input space X.
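One observable consequence of Mercer's condition, sketched below under my own assumptions (random data, a gaussian_gram helper I named for this example): a valid kernel always produces a positive semi-definite Gram matrix, so its eigenvalues are non-negative up to floating-point round-off. This also answers (n): a Gram matrix is not negative definite.

```python
import numpy as np

def gaussian_gram(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2) for the Gaussian
    (RBF) kernel, whose underlying feature map is infinite-dimensional."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))       # 50 random points in R^5
K = gaussian_gram(X)

# Mercer's condition implies K is positive semi-definite:
# every eigenvalue is non-negative up to numerical tolerance.
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() >= -1e-10)     # True
```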