Question
SVM Result
Correctly Classified Instances 324 95.2941 %
Incorrectly Classified Instances 16 4.7059 %
Kappa statistic 0.937
Mean absolute error 0.2539
Root mean squared error 0.318
Relative absolute error 67.886 %
Root relative squared error 73.4756 %
Total Number of Instances 340
RBF Result
Correctly Classified Instances 315 92.6471 %
Incorrectly Classified Instances 25 7.3529 %
Kappa statistic 0.9019
Mean absolute error 0.0368
Root mean squared error 0.1917
Relative absolute error 9.8267 %
Root relative squared error 44.2853 %
Total Number of Instances 340
Hi, I built models in Weka using the Support Vector Machine and Radial Basis Function algorithms. When I compared them, I couldn't understand something: RBF's mean absolute error is lower than SVM's MAE, even though SVM's Correctly Classified Instances is higher than RBF's. If the number of correctly classified instances is higher, why is the error so much larger?
Explanation / Answer
The first reason is the influence of the C and gamma parameters.
Intuitively, the gamma parameter defines how far the influence of a single training example reaches, with low values meaning ‘far’ and high values meaning ‘close’. The gamma parameter can be seen as the inverse of the radius of influence of samples selected by the model as support vectors.
The C parameter trades off misclassification of training examples against simplicity of the decision surface. A low C makes the decision surface smooth, while a high C aims at classifying all training examples correctly by giving the model freedom to select more samples as support vectors.
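To make this concrete, here is a minimal sketch (my own illustration, assuming scikit-learn's SVC and a toy two-class dataset, not the asker's Weka data) that trains an RBF-kernel SVM for several C and gamma values and prints the number of support vectors together with train/test accuracy:

```python
# Sketch: how C and gamma change an RBF-kernel SVC on a toy dataset.
# Dataset and parameter values are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for C in (0.1, 1, 100):
    for gamma in (0.01, 1, 100):
        clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_train, y_train)
        print(f"C={C:<6} gamma={gamma:<6} "
              f"support vectors={clf.n_support_.sum():<4} "
              f"train acc={clf.score(X_train, y_train):.2f} "
              f"test acc={clf.score(X_test, y_test):.2f}")
```

With very large gamma you should typically see training accuracy near 1.0 while test accuracy drops, which is the overfitting behaviour described below.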
The first plot is a visualization of the decision function for a variety of parameter values on a simplified classification problem involving only 2 input features and 2 possible target classes (binary classification). Note that this kind of plot is not possible for problems with more features or target classes.
The second plot is a heatmap of the classifier’s cross-validation accuracy as a function of C and gamma. For this example we explore a relatively large grid for illustration purposes. In practice, a logarithmic grid from $10^{-3}$ to $10^{3}$ is usually sufficient. If the best parameters lie on the boundaries of the grid, it can be extended in that direction in a subsequent search.
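For reference, a logarithmic grid search like the one described here could look as follows in scikit-learn (the dataset, grid bounds, and cross-validation settings are illustrative assumptions):

```python
# Sketch: grid search over logarithmic C and gamma ranges for an RBF SVM.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

C_range = np.logspace(-3, 3, 7)      # logarithmic grid for C
gamma_range = np.logspace(-3, 3, 7)  # logarithmic grid for gamma
param_grid = {"C": C_range, "gamma": gamma_range}

# More CV splits give smoother (less noisy) cross-validation scores.
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
grid = GridSearchCV(SVC(kernel="rbf"), param_grid=param_grid, cv=cv)
grid.fit(X, y)

print("Best parameters:", grid.best_params_)
print("Best cross-validation score: %.3f" % grid.best_score_)
```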
Note that the heatmap plot has a special colorbar with a midpoint value close to the score values of the best performing models, so as to make it easy to tell them apart at a glance.
The behavior of the model is very sensitive to the gamma parameter. If gamma is too large, the radius of the area of influence of the support vectors only includes the support vector itself and no amount of regularization with C will be able to prevent overfitting.
When gamma is very small, the model is too constrained and cannot capture the complexity or “shape” of the data. The region of influence of any selected support vector would include the whole training set. The resulting model will behave similarly to a linear model with a set of hyperplanes that separate the centers of high density of any pair of two classes.
For intermediate values, we can see on the second plot that good models can be found on a diagonal of C and gamma. Smooth models (lower gamma values) can be made more complex by selecting a larger number of support vectors (larger C values), hence the diagonal of well-performing models.
Finally, one can also observe that for some intermediate values of gamma we get equally performing models when C becomes very large: it is not necessary to regularize by limiting the number of support vectors. The radius of the RBF kernel alone acts as a good structural regularizer. In practice, though, it might still be interesting to limit the number of support vectors with a lower value of C so as to favor models that use less memory and are faster to predict.
We should also note that small differences in scores result from the random splits of the cross-validation procedure. Those spurious variations can be smoothed out by increasing the number of CV iterations n_iter, at the expense of compute time. Increasing the number of steps in C_range and gamma_range will increase the resolution of the hyperparameter heatmap.
The second reason is that the RBF kernel implicitly maps every point to an infinite-dimensional space.
Question: Why does the RBF (radial basis function) kernel map into an infinite-dimensional space? Answer: Consider the polynomial kernel of degree 2 defined by
$$k(x, y) = (x^T y)^2$$
where $x, y \in \mathbb{R}^2$ and $x = (x_1, x_2)$, $y = (y_1, y_2)$.
Thereby, the kernel function can be written as,
$$k(x, y) = (x_1 y_1 + x_2 y_2)^2 = x_1^2 y_1^2 + 2 x_1 x_2 y_1 y_2 + x_2^2 y_2^2$$
Now, let us try to come up with a feature map $\phi$ such that the kernel function can be written as $k(x, y) = \phi(x)^T \phi(y)$.
Consider the following feature map,
$$\phi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$$
Basically, this feature map maps points in $\mathbb{R}^2$ to points in $\mathbb{R}^3$. Also, notice that
$$\phi(x)^T \phi(y) = x_1^2 y_1^2 + 2 x_1 x_2 y_1 y_2 + x_2^2 y_2^2$$
which is essentially our kernel function.
This means that our kernel function is actually computing the inner/dot product of points in $\mathbb{R}^3$. That is, it is implicitly mapping our points from $\mathbb{R}^2$ to $\mathbb{R}^3$.
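As a sanity check, here is a small numeric sketch (my own, using NumPy; the helper phi below is just the feature map defined above) verifying that the degree-2 polynomial kernel computed in $\mathbb{R}^2$ equals the dot product of the mapped vectors in $\mathbb{R}^3$:

```python
# Numeric check: (x^T y)^2 in R^2 equals phi(x)^T phi(y) in R^3.
import numpy as np

def phi(v):
    # feature map phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

k_direct = (x @ y) ** 2     # kernel computed directly in R^2
k_mapped = phi(x) @ phi(y)  # dot product computed in R^3

# Both values agree (up to floating-point rounding).
print(k_direct, k_mapped, np.isclose(k_direct, k_mapped))
```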
Exercise question: If your points are in $\mathbb{R}^n$, a polynomial kernel of degree 2 will implicitly map them to some vector space $F$. What is the dimension of this vector space $F$? Hint: everything I did above is a clue.
Now, coming to RBF.
Let us consider the RBF kernel again for points in $\mathbb{R}^2$. Then, the kernel can be written as
$$k(x, y) = \exp(-\|x - y\|^2) = \exp(-(x_1 - y_1)^2 - (x_2 - y_2)^2)$$
$$= \exp(-x_1^2 + 2 x_1 y_1 - y_1^2 - x_2^2 + 2 x_2 y_2 - y_2^2)$$
$$= \exp(-\|x\|^2) \exp(-\|y\|^2) \exp(2 x^T y)$$
(assuming $\gamma = 1$). Using the Taylor series of the exponential, you can write this as
$$k(x, y) = \exp(-\|x\|^2) \exp(-\|y\|^2) \sum_{n=0}^{\infty} \frac{(2 x^T y)^n}{n!}$$
Now, if we were to come up with a feature map just as we did for the polynomial kernel, we would realize that it would map every point in $\mathbb{R}^2$ to an infinite-dimensional vector. Thus, the RBF kernel implicitly maps every point to an infinite-dimensional space.
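To see this numerically, the sketch below (my own, with $\gamma = 1$ as in the derivation and arbitrarily chosen points) checks that a truncated version of the Taylor expansion above converges to the exact RBF kernel value:

```python
# Numeric check: truncated Taylor expansion of the RBF kernel (gamma = 1).
import math
import numpy as np

x = np.array([0.3, -0.5])
y = np.array([0.1, 0.4])

exact = np.exp(-np.sum((x - y) ** 2))        # exp(-||x - y||^2)

prefactor = np.exp(-x @ x) * np.exp(-y @ y)  # exp(-||x||^2) * exp(-||y||^2)
approx = prefactor * sum((2 * (x @ y)) ** n / math.factorial(n) for n in range(10))

print(exact, approx)  # the two values agree to many decimal places
```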