
Use the first part of the code in order to perform cross-validation using k-fold


Question

Use the first part of the code in order to perform cross-validation using k-fold (k = 10). Call the variables for your cross-validation result, misclassification rate (loss), and accuracy, CVSVMModel1, classLoss1, and validationAccuracy1, respectively.
You should get a misclassification rate of ~9% (the exact value will change every time you run this block of code), which is equivalent to an accuracy of ~91%.

Code that I have written for the first part:

Partition the dataset into two groups: 80% for training and cross-validation, and 20% (43 observations) for testing, so:


holdoutCVP = cvpartition(grp,'holdout',43);   % hold out 43 observations (~20%) for testing
dataTrain = obs(training(holdoutCVP),:);      % features for training/cross-validation (~80%)
grpTrain = grp(training(holdoutCVP));         % labels for training/cross-validation
dataTest = obs(test(holdoutCVP),:);           % held-out test features
grpTest = grp(test(holdoutCVP));              % held-out test labels

Code needed:

Cross-validate the SVM classifier using 10-fold cross-validation (I already have the trained SVM).

% Perform cross-validation
ENTER YOUR CODE HERE!!!

% Estimate the out-of-sample misclassification rate.
ENTER YOUR CODE HERE!!!

% Compute validation accuracy
ENTER YOUR CODE HERE!!!

Explanation / Answer
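
A minimal sketch of the requested code, assuming the SVM trained earlier is stored in a variable named SVMModel (that name is an assumption; substitute whatever you called your fitcsvm model):

% Perform cross-validation (k = 10)
CVSVMModel1 = crossval(SVMModel,'KFold',10);

% Estimate the out-of-sample misclassification rate
classLoss1 = kfoldLoss(CVSVMModel1);

% Compute validation accuracy
validationAccuracy1 = 1 - classLoss1;

kfoldLoss returns the misclassification rate averaged over the 10 folds, so one minus that loss is the validation accuracy (~91% here).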

The mean squared error for fold k is obtained by fitting the model to the K − 1 parts that do not involve part k and applying that fit to the held-out part. That gives us the prediction ŷ_i for each observation i in part k.

We then add up the squared errors (y_i − ŷ_i)² over those held-out observations to get the fold's MSE.

We combine the fold errors with a weighted average, using weights n_k/n, because the folds might not all be exactly the same size. If we are lucky and n divides by K exactly, then each weight is just 1/K.
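
In symbols (standard K-fold notation, with C_k the set of indices in fold k):

\mathrm{CV}_{(K)} = \sum_{k=1}^{K} \frac{n_k}{n}\,\mathrm{MSE}_k, \qquad \mathrm{MSE}_k = \frac{1}{n_k} \sum_{i \in C_k} \left(y_i - \hat{y}_i\right)^2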

Since this cross-validation error is just an average, the standard error of that average also gives us a standard error of the cross-validation estimate:

We take the error rates from each of the folds.

Their average is the cross-validation error rate.

The standard error of that average, i.e. the standard deviation of the fold error rates divided by √K, measures the uncertainty of the cross-validation estimate.
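
Written out (a standard formula, with Err̄ denoting the mean of the K fold error rates):

\widehat{\mathrm{SE}}\left(\mathrm{CV}_{(K)}\right) = \frac{1}{\sqrt{K}} \sqrt{\frac{1}{K-1} \sum_{k=1}^{K} \left(\mathrm{Err}_k - \overline{\mathrm{Err}}\right)^2}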

This standard error is useful but not strictly valid, because we are computing it as if the fold errors were independent observations, and they are not: Err_k overlaps with Err_j because they share some training samples, so there is some correlation between them. It is still worth using, though, because in practice it turns out to be quite a good estimate.

Leave-one-out cross-validation (LOOCV) is the special case of K-fold cross-validation where the number of folds equals the number of observations (i.e., K = n).

There is one fold per observation, so each observation by itself gets to play the role of the validation set, while the other n − 1 observations play the role of the training set.
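
In symbols (with ŷ_(−i) the prediction for observation i from the model fit without that observation):

\mathrm{CV}_{(n)} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_{(-i)}\right)^2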

With simple least-squares linear or polynomial regression, an amazing shortcut makes the computational cost of LOOCV the same as that of a single model fit. LOOCV is a nice special case in the sense that the cross-validation estimate can be computed without having to refit the model at all: you fit once on the full data set and adjust each residual by that observation's leverage.
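
The shortcut is the well-known leverage identity (here ŷ_i is the fit from the full data set and h_i is the leverage of observation i):

\mathrm{CV}_{(n)} = \frac{1}{n} \sum_{i=1}^{n} \left(\frac{y_i - \hat{y}_i}{1 - h_i}\right)^2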

For most statistical learning methods, K = 5 or K = 10 tends to be a good choice, rather than leave-one-out cross-validation.

In leave-one-out cross-validation, the n folds look very similar to each other because the training sets are almost identical: they differ by only one observation, so the fold estimates are highly correlated. Since LOOCV estimates the error rate for a training sample of almost the same size as the one you have, it has low bias but high variance.

We get a curve whose minimum is around 2, and it is pretty flat after that.

A 10-fold cross-validation also shows the minimum around 2, but with less variability than a two-fold validation. The fold estimates are more consistent because they are averaged together to give the overall cross-validation estimate.

So K = 5 or K = 10 folds is a good compromise for this bias-variance trade-off, as the sketch below illustrates.
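
As a rough MATLAB sketch of the two strategies (again assuming the hypothetical SVMModel variable from above; only the partition option changes):

% 10-fold cross-validation: a good bias-variance compromise
CVTen = crossval(SVMModel,'KFold',10);
lossTen = kfoldLoss(CVTen);

% Leave-one-out cross-validation: one fold per observation, much slower
CVLoo = crossval(SVMModel,'Leaveout','on');
lossLoo = kfoldLoss(CVLoo);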
