Question

Use SVM (you may use any SVM software such as libsvm http://www.csie.ntu.edu.tw/~cjlin/libsvm/) to perform a binary classification task, and show final classification accuracy by using 5-fold cross-validation. Please show your steps clearly. Please download the data called processed.cleveland.data at http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/. You may use MS-Word to open this file. The first 13 columns are inputs, and the final column is class (either Healthy (0) or Sick (1, 2, 3, 4)). So you need to pre-process the data to make a training file for binary classification (the final column value will be either Healthy (+1) or Sick (-1), i.e., 1, 2, 3 and 4 are converted into Sick (-1)). More Information about the data: http://archive.ics.uci.edu/ml/datasets/Heart+Disease Please submit a report with detailed steps, software, and pre-processed data.

Explanation / Answer

The preprocessed input file is attached.

It is in the format that libsvm requires: one line per sample, consisting of the label followed by index:value pairs. You can use checkdata.py (in libsvm's tools directory) to verify it.

Usage : python checkdata.py preprocessed-cleveland-data
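
For the report, the conversion from the raw UCI file to libsvm format can be sketched as below. This is a minimal sketch, not the script that produced the attached file. The function name to_libsvm_line is hypothetical, and the handling of missing values is an assumption: the raw file marks them with "?", and here such a feature is simply left out of the line (libsvm's sparse format treats an absent index as zero). Dropping those rows entirely is another option.

```python
def to_libsvm_line(csv_line):
    """Convert one comma-separated Cleveland row into a libsvm-format line.

    Columns 1-13 are features; column 14 is the class (0 = Healthy, 1-4 = Sick).
    """
    fields = [f.strip() for f in csv_line.strip().split(",")]
    if len(fields) != 14:
        raise ValueError(f"expected 14 columns, got {len(fields)}")

    # Healthy (0) becomes +1; Sick (1, 2, 3, 4) becomes -1.
    label = "+1" if int(float(fields[13])) == 0 else "-1"

    parts = [label]
    for index, value in enumerate(fields[:13], start=1):
        if value == "?":
            continue  # assumption: a missing value is omitted from the line
        parts.append(f"{index}:{float(value):g}")
    return " ".join(parts)


row = "63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0"
print(to_libsvm_line(row))
```

Applying the function to every non-empty line of processed.cleveland.data and writing the results to a new file gives a file that checkdata.py can check.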

Use subset.py to split the data file into a training file and a testing file.

Usage : python subset.py preprocessed-cleveland-data 200 train test

This puts 200 rows in the file train and the remaining 103 rows in the file test.
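
If you want to describe the split in your report, the idea is roughly the following. This is a minimal sketch of a plain random 200/103 split with a fixed seed; it does not attempt to reproduce subset.py's own selection logic, and split_rows is a hypothetical helper name.

```python
import random


def split_rows(rows, train_size, seed=0):
    """Shuffle the rows reproducibly and cut them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]


# Stand-in for the 303 lines of the preprocessed file.
rows = [f"row-{i}" for i in range(303)]
train, test = split_rows(rows, 200)
print(len(train), len(test))
```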

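Use the `make` command to build the training and prediction programs (UNIX environment), as described in the README.

Then scale your data so that every feature lies between -1 and +1. The -s option saves the ranges found in the training file, and -r applies those same ranges to the testing file so that both files are scaled identically.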

> svm-scale -l -1 -u 1 -s range train > train.scale
> svm-scale -r range test > test.scale
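
The scaling step is a per-feature linear map, which can be sketched as follows. This is an illustration of the idea only, not svm-scale's implementation; fit_ranges and scale_rows are hypothetical names, and mapping a constant feature to 0 is a choice made for this sketch.

```python
def fit_ranges(rows):
    """Return the (min, max) of every feature column over the training rows."""
    return [(min(column), max(column)) for column in zip(*rows)]


def scale_rows(rows, ranges, lower=-1.0, upper=1.0):
    """Map each feature linearly so the training min/max land on lower/upper."""
    scaled = []
    for row in rows:
        out = []
        for x, (lo, hi) in zip(row, ranges):
            if hi == lo:
                out.append(0.0)  # constant feature: this sketch maps it to 0
            else:
                out.append(lower + (upper - lower) * (x - lo) / (hi - lo))
        scaled.append(out)
    return scaled


train_rows = [[0.0, 10.0], [10.0, 20.0]]
test_rows = [[5.0, 25.0]]

ranges = fit_ranges(train_rows)               # learned from training data only
train_scaled = scale_rows(train_rows, ranges)
test_scaled = scale_rows(test_rows, ranges)   # same ranges reused for the test data
print(train_scaled, test_scaled)
```

Because the ranges come from the training data alone, a test value outside the training range ends up outside [-1, +1]. That is expected and is the price of scaling both files the same way.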

Choose the kernel and the other svm-train options according to your requirements.
Do NOT forget to use -v 5 for 5-fold cross-validation.

Use svm-predict to obtain the classification result on the testing file.

Example :
> svm-train -s 0 -c 100 -g 0.1 -v 5 train.scale

This runs five-fold cross-validation for the classifier with the parameters C = 100 and gamma = 0.1, and prints the cross-validation accuracy.
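
To explain in the report what the cross-validation accuracy means, the procedure can be sketched like this. It is a generic k-fold loop under stated assumptions: the fold assignment is a simple shuffled split and may differ from libsvm's own, and a toy 1-D nearest-centroid classifier stands in for the SVM so the example runs without libsvm. All function names are hypothetical.

```python
import random


def cross_val_accuracy(samples, labels, fit, predict, k=5, seed=0):
    """Return the fraction of samples predicted correctly while held out."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds

    correct = 0
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in indices if i not in held]
        # Train on the other k-1 folds, then predict the held-out fold.
        model = fit([samples[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct += sum(predict(model, samples[i]) == labels[i]
                       for i in held_out)
    return correct / len(samples)


# Toy stand-in for an SVM: a 1-D nearest-centroid classifier.
def fit_centroids(xs, ys):
    centroids = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = sum(members) / len(members)
    return centroids


def predict_centroid(centroids, x):
    return min(centroids, key=lambda label: abs(x - centroids[label]))


xs = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
ys = [+1] * 5 + [-1] * 5
accuracy = cross_val_accuracy(xs, ys, fit_centroids, predict_centroid)
print(f"5-fold cross-validation accuracy: {accuracy:.0%}")
```

Every sample is predicted exactly once, by a model that never saw it during training, and the reported accuracy is the share of those predictions that were correct.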

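In cross-validation mode svm-train does not write a model file, so train once more without -v before predicting. The -b 1 option of svm-predict also needs a model that was trained with -b 1.

> svm-train -s 0 -c 100 -g 0.1 -b 1 train.scale
> svm-predict -b 1 test.scale train.scale.model result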

Message me if you have any doubts.