


Question

Data Mining: Classification

Consider the dataset of printed letters and their features as given at the UCI Machine Learning repository site ( http://archive.ics.uci.edu/ml/datasets/Letter+Recognition ). This dataset has 20000 instances and 16 attributes. Perform the following tasks and submit the results/answers for each of the following tasks. With each answer state the toolbox / program that you use for getting the answer (preferably MATLAB). Also state the commands, function calls, and parameter values used for obtaining each answer.

1.) Normalize the columns for their values to be in uniform ranges. Describe the process you followed to do the normalization.

2.) Split the dataset into three randomly selected parts: 12000 for training, 4000 for validation, and 4000 for testing. Describe how you made these partitions.

Explanation / Answer

1) Answer

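Min-max normalization rescales every column to the range [0, 1]. For each of the 16 attribute columns X, subtract the column minimum from every value and divide by the column range:

Normalization = (X - min(X)) / (max(X) - min(X))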
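
As a rough illustration of min-max normalization, (X - min(X)) / (max(X) - min(X)) applied column by column, here is a minimal Python sketch using only the standard library. The function name min_max_normalize and the toy values are assumptions made for this example; they are not rows from the Letter Recognition dataset, and the question itself prefers MATLAB, so treat this as a language-neutral outline of the steps rather than the expected submission.

```python
def min_max_normalize(rows):
    """Rescale each column of a list-of-rows table to the range [0, 1]."""
    # Transpose so that each tuple holds all values of one column.
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]

    normalized = []
    for row in rows:
        new_row = []
        for value, lo, hi in zip(row, mins, maxs):
            span = hi - lo
            # A constant column has span 0; map it to 0.0 instead of dividing by zero.
            new_row.append((value - lo) / span if span else 0.0)
        normalized.append(new_row)
    return normalized


# Toy data standing in for three attribute columns (hypothetical values).
sample = [
    [0, 10, 5],
    [5, 20, 5],
    [10, 15, 5],
]
scaled = min_max_normalize(sample)
for row in scaled:
    print(row)
```

After the call, every non-constant column has minimum 0 and maximum 1. The third column is constant, so the sketch maps it to 0.0. That is one possible convention for a zero range, chosen here only to avoid a division by zero.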