Question
Use Excel and the training dataset below to classify (predict) the records in the test dataset below. What is the error rate? Use k-nearest neighbors with Euclidean distance, K = 1, and method = "unweighted vote". Make sure you normalize the data. (Hint: you can copy and paste the tables into Excel.)
Training data set:

| Sepal length | Sepal width | Petal length | Petal width | Species |
|---|---|---|---|---|
| 5.4 | 3.7 | 1.5 | 0.2 | Setosa |
| 5 | 3 | 1.6 | 0.2 | Setosa |
| 5 | 3.5 | 1.3 | 0.3 | Setosa |
| 5.7 | 2.8 | 4.5 | 1.3 | Versicolor |
| 5.9 | 3.2 | 4.8 | 1.8 | Versicolor |
| 6 | 3.4 | 4.5 | 1.6 | Versicolor |
| 6.3 | 3.3 | 6 | 2.5 | virginica |
| 6.4 | 3.2 | 5.3 | 2.3 | virginica |
| 7.4 | 2.8 | 6.1 | 1.9 | virginica |
| 6.7 | 3 | 5.2 | 2.3 | virginica |

Explanation / Answer
Here the error rate is 0.96 (i.e., 96% of the test records are misclassified).
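For readers who want to check a result like this outside Excel, here is a minimal Python sketch of the method the question asks for: z-score normalization using the training table's means and sample standard deviations (an assumption, since the question only says "normalize"), Euclidean distance, and an unweighted vote among the k nearest rows with k = 1. The test dataset is not reproduced in the question, so only the training table is encoded; `knn_predict`, `error_rate`, and the `TEST` list mentioned below are illustrative names, not part of Excel or XLMiner.

```python
import math
import statistics
from collections import Counter

# Training table from the question:
# ((sepal length, sepal width, petal length, petal width), species)
TRAIN = [
    ((5.4, 3.7, 1.5, 0.2), "Setosa"),
    ((5.0, 3.0, 1.6, 0.2), "Setosa"),
    ((5.0, 3.5, 1.3, 0.3), "Setosa"),
    ((5.7, 2.8, 4.5, 1.3), "Versicolor"),
    ((5.9, 3.2, 4.8, 1.8), "Versicolor"),
    ((6.0, 3.4, 4.5, 1.6), "Versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((6.4, 3.2, 5.3, 2.3), "virginica"),
    ((7.4, 2.8, 6.1, 1.9), "virginica"),
    ((6.7, 3.0, 5.2, 2.3), "virginica"),
]


def zscore_params(rows):
    """Per-column mean and sample standard deviation, from training rows only."""
    columns = list(zip(*(features for features, _ in rows)))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return means, sds


def normalize(features, means, sds):
    """Express each feature in standard deviations from the training mean."""
    return tuple((v - m) / s for v, m, s in zip(features, means, sds))


def knn_predict(train, query, k=1):
    """Unweighted vote among the k nearest training rows (Euclidean distance).

    Statistics are recomputed on every call, which is fine for a 10-row table.
    """
    means, sds = zscore_params(train)
    q = normalize(query, means, sds)
    ranked = sorted(train, key=lambda row: math.dist(normalize(row[0], means, sds), q))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def error_rate(actual, predicted):
    """Fraction of records whose predicted class differs from the actual class."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)
```

Given test rows as a hypothetical list `TEST` of `(features, species)` pairs, the error rate would be `error_rate([s for _, s in TEST], [knn_predict(TRAIN, x) for x, _ in TEST])`. The normalization statistics come from the training rows only, so each test record is scaled exactly like the rows it is compared against.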
Partition the data into Training and Validation Sets using the Standard Data Partition defaults: 60% of the data is randomly allocated to the Training Set and 40% to the Validation Set.
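The 60/40 random split can be sketched as below. This illustrates a random partition in general, not XLMiner's own allocation routine; the function name `partition` and the fixed seed (chosen only so the split is reproducible) are assumptions.

```python
import random


def partition(rows, train_frac=0.6, seed=12345):
    """Randomly split rows into (training, validation) lists, 60% / 40% by default.

    The seed is an arbitrary illustrative value that makes the split repeatable.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]
```

The question above already supplies separate training and test tables, so a split like this is only needed when starting from a single unpartitioned dataset.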
Select Normalize Input data. When this option is selected, the input data is normalized, which means that all data is expressed in terms of standard deviations. This option is available to ensure that the distance measure is not dominated by variables with a large scale.
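Expressing data "in terms of standard deviations" is a z-score: each value becomes (value - column mean) / column standard deviation. In Excel this can be done cell by cell with `STANDARDIZE(x, AVERAGE(range), STDEV.S(range))`. The Python sketch below does the same for every column; using the sample standard deviation, and computing the statistics from the training rows only before reusing them for other rows, are assumptions, since the text does not say which variant XLMiner applies. `standardize` is a hypothetical helper name.

```python
import statistics


def standardize(train_rows, other_rows=()):
    """Z-score every column with the training mean and sample standard deviation.

    The same training statistics are reused for other_rows (e.g. test records),
    so all rows end up on one scale.
    """
    columns = list(zip(*train_rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]

    def scale(row):
        # Distance from the column mean, measured in standard deviations.
        return [(v - m) / s for v, m, s in zip(row, means, sds)]

    return [scale(r) for r in train_rows], [scale(r) for r in other_rows]
```

After scaling, every training column has mean 0 and standard deviation 1, so a difference of 1 unit in petal length counts the same in the distance as a difference of 1 unit in sepal width.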
For Number of Nearest Neighbors (k), enter 5. This is the parameter k in the k-nearest neighbor algorithm. If the number of observations (rows) is less than 50, k should be between 1 and the total number of rows; if it is 50 or more, k should be between 1 and 50.
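The rule for the allowed range of k reduces to an upper bound of min(number of rows, 50). A small Python sketch of that check follows; `max_k` and `check_k` are hypothetical helper names.

```python
def max_k(n_rows, cap=50):
    """Largest allowed k: the number of rows, but never more than the cap of 50."""
    return min(n_rows, cap)


def check_k(k, n_rows):
    """Return k unchanged if it lies in 1..max_k(n_rows), otherwise raise ValueError."""
    upper = max_k(n_rows)
    if not 1 <= k <= upper:
        raise ValueError(f"k must be between 1 and {upper}, got {k}")
    return k
```

For the 10-row training table in the question, any k from 1 to 10 would be allowed; the question itself fixes K = 1.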
Under Scoring Option, select Score on best k between 1 and specified value. XLMiner then displays the output for the best k between 1 and 5. If Score on specified value of k as above is selected instead, the output is displayed for the specified value of k.
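The "best k" option can be approximated by trying every k from 1 to the specified value and keeping the one with the lowest validation error rate. The sketch below assumes that definition of "best", breaks ties in favor of the smaller k, and resolves vote ties in favor of the class whose member is nearest; the text does not describe XLMiner's internal tie rules, so treat these as assumptions. Features are expected to be normalized already, and `knn_predict` and `best_k` are illustrative names.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Unweighted majority vote among the k nearest rows by Euclidean distance.

    Vote ties go to the tied class seen first in distance order (Counter keeps
    insertion order, and most_common lists equal counts in that order).
    """
    ranked = sorted(train, key=lambda row: math.dist(row[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def best_k(train, valid, k_max):
    """Return the k in 1..k_max with the lowest validation error (smallest k on ties)."""

    def error(k):
        wrong = sum(knn_predict(train, x, k) != y for x, y in valid)
        return wrong / len(valid)

    return min(range(1, k_max + 1), key=lambda k: (error(k), k))
```

Trying each k separately is simple but repeats the distance sort for every k; for small tables like the one in the question that cost is negligible.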
Under both Score Training Data and Score Validation Data, Summary Report is selected. Also select Detailed Report and Lift Charts under both; the lift charts show an assessment of the model's performance in predicting the Training Set.
The options under Score Test Data group are enabled only when a test partition is available. Since we did not create a test data set when we partitioned the data, these options are disabled. See the Data Mining Partition section for more information on how to create a test data set.