Use Excel and the training dataset below to classify (predict) the records in th

ID: 3865595 • Letter: U

Question

Use Excel and the training dataset below to classify (predict) the records in the test dataset below. What is the error rate? Use k-nearest neighbors with Euclidean distance, k = 1, and method = "unweighted vote". Make sure you normalize the data. (Hint: you can copy and paste the tables into Excel.)

Training data set

Sepal length  Sepal width  Petal length  Petal width  Species
5.4           3.7          1.5           0.2          Setosa
5             3            1.6           0.2          Setosa
5             3.5          1.3           0.3          Setosa
5.7           2.8          4.5           1.3          Versicolor
5.9           3.2          4.8           1.8          Versicolor
6             3.4          4.5           1.6          Versicolor
6.3           3.3          6             2.5          virginica
6.4           3.2          5.3           2.3          virginica
7.4           2.8          6.1           1.9          virginica
6.7           3            5.2           2.3          virginica
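The procedure the question describes (normalize, then assign each record the species of its single nearest training row) can be sketched in Python as a cross-check on the spreadsheet work. This is a minimal sketch under stated assumptions, not the method used in the answer: it assumes z-score normalization with the training set's sample standard deviation, and the names `column_stats`, `normalize`, `predict_1nn`, and `error_rate` are hypothetical. The test records are not reproduced here, so no error rate is computed.

```python
import math
import statistics

# Training table from the question:
# (sepal length, sepal width, petal length, petal width, species)
TRAIN = [
    (5.4, 3.7, 1.5, 0.2, "Setosa"),
    (5.0, 3.0, 1.6, 0.2, "Setosa"),
    (5.0, 3.5, 1.3, 0.3, "Setosa"),
    (5.7, 2.8, 4.5, 1.3, "Versicolor"),
    (5.9, 3.2, 4.8, 1.8, "Versicolor"),
    (6.0, 3.4, 4.5, 1.6, "Versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (6.4, 3.2, 5.3, 2.3, "virginica"),
    (7.4, 2.8, 6.1, 1.9, "virginica"),
    (6.7, 3.0, 5.2, 2.3, "virginica"),
]

def column_stats(rows):
    """Mean and sample standard deviation of each of the four feature columns."""
    columns = zip(*[row[:4] for row in rows])
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]

def normalize(features, stats):
    """Express each feature as a number of standard deviations from the mean."""
    return [(x - mean) / sd for x, (mean, sd) in zip(features, stats)]

def predict_1nn(features, train=TRAIN):
    """Species of the training row closest in normalized Euclidean distance (k = 1)."""
    stats = column_stats(train)
    query = normalize(features, stats)
    nearest = min(train, key=lambda row: math.dist(query, normalize(row[:4], stats)))
    return nearest[4]

def error_rate(test_rows, train=TRAIN):
    """Fraction of test rows whose predicted species differs from the actual one."""
    wrong = sum(predict_1nn(row[:4], train) != row[4] for row in test_rows)
    return wrong / len(test_rows)
```

Calling `error_rate(test_rows)` with the test table, laid out in the same five columns, would give the fraction of misclassified test records. With k = 1 the "unweighted vote" reduces to taking the label of the single nearest neighbor.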

Explanation / Answer

Here the predicted error rate is 0.96 (i.e., a 96% error rate).

Partition the data into Training and Validation Sets using the Standard Data Partition defaults: 60% of the data is randomly allocated to the Training Set, and 40% is randomly allocated to the Validation Set.
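For readers working outside XLMiner, a random 60/40 split can be sketched as follows. This is an illustration only: the function name `partition` and the seed value are hypothetical, and the rows XLMiner actually selects will differ.

```python
import random

def partition(rows, train_fraction=0.6, seed=12345):
    """Randomly split rows into (training, validation) lists.

    The seed is an arbitrary value chosen so the split is repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

With 10 rows this yields 6 training rows and 4 validation rows, and every row lands in exactly one of the two sets.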

Select Normalize Input data. When this option is selected, the input data is normalized, which means that all data is expressed in terms of standard deviations. This option is available to ensure that the distance measure is not dominated by variables with a large scale.
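As a small illustration of what "expressed in terms of standard deviations" means, the sketch below standardizes the sepal length column of the training data. It assumes the sample standard deviation is used (whether XLMiner uses the sample or population form is not stated here), and `z_scores` is a hypothetical helper name. In Excel, the same calculation can be done with STANDARDIZE(x, AVERAGE(range), STDEV.S(range)).

```python
import statistics

# Sepal length column from the training data in the question
sepal_length = [5.4, 5.0, 5.0, 5.7, 5.9, 6.0, 6.3, 6.4, 7.4, 6.7]

def z_scores(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

normalized = z_scores(sepal_length)
```

After this transformation every column has mean 0 and standard deviation 1, so no single measurement dominates the Euclidean distance just because it is recorded on a larger scale.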

For Number of Nearest Neighbors (k), enter 5. This is the parameter k in the k-nearest neighbor algorithm. If the number of observations (rows) is less than 50 then the value of k should be between 1 and the total number of observations (rows). If the number of rows is greater than 50, then the value of k should be between 1 and 50.

Under Scoring Option, select Score on best k between 1 and specified value. XLMiner displays the output for the best k between 1 and 5. If Score on specified value of k as above is selected, the output is displayed for the specified value of k.
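The idea of scoring on the best k can be sketched as below. This assumes "best" means the k with the lowest validation error, with ties going to the smallest k; that tie rule and the names `knn_predict`, `validation_error`, and `best_k` are assumptions for illustration, not XLMiner's documented behavior. Each row is a `(features, label)` pair, and features are assumed to be normalized already.

```python
import math
from collections import Counter

def knn_predict(query, train, k):
    """Unweighted majority vote among the k nearest training rows."""
    nearest = sorted(train, key=lambda row: math.dist(query, row[0]))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tied vote, the label of the closer neighbor wins (an assumption).
    return votes.most_common(1)[0][0]

def validation_error(train, valid, k):
    """Fraction of validation rows misclassified for a given k."""
    wrong = sum(knn_predict(x, train, k) != y for x, y in valid)
    return wrong / len(valid)

def best_k(train, valid, max_k):
    """Smallest k in 1..max_k with the lowest validation error."""
    return min(range(1, max_k + 1),
               key=lambda k: validation_error(train, valid, k))
```

Choosing "Score on specified value of k as above" instead corresponds to calling `validation_error` once with the fixed k, for example k = 1 as the question requires.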

Under both Score Training Data and Score Validation Data, Summary Report is selected. Under both, also select Detailed Report and Lift Charts to show an assessment of the performance in predicting the Training Set.

The options under Score Test Data group are enabled only when a test partition is available. Since we did not create a test data set when we partitioned the data, these options are disabled. See the Data Mining Partition section for more information on how to create a test data set.
