Given the data set below, apply the k-Nearest Neighbor algorithm to classify the
ID: 3849092 • Letter: G
Question
Given the data set below, apply the k-Nearest Neighbor algorithm to classify the test data for k=1 and k=3. Use the Euclidean distance metric. Show the steps.
Training Set
#     x1           x2           true label
1     0.453705     -0.0106      1
2     3.258589     0.169734     1
3     3.184656     -0.83691     0
4     -0.42561     1.385033     0
5     0.658765     -1.87715     0
6     -0.40507     -1.9574      0
7     -4.52775     4.123102     1
8     2.538689     -1.5386      1
9     -1.04649     -3.59664     1
10    2.967113     0.505111     0
Testing Set
#     x1           x2           true label    predicted label
11    -4.69237     -4.77898     1
12    -2.1147      -1.81277     0
13    4.277164     -4.83136     1
14    -1.33862     -0.93995     0
15    -4.02728     -4.96129     1
16    4.968125     3.757161     1
17    -2.19987     -3.48712     0
18    2.849136     -3.33965     0
19    -4.30273     2.530094     1
20    4.690116     -0.36379     1
Explanation / Answer
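Here we consider the first test sample (#11) with k=3.
Steps:
1. Set k=3. The query sample is x1=-4.69237, x2=-4.77898, with true label 1.
2. Calculate the Euclidean distance between the query sample and each training sample. The values listed are squared distances; omitting the square root does not change their order, so the nearest neighbors are the same.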
Squared Euclidean distance between each training sample and the query sample
#     x1           x2           squared distance    rank (1 = nearest)
1     0.453705     -0.0106      49.2195             4
2     3.258589     0.169734     87.7075             10
3     3.184656     -0.83691     77.5875             7
4     -0.42561     1.385033     56.2003             5
5     0.658765     -1.87715     37.0553             3
6     -0.40507     -1.9574      26.3423             2
7     -4.52775     4.123102     79.2742             8
8     2.538689     -1.5386      62.7883             6
9     -1.04649     -3.59664     14.6904             1
10    2.967113     0.505111     86.5893             9
For example, row 1: (0.453705-(-4.69237))^2 + (-0.0106-(-4.77898))^2 = 49.2195
3. Sort the distances from smallest to largest and rank them (rank 1 = smallest).
4. For k=3, take the three smallest distances. These belong to training rows 9, 6 and 5.
5. The labels of these rows in the training set are 1, 0 and 0. Take the majority vote: label 0 occurs twice and label 1 occurs once.
So the predicted label for the first test sample is 0 when k=3.
For k=1 we consider only the single nearest neighbor, row 9, which has label 1.
So the predicted label for the first test sample is 1 when k=1.
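The steps above for the first test sample can be checked with a short Python sketch. It is only an illustration: the names `train`, `query`, `squared_distance`, `ranked` and `vote` are made up here and are not part of the original answer. Like the hand calculation, it ranks by squared distance, which gives the same ordering as the true Euclidean distance.

```python
from collections import Counter

# Training set from the question: (row number, x1, x2, true label)
train = [
    (1, 0.453705, -0.0106, 1),
    (2, 3.258589, 0.169734, 1),
    (3, 3.184656, -0.83691, 0),
    (4, -0.42561, 1.385033, 0),
    (5, 0.658765, -1.87715, 0),
    (6, -0.40507, -1.9574, 0),
    (7, -4.52775, 4.123102, 1),
    (8, 2.538689, -1.5386, 1),
    (9, -1.04649, -3.59664, 1),
    (10, 2.967113, 0.505111, 0),
]

# First test sample (#11)
query = (-4.69237, -4.77898)


def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


# Step 2 and 3: compute the distances and sort from smallest to largest.
ranked = sorted(
    (squared_distance((x1, x2), query), row, label)
    for row, x1, x2, label in train
)


def vote(ranked, k):
    """Steps 4 and 5: majority label among the k nearest training rows."""
    labels = [label for _, _, label in ranked[:k]]
    return Counter(labels).most_common(1)[0][0]


for k in (1, 3):
    nearest_rows = [row for _, row, _ in ranked[:k]]
    print(f"k={k}: nearest rows {nearest_rows}, predicted label {vote(ranked, k)}")
```

With an odd k and two classes the vote cannot tie, so no tie-breaking rule is needed in this sketch.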
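The true label of this sample is 1, so for this sample the k=1 prediction is correct and the k=3 prediction is not. Comparing the accuracy of k=1 and k=3 requires classifying all ten test samples.
The remaining test samples can be classified in the same way.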
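To repeat the procedure for every test sample, a sketch along the following lines may help. It is one possible way to do it, not the original answer's method: `TRAIN`, `TEST`, `knn_predict` and `accuracy` are hypothetical names, and it assumes Python 3.8 or later for `math.dist`, which returns the true Euclidean distance.

```python
import math
from collections import Counter

# (x1, x2, true label) for training rows 1-10
TRAIN = [
    (0.453705, -0.0106, 1), (3.258589, 0.169734, 1),
    (3.184656, -0.83691, 0), (-0.42561, 1.385033, 0),
    (0.658765, -1.87715, 0), (-0.40507, -1.9574, 0),
    (-4.52775, 4.123102, 1), (2.538689, -1.5386, 1),
    (-1.04649, -3.59664, 1), (2.967113, 0.505111, 0),
]

# (x1, x2, true label) for test rows 11-20
TEST = [
    (-4.69237, -4.77898, 1), (-2.1147, -1.81277, 0),
    (4.277164, -4.83136, 1), (-1.33862, -0.93995, 0),
    (-4.02728, -4.96129, 1), (4.968125, 3.757161, 1),
    (-2.19987, -3.48712, 0), (2.849136, -3.33965, 0),
    (-4.30273, 2.530094, 1), (4.690116, -0.36379, 1),
]


def knn_predict(train, point, k):
    """Majority label among the k training rows closest to point."""
    neighbours = sorted(train, key=lambda row: math.dist(row[:2], point))[:k]
    votes = Counter(label for _, _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test rows whose predicted label equals the true label."""
    hits = sum(knn_predict(train, (x1, x2), k) == label for x1, x2, label in test)
    return hits / len(test)


for k in (1, 3):
    predictions = [knn_predict(TRAIN, (x1, x2), k) for x1, x2, _ in TEST]
    print(f"k={k}: predictions={predictions}, accuracy={accuracy(TRAIN, TEST, k):.2f}")
```

The printed predictions fill in the "predicted label" column of the testing set, and the accuracy figures give a direct comparison between k=1 and k=3.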