
Given the data set below, apply the k-Nearest Neighbor algorithm to classify the


Question

Given the data set below, apply the k-Nearest Neighbor algorithm to classify the test data for k=1 and k=3. Use the Euclidean distance metric. Show the steps.

Training Set

#    x1          x2          true label
1    0.453705    -0.0106     1
2    3.258589    0.169734    1
3    3.184656    -0.83691    0
4    -0.42561    1.385033    0
5    0.658765    -1.87715    0
6    -0.40507    -1.9574     0
7    -4.52775    4.123102    1
8    2.538689    -1.5386     1
9    -1.04649    -3.59664    1
10   2.967113    0.505111    0

Testing Set

#    x1          x2          true label   predicted label
11   -4.69237    -4.77898    1
12   -2.1147     -1.81277    0
13   4.277164    -4.83136    1
14   -1.33862    -0.93995    0
15   -4.02728    -4.96129    1
16   4.968125    3.757161    1
17   -2.19987    -3.48712    0
18   2.849136    -3.33965    0
19   -4.30273    2.530094    1
20   4.690116    -0.36379    1

Explanation / Answer

Here we consider the first test sample (#11) with k=3.

Steps:

1. Set k=3. The query sample is x1=-4.69237, x2=-4.77898, with true label 1.

2. Calculate the Euclidean distance between the query sample and each training point.

Squared Euclidean distance between each training point and the query sample (the square root is omitted because it does not change the ordering of the distances). Row 1 is worked out in full: (0.453705-(-4.69237))^2+(-0.0106-(-4.77898))^2=49.2195

#    x1          x2          squared Euclidean distance
1    0.453705    -0.0106     49.2195
2    3.258589    0.169734    87.7075
3    3.184656    -0.83691    77.5875
4    -0.42561    1.385033    56.2003
5    0.658765    -1.87715    37.0553
6    -0.40507    -1.9574     26.3423
7    -4.52775    4.123102    79.2742
8    2.538689    -1.5386     62.7883
9    -1.04649    -3.59664    14.6904
10   2.967113    0.505111    86.5893
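The distance step can be checked with a short Python sketch. The names `training`, `query` and `squared_distance` are illustrative choices, not part of the original answer; the data is copied from the Training Set table and test sample #11.

```python
import math

# Training points copied from the Training Set table: (x1, x2, label).
training = [
    (0.453705, -0.0106, 1),
    (3.258589, 0.169734, 1),
    (3.184656, -0.83691, 0),
    (-0.42561, 1.385033, 0),
    (0.658765, -1.87715, 0),
    (-0.40507, -1.9574, 0),
    (-4.52775, 4.123102, 1),
    (2.538689, -1.5386, 1),
    (-1.04649, -3.59664, 1),
    (2.967113, 0.505111, 0),
]

query = (-4.69237, -4.77898)  # test sample #11


def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


# One squared distance per training row, in table order.
squared = [squared_distance((x1, x2), query) for x1, x2, _ in training]

for row, d2 in enumerate(squared, start=1):
    # math.sqrt gives the actual Euclidean distance; the ordering is the same.
    print(f"#{row}: squared = {d2:.4f}, distance = {math.sqrt(d2):.4f}")
```

Taking the square root of each value gives the true Euclidean distance, but since the square root is monotonic the nearest neighbors are the same either way.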

3. Sort the distances from minimum to maximum and assign each row a rank, as shown in the last column of the ranked table at the end of this answer.

4. For k=3, take the 3 smallest distances. These are rows 9, 6 and 5.

5. The labels of these rows in the training set are 1, 0 and 0. Take the label that occurs most often: 0 occurs 2 times, while 1 occurs only once.

So the predicted label for the first test sample is 0 when k=3.

For k=1, consider only the single nearest neighbor, row 9, which has label 1.

So the predicted label for the first test sample is 1 when k=1.

The true label of this sample is 1, so k=1 classifies it correctly and k=3 does not. Comparing the overall accuracy of k=1 and k=3 requires classifying all ten test samples.

The other test samples can be classified in the same way.
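The voting steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: `knn_predict` is a hypothetical helper name, and the data is copied from the Training Set table and test sample #11.

```python
from collections import Counter

# Training points copied from the Training Set table: (x1, x2, label).
training = [
    (0.453705, -0.0106, 1),
    (3.258589, 0.169734, 1),
    (3.184656, -0.83691, 0),
    (-0.42561, 1.385033, 0),
    (0.658765, -1.87715, 0),
    (-0.40507, -1.9574, 0),
    (-4.52775, 4.123102, 1),
    (2.538689, -1.5386, 1),
    (-1.04649, -3.59664, 1),
    (2.967113, 0.505111, 0),
]


def knn_predict(query, k):
    """Return the majority label among the k training points nearest to query."""
    # Sorting by squared distance gives the same order as sorting by distance.
    by_distance = sorted(
        training,
        key=lambda row: (row[0] - query[0]) ** 2 + (row[1] - query[1]) ** 2,
    )
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


query = (-4.69237, -4.77898)  # test sample #11, true label 1
predictions = {k: knn_predict(query, k) for k in (1, 3)}
print(predictions)  # prints {1: 1, 3: 0}
```

With two classes and an odd k the vote cannot tie, so no tie-breaking rule is needed here. Calling `knn_predict` with the coordinates of test samples 12 to 20 fills in the rest of the predicted label column.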

Squared Euclidean distance between each training point and the query sample, ranked from smallest (1) to largest (10). The square root is omitted because it does not change the ranking. Row 1 is worked out in full: (0.453705-(-4.69237))^2+(-0.0106-(-4.77898))^2=49.2195

#    x1          x2          squared Euclidean distance   rank
1    0.453705    -0.0106     49.2195                      4
2    3.258589    0.169734    87.7075                      10
3    3.184656    -0.83691    77.5875                      7
4    -0.42561    1.385033    56.2003                      5
5    0.658765    -1.87715    37.0553                      3
6    -0.40507    -1.9574     26.3423                      2
7    -4.52775    4.123102    79.2742                      8
8    2.538689    -1.5386     62.7883                      6
9    -1.04649    -3.59664    14.6904                      1
10   2.967113    0.505111    86.5893                      9
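The rank column can be reproduced with a few lines of Python. This is a sketch under the assumption that rows are numbered from 1 as in the tables; `points`, `order` and `rank` are illustrative names.

```python
# Training coordinates copied from the Training Set table, in row order.
points = [
    (0.453705, -0.0106),
    (3.258589, 0.169734),
    (3.184656, -0.83691),
    (-0.42561, 1.385033),
    (0.658765, -1.87715),
    (-0.40507, -1.9574),
    (-4.52775, 4.123102),
    (2.538689, -1.5386),
    (-1.04649, -3.59664),
    (2.967113, 0.505111),
]

query = (-4.69237, -4.77898)  # test sample #11

# Squared Euclidean distance from the query to every training row.
squared = [(x1 - query[0]) ** 2 + (x2 - query[1]) ** 2 for x1, x2 in points]

# Row numbers (1-based) ordered from nearest to farthest.
order = sorted(range(1, len(squared) + 1), key=lambda row: squared[row - 1])

# Map each row number to its rank (1 = nearest).
rank = {row: position for position, row in enumerate(order, start=1)}

print(order)      # prints [9, 6, 5, 1, 4, 8, 3, 7, 10, 2]
print(order[:3])  # the three nearest neighbors used for k=3
```

The first k entries of `order` are the neighbors that vote, so k=1 uses row 9 alone and k=3 uses rows 9, 6 and 5.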