+3 votes
12.1k views
asked in Machine Learning by (660 points)  
I am wondering what happens as $k$ increases in the KNN algorithm. It seems that as $k$ increases, the new point $p$ tends to move closer to the middle of the decision boundary?

Any thoughts?
  

2 Answers

+3 votes
answered by (1.1k points)  

First of all, let's talk about the effect of a small $k$ versus a large $k$. A small value of $k$ increases the effect of noise, while a large value makes the algorithm more computationally expensive. Practitioners usually choose $k$ as an odd number when there are two classes; another simple heuristic is to set $k=\sqrt{n}$, where $n$ is the number of training samples.

Small values of $k$ not only make the classifier sensitive to noise but may also lead to overfitting, while large values of $k$ may lead to underfitting. So $k=\sqrt{n}$ seems a reasonable starting point, and cross-validation should then be used to find a suitable value of $k$.
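Here is a minimal sketch of that selection procedure, assuming scikit-learn; the toy dataset and the parameter grid are illustrative choices, not from the question.

```python
# Sketch: start from the k = sqrt(n) heuristic, then pick k by cross-validation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical two-class dataset standing in for real data.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

n = len(X)
sqrt_n = int(np.sqrt(n))  # the k = sqrt(n) starting heuristic

# Search odd values of k around sqrt(n) with 5-fold cross-validation.
param_grid = {"n_neighbors": list(range(1, 2 * sqrt_n, 2))}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print("sqrt(n) heuristic:", sqrt_n)
print("best k by cross-validation:", search.best_params_["n_neighbors"])
```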

Where the new data point falls relative to the decision boundary depends on the arrangement of the training points and on the location of the new point among them. Suppose I have 100 data points, two classes, and I choose $k = 100$. In this special situation the decision boundary is irrelevant to the location of the new point: every query is classified as the majority class, so that class's region covers the whole space and the new data point can be anywhere in it. Therefore, I think we cannot make a general statement about it.
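A small sketch of that extreme case, assuming scikit-learn and an invented 60/40 class split, shows that with $k = n$ every query gets the majority label regardless of where it lies:

```python
# Sketch: when k equals the training-set size, KNN always predicts the majority class.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))        # 100 training points
y = np.array([0] * 60 + [1] * 40)    # class 0 is the majority

clf = KNeighborsClassifier(n_neighbors=100).fit(X, y)  # k equals n

# No matter where the new point lies, the prediction is the majority class.
queries = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 5.0]])
print(clf.predict(queries))          # -> [0 0 0]
```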

+2 votes
answered by (1.4k points)  
edited by
I am assuming that the KNN algorithm was implemented in Python. It depends on whether the radius parameter of the estimator was set; the default is 1.0. Changing that parameter, together with the value of $k$, controls which points closest to $p$ are selected, among other settings.
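If the answer is referring to scikit-learn, the radius=1.0 default appears on sklearn.neighbors.NearestNeighbors; this is an assumption on my part, and the sketch below just contrasts a k-nearest query with a radius query for a point $p$:

```python
# Sketch (assumes scikit-learn): k-nearest query vs. radius query around p.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 2))          # hypothetical training points
p = np.array([[0.5, 0.5]])             # the new point "p"

nn = NearestNeighbors(n_neighbors=5, radius=1.0).fit(X)

dist_k, idx_k = nn.kneighbors(p)       # the k closest points to p
dist_r, idx_r = nn.radius_neighbors(p) # all points within the radius

print("k-nearest indices:", idx_k[0])
print("points within radius 1.0:", len(idx_r[0]))
```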
...