K Nearest Neighbor (KNN) - includes itself

Question

K Nearest Neighbor (KNN) - includes itself

3 Answers

kalyanak.p · Answer 1 · 2018-10-02T00:02:35+0000

If the person grading this assignment expects the answer to include the point that is also within the data set, the answer would be incorrect and vice versa (exclude the point if it is in the data set). The functions/codes are compiled with numerous libraries especially NumPy and SKlearn (as they are the ones mandatory for the assignment).

# Function that does **not** consider itself as a neighbor
def k_neighbor(input_data, k, p):
## increases k by one and a line to remove the point that is in the data set
## find the mean of the set of numbers that have been found by the algorithm

data = np.array([1,3,4,5,7,8,11,12,13,15,19,24,25,29,40])
print(k_neighbor(input_data=data, k=3, p=5)) ## this is an example
print(k_neighbor(input_data=data, k=10, p=55)) ## this is another example
## 5 is in the data set. 55 is not in the data set
## There are multiple k_neighbor() functions to evaluate that ultimately finds 
## the mean/average of the closest points. 
## Some are in and some are not in the data set
## The results are varied when the point(s) are in the data set versus
## the points that are not in the data set

# Answer for one
print(k_neighbor(input_data=data, k=3, p=5)) ## this is an example
## When included: 4.0
## When excluded: 4.666666666666667

# Answer for another using the same function
print(k_neighbor(input_data=data, k=10, p=55)) ## this is another example
## When included: 19.6
## When excluded: 19.6
# Same answer, clearly

## However, the k_neighbor function contains parameters that are mixed
## which can cause varying answers, which pose to be errors

That is why it is important to know if the grader for this assignment is expecting to include or exclude the point(s) because this particular question asks for the mean for each

Results are different.

AskDataScience · Answer 2 · 2018-10-01T23:14:35+0000

k-NN is an instance-based predictive model, meaning that it relies on the data-points in the training set themselves. When we are going to classify a new data-point, we calculate the distances among all the instances in the training set and that new data point.

As far as I understand from your question you are talking about special cases as follows. You can write a general code and calculate these distances and then sort them from the minimum distance (which could include 0, of the new data-point is exactly the same as one of the data-points in the training set) to a maximum distance which is calculated. Therefore, you do not need to have separate algorithms for when a special case happens, because the general algorithm should take care of it as well.

AskDataScience · Answer 3 · 2018-10-03T00:07:39+0000

Please take a look at the following example:

https://repl.it/@tofighi/knn

K Nearest Neighbor (KNN) - includes itself

The KNN function in the sklearn library (when coded properly), outputs the points closest to p based on the value of k, and others.

Please log in or register to add a comment.

Please log in or register to answer this question.

3 Answers

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Related questions

Categories