+1 vote
2.7k views
asked in Machine Learning by (1.4k points)  
reshown by

The KNN function in the sklearn library (when coded properly), outputs the points closest to p based on the value of k, and others.

The point(s) would include itself when the code does not consider the point is in the data set.

I included codes to exclude and include this occurrence.

Is this acceptable?

  

3 Answers

+1 vote
answered by (1.4k points)  

If the person grading this assignment expects the answer to include the point that is also within the data set, the answer would be incorrect and vice versa (exclude the point if it is in the data set). The functions/codes are compiled with numerous libraries especially NumPy and SKlearn (as they are the ones mandatory for the assignment).

# Function that does **not** consider itself as a neighbor
def k_neighbor(input_data, k, p):
## increases k by one and a line to remove the point that is in the data set
## find the mean of the set of numbers that have been found by the algorithm
data = np.array([1,3,4,5,7,8,11,12,13,15,19,24,25,29,40])
print(k_neighbor(input_data=data, k=3, p=5)) ## this is an example
print(k_neighbor(input_data=data, k=10, p=55)) ## this is another example
## 5 is in the data set. 55 is not in the data set
## There are multiple k_neighbor() functions to evaluate that ultimately finds 
## the mean/average of the closest points. 
## Some are in and some are not in the data set
## The results are varied when the point(s) are in the data set versus
## the points that are not in the data set
# Answer for one
print(k_neighbor(input_data=data, k=3, p=5)) ## this is an example
## When included: 4.0
## When excluded: 4.666666666666667

# Answer for another using the same function
print(k_neighbor(input_data=data, k=10, p=55)) ## this is another example
## When included: 19.6
## When excluded: 19.6
# Same answer, clearly

## However, the k_neighbor function contains parameters that are mixed
## which can cause varying answers, which pose to be errors

That is why it is important to know if the grader for this assignment is expecting to include or exclude the point(s)  because this particular question asks for the mean for each

 

Results are different.

0 votes
answered by (115k points)  

k-NN is an instance-based predictive model, meaning that it relies on the data-points in the training set themselves. When we are going to classify a new data-point, we calculate the distances among all the instances in the training set and that new data point.

As far as I understand from your question you are talking about special cases as follows. You can write a general code and calculate these distances and then sort them from the minimum distance (which could include 0, of the new data-point is exactly the same as one of the data-points in the training set) to a maximum distance which is calculated. Therefore, you do not need to have separate algorithms for when a special case happens, because the general algorithm should take care of it as well.

0 votes
answered by (115k points)  

Please take a look at the following example:

https://repl.it/@tofighi/knn

Related questions

+1 vote
1 answer 2.3k views
+3 votes
2 answers 12.1k views
+3 votes
1 answer 447 views
+3 votes
0 answers 212 views
...