1 vote

When used properly, the KNN functions in the sklearn library return the points closest to a query point p, based on the value of k and other parameters.

If p is itself in the data set, the returned points include p unless the code accounts for that.

I wrote code for both cases: one version excludes p and one includes it.
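To make the behavior concrete, here is a minimal sketch using scikit-learn's `NearestNeighbors`. It is not the code from the assignment. The sample array is the one that appears in the thread, and the variable names are assumptions chosen for illustration. It shows that querying with a point that is already in the fitted data returns that point at distance 0, and that calling `kneighbors()` with no argument leaves each point out of its own neighbor list.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

data = np.array([1, 3, 4, 5, 7, 8, 11, 12, 13, 15, 19, 24, 25, 29, 40])
X = data.reshape(-1, 1)  # scikit-learn expects shape (n_samples, n_features)

nn = NearestNeighbors(n_neighbors=3).fit(X)

# Query with an explicit point: p = 5 is in the data, so it comes back
# as its own nearest neighbor at distance 0.
dist, idx = nn.kneighbors([[5]])
self_distance = dist[0, 0]
print(self_distance, data[idx[0, 0]])

# Call kneighbors() with no argument: neighbors are returned for every
# fitted point, and a point is not counted as its own neighbor.
_, idx_all = nn.kneighbors()
row = int(np.where(data == 5)[0][0])  # position of the value 5 in `data`
neighbors_excluding_self = sorted(data[idx_all[row]].tolist())
print(neighbors_excluding_self)
```

Which of the two conventions to use depends on what the question is asking for, so it is worth stating the choice explicitly next to the result.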

Is this acceptable?


3 Answers

1 vote

Whether the answer is correct depends on what the person grading the assignment expects. If the grader expects the point that is also in the data set to be included and you exclude it, the answer would be marked incorrect, and vice versa. The code relies on several libraries, mainly NumPy and scikit-learn, since those are mandatory for the assignment.

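import numpy as np

# Function that does **not** consider itself as a neighbor
def k_neighbor(input_data, k, p):
    # The body was not shown in the post; this is a minimal NumPy-only
    # sketch of the two steps described in the comments below.
    # Rank the points by distance to p (stable sort keeps ties in data order).
    order = np.argsort(np.abs(input_data - p), kind="stable")
    if p in input_data:
        ## increase k by one, then remove the point that is in the data set
        nearest = input_data[order[:k + 1]]
        nearest = nearest[nearest != p]
    else:
        nearest = input_data[order[:k]]
    ## find the mean of the set of numbers found by the algorithm
    return nearest.mean()
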
data = np.array([1,3,4,5,7,8,11,12,13,15,19,24,25,29,40])
print(k_neighbor(input_data=data, k=3, p=5)) ## this is an example
print(k_neighbor(input_data=data, k=10, p=55)) ## this is another example
## 5 is in the data set. 55 is not in the data set
## Several versions of k_neighbor() can be evaluated; each ultimately returns
## the mean/average of the closest points.
## Some query points are in the data set and some are not.
## The versions give different results when the point is in the data set
## and the same result when it is not.
# Answer for one
print(k_neighbor(input_data=data, k=3, p=5)) ## this is an example
## When included: 4.0
## When excluded: 4.666666666666667

# Answer for another using the same function
print(k_neighbor(input_data=data, k=10, p=55)) ## this is another example
## When included: 19.6
## When excluded: 19.6
# Same answer, because 55 is not in the data set and there is nothing to exclude

## However, a k_neighbor function whose parameters mix the two cases
## can give varying answers, which would count as errors

That is why it is important to know whether the grader for this assignment expects the point(s) to be included or excluded: this particular question asks for the mean in each case, and the mean changes with that choice.

 

Results are different.

0 votes

k-NN is an instance-based predictive model, meaning that it relies directly on the data points in the training set. To classify a new data point, we calculate the distance between that point and every instance in the training set.

As far as I understand from your question, you are asking about a special case. You can write general code that calculates these distances and then sorts them from the minimum distance (which could be 0, if the new data point is exactly the same as one of the data points in the training set) to the maximum distance. You therefore do not need a separate algorithm for the special case, because the general algorithm handles it as well.
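A minimal sketch of that general approach, assuming one-dimensional data and absolute difference as the distance. The helper name `k_nearest` is hypothetical, and the sample values are the ones used in the first answer. The same code path serves a query point that is in the data (distance 0) and one that is not.

```python
def k_nearest(data, k, p):
    """Return the k values in `data` closest to `p`, nearest first."""
    # sorted() is stable, so values at equal distance keep their data order.
    by_distance = sorted(data, key=lambda x: abs(x - p))
    return by_distance[:k]


data = [1, 3, 4, 5, 7, 8, 11, 12, 13, 15, 19, 24, 25, 29, 40]

# p = 5 is in the data: it is simply the first neighbor, at distance 0.
in_set = k_nearest(data, k=3, p=5)
print(in_set, sum(in_set) / len(in_set))

# p = 55 is not in the data: no special handling is needed.
out_of_set = k_nearest(data, k=10, p=55)
print(out_of_set, sum(out_of_set) / len(out_of_set))
```

If the query point has to be excluded, that can be done as a filter on the sorted result, without changing the distance and sorting steps.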
