Hi,

Since I'm still confused about score() and accuracy_score(), I want to confirm my assumptions.

Q1: With score(), we use the split-off test data via knn.score(X_test, y_test) to avoid the bias of evaluating on the same data we trained on, right? Here knn.score(X_test, y_test) just compares each predicted value against the corresponding test value.

Q2: accuracy_score from sklearn.metrics tests the predicted target values y_pred against y_test, via accuracy_score(y_test, y_pred). So it just compares the actual target values with the predicted ones?

Q3: My results are the same with both methods. Are they doing the same thing?

Q4: With accuracy_score(), I can also compare the training targets y_train with y_train_pred (returned from knn.predict(X_train)). Is it then OK to report accuracy via accuracy_score(y_train, y_train_pred)? Since the prediction is already done and I'm just comparing against the original data, does the bias not exist?

Thanks.

2 Answers

Best answer

Q1: For a classifier, knn.score(X_test, y_test) calls accuracy_score from sklearn.metrics. For a regressor, it calls r2_score, which is the coefficient of determination defined in the statistics course.

You can find the source code of knn.score here (it's open source): https://github.com/scikit-learn/scikit-learn/blob/a24c8b46/sklearn/base.py#L324
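A minimal sketch of the regressor case (the dataset, n_neighbors value, and random seeds here are illustrative, not from the original question):

```python
# For a regressor, .score() returns the same value as sklearn.metrics.r2_score.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=200, n_features=4, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)

score_via_method = reg.score(X_test, y_test)             # calls r2_score internally
score_via_metric = r2_score(y_test, reg.predict(X_test))

print(abs(score_via_method - score_via_metric) < 1e-12)  # True
```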

Q2: accuracy_score is not a method of knn, but a function in sklearn.metrics. If the normalize argument is True (the default), accuracy_score(y_test, knn.predict(X_test)) returns the same result as knn.score(X_test, y_test). You can check the documentation below for more details:

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html
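A quick way to check the equivalence yourself (the dataset, n_neighbors value, and seed are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Both compute the fraction of exact matches between y_pred and y_test.
print(knn.score(X_test, y_test) == accuracy_score(y_test, y_pred))  # True
```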

Q3: As explained above, yes, they return the same result, but only in the given situation.

Q4: If there is bias after the split, the bias still exists whichever data set you compare. Bias here means the class distribution in the train set differs from the class distribution in the whole set. Take the Iris dataset as an example: the three classes (Setosa, Versicolour, Virginica) are distributed 50-50-50 across the 150 samples, so if you make a 20-80 test-train split, the distribution of the three classes in the train set should be 40-40-40. If not, there is bias, because your train set differs from the population in terms of class distribution.
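The Iris example above can be reproduced directly; the stratify argument of train_test_split forces the split to preserve the class ratio (seed is illustrative):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # 150 samples, 50 per class

# stratify=y makes the 20-80 split keep the 50-50-50 class ratio,
# so the train set ends up 40-40-40 and the test set 10-10-10.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

print(np.bincount(y_train))  # [40 40 40]
print(np.bincount(y_test))   # [10 10 10]
```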

This may be why Elon doesn't trust simulation and insists on using real-world data to train Tesla's Autopilot system.


Q1, 2, 3: Please take a look at the example here and note the differences. These functions apply differently to regression and classification.

Q4: You need to learn a bit more about the cross-validation procedure to see how to avoid bias. If you have access to DataCamp, complete this course first to understand the whole pipeline; it covers cross-validation and how it avoids bias.
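A minimal sketch of what cross-validation looks like in practice (the dataset, cv value, and n_neighbors are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)

# 5-fold CV: each sample is held out for testing exactly once, so the
# reported accuracy is less sensitive to one lucky or unlucky split.
scores = cross_val_score(knn, X, y, cv=5)
print(scores)        # one accuracy per fold
print(scores.mean()) # average over the 5 folds
```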
