Python Machine Learning: Scikit-Learn Tutorial

cbarbisan asked Jan 31, 2019

1,269 views

In the DataCamp tutorial "Python Machine Learning: Scikit-Learn Tutorial", the author considers which use cases are relevant to the digits data set so that she can select an appropriate machine learning algorithm. The reader is directed to the scikit-learn machine learning map. Here is the excerpt from the tutorial:

As your use case was one for clustering, you can follow the path on the map towards “KMeans”. You’ll see the use case that you have just thought about requires you to have more than 50 samples (“check!”), to have labeled data (“check!”), to know the number of categories that you want to predict (“check!”) and to have less than 10K samples (“check!”).

However, if you follow the learning map based on the listed use cases, KMeans is not the algorithm you would arrive at. According to the map, you would only arrive at the KMeans algorithm if you do NOT have labelled data. But the digits dataset contains labels.

When KMeans does not return optimal results, the learning map suggests trying the Spectral Clustering or GMM algorithms. But the author selected SVC (which is a classification algorithm, not a clustering algorithm), when KMeans didn't work.

Did the author select the wrong algorithm or is the learning map incorrect? Should classification or clustering have been used?
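
To make the distinction concrete, here is a minimal sketch of both approaches on the digits data. It is not the tutorial's code: the train/test split, the random seeds, the RBF kernel and the choice of adjusted Rand index for scoring the clusters are all my own assumptions, picked only to keep the example short.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score, adjusted_rand_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the 8x8 digit images together with their labels.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42  # assumed split
)

# Clustering (unsupervised): KMeans is fitted without ever seeing y_train.
# Cluster IDs are arbitrary, so the result is scored with a metric that
# ignores label permutations (adjusted Rand index).
kmeans = KMeans(n_clusters=10, n_init=10, random_state=42).fit(X_train)
ari = adjusted_rand_score(y_test, kmeans.predict(X_test))

# Classification (supervised): SVC is fitted on the labels, so its
# predictions can be compared with y_test directly.
svc = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
acc = accuracy_score(y_test, svc.predict(X_test))

print(f"KMeans adjusted Rand index: {ari:.3f}")
print(f"SVC accuracy: {acc:.3f}")
```

The two printed numbers are different metrics and are not directly comparable. The point is only that KMeans ignores the labels the digits data already provides, while SVC uses them, which is the path the map takes for labeled data.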

1 Answer

Related questions

0 votes, 1 answer

tofighi asked Feb 18, 2020

1,680 views

Can I use a single Pipeline for multiple estimators in scikit-learn?

Is there any proper way to combine multiple classifiers and their parameter grids in one Pipeline?

2 votes, 1 answer

askdatascience asked Sep 25, 2018

3,855 views

What is the fastest way to learn scikit-learn?

I know Python and I am looking for the fastest way, or a quick tutorial, to learn how to start using the scikit-learn library.

1 vote, 1 answer

tofighi asked Sep 25, 2018

2,712 views

What is the best roadmap to choose the right estimator in scikit-learn?

I am looking for a roadmap for choosing the right estimator in scikit-learn.

0 votes, 0 answers

Frenzy asked Apr 27, 2022

1,859 views

KMeans clustering in Python - Giving original labels to predicted clusters

I have a dataset with 7 labels in the target variable. X = data.drop('target', axis=1), Y = data['target'], Y.unique() gives array(['Normal_Weight', 'Overweight_Level_I', 'Overweig...

1 vote, 2 answers

kaADSS asked Jan 21, 2020

13,077 views

score() vs accuracy_score() in sklearn

Hi, I am still confused about when to use score() and when to use accuracy_score(), so I want to confirm my assumptions. Q1: score(), we use the split data to test the accuracy by...
