asked in Machine Learning by (170 points)  
Hi,

After learning about feature scaling, I have some questions regarding normalization.

Standardization: rescales the data to have a mean of 0 and a standard deviation of 1, based on the distribution of the data. (Please correct me if I understood it wrong.)

Normalization: rescales the data into a range of 0 to 1. (Please correct me if I understood it wrong.)
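
In formula form, this is what I mean (assuming min-max scaling for the normalization case):

Standardization: z = (x - mean) / std
Normalization (min-max): x' = (x - min) / (max - min)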

Questions:

1. In what cases do we use standardization? When do we use normalization? When do we use a combined method?

2. In class, the features were standardized first using N(0, 1), and then the Z-score was rescaled to a number between 0 and 1. How is the Z-score rescaled? (What Z-score becomes 0, and what Z-score becomes 1?) If we use a different data set, will a Z-score with the same value always map to the same rescaled result? Can this standardization formula be stored so that we can reproduce the same standardization for prediction purposes?

Thanks a lot for your help.
  

1 Answer

answered by (1.4k points)

Best answer

This depends on the data set. Whether the data is in a pandas DataFrame, a 1-D NumPy array, or a multi-dimensional NumPy array makes some difference in which technique(s) to use, as does whether or not you are doing it in code.

What we did in class was meant to show the math behind the coding. The Z-score was taught with the variables (mean and standard deviation) set/known, and it was used to show one of the techniques. There are also Manhattan, Euclidean, and max-min scaling, which can be used when the standard deviation and mean are unknown.
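
As a rough sketch (not the exact class example), this is what a few of those rescalings look like on a small 1-D NumPy array:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Z-score standardization: needs the mean and standard deviation
z = (x - x.mean()) / x.std()

# Euclidean (L2) normalization: divide by the vector's length
l2 = x / np.linalg.norm(x)

# Max-min (min-max) scaling: maps the values into the range [0, 1]
mm = (x - x.min()) / (x.max() - x.min())
```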

Remember the purpose of standardization and normalization: they are there to scale the data for visualization (graphing). These new points/coordinates will be different from the original points in the data set when graphed.

 

If you are coding, then depending on your library preference, normalization and standardization most likely come down to two lines: the norm (fitting) function usually comes first, and the transformation is done second.
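
A minimal sketch of that two-step pattern, assuming scikit-learn as the library (the answer does not name a specific one):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[2.0], [4.0], [6.0], [8.0]])  # toy training data, one feature
X_new = np.array([[5.0], [9.0]])                  # data arriving at prediction time

scaler = StandardScaler().fit(X_train)        # step 1: learn the mean and std
X_train_scaled = scaler.transform(X_train)    # step 2: apply the transformation
X_new_scaled = scaler.transform(X_new)        # the stored mean/std can be reused later
```

Because the fitted scaler keeps the mean and standard deviation, the same transformation can be reapplied to new data, which relates to the reproducibility question above.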

This is code I wrote; it uses just NumPy to perform Manhattan (L1) normalization.
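
In outline, Manhattan (L1) normalization amounts to dividing each value by the sum of the absolute values; a minimal sketch of that idea (the helper name manhattan_normalize is just for illustration, and details may differ from the original snippet):

```python
import numpy as np

def manhattan_normalize(x):
    """Scale a 1-D array so its absolute values sum to 1 (Manhattan / L1 norm)."""
    x = np.asarray(x, dtype=float)
    return x / np.abs(x).sum()

print(manhattan_normalize([2, 4, 6, 8]))   # -> [0.1 0.2 0.3 0.4]
```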
