asked in Machine Learning by (170 points)  
Hi,

After learning about feature scaling, I have some questions regarding normalization.

Standardization: rescales the data to have a mean of 0 and a standard deviation of 1, based on the distribution of the data. (Please correct me if I understood it wrong.)

Normalization: rescales the data into a range of 0 to 1. (Please correct me if I understood it wrong.)
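
In formula form, this is what I mean (assuming min-max scaling for the normalization case):

Standardization: z = (x - mean) / std
Normalization (min-max): x' = (x - min) / (max - min)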

Questions:

1. In what cases do we use standardization? When do we use normalization? When do we use a combined method?

2. In class, the features were standardized first using N(0, 1), and then the Z-score was rescaled to a number between 0 and 1. How is the Z-score rescaled? (What Z-score becomes 0, and what Z-score becomes 1?) If we use a different data set, will a Z-score with the same value always map to the same rescaled result? Can this standardization formula be stored so that we can reproduce the same standardization for prediction purposes?

Thanks a lot for your help.
  

1 Answer

answered by (1.4k points)

Best answer

This depends on the data set. Whether the data is in a pandas DataFrame, a 1-D NumPy array, or a multi-dimensional NumPy array makes some difference in which technique(s) to use, as does whether or not you are doing it in code.

What we did in class was meant to show the math behind the coding. The Z-score was taught with the variables (mean and standard deviation) set/known, and it was used to show one of the techniques. There are also Manhattan, Euclidean, and max-min scaling, which can be used when the standard deviation and mean are unknown.
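
As a rough sketch (not the exact class example), this is what a few of those rescalings look like on a small 1-D NumPy array:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Z-score standardization: needs the mean and standard deviation
z = (x - x.mean()) / x.std()

# Euclidean (L2) normalization: divide by the vector's length
l2 = x / np.linalg.norm(x)

# Max-min (min-max) scaling: maps the values into the range [0, 1]
mm = (x - x.min()) / (x.max() - x.min())
```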

Remember the purpose of standardization and normalization: they are there to scale the data for visualization (graphing). These new points/coordinates will be different from the original points in the data set when graphed.

 

If you are coding, then depending on your library preference, normalization and standardization most likely come down to two lines: the norm (fitting) function usually comes first, and the transformation is done second.
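
A minimal sketch of that two-step pattern, assuming scikit-learn as the library (the answer does not name a specific one):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[2.0], [4.0], [6.0], [8.0]])  # toy training data, one feature
X_new = np.array([[5.0], [9.0]])                  # data arriving at prediction time

scaler = StandardScaler().fit(X_train)        # step 1: learn the mean and std
X_train_scaled = scaler.transform(X_train)    # step 2: apply the transformation
X_new_scaled = scaler.transform(X_new)        # the stored mean/std can be reused later
```

Because the fitted scaler keeps the mean and standard deviation, the same transformation can be reapplied to new data, which relates to the reproducibility question above.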

This is code I wrote; it uses just NumPy to perform Manhattan (L1) normalization.
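
In outline, Manhattan (L1) normalization amounts to dividing each value by the sum of the absolute values; a minimal sketch of that idea (the helper name manhattan_normalize is just for illustration, and details may differ from the original snippet):

```python
import numpy as np

def manhattan_normalize(x):
    """Scale a 1-D array so its absolute values sum to 1 (Manhattan / L1 norm)."""
    x = np.asarray(x, dtype=float)
    return x / np.abs(x).sum()

print(manhattan_normalize([2, 4, 6, 8]))   # -> [0.1 0.2 0.3 0.4]
```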
