asked in Machine Learning by (115k points)  

The dataset with two features $(x,y)$ is shown as follows (note $y$ in this example is the second feature, not a target value):

x y
2.5 2.4
0.5 0.7
2.2 2.9
1.9 2.2
3.1 3.0
2.3 2.7
2.0 1.6
1.0 1.1
1.5 1.6
1.1 0.9

a) Calculate the Covariance Matrix.
b) Calculate eigenvalues and eigenvectors
c) Calculate all the PCs
d) How much percent of the total variance in the dataset is explained by each PC?

  

1 Answer

0 votes
answered by (115k points)  
 
Best answer

Answers:


$a)$ To calculate the Covariance Matrix, first compute the mean of each feature ($\bar{x}=1.81$, $\bar{y}=1.91$), subtract the means from the data, and then compute the covariance of every pair of features:


$\begin{bmatrix}
0.616556 & 0.615444 \\
0.615444 & 0.716556
\end{bmatrix}$
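As a cross-check, the covariance matrix can be reproduced with NumPy (a minimal sketch, assuming the sample covariance with the usual $n-1$ denominator, which is `np.cov`'s default):

```python
import numpy as np

# The ten (x, y) samples from the question.
data = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])

# Center each feature, then compute the sample covariance matrix.
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)
print(cov)  # ≈ [[0.6166 0.6154], [0.6154 0.7166]]
```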

$b)$ To calculate the eigenvalues and eigenvectors, solve the characteristic equation $\det(C-\lambda I)=0$ for the eigenvalues $\lambda$, then solve $(C-\lambda I)\mathbf{v}=\mathbf{0}$ for each corresponding eigenvector $\mathbf{v}$:

\(\lambda_{1}=1.284028, \mathbf{v}_{1}=\left(\begin{array}{c}{-0.67787} \\ {-0.73518}\end{array}\right)\)

\(\lambda_{2}=0.049083, \mathbf{v}_{2}=\left(\begin{array}{c}{-0.73518} \\ {0.67787}\end{array}\right)\)
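These values can be verified with NumPy's symmetric eigensolver (a sketch; `eigh` returns eigenvalues in ascending order with unit-length eigenvectors, and the signs of the eigenvectors may differ from the solution, which is equally valid):

```python
import numpy as np

cov = np.array([[0.616556, 0.615444],
                [0.615444, 0.716556]])

# eigh is the appropriate solver for a symmetric matrix such as
# a covariance matrix; eigenvectors are returned as columns.
vals, vecs = np.linalg.eigh(cov)
print(vals)  # ≈ [0.049083, 1.284028]
print(vecs)  # columns are unit eigenvectors (sign may flip)
```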


$c)$ To calculate the PCs, we should first create the Transfer Matrix. The Transfer Matrix is formed by putting the eigenvectors side by side as columns ($[\mathbf{v}_1\ \mathbf{v}_2\ \dots\ \mathbf{v}_n]$), sorted in descending order of the absolute values of their eigenvalues ($|\lambda_{1}| > |\lambda_{2}| > \dots > |\lambda_{n}|$).

The eigenvalues are $\lambda_1=1.284028$ and $\lambda_2=0.049083$. You need to sort them by absolute value (ignoring the sign); eigenvalues can in general be negative, but you should compare their absolute values. In this example both eigenvalues are positive, so their absolute values equal their values, and $|\lambda_1| > |\lambda_2|$. Therefore, the Transfer Matrix is $[\mathbf{v}_1\ \mathbf{v}_2]$, where $\mathbf{v}_1$ is the eigenvector for $\lambda_1$ and $\mathbf{v}_2$ is the eigenvector for $\lambda_2$.

\(\text{Transfer Matrix}=P=\left[\mathbf{v}_{1} \mathbf{v}_{2}\right]=\left[\begin{array}{cc}{v_{11}=-0.67787} & {v_{21}=-0.73518} \\ {v_{12}=-0.73518} & {v_{22}=0.67787}\end{array}\right]\)

The next step is multiplying the mean-centered dataset by the Transfer Matrix to calculate the PCs:

\(\begin{aligned}\left(X^{\prime}\right)^{T} &=\left[x^{\prime} \; y^{\prime}\right]=\left[x \; y\right]\left[\begin{array}{cc}{v_{11}=-0.67787} & {v_{21}=-0.73518} \\ {v_{12}=-0.73518} & {v_{22}=0.67787}\end{array}\right] \\ &=\left[v_{11} x+v_{12} y \quad v_{21} x+v_{22} y\right] \\ &=\left[-0.67787 x-0.73518 y \quad {-0.73518}x+0.67787 y\right] \end{aligned}\)

Applying these equations to each of the ten mean-centered data points gives the matrix of projected values; for example, the first point $(2.5-1.81,\ 2.4-1.91)=(0.69,\ 0.49)$ maps to $(-0.828,\ -0.175)$.
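The full projection can be reproduced numerically (a sketch, using the feature means and the eigenvector signs from the solution above):

```python
import numpy as np

data = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])
centered = data - data.mean(axis=0)

# Transfer matrix P: eigenvectors as columns, sorted by |eigenvalue|.
P = np.array([[-0.67787, -0.73518],
              [-0.73518,  0.67787]])

# Each row of `pcs` holds (PC1, PC2) for the corresponding data point.
pcs = centered @ P
print(pcs.round(3))  # ten rows of (PC1, PC2)
```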


$d)$ 

$\text{Explained variance of } PC_{1} = \frac{|\lambda_{1}|}{ \left(|\lambda_{1}|+|\lambda_{2}|\right)} = \frac{1.284028} {(1.284028+0.049083)} =96.32 \%$

$\text{Explained variance of } PC_{2} = \frac{|\lambda_{2}|}{ \left(|\lambda_{1}|+|\lambda_{2}|\right)} = \frac{0.049083} {(1.284028+0.049083)} =3.68 \%$
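The same percentages fall out of a few lines of arithmetic (a sketch using the eigenvalues computed in part (b)):

```python
# Explained variance of each PC is its eigenvalue's share of the
# total (absolute) eigenvalue mass.
lam1, lam2 = 1.284028, 0.049083
total = abs(lam1) + abs(lam2)
print(f"PC1: {abs(lam1) / total:.2%}")  # 96.32%
print(f"PC2: {abs(lam2) / total:.2%}")  # 3.68%
```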


commented by (100 points)  
+1
I am getting the V1 positive. (0.67 0.73). Is that right?
commented by (115k points)  
Yes, because (0.67787, 0.73518) and (−0.67787, −0.73518) are parallel. But if you had found (0.67787, −0.73518), that would not be correct.
commented by (100 points)  
For part c, is it necessary to do those steps to find the PCs? Aren't the PCs just the eigenvectors? I was watching the videos from the notes, which had it like that. Also, what is case 2 for?
 
commented by (115k points)  
In the video you mentioned, he called them PC1 and PC2 because they are actually the directions of the PCA space onto which you should project your data points. By putting them in sorted order (based on the absolute value of the eigenvalues), you create the Transfer Matrix.

It means for EACH data point in the original space (after we removed the means), we can calculate an equivalent PC1 and PC2 in PCA space. When you multiply each data point by the Transfer Matrix, its equivalent PCs are calculated. Therefore, in this question, where we have 10 data points with 2 features, we get 10 points, each with a PC1 and a PC2 value. If we decide to keep just one of the PCs, the 10 data points with 2 features become 10 data points in PCA space with just one PC.

In the next video, he did part (c) similarly to our problem.
commented by (100 points)  
Ok, thank you. For the eigenvectors: for an eigenvalue of 1.283 I got x1 = 1 and x2 = 0.9235, and for an eigenvalue of 0.0491 I got x1 = -0.9235 and x2 = 1. Would this be correct too, since eigenvectors are just different points on the two perpendicular lines?
commented by (115k points)  
If you normalize to unit vector:
v1:
1/sqrt(1^2+0.9235^2) = 0.7346
0.9235/sqrt(1^2+0.9235^2) = 0.6784
v2:
-0.9235/sqrt(1^2+ (-0.9235)^2) = -0.6784
1/sqrt(1^2+ (-0.9235)^2)=  0.7346
They will be parallel to, and the same size as, the v1 and v2 calculated in the problem. If you use your own calculated v1 and v2 without normalizing them to unit vectors, your final results will just be scaled versions of what is shown in the solution, so generally speaking, yes, your answers will be correct. I think somewhere you divided the elements of v1 by -1 and removed the negative sign; that is why your v1, although parallel to the v1 in the solution, points in the opposite direction.
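The normalization described above can be checked in a couple of lines (a sketch using the commenter's vector for the eigenvalue 1.283):

```python
import numpy as np

v = np.array([1.0, 0.9235])      # eigenvector as found by the commenter
unit = v / np.linalg.norm(v)     # divide by the vector's magnitude
print(unit.round(4))             # ≈ [0.7346 0.6784]
```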
commented by (100 points)  
Yeah, when I tried it I got x1 = 0.735 and x2 = 0.678 for 0.9235, and x1 = 0.735 and x2 = -0.678 for 0.0491. Would this be wrong then?
commented by (115k points)  
Based on what I explained already, your answer is also correct.
commented by (110 points)  
Having trouble with part c. It seems we get the same values as the eigenvalues, but you're saying that's not always the case?
According to the posted example, we split up the values x' = -0.68x - 0.74y and y' = -0.74x + 0.68y.
So if my understanding is correct, we sub in the zero-mean x and y values to get x' and y'. Then, using all the numbers, are we just finding the variance? And do we always have to do a case 2 where y = 0?
commented by (115k points)  
The above discussion was about eigenvectors, not eigenvalues. The eigenvalues should always come out the same as what you see. The eigenvectors will probably be found parallel to the vectors shown in the example.

These two equations will calculate PCs (x',y') for zero centered values (x,y):
x' = -0.68x - 0.74y
y' = -0.74x + 0.68y

But if you just want to keep one PC, the equations will be reduced to the following:

x' = −0.68x−0.74y
Therefore, for each zero-centered data point (x,y) we have (x'). The original space has 2 dimensions (features), but PCA space has just 1 dimension.
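Keeping just one PC, as described above, can be sketched in NumPy (the eigenvector signs follow the solution; the example point is the first mean-centered sample):

```python
import numpy as np

v1 = np.array([-0.67787, -0.73518])  # eigenvector of the largest eigenvalue

# Projecting a zero-centered point (x, y) onto v1 gives its single
# PCA-space coordinate x'; the 2-D point becomes a 1-D value.
point = np.array([0.69, 0.49])       # first data point after centering
x_prime = point @ v1
print(round(x_prime, 3))             # ≈ -0.828
```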
commented by (140 points)  
+1
So the correctness of the eigenvectors is tested by the ratio between the x1 and x2 values for the same lambda, is that correct?
commented by (115k points)  
Yes. To get the same result, we should set additional rules, such as choosing the unit vector as the representative eigenvector, because there are an infinite number of eigenvectors for each lambda.
commented by (115k points)  
+2
A question was asked to clarify the following step and why we chose e1 = [2.2, 1].

The fact is, [2.2, 1] is just one of the eigenvectors. For each eigenvalue, we have an infinite number of eigenvectors. Based on those equations, you can see that for lambda = 2.36, vectors whose first elements are 2.2 times their second elements are all eigenvectors; for example, [2.2, 1], [1.1, 0.5], or [-4.4, -2]. Among all of them, we picked one representative, [2.2, 1], and to have a normalized answer, we divided it by the magnitude of the vector.
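The point that any scalar multiple of an eigenvector is still an eigenvector can be checked numerically (a sketch using this question's covariance matrix rather than the video's example, since only the former appears in this thread):

```python
import numpy as np

cov = np.array([[0.616556, 0.615444],
                [0.615444, 0.716556]])
lam1 = 1.284028
v1 = np.array([-0.67787, -0.73518])

# For any nonzero scale s, C(s*v) = s*(C v) = s*(lambda*v) = lambda*(s*v),
# so the residual |C v - lambda v| stays near zero for every multiple.
residuals = [np.abs(cov @ (s * v1) - lam1 * (s * v1)).max()
             for s in (1.0, -3.0, 0.5)]
print(residuals)  # all close to zero
```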
commented by (140 points)  
Thank you so much!
...