How to calculate Covariance Matrix and Principal Components for PCA?

Question

How to calculate Covariance Matrix and Principal Components for PCA?

asked Jun 26, 2019 in Machine Learning by AskDataScience (115k points)

The dataset with two features $(x,y)$ is shown as follows (note $y$ in this example is the second feature, not a target value):

x	y
2.5	2.4
0.5	0.7
2.2	2.9
1.9	2.2
3.1	3.0
2.3	2.7
2.0	1.6
1.0	1.1
1.5	1.6
1.1	0.9

a) Calculate the Covariance Matrix.
b) Calculate eigenvalues and eigenvectors
c) Calculate all the PCs
d) How much percent of the total variance in the dataset is explained by each PC?

1 Answer

answered Jun 26, 2019 by AskDataScience (115k points)
selected Jun 26, 2019 by AskDataScience

Best answer

Answers:

Click Here for Python program

$a)$ To calculate the Covariance Matrix you should take steps 1,2 and 3:

$\begin{bmatrix}
0.616556 & 0.615444 \\
0.615444 & 0.716556
\end{bmatrix}$

$b)$ To calculate eigenvectors and eigenvalues see step 4. If you do not know how to calculate eigenvalues and vectors watch this video.

$\lambda_{1}=1.284028, \mathbf{v}_{1}=\left(\begin{array}{c}{-0.67787} \\ {-0.73518}\end{array}\right)$

$\lambda_{2}=0.049083, \mathbf{v}_{2}=\left(\begin{array}{c}{-0.73518} \\ {0.67787}\end{array}\right)$

$c)$ To calculate the PCs, we should first create the Transfer Matrix. The transfer matrix should be created by putting the sorted eigenvectors ($[v_i v_{i+1} v_{i+2} ... v_{n} ]$), and sorting is descending based on absolute eigenvalues ($|\lambda_{i} |>|\lambda_{i+1}|> |\lambda_{i+2} | > ... >|\lambda_{n} | $).

Eigenvalues are $\lambda_1=1.284028$ and $\lambda_1=0.049083$. You need to sort them based on their absolute value (ignoring the sign). It is important to know the eigenvalues could be negative, but you should consider their absolute value when you are comparing them. In this example, the absolute values of eigenvalues are the same as their absolute values, and $|\lambda_1| > |\lambda_2| $. Therefore, the transform matrix should be as follows: $[v_1 v_2]$ where $v_1$ is the eigenvector for $\lambda_1$ and $v_2$ is the eigenvector for $\lambda_2$.

$\text{Transfer Matrix}=P=\left[\mathbf{v}_{1} \mathbf{v}_{2}\right]=\left[\begin{array}{cc}{v_{11}=-0.67787} & {v_{21}=-0.73518} \\ {v_{12}=-0.73518} & {v_{22}=0.67787}\end{array}\right]$

Next step is multiplying the scaled dataset to the Transfer Matrix to calculate PCs:

$\begin{aligned}\left(X^{\prime}\right)^{T} &=\left[x^{\prime} y^{\prime}\right]=[x y]\left[\begin{array}{c}{v_{11}=-0.67787} & {v_{21}=-0.73518} \\ {v_{12}=-0.73518} & {v_{22}=0.67787}\end{array}\right] \\ &=\left[v_{11} x+v_{12} y \quad v_{21} x+v_{22} y\right] \\ &=[-0.67787 x-0.73518 y \quad -0.73518 x+0.67787 y] \end{aligned}$

The result matrix is shown below:

$d)$

$\text{Explained variance of } PC_{1} = \frac{|\lambda_{2}|}{ \left(|\lambda_{1}|+|\lambda_{2}|\right)} = \frac{1.284028} {(1.284028+0.049083)} =96.32 \%$

$\text{Explained variance of } PC_{2} = \frac{|\lambda_{2}|}{ \left(|\lambda_{1}|+|\lambda_{2}|\right)} = \frac{0.049083} {(1.284028+0.049083)} =3.68 \%$

Source:
A more comprehensive solution is available here (or here).

Show 9 previous comments

commented Jul 1, 2019 by AskDataScience (115k points)

commented Aug 7, 2020 by EH (140 points)

commented Aug 10, 2020 by AskDataScience (115k points)

commented Aug 12, 2020 by EH (140 points)
edited Aug 12, 2020 by EH

How to calculate Covariance Matrix and Principal Components for PCA?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Related questions

Categories

x	y
2.5	2.4
0.5	0.7
2.2	2.9
1.9	2.2
3.1	3.0
2.3	2.7
2.0	1.6
1.0	1.1
1.5	1.6
1.1	0.9

x	y
2.5	2.4
0.5	0.7
2.2	2.9
1.9	2.2
3.1	3.0
2.3	2.7
2.0	1.6
1.0	1.1
1.5	1.6
1.1	0.9

x	y
2.5	2.4
0.5	0.7
2.2	2.9
1.9	2.2
3.1	3.0
2.3	2.7
2.0	1.6
1.0	1.1
1.5	1.6
1.1	0.9