20,439 views
2 2 votes

The dataset with two features $(x,y)$ is shown as follows (note $y$ in this example is the second feature, not a target value):

x y
2.5 2.4
0.5 0.7
2.2 2.9
1.9 2.2
3.1 3.0
2.3 2.7
2.0 1.6
1.0 1.1
1.5 1.6
1.1 0.9

a) Calculate the Covariance Matrix.
b) Calculate eigenvalues and eigenvectors
c) Calculate all the PCs
d) How much percent of the total variance in the dataset is explained by each PC?

50% Accept Rate Accepted 31 answers out of 62 questions

1 Answer

Best answer
0 0 votes

Answers:

Click Here for Python program

$a)$ To calculate the Covariance Matrix you should take steps 1,2 and 3:


$\begin{bmatrix}
0.616556 & 0.615444 \\
0.615444 & 0.716556
\end{bmatrix}$

$b)$ To calculate eigenvectors and eigenvalues see step 4. If you do not know how to calculate eigenvalues and vectors watch this video.

\(\lambda_{1}=1.284028, \mathbf{v}_{1}=\left(\begin{array}{c}{-0.67787} \\ {-0.73518}\end{array}\right)\)

\(\lambda_{2}=0.049083, \mathbf{v}_{2}=\left(\begin{array}{c}{-0.73518} \\ {0.67787}\end{array}\right)\)


$c)$ To calculate the PCs, we should first create the Transfer Matrix. The transfer matrix should be created by putting the sorted eigenvectors ($[v_i  v_{i+1} v_{i+2} ... v_{n} ]$), and sorting is descending based on absolute eigenvalues ($|\lambda_{i} |>|\lambda_{i+1}|> |\lambda_{i+2} | > ... >|\lambda_{n} | $). 

Eigenvalues are $\lambda_1=1.284028$ and $\lambda_1=0.049083$. You need to sort them based on their absolute value (ignoring the sign). It is important to know the eigenvalues could be negative, but you should consider their absolute value when you are comparing them. In this example, the absolute values of eigenvalues are the same as their absolute values, and $|\lambda_1| > |\lambda_2| $. Therefore, the transform matrix should be as follows: $[v_1 v_2]$ where $v_1$ is the eigenvector for $\lambda_1$ and $v_2$ is the eigenvector for $\lambda_2$. 

\(\text{Transfer Matrix}=P=\left[\mathbf{v}_{1} \mathbf{v}_{2}\right]=\left[\begin{array}{cc}{v_{11}=-0.67787} & {v_{21}=-0.73518} \\ {v_{12}=-0.73518} & {v_{22}=0.67787}\end{array}\right]\)

Next step is multiplying the scaled dataset to the Transfer Matrix to calculate PCs:

\(\begin{aligned}\left(X^{\prime}\right)^{T} &=\left[x^{\prime} y^{\prime}\right]=[x y]\left[\begin{array}{c}{v_{11}=-0.67787} & {v_{21}=-0.73518} \\ {v_{12}=-0.73518} & {v_{22}=0.67787}\end{array}\right] \\ &=\left[v_{11} x+v_{12} y \quad v_{21} x+v_{22} y\right] \\ &=[-0.67787 x-0.73518  y \quad -0.73518 x+0.67787 y] \end{aligned}\)

The result matrix is shown below:


$d)$ 

$\text{Explained variance of } PC_{1} = \frac{|\lambda_{2}|}{ \left(|\lambda_{1}|+|\lambda_{2}|\right)} = \frac{1.284028} {(1.284028+0.049083)} =96.32 \%$

$\text{Explained variance of } PC_{2} = \frac{|\lambda_{2}|}{ \left(|\lambda_{1}|+|\lambda_{2}|\right)} = \frac{0.049083} {(1.284028+0.049083)} =3.68 \%$

Source:
A more comprehensive solution is available here (or here).

selected by

Related questions

2 2 votes
1 answers 1 answer
4.9k
4.9k views
tofighi asked Jun 26, 2019
4,859 views
We want to use Naive Bayes for tagging documents. It is a classification task that we want to assign a class (tag) to each string. We currently have two tags: Sport and N...
2 2 votes
1 answers 1 answer
6.6k
6.6k views
tofighi asked Aug 10, 2020
6,577 views
We have data on 1000 pieces of fruit. The fruit being a Banana, Orange or some Other fruit and imagine we know 3 features of each fruit, whether it’s long or not, sweet o...
5 5 votes
1 answers 1 answer
9.1k
9.1k views
tofighi asked Jun 26, 2019
9,104 views
Assume we have a $5\times5$ px RGB image with 3 channels respectively for R, G, and B. IfR2000012001201021210101020G0212211100002202002002111B0100111201102021011012112 We...
2 2 votes
1 answers 1 answer
14.3k
14.3k views
tofighi asked Jan 27, 2020
14,279 views
In the following example, calculate Accuracy, Precision, Recall or F1?https://i.imgur.com/OezFpqC.png
3 3 votes
2 answers 2 answers
8.4k
8.4k views
tofighi asked Apr 4, 2019
8,402 views
The scatter plot of Iris Dataset is shown in the figure below. Assume Softmax Regression is used to classify Iris to Setosa, Versicolor, or Viriginica using just petal le...