## Problem formulation

Let the data $X$,

The result $C$ which we expect to have the smaller dimensions,

## Intuition

PCA is an unsupervised machine learning algorithm as it learns from the data itself, or more concretely, it's based on the variance of the data in each dimension in order to remove those which have the smaller one.

The objective now is to find the new basis in which the variances of the data in some dimensions are small and we can skip them. With no loss of generality, we assume this basis is orthonormal for the purpose of making the optimization easier.

Let the orthonormal basis $U \in \mathbb{R}^{n\times{n}}$, $\widetilde{X}$ is the data $X$ perform in $U$, $D\in \mathbb{R}^{n\times{k}}$ is the first $k$ eigenvector of $U$ which have highest eigenvalues, $EY$ is the remainder, we have:

## Optimization

**note**: in this part, I 'll skip the optimization step and only show the result.
Apparently, we need to solve this optimization problem, namely:

We can split it and deal with $C^{*}$ first:

and the result is (the fact that columns in $D$ are orthonormal make the result more compact):

then:

now let optimize $D$:

As a result, $D\in \mathbb{R}^{n\times{k}}$ has their columns as $k$ eigenvectors with the highest eigenvalues of the matrix $X^{T}X$.

## Step (code)

# my code on understanding pcaimport numpy as npimport numpy.linalg as linalgimport matplotlib.pyplot as pltm = 100 # num of samplen = 2 # num of dim (vector size)k = 1 # num of dim to take downx = np.random.randn(m, n)# compute the S matrixmean = np.mean(x, axis=0)x_ = x - mean # subtract meanS = np.matmul(x_.transpose(), x_) # the S matrix# compute eigen vector of Seig = linalg.eig(S)# the decode matrix DD = eig[1][:k] # get first k eigenvector (nx1) with highest eigenvalueD = D.reshape(-1, k) # change the shape to n x k# the encode EE = D.transpose()z = np.matmul(E, x.transpose()).transpose() # z (m x k), the result of pca# visualize# get two eigenbasis in new basisd1 = eig[1][0]d2 = eig[1][1]# check d1 and d2 are orthonormalprint(linalg.norm(d1)) # 1.0print(linalg.norm(d2)) # 1.0print(np.dot(d1, d2)) # 0.0# plot the xt = 4plt.scatter(x[:, 0], x[:, 1])plt.arrow(0, 0, t*d1[0], t*d1[1], fc='red', ec='red')plt.arrow(0, 0, t*d2[0], t*d2[1], fc='blue', ec='blue')plt.margins(x=0.5, y=0.5)plt.show()

*Fig. 1: Two eigenvectors, the red one has the higher eigenvalue, project the data to this vector to get the new data with the dimension k=1*

## Discussion

There are some questions that reader can working on to comprehend the algorithm:

Why the orthogonal basis is chosen. Why not another basis? Whether or not that generality is guaranteed and other bases can give better results.

Investigating the details of optimization procedure can help with solving the above question and gain more insights.

Hopefully that this tutorial can help.