EAS

Principal component analysis

Introduction

Principal component analysis (PCA) is a method mainly used to reduce the dimensionality of a data set containing many interrelated variables. It is important to retain as much information as possible about the variation in the original data while reducing the dimension. This is achieved by computing the principal components (PCs) of the data set, which by design concentrate most of the variation in the first few PCs when ordered by descending variance.

Method Through Example

The method to find PCs uses the following operations:

Mean of a data set

Variance of a data set

Covariance of a data set

Covariance matrix

Eigenvalues of a matrix

Eigenvectors corresponding to the eigenvalues
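The first few operations in this list can be sketched with NumPy. This is a minimal illustration under stated assumptions: the array `marks` is hypothetical (it is not the data set 'A' used below), and the sample convention (dividing by n - 1) is assumed for both variance and covariance.

```python
import numpy as np

# Hypothetical marks of five students (rows) in three subjects (columns).
# These numbers are illustrative only, not the data set 'A' from the text.
marks = np.array([
    [85.0, 70.0, 60.0],
    [90.0, 65.0, 75.0],
    [60.0, 80.0, 55.0],
    [75.0, 60.0, 80.0],
    [70.0, 75.0, 65.0],
])

# Mean of each subject (column-wise).
mean = marks.mean(axis=0)

# Sample variance of each subject; ddof=1 divides by n - 1 (assumed convention).
variance = marks.var(axis=0, ddof=1)

# Covariance matrix; rowvar=False treats each column as one variable.
cov = np.cov(marks, rowvar=False)

print("mean:", mean)
print("variance:", variance)
print("covariance matrix:\n", cov)
```

The diagonal of `cov` matches `variance`, which is why the covariance matrix alone carries both kinds of information.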

Consider a data set 'A' with the marks of five students in three subjects.

Now the mean score in each subject is given as:

A general 2x2 covariance matrix is shown below:
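Writing the two variables as x and y (names chosen here for illustration), the matrix takes the following form. The n - 1 normalisation in the covariance is an assumption; some presentations divide by n instead.

```latex
\[
C =
\begin{pmatrix}
\operatorname{Var}(x) & \operatorname{Cov}(x, y) \\
\operatorname{Cov}(y, x) & \operatorname{Var}(y)
\end{pmatrix},
\qquad
\operatorname{Cov}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
\]
```

Since Cov(x, y) = Cov(y, x), the matrix is symmetric, and Var(x) = Cov(x, x).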

Similarly, a 3x3 covariance matrix is created for the data set 'A' given above:

The diagonal entries (shown in blue) are the variances and the off-diagonal entries are the covariances.

The eigenvalues of the above covariance matrix come out to be:

and the corresponding eigenvectors are:
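The eigenvalue step can be sketched in NumPy as follows. The matrix `cov` here is a hypothetical symmetric 3x3 matrix chosen for illustration, not the covariance matrix of 'A', and the use of `np.linalg.eigh` is one reasonable choice rather than the method used in the original example.

```python
import numpy as np

# Hypothetical symmetric 3x3 covariance matrix, for illustration only.
cov = np.array([
    [4.0, 2.0, 0.6],
    [2.0, 3.0, 0.4],
    [0.6, 0.4, 1.0],
])

# eigh is intended for symmetric matrices; it returns the eigenvalues in
# ascending order and the matching eigenvectors as the columns of the result.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Check the defining relation C v = lambda v for every pair.
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    assert np.allclose(cov @ v, eigenvalues[i] * v)

print("eigenvalues:", eigenvalues)
print("eigenvectors (columns):\n", eigenvectors)
```

Note that the eigenvectors are returned as columns, so `eigenvectors[:, i]` pairs with `eigenvalues[i]`.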

The eigenvalues are sorted in descending order and the first 'N' corresponding eigenvectors are chosen to form the PCA graph, whose dimension is less than or equal to that of the initial data set. Here 'N' is the dimension of the PCA graph, and its elements are obtained through the formula:

where W' is the transpose of the NxM matrix W, where 'N' is the dimension of the reduced PCA graph and 'M' is the dimension of the initial set.

Following from the above example, taking the first 2 eigenvalues and their corresponding eigenvectors gives a 2x3 matrix.

Thus the PCA graph in this case will be two-dimensional while still retaining most of the information about the variation in the data points, whereas the initial data set is three-dimensional.
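The reduction from three dimensions to two can be sketched end to end in NumPy. This is an illustration under assumptions: the array `data` is hypothetical, the data is mean-centred before projecting (a common convention that the text does not state explicitly), and the names `W`, `projected` and `retained` are chosen here for readability.

```python
import numpy as np

# Hypothetical data: five samples (rows), three variables (columns).
data = np.array([
    [85.0, 70.0, 60.0],
    [90.0, 65.0, 75.0],
    [60.0, 80.0, 55.0],
    [75.0, 60.0, 80.0],
    [70.0, 75.0, 65.0],
])

# Centre each column on its mean (assumed preprocessing step).
centered = data - data.mean(axis=0)

# 3x3 covariance matrix and its eigen decomposition.
cov = np.cov(centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort eigenvalues, and their eigenvectors, in descending order.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Keep the first N = 2 eigenvectors as the rows of a 2x3 matrix.
n_components = 2
W = eigenvectors[:, :n_components].T

# Project every centred sample into the two-dimensional space.
projected = centered @ W.T

# Fraction of the total variance kept by the first two components.
retained = eigenvalues[:n_components].sum() / eigenvalues.sum()

print("projected shape:", projected.shape)
print("fraction of variance retained:", retained)
```

The variance of each projected column equals the corresponding eigenvalue, so `retained` measures how much of the original variation survives the reduction.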

For an intuitive understanding, see the second reference.

References

The Mathematics Behind Principal Component Analysis by Akash Dubey (source of the example)
StatQuest with Josh Starmer (for visualization)
CS460, lecture by Dr. Subhankar Mishra, NISER