Principal component analysis, or PCA for short, is a method mainly used to reduce the dimensionality of a data set containing many interrelated variables. It is important to retain as much information as possible about the variation in the original data while reducing the dimension. This is achieved by computing the principal components (PCs) of the data set, which by design concentrate most of the variation in the first few PCs when they are ordered by descending variance.
The method to find PCs uses the following operations:
Consider a data set 'A' with the marks of five students in three subjects.
Now the mean score in each subject is given as:
Similarly, a 3x3 covariance matrix is computed for the data set 'A' given above:
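The mean and covariance steps can be sketched in NumPy as follows. The marks in `A` are hypothetical values chosen only for illustration, and dividing by n - 1 (the sample covariance) is an assumption, since the text does not state which normalisation is used.

```python
import numpy as np

# Hypothetical marks (illustrative only): 5 students (rows) x 3 subjects (columns).
A = np.array([
    [90.0, 60.0, 90.0],
    [90.0, 90.0, 30.0],
    [60.0, 60.0, 60.0],
    [60.0, 60.0, 90.0],
    [30.0, 30.0, 30.0],
])

# Mean score in each subject: one value per column.
mean_scores = A.mean(axis=0)

# Subtract the subject means so every column is centred on zero.
centered = A - mean_scores

# 3x3 covariance matrix of the three subjects.
# Assumption: sample covariance, i.e. division by n - 1.
cov = centered.T @ centered / (A.shape[0] - 1)
```

The same matrix can be obtained with `np.cov(A, rowvar=False)`, which treats each column as one variable.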
Next, find the eigenvalues of the above covariance matrix; they come out to be:
where W' is the transpose of the NxM matrix W, 'N' is the dimension of the reduced PCA graph, and 'M' is the dimension of the initial data set.
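One possible way to obtain the eigenvalues and build W is sketched below. It again uses hypothetical marks rather than the example's own data, and it assumes that the rows of W are the eigenvectors belonging to the N largest eigenvalues.

```python
import numpy as np

# Hypothetical marks (illustrative only): 5 students x 3 subjects.
A = np.array([
    [90.0, 60.0, 90.0],
    [90.0, 90.0, 30.0],
    [60.0, 60.0, 60.0],
    [60.0, 60.0, 90.0],
    [30.0, 30.0, 30.0],
])

# 3x3 sample covariance matrix (columns are the variables).
cov = np.cov(A, rowvar=False)

# eigh is intended for symmetric matrices; it returns eigenvalues in
# ascending order and the matching eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Reorder so the component with the largest variance comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Keep the first N eigenvectors as the rows of W (N x M, here 2 x 3).
N = 2
W = eigenvectors[:, :N].T
```

The sign of each eigenvector is arbitrary, so a different library or platform may return rows of W with flipped signs; the subspace they span is the same.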
Following from the above example, taking the first two eigenvalues and their corresponding eigenvectors gives a 2x3 matrix. Thus the PCA graph in this case will be two-dimensional while still retaining most of the information about the variation in the data points, whereas the initial data set is three-dimensional.
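The reduction from three dimensions to two can be illustrated with the sketch below. The marks are hypothetical, and the projection is assumed to be the mean-centred data multiplied by W'; the fraction of variance retained is computed from the eigenvalues.

```python
import numpy as np

# Hypothetical marks (illustrative only): 5 students x 3 subjects.
A = np.array([
    [90.0, 60.0, 90.0],
    [90.0, 90.0, 30.0],
    [60.0, 60.0, 60.0],
    [60.0, 60.0, 90.0],
    [30.0, 30.0, 30.0],
])

centered = A - A.mean(axis=0)
cov = np.cov(A, rowvar=False)

# Eigen-decomposition, sorted by descending eigenvalue.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# W holds the two leading eigenvectors as rows (2 x 3).
W = eigenvectors[:, :2].T

# Assumed projection: each centred 3-D point is mapped to 2-D via W'.
projected = centered @ W.T  # shape (5, 2)

# Share of the total variance kept by the two retained components.
retained = eigenvalues[:2].sum() / eigenvalues.sum()
print(f"Variance retained by 2 PCs: {retained:.1%}")
```

Each row of `projected` is one student expressed in the two retained principal components, and the sample variance of each column equals the corresponding eigenvalue.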
For an intuitive understanding, see the second reference.