Small trick to compute PCA for large training datasets

In the last post, we talked about choosing a method for dimensionality reduction and how PCA is sometimes overused because of its popularity and its ready-made implementations in several libraries. Having said that, we are still going to talk about PCA, because we use it a lot! 🙂 If you do not know how PCA works, a very good introduction is the one by Lindsay Smith, check it out. PCA is simply defined by the eigenvectors of the data covariance matrix: when you want to reduce dimensionality, only the eigenvectors associated with the N largest eigenvalues are kept, forming a basis for the reduction. If this sounds weird, don't worry, the important part here is the covariance matrix, and it's the main cause of headaches for large datasets.
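To make the textbook procedure concrete before we get to the trick, here is a minimal sketch in C++ with OpenCV (a toy example of ours, with random data and illustrative sizes, not code from any particular tutorial): compute the covariance, eigen-decompose it, and project onto the N leading eigenvectors.

#include <opencv2/core.hpp>

int main() {
    // Toy data: P = 100 samples of dimensionality M = 5, one sample per row,
    // reduced to the N = 2 leading principal components.
    const int P = 100, M = 5, N = 2;
    cv::Mat data(P, M, CV_64F);
    cv::randn(data, 0.0, 1.0);

    // Covariance matrix and mean of the samples (rows are samples).
    cv::Mat cov, mean;
    cv::calcCovarMatrix(data, cov, mean,
                        cv::COVAR_NORMAL | cv::COVAR_ROWS | cv::COVAR_SCALE);

    // Eigen-decomposition: cv::eigen returns the eigenvalues in descending
    // order and the eigenvectors as the rows of evecs.
    cv::Mat evals, evecs;
    cv::eigen(cov, evals, evecs);

    // Keep the N leading eigenvectors and project the centered data onto them.
    cv::Mat centered = data - cv::repeat(mean, P, 1);
    cv::Mat reduced = centered * evecs.rowRange(0, N).t();  // P x N
    return 0;
}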

Imagine that we have data points with a very large dimensionality, such as 50k, for instance (this is very common in Computer Vision!). The covariance matrix of any number of such points is going to be a 50k × 50k matrix, which is huge. To get an idea of how huge: at the standard 8 bytes per float, that is 50,000 × 50,000 × 8 bytes ≈ 18.6 GiB of memory, just to store it! This problem is well known, and one nice (and quite old) trick to compute PCA is described in the seminal Eigenfaces paper [1]. So, assuming that we arrange our P points of dimensionality M as the columns of an M × P matrix A (with the mean subtracted), the algebraic way of computing the covariance matrix is:

C = \frac{1}{P} A A^{\top} \qquad (M \times M)

If M = 50k as in our example, this matrix will be huge, as we said. But if we are only interested in extracting the eigenvalues/eigenvectors, we can obtain them from the much smaller matrix:

L = \frac{1}{P} A^{\top} A \qquad (P \times P)
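
A one-line computation shows why this works: if u is an eigenvector of the small matrix with eigenvalue λ, then left-multiplying by A turns it into an eigenvector of the full covariance with the same eigenvalue:

\frac{1}{P} A^{\top} A \, u = \lambda u \;\Longrightarrow\; \frac{1}{P} A A^{\top} (A u) = \lambda \, (A u)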

So, after extracting the eigenvectors u of the small matrix, we recover the original ones by multiplying with A, v = Au, excluding the eigenvalues (and associated vectors) which are equal to zero. This is such a nice trick because usually P << M, i.e. the number of examples in our training dataset is much smaller than the number of dimensions. Both MATLAB and OpenCV have this small tweak for PCA reduction implemented:

coeff = pca(X, 'Economy', true); % MATLAB: economy-size decomposition
cv::calcCovarMatrix(..., cv::COVAR_SCRAMBLED); // OpenCV: computes the small A^T A matrix
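
To see the whole pipeline in one place, here is a minimal sketch of the trick itself in C++ with OpenCV (again our own toy example, with random data standing in for a real training set and illustrative sizes):

#include <opencv2/core.hpp>

int main() {
    // M >> P: high-dimensional samples, and few of them (one sample per column).
    const int M = 50000, P = 100;
    cv::Mat data(M, P, CV_64F);
    cv::randn(data, 0.0, 1.0);

    // Mean-center each dimension across the P samples.
    cv::Mat mean;
    cv::reduce(data, mean, 1, cv::REDUCE_AVG);
    cv::Mat A = data - cv::repeat(mean, 1, P);

    // The small P x P surrogate instead of the 50k x 50k covariance.
    cv::Mat L = A.t() * A / static_cast<double>(P);

    // Eigen-decompose the small matrix: eigenvalues come back in
    // descending order, eigenvectors as rows.
    cv::Mat evals, U;
    cv::eigen(L, evals, U);

    // Map each eigenvector u back to the original space, v = A u, skipping
    // the (near-)zero eigenvalues; re-normalize since A u is not unit length.
    cv::Mat V(M, P, CV_64F);
    for (int i = 0; i < P; ++i) {
        if (evals.at<double>(i) < 1e-8 * evals.at<double>(0)) break;
        cv::Mat v = A * U.row(i).t();
        cv::normalize(v, v);
        v.copyTo(V.col(i));
    }
    // The filled columns of V are the principal components of the full covariance.
    return 0;
}

Note that at most P − 1 eigenvalues are nonzero after mean-centering, which is exactly why the small matrix loses nothing that the huge one had.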

There are still cases where P is also very large; the main approach then is to compute an approximation of the PCA. We'll be talking about it in the next post.

References:

[1] Turk, Matthew, and Alex P. Pentland. "Face recognition using eigenfaces." Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '91), IEEE, 1991.
