Saturday, April 27, 2013

Incremental SVD

Here is an email I got from Prof. Magnasco from Rockefeller University, NY:

Hi Danny, I've seen some of your posts regarding lanczos and thought I'd ask you.

I have a problem where I need to compute the first few hundred eigenvectors/values in the PCA of a large, dense dataset, about ten thousand times five million. (It's an avi of a fluorescence microscopy experiment). For the larger datasets I might not even be able to hold the entire set in memory. The normal approach would be to compute the 10000^2 A*A' matrix and then diagonalize it, the problem being that this matrix entails 12 million dot products of 5M element vectors. So I was hoping to find a method that iteratively computes the eigenvectors without such explicit evaluations. Is Lanczos numerically stable enough for such sizes? 

Thanks, 

Marcelo

__________________________________________________________

Marcelo Magnasco     Box 212, 1230 York Avenue, NY NY10065
Professor and Head,      +1 212 3278542 f +1 212 3277422
Mathematical Physics Lab  http://sur.rockefeller.edu/Plone
The Rockefeller University        magnasco@rockefeller.edu




As a great coincidence, I just heard about the same problem solution from Byron Boots, a postdoc in UW:

The incremental SVD papers that I am thinking of have been written by Matthew Brand. He has several papers on this topic, but the one that I have been using the most is this one: http://www.merl.com/papers/docs/TR2006-059.pdf

Unfortunately I am not aware of software package which implements the Brand method. You will probably have to implement it yourself.

Additional resource I found is another paper by Brand:

Brand, M.E., Incremental Singular Value Decomposition of Uncertain Data with Missing ValuesEuropean Conference on Computer Vision (ECCV), Vol 2350, pps 707-720, May 2002 (Lecture Notes in Computer Science)

Which has a better explanation of the setup.

3 comments:

  1. I don't know him in person - do you know him?
    :-)

    ReplyDelete
  2. There is work by Gu and Eisenstat on incremental SVD.

    Now regarding his problem:

    1) He is really interested in computing the SVD of A, so no need to form A'*A (which is actually not the suggested way to compute the SVD of A). One can use dense SVD methods, and compute all singular values. I would suggest trying the package Elemental.

    2) Since he wants only a few hundred singular values, iterative methods might work better. I would suggest taking a look at ARPACK (it computes eigenvalues, but you can compute the eigenvalus of [0 A'; A 0]).

    3) If low accuracy is enough, one can try the randomized PCA methods. Just google those.

    ReplyDelete