Thursday, May 26, 2011

Label propagation: should one use GraphLab or Hadoop?

I got a question today from Mike Khristo, who is building a collaborative filtering engine, about implementing label propagation in GraphLab. Label propagation is a rather simple algorithm for ranking items; see a nice tutorial I got from Mike: http://skillsmatter.com/podcast/nosql/algorithms-hadoop (around minute 13 of the video).

The algorithm is very simple to implement in GraphLab. One of our GraphLab users, Akshay Bhat, a Cornell graduate student, donated his implementation. He tested the algorithm using 40M Twitter users.
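To give a feel for how small the algorithm is, here is a minimal sketch in plain Python. It is not Akshay's GraphLab code and does not use the GraphLab API. It assumes the community detection variant (the one from the paper linked in the comment below), where every node starts with its own label and repeatedly adopts the label most common among its neighbours. The function name, the dict-based graph format, and the max_sweeps and seed parameters are my own choices for illustration.

```python
import random
from collections import Counter


def label_propagation(adjacency, max_sweeps=100, seed=0):
    """Asynchronous label propagation for community detection (sketch).

    adjacency: dict mapping each node to a list of its neighbours.
    The graph is assumed undirected, so every neighbour is also a key.
    Returns a dict mapping node -> community label.
    """
    rng = random.Random(seed)
    # Every node starts in its own community.
    labels = {node: node for node in adjacency}
    nodes = list(adjacency)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)  # visit nodes in a fresh random order each sweep
        changed = False
        for node in nodes:
            counts = Counter(labels[n] for n in adjacency[node])
            if not counts:
                continue  # isolated node keeps its own label
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[node] in candidates:
                continue  # already holds one of the most frequent labels
            labels[node] = rng.choice(candidates)  # break ties at random
            changed = True
        if not changed:
            break  # every node agrees with a majority of its neighbours
    return labels
```

For example, on a graph made of two disconnected triangles, label_propagation({0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}) ends with one shared label per triangle. A single-machine loop like this will not scale to the Twitter graph, of course; that is where GraphLab comes in.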

Hadoop implementation of the label propagation algorithm is described on:
GraphLab performance is described on:

Specifically, we got the following feedback:
"The Graphlab implementation is significantly faster than the Hadoop implementation, and requires much less resources. It is extremely efficient for networks with millions of nodes and billions of edges, as shown by its performance on the Twitter Social network from June 2009. I haven't yet come across any other implementation or a paper describing an implementation of community detection algorithm which can scale for such large networks."
    - Akshay Bhat



If anyone else is using label propagation with GraphLab I would love to know!


1 comment:

  1. The Label Propagation algorithm used for Classification and Ranking is different from the Label Propagation algorithm used for Community Detection.

    What I have implemented/tested is the second one, described here: http://pre.aps.org/abstract/PRE/v76/i3/e036106. The two algorithms differ in their origins and uses.
