Wednesday, June 27, 2012

Triangle Counting in GraphLab

My collaborator Yucheng Low have implemented a cool and simple triangle counting algorithm. The algorithm is based on the following PhD thesis: T. Schank. Algorithmic Aspects of Triangle-Based Network Analysis. Phd in computer science, University Karlsruhe, 2007.

 Triangle counting algorithm counts for each graph node, in how many graph triangles this node participates in. In my mind, triangle counting relates to statistical mechanics and is completely useless. However, as we will show below, there are actual some interesting intuitions when you apply it to twitter social network.

 I got the below text from Yucheng which simply summarizes the algorithm:  
 *
 * The procedure is quite straightforward:
 *   - each vertex maintains a list of all of its neighbors in a hash table.
 *   - For each edge (u,v) in the graph, count the number of intersections
 *     of the neighbor set on u and the neighbor set on v.
 *   - We store the size of the intersection on the edge.
 * 
 * This will count every triangle exactly 3 times. Summing across all the
 * edges and dividing by 3 gives the desired result.
 *
 * The preprocessing stage take O(|E|) time, and it has been shown that this
 * algorithm takes $O(|E|^(3/2))$ time.
 *
 * If we only require total counts, we can introduce a optimization that is
 * similar to the "forward" algorithm
 * described in thesis above. Instead of maintaining a complete list of all
 * neighbors, each vertex only maintains a list of all neighbors with
 * ID greater than itself. This implicitly generates a topological sort
 * of the graph.
 *
 * Then you can see that each triangle
  
     A----->C
     |     ^
     |   /
     v /
     B
 * Must be counted only once. (Only when processing edge AB, can one
 * observe that A and B have intersecting out-neighbor sets).

And here are some results mined from the Twitter graph and made visual by Joey Gonzalez.
The graph contains a twitter capture of 41M users from 2009 of the type user->follower.
Degree - the number of people the user is following.
Followers - the number of users following this user.
Cycle triangles - number of triangles of the form A->B->C->A for this user.
Through triangles - number of triangles of the type A->B, B->C, A<=>C for user B.
Triangles - number of triangles going out this user (A->B, A->C for user A).
In triangles - number of triangles entering this user (B->A, C->A, B<=>C for user A).




No comments:

Post a Comment