In the field of social community detection, it is commonly accepted to utilize graphs with reference community structure for accuracy evaluation. The resulting accuracy value is obtained by directly comparing the ground-truth set of communities with the one produced by the algorithm. Therefore, a generic tool capable of generating random social graphs with realistic community structure and diverse properties is required. As soon as populations of modern social networks reach billion users in size, the tool must be scalable enough to produce synthetic networks of similar scale.

The method for generating large random social graphs with realistic community structure is introduced in the paper. The resulting graphs have several of recently discovered properties of social community structure which run counter to conventional wisdom: dense community overlaps, superlinear growth of number of edges inside a community with its size, and power law distribution of user-community memberships. Further, the method is by-design distributable and showed near-linear scalability in Amazon EC2 cloud using Apache Spark implementation.

Speaker: Kyrylo Chykhradze

presentation (pdf)