The detection of communities in social networks is a challenging task. A rigorous way to model communities considers maximal cliques, that is, maximal subgraphs in which each pair of nodes is connected by an edge. State-of-the-art strategies for finding maximal cliques in very large networks decompose the network in blocks and then perform a distributed computation. These approaches exhibit a trade-off between efficiency and completeness: decreasing the size of the blocks has been shown to improve efficiency but some cliques may remain undetected since high-degree nodes, also called hubs, may not fit with all their neighborhood into a small block. In this paper, we present a distributed approach that, by suitably handling hub nodes, is able to detect maximal cliques in large networks meeting both completeness and efficiency. The approach relies on a two-level decomposition process. The first level aims at recursively identifying and isolating tractable portions of the network. The second level further decomposes the tractable portions into small blocks. We demonstrate that this process is able to correctly detect all maximal cliques, provided that the sparsity of the network is bounded, as it is the case of real-world social networks. An extensive campaign of experiments confirms the effectiveness, efficiency, and scalability of our solution and shows that, if hub nodes were neglected, significant cliques would be undetected.
Conte, A., DE VIRGILIO, R., Maccioni, A., Patrignani, M., Torlone, R. (2016). Finding All Maximal Cliques in Very Large Social Networks. In Proceedings of the 19th International Conference on Extending DatabaseTechnology (EDBT 2016) (pp.173-184). OpenProceedings.org [10.5441/002/edbt.2016.18].
Finding All Maximal Cliques in Very Large Social Networks
DE VIRGILIO, ROBERTO;PATRIGNANI, Maurizio;TORLONE, Riccardo
2016-01-01
Abstract
The detection of communities in social networks is a challenging task. A rigorous way to model communities considers maximal cliques, that is, maximal subgraphs in which each pair of nodes is connected by an edge. State-of-the-art strategies for finding maximal cliques in very large networks decompose the network in blocks and then perform a distributed computation. These approaches exhibit a trade-off between efficiency and completeness: decreasing the size of the blocks has been shown to improve efficiency but some cliques may remain undetected since high-degree nodes, also called hubs, may not fit with all their neighborhood into a small block. In this paper, we present a distributed approach that, by suitably handling hub nodes, is able to detect maximal cliques in large networks meeting both completeness and efficiency. The approach relies on a two-level decomposition process. The first level aims at recursively identifying and isolating tractable portions of the network. The second level further decomposes the tractable portions into small blocks. We demonstrate that this process is able to correctly detect all maximal cliques, provided that the sparsity of the network is bounded, as it is the case of real-world social networks. An extensive campaign of experiments confirms the effectiveness, efficiency, and scalability of our solution and shows that, if hub nodes were neglected, significant cliques would be undetected.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.