Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Clustering coefficient
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
{{Short description|Measure of how connected and clustered a node is in its graph}} In [[graph theory]], a '''clustering coefficient''' is a measure of the degree to which nodes in a graph tend to cluster together. Evidence suggests that in most real-world networks, and in particular [[social network]]s, nodes tend to create tightly knit groups characterised by a relatively high density of ties; this likelihood tends to be greater than the average probability of a tie randomly established between two nodes (Holland and Leinhardt, 1971;<ref>{{Cite journal | author = P. W. Holland | author-link = P. W. Holland | author2 = S. Leinhardt | author2-link = S. Leinhardt | name-list-style = amp | title = Transitivity in structural models of small groups | year = 1971 | journal = Comparative Group Studies | volume = 2 | issue = 2 | pages = 107–124 | doi=10.1177/104649647100200201 | s2cid = 145544488 }}</ref> Watts and Strogatz, 1998<ref name=WattsStrogatz1998>{{Cite journal | author = D. J. Watts | author-link = D. J. Watts | author2 = Steven Strogatz | author2-link = Steven Strogatz | name-list-style = amp | title = Collective dynamics of 'small-world' networks |date=June 1998 | journal = [[Nature (journal)|Nature]] | volume = 393 | pages = 440–442 | doi = 10.1038/30918 | pmid = 9623998 | issue = 6684 | bibcode=1998Natur.393..440W | s2cid = 4429113 }}</ref>). Two versions of this measure exist: the global and the local. The global version was designed to give an overall indication of [[Clustering_coefficient#Global_clustering_coefficient|the clustering in the network]], whereas the local gives an indication of the [[Clustering_coefficient#Local_clustering_coefficient|extent of "clustering" of a single node]]. == Local clustering coefficient == [[Image:Clustering coefficient example.svg|thumb|Example local clustering coefficient on an undirected graph. The local clustering coefficient of the blue node is computed as the proportion of connections among its neighbours which are actually realised compared with the number of all possible connections. In the figure, the blue node has three neighbours, which can have a maximum of 3 connections among them. In the top part of the figure all three possible connections are realised (thick black segments), giving a local clustering coefficient of 1. In the middle part of the figure only one connection is realised (thick black line) and 2 connections are missing (dotted red lines), giving a local cluster coefficient of 1/3. Finally, none of the possible connections among the neighbours of the blue node are realised, producing a local clustering coefficient value of 0.]] The '''local clustering coefficient''' of a [[vertex (graph theory)|vertex]] (node) in a [[Graph (discrete mathematics)|graph]] quantifies how close its [[Neighbourhood (graph theory)|neighbours]] are to being a [[Clique (graph theory)|clique]] (complete graph). [[Duncan J. Watts]] and [[Steven Strogatz]] introduced the measure in 1998 to determine whether a graph is a [[small-world network]]. A graph <math>G=(V,E)</math> formally consists of a set of vertices <math>V</math> and a set of edges <math>E</math> between them. An edge <math>e_{ij}</math> connects vertex <math>v_i</math> with vertex <math>v_j</math>. The [[Neighbourhood (graph theory)|neighborhood]] <math> N_i </math> for a vertex <math>v_i</math> is defined as its immediately connected neighbours as follows: : <math>N_i = \{v_j : e_{ij} \in E \lor e_{ji} \in E\}.</math> We define <math>k_i</math> as the number of vertices, <math>|N_i|</math>, in the neighbourhood, <math>N_i</math>, of vertex <math>v_i</math>. The local clustering coefficient <math>C_i</math> for a vertex <math>v_i</math> is then given by a proportion of the number of links between the vertices within its neighbourhood divided by the number of links that could possibly exist between them. For a directed graph, <math>e_{ij}</math> is distinct from <math>e_{ji}</math>, and therefore for each neighbourhood <math>N_i</math> there are <math>k_i(k_i-1)</math> links that could exist among the vertices within the neighbourhood (<math>k_i</math> is the number of neighbours of a vertex). Thus, the '''local clustering coefficient for directed graphs''' is given as <ref name=WattsStrogatz1998 /> : <math>C_i = \frac{|\{e_{jk}: v_j,v_k \in N_i, e_{jk} \in E\}|}{k_i(k_i-1)}.</math> An undirected graph has the property that <math>e_{ij}</math> and <math>e_{ji}</math> are considered identical. Therefore, if a vertex <math>v_i</math> has <math>k_i</math> neighbours, <math>\frac{k_i(k_i-1)}{2}</math> edges could exist among the vertices within the neighbourhood. Thus, the '''local clustering coefficient for undirected graphs''' can be defined as : <math>C_i = \frac{2|\{e_{jk}: v_j,v_k \in N_i, e_{jk} \in E\}|}{k_i(k_i-1)}.</math> Let <math>\lambda_G(v)</math> be the number of triangles on <math>v \in V(G)</math> for undirected graph <math>G</math>. That is, <math>\lambda_G(v)</math> is the number of subgraphs of <math>G</math> with 3 edges and 3 vertices, one of which is <math>v</math>. Let <math>\tau_G(v)</math> be the number of triples on <math>v \in G</math>. That is, <math>\tau_G(v)</math> is the number of subgraphs (not necessarily induced) with 2 edges and 3 vertices, one of which is <math>v</math> and such that <math>v</math> is incident to both edges. Then we can also define the clustering coefficient as : <math>C_i = \frac{\lambda_G(v)}{\tau_G(v)}.</math> It is simple to show that the two preceding definitions are the same, since : <math>\tau_G(v) = C({k_i},2) = \frac{1}{2}k_i(k_i-1).</math> These measures are 1 if every neighbour connected to <math>v_i</math> is also connected to every other vertex within the neighbourhood, and 0 if no vertex that is connected to <math>v_i</math> connects to any other vertex that is connected to <math>v_i</math>. Since any graph is fully specified by its [[adjacency matrix]] ''A'', the local clustering coefficient for a simple undirected graph can be expressed in terms of ''A'' as:<ref name="Wang2017">{{cite journal |last1=Wang |first1=Yu |last2=Ghumare |first2=Eshwar |last3=Vandenberghe |first3=Rik |last4=Dupont |first4=Patrick |date=2017 |title=Comparison of Different Generalizations of Clustering Coefficient and Local Efficiency for Weighted Undirected Graphs |url=https://www.mitpressjournals.org/ |journal=Neural Computation |volume=29 |issue=2 |pages=313–331 |doi=10.1162/NECO_a_00914 |pmid=27870616 |s2cid=11000115 |access-date=August 8, 2020 |doi-access=free |archive-date=August 10, 2020 |archive-url=https://web.archive.org/web/20200810032708/https://www.mitpressjournals.org/ |url-status=live }}</ref> :<math> C_i=\frac{1}{k_i(k_i-1)}\sum_{j,k} A_{ij}A_{jk}A_{ki} </math> where: :<math> k_i=\sum_j A_{ij} </math> and ''C<sub>i</sub>''=0 when ''k<sub>i</sub>'' is zero or one. In the above expression, the numerator counts twice the number of complete triangles that vertex ''i'' is involved in. In the denominator, ''k<sub>i</sub><sup>2</sup>'' counts the number of edge pairs that vertex ''i'' is involved in plus the number of single edges traversed twice. ''k<sub>i</sub>'' is the number of edges connected to vertex i, and subtracting ''k<sub>i</sub>'' then removes the latter, leaving only a set of edge pairs that could conceivably be connected into triangles. For every such edge pair, there will be another edge pair which could form the same triangle, so the denominator counts twice the number of conceivable triangles that vertex ''i'' could be involved in. == Global clustering coefficient == The '''global clustering coefficient''' is based on triplets of nodes. A triplet is three nodes that are connected by either two (open triplet) or three (closed triplet) undirected ties. A [[triangle graph]] therefore includes three closed triplets, one centred on each of the nodes ([[Nota bene|n.b.]] this means the three triplets in a triangle come from overlapping selections of nodes). The global clustering coefficient is the number of closed triplets (or 3 x triangles) over the total number of triplets (both open and closed). The first attempt to measure it was made by Luce and Perry (1949).<ref>{{Cite journal | author = R. D. Luce | author-link = R. D. Luce | author2 = A. D. Perry | author2-link = A. D. Perry | name-list-style = amp | title = A method of matrix analysis of group structure | year = 1949 | journal = Psychometrika | volume = 14 | pages = 95–116 | doi = 10.1007/BF02289146 | issue = 1 | pmid=18152948 | hdl = 10.1007/BF02289146 | s2cid = 16186758 | hdl-access = free }}</ref> This measure gives an indication of the clustering in the whole network (global), and can be applied to both undirected and directed networks (often called transitivity, see Wasserman and Faust, 1994, page 243<ref>[[Stanley Wasserman]], Katherine Faust, 1994. ''Social Network Analysis: Methods and Applications.'' Cambridge: Cambridge University Press.</ref>). The global clustering coefficient is defined as: : <math>C = \frac{\mbox{number of closed triplets}}{\mbox{number of all triplets (open and closed)}}</math>. The number of closed triplets has also been referred to as 3 × triangles in the literature, so: : <math>C = \frac{3 \times \mbox{number of triangles}}{\mbox{number of all triplets}}</math>. A generalisation to [[weighted networks]] was proposed by Opsahl and Panzarasa (2009),<ref>{{Cite journal | author = Tore Opsahl | author-link = Tore Opsahl | author2 = Pietro Panzarasa | author2-link = Pietro Panzarasa | name-list-style = amp | title = Clustering in Weighted Networks | url = http://toreopsahl.com/2009/04/03/article-clustering-in-weighted-networks/ | year = 2009 | journal = Social Networks | volume = 31 | pages = 155–163 | doi = 10.1016/j.socnet.2009.02.002 | issue = 2 | access-date = 2009-06-11 | archive-date = 2019-07-01 | archive-url = https://web.archive.org/web/20190701224513/https://toreopsahl.com/2009/04/03/article-clustering-in-weighted-networks/ | url-status = live }}</ref> and a redefinition to two-mode networks (both binary and weighted) by Opsahl (2009).<ref name="Tore Opsahl 2009">{{Cite journal | author = Tore Opsahl | author-link = Tore Opsahl | title = Clustering in Two-mode Networks | url = http://toreopsahl.com/2009/09/11/clustering-in-two-mode-networks/ | year = 2009 | journal = Conference and Workshop on Two-Mode Social Analysis (Sept 30-Oct 2, 2009) | access-date = September 11, 2009 | archive-date = March 21, 2016 | archive-url = https://web.archive.org/web/20160321002314/http://toreopsahl.com/2009/09/11/clustering-in-two-mode-networks/ | url-status = live }}</ref> Since any simple graph is fully specified by its [[adjacency matrix]] ''A'', the global clustering coefficient for an undirected graph can be expressed in terms of ''A'' as: :<math> C=\frac{\sum_{i,j,k} A_{ij}A_{jk}A_{ki}}{\frac{1}{2}\sum_i k_i(k_i-1)} </math> where: :<math> k_i=\sum_j A_{ij} </math> and ''C''=0 when the denominator is zero. === Network average clustering coefficient === As an alternative to the global clustering coefficient, the overall level of clustering in a network is measured by Watts and Strogatz<ref name=WattsStrogatz1998 /> as the average of the local clustering coefficients of all the vertices <math>n</math> :<ref>{{Cite book|last= Kemper |first=Andreas|title=Valuation of Network Effects in Software Markets: A Complex Networks Approach|publisher=Springer|year=2009|isbn=9783790823660|page=142|url=https://books.google.com/books?id=isdza0IiXbUC&pg=PA142}}</ref> : <math>\bar{C} = \frac{1}{n}\sum_{i=1}^{n} C_i.</math> This metric places more weight on the low degree nodes, while the transitivity ratio places more weight on the high degree nodes. A generalisation to [[weighted networks]] was proposed by Barrat et al. (2004),<ref>{{Cite journal | first1= A. |last1= Barrat |first2= M. |last2= Barthelemy |first3= R. |last3= Pastor-Satorras |first4= A. |last4= Vespignani | title = The architecture of complex weighted networks | year = 2004 | journal = Proceedings of the National Academy of Sciences | volume = 101 | pages = 3747–3752 | doi = 10.1073/pnas.0400087101 | issue = 11 | pmid = 15007165 | pmc = 374315 | bibcode=2004PNAS..101.3747B|arxiv = cond-mat/0311416 |doi-access= free }}</ref> and a redefinition to [[bipartite graph]]s (also called two-mode networks) by Latapy et al. (2008)<ref>{{Cite journal | first1 = M. |last1= Latapy |first2= C. |last2= Magnien |first3= N. |last3= Del Vecchio | title = Basic Notions for the Analysis of Large Two-mode Networks | year = 2008 | journal = Social Networks | volume = 30 | pages = 31–48 | issue = 1 | doi = 10.1016/j.socnet.2007.04.006|url= https://hal.archives-ouvertes.fr/hal-01146080/file/Latapy2008.pdf }}</ref> and Opsahl (2009).<ref name="Tore Opsahl 2009"/> Alternative generalisations to weighted and [[directed graph]]s have been provided by Fagiolo (2007)<ref>{{Cite journal | first1= G. |last1= Fagiolo | title = Clustering in complex directed networks | year = 2007 | journal = Physical Review E | volume = 76|issue= 2 Pt 2 |pages= 026107 | arxiv=physics/0612169 | doi= 10.1103/PhysRevE.76.026107 |pmid= 17930104 |citeseerx= 10.1.1.262.1006 |s2cid= 2317676 }}</ref> and Clemente and Grassi (2018).<ref>{{Cite journal | first1 = G.P. |last1= Clemente |first2= R. |last2= Grassi | title = Directed clustering in weighted networks: A new perspective | year = 2018 | journal = Chaos, Solitons & Fractals | volume = 107 | pages = 26–38 | doi = 10.1016/j.chaos.2017.12.007|arxiv= 1706.07322 |bibcode= 2018CSF...107...26C |s2cid= 21919524 }}</ref> This formula is not, by default, defined for graphs with isolated vertices; see Kaiser (2008)<ref>{{Cite journal | first= Marcus | last= Kaiser |title = Mean clustering coefficients: the role of isolated nodes and leafs on clustering measures for small-world networks | year = 2008 | journal = New Journal of Physics | volume = 10 | pages = 083042 | issue = 8 | doi = 10.1088/1367-2630/10/8/083042|bibcode = 2008NJPh...10h3042K |arxiv = 0802.2512 | s2cid= 16480565 }}</ref> and Barmpoutis et al.<ref name=BarmpoutisMurray2010>{{Cite arXiv | first1= D. |last1= Barmpoutis |first2=R. M. |last2= Murray | title = Networks with the Smallest Average Distance and the Largest Average Clustering | eprint = 1007.4031 | year = 2010 | class = q-bio.MN}}</ref> The networks with the largest possible average clustering coefficient are found to have a modular structure, and at the same time, they have the smallest possible average distance among the different nodes.<ref name=BarmpoutisMurray2010 /> ==Percolation of clustered networks== For a random [[Tree (graph theory)|tree-like]] network without degree-degree correlation, it can be shown that such network can have a [[giant component]], and the [[percolation threshold]] (transmission probability) is given by <math>p_c = \frac{1}{g_1'(1)}</math>, where <math>g_1(z)</math> is the [[Degree distribution#Generating functions method|generating function]] corresponding to the [[Degree distribution#Generating functions method|excess degree distribution]]. In networks with low clustering, <math> 0 < C \ll 1 </math>, the critical point gets scaled by <math> (1-C)^{-1} </math> such that: <math>p_c = \frac{1}{1-C}\frac{1}{g_1'(1)}.</math><ref>{{Cite journal|last1=Berchenko|first1=Yakir|last2=Artzy-Randrup|first2=Yael|last3=Teicher|first3=Mina|last4=Stone|first4=Lewi|date=2009-03-30|title=Emergence and Size of the Giant Component in Clustered Random Graphs with a Given Degree Distribution|url=https://link.aps.org/doi/10.1103/PhysRevLett.102.138701|journal=Physical Review Letters|language=en|volume=102|issue=13|pages=138701|doi=10.1103/PhysRevLett.102.138701|pmid=19392410|issn=0031-9007|access-date=2022-02-24|archive-date=2023-02-04|archive-url=https://web.archive.org/web/20230204143725/https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.102.138701|url-status=live}}</ref> This indicates that for a given degree distribution, the clustering leads to a larger percolation threshold, mainly because for a fixed number of links, the clustering structure reinforces the core of the network with the price of diluting the global connections. For networks with high clustering, strong clustering could induce the core–periphery structure, in which the core and periphery might percolate at different critical points, and the above approximate treatment is not applicable.<ref>{{Cite journal|last1=Berchenko|first1=Yakir|last2=Artzy-Randrup|first2=Yael|last3=Teicher|first3=Mina|last4=Stone|first4=Lewi|date=2009-03-30|title=Emergence and Size of the Giant Component in Clustered Random Graphs with a Given Degree Distribution|url=https://link.aps.org/doi/10.1103/PhysRevLett.102.138701|journal=Physical Review Letters|language=en|volume=102|issue=13|pages=138701|doi=10.1103/PhysRevLett.102.138701|pmid=19392410|issn=0031-9007|access-date=2022-02-24|archive-date=2023-02-04|archive-url=https://web.archive.org/web/20230204143725/https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.102.138701|url-status=live}}</ref> For studying the robustness of clustered networks a percolation approach is developed.<ref>{{cite journal |title =Random Graphs with Clustering|author=M. E. J. Newman |journal=Phys. Rev. Lett. |date=2009 |volume=103 |issue=5 |pages=058701|doi=10.1103/PhysRevLett.103.058701 |pmid=19792540 |arxiv=0903.4009 |s2cid=28214709 }}</ref><ref>{{cite journal |title=Cascades on a class of clustered random networks|author=A. Hackett|author2=S. Melnik|author3=J. P. Gleeson |name-list-style=amp|journal=Phys. Rev. E |date=2011 |volume=83 |issue=5 Pt 2|pages=056107|doi=10.1103/PhysRevE.83.056107|pmid=21728605|arxiv=1012.3651|s2cid=18071422}}</ref> ==See also== *[[Directed graph]] *[[Graph theory]] *[[Network theory]] *[[Network science]] *[[Percolation theory]] *[[Scale free network]] *[[Small-world network|Small world]] ==References== {{Reflist}} ==External links== *{{Commons category-inline}} {{DEFAULTSORT:Clustering Coefficient}} [[Category:Graph invariants]] [[Category:Algebraic graph theory]] [[Category:Network theory]] [[Category:Network analysis]]
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)
Pages transcluded onto the current version of this page
(
help
)
:
Template:Cite arXiv
(
edit
)
Template:Cite book
(
edit
)
Template:Cite journal
(
edit
)
Template:Commons category-inline
(
edit
)
Template:Reflist
(
edit
)
Template:Short description
(
edit
)