Quantifying ideological polarization on a network using generalized Euclidean distance

An intensely debated topic is whether political polarization on social media is on the rise. We can investigate this question only if we can quantify polarization, by taking into account how extreme the opinions of the people are, how much they organize into echo chambers, and how these echo chambers organize in the network. Current polarization estimates are insensitive to at least one of these factors: They cannot conclusively clarify the opening question. Here, we propose a measure of ideological polarization that can capture the factors we listed. The measure is based on the generalized Euclidean distance, which estimates the distance between two vectors on a network, e.g., representing people’s opinion. This measure can fill the methodological gap left by the state of the art and leads to useful insights when applied to real-world debates happening on social media and to data from the U.S. Congress.

1 Alternative Measures

Assortativity
Assortativity is a way to quantify the extent to which nodes with similar values connect to each other (31). The specific formula we use to estimate assortativity is the one developed by Newman (30). This measure is calculated by creating a matrix $e$. Each entry $e_{xy}$ in the matrix contains the fraction of the edges connecting two nodes in the network with values $x$ and $y$ of the measure of interest. This can be considered as a probability of the edge existing, implying that $\sum_{xy} e_{xy} = 1$.
For simplicity, we also record the fraction of edges originating from a node with value $x$ ($\sum_y e_{xy} = a_x$) and the fraction of edges ending in a node with value $y$ ($\sum_x e_{xy} = b_y$). If a network is undirected and unipartite, then $e$ is symmetric and $a_x = b_x$.
The assortativity is thus the Pearson correlation coefficient of $x$ and $y$, which is:

$$\rho_{G,o} = \frac{\sum_{xy} xy\,(e_{xy} - a_x b_y)}{\sigma_a \sigma_b},$$

where $\sigma_a$ and $\sigma_b$ are the standard deviations of the distributions $a_x$ and $b_y$. In this formulation, one needs the topology of $G$ and the node values from the opinion vector $o$ to build the matrix $e$ (and, as a consequence, $a_x$ and $b_y$).
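As a minimal sketch of this computation, assuming scalar node values and an undirected network, the snippet below evaluates the Pearson correlation over both orientations of every edge, which is equivalent to building $e$, $a_x$, and $b_y$ explicitly. The function name `assortativity` and the toy network (two triangles joined by one edge) are illustrative assumptions, not part of the original analysis.

```python
def assortativity(edges, values):
    """Pearson correlation of the values found at the two ends of each edge."""
    # Every undirected edge is counted in both orientations, which makes the
    # implied mixing matrix e symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [values[u], values[v]]
        ys += [values[v], values[u]]
    m = len(xs)
    mean = sum(xs) / m  # identical for xs and ys by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / m
    var = sum((x - mean) ** 2 for x in xs) / m  # equals sigma_a * sigma_b here
    return cov / var  # undefined (division by zero) if all values coincide

# Toy network: two triangles with opposite opinions, joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
values = {0: 1.0, 1: 1.0, 2: 1.0, 3: -1.0, 4: -1.0, 5: -1.0}
score = assortativity(edges, values)

# Shrinking every opinion towards zero does not change the score.
mild = {node: 0.01 * value for node, value in values.items()}
score_mild = assortativity(edges, mild)
```

Rescaling every opinion by the same factor leaves the score unchanged, which is one way to see why assortativity cannot register how extreme the opinions are.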
It is not hard to see that assortativity cannot capture changes in the opinion component. Assortativity also gives diminishing returns in capturing the structural component. It is linearly related to $p_{out}$, the probability of nodes in a community to connect outside their community, but linear changes in $p_{out}$ have little effect for high $p_{out}$ and large effects for low $p_{out}$, as Figure S1 shows. Networks that score the medium assortativity value ($\rho_{G,o} = 0.5$; we show a network with this value in the callout in the middle) are hardly distinguishable from networks scoring the minimum value ($\rho_{G,o} = 0.0$; we show an example of a network in the callout on the left). The regime in which networks have well-defined communities occupies a small fraction of the assortativity's domain.
In summary, assortativity can capture the alignment of opinions with the network structure, although it progressively loses sensitivity as we have better-defined communities, and it is wholly insensitive to the distribution of opinions.

Random Walk Controversy
Random Walk Controversy ($RWC_G$) is a measure that assumes the network can be partitioned into two opposing communities and then estimates the probability of a random walker to be able to transition from one community to another (37). The first operation is to bisect the graph into two communities. In the original paper, the authors use the METIS algorithm (79), although one could use any community discovery algorithm that can take the number of desired communities as input and that finds communities using a definition comparable to the one used by METIS (80).
Let us assume that we now have two communities $C_1$ and $C_2$, which are disjoint sets of nodes ($C_1 \cap C_2 = \emptyset$) that cover the entire node set of $G$ ($C_1 \cup C_2 = V$). $RWC_G$ simulates a number of random walks (by default 10 times the number of nodes in the network) and records four probabilities: $p_{C_1,C_2}$, $p_{C_2,C_1}$, $p_{C_1,C_1}$, and $p_{C_2,C_2}$. In general, $p_{x,y}$ records the probability for a random walker to start in partition $x$ and end in partition $y$. Half of the walkers start from $C_1$ and half from $C_2$. The walk terminates when it visits any node from a 10% random sample (from either side). Then:

$$RWC_G = p_{C_1,C_1}\, p_{C_2,C_2} - p_{C_1,C_2}\, p_{C_2,C_1}.$$

This measure is equal to +1 when all random walkers stay in their starting community, and equal to −1 when all random walkers end in the opposite community. A few things should be noted here.
First, the bisection into communities can be largely suboptimal because it is independent from the opinion vector $o$. If there are no clear-cut communities aligning themselves with $o$, then $RWC_G$'s estimation is necessarily wrong. One could skip the community discovery and assign nodes to two communities according to whether their values in $o$ are above or below 0. In either case, the actual distribution of $o$'s values is irrelevant (it does not matter whether $o$ values cluster tightly around 0 or at the extremes), and thus this measure cannot capture the opinion component by design, as we show in the main paper.
Second, the measure is not deterministic because it relies on a randomized simulation. One could fix this issue by deciding a fixed random walk length $l$, then taking $A^l$, with $A$ being the stochastic adjacency matrix of $G$. The result is the probability of a random walk starting from any node to reach any other node in $l$ steps. Then one could estimate $RWC_G$ with the formula above. In this case, $RWC_G$ would be exact and deterministic. However, we do not perform this correction to stick with the original definition of $RWC_G$, noting that one could get different $RWC_G$ values even with the same $G$ and $o$.
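The deterministic variant described above could be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in (37): walkers start uniformly inside each community, $p_{x,y}$ is read as the probability mass that starts in $x$ and sits in $y$ after exactly $l$ steps, and the function name `rwc_exact` is hypothetical.

```python
import numpy as np

def rwc_exact(adj, side, length):
    """Deterministic RWC-style score from l-step transition probabilities.

    adj    : symmetric 0/1 adjacency matrix without isolated nodes
    side   : boolean sequence, True for nodes in C1 and False for nodes in C2
    length : fixed random walk length l
    """
    adj = np.asarray(adj, dtype=float)
    side = np.asarray(side, dtype=bool)
    # Row-stochastic adjacency matrix: every row sums to one.
    trans = adj / adj.sum(axis=1, keepdims=True)
    reach = np.linalg.matrix_power(trans, length)
    # Assumed aggregation: average, over uniform starting nodes in a
    # community, of the probability of being in the same community after l steps.
    p11 = reach[np.ix_(side, side)].sum(axis=1).mean()
    p22 = reach[np.ix_(~side, ~side)].sum(axis=1).mean()
    p12, p21 = 1.0 - p11, 1.0 - p22
    return float(p11 * p22 - p12 * p21)

# Two disconnected triangles: no walker can ever change community.
triangle = np.ones((3, 3)) - np.eye(3)
separated = np.block([[triangle, np.zeros((3, 3))],
                      [np.zeros((3, 3)), triangle]])
score_separated = rwc_exact(separated, [True] * 3 + [False] * 3, length=4)

# A single edge between the two sides: one step always crosses over.
score_crossing = rwc_exact([[0, 1], [1, 0]], [True, False], length=1)
```

The two toy inputs reproduce the extremes mentioned in the text: walkers that never leave their community and walkers that always end on the opposite side.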
Finally, in the original paper (37), the authors define multiple measures of polarization. Specifically, they also introduce versions using a random walk with restart, betweenness centrality, a 2D node embedding, Boundary Connectivity (36), and Dipole Moment (26). We do not investigate these variants as they are either highly correlated with $RWC_G$, or suffer from the same drawback of not considering the opinion distribution $o$.

Average Neighbor Opinion
In this approach, we estimate polarization by creating a density plot of opinions (33). For each node $i$ in the network, we record its value $o_i$ on the x axis. Then, the y axis value is derived as the average opinion of its direct neighbors $N_i$, i.e., $\frac{1}{|N_i|}\sum_{j \in N_i} o_j$. The resulting density plot is then generated with a standard Kernel Density Estimation. A network is polarized if most of the density of points is concentrated in the top right and/or bottom left corners, implying that only nodes with similar values connect to each other.
As we show in the main paper, the issue with this approach is that it looks only at immediate neighbors, whereas any mesoscale organization of the network is not captured.
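The coordinates fed to the density plot can be sketched as below, assuming the network is stored as a dictionary of neighbor lists. The function name `neighbor_opinion_points` and the toy network are assumptions made for illustration; the kernel density estimation step itself is left out.

```python
def neighbor_opinion_points(adjacency, opinions):
    """Return (o_i, mean opinion of i's neighbors) for every non-isolated node."""
    points = {}
    for node, neighbors in adjacency.items():
        if not neighbors:
            continue  # the neighbor average is undefined for isolated nodes
        avg = sum(opinions[j] for j in neighbors) / len(neighbors)
        points[node] = (opinions[node], avg)
    return points

# Toy network: two triangles with opposite opinions, joined by the edge 2-3.
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
             3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
opinions = {0: 1.0, 1: 1.0, 2: 1.0, 3: -1.0, 4: -1.0, 5: -1.0}
points = neighbor_opinion_points(adjacency, opinions)
```

Only the two bridge nodes move away from the corners of the plot, since every other node sees exclusively like-minded neighbors; nothing beyond the immediate neighborhood enters the calculation.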

Influenced Set Opinion
This approach also estimates polarization graphically (34). Each node is considered in turn as the origin of a Susceptible-Infected-Recovered (SIR) epidemic event. In SIR, nodes start in a Susceptible state. They transition to Infected with a certain rate β when one or more of their neighbors are Infected. Finally, there is a recovery rate γ, which makes them transition to the Recovered state (81). Recovered nodes cannot be infected anymore. The process ends when all nodes are either in the Susceptible or Recovered state, and no more propagation can happen. In addition to epidemics, the SIR process can also model the spread of information or rumors on a network.
Say that $R_i$ is the set of nodes in the Recovered state after the rumor propagates from node $i$. We can calculate their average opinion as $\frac{1}{|R_i|}\sum_{j \in R_i} o_j$. Thus, this approach is exactly the same as the one we describe in the previous section, with the difference that we look at $R_i$ (the set of infected nodes) rather than $N_i$ (the direct neighbors of $i$).
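A minimal sketch of one such cascade is given below. It assumes a discrete-time SIR process in which β is the per-step probability that an Infected node infects each Susceptible neighbor and γ is the per-step recovery probability; the authors of (34) may use a different scheme, and the function names are hypothetical.

```python
import random

def influenced_set(adjacency, origin, beta, gamma, rng):
    """Run one discrete-time SIR cascade from `origin`; return the Recovered set."""
    infected, recovered = {origin}, set()
    while infected:
        newly_infected = set()
        for node in infected:
            for nb in adjacency[node]:
                susceptible = nb not in infected and nb not in recovered
                if susceptible and rng.random() < beta:
                    newly_infected.add(nb)
        # Nodes infected during this step can only recover from the next step on.
        newly_recovered = {n for n in infected if rng.random() < gamma}
        infected = (infected | newly_infected) - newly_recovered
        recovered |= newly_recovered
    return recovered

def influenced_set_opinion(adjacency, opinions, origin, beta, gamma, seed=0):
    """Average opinion of the nodes reached by a cascade started at `origin`."""
    rng = random.Random(seed)
    reached = influenced_set(adjacency, origin, beta, gamma, rng)
    return sum(opinions[j] for j in reached) / len(reached)

# Toy network: two triangles with opposite opinions, joined by the edge 2-3.
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
             3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
opinions = {0: 1.0, 1: 1.0, 2: 1.0, 3: -1.0, 4: -1.0, 5: -1.0}
everyone = influenced_set_opinion(adjacency, opinions, 0, beta=1.0, gamma=1.0)
only_self = influenced_set_opinion(adjacency, opinions, 4, beta=0.0, gamma=1.0)
```

With β = 1 the cascade reaches the whole connected component, and with β = 0 it never leaves the origin; intermediate values give results that depend on the random seed, which is the non-determinism discussed in this section.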
Finally, rather than looking at the density plot, we bin all origin nodes according to their opinion value and show the distribution of the average influenced set nodes for each bin. A polarized network will show boxplots clustering on the top right and bottom left quadrants.
This can fix some of the conceptual problems of using the average opinion of the neighbours N i , since R i contains nodes that are not directly connected to i. However, a few more conceptual issues arise.
First, the β and γ parameters regulating the SIR event, as well as the number of bins in the plot, are chosen somewhat arbitrarily. While the authors of (34) show that this does not significantly impact the results as long as they are chosen within reasonable bounds, instructions on which bounds are reasonable are absent and rely on judgment calls from the analyst.
Second, just like $RWC_G$, the process relies on a randomized simulation, and thus can result in different estimations even with the same $G$ and $o$.

[...] space and connected with their two closest neighbors, with the exception of the two endpoints of the chain, which are only connected to their closest neighbor.
Then we generate a corresponding $o$ vector. All entries in $o$ are set to zero except the two entries for the endpoints, of which one is set to −1 and the other to +1. We then grow the chain graphs by extending the chain, as Figure S2 shows in the top row.
For each additional node in the chain, $\delta_{G,o}$ grows by a predictable amount, as Figure S2 (bottom row) shows. This is a known result in line with what was shown in the original paper defining the baseline node vector distance measure (24). This is the reason why it is not possible to simply normalize $\delta_{G,o}$ to be defined between 0 and 1: there is no well-defined maximum that can be used as a normalization factor.
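The growth on chain graphs can be reproduced with a short sketch. It assumes that the score is $\delta_{G,o} = \sqrt{o^T L^\dagger o}$, with $L^\dagger$ the pseudoinverse of the graph Laplacian, which is how the generalized Euclidean distance between $o^+$ and $o^-$ is read here; the helper names `polarization` and `chain` are hypothetical.

```python
import numpy as np

def polarization(adj, opinions):
    """delta = sqrt(o^T L^+ o), with L^+ the pseudoinverse of the Laplacian."""
    adj = np.asarray(adj, dtype=float)
    o = np.asarray(opinions, dtype=float)
    laplacian = np.diag(adj.sum(axis=1)) - adj
    quad = o @ np.linalg.pinv(laplacian) @ o
    return float(np.sqrt(max(quad, 0.0)))  # guard against tiny negative round-off

def chain(n):
    """Adjacency matrix of a path graph with n nodes."""
    adj = np.zeros((n, n))
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = adj[idx + 1, idx] = 1.0
    return adj

def chain_score(n):
    """Score of an n-node chain whose endpoints hold opinions -1 and +1."""
    o = np.zeros(n)
    o[0], o[-1] = -1.0, 1.0
    return polarization(chain(n), o)

scores = {n: chain_score(n) for n in (2, 5, 10, 17)}
```

Under this assumption the squared score equals the effective resistance between the two endpoints, which is $n - 1$ for a chain of $n$ nodes, so the score is $\sqrt{n - 1}$ and keeps growing as the chain gets longer.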
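One could narrow down the problem. Rather than finding the maximum of $\delta_{G,o}$ for any $G$, one can fix $G$ and look for the opinion vector $o \in [-1, 1]^{|V|}$ that maximizes $\delta_{G,o}$; we refer to this problem as MPP. Every optimal solution $o^\star$ to MPP only has entries equal to $\pm 1$.

Proof: Since both $L^\dagger$ and the domain for $o$ are bounded, $\delta_{G,o}$ will be bounded from above by some finite number and thus at least a supremum exists. As the domain $[-1, 1]^{|V|}$ is compact, this supremum is achieved by some $o^\star$, the optimal solution to MPP. There is no uniqueness, e.g., $\delta_{G,o^\star} = \delta_{G,-o^\star}$, but this is not required for the proof. Assume for contradiction that there is a node $i$ for which $o^\star_i \in (-1, +1)$, i.e., not equal to $\pm 1$. Consider what happens if we increase or decrease the opinion of $i$, i.e., $o^\star_i + \epsilon$, with $\epsilon$ of either sign and small enough such that this lies in $[-1, 1]$. Then the squared node vector distance becomes:

$$(o^\star + \epsilon e_i)^T L^\dagger (o^\star + \epsilon e_i) = o^{\star T} L^\dagger o^\star + \epsilon^2\, e_i^T L^\dagger e_i + 2\epsilon\, e_i^T L^\dagger o^\star,$$

where $e_i$ is the $i$th unit vector. Since $\epsilon^2 > 0$ and $L^\dagger$ is positive semidefinite with kernel $L^\dagger o = 0$ if and only if $o$ is a constant vector (which $e_i$ is not), the second term is positive. For the last term, we may choose the sign of $\epsilon$ such that the term is nonnegative, i.e., $\operatorname{sign}(\epsilon) = \operatorname{sign}(e_i^T L^\dagger o^\star)$. The perturbed vector then has a strictly larger distance than $o^\star$. But this is in contradiction with $o^\star$ being the maximum polarization vector, hence our assumption that a node $i$ with $o^\star_i \in (-1, 1)$ exists must be false and this concludes the proof.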

□
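The statement proved above can be checked by brute force on a small graph. The sketch assumes $\delta^2_{G,o} = o^T L^\dagger o$ and uses a hypothetical five-node graph; the opinion grid deliberately contains interior values so that a maximizer with entries other than $\pm 1$ would be found if it existed.

```python
import itertools
import numpy as np

def delta_squared(lap_pinv, o):
    """Squared polarization score o^T L^+ o."""
    return float(o @ lap_pinv @ o)

# Hypothetical example graph: a 5-node path with one extra chord (1, 3).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)]
n = 5
adj = np.zeros((n, n))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0
lap_pinv = np.linalg.pinv(np.diag(adj.sum(axis=1)) - adj)

# Exhaustive search over all opinion vectors on a coarse grid in [-1, 1].
grid = (-1.0, -0.5, 0.0, 0.5, 1.0)
best_value, best_o = max(
    (delta_squared(lap_pinv, np.array(o)), o)
    for o in itertools.product(grid, repeat=n)
)
```

Exhaustive search is only viable for a handful of nodes, since the number of candidate vectors grows exponentially with $|V|$.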
Following this proposition, we can rewrite the MPP as:

$$\max_{o \in \{-1,+1\}^{|V|}} o^T L^\dagger o.$$

In this form, MPP is equivalent to the so-called (weighted) MAX CUT problem (82) where the weight signs are determined by the off-diagonal entries of $L^\dagger$. If the off-diagonal is nonpositive ($\leq 0$), this problem is known to be NP hard. If the entries have mixed signs, as in the case of MPP, the problem complexity transitions from hard to easy, as described in (82). But this transition is not fully understood and we cannot be conclusive on the complexity of MPP in general.

[...] have a slightly different topology, so the score is not exactly the same, but it will tend to the same limit.

Scale Invariance
To illustrate, let us consider a relatively simple topology. We have a network $G$ with two cliques, connected by a few edges between them. Specifically, the number of edges connecting the two cliques is 5% of the number of edges inside each clique. We can now grow the network by increasing the size of the two cliques one node at a time, while adding edges between the cliques to keep the density constant. Figure S3 shows examples of the generated graphs with their corresponding polarization score $\delta_{G,o}$. We can see that the scores tend to a finite limit. The larger the system, the closer the score gets to the limit, as the randomization of the connections between the cliques plays a smaller role in creating fluctuations in the score.
On the other hand, if we modify the actual topology of the network, the score will change even if the scale (the number of nodes) remains constant. We illustrate this with two further simulations.
In the first simulation, we have two equally sized communities, each containing 250 nodes.
One community has all positive random values in o while the other has negative random values.
We build this with a stochastic blockmodel, ensuring that there are roughly 150 edges between the two communities. We then start removing these edges between the blocks one by one.
Intuitively, this should make the network more polarized, as it becomes harder and harder for an opinion to propagate from one community to the other. This is exactly what we see in the scores of $\delta_{G,o}$, as Figure S4 (left) shows. The first removed edges have little impact, and the impact of each removed edge grows exponentially as we get closer and closer to isolating the two communities.
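The qualitative behavior can be illustrated with a small deterministic stand-in for this simulation. The sketch below does not use a stochastic blockmodel with 250-node communities: it assumes two cliques of 20 nodes joined by a shrinking number of one-to-one bridge edges, opinions of exactly $\pm 1$, and $\delta_{G,o} = \sqrt{o^T L^\dagger o}$. The helper names are hypothetical.

```python
import numpy as np

def polarization(adj, o):
    """delta = sqrt(o^T L^+ o), with L^+ the pseudoinverse of the Laplacian."""
    lap = np.diag(adj.sum(axis=1)) - adj
    return float(np.sqrt(max(o @ np.linalg.pinv(lap) @ o, 0.0)))

def two_cliques(size, bridges):
    """Two cliques of `size` nodes joined by `bridges` one-to-one edges."""
    n = 2 * size
    adj = np.zeros((n, n))
    adj[:size, :size] = 1.0
    adj[size:, size:] = 1.0
    np.fill_diagonal(adj, 0.0)
    for k in range(bridges):
        adj[k, size + k] = adj[size + k, k] = 1.0
    return adj

size = 20
o = np.concatenate([np.ones(size), -np.ones(size)])
# Remove the bridges one at a time, from 10 down to a single one.
scores = [polarization(two_cliques(size, b), o) for b in range(10, 0, -1)]
increments = [later - earlier for earlier, later in zip(scores, scores[1:])]
```

In this toy setting the score rises with every removed bridge, and the last removals change it far more than the first ones, in line with the trend described for Figure S4 (left).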
In the second simulation, we also have two blocks, but we start from an unbalanced situation.
One community has 495 nodes while the other only has 5. Every node in one community has an opinion value of +1, while every node in the other community has an opinion value of −1.
Then, we select one node at random from the large community and we move it to the small community. We remove all of its connections to its old community and we add connections to the new community at random, preserving its degree (if there are enough candidate neighbors in the community to do so). Its opinion value also flips to align with its new community.

Effect of Density & Fragmentation
$\delta_{G,o}$ is sensitive to density, because the denser a network is, the easier it is for an individual to be exposed to a dissenting point of view. This sensitivity can be tested in two ways.
First, we can test it directly. We create a network of 1,680 nodes and divide it in two cliques.
The probability of edges being established between the cliques is 5%, i.e., $p_{out} = 0.05$. Then we start removing edges at random from this network, rendering it progressively less dense. Figure S5 (left) shows the growth of $\delta_{G,o}$ as we remove more and more edges.
A second way for a network to become less dense is by fragmenting its community structure.
A network with many smaller cliques is less dense than a network with few large ones, even when keeping the number of nodes and $p_{out}$ constant. As before, $\delta_{G,o}$ should grow because people become more and more isolated and it is therefore harder to be exposed to both opposing and conforming views. To test this, we take our network of 1,680 nodes. We then divide it into a growing number of cliques, from 2 to 16. All nodes in each clique have an $o$ value of either +1 or −1, and there is an equal number of cliques with either opinion value. We then connect cliques with $p_{out} = 0.05$.
We only induce an even number of cliques to avoid having one opinion being over-represented in the leftover clique; 1,680 is the smallest number divisible by all even numbers between 2 and 16.
Figure S5 (right) shows the growth of $\delta_{G,o}$ as we increase the fragmentation in communities, which matches this intuition.
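The choice of 1,680 nodes can be verified with a few lines of arithmetic: it is the least common multiple of the even clique counts from 2 to 16, so every configuration splits the network into equally sized cliques. The variable names below are illustrative.

```python
from functools import reduce
from math import gcd

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

clique_counts = range(2, 17, 2)           # 2, 4, ..., 16 cliques
smallest = reduce(lcm, clique_counts)     # smallest size divisible by all counts
clique_sizes = {k: smallest // k for k in clique_counts}
```

Here `smallest` evaluates to 1680, and the resulting clique sizes range from 840 nodes (2 cliques) down to 105 nodes (16 cliques).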

Main Results
To test our measure of polarization, we generate synthetic networks that vary along three different parameters and explore how $\delta_{G,o}$ changes with them. In Figures 2 to 4 in the main text, one parameter is varied at a time with the other two fixed, but here we consider the full extent of the parameter space.
We explore the three parameters as follows. The first parameter (µ) regulates the divergence of opinions. Each side of the opinion vector o peaks at ±µ. Thus, if µ is 0.2, the left side of o peaks at −0.2 and the right side peaks at 0.2. For this reason, the higher the µ value, the more polarization there is, as Figure 2 in the main paper shows.
The second parameter ($p_{out}$) regulates the structural component by determining the probability of a node in a community to connect to a node in a different community, as we describe in the Materials and Methods section in the main paper. The lower $p_{out}$, the more polarization there is, as nodes get progressively more isolated from opinions in outside communities.
Finally, for the mesoscale interplay, we use the parameter $n$, which indicates how many communities a community can link to. We pick possible neighboring communities among the $n$ [...]

Table S1 shows all values of $\delta_{G,o}$ across the three parameters. It reports the average of 25 randomly initialized runs. The standard errors of these means are in the order of 1% of the mean value. This shows that these scores are stable and reliable, as there are no wild fluctuations from small differences in random initialization. Table S1 shows a smooth transition across these parameters with no significant discontinuities. This supports our claim in the main paper that $\delta_{G,o}$ is sensitive to all these factors.

Table S1: The evolution of polarization scores across all parameters. Each cell reports the value of $\delta_{G,o}$ for different values of the opinion component ($\mu$, row groups), the structural component ($p_{out}$, rows), and the opinion-structural mesolevel interplay ($n$, columns). Each cell background is colored proportionally to the value of $\delta_{G,o}$, from low (bright red) to high (dark red).

[...]tion due to the structural component. This is the case in which the network contains only one node. As soon as we have two nodes we can measure distances between individuals and, as a consequence, we have a non-zero contribution of the structural component to polarization and [...]

Figure S6 shows what happens as we approach zero opinion divergence: $\delta_{G,o}$ naturally captures the decrease in diversity of opinions. Both assortativity and $RWC_G$ are blind to those changes and still estimate a constant polarization value even for insignificant differences of opinion. The average neighbor opinion plot could capture the differences, provided the KDE is initialized with the proper parameters; here we fail to see a difference between the last two cases only because the bandwidth is too large. The average influenced set opinion can detect a lack of opinion divergence.

Secondary Polarization Components
It is possible to consider other aspects as potential components of polarization. For instance, in Figure 1 in the main paper, we move from a random $G_{n,p}$ graph with random opinion assignments to a graph with communities that have assortative opinions. In other words, we make communities whose nodes all occupy a contiguous portion of the opinion spectrum: a node with a given opinion will be embedded in a community whose nodes have a similar opinion to its own.
In doing this, we are skipping over a potential polarization component: what happens if we [...]

To illustrate, we create a graph with 8 communities using an SBM and random $o$ assignment with $\mu = 0.8$ (high opinion divergence) and $p_{out} = 0.0003$ (which should imply high structural separation). After 50 independent runs, we observe $\delta_{G,o} = 9.63 \pm 0.16$. For each run, we also create a randomized version of the graph by performing enough double edge swaps to render the graph effectively random. This random graph has an indistinguishable polarization score: $\delta_{G,o} = 9.64 \pm 0.17$. We do edge swaps rather than generating an alternative random graph because this way we can ensure that the graphs have the exact same number of edges.
The real factor that induces polarization in the system is a community's purity, i.e., how well aligned communities are with $o$ scores. If purity is 1, all nodes in all communities share similar opinion scores. If purity is 0, opinion scores are random. Figure S7 shows how $\delta_{G,o}$ varies as we increase the purity of our communities. This empirical result matches what we would expect theoretically: in the low purity networks, everyone is exposed to different views and the polarization score should be low.
Since one cannot have pure communities if there are no communities, we prefer using $p_{out}$ to discuss the structural component of polarization rather than purity.

Interpretation of the Units
In the Materials and Methods section in the main paper, we show the relationship between $\delta_{G,o}$ and effective resistance and heat diffusion processes. Here, we show what happens if some of the assumptions we made do not hold, and we provide additional information about these relationships.

$\delta_{G,o}$ and Effective Resistance in General
If the assumptions that $\sum_i o^+_i = \sum_i o^-_i = 1$ do not hold, and it is not reasonable to normalize the vectors such that we can make them hold by construction, we can still arrive at a similar interpretation. Let $\bar{o} := \frac{1}{|V|}\sum_i o_i$ be the overall mean opinion. Then we say that two individuals agree if they are on the same "side" of $\bar{o}$ (both larger, or both smaller), and disagree if they are not. Then we assign to every node the variable $y$ equal to

$$y_i = \frac{o_i - \bar{o}}{\frac{1}{2}\sum_j |o_j - \bar{o}|},$$

and we consider the positive and negative opinions $y^+$ and $y^-$ as before. In other words, the variable $y_i$ captures how far an opinion is from the mean, and whether it is larger or smaller than the mean. We now notice that for $y^+$ and $y^-$ the following holds: $\sum_i y^+_i = \sum_i y^-_i = 1$. By construction (i.e., by removing the mean) we know that the positive sum equals the negative sum, and by normalization with $Z := \frac{1}{2}\sum_i |o_i - \bar{o}|$, the sums equal one. We can thus define the random variables $Y^+$ and $Y^-$ with distribution given by $y^+$ and $y^-$ and find that, in general, the polarization admits the same kind of interpretation: it is still the difference in distance between conflicting individuals and agreeing individuals, but in this case conflicting and agreeing is defined based on whether they are on the same side of the mean opinion or not; similarly, the probability of picking an individual is proportional to their distance to the mean. Furthermore, the factor $Z$ is included, which quantifies how far the opinions are, on average, from the mean opinion.

Figure S8: The relationship between polarization score and heat diffusion equilibrium. The squared $\delta_{G,o}$ value (y axis) against the unit of time in which heat diffusion reaches equilibrium (x axis). Color encodes the opinion divergence factor $\mu$. The blue line shows the best fits calculated for each $\mu$ factor independently.
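The normalization around the mean opinion described in this subsection can be written out as a short sketch. The function name `normalize_around_mean` and the example opinions are assumptions made for illustration.

```python
def normalize_around_mean(opinions):
    """Split opinions into y+ and y- around the mean; each part sums to one."""
    mean = sum(opinions) / len(opinions)
    # Normalization factor Z; it is zero (and y undefined) if all opinions coincide.
    z = 0.5 * sum(abs(o - mean) for o in opinions)
    y = [(o - mean) / z for o in opinions]
    y_plus = [max(v, 0.0) for v in y]    # opinions above the mean
    y_minus = [max(-v, 0.0) for v in y]  # opinions below the mean, as magnitudes
    return y_plus, y_minus, z

# All opinions are positive here, yet two of them fall below the mean of 0.5.
y_plus, y_minus, z = normalize_around_mean([0.9, 0.7, 0.2, 0.2])
```

Both `y_plus` and `y_minus` sum to one, so they can be read as probability distributions over the individuals above and below the mean, while `z` keeps track of the average spread around the mean.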

Heat Diffusion
In support of our interpretation of $\delta_{G,o}$ as the (square root of the) time it takes for heat to diffuse in the network, we calculate both values on the collections of SBMs used to generate Table S1.
That is, we explore all possible values of opinion divergence $\mu$, community interconnectedness $p_{out}$, and mesolevel organization $n$. Figure S8 shows the result. We can see that the opinion divergence factor $\mu$ provides a constant multiplicative factor: if the opinion vectors are more extreme, it takes a constant additional amount of time to reach equilibrium. If we take that factor out, $\delta^2_{G,o}$ correlates almost perfectly with the amount of time required for $o$ to have a standard deviation lower than $\epsilon = 2 \times 10^{-4}$.
The choice of $\epsilon$ does not matter, provided it is a small value.
All the power laws we calculated independently for each $\mu$ value have an $R^2 \sim 0.996$, and an exponent approximately equal to 1, showing a linear relationship between $\delta^2_{G,o}$ and equilibrium time.

Relations to Network Covariance
As we show in the Materials and Methods section in the main paper, $\delta_{G,o}$ de facto measures the Generalized Euclidean distance (24) between two vectors $o^+$ and $o^-$ on the nodes of a given graph $G$. We show here that this polarization measure can also be seen as a measure of covariance between the opposing opinion distributions, in the sense of (52). If $V^+$ are the nodes with positive opinion $o_i > 0$ and $V^-$ the nodes with negative opinion, then we define a joint distribution $P$ between pairs of nodes, which can also be written as a $|V| \times |V|$ matrix in terms of $o^+$ and $o^-$, where $o^\pm$ are the vectors with only positive (negative) opinions. This is a distribution which selects a random pair of nodes with opposing opinions, with probability of picking each node proportional to their opinion. So if $o_v = 2o_w$, then $v$ is twice as likely to be picked as $w$. The covariance of this joint distribution, with respect to the effective resistance matrix $\Omega = (\omega_{ij})$, is calculated according to (52) as

$$\operatorname{cov}_\omega(P) = \frac{1}{2}\operatorname{tr}\left[\left(P\mathbf{1}\mathbf{1}^T P - P\right)\Omega\right]$$

(with $\mathbf{1} = (1, \ldots, 1)$). This shows that $\delta_{G,o}$ is proportional to minus the covariance of distribution $P$. Our polarization measure captures the covariance of a pair of nodes with opposing opinions: a high covariance signifies that they are close together in the network, while a small covariance indicates that they are far apart. This is in line with how our measure intends to quantify polarization.

[...] United States, this approach cannot handle the case of multidimensional issues. Here, we briefly discuss how our binary measure can be extended to deal with these more complex cases.
Let us assume that any individual can support an opinion in a set of options $\{A_1, A_2, \ldots, A_k\}$ (these can for instance be $k$ political parties) with a certain strength given by the vector $o_{A_i}$, which records for each individual how strongly they support opinion $A_i$. An overall measure of polarization can then be obtained as an average over all possible pairwise conflicts, i.e., over all pairs of distinct opinions $A_i$ and $A_j$. If all the opinion vectors are normalized, then the same derivation we use in the Methods section in the main paper can be used here, which means that this multidimensional polarization can intuitively still be interpreted as a measure of the difference in (effective resistance) distance between disagreeing and agreeing individuals. We note that other extensions to multidimensional measures might be possible, for instance assigning a different weight to the different pairwise conflicts.
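One possible reading of this averaging is sketched below. It assumes that the conflict between two opinions $A_i$ and $A_j$ is the generalized Euclidean distance $\sqrt{(o_{A_i} - o_{A_j})^T L^\dagger (o_{A_i} - o_{A_j})}$ and that all unordered pairs receive equal weight; the exact definition used in the paper may differ, and the function names are hypothetical.

```python
from itertools import combinations
import numpy as np

def ge_distance(lap_pinv, a, b):
    """Generalized Euclidean distance between two node vectors."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(max(diff @ lap_pinv @ diff, 0.0)))

def multidimensional_polarization(adj, opinion_vectors):
    """Unweighted mean of the pairwise distances between opinion vectors."""
    adj = np.asarray(adj, dtype=float)
    lap_pinv = np.linalg.pinv(np.diag(adj.sum(axis=1)) - adj)
    pairs = list(combinations(opinion_vectors, 2))
    return sum(ge_distance(lap_pinv, a, b) for a, b in pairs) / len(pairs)

# Toy example: a 3-node path where each node fully supports a different option.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
options = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
score = multidimensional_polarization(path, options)
```

On the toy path, adjacent supporters are at distance 1 and the two end nodes at distance $\sqrt{2}$, so the average is $(2 + \sqrt{2})/3$. Replacing the unweighted mean with a weighted one would implement the alternative extension mentioned in the text.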

Behavior
We can show that the multidimensional version of our measure behaves in the same way as the original measure across the various components and it is equally intuitive. To do so, we repeat our synthetic experiments. Rather than having two vectors $o^+$ and $o^-$, we now have eight different opinions since our synthetic networks contain eight communities. Each community $i$ contains only nodes with nonzero values for opinion $A_i$ and zero for all other opinions $A_j$. The opinion vectors are constructed with the same procedure outlined for the simple unidimensional case, with $\mu$ representing the mode value. We construct the network with the same algorithm, so the parameters $p_{out}$ and $n$ also retain their meaning.
By running the experiments, we can create an equivalent of Table S1 for the multidimensional case. Table S2 is the result. We can see that the overall behavior of the multidimensional measure is the same as the unidimensional one. We can then conclude that the interpretation of the two measures is the same, and that we can apply our extension to the multidimensional case.

Table S2: The evolution of multidimensional polarization scores across all parameters. Same legend as Table S1.

The only difference is that the absolute values of the measure tend to be smaller. This might be due to the fact that the average of all pairwise conflicts decreases as we increase the number of opinions, as by construction some opinions are closer to each other. However, we leave this investigation for future work, along with the consideration that taking the average of all pairwise conflicts might not necessarily be the best way to expand our measure.

Sensitivity to Measurement Errors
While we have assumed that the network G and opinion vector o are given as inputs from which we calculate the polarization, it is important to acknowledge that this data is at best an approximation of the real underlying opinions and social network connections. In particular, measurement errors and other inherent difficulties in obtaining and representing such complex data might mean that the available opinion vector o only records the opinions approximately.
This can be modeled as an additive measurement error $o = \hat{o} + \epsilon$, where $\hat{o}$ is the real opinion vector and $\epsilon$ the error vector. If $\epsilon$ is large relative to the measurements, i.e., the errors are of the same order as the opinions (and uncorrelated), then there is no real hope for a measure that accurately represents the real polarization. If, on the other hand, the error vector is relatively small, we would hope that the polarization measure is a good approximation. This is the case for our proposed polarization measure. If we assume that $\|\epsilon\|$ is small, then the polarization $\delta_{G,o}$ calculated based on the available opinion measurements $o$ is close to the polarization $\delta_{G,\hat{o}}$ calculated based on the underlying true opinions $\hat{o}$. In the derivation below, we show that the difference between $\delta_{G,o}$ and $\delta_{G,\hat{o}}$ obeys a bound, referred to as (1), that involves $\|\epsilon\|$ and $\mu_{\min}(L)$, where $\mu_{\min}(L)$ is the smallest nonzero Laplacian eigenvalue, which is a fixed number for a given network. The error in our polarization measure is thus at most proportional to the measurement error $\|\epsilon\|$.
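The claim that the error is at most proportional to $\|\epsilon\|$ can be sanity-checked numerically. The sketch assumes $\delta_{G,o} = \sqrt{o^T L^\dagger o}$ and checks the inequality $|\delta_{G,o} - \delta_{G,\hat{o}}| \leq \|\epsilon\| / \sqrt{\mu_{\min}(L)}$, which follows from the triangle inequality for the seminorm induced by $L^\dagger$. It is used here as a stand-in and is not claimed to be the exact form of bound (1); the test network and noise level are hypothetical.

```python
import numpy as np

def polarization(lap_pinv, o):
    """delta = sqrt(o^T L^+ o)."""
    return float(np.sqrt(max(o @ lap_pinv @ o, 0.0)))

rng = np.random.default_rng(0)
n = 30
# Hypothetical test network: a ring (guaranteeing connectivity) plus random chords.
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
for u, v in rng.integers(0, n, size=(40, 2)):
    if u != v:
        adj[u, v] = adj[v, u] = 1.0

lap = np.diag(adj.sum(axis=1)) - adj
lap_pinv = np.linalg.pinv(lap)
# For a connected graph, the second smallest eigenvalue is the smallest nonzero one.
mu_min = np.sort(np.linalg.eigvalsh(lap))[1]

true_o = rng.uniform(-1, 1, size=n)        # underlying opinions
noise = 0.01 * rng.standard_normal(n)      # additive measurement error
error = abs(polarization(lap_pinv, true_o + noise) - polarization(lap_pinv, true_o))
bound = float(np.linalg.norm(noise) / np.sqrt(mu_min))
```

Scaling `noise` up or down scales `bound` by the same factor, which is the proportionality referred to in the text.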
The bound (1) follows from the assumption that $\|\epsilon\|$ is small. In step 1 of the derivation, we use the assumption that $\epsilon$ is small, such that $\epsilon^T L^\dagger \epsilon$ is negligible with respect to the other terms. In step 2, we use the Taylor expansion $\sqrt{1 + x} = 1 + \frac{x}{2} + O(x^2)$ in combination with the small-error assumption. In step 3, we invoke the Cauchy-Schwarz inequality. Step 4, finally, follows from the fact that the largest eigenvalue of $L^\dagger$ is the inverse of the smallest nonzero eigenvalue of the Laplacian, as $\mu_{\max}(L^\dagger) = \mu^{-1}_{\min}(L)$.

Table S3: Summary statistics of the Twitter debate networks. For each network, we report: the number of nodes $|V|$, the number of edges $|E|$, the average shortest path length $\ell$, the transitivity of the network $\Delta$, and the modularity values $Q$ we would get by bisecting the networks in two communities identified by having an $o$ value of a given sign.

Summary Statistics
[...] the one we would get with a random network with the same degree distribution as the original one but no communities (83). This is another rough estimate of structural separation.

[...] in modularity starting at the 99th Congress. We can also see that the share of congressmen with more extreme opinion scores steadily increases over time.

Polarization Components Correlation
Since our definition of polarization is made of two components (opinion and structure) and their mesoscale interplay, we are implicitly assuming that there is a correlation between them.
Here, we test whether this assumption holds on the real-world data gathered on Twitter and the [...]

We are unable to estimate the opinion-structural mesoscale interplay, which we indicate with $n$. This is because real-world data does not organize as neatly as our synthetic experiments.
Even if there are multiple communities, they will always have some degree of interconnectedness. It follows that there is no easy way to calculate $n$, and since finding an appropriate way goes beyond the scope of this paper, we skip the estimation of this parameter. Figure S9 shows the relationship between $\mu$ and $p_{out}$ on all real-world networks we study.
We observe the expected negative relationship. A high $\mu$ corresponds to high opinion divergence, which we expect to cause a high structural separation by isolating the communities, causing a low value for $p_{out}$, which is the probability of edges appearing across communities.
The relationship is exponential (note the logarithmic scale in the y axis). In the US House of Representatives network, we obtain a −0.83 Pearson correlation coefficient and a −0.78 Spearman correlation coefficient, both significant with $p < 0.001$. We do not obtain significant coefficients for the Twitter networks, but this is most likely due to the small sample size. [...]

We can conclude that indeed the structural and opinion components of our polarization definition are correlated and reinforce each other.

Communities in the Twitter Abortion Debate Network
In the main paper we argue that one of the reasons for the Twitter abortion debate network to score higher $\delta_{G,o}$ values is the fact that the network has four nested communities rather than two flat ones. Here, we provide support for this statement. To do so, we infer the community structure of the network using the same method we used in the previous section. Figure S10 shows the result. On the left, we depict the network as it is in Figure 5 [...] between moderates and extremists. This is confirmed by the SBM community inference, which is shown on the right of Figure S10. Here, we can see that both the left and the right side are further split in two communities which, upon visual inspection, overlap with the brighter and darker hues on either side. This is not just a visual artifact. Table S5 reports the probability of a block community to connect to another block community. We can see that the diagonal entries for communities paired on the same side of the network are higher than the ones with their community companion. This means that, e.g., the green community is more tightly knit with itself than with the blue community. If the network had only two communities and this hierarchical division was an artifact, there should be no difference in the edge probabilities between the green and blue block. The fact that we see such a difference shows that the network indeed has a mesoscale organization in four communities.
The communities also have distinctive opinion values. Table S6 shows the average o value for the nodes in each community. Again, we can see noticeable differences, showing that indeed each community is distinct from the others.