Proper network randomization is key to assessing social balance

Social ties, either positive or negative, lead to signed network patterns, the subject of balance theory. For example, strong balance introduces cycles with even numbers of negative edges. The statistical significance of such patterns is routinely assessed by comparisons to null models. Yet, results in signed networks remain controversial. Here, we show that even if a network exhibits strong balance by construction, current null models can fail to identify it. Our results indicate that matching the signed degree preferences of the nodes is a critical step and so is the preservation of network topology in the null model. As a solution, we propose the STP null model, which integrates both constraints within a maximum entropy framework. STP randomization leads to qualitatively different results, with most social networks consistently demonstrating strong balance in three- and four-node patterns. On the basis of our results, we present a potential wiring mechanism behind the observed signed patterns and outline further applications of STP randomization.


Introduction
Individuals within society can be viewed as nodes in a social network, with edges representing various relationships between them. These relationships are highly diverse in nature and can often be expressed in either positive (friend/trust) or negative (foe/distrust) terms [1], leading to signed social networks with varying degrees of polarization [2]. Quantifying the wiring patterns is the first step towards understanding why certain connections are formed and not others, i.e., the underlying wiring mechanisms, as well as towards understanding and potentially reducing polarization in social media [2-6]. As a key concept, network graphlets (and motifs) [7,8] are fundamental patterns of connections that occur significantly more frequently than in a null model, which is a suitably randomized version of the empirical data [7]. Graphlets, also known as induced subgraphs [9], specify the existence and sign of each link within a subset of nodes. In contrast, motifs (or non-induced subgraphs) [7] specify only the required links, allowing for the presence or absence of other links. For instance, in an undirected signed network, a graphlet consisting of three nodes connected by two edges indicates the absence of the third link, while a motif encompasses instances with or without the third link. Note that any fully connected graphlet can also be referred to as a motif.
Fully connected 'triangle' graphlets of three nodes are particularly informative on tie formation mechanisms between acquaintances of the same node. As a starting point, strong structural balance (SB) [15] captures the intuitive notions of "the friend of my friend is my friend", "the enemy of my friend is my enemy", and "the enemy of my enemy is my friend". All these examples correspond to balanced cycles of length three, where a cycle is a path that starts and ends at the same node without revisiting any other node in between, and a cycle is balanced if the product of edge signs along it is positive. This expectation has been extended to cycles of any length, stating that a network is maximally balanced if all cycles are balanced [15], including cycles of length four, corresponding to 'square' graphlets. In practice, there are often deviations from maximal balance, requiring a statistical analysis of the enrichment of the studied patterns versus a null model. Although it is generally believed that social networks tend to be in somewhat balanced states [16], the conclusions vary strongly depending on the studied datasets and the chosen null model [17-20].
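As a minimal illustration of the balance definition above, the following Python sketch classifies the four signed triangles by the product of their edge signs. The encoding of edge signs as +1 and -1 and the function name `is_balanced` are assumptions made for this example only.

```python
from math import prod


def is_balanced(cycle_signs):
    """Return True if the product of edge signs (+1 or -1) along a cycle is positive."""
    return prod(cycle_signs) > 0


# The four signed triangles, labelled by their edge signs.
triangles = {
    "+++": (1, 1, 1),
    "+--": (1, -1, -1),
    "++-": (1, 1, -1),
    "---": (-1, -1, -1),
}

# Map each triangle label to True (balanced) or False (unbalanced).
balance = {name: is_balanced(signs) for name, signs in triangles.items()}
```

The same function applies to cycles of any length, so a square with signs (+1, -1, +1, -1) is also balanced because it contains an even number of negative edges.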
As a basic example, the 'rewire' null model [18] swaps edges between nodes while preserving the node degree ($k$, the number of neighbors), leading to networks with disrupted topology. Hence, the conclusions based on the rewire null model mix the pattern formation mechanisms arising from edge signs with those of purely topological origin. A more commonly used approach is the 'sign shuffle' null model [17]. In this null model, the topology of the network and the total number of positive and negative edges are preserved, while the sign of each edge is randomly assigned according to the observed fraction of positive edges in the input network. Note that this method has the limitation that all nodes are assumed to have the same ratio of positive edges. As illustrated in Fig. 1, this assumption is far from reality. Indeed, in real-life networks, some nodes are more 'friendly' than others, i.e., they hold mostly positive edges.
In this paper, we show that the current discrepancies arise from the fact that an adequate null model needs to preserve both the network topology and the expected signed degree of each node, while state-of-the-art null models can only preserve one of these constraints. As a solution, we propose an alternative network ensemble, a Signed degree and Topology Preserving (STP) null model based on the maximum entropy framework (see Methods), that simultaneously preserves the network topology as well as the mean positive and negative node degrees.
We examine the impact of network randomization on a collection of signed social networks covering datasets of various scales, including (i) Slashdot, a friend/foe network on the technological news site Slashdot [1]; (ii) Congress, a political network where signed edges represent favorable or unfavorable interactions between U.S. congresspeople on the House floor in 2005 [2]; (iii) Bitcoin-Alpha, a trust/distrust network of Bitcoin traders on the platform Bitcoin Alpha [21]; and (iv) Epinions, a trust/distrust network among users of the product review site Epinions [1]. For an overview of the fundamental properties of these datasets, see Table 1. As a key observation, positive ($k^+$) and negative ($k^-$) node degrees have a low correlation in all four networks (Fig. 1), indicating that null models that do not consider the signed degree as a confounding factor may lead to biased results. On these examples, we show that the STP null model changes the results qualitatively, leading to a consistent interpretation of patterns in signed social networks. We conclude by discussing potential underlying pattern formation mechanisms behind our observations, as well as further applications and extensions of STP randomization.
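The correlation between positive and negative degrees mentioned above can be computed along the following lines. This is a minimal sketch, assuming the network is stored as a dictionary that maps each undirected edge (a `frozenset` of two nodes) to its sign +1 or -1; this representation and the helper names are illustrative choices, not taken from the original analysis.

```python
from math import sqrt


def signed_degrees(signs):
    """Return (k_pos, k_neg): per-node counts of positive and negative edges."""
    k_pos, k_neg = {}, {}
    for edge, sign in signs.items():
        for node in edge:
            k_pos.setdefault(node, 0)
            k_neg.setdefault(node, 0)
            if sign > 0:
                k_pos[node] += 1
            else:
                k_neg[node] += 1
    return k_pos, k_neg


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy network (hypothetical data): node 0 is mostly 'friendly', node 3 is not.
toy = {
    frozenset((0, 1)): 1,
    frozenset((0, 2)): 1,
    frozenset((0, 3)): -1,
    frozenset((1, 2)): -1,
    frozenset((1, 3)): -1,
}
k_pos, k_neg = signed_degrees(toy)
nodes = sorted(k_pos)
r = pearson([k_pos[n] for n in nodes], [k_neg[n] for n in nodes])
```

Note that `pearson` is undefined when either degree sequence is constant, since the variance in the denominator is then zero.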

Signed null models
To investigate how the topology and signed degree affect the graphlet statistics, we consider four null models for signed networks, see Fig. 2. In addition to the commonly used rewire and sign shuffle null models and our STP null model, we also consider the 'signed rewire' null model [22]. The signed rewire null model preserves the signed degree of each node by rewiring the positive and negative subgraphs separately. As a result, the topology is not preserved and there can be edges in the signed rewire null model that are both negative and positive (see Methods). In Fig. 2, we illustrate the studied null models on a toy network satisfying SB. This toy network contains two groups of nodes (indicated by different node colors), with positive edges among group members and negative edges between the groups [23]. Note that some nodes are more friendly (like node 0) or more aggressive (like node 1), i.e., they have a higher fraction of positive or negative edges than others.

Signed triangle patterns
To test social balance in real networks, we first consider signed triangles, as illustrated in Fig. 3A for the Slashdot network. Each triangle graphlet (or equivalently motif) is counted and compared to the expected number of such triangles in four different null models. The fold change is calculated as

$$\text{fold change} = \frac{n_{\text{obs}}}{\langle n_{\text{rand}} \rangle},$$

indicating the relative graphlet frequency ($n_{\text{obs}}$) in the original network compared to the frequency in each randomized version ($\langle n_{\text{rand}} \rangle$) in Fig. 3. As a standard measure of statistical significance, we calculate the z-score as

$$z = \frac{n_{\text{obs}} - \langle n_{\text{rand}} \rangle}{\sigma_{\text{rand}}},$$

where $\sigma_{\text{rand}}$ denotes the standard deviation of $n_{\text{rand}}$. In the Slashdot network, both the rewire and sign shuffle null models would claim that only ++− triangles are underrepresented. This conclusion aligns with the notion of weak structural balance (WB) [1,24,25], which relaxes structural balance so that only triangles with exactly one negative edge should be underrepresented. In other words, [...] The same conclusion is observed consistently in the other studied networks as well, when both the signed degrees and the topology are conserved, meaning that signed social networks are strongly balanced at the triangle level according to STP randomization (Fig. 4). No consistent conclusions can be drawn from the rewire and signed rewire null models, most likely because the topology is disrupted in these cases. The sign shuffle null model appears to be consistent with WB in all real networks. However, as a clear shortcoming, the sign shuffle and rewire null models fail to detect SB in our model network explicitly built as an SB reference [26] (see Methods). To sum up, the known null models lead to conflicting and inadequate conclusions at the triangle level. This incoherence is further amplified when analyzing four-node graphlets, as discussed next.
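The triangle analysis can be sketched in Python as follows, assuming that the fold change is the ratio of the observed count to the mean count over randomized networks and that the z-score is the difference of the two divided by the standard deviation of the randomized counts. The edge representation (a dictionary from `frozenset` node pairs to +1 or -1), the function names and the use of the population standard deviation are assumptions of this example.

```python
from collections import Counter
from itertools import combinations
from math import log2
from statistics import mean, pstdev


def count_signed_triangles(signs):
    """Count triangles by their number of negative edges (0, 1, 2 or 3)."""
    nbrs = {}
    for edge in signs:
        u, v = tuple(edge)
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    counts = Counter()
    for u in nbrs:
        # Only look at neighbors larger than u so each triangle is counted once.
        higher = sorted(n for n in nbrs[u] if n > u)
        for v, w in combinations(higher, 2):
            if w in nbrs[v]:
                pairs = ((u, v), (u, w), (v, w))
                negatives = sum(signs[frozenset(pair)] < 0 for pair in pairs)
                counts[negatives] += 1
    return counts


def enrichment(n_obs, n_rand_samples):
    """Return (log2 fold change, z-score) of an observed count versus null samples.

    Either value is None when it is undetermined (zero counts or zero spread).
    """
    mu = mean(n_rand_samples)
    sigma = pstdev(n_rand_samples)
    log2_fc = log2(n_obs / mu) if n_obs > 0 and mu > 0 else None
    z = (n_obs - mu) / sigma if sigma > 0 else None
    return log2_fc, z
```

In use, `count_signed_triangles` would be applied to the empirical network and to each randomized network, and `enrichment` would then be evaluated per triangle type. Triangles with an odd number of negative edges are the unbalanced ones.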

Signed square patterns
We start the analysis of four-node graphlets by investigating six cases of square graphlets (Fig. 3B). In the Slashdot network, STP randomization again indicates consistency with SB, while the conclusions of other null models would differ qualitatively. However, understanding SB at the square level requires further effort. Squares are traditionally less investigated than triangles for multiple reasons. In addition to the increased computational complexity, it appears less realistic to assume that individuals know their social networks at the square level [23]. Furthermore, comparing the observed frequencies of signed squares to the current null models often leads to inconsistent conclusions (Fig. 4). Indeed, all square graphlets can be either significantly over- or underrepresented depending on the choice of network data and the null model. Existing null models again fail to detect SB at the square level even in the SB reference network. In contrast, when compared to the STP null model, all unbalanced squares are consistently underrepresented, while most balanced squares, with the exception of the −−−− graphlet, are overrepresented. While the −−−− graphlet appears to be underrepresented, it is just barely so at these network sizes, as even the strongest signal in Bitcoin-Alpha is just $z \sim 2$. Overall, the square graphlet results with STP randomization indicate that social networks are compatible with SB, with the potential exception of the −−−− graphlet.

Potential mechanisms behind signed patterns
Next, we show that the consistent results obtained by using the STP null model enable the formulation of hypotheses for potential wiring mechanisms. Previous studies attempted to explain square graphlet statistics by a node copying mechanism [27]. We first generalize the node copying mechanism to signed networks, where a new node can replicate both the connections of existing nodes in the network and the corresponding edge signs. This mechanism contributes to the formation of balanced squares, as illustrated in Fig. 5A. However, our results in Fig. 4 show that the square +−+− graphlet is also overrepresented in all studied networks, suggesting that this simple node copying mechanism is not sufficient to explain the full extent of our observations. Moreover, signed triangle graphlets are not necessarily explained by a node copying mechanism, as they are expected to depend also on the initial conditions.
Thus, as a potential solution, we propose an edge copying mechanism as shown in Fig. 5B. As a basic step, a node can copy one of its friend's connections. In other words, nodes connected by positive edges are assumed to copy each other's attitude towards other nodes. Conversely, negatively linked nodes replicate the edges of their foes while reversing the sign. The process of edge copying initially results in the formation of balanced triangles and eventually leads to larger balanced graphlets, such as 'squareZ' graphlets, which refer to squares with an additional edge. Eventually, this process leads to the formation of fully connected four-node subgraphs, referred to as 'squareX' graphlets/motifs, when other nodes also copy edges in the same subnetwork.
The proposed edge copying mechanism is compatible with the squareZ graphlet results shown in Fig. 4. Comparing the observed squareZ statistics to the STP null model, all datasets consistently agree with the SB expectation, while there is no consistent conclusion from other null models (Fig. 4). In addition to squareZ, we have also considered the fully connected four-node squareX graphlets/motifs as shown in Fig. S1. When compared to the STP null model, balanced (unbalanced) squareXs are again overrepresented (underrepresented), a pattern that is not observed with other null models. We also implemented the edge copying mechanism to generate an edge copying reference network, EC (see Methods). Note that with the chosen initial conditions, EC also satisfies SB. Yet, in general, the edge copying mechanism does not necessarily lead to the formation of balanced square graphlets. This finding is actually in line with our observations, as the statistics of −−−− appear to vary across datasets. Yet, our analysis indicates that −−−− square motifs with one or two additional positive edges are overrepresented, as suggested by the edge copying mechanism.
The proposed edge copying mechanism on its own leaves the question of balanced squares open. To potentially elucidate the presence of mostly balanced squares in the studied datasets, we also explore the implications of edge copying between disconnected nodes. In this scenario, a node may connect to the neighbors of another (unrelated) node with or without reversing the edge signs. Irrespective of the frequency of edge reversals in this step, it will lead to balanced squares of the form ++−−. Events with no sign reversal also contribute to ++++ and −−−−, in addition to ++−−. Notably, as long as some sign reversal occurs, it leads to the enrichment of +−+− graphlets, contrary to the above-discussed node copying mechanism. Note that this particular scenario of edge copying is more feasible when a node can access information on the signed edges of other unrelated nodes. Intriguingly, this can potentially explain why the square graphlet results depend on the datasets. For example, in Slashdot, one can easily know others' friends and foes without establishing a direct connection, potentially resulting in more balanced squares. Under other circumstances, individuals may have access to a stranger's friend list but only limited access to a stranger's blacklist (or foes) [28], potentially leading to the underrepresentation of −−−− squares.

Discussion
Graphlet statistics provide key insight into mechanisms of network wiring and function. However, it is important to interpret the results in the context of an adequate null model. Up until now, signed network null models had a crucial shortcoming, as they ignored either the signed degree preferences of the nodes or the network topology. In this study, we proposed the STP null model that preserves both the signed degrees and the network topology. We found that the STP null model provides more consistent results across signed social networks than previous methods, favoring SB at the level of both triangles and four-node graphlets, with the potential exception of the −−−− graphlets. Note that the analysis of various motifs (instead of graphlets) arrives at the same qualitative conclusions (Fig. S3).
We also proposed a signed edge copying mechanism that results in the formation of balanced triangles, squareZ and squareX graphlets, in line with our observations. Note that edge copying provides a simple, yet plausible, example of forming balanced four-node patterns, questioning the current paradigm that ignores four-node mechanisms [23]. Even so, without following the dynamics of these networks, we cannot fully conclude that edge copying is actually at play in these networks. This is a point that needs to be further explored in future work.
STP randomization has widespread potential applications and extensions. To start, it can provide a more adequate alternative null model to quantify balance [19,23,29-31] or measure polarization [2-5] in social networks. STP randomization can also be used for anomaly detection [23], as well as to infer the existence and sign of future or missing connections [32-35], and to perform signed graph embeddings [36-40]. In addition, the STP null model can be extended to directed and weighted networks [41], with the potential to contrast large-scale data against alternatives to SB, such as status theory [1].

Signed social network datasets
The three large signed social networks analyzed in this study were downloaded from the Stanford Network Analysis Platform (http://snap.stanford.edu/): (i) Bitcoin-Alpha, the trust/distrust network among people who trade Bitcoin on a platform called Bitcoin Alpha; (ii) Slashdot, the friend/foe network of the technological news site Slashdot, released in February 2009; (iii) Epinions, the who-trusts-whom online social network of the general consumer review site Epinions. The smaller-scale Congress network is from ref. [2]. More details on the construction of the datasets can be found in refs. [1,21]. Network edges are considered to be undirected. This process leads to only a very limited number of edge sign inconsistencies. Such inconsistent edges are disregarded in our analysis, together with any self-loops [29]. Only the largest connected component of each network is considered.
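The preprocessing steps described above can be sketched as follows. This is an illustrative Python outline rather than the original pipeline: it assumes the raw data is an iterable of directed `(source, target, sign)` triples, and the function name `preprocess` and the output format (a dictionary from `frozenset` node pairs to +1 or -1) are choices made for this example.

```python
from collections import defaultdict, deque


def preprocess(directed_edges):
    """Symmetrize signed edges, drop self-loops and sign conflicts, keep the largest component."""
    seen, conflicted = {}, set()
    for u, v, sign in directed_edges:
        if u == v:
            continue  # self-loops are disregarded
        key = frozenset((u, v))
        if key in seen and seen[key] != sign:
            conflicted.add(key)  # the two directions disagree on the sign
        seen.setdefault(key, sign)
    signs = {k: s for k, s in seen.items() if k not in conflicted}

    # Build the undirected adjacency structure of the cleaned network.
    adj = defaultdict(set)
    for key in signs:
        u, v = tuple(key)
        adj[u].add(v)
        adj[v].add(u)

    # Breadth-first search to find the largest connected component.
    largest, visited = set(), set()
    for start in adj:
        if start in visited:
            continue
        component, queue = {start}, deque([start])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in component:
                    component.add(y)
                    queue.append(y)
        visited |= component
        if len(component) > len(largest):
            largest = component
    return {k: s for k, s in signs.items() if k <= largest}
```

Edges whose two directions carry different signs are removed entirely here, which is one possible reading of "disregarded".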

Construction of SB reference network
We construct an SB reference network according to Harary's theorem of balance [26]. The 3000 nodes in the SB reference network are first divided into two equal groups. We then generate two degree sequences according to power-law degree distributions with exponents 2 and 3 for the positive and negative degree sequences, respectively. The negative degree sequence is used to generate negative edges between members of different groups and the positive degree sequence is used to generate positive edges between members of the same group. For the positive degree sequence, we swap each degree with another randomly picked degree in the sequence with probability 0.2 to deliberately make positive and negative degrees less correlated ($r = 0.56$). The resulting SB reference network has comparable density and positive edge ratios to real-life social networks, as shown in Table 1.
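Harary's theorem states that a signed network is balanced exactly when its nodes can be split into two groups with positive edges inside the groups and negative edges between them. The sketch below checks this property by a two-coloring search and can be used to confirm that a generated reference network satisfies SB. It does not reproduce the degree-sequence construction itself, and the edge representation (a dictionary from `frozenset` node pairs to +1 or -1) and the function name are assumptions of this example.

```python
from collections import defaultdict, deque


def harary_partition(signs):
    """Return a node -> group (0 or 1) mapping if the network is balanced, else None."""
    adj = defaultdict(list)
    for edge, sign in signs.items():
        u, v = tuple(edge)
        adj[u].append((v, sign))
        adj[v].append((u, sign))
    group = {}
    for start in adj:
        if start in group:
            continue
        group[start] = 0
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y, sign in adj[x]:
                # Positive edges keep the group, negative edges switch it.
                expected = group[x] if sign > 0 else 1 - group[x]
                if y not in group:
                    group[y] = expected
                    queue.append(y)
                elif group[y] != expected:
                    return None  # an unbalanced cycle was found
    return group
```

For a network built from two groups as described above, the returned mapping recovers the two groups up to a swap of the labels.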

Construction of the EC network
We use the edge copying mechanism to construct EC networks. We initialize the construction process with a balanced +++ triangle and subsequently add nodes to the network. Each new node connects to a randomly chosen node [27], with the sign determined by a parameter $q$, which defines the probability of a positive edge. Additionally, each new node attempts to connect to the neighbors of the selected node with a probability $p$. If the new node is connected to the selected node with a positive (negative) edge, the new node copies (reverses) the sign between the randomly chosen node and its neighbor. We used the parameter values $p = 0.3$, $q = 0.85$. The positive and negative degrees follow the power-law degree distribution shown in Fig. S2.
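The growth process described above can be sketched in Python as follows. Several details are assumptions of this example rather than statements of the original implementation: the selected node is drawn uniformly at random, each neighbor of the selected node is tried independently with probability `p`, and the `group` bookkeeping is added only so that strong balance of the output can be verified.

```python
import random


def edge_copying_network(n_nodes, p=0.3, q=0.85, seed=None):
    """Grow a signed network by edge copying, starting from a +++ triangle.

    Returns (signs, group): signs maps frozenset({u, v}) to +1 or -1, and
    group assigns each node to one of two camps implied by the copied signs.
    """
    rng = random.Random(seed)
    signs = {frozenset(pair): 1 for pair in ((0, 1), (0, 2), (1, 2))}
    nbrs = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
    group = {0: 0, 1: 0, 2: 0}
    for new in range(3, n_nodes):
        target = rng.randrange(new)           # assumed: uniform choice of the selected node
        sign = 1 if rng.random() < q else -1  # positive with probability q
        candidates = sorted(nbrs[target])     # neighbors of the target before linking
        signs[frozenset((new, target))] = sign
        nbrs[new] = {target}
        nbrs[target].add(new)
        group[new] = group[target] if sign > 0 else 1 - group[target]
        for w in candidates:
            if rng.random() < p:
                # Copy the target's sign towards w, reversed if the new edge is negative.
                signs[frozenset((new, w))] = sign * signs[frozenset((target, w))]
                nbrs[new].add(w)
                nbrs[w].add(new)
    return signs, group
```

With the +++ initial triangle, every copied edge is positive within a camp and negative across camps, so the generated network is strongly balanced by construction.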

Existing null models
In the rewire null model, we randomly pick two edges A → B and C → D and try to swap the edges as A → D and C → B. Such an attempt is aborted when the resulting edges form self-loops (A → A) or coincide with existing edges. To achieve sufficient network randomization, we perform $4E$ edge swap attempts, where $E$ represents the number of edges in the network. In the signed rewire null model, we use the same method as in the rewire null model but only swap edges if they have the same sign, thus preserving the signed degrees of each node. This may lead to multi-edges with both positive and negative signs. In such cases, we randomly assign a sign to the edge. In the sign shuffle null model, we randomly and independently assign positive or negative signs to each edge, while preserving the total number of positive and negative edges.
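Two of the existing null models can be sketched in Python as follows. The code makes several assumptions that the text does not settle: edges are stored as a dictionary from `frozenset` node pairs to +1 or -1, the sign shuffle is implemented as a random permutation of the observed signs (which preserves the totals exactly), and in the rewire model each swapped edge carries its sign along. The signed rewire variant is not covered here.

```python
import random


def sign_shuffle(signs, rng):
    """Keep the topology and permute the edge signs at random."""
    edges = list(signs)
    values = list(signs.values())
    rng.shuffle(values)
    return dict(zip(edges, values))


def rewire(signs, rng, attempts_per_edge=4):
    """Degree-preserving edge swaps: (A, B), (C, D) become (A, D), (C, B)."""
    edges = [tuple(edge) for edge in signs]
    edge_sign = list(signs.values())
    present = set(signs)
    for _ in range(attempts_per_edge * len(edges)):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # undirected edges have no preferred orientation
        new_1, new_2 = frozenset((a, d)), frozenset((c, b))
        # Abort on self-loops or on edges that already exist.
        if a == d or c == b or new_1 in present or new_2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new_1, new_2}
        edges[i], edges[j] = (a, d), (c, b)
    return {frozenset(edge): sign for edge, sign in zip(edges, edge_sign)}
```

Both functions take a `random.Random` instance so that randomized ensembles are reproducible. `rewire` preserves the total degree of every node but not its positive and negative degrees, while `sign_shuffle` preserves the topology but not the signed degrees.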

STP null model
A signed network $G$ is first divided into two subnetworks, namely the positive (negative) subnetwork $G_p$ ($G_n$) that includes all the positive (negative) edges in $G$. We then randomly select a subnetwork $G_{pr}$ from $G$ while keeping the node degrees of $G_{pr}$ the same as in $G_p$ on average. The remaining network is considered as the randomized negative subnetwork $G_{nr}$. The required constrained randomization is achieved using the maximum entropy approach [42,43]. To construct $G_{pr}$, we fix its average node degree to the original degree in $G_p$, i.e., $\langle k_i \rangle_{G_{pr}} = k_i(G_p)$. The resulting probability of selecting an existing edge in $G$ between nodes $i$ and $j$ to be part of the subnetwork is given by $p_{ij} = 1/(1 + \alpha_i \alpha_j)$, where the $\alpha_i$ are found iteratively as

$$\alpha_i' = \frac{1}{k_i(G_p)} \sum_{j} \frac{1}{\alpha_j + 1/\alpha_i},$$

where the sum runs over the neighbors $j$ of node $i$ in $G$. The initial condition is simply $\alpha_i^{(0)} \equiv 1$ and we stop the iteration when the maximum relative change of $\alpha_i$ is less than $10^{-3}$ between two consecutive iterations.
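A minimal Python sketch of this procedure is given below. It assumes the fixed-point update $\alpha_i \leftarrow \frac{1}{k_i(G_p)} \sum_j 1/(\alpha_j + 1/\alpha_i)$ over the neighbors $j$ of $i$ in $G$, which follows from rearranging the degree constraint $\sum_j p_{ij} = k_i(G_p)$ with $p_{ij} = 1/(1 + \alpha_i \alpha_j)$. The synchronous update order, the treatment of nodes without positive edges, the iteration cap and all function names are assumptions of this example.

```python
import math
import random


def stp_probabilities(signs, tol=1e-3, max_iter=10_000):
    """Return the probability of each existing edge being positive under the STP constraints."""
    nbrs, k_pos = {}, {}
    for edge, sign in signs.items():
        u, v = tuple(edge)
        for a, b in ((u, v), (v, u)):
            nbrs.setdefault(a, []).append(b)
            k_pos[a] = k_pos.get(a, 0) + (sign > 0)
    # Assumption: nodes without positive edges get alpha = infinity, i.e. p_ij = 0.
    alpha = {i: (1.0 if k_pos[i] > 0 else math.inf) for i in nbrs}
    for _ in range(max_iter):
        new_alpha, change = {}, 0.0
        for i, a_i in alpha.items():
            if math.isinf(a_i):
                new_alpha[i] = a_i
                continue
            total = sum(1.0 / (alpha[j] + 1.0 / a_i) for j in nbrs[i])
            new_alpha[i] = total / k_pos[i]
            change = max(change, abs(new_alpha[i] - a_i) / a_i)
        alpha = new_alpha
        if change < tol:  # maximum relative change below the tolerance
            break
    probs = {}
    for edge in signs:
        u, v = tuple(edge)
        if math.isinf(alpha[u]) or math.isinf(alpha[v]):
            probs[edge] = 0.0
        else:
            probs[edge] = 1.0 / (1.0 + alpha[u] * alpha[v])
    return probs


def stp_sample(signs, probs, rng):
    """Draw one randomized network: same edges, signs resampled edge by edge."""
    return {edge: (1 if rng.random() < probs[edge] else -1) for edge in signs}
```

Because every edge of $G$ is kept and only its sign is resampled, the topology is preserved exactly, while the positive and negative degrees are preserved on average. Repeated calls to `stp_sample` with the same `probs` generate the randomized ensemble.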

Figure 1: Signed degree inconsistency. Positive ($k^+$) and negative ($k^-$) degrees in social networks are poorly correlated. The dashed black line indicates a perfect correlation between $k^+$ and $k^-$. The $r$ values denote the Pearson correlation coefficient between $k^+$ and $k^-$ for each dataset.

Figure 2: Overview of signed null models. The original network, which contains two groups of nodes (yellow and grey), is shown in the middle. Positive edges are shown in blue and negative edges in red. Thicker lines indicate edges that are different from the original network.

Figure 3: Signed graphlets in the Slashdot network compared to different null models. (A) Triangles. (B) Squares. The $\log_2$(fold change) is shown on the top, accompanied by the grey dashed line indicating a 2-fold increase or decrease. z-scores are shown at the bottom, in white if matching SB expectations, and black otherwise. The background of the z-scores is blue for positive values and red for negative values.

Figure 4: Overview of graphlet significance in the studied networks. The z-scores are indicated by blue (overrepresented) and red (underrepresented) blocks. We list the balanced graphlets first, separated from the unbalanced graphlets by a yellow line. We leave the block white if $n_{\text{obs}} = \sigma_{\text{rand}} = 0$, as it leads to an undetermined z-score.

Figure 5: Illustration of signed copying mechanisms. (A) Signed node copying. When node 4 is added to the existing network, it copies the edges and signs of node 1, forming squares. Note that graphlet 7 (+−+−) cannot be generated this way. (B) Signed edge copying. When node 4 is connected to node 1 by a positive edge, it may copy node 1's edges and signs, forming triangles, squareZs and squareXs. When node 4 is connected to node 1 by a negative edge, it may copy node 1's edges but will reverse the signs. The edge copying mechanism forms balanced triangles, eventually leading to larger balanced graphlets. The initial edges are indicated by solid lines and the copied edges are indicated by dotted lines. The resulting graphlets are indicated by the indices within the green boxes, following the notation of Fig. 4 and Fig. S1. The purple arrows point from the node being copied to the node that is copying.

Table 1: Basic characteristics of the studied signed network datasets.