A meta-analysis of Boolean network models reveals design principles of gene regulatory networks

Gene regulatory networks (GRNs) play a central role in cellular decision-making. Understanding their structure and how it impacts their dynamics constitutes thus a fundamental biological question. GRNs are frequently modeled as Boolean networks, which are intuitive, simple to describe, and can yield qualitative results even when data are sparse. We assembled the largest repository of expert-curated Boolean GRN models. A meta-analysis of this diverse set of models reveals several design principles. GRNs exhibit more canalization, redundancy, and stable dynamics than expected. Moreover, they are enriched for certain recurring network motifs. This raises the important question why evolution favors these design mechanisms.


Introduction
Gene regulatory networks (GRNs) describe how a collection of genes governs key processes within a cell. Understanding how GRNs perform particular functions and do so consistently despite ubiquitous perturbations constitutes a fundamental biological question (1). Over the last two decades, a variety of design principles of GRNs have been proposed and studied, with a focus on discovering causal links between network form and function.
GRNs have been shown to be enriched for certain sub-graphs with a specific structure, socalled network motifs, like feed-forward loops, feedback loops but also larger subcircuits (2)(3)(4)(5).
Theoretical studies of the dynamic properties of these motifs revealed specific functionalities (6,7). For example, coherent feed-forward loops can delay the activation or inhibition of a target gene, while incoherent ones can act as accelerators (8). Other hypothesized design principles include redundancy in the regulatory logic (9,10), and a high prevalence of canalization (11,12). Canalization, a concept originating from the study of embryonal development (13), refers to the ability of a GRN to maintain a stable phenotype despite ample genotypic as well as environmental variation.
Over the past decades, Boolean networks (reviewed in (14)) have become an increasingly popular modeling framework for the study of biological systems, as they are intuitive and simple to describe. When data is sparse, as is still often the case for less-studied organisms and processes, complicated models (e.g., continuous differential equation models), which harbor the potential for quantitative predictions, cannot be appropriately fitted to the data due to their high number of parameters (15). In this case, Boolean network models can often still yield qualitative results.
Static network models are comprised of (i) a set of considered nodes (genes, external parameters, etc.), and (ii) a wiring diagram (also known as dependency graph), which describes which node regulates which and often also contains information about the respective type of regulation (positive due to e.g. transcriptional activation vs negative due to f.e. inhibition). A dynamic Boolean network model possesses these same features but obtains its dynamics from an additional set of update rules (i.e., Boolean functions) that describe the regulatory logic governing the expression of each gene. Each gene is either on (i.e., high concentration, expressed) or off (i.e., low concentration, unexpressed) and time is discretized as well.
Large, genome-wide static transcriptional network models can be easily assembled from existing databases like TRANSFAC (16), JASPAR (17) or RegulonDB (18), by simply considering all known transcriptional regulations for a given species. However, information about the network topology alone provides only an incomplete understanding of a system, which is intrinsically dynamic. The formulation of dynamic models such as Boolean networks requires a careful calibration of the update rules by a subject expert. Therefore, all dynamic Boolean GRN models published thus far focus on specific biological processes of interest and contain only those genes involved in these processes (19). Moreover, most dynamic models have been published over the course of the last twelve years, as biological data needed for an accurate model description has become increasingly available. Over the course of the last few years, researchers have started to leverage the collection of these models to gain insights into specific aspects of GRNs such as the role of nonlinearity (20), canalization (21,22), or the connection between canalization and criticality (12,23,24).
Here, we describe a comprehensive meta-analysis of the largest repository of published, expert-curated Boolean GRN models assembled thus far. This provides a detailed understanding of the design principles of GRNs that are potentially conserved across organisms, and can help explain how GRNs operate smoothly and perform particular functions.

Results and discussion
Using the biomedical literature search engine Pubmed, we created a database of 163 Boolean GRN models. To avoid introducing bias into the meta-analysis, we only included expert-curated models where the nodes and the update rules were selected by hand and not by a prediction algorithm or where default choices like threshold rules were used throughout. We further included only one version of highly similar modelsThis led to the exclusion of 41 models (see Methods for details), resulting in a total of 122 models used in the meta-analysis, of which 61 are included in the Cell Collective (19) and 61 are not. The models describe the regulatory logic underlying a variety of processes in numerous species across multiple kingdoms of life (animals: 93, plants: 10, fungi: 9, bacteria: 9; Supplementary Dataset 1).
The models contain different types of nodes. Some nodes are unregulated (i.e., they do not receive incoming edges in the wiring diagram) and remain thus constant over time. We refer to these nodes as external parameters since they frequently represent abstract external conditions such as the temperature or pH level. Most other nodes represent genes. We therefore refer to all nodes that receive incoming edges in the wiring diagram as genes but acknowledge that this is a simplification as some regulated nodes also represent molecules or abstract phenotypes such as cell proliferation or apoptosis. The 122 investigated GRN models ranged in size from 3 to 302 genes (mean = 41.9, median = 23), and encompassed a total of 5112 genes as well as 742 external parameters (Fig. 1A). Some genes (as well as external parameters) appeared in multiple models (Supplementary Dataset 2), with AKT appearing the most frequently, in 33 models.
A majority of the investigated models (94 , 77%) contained external parameters. As expected, network models with more genes contained on average more external parameters (ρ Spearman = 0.51, Fig. 1B). On the other hand, the size of a network was slightly negatively correlated with the average connectivity, i.e., the average number of regulators per gene (ρ Spearman = −0.15,  When considering all update rules separately, we identified that the in-degree distribution resembled a Poisson distribution, while the out-degree distribution possessed a power-law tail ( Fig. 1D), as has been observed for many diverse types of networks (25,26), including the yeast transcriptional regulatory network (27). The tails of the two degree distributions differed significantly; we found many more instances of high out-degree versus high in-degree, highlighting the presence of key transcription factors that act as network hubs (28,29). where a gene's state is determined by one or only a few regulators (Fig. 1E), irrespective of the considered kingdom ( fig. S1).
Interestingly, we found that 215 of the 12514 regulators (1.7%) contained in the ensemble of Boolean update rules were non-essential. That is, these regulators appeared in the published rules but did not have any effect on the output. For example, the Boolean update rule (X AN D Y ) OR X simplifies to X; Y is therefore a non-essential regulator. The non-essential regulators were spread across 23 (18.9%) models and 120 (2.3%) update rules, i.e., some update rules contained more than one non-essential regulator. In one extreme case, an update rule with twelve different inputs simplifies to the Boolean zero function. Fig.Figure S2 shows the discrepancy between the number of inputs in the published update rules and the number of inputs that have an actual effect on the dynamics. In the rest of this paper, only essential regulators were considered.

Canalization
The concept of canalization, already introduced in the 1940s in the context of embryonal development (13), has been proposed as a possible explanation for the remarkable stability of GRNs in the face of ubiquitous perturbations (32,33). Accordingly, Boolean canalizing functions have been proposed as suitable update functions in Boolean GRN models (34). Recently, the class of canalizing functions has been further stratified and studied (35,36). Some smaller studies support the general hypothesis by revealing an overabundance of canalizing functions in GRN models (12, 37) but a rigorous, comprehensive analysis that considers various types of  canalization is still missing.
A canalizing function possesses at least one input variable such that, if this variable takes on a certain "canalizing" value, then the output value is already determined, regardless of the values of the remaining input variables. If this variable takes on another value, and there is a second variable with this same property, the function is 2-canalizing. If k variables follow this pattern, the function is k-canalizing (35), and the number of variables that follow this pattern is the canalizing depth of the function (38). If the canalizing depth equals the number of inputs (i.e., if all variables follow the described pattern), the function is also called a nested canalizing function.
To test the level of canalization in published GRN models, we stratified all 5112 update rules based on their number of essential inputs and their canalizing depth. The number of Boolean functions with a certain canalizing depth is known (35), and the fraction of random Boolean functions which are canalizing (i.e., those with canalizing depth ≥ 1) decreases exponentially as the number of inputs increases ( Fig. 2A). Most identified update rules, however, possessed a high canalizing depth, even rules with many inputs (Fig. 2B). 4827 out of the 5110 investigated update rules (94.4%) were even nested canalizing, meaning that all their variables become "eventually" canalizing (39). ization (40). Rather than focusing on single inputs that determine the output of a function regardless of the values of the remaining inputs, we studied the proportion of sets of inputs that have this canalizing ability. The recently introduced canalizing strength of a Boolean function summarizes this information in a single measure (41). By comparing the canalizing strength of all identified non-canalizing update rules with 3 to 6 inputs (i.e., those with canalizing depth 0) with random non-canalizing Boolean functions, we found that even those update rules, noncanalizing according to Kauffman's stringent definition of canalization (34), exhibited a higher level of collective canalization than expected (Fig. 2C). Published non-canalizing update rules also exhibited more than expected input redundancy (Fig. 2D), which is an alternative measure of collective canalization (21).

Redundancy
Genetic redundancy constitutes an important feature of gene regulation, as the presence of duplicate genes provides robustness against null mutations (9,10). We tested the level of redundancy

Feed-forward loops
Network motifs are sub-graphs with a specific structure that recur throughout a network and often carry out a certain function (3,4). Several network motifs are commonly found in large, static GRN models such as the transcriptional network of E. coli (2). One such motif is the  S6). All coherent FFL types, most involving two inhibitions, were almost as frequent as any incoherent FFL type, most of which contain only one inhibition. Moreover, the incoherent FFL with three inhibitory regulations (type 8) was more prevalent than two of the three incoherent FFL types with only one inhibitory regulation. It was even the only incoherent FFL, which appeared more frequently than expected.
As reported for the static GRN models of E. Coli and S. cerevisiae (6) and as expected by chance, the FFL with three activating edges (type 1) proved by far the most prevalent. to be the least abundant. This may be due to low sample sizes in the earlier publication, or due to genuine differences in genome-wide transcriptional networks versus dynamic GRN models, which focus on a relatively small subset of genes involved in a certain biological process of interest. To explain all these observed differences, theoretical studies similar to (8, 42) may be needed, which focus on the functions of the different types of FFLs in dynamic GRN models.
The target gene in a FFL is regulated by both the master regulator and the intermediate regulator.
To test if one of these two regulations is generally more important, we compared their edge effectiveness, which captures the extent to which a given input (i.e., an edge) is on average necessary to determine the value of a Boolean function (21); an important input has high edge effectiveness. As inputs to functions with more variables generally have lower edge effectiveness, we stratified the analysis by the essential in-degree k of the target gene. Albeit weakly significant but opposing differences for k = 3 (two-tailed Wilcoxon signed-rank test; p=0.003) and k = 4 (p=0.004), we did not find any support for the hypothesis that either the master regulator or the intermediate regulator in a FFL is generally more important.
We further investigated the occurrence of clusters of FFLs, that is, two FFLs that share at least one node. As with single FFLs, we can distinguish different types of FFL clusters based on the distribution of activating and inhibiting edges in the motif ( We identified a total of 101832 FFL clusters in the 122 GRN models (Supplementary Dataset 4). As with the single FFL motifs, we stratified the number of occurrences by type (Fig. 5) and additionally by model ( fig. S7). As expected, we found most FFL clusters to involve 5 genes (79115, 77.7%), followed by 4 (21168, 20.8%) and by 3 genes (1549, 1.5%). As in the transcriptional networks of E. coli and S. cerevisiae (43), type 6 was the most abundant. This type of FFL cluster features a master regulator involved in both FFLs and its abundance is likely due to the known presence of transcription factor hubs, which was also observed in this metaanalysis (Fig. 1D). Type 11 was the most abundant among all FFL clusters involving 4 genes.  This is surprising since transcriptional networks of E. coli and S. cerevisiae contained almost exclusively type 12 and hardly any of the other 4-gene FFL clusters (43). An explanation for these discrepancies likely requires novel theoretical or computational studies that relate motif structure to motif function. For all different lengths, positive FBLs appeared slightly more frequently than expected (Fig. 6A).

Feedback loops
We also observed more self-reinforcing than self-inhibitory regulations (1-loops) than expected.
On the other hand, more complex FBLs containing two or more genes were enriched for inhibitory regulations. To enable an unbiased comparison, we considered specifically complex loops of the same type (positive or negative), with the same number of genes and the same number of combinatorially expected occurrences (that is, 4-loops with 4 versus none or 3 versus 1 inhibitory regulations, or 6-loops with 6 versus none, 5 versus 1 or 4 versus 2 inhibitory regulations). All five comparisons confirmed a surprising overabundance of negative regulations in the observed FBLs (Fig. 6B). Notably, the differences between observed and expected relative abundances were consistently smaller (but still substantial) when considering null model 2.
This aligns with our finding that most SCCs that contain many FBLs have a lower proportion of activating edges than the full network ( Supplementary Datasets 1 and 5). Due to insufficient numbers of FBLs in non-animal GRN models, we were unable to assess the potential for kingdom-specific differences in the prevalence of specific types of FBLs (figs. S9,S10). which depends on the number of activating versus inhibitory edges in the FBL, the observed relative abundance of FBLs with more activating versus more inhibitory edges is compared to the respective expected relative abundance, which is based on the same two null models as in A.

Criticality
Gene regulation is a highly stochastic process due to e.g. low copy numbers of expressed molecules, random transitions between chromatin states, and extrinsic environmental perturbations (46,47). While some bacteria rely on noise in gene regulation to successfully mitigate risk through bet-hedging strategies (48) and if it remains, on average, of the similar size, the system exhibits criticality. Many biological systems, modeled using Boolean networks, operate in the critical regime (12,53).
For a synchronously updated Boolean network with N nodes, the Derrida value for a single perturbation is simply the mean average sensitivity fig. S11). The eight models with the lowest mean average sensitivity (≤ 0.9) were all completely governed by NCFs, while the five models with the highest mean average sensitivity (≥ 1.17) were among the models containing the lowest proportion of NCFs (Fig. 7A).
This led us to investigate the relative frequency of different NCFs in the published models.
Any nonzero Boolean function has a unique standard monomial form, in which all variables are distributed into canalizing layers of importance and a non-canalizing core (35,55). NCFs are specifically those Boolean functions where the core is empty, i.e., where all variables become eventually canalizing and possess a hierarchical importance order. To understand why NCFs appear frequently in GRNs, consider as an example a typical situation in gene regulation: two proteins X and Y can each independently initiate the transcription of a gene, as long as a repressor Z is not present to block the recruitment of RNA polymerase. The regulation of the gene in Boolean logic is best described by the NCF (X OR Y) AND NOT Z, which has two layers of importance, with Z being most important. NCFs with the same layer structure (i.e., with the same number of variables in each canalizing layer) have the same average sensitivity (36,56).
For a given number of variables k ≥ 2, there exists a bijection between p(1 − p) and the layer structure of an NCF, and there are 2 k−2 NCFs with different layer structure, with each layer structure appearing equally likely by chance. Surprisingly, we discovered a very non-equal occurrence among the NCFs in the published models (table S2). Partially in line with the findings of high redundancy, NCFs with fewer layers appeared more frequently (Fig. 7B). The observed NCFs also exhibited lower than expected mean average sensitivity (Fig. 8), and the higher the number of variables the lower was the observed mean average sensitivity. These findings suggest that biological networks are enriched for NCFs that induce stable dynamics as a means to counter-balance some less canalizing and more sensitive functions. While earlier studies suggested GRNs manage to operate in the critical regime due to the abundance of canalizing update rules (12), our results provide a more detailed understanding of this process, by pointing to NCFs with specific dynamic features as stabilizers of GRNs.

Conclusion
Gene expression constitutes the most fundamental process in which genotype determines phenotype. A detailed understanding of the design principles that regulate this process is therefore of great importance. We utilized combined knowledge from numerous experts in their respective fields to perform a meta-analysis of published gene regulatory networks. Boolean networks constituted the perfect modeling framework for this kind of analysis due to their simplicity, easy comparability, and widespread use. A large literature search yielded the most extensive database of expert-curated Boolean gene regulatory network models thus far, which may be queried to generate and test various types of hypotheses.
We highlighted the usefulness of this resource by focusing on several design principles of GRNs. We confirmed that the regulatory logic is not random but highly canalized. Using a broader definition of canalization, we showed that even regulatory interactions that were not considered canalizing in previous analyses, exhibited a high level of canalization. Canalization and genetic redundancy are two correlated concepts; GRNs proved to be independently enriched for both. We further studied the presence of small network motifs and discovered various types of motifs that were vastly more or less abundant than expected by chance. Finally, we provided strong evidence for the hypothesis that all GRNs operate dynamically close to the edge of order and chaos due to a tradeoff between stability and adaptability. The abundance of nested canalizing update rules, specifically NCFs that are insensitive to perturbations, appeared to maintain critical dynamics for more densely connected GRNs.
The described analysis suffers from several obvious limitations. First, not all biological phenomena can be accurately described in simple Boolean logic. There are a variety of published models that allow for more than two states. A similar analysis of more general models might provide more detailed insights into gene regulation but will itself suffer from the increased complexity of describing the studied concepts in the non-Boolean case. Second, there exists no feasible way to test the representativeness or completeness of our generated database of Boolean models. Even if a complete database of all published Boolean network models existed, the results would still be biased as some processes and species (e.g., model organisms) receive more attention and are modeled more frequently than others. Third, design principles of GRNs will likely differ among kingdoms of life or even among lower taxonomic levels.
We therefore stratified the main analyses, wherever feasible, by kingdom. Since most of the published Boolean models and especially the large ones describe GRNs in animals, this metaanalysis lacks the statistical power to identify potential differences in design principles between kingdoms. In light of this, the identified design principles should primarily be understood as features of animal GRNs. A last limitation lies in the study design itself. Since we analyze expert-curated Boolean GRN models, it is impossible to rule out the introduction of bias by the experts who built the models. Many of the trends and properties we identified are highly significant and consistent, which means they likely reflect true biological qualities of regulatory networks. However, to know for sure, future research is needed. Since one of the main goals of synthetic biology is to generate complex networks with programmable functionality, synthetic biologists could, for example, engineer and study gene circuits that feature specific design principles suggested here. In addition, in silico experiments could clarify if and how the suggested design principles are advantageous for GRNs.

Database creation
Aiming to identify all published Boolean network models of GRNs, we developed an algorithm that parses all of the more than 30 million abstracts indexed in the literature search engine Pubmed and used keywords to rank the abstracts based on how likely they were to contain a Boolean network model. To identify the keywords, we relied on the Cell Collective, a preexisting repository of Boolean network models, which, at the time of access, contained 78 Boolean network models published in 65 distinct papers (19). The abstracts of these 65 papers served as a training set for the identification of keywords indicative of the presence of a Boolean network model. We considered as possible indicators (i) any word that occurred in at least two Cell Collective abstracts and was not among the most common 3000 words found in an English dictionary, (ii) all fixed combinations of two and three non-common words like "logical modeling" or "Boolean network model", and (iii) all co-occurrences of two or three single non-common words in the same abstract, e.g. the co-occurrence of the words "logical", "regulatory" and "modelling" in an abstract, not necessarily in the same fixed order. While the use of an automatic British English to American English conversion tool may have helped to limit the number of indicators, we chose to treat words that are spelled differently in British and American English as two separate words. For any possible indicator, we calculated a quality score as the ratio of the number of Cell Collective abstracts in which it occurred over the total number of Pubmed abstracts containing this indicator. This procedure resulted in 1297 publications with at least one indicator with a quality score of 5% or greater. We then manually investigated these 1297 publications to decide if they indeed contained a GRN model. During the manual review, an additional 369 referenced publications were investigated, as they were manually deemed to be of potential interest despite lacking an indicator with quality score ≥ 5%, resulting in a total of 1666 reviewed publications.

Model exclusion
To avoid the introduction of various of kinds of bias into the analysis, we used the following strict criteria for the inclusion of models.
1. We excluded models where the update rules were solely generated using an inference method or where default updates like threshold rules were consistently used. Our goal was to include only models where the update rules were built based on biological expertise and knowledge gained from appropriate experiments.
2. In addition, identical models that were presented in multiple publications were only in-cluded once, and we aimed to include the earliest publication that initially presented the model. In total, 165 models passed this step and were extracted as described in the next subsection.
3. An automated quality check ensured that highly similar models were only included once in the analysis. The overlap index, also known as Szymkiewicz-Simpson coefficient, measures the overlap between two sets A and B and is defined as |A∩B|/ min(|A|, |B|) ∈ 4. Finally, we manually investigated the overlap between all models stemming from the same publication. For one publication, we removed two additional models as a third, included model from this publication was the combination of the two excluded models (58).
Three other publications also contained more than one model. All these models were substantially different, as they described different GRNs or pathways with low overlap between the variables (59-61).

Model extraction and standardization
Boolean network models are presented in various formats in the literature. Using customized Python scripts, we extracted all published Boolean network models that were not excluded (see

Meta-analysis
All analyses were performed in Python 3.10 using the libraries numpy, scipy, networkx, cana, matplotlib and itertools. In particular, we wrote a Python script, available at https://github.com/ckadelka/DesignPr which takes as input a Boolean model, described in standardized format, and returns, among other things, an adjacency matrix of the wiring diagram of the model, as well as completely evaluated update rules. That is, each update rule of k inputs is represented as a vector of length 2 k , which together with the wiring diagram enables all presented analyses.
For computational reasons, we restricted most analyses to update rules with 20 or fewer inputs. The two models that each contained a single rule with more inputs (GLI1 in the hedgehog signaling pathway (62) is regulated by 24 inputs, while Shc in a multi-scale model of ErbB receptor signal transduction (63) is even regulated by 27 inputs) were excluded from the network motif and criticality analyses, as the specific types of regulation (activation, inhibition, conditional) and number of essential inputs could not be determined for rules with so many inputs.
When f C is not canalizing, then the integer k is the canalizing depth of f (38). If k = n (i.e., if all variables are become eventually canalizing), then f is a nested canalizing function (NCF) (64). By (35), every nonzero Boolean function f (x 1 , . . . , x n ) can be uniquely written as is a non-constant extended monomial, p C is the core polynomial of f , and k = r i=1 k i is the canalizing depth. Each x i appears in exactly one of {M 1 , . . . , M r , p C }. The layer structure of f is the vector (k 1 , k 2 , . . . , k r ) and describes the number of variables in each layer M i (36,39).
More recently, canalization has been considered as a property of the Boolean function, rather than on the variable level (40). In (21), canalization is equated to input redundancy, enabling the definition of variable/edge-and function/node-level properties, used in this study, such as the edge effectiveness and the effective connectivity. The canalizing strength constitutes an alternative approach to measure canalization on the function level (41). This approach generalizes Kauffman's original definition of canalization more closely. For brevity, we refer the interested reader to these papers for details.

Expected number of loops
The likelihood of a specific FFL or FBL type depends on the ratio of positive versus negative edges. Due to substantial variation of this ratio across models (Supplementary Dataset 1), we computed the expected distribution of specific FFL and FBL types separately for each model. To compute the expected number of different FFLs in model i, let n i and n t i denote the total number of FFLs and the total number of FFLs of type t, respectively. To create a null expectation, we assume that each edge is activating with probability p i and inhibitory with probability 1 − p i . Then, where a(t) ∈ {0, 1, 2, 3} denotes the number of activating edges in FFLs of type t. The expected number of FFLs of type t across all models is simply the sum of all model-specific expected numbers. This is null model 1.
Null model 1 can also be used to compute the expected number of different FBLs. Let n k i and n k,j i denote the total number of k-loops and the total number of k-loops containing exactly j inhibitory edges, respectively. Then, As before, the expected number of k-loops containing exactly j inhibitory edges across all models is the sum of all model-specific expected numbers.

Dynamical robustness
As an indicator of the dynamical robustness of a Boolean network F , we computed the mean average sensitivity s, which describes the average size of an initial perturbation of size 1 after each gene has been synchronously updated once. That is,

Supplementary Material
A B C D Figure S1: Prevalence of different types of regulation per kingdom. The prevalence of each type of regulation (activation: blue, inhibition: orange, conditional: gray) is shown. The analysis is restricted to rules with 1-7 essential inputs from Boolean GRN models of (A) animals,  Figure S4: Prevalence of redundancy. (A) Stratification of all identified update rules based on the number of essential inputs (rows) and the redundancy, measured by the number of symmetry groups (columns). Update rules with more than ten essential inputs were omitted. (B-C) Expected distribution of the number of symmetry groups for random Boolean functions with 1-10 essential inputs. For each row, 1000 random, non-generated functions were generated. In (C), the distribution of the canalizing depth of the random, non-degenerated Boolean functions was matched to the one observed for the GRN models, shown in Fig Figure S9: Stratification of all observed feedback loops per kingdom. All FBLs found in GRN models of (A) animals, (B) bacteria, (C) fungi, (D) plants are stratified based on the number of involved genes (x-axis) and the number of activating versus inhibitory edges they contain (color). Positive FBLs are blue, while negative FBLs are red. FBLs that contain conditional regulations are excluded. Each observed distribution (the rightmost of three bars with solid border) is compared to the expected distribution (left and middle bars with dashed and dotted borders), which is computed using two different null models (see Methods for details). n = total number of observed FBLs of a given length. The corresponding inter-kingdom analysis is shown in Fig. 6A.   ). Stratification of all identified update rules based on the number of essential inputs (rows) and the canalizing depth (columns). The color gradient is computed separately for each row.  Table S2: Observed number of NCFs with 3-6 inputs, stratified by layer structure. The number of NCFs with a given number of inputs (3-6) and a given canalizing layer structure is shown. NCFs with the same layer structure have the same dynamical properties. All layer structures appear equally likely by chance. Green (orange) count data indicates more (fewer) than expected NCFs with a given layer structure. K e describes the effective connectivity and p the output bias of the NCF.