Mapping nonlocal relationships between metadata and network structure with metadata-dependent encoding of random walks

Integrating structural information and metadata, such as gender, social status, or interests, enriches networks and enables a better understanding of the large-scale structure of complex systems. However, existing approaches to augment networks with metadata for community detection only consider immediately adjacent nodes and cannot exploit the nonlocal relationships between metadata and large-scale network structure present in many spatial and social systems. Here, we develop a flow-based community detection framework based on the map equation that integrates network information and metadata of distant nodes and reveals more complex relationships. We analyze social and spatial networks and find that our methodology can detect functional metadata-informed communities distinct from those derived solely from network information or metadata. For example, in a mobility network of London, we identify communities that reflect the heterogeneity of income distribution, and in a European power grid network, we identify communities that capture relationships between geography and energy prices beyond country borders.

Section S1: Map equation with metadata-dependent encoding in synthetic graphs
In the main text we show the partitions obtained with the map equation with metadata-dependent encoding in only two cases. Fig. S1 provides further partitions in the intermediate regime for a network with p_1 = 0.6 and p_2 = 0.06 at p = 0.5: (a) c = 1, (b) c = 7, (c) c = 8, (d) c = 10, and (e) c = 100. For completeness, panel (f) shows the corresponding alluvial plot. For visualization purposes, Fig. 4 of the main text focuses only on the p = 0.5 case; Fig. S2 provides the results for a wider range of p and c. Given the higher complexity of our framework, we focus on only two cases per network: one with low (a,b) and another with high (c,d) reshuffling. The first notable difference from Fig. 4 of the main manuscript is that in all cases our framework can recover r_m = 1 with a suitable choice of parameters. More interesting than recovering this extreme scenario, however, is that we can attain a far wider variety of partitions in between. Our framework is also sensitive to the network structure and to the heterogeneity in the presence of metadata. Mixing together nodes with equal metadata from different communities is far easier (it requires lower values of c) when p_2 = 0.2 than when p_2 = 0.06. Similarly, we also capture the stronger separation between categories produced by more reshuffling.
Figure S1: Communities obtained with the map equation with metadata-dependent encoding in synthetic graphs. (a-e) Partitions according to the map equation with metadata-dependent encoding for a network of 240 nodes with probabilities p_1 = 0.6 and p_2 = 0.06. In all cases p = 0.5, and (a) c = 1, (b) c = 7, (c) c = 8, (d) c = 10, and (e) c = 100. The color of each node corresponds to its community assignment and the shape to its metadata category.
So far, the reshuffling of metadata has occurred between nodes in neighboring communities (d_c = 1), yet we can extend it to larger distances and assess whether our framework can detect non-local correlations in the presence of metadata. In Fig. S3, we display the values of r_m as a function of p and c in four networks with an equivalent number of randomizations (n_r = 112) but where the distance d_c is set to 1, 2, 3, or 4. As can be seen, different distances yield different patterns of values in the (p,c) space. When d_c , it becomes more difficult to merge nodes of different categories into the same communities; in other words, depending on the values produced by the combination of probabilities, we can assess how the metadata categories are distributed.
We investigate here whether similar results can be recovered using a more straightforward approach in which walkers either move through the real network or are teleported to another node with a class-dependent probability. In detail, a walker has a probability p_n of moving through the network and a probability 1 − p_n of being teleported; in the latter case, the walker has a probability p_c of moving to a node of the same category and 1 − p_c of moving to a node of a different category. In Fig. S4 we display the partitions obtained with four combinations of the probabilities p_n and p_c. When p_n is high (a), we retrieve the topological communities, whereas when p_n decreases and p_c is high, the nodes with the same metadata information are assigned to the same community (c). Finally, in the extreme case in which p_n and p_c are low, all the nodes are assigned to the same community (d). We further analyze in Fig. S5 how the mixing r_m changes as a function of the probabilities p_n and p_c for high and low reshuffling, in a similar fashion to Figs. S2 and S3.
Instead of the gradual classification we obtain with the map equation with metadata-dependent encoding, the behavior here is close to dichotomous: nodes split either according to the structural communities or according to their metadata information. When p_c is small we also observe that all the nodes fall within the same community. This behavior is likely caused by the fact that the teleportation only takes into account the metadata information regardless of the topological distance between nodes, with the exception of the first neighbors.
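The teleportation scheme described above can be written as a one-step transition matrix. The following is a minimal sketch, not the implementation used for Figs. S4 and S5. It assumes that network steps follow link weights, that teleportation targets are chosen uniformly within the same-category or different-category set, and that rows are renormalized when one of the options is unavailable. None of these details is specified in the text, and the function name is hypothetical.

```python
def teleport_transition_matrix(adj, category, p_n, p_c):
    """Row-stochastic matrix for a walker that follows links with probability
    p_n and otherwise teleports, to a same-category node with probability p_c
    and to a different-category node with probability 1 - p_c.

    adj is a dense (weighted) adjacency matrix as a list of lists.
    Uniform choice among teleportation targets is an assumption.
    """
    n = len(adj)
    matrix = []
    for i in range(n):
        row = [0.0] * n
        strength = sum(adj[i])
        same = [j for j in range(n) if j != i and category[j] == category[i]]
        diff = [j for j in range(n) if category[j] != category[i]]
        # Step along a link, proportionally to link weight.
        if strength > 0:
            for j in range(n):
                row[j] += p_n * adj[i][j] / strength
        # Teleport to a node of the same category.
        for j in same:
            row[j] += (1 - p_n) * p_c / len(same)
        # Teleport to a node of a different category.
        for j in diff:
            row[j] += (1 - p_n) * (1 - p_c) / len(diff)
        total = sum(row)
        if total == 0:
            # No move is possible: the walker stays where it is (assumption).
            row[i] = 1.0
            total = 1.0
        # Renormalize in case one of the options was unavailable (assumption).
        matrix.append([x / total for x in row])
    return matrix


# Path graph 0-1-2-3 with two metadata categories.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
category = ["a", "a", "b", "b"]
P = teleport_transition_matrix(adj, category, p_n=0.5, p_c=0.5)
```

In this sketch the teleportation term depends only on the category labels and not on how far apart two nodes are in the network, which is the property the text identifies as the likely cause of the dichotomous behavior.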

Section S3: Additional results for contact networks
We provide in Figs. S6 and S7 additional results of the communities detected for a workplace (InVS13) and a hospital (LH10).

Section S4: Additional results for the commuting network of London
We provide in Supplementary Figs. S8, S9, S11, S12 and S10 additional results for the London commuting graph when classes are assigned according to unemployment, life expectancy, deprivation, fraction of white individuals, and obesity, respectively.

Section S5: Ethnic segregation in Detroit
In this section we investigate the emergence of different neighborhoods in Detroit according to our metadata-dependent encoding scheme, applied to the adjacency graph of spatial cells in which categories correspond to ethnicities. The network connects two cells, in this case census tract units, when they are adjacent. Each cell is assigned the ethnicity with the highest relative abundance, calculated as the ratio between the number of residents of ethnicity α and the city average. The results for p = 0.01 and different values of c
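The category assignment by relative abundance can be illustrated with a short sketch. It takes "city average" to mean the mean number of residents of ethnicity α per cell; this is one reading of the definition above and is an assumption, as are the function name and the toy numbers.

```python
def dominant_category(counts):
    """Assign to each cell the group with the highest relative abundance.

    counts maps cell -> {group: number of residents}.
    Relative abundance = residents of the group in the cell divided by the
    city-wide mean number of residents of that group per cell (assumption).
    """
    cells = list(counts)
    groups = sorted({g for cell in counts.values() for g in cell})
    city_avg = {
        g: sum(counts[c].get(g, 0) for c in cells) / len(cells) for g in groups
    }
    labels = {}
    for c in cells:
        rel = {
            g: counts[c].get(g, 0) / city_avg[g]
            for g in groups
            if city_avg[g] > 0
        }
        labels[c] = max(rel, key=rel.get)
    return labels


# Toy example with two cells and two groups (made-up numbers).
toy = {"A": {"x": 90, "y": 10}, "B": {"x": 70, "y": 30}}
labels = dominant_category(toy)
```

In the toy example, group x is the majority in both cells, yet cell B is labeled y because y is over-represented there relative to the city average. Under this reading, the label reflects over-representation and not the absolute majority.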

Section S6: Organization of activities in urban areas
The proposed methodology can help identify functional modules in spatial systems. Standard community detection algorithms often do not provide the desired results on spatially embedded networks, because the spatial constraints are too strong to allow communities whose nodes are far apart from each other. However, metadata is often available in spatial networks, and taking this information into account when detecting modules is desirable in many concrete applications. Typical examples include analyzing spatial correlation in the distribution of certain commercial activities or identifying spatial segregation according to a specific socio-economic indicator. We consider a spatial data set constructed from the location-based social network Gowalla [33,34], which includes the location and type of millions of venues across the world. Although each venue has multiple classes organized hierarchically in this data set, we have analyzed only the six main categories: food, nightlife, outdoors, community, entertainment, and travel. The graph connecting the venues is spatial: there is a link between any pair of venues if the distance d_ij separating them is lower than 2 km, and the weight of each link is given by log(1/d_ij) [32]. Figure S14 shows the partitions obtained on the network of commercial activities in Barcelona for p = 0.5 and c = 1 (a), c = 2 (b) and c = 1000 (c). For c = 1, the venues organize in spatial communities, determined solely by the relative distance among nodes. Some communities split already for c = 2, leading to a grouping of venues of the same type. Still, the more isolated spatial communities do not split until c = 1000, where most venues of the same category are clustered together. Whereas the results in Fig. S14 correspond to p = 0.5, by changing p we can also tune the typical size of the spatial communities, with higher values leading to smaller groups. For additional results using three other cities, see Figs. S7 and S8.
We provide similar results on the spatial clustering of urban activities in Berlin (Fig. S15) and Prague (Fig. S16).
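The construction of the spatial venue graph described in this section (a link when d_ij is below 2 km, with weight log(1/d_ij)) can be sketched as follows. This is an illustrative sketch only. The use of the haversine great-circle distance, the natural logarithm, distances expressed in kilometers, and all function names are assumptions; the text does not state the distance unit entering the logarithm.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def build_spatial_graph(venues, max_km=2.0):
    """Return {(u, v): weight} for every venue pair closer than max_km.

    venues maps venue id -> (lat, lon). The weight is log(1 / d_ij) with
    d_ij in km (assumption). Co-located venues (d_ij = 0) are skipped
    because the weight would diverge.
    """
    ids = list(venues)
    edges = {}
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            d = haversine_km(venues[ids[a]], venues[ids[b]])
            if 0 < d < max_km:
                edges[(ids[a], ids[b])] = math.log(1 / d)
    return edges


# Three hypothetical venues: two close together, one several km away.
venues = {
    "a": (41.3851, 2.1734),
    "b": (41.3870, 2.1700),
    "c": (41.4500, 2.2500),
}
edges = build_spatial_graph(venues)
```

With distances in kilometers, log(1/d_ij) is negative for 1 km < d_ij < 2 km, so in practice a different unit or an offset may be required. The all-pairs loop is quadratic in the number of venues; a spatial index would be needed for data sets with millions of venues.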