Cognitive performance is linked to fitness in a wild primate

Cognitive performance varies widely across animal species, but the processes underlying cognitive evolution remain poorly known. For cognitive abilities to evolve, performance must be linked to individual fitness benefits, but these links have been rarely studied in primates even though they exceed most other mammals in these traits. We subjected 198 wild gray mouse lemurs to four cognitive and two personality tests and subsequently monitored their survival in a mark-recapture study. Our study revealed that survival was predicted by individual variation in cognitive performance as well as body mass and exploration. Because cognitive performance covaried negatively with exploration, individuals gathering more accurate information enjoyed better cognitive performance and lived longer, but so did heavier and more explorative individuals. These effects may reflect a speed-accuracy trade-off, with alternative strategies yielding similar overall fitness. The observed intraspecific variation in selective benefits of cognitive performance, if heritable, can provide the basis for the evolution of cognitive abilities in members of our lineage.


INTRODUCTION
Cognitive abilities vary considerably both within and among animal species (1–3). They guide behavioral decisions in many fitness-relevant contexts, such as homing, habitat and food selection, predator avoidance, mate choice, parental care, and the navigation of complex social challenges (4–7). Understanding the relationship between cognitive performance and fitness can help to clarify the factors shaping cognitive evolution. Research aimed at explaining interspecific variation in cognitive abilities or the underlying relative brain size has traditionally taken a comparative approach to assess the potential role of ecological and social factors (8–10), but such studies can reveal neither how minds actually work (11) nor the evolutionary processes that shape intraspecific variation in cognitive abilities (12). Studies of interindividual variation in cognitive abilities of wild animals that show how they are associated with fitness are therefore required to determine to what extent cognitive abilities covary with each other as well as with the abilities to survive and to reproduce successfully (3,13).
The first step in investigating fitness consequences of cognitive variation is to establish whether particular cognitive abilities are critical for survival, reproductive success, or both (3). This approach is challenging because it requires both an estimate of cognitive performance and fitness proxies of the same individuals (6,7,14,15). The available studies suggest that both natural and sexual selection may act upon interindividual variation in some cognitive abilities, but these studies differed widely in the kinds and the number of cognitive tests as well as the fitness proxies used, and they revealed mixed results (6,15–17). Some studies found a positive link between cognitive performance and mate choice (15), reproductive success (7,14,18,19), or survival (20,21). Thus, differential fitness resulting from heritable variation in cognitive performance suggests that cognitive traits can evolve (22). However, in other studies, no or a negative link between cognitive performance and one of these fitness proxies has been reported (18,23–26). This outcome might arise either because there is a trade-off between the energetic costs of enhanced cognitive abilities and somatic maintenance or reproduction (23,26–28) or because the applied cognitive tests did not capture ecologically relevant cognitive traits (21).
In principle, selection may act upon different cognitive abilities separately or jointly as part of a general cognitive ability (7,26). The presence of a general intelligence factor (g) has been suggested for humans, but its existence remains debated in animals (29–31). One key finding supporting the existence of g is the uniformly positive correlation among different cognitive traits (29). Assessing g in animals requires the establishment of a psychometric test battery encompassing several cognitive tasks tapping into different cognitive domains (13,30). If all tasks in the test battery address the same domain, however, they may reflect the same underlying cognitive mechanisms, and the positive correlations indicative of g may arise spuriously (29,30). Designing a suitable test battery that targets specific cognitive traits is challenging because tasks might be too much alike or associated with similar underlying motivations involved in solving a task, such as an inhibitory component in reversal learning or detour tasks (2,30,32). Moreover, identifying domains in animal cognition is not always straightforward, with some authors classifying spatial reversal learning tasks as spatial cognition, whereas others stress their inhibition component (29). Animals may recruit several specific cognitive abilities to solve a particular task, and different subjects or species may even enlist a different set of cognitive abilities (29,33,34).
The link between a general intelligence factor and proxies of fitness has been mainly studied in wild birds (7,26,35). In Western Australian magpies (Gymnorhina tibicen dorsalis), performance in associative and reversal learning, inhibitory control, and spatial memory correlated positively with each other, indicating the existence of a g, and females that had a higher g score enjoyed greater reproductive success (7). In contrast, in male spotted bowerbirds (Ptilonorhynchus maculatus) and Arabian babblers (Turdoides bicolor), variation in a general intelligence factor did not predict reproductive success (26,35). In other studies, performance across cognitive tasks did not positively correlate with each other, indicating that the cognitive tasks used addressed abilities in different cognitive domains (36,37). Although cognitive functions are often attributed to isolated operations of single brain areas, it is well established that cognition results from dynamic interactions of distributed brain areas operating in large-scale networks (38). Thus, even if there is domain-specific learning, it does not necessarily mean that cognitive problems are solved by solely activating these domain-specific brain areas. Hence, cognitive abilities work in concert, with several cognitive abilities potentially being involved in solving these "human-designed domain specific tasks." Accordingly, a composite cognitive score that weighs performance across tasks equivalently might reflect this ability. Such a composite score has been used in studies of satin bowerbirds (Ptilonorhynchus violaceus) and male budgerigars (Melopsittacus undulatus), where it correlated positively with reproductive output (36,37). Therefore, independent of how cognitive performance has been assessed, either in single tests or via a g or composite score, there is suggestive evidence that cognitive performance tends to be associated with differential fitness in the reproductive domain.
Because of their relatively large brains, complex societies, and advanced cognitive abilities, primates have traditionally been a primary target to investigate the evolution of intelligence (39). However, although some primate populations have been studied for decades and their cognitive abilities are now being studied in the wild (40–42), the link between intraspecific variation in cognitive performance and fitness has so far only been assessed in one preliminary study of a wild primate: gray mouse lemurs (Microcebus murinus) (17). Performance in problem-solving and spatial memory was not associated with short-term survival, but problem-solving abilities predicted body condition. Because of this mixed support for a link between cognition and fitness, we expanded our preliminary study by estimating survival directly in a much larger sample and by establishing a battery of cognitive tests of multiple, ecologically relevant cognitive abilities (34,43).
Specifically, we tested problem-solving abilities, spatial memory, inhibitory control, and causal understanding in separate tests. These domain-general tests are thought to capture cognitive abilities that play a pivotal role in various fitness-related behaviors (table S1). Moreover, because cognitive performance may also be influenced by intrinsic individual traits (44), we additionally conducted two personality tests (an open field test to assess exploration and a novel object test to assess neophilia) to control for potential noncognitive confounds.
Gray mouse lemurs are small nocturnal primates, endemic to Madagascar, that have become a model species in genetics, biomedical studies, and cognition (45,46). At our study site in Kirindy Forest, these ecological generalists face a full set of natural predators and competitors (47), have a median life span of 3 years (48), and are resilient to repeated testing in temporary captivity (43). In the present study, we first determined whether cognitive performance covaries across tasks. Next, we investigated whether cognitive performance is influenced by individual traits, such as age, sex, or personality. We then examined whether body mass, which is an established predictor of survival in gray mouse lemurs (48,49), is best predicted by cognitive performance, personality, age, sex, or rainfall in the year of testing (as a proxy for food availability). Last, we investigated the relative importance of cognitive performance and personality for predicting survival by controlling for other factors, namely, body mass, sex, age at testing, and food availability. We approximated lifetime food availability by calculating the mean monthly rainfall an individual experienced during its life. In contrast to most previous studies, we used lifetime survival as our main fitness proxy to contribute an additional perspective on the cognition-fitness link and because individuals with greater survival usually also have higher lifetime reproductive success (21).

RESULTS
We conducted experiments using wild-caught mouse lemurs kept in temporary confinement for a maximum of three consecutive nights per test session. We quantified the cognitive performance of 198 individuals that reached a mean age of 1.94 years (range, 0.27 to 8.83 years) in tests designed to assess problem-solving abilities, spatial memory, inhibitory control, and causal understanding (Fig. 1; movies S1 to S4). We also conducted two standard personality tests: an open field and a novel object test (Fig. 1; movie S5). Because all six tests could not be performed in a single session, individuals had to be recaptured, resulting in a total of 194 individuals who performed both personality tests, of which 130 individuals performed four, 17 individuals three, 32 individuals two, and 15 individuals one cognitive test.

Performance across tasks and composite cognition score
Individual mouse lemurs varied in longevity, body mass, as well as in performance in personality and cognitive tests (fig. S1). Repeatability in performance was moderate, except for problem-solving abilities (see Materials and Methods). Individual performance did not correlate positively across tasks (table S2), suggesting that no underlying general intelligence factor can be derived from these test scores. Therefore, we derived a composite cognition score (CCS) by first rescaling the different measures of performance to a value between 0 and 1 (fig. S2), with lower values representing better performance (Fig. 2B). Next, we calculated a mean score for individuals that participated in at least three cognitive tests (N = 147; Fig. 2B). On average, mouse lemurs performed best in the spatial memory task (mean ± SD: 0.20 ± 0.15; Fig. 2B), equally well in the problem-solving (0.44 ± 0.23) and string-pulling tasks (0.44 ± 0.28), and were poorest at exhibiting inhibitory control (0.57 ± 0.28).
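The derivation of the CCS can be illustrated with a minimal sketch. The exact rescaling used in the study is described in the Materials and Methods and fig. S2 and is not reproduced here; the sketch below assumes simple min-max scaling per task, and all function names, task labels, and data values are hypothetical. It only shows the two steps named in the text: rescaling each task to a value between 0 and 1 (lower is better) and averaging across tasks for individuals with at least three cognitive tests.

```python
from statistics import mean


def min_max_scale(values):
    """Rescale raw scores to [0, 1]. Assumes a lower raw score means better performance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # No variation in this task: every individual gets the same scaled score.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(raw, min_tests=3):
    """raw maps task -> {individual: raw score}; returns {individual: CCS}.

    Individuals with fewer than `min_tests` completed tasks are excluded.
    """
    scaled = {}
    for scores in raw.values():
        ids = list(scores)
        for ind, s in zip(ids, min_max_scale([scores[i] for i in ids])):
            scaled.setdefault(ind, []).append(s)
    return {ind: mean(vals) for ind, vals in scaled.items() if len(vals) >= min_tests}


# Hypothetical raw data (not from the study); individual D completed only two tasks.
raw = {
    "problem_solving": {"A": 200.0, "B": 700.0, "C": 1200.0, "D": 200.0},
    "spatial_memory": {"A": 0.0, "B": 2.0, "C": 1.0, "D": 2.0},
    "inhibitory_control": {"A": 5, "B": 0, "C": 10},
    "causal_understanding": {"A": 60.0, "B": 1200.0},
}
ccs = composite_scores(raw)
print(ccs)
```

In this toy data set, individual D is dropped because it has only two scaled scores, mirroring the inclusion rule of at least three cognitive tests.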

Influence of individual characteristics on cognitive performance
We found that CCS covaried with age at testing and exploration (open field test), but not with neophilia (novel object test) or sex (Table 1A; F test: full-null model comparison: F = 3.70, P = 0.007). Specifically, older individuals and more explorative individuals performed on average worse than younger and less explorative individuals (Fig. 2, C and D).
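For readers unfamiliar with full-null model comparisons, the F statistic for two nested linear models is conventionally computed as shown below. This is the standard textbook form and is given only as orientation; the source does not spell out the authors' exact procedure or which terms the null model retained, so the symbols (RSS for residual sum of squares, p for the number of estimated coefficients, N for sample size) are generic assumptions.

```latex
F = \frac{\left(\mathrm{RSS}_{\text{null}} - \mathrm{RSS}_{\text{full}}\right) / \left(p_{\text{full}} - p_{\text{null}}\right)}
         {\mathrm{RSS}_{\text{full}} / \left(N - p_{\text{full}}\right)}
```

A large F indicates that the set of predictors in the full model explains substantially more variance than the null model alone.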

Factors influencing body mass
On average, mouse lemurs had a body mass of 55.9 ± 9.4 g (mean ± SD). We found that average body mass at testing was best predicted by age and exploration, with older and more explorative individuals having a higher body mass (Table 1B and Fig. 2E; F test: full-null model comparison: F = 18.8, P < 0.001). However, CCS, neophilia, and rainfall in the year of testing were not associated with body mass. Similarly, average body mass at testing was not predicted by performance in single cognitive tests, but instead by exploration and age (Table 1C; F test: full-null model comparison: F = 10.65, P < 0.001).

Factors influencing survival
We found that survival after testing was predicted by sex, CCS, body mass, and exploration (Fig. 3, A and B, and Table 2A; Wald test: χ² = 29.39, df = 7, P < 0.001). Males and individuals with a higher CCS (i.e., indicating poorer cognitive performance) died sooner. In contrast, individuals with a higher body mass and those that were more explorative survived for longer. Age at testing, neophilia, and rainfall were not significant predictors of survival. Similarly, the Cox proportional hazard model including performance in single tests revealed that survival after testing was significantly predicted by sex and body mass, but only by trend by performance in the problem-solving and inhibitory control task as well as by exploration (Table 2B; Wald test: χ² = 23.59, df = 10, P = 0.009). Males and, by trend, individuals with poorer performance in the problem-solving and inhibitory control task died sooner. Heavier and, by trend, more explorative individuals lived longer, however. Neither performance in the spatial memory or causal understanding task nor age at testing, neophilia, or rainfall covaried with survival after testing (Table 2B).
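As a reading aid, the general form of a Cox proportional hazard model is sketched below. This is the standard formulation rather than the authors' fitted model; the coefficient labels are hypothetical shorthand for the predictors named in the text, and no estimates are implied.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,
  \exp\!\left(\beta_{\text{sex}} x_{\text{sex}} + \beta_{\text{CCS}} x_{\text{CCS}}
            + \beta_{\text{mass}} x_{\text{mass}} + \beta_{\text{expl}} x_{\text{expl}} + \dots\right),
\qquad \mathrm{HR}_k = e^{\beta_k}
```

Under this parameterization, a predictor associated with dying sooner (here, a higher CCS) corresponds to a positive coefficient and a hazard ratio above 1, whereas a predictor associated with longer survival (here, higher body mass or more exploration) corresponds to a negative coefficient and a hazard ratio below 1.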

DISCUSSION
We investigated the cognition-fitness link in a wild primate species by quantifying cognitive performance across four domain-general cognitive tasks and two personality tests. Performances across tasks did not correlate positively with each other, suggesting that there is no general intelligence factor underlying variation in individual performance in this species. The derived CCS covaried with age and exploration. Body mass, an important physical predictor of survival, covaried neither with the CCS nor with performance in single cognitive tests. However, body mass covaried with age at testing and exploration. In the survival analysis, we controlled for age, body mass, and rainfall to gauge the relative importance of these factors and cognitive abilities to predict survival. Survival was predicted by body mass, sex, exploration, and CCS. Our findings therefore support a key prediction of the role of selection in driving cognitive evolution: Superior cognitive abilities are associated with tangible fitness benefits in gray mouse lemurs.

CCS and factors influencing cognitive performance
As in many other animal species studied previously (29,30), we did not find evidence for the existence of a general intelligence factor. Among primates, there is contradictory support for a g factor across and even within species (32,43,50–52). In captive chimpanzees (Pan troglodytes), for example, a g factor derived from performance in tests of the Primate Cognition Test Battery has been reported in one (51), but not in another population (50). However, as research on general intelligence in nonhuman animals is still in its infancy, and the question of how best to develop robust domain-specific tasks and statistical methods to derive a g factor is still heavily debated (29–31,53), more research is needed for a better understanding of the evolution of general intelligence. Even so, cognitive abilities work in concert, with several cognitive abilities potentially being involved in solving a given cognitive task (29,33,34,38), which may be assessed by a CCS that weighs performance across all tasks equivalently (36,37).
In mouse lemurs, this CCS was influenced by age and exploration, but not by sex or body mass, with older and more explorative individuals performing worse. Cognitive senescence has been documented in captive mouse lemurs, with individuals older than 5 years exhibiting deficits in memory, flexible learning, and spatial abilities (54). However, unlike their captive conspecifics, wild mouse lemurs do not seem to exhibit functional senescence (48). Thus, potential age effects in cognitive performance should be treated with caution because only few individuals reach such an old age in the wild; overall, we tested only four individuals older than 5 years. A decline with age in a general cognition factor has also been reported in female Arabian babblers (26). Because older females produced more fledglings, there might be a trade-off between cognitive performance and reproductive success in this species (26).
Personality has been suggested to be one predictor of individual variation in cognitive performance, with personality types being linked to cognitive styles via a speed-accuracy trade-off (55). Fast animals that are more explorative, aggressive, and/or bolder take risks while gathering more short-term gains, whereas slow animals take time to make accurate inferences and decisions that are often safer, but associated with relatively low short-term gains (55). Mouse lemurs that were more explorative had a higher CCS and performed more poorly in the cognitive tests, supporting the existence of this speed-accuracy trade-off. Body mass, as a proxy for hunger and, thus, motivation to engage with the experimental apparatuses (56), as well as neophilia did not covary with CCS and therefore did not influence performance in cognitive tests.

Factors predicting body mass
Body mass is one of the best predictors of longevity in several mammalian species (57). Body mass also best predicted mouse lemurs' short-term survival (48,49). However, body mass was not predicted by CCS, but instead by age and exploration, with older and more explorative individuals having a higher body mass. Because explorative tendencies can affect foraging strategies (58,59), more explorative mouse lemurs may potentially gather more resources and might be better at optimizing their foraging strategies than less explorative individuals. Neophilia did not covary with body mass, paralleling results of earlier studies on personality and life-history trade-offs in mouse lemurs (60).

Factors predicting survival
We found that survival was best predicted by body mass, sex, exploration, and CCS. Heavier and more explorative mouse lemurs lived longer after testing, whereas males and individuals that had a higher CCS, i.e., poorer cognitive performance, died sooner. In line with earlier studies in this population of mouse lemurs, body mass and sex predicted survival in the present large sample (48,49,61).
Moreover, more explorative individuals lived longer, echoing findings of a meta-analysis across animals ranging from insects to mammals (62). Whereas heavier and more explorative individuals lived longer, individuals that gathered more accurate information and exhibited better cognitive performance also lived longer. These patterns may reflect a potential speed-accuracy trade-off with alternative strategies yielding similar overall fitness (55).
Rainfall experienced during mouse lemurs' lifetime until testing, neophilia, or age at testing did not influence survival. Because age at testing did not predict survival after testing, the effect of CCS on survival should not be driven by older individuals with poorer cognitive performance. Such an effect might arise if animals become less motivated with age to enter traps. However, mouse lemurs are generally very "trap-happy" (many individuals have been trapped >100 times), and it will now be interesting to determine whether cognitive test performance, exploration, or neophilia influence the probability of recapturing individuals (63,64).
The survival analysis including performance in single cognitive tests revealed that survival was predicted by trend in two (problem-solving and inhibitory control) out of the four cognitive tests. Animals that performed poorer in the problem-solving and inhibitory control task tended to die sooner, mirroring the results of the analysis using the CCS. Because animals may use several cognitive abilities to solve a given problem (29,33,34,38), we think it is reasonable to derive a CCS that weights several cognitive measures equally, although they are not positively interrelated, to operationalize overall cognitive performance in this small test battery. This approach has already been applied in budgerigars and satin bowerbirds (36,37) and was slightly adapted in the present study (see Materials and Methods).
Table 1. Summary statistics. Results of the linear models (LMs) estimating (A) the influence of age at testing, sex, exploration, and neophilia on cognitive performance [composite cognition score (CCS)] (N = 147); (B) the influence of sex, age at testing, CCS, exploration, neophilia, and rainfall on average body mass at testing (N = 147); and (C) the influence of sex, age at testing, performance in the problem-solving, spatial memory, causal understanding, and inhibitory control tasks, as well as exploration, neophilia, and rainfall on average body mass at testing (N = 147).
Table 2. Survival analysis. Results of the Cox proportional hazard model estimating (A) the influence of sex, composite cognition score (CCS), average body mass at testing, exploration, age at testing, neophilia, and average rainfall on survival after testing (N = 147) and (B) the influence of sex, performance in the problem-solving, spatial memory, causal understanding, and inhibitory control tasks, as well as average body mass at testing, exploration, age at testing, neophilia, and average rainfall on survival after testing (N = 147).
In our earlier study, performance in two cognitive tests (problem-solving and spatial memory) did not covary with survival (17), which might be due to the fact that we only considered short-term survival in a much smaller sample than in the present study.

Cognition and fitness
Few previous studies have examined the link between cognitive performance and survival (21). Survival/life span is likely to be a better predictor of total fitness than assessing reproductive success by counting offspring during one or a few seasons, but it is far more challenging to measure. Furthermore, in promiscuous species without paternal care, such as most mammals, cognitive performance is challenging to link to variation in reproductive success because of sex-specific differential investment in reproduction. In addition, many previous animal studies investigating the effects of cognitive performance on fitness used relatively small sample sizes or indirect measures of mate preferences, which may explain why they found effects for one sex but not the other, or in only 1 year but not another, or for only some fitness proxies and not others [(6,65,66), but see (26)]. Studies on humans, in contrast, agree with the key results of our study: Cognitive function and body mass index were both independent predictors of mortality risk in a Chinese population (67), and intelligence scores and fertility of Swedish males were positively correlated (68).
Our study demonstrates that individual differences in cognitive performance can result in differential fitness, but the heritability of cognitive traits remains to be demonstrated. Individual variation in cognitive abilities can only drive adaptive brain size evolution when it causes improvements in fitness that more than compensate for the higher energetic costs of larger brains (69) associated with superior cognitive abilities (70). However, our results do not allow us to draw conclusions about whether selection acts on individual cognitive abilities or on groups of them through some kind of threshold or cumulative effects. Determining individual variation in brain size (66), the neural and developmental underpinnings of cognitive evolution (46), and the exact nature and strength of selection on cognitive abilities remain important next steps. Mouse lemurs, because of their tractability in the laboratory and our ability to monitor them in the field to obtain ecologically relevant measures of fitness, have the potential to be a primate model species for this endeavor (41,46).

MATERIALS AND METHODS
Study site and subjects
We conducted this study in Kirindy Forest, a dry deciduous lowland forest in central western Madagascar located within a 12,500-ha forest concession operated by the Centre National de Formation, d'Etudes et de Recherche en Environnement et Foresterie (CNFEREF) Morondava (71). We captured gray mouse lemurs (N = 198) in a population that has been regularly monitored since 1995 by capturing them on a nearly monthly basis as part of an ongoing long-term study (61,72,73). To do so, we baited Sherman live traps with banana, set them at dusk at trail intersections, and collected them at dawn. Captured mouse lemurs were brought to the nearby field station and individually identified. All individuals were weighed. If they were newly captured, they were briefly restrained with 0.6 μl of ketamine (50 mg/ml) per 1 g body mass to mark them individually with a subdermal microtransponder (Trovan, Usling, Germany) and subjected to several standard morphometric measures, including body mass (72). We estimated an individual's age by determining the number of days between birth and the date of the respective experimental test. We set an individual's birth date to the modal birth date January 1st of the year of its first capture (72). To define death operationally for individuals not recaptured for longer periods, we determined the 95th percentile of the frequency distribution of 10,936 intercapture intervals recorded between 1995 and 2017 as a cutoff point. Accordingly, study subjects were operationally considered dead if they were not recaptured within 161 days. Because mouse lemurs exhibit "trap-happiness" and enter traps regularly (61), we do not assume that some individuals were mistakenly recorded as dead because of individuals becoming trap shy.
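The age estimate and the operational definition of death can be illustrated with a short sketch. It assumes a nearest-rank percentile, which may differ from the percentile method the authors used; all function names and the toy interval data are hypothetical. Only the January 1st birth date convention, the 95th percentile rule, and the resulting 161-day cutoff come from the text.

```python
import math
from datetime import date


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile (an assumed method; the source does not specify one)."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def age_in_years(first_capture_year, test_date):
    """Age at testing, with the birth date set to January 1st of the first-capture year."""
    birth = date(first_capture_year, 1, 1)
    return (test_date - birth).days / 365.25


def presumed_dead(last_capture, census_date, cutoff_days):
    """True if the individual has not been recaptured within `cutoff_days`."""
    return (census_date - last_capture).days > cutoff_days


# Toy intercapture intervals in days (hypothetical, not the 10,936 study intervals).
toy_intervals = list(range(1, 101))
print(percentile_nearest_rank(toy_intervals, 95))

# The study reports a cutoff of 161 days derived from its own interval distribution.
STUDY_CUTOFF_DAYS = 161
print(presumed_dead(date(2017, 1, 1), date(2017, 6, 12), STUDY_CUTOFF_DAYS))
print(round(age_in_years(2016, date(2016, 4, 9)), 2))
```

Keeping the cutoff as a parameter makes it easy to check how sensitive survival estimates are to the chosen percentile.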

Housing and experimental procedure
Experiments were conducted between 2015 and 2019 by keeping animals in temporary short-term captivity at the field station during the dry season (March to November). Mouse lemurs were housed in cages of 80 cm × 80 cm × 80 cm equipped with a nest box, several branches, an experimental platform, and ad libitum access to water. After testing, mouse lemurs were fed with insects and banana. We kept animals for a maximum of three nights, after which they were released back at their site of capture. In total, we tested up to 198 mouse lemurs per task in a total of 1038 tests (17,43). Because the complete experimental test battery was usually not completed during one test session comprising three consecutive nights, we selectively recaptured subjects after they had spent at least three nights back in the forest. In general, mouse lemurs respond to capturing with a short-term increase in fecal glucocorticoid metabolite concentrations (fGCMs) on day 1 or 2 after capture and a decrease to baseline levels on day 2 or 3 after capture (48). However, there was no evidence for long-term consequences of repeated captures on the animal's stress physiology (48). Although we cannot rule out that the initial increase in fGCM levels influenced cognitive performance, it should, however, be similar across individuals because they were all treated in the same way. Because mouse lemurs were only tested when they voluntarily entered and explored the experimental platform, and the majority of individuals participated in the cognitive tests already in the first night after capture, we are confident that performance in cognitive tests was at most only marginally influenced by potential physiological stress and that this putative effect did not introduce any systematic bias.
Testing started between 06:00 and 07:00 p.m. under red light conditions, whose wavelength is not visible for the dichromatic mouse lemurs (74), and ended when the motivation of the animals ceased. The experimental test battery comprised two personality tests (an open field and a novel object test) and four cognitive tests (a problem-solving, spatial memory, inhibitory control, and causal understanding task), for which we used small pieces of banana as food rewards (Fig. 1). The order of the tests was randomized and counterbalanced among subjects. Before any experimental session, we cleaned the experimental platform, the test apparatus, or the respective arena for the open field test and plus maze with 70% ethanol to remove any odor cues. At the beginning of each test, we lured mouse lemurs with a stick covered with banana to the starting position at the opposite end of the experimental platform. Each experimental test session was videotaped (Sony HDR-CX 240) and later analyzed using BORIS (75). For each test, 10% of the videos were double-coded by a second observer naïve to the research question, resulting in a mean interobserver reliability of 95.9% [minimum: 80.2%, maximum: 100%; (43)]. Because it was logistically impossible to re-test all individuals, repeatability estimates were based on small subsamples. Repeatability was calculated by intraclass correlation coefficients.
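A repeatability estimate of this kind can be sketched as follows. The text does not state which form of the intraclass correlation coefficient was used, so the example assumes the one-way random-effects ICC for a balanced design (every individual tested the same number of times); the function name and the data are hypothetical.

```python
from statistics import mean


def icc_oneway(scores):
    """One-way random-effects ICC for a balanced design (assumed ICC form).

    scores: list of per-individual lists, each holding the same number
    (k >= 2) of repeated measurements.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two individuals")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("each individual needs the same k >= 2 measurements")
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    # Between-individual and within-individual mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical test-retest scores for three individuals.
print(icc_oneway([[1, 2], [3, 4], [5, 6]]))
```

Values near 1 indicate that most variance lies between individuals (high repeatability), whereas values near 0 indicate that repeated scores of the same individual differ about as much as scores of different individuals.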

Personality tests
Exploration and neophilia
We assessed an individual's explorative tendencies in an unknown environment, using an open field test with either a rectangular (80 cm × 60 cm × 60 cm; Fig. 1) or cylindric wooden arena (Ø 80 cm × 80 cm). After subjects entered the arena voluntarily, they were observed for 5 min exploring the arena. We used the duration subjects spent locomoting as measure for exploration (movie S5). To assess an individual's neophilic tendencies, we introduced a novel object (either a plastic snoopy or a toy car; Fig. 1) directly after each open field test into the arena and measured how often they contacted the novel object within a 5-min test duration as a measure of neophilia (movie S5). Exploration (time spent locomoting: N = 83, ICC = 0.263) was moderately repeatable, and neophilia (number of contacts: N = 83, ICC = 0.028) was weakly repeatable.

Problem-solving
To assess an individual's problem-solving abilities, we presented a problem-solving box (6 cm × 12 cm; Fig. 1 and movie S1) consisting of six uniform wells (5 cm × 4.5 cm) that were each baited with a small piece of banana, which could be extracted by sliding a lid open. After the box was introduced onto the experimental platform within the subject's cage, the animals had 20 min to extract the six rewards. As a measure of problem-solving abilities, we used the mean time individuals needed to open the lids. If an individual did not succeed at all, we set its success time to 20 min, the maximum test duration. Problem-solving abilities (solving efficiency: N = 20, ICC = 0.035) were weakly repeatable.
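The scoring rule for this task can be written down in a few lines. This is a minimal sketch under the assumption that the score is the mean opening time across the lids an individual opened; how unopened lids of partially successful individuals were handled is not stated in the text, and the function and constant names are hypothetical.

```python
MAX_TEST_SECONDS = 20 * 60  # maximum test duration of 20 min


def problem_solving_score(opening_times_s):
    """Mean time (s) needed to open lids; 20 min if no lid was opened at all."""
    if not opening_times_s:
        return float(MAX_TEST_SECONDS)
    return sum(opening_times_s) / len(opening_times_s)


print(problem_solving_score([30, 90]))
print(problem_solving_score([]))
```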

Spatial memory
To assess an individual's spatial memory, we set up a plus maze with four arms (40 cm × 17 cm × 17 cm) leading to four terminal boxes (20 cm × 17 cm × 17 cm; Fig. 1 and movie S2). Each end box contained a plastic lid at its back, opposite its door, in which we placed a small piece of banana in case of the rewarding location, only visible from the door. For an initial familiarization trial, we baited three end boxes and released the subject at the fourth box, the start arm. The familiarization trial started when we opened the door of the start box and ended either when the subject had eaten all three food rewards or after a maximum of 15 min. In the latter case, we repeated the familiarization trial until successful completion. For the actual test session (15 successive trials), only one end box (=goal box) was baited. As goal box, we chose the box in which we caught the animal at the end of the familiarization trial. Because only the goal box was baited during the test session, we placed a piece of banana peel that was out of reach for the subject on top of each end box to control for olfactory stimuli. Similar to the familiarization trial, a test trial started when we opened the door of the start box and ended as soon as the subject had retrieved the food reward in the goal box. We stopped a session when the subject did not exit the start box within 10 min and continued the session either later in the night or in the following night. We additionally cleaned the maze after every third trial. We rotated the goal boxes throughout the session to prevent the subject from following potential odor cues left inside the goal box. We counted how often a subject entered the wrong arm per trial and calculated the mean number of errors over 15 trials. Spatial memory (mean number of errors: N = 19, ICC = 0.210) was moderately repeatable.

Inhibitory control
To assess an individual's inhibitory control, we conducted a detour-reaching task using the cylinder test design (Fig. 1 and movie S3). Before testing, we conducted a training session with an opaque cylinder. The food reward was placed in the center of the cylinder, invisible to the subject during its approach. To reach the reward, the subject had to take a detour and enter the cylinder through one of the open sides, which were set at a 90° angle to the direction of approach onto the experimental platform within the subject's cage. We additionally removed odor cues after every fifth trial. For the actual testing session of 10 trials, we replaced the opaque cylinder with a transparent one and repeated the cleaning after every third trial; the rest of the experimental setup was the same as in the training sessions. With the transparent cylinder, the subject could see the charcoal-colored piece of banana while approaching the task. It could also smell the banana from the front through small holes in the center of the cylinder. Nevertheless, it had to inhibit an initial response to reach directly through the transparent barrier. Instead, it had to take a detour through one of the open sides to enter the cylinder and access the reward. We used the number of incorrect trials during a session of 10 trials as a measure of inhibitory control. Inhibitory control (number of errors: N = 8, ICC = 0.353) was moderately repeatable.

Causal understanding
To assess an individual's means-end understanding, we conducted a string-pulling task (Fig. 1 and movie S4). To this end, a cable tie was placed onto an external platform attached to the subject's cage in front of the experimental platform within the cage. At the far end of the cable tie, we attached a piece of banana onto a small plate. The inner end of the cable tie reached 5 cm into the cage. Just before we placed the cable tie at its position, we lured the subject to the top wire in the center of the experimental platform. After the cable tie was positioned, the subject had 20 min to pull the end of the cable tie into reach to access the reward. We measured the success latency as the time span between the response latency, i.e., the first orientation toward the string, and accessing the reward. For subjects that did not succeed, we set the success latency to the maximum time of the trial (20 min) plus the response latency. Causal understanding (success latency: N = 14, ICC = 0.587) was repeatable.

Statistical analyses
We conducted all statistical analyses in R (version 4.2.0, R Core Team, 2022). To assess (i) repeatability in cognitive performance, we calculated intraclass correlation coefficients (ICCs) using the package "ICC" (76). We used Spearman rank correlations to investigate (ii) whether performances across cognitive tests correlated positively with each other. We derived a CCS akin to earlier studies on birds (36,37). Those studies derived a CCS either by ranking performance in each task, with a rank of 1 being worst, and then calculating the mean (37) or by splitting the values for each measure of cognitive performance into four ranks (1 to 4), using the distribution quartiles as dividing points, and adding them up (36). Because neither assigning ranks nor splitting measures of cognitive performance into four ranks reflects the measured variance in cognitive performance, we applied an alternative method.
Specifically, we derived a CCS by first rescaling the different measures of performance, i.e., durations or frequencies, from all individuals to a range between 0 and 1 {x_transformed = [x − min(x)]/[max(x) − min(x)]}, which maintained the measured variation in performance (fig. S2). Then, we calculated a mean score for individuals that performed at least three cognitive tests (N = 147). Performances in the problem-solving, causal understanding, and inhibitory control tasks were log-transformed before rescaling to achieve a less skewed distribution. We chose not to use a principal components analysis because performance in one or several cognitive tests may load more heavily on the derived principal components than others, so that performance would not be weighted equally across tasks (31,36).
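The rescaling and averaging steps described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the R code used in the study. The function names, task labels, and data values are hypothetical. The handling of zero values before log-transformation is not specified in the text, so the example uses strictly positive raw values.

```python
import math

def min_max(values):
    """Rescale values to the range 0..1: (x - min(x)) / (max(x) - min(x))."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max scaling is undefined when all values are equal")
    return [(v - lo) / (hi - lo) for v in values]

def composite_scores(scores_by_task, log_tasks=(), min_tests=3):
    """Mean of rescaled task scores for individuals with >= min_tests tasks.

    scores_by_task: {task name: {individual id: raw performance value}}
    log_tasks: tasks whose raw values are log-transformed before rescaling
    """
    scaled = {}
    for task, raw in scores_by_task.items():
        ids = list(raw)
        vals = [math.log(raw[i]) if task in log_tasks else raw[i] for i in ids]
        scaled[task] = dict(zip(ids, min_max(vals)))

    individuals = {i for per_task in scaled.values() for i in per_task}
    result = {}
    for ind in individuals:
        vals = [scaled[t][ind] for t in scaled if ind in scaled[t]]
        if len(vals) >= min_tests:  # individuals with too few tests get no score
            result[ind] = sum(vals) / len(vals)
    return result

# Hypothetical raw data: latencies (s) or error counts per individual.
scores = {
    "problem_solving": {"a": 10.0, "b": 100.0, "c": 1000.0},
    "spatial_memory": {"a": 0.0, "b": 2.0, "c": 4.0},
    "causal_understanding": {"a": 5.0, "b": 50.0},
    "inhibitory_control": {"a": 2.0, "d": 8.0},
}
log_tasks = ("problem_solving", "causal_understanding", "inhibitory_control")
ccs = composite_scores(scores, log_tasks=log_tasks)
```

In this toy data set, individuals "c" and "d" completed fewer than three tests and therefore receive no score. Because each task is rescaled over its own observed range, every task contributes on the same 0 to 1 scale while the relative spacing of individuals within a task is preserved.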
To assess (iii) whether cognitive performance covaried with individual characteristics, we fitted a Gaussian linear model (LM) with the CCS as response and sex, age at testing, exploration, and neophilia as predictors. To investigate (iv) whether body mass at testing was predicted by cognitive performance or personality, we fitted another LM with average body mass as response. Because not all tests were performed in one test session and we had to recapture the animals later, we used the average body mass across test sessions in this model. CCS, rainfall in the year of testing as a proxy for food availability, and age at testing were included as predictors. We additionally fitted the same model by including each cognitive test separately instead of the CCS. Before fitting LMs, we z-transformed quantitative covariates to a mean of 0 and an SD of 1 to facilitate interpretation of predictor estimates. We checked the model assumptions "absence of collinearity" using variance inflation factors [package "car"; (77)] and "absence of influential observations" using dfbetas, and we visually checked that residuals were normally distributed and homogeneous. To test the significance of the predictors as a whole, we compared each full model with the respective null model comprising only the intercept (78).
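The effect of z-transforming a covariate on the interpretation of its estimate can be shown with a minimal sketch. It is written in Python for illustration, uses a single predictor rather than the full models described above, and relies on hypothetical data values and function names.

```python
from statistics import mean, stdev

def z_transform(values):
    """Center to a mean of 0 and scale to an SD of 1 (sample SD, n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def simple_ols(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical values: age at testing (months) and a cognitive score.
age = [2.0, 4.0, 6.0, 8.0, 10.0]
score = [0.2, 0.3, 0.4, 0.5, 0.6]

age_z = z_transform(age)
intercept_raw, slope_raw = simple_ols(age, score)
intercept_z, slope_z = simple_ols(age_z, score)
```

With the standardized predictor, the slope is the expected change in the response per one SD of the covariate (the raw slope multiplied by the covariate's SD), and the intercept is the expected response at the covariate's mean. This puts estimates for covariates measured in different units on a comparable scale.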
We fitted a Cox proportional hazard model [package "survival"; (79)] to assess (v) whether survival after testing was predicted by body mass at testing, sex, CCS, exploration, neophilia, and mean lifetime rainfall as a proxy for food availability. We chose years after testing as the survival metric to account statistically for age effects on cognitive performance and survival because our sample consisted mainly of young individuals. We right-censored individuals that were still alive at the time of census (N = 22) or had potentially dispersed (N = 10), i.e., males that were not recaptured up to an age of 8 months, the age at which they usually disperse (80). Before fitting, we z-transformed covariates to a mean of 0 and an SD of 1 to facilitate interpretation of predictor estimates. We checked for "absence of influential observations" using the package "survminer" and checked for violations of the proportional hazards assumption using the function "testph." Again, we fitted the same survival model by including each cognitive test separately instead of the CCS. Because only 130 individuals performed all four cognitive tests, we additionally fitted a Cox proportional hazard model including the CCS for these individuals separately (table S3).
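How right-censored individuals enter a Cox model can be illustrated with a minimal sketch of the partial log-likelihood for a single covariate. It is written in Python for illustration and is not the implementation of the "survival" package. It assumes no tied event times, and all names and data values are hypothetical. Censored individuals contribute no event term of their own but remain in the risk set of every event that occurs up to their censoring time.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood for one covariate (no tied event times).

    times:  years after testing until death or censoring
    events: 1 if death was observed, 0 if right-censored
    x:      covariate value per individual (e.g., a z-transformed score)
    """
    loglik = 0.0
    for i, (t_i, died) in enumerate(zip(times, events)):
        if not died:
            continue  # censored individuals only appear in risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        loglik += beta * x[i] - math.log(sum(math.exp(beta * x[j]) for j in risk))
    return loglik

# Hypothetical data: the second individual is right-censored at 2 years.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 0, 1, 1]
x = [-1.0, 0.5, -0.5, 1.0]

ll_null = cox_partial_loglik(0.0, times, events, x)
ll_negative = cox_partial_loglik(-1.0, times, events, x)
```

In this toy data set, individuals with lower covariate values die earlier, so a negative coefficient (lower hazard with higher covariate values) yields a higher partial log-likelihood than a coefficient of zero. A fitted model would maximize this quantity over the coefficient; the sketch only evaluates it at two values.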

Supplementary Materials
This PDF file includes:
Figs. S1 to S3
Tables S1 to S3
Legends for movies S1 to S5

Other Supplementary Material for this manuscript includes the following:
Movies S1 to S5