Leaf metabolic traits reveal hidden dimensions of plant form and function

The metabolome is the biochemical basis of plant form and function, but we know little about its macroecological variation across the plant kingdom. Here, we used the plant functional trait concept to interpret leaf metabolome variation among 457 tropical and 339 temperate plant species. Distilling metabolite chemistry into five metabolic functional traits reveals that plants vary on two major axes of leaf metabolic specialization—a leaf chemical defense spectrum and an expression of leaf longevity. Axes are similar for tropical and temperate species, with many trait combinations being viable. However, metabolic traits vary orthogonally to life-history strategies described by widely used functional traits. The metabolome thus expands the functional trait concept by providing additional axes of metabolic specialization for examining plant form and function.


INTRODUCTION
Plants produce a staggering diversity of metabolites-upward of one million throughout the plant kingdom and many thousands within an individual (1) (the metabolome).The plant metabolome is well known to human society, being the source of most cosmetics (2), flavors (3), and medicines (4).Yet, little is known about how the metabolome varies across the plant kingdom.The metabolome is the biochemical basis of physiology, comprising primary metabolites involved in cellular function (e.g., carbohydrates, amino acids, and nucleic acids) and specialized metabolites produced to confront stress (e.g., flavonoids and terpenoids) (1,5,6).The metabolome thus encompasses the biochemical mechanisms through which evolutionary and ecological processes shape plant form and function [i.e., morphology and physiology (7)] (8,9).Plant form and function vary predictably among species because evolutionary and ecological processes generally act to maximize fitness (10)(11)(12)(13)(14), thus limiting the number of morphological or physiological strategies that are viable (7,15,16).It follows that the metabolome should also be constrained to a limited number of strategies that maximize fitness (17).However, although cross-species studies are now emerging (18)(19)(20)(21), no attempt has been made to characterize how the metabolome varies systematically throughout the plant kingdom.We thus remain unable to place the vast diversity of the plant metabolome into a coherent ecological context.
The plant functional trait concept offers a powerful framework to contextualize the plant metabolome.Plant functional traits are a standardized set of morphological and physiological characteristics that provide universal metrics of plant form and function (22).Plant functional traits have been pivotal in identifying the life-history strategies that govern plant fitness (7,15,16); understanding plant impacts on population (23), community (24), and ecosystem (25) processes; and predicting global change effects on ecosystems (26).Plant functional traits thus provide a strong reference point from which to interpret variation in the plant metabolome.However, measurements of the metabolome remain absent from the functional trait concept, partly because they yield complex data describing thousands of metabolites from hundreds of biochemical pathways (17).Advances in chemoinformatics make it now possible to translate metabolite identities into a targeted set of chemical properties (27,28).Doing so resonates with the functional trait concept-i.e., that quantitative characteristics provide a bridge between identity and function (22)-and is an intuitive way to contextualize the metabolome against plant functional traits.Yet, while chemoinformatics is commonplace in natural products chemistry (29)(30)(31), it remains absent from ecology.
Contextualizing the plant metabolome against existing plant functional traits raises two alternative hypotheses.On the one hand, if the metabolome varies colinearly with functional traits, then it offers a biochemical validation of the functional trait concept.This validation is necessary because functional traits are emergent properties of multiple biochemical processes (22,32), making them proxies for physiology that can be poor predictors of ecological processes (33)(34)(35).On the other hand, if the metabolome varies orthogonally to functional traits, then it describes dimensions of plant life-history variation missed by existing functional traits.This orthogonal variation would enhance the explanatory power of the functional trait concept by providing additional axes of trait specialization to examine plant form and function.Either way, integrating measurements of the plant metabolome into the plant functional trait concept will yield insight into how the metabolome varies across the plant kingdom while also offering a high-resolution mechanistic lens with which to advance the functional trait concept itself.
Here, we performed an assessment of the leaf metabolomes of 457 tropical and 339 temperate plant species to characterize variation in the metabolome and test whether it validates, or expands upon, existing plant functional traits.First, we examined the chemical properties of the whole set of identified metabolites (N = 4292) to derive a minimum number of "metabolic functional traits" that capture chemical variation in the leaf metabolome.We then characterized how metabolic traits vary among species to identify major axes of leaf metabolic specialization in the plant kingdom.Last, we combined metabolic traits with eight widely used functional traits (plant height, seed mass, stem-specific density, leaf area, specific leaf area (SLA) and leaf carbon, and nitrogen and phosphorus concentrations; hereafter "classical traits") (7) to determine whether axes of metabolic specialization are colinear with, or orthogonal to, plant life-history strategies described by classical functional traits.

Five chemical properties capture chemical variation in the leaf metabolome
We expressed the leaf metabolomes of tropical (36) and temperate (37) plant species as the presence or absence of annotated metabolic features (i.e., metabolites) detected through liquid chromatography-mass spectrometry (LC-MS) (tropical, N = 3356; temperate, N = 2227).We then characterized the chemistry of the leaf metabolome by examining variation in the chemical properties of all identified metabolites from both datasets (N = 4292) (31).We focused on 21 properties describing metabolite constitution, geometry, or topology that have relationships with (bio)chemical activity (table S1).Pairwise correlations among chemical properties (Fig. 1A, top) reveal that properties separate into five clusters, each representing a distinct facet of leaf metabolite chemistry.The largest cluster (Fig. 1A, green) describes overall size and structural complexity, comprising total numbers of atoms and bonds, molecular weight, a predictor of polarity based on the proportion of noncarbon atoms [M log P; (38)], and related topological parameters [i.e., Wiener numbers (39) and eccentric connectivity (40)].A second cluster (Fig. 1A, purple) describes bond conjugation and reactivity, comprising numbers of aromatic atoms and bonds, numbers of atoms in the largest conjugated (pi-)system, and a derived complexity index indicative of promiscuous activity in drugs [f MF ; (41)].A third cluster (Fig. 1A, yellow) describes polar intermolecular (noncovalent) forces, comprising total and mass-specific polar surface areas and numbers of hydrogen (H) bond donors or acceptors (42,43).These forces are important for the selective binding of small molecules to DNA (44), RNA (45), and proteins (46).A fourth cluster (Fig. 1A, blue) comprises two predictors of metabolite polarity based on summed properties of atoms and bonds [X log P (47) and A log P (48)], where lower values indicate higher polarity.A final cluster (Fig. 1A, red) describes carbon bond saturation via the relative occurrence of sp 3 (saturated, three-dimensional) and sp 2 (unsaturated, planar) hybridization (49).Overall, these clusters indicate that the leaf metabolite chemistry is highly structured and can be described by five distinct facets: size/complexity, bond conjugation/reactivity, contributions to polar intermolecular forces, general polarity, and carbon bond saturation.
The strong collinearity among chemical properties (Fig. 1A, top) suggests that the five facets of leaf chemistry can be represented by a small number of discriminant variables.Indeed, we observe that a principal components analysis (PCA) performed on a subset of five chemical properties, one from each cluster (Fig. 1C), is equivalent to a PCA performed on all 21 properties (PC1: r 4290 = 0.97; PC2: r 4290 = 0.99; PC3: r 4290 = 0.93; P < 0.001 in all cases; Fig. 1B, left).We selected molecular weight, H-bond acceptor count, aromatic atom count, a-polarity (X log P), and carbon hybridization (sp 3 :sp 2 ) as five properties that are both general structural characteristics of leaf chemistry and intuitive for nonspecialists to interpret (Fig. 1C and table S1).However, most combinations of properties covering all facets of leaf chemistry also yield equivalent PCAs (fig.S5).Pairwise correlations between all properties and subset principal component (PC) scores further confirm that selected properties capture observed leaf chemical variation, in that correlations are consistent within each cluster (Fig. 1A, bottom).Selected properties also describe biologically relevant chemical space according to an existing framework that classifies natural products based on size, then aromaticity, and then lipophilicity (PC1: r 4290 = 0.99; PC2: r 4290 = 0.87; r 4290 = 0.81; P < 0.001 in all cases; Fig. 1B, right) (31).The first axis of the subset PCA (38% variance) separates heavy metabolites with many H-bond acceptors from light metabolites with fewer H-bond acceptors.Alignment between metabolite size and numbers of Hbond acceptors is expected because larger molecules usually engage in stronger intermolecular interactions (50).The second axis (37% variance) separates reactive metabolites with more aromatic carbon bonds from unreactive metabolites with more saturated carbon bonds.This axis is also intuitive because conjugated bond systems are unsaturated, reactive, and planar, whereas saturated bonds, such as those abundant in fatty acids, are unreactive and three-  S1) of identified annotated metabolites detected in tropical and temperate datasets (N = 4292).Properties are ordered on the basis of correlation similarity (hierarchical clustering; Ward), with colors separating distinct clusters of properties (maximal silhouette width: K = 5).The bottom panel shows correlations between chemical properties and scores from the first three axes of a PCA performed on a subset of five properties, as well as (B) between subset PC scores and either scores from a PCA performed on all 21 properties (left) or metabolite positions on the first three axes of ChemGPS-NP (31) space (right panel).Circle sizes indicate coefficient strengths, while fills discriminate between positive (black) and negative (white) correlations.(C) Chemical properties selected to represent each cluster, alongside chemical graphs of metabolites with the lowest and highest values for each trait.
dimensional (51).The third axis (18% variance) separates nonpolar metabolites from polar metabolites (29).Our findings thus show that leaf chemical variation can be represented by five chemical properties, one from each facet of leaf chemistry.

Five chemical properties provide more information than metabolite family
The five selected chemical properties not only represent leaf metabolite chemistry (Fig. 1), but also reflect known differences among metabolite families.Using the subset PCA to interpret variation among metabolite families (Fig. 2), we observe that, relative to other identified metabolites, peptides are large, display average bond saturation, and are moderately polar; carbohydrates are of average size, display high bond saturation, and are highly polar; and fatty acids are smaller, display high bond saturation, and are of average polarity.Lignans, alkaloids, coumarins, and flavonoids, families of specialized metabolites, display low bond saturation due to the presence of many aromatic bonds, with alkaloids and coumarins also being relatively small.Terpenoids, a family of abundant specialized metabolites (N = 2306), vary in size, bond saturation, and polarity, echoing their diversity across the plant kingdom (52).Although chemical properties generally differentiate among metabolite families, all families have variable and overlapping chemistries (Fig. 2, marginal boxplots).We thus suggest that chemical properties, quantitative metrics of the chemistry that underpins metabolite composition and activity, provide more precise information about metabolite function than metabolite family identity.Drug discovery and medicinal chemistry rely on many of the same chemical properties to screen for bioactive molecules (29)(30)(31).Such a perspective resonates with the functional trait concept by reaching beyond taxonomy to describe function (22) and provides a strong foundation for a novel set of metabolic functional traits.

Five metabolic functional traits describe two axes of leaf metabolic specialization
We converted selected properties into plant-level metabolic functional traits to examine how the leaf metabolome varies among tropical and temperate plant species.Tropical and temperate datasets were analyzed separately due to differences in data structure (see Materials and Methods).We first compared the volume and "lumpiness" (i.e., aggregation around values) of multidimensional space occupied by metabolic traits to four null models to test whether species converge on a restricted set of interrelated trait combinations (7).Observed hypervolumes of metabolic traits are 98.1 to 99.8% smaller and 71.0 to 94.7% lumpier than those of three null models assuming no trait covariance (tables S2 and S3; P < 0.001 in all cases).However, observed hypervolumes are larger (tropical: P = 0.002; temperate: P = 0.047) than that of the null model assuming trait covariance and selection against extreme values while also being similarly lumpy (tropical: P = 0.662; temperate: P = 0.858).By comparison, observed hypervolumes of eight classical traits are smaller than null models assuming no trait covariance (table S2; P < 0.001 in all cases), either the same size as (tropical species; P = 0.154) or smaller than (temperate species; P = 0.018) that of the null model assuming selection against extreme values, and lumpier than all null models (table S3; P < 0.001 in all cases).Collectively, these findings yield three insights.First, metabolic functional traits are interrelated, so plants are constrained to a limited number of viable metabolic trait combinations.Second, while extreme metabolic trait values may be selected against in temperate species, this is not so for tropical species.Expressing extreme leaf metabolome chemistry is thus a viable lifehistory strategy in tropical environments, potentially due to more intense biotic interactions selecting for a diversity of defensive leaf metabolites and thus divergence of chemical strategies (53).Third, metabolic traits are less aggregated around nonextreme values than classical traits.Hence, although plants converge toward one of several viable classical trait strategies [e.g., woody and nonwoody; (7)], they can display a wider array of leaf metabolome strategies.
We performed PCAs on metabolic traits to characterize major axes of leaf metabolic specialization among species.Metabolic trait variation is consistent among tropical and temperate species, with 92% variation being explained by the first two PCs (Fig. 3).The first axis (tropical: 52%; temperate: 49%) separates species that produce more unsaturated, aromatic, nonpolar leaf metabolites from those that produce more saturated, polar, leaf metabolites.This axis likely describes a spectrum of leaf chemical defense strategies (12).Unsaturated aromatic metabolites, such as alkaloids, coumarins, and flavonoids (Fig. 2), are reactive and serve as toxins or antioxidants in response to stress (54)(55)(56).For instance, conjugated bond structures can interfere with protein function by binding covalently to sidechains (54), generate or quench oxidative stress (55), or absorb damaging wavelengths of light (56).By contrast, saturated nonaromatic metabolites can play different roles in leaf defense, both directly as toxins (57) and indirectly as signaling molecules (58).An alternative explanation is that the first axis reflects an investment (or not) into defense metabolites (12), which could arise due to the high cost of synthesizing aromatic metabolites (13).However, all plants must defend their leaves against stress, and we find that plants can maximize fitness at either end of this axis (Fig. 3).It is thus more plausible that the first axis represents a leaf chemical defense spectrum.
The second PCA axis (tropical: 40%; temperate: 43%) discriminates species that produce larger metabolites with many H-bond acceptors from those that produce smaller metabolites with fewer Hbond acceptors.This axis likely relates to leaf longevity because large metabolites, such as lignans, peptides, and waxes, form the basis of long-lived physical (59,60) and storage (61) structures.Many of these structures also depend on strong intermolecular forces to form stable macromolecules or complexes (62,63), which are enhanced by H-bond donors and acceptors (43).As for the first axis, plants can maximize fitness at either end of the second axis, with species at the positive end of the axis investing in metabolites indicative of a longer leaf life span.Hence, the second axis is probably a manifestation of the fast-slow continuum (64) at the leaf level.Alternatively, the second axis may reflect an investment (or not) into structurally complex metabolites for physical defense, such as lignans or waxes (59,60).However, these metabolites act by creating rigid structures that lengthen the leaf life span anyway (65), so physical protection is one component of leaf longevity.Together, metabolic trait PCAs reveal two major axes of leaf metabolic specialization.These axes relate to metabolite bond saturation and polarity (i.e., a leaf chemical defense spectrum) as well as metabolite size and propensity for intermolecular interactions (i.e., leaf longevity), with many combinations yielding successful life-history strategies.

Axes of leaf metabolic specialization reveal hidden dimensions of plant form and function
The apparent consistency of metabolic trait variation with classical life-history theory (7,12,13,15,16,64,65) raises the possibility that metabolic traits describe aspects of plant form and function already captured by classical traits.We thus performed a PCA on metabolic traits plus the same widely used functional traits used above to test whether axes of leaf metabolic specialization are colinear with, or orthogonal to, trait-based life-history strategies.Considered alone, classical trait variation supports life-history theory (7), with the first two PCs (tropical: 49%; temperate: 51%) describing the leaf economics spectrum [SLA, leaf nitrogen, and leaf phosphorus; (16)] and variation indicative of plant longevity [plant height and seed mass; (64)] and hardiness [stem density and leaf carbon; ( 15)] (fig.S1).However, when combining classical and metabolic traits in the same PCA, the first four PCs (tropical: 66%; temperate: 67%) are equivalent to axes described by individual PCAs (Fig. 4A, lower marginal matrices).The first and third axes describe the first two axes of leaf metabolic specialization (Fig. 3), whereas the second and fourth axes describe the first two axes of classical trait specialization (fig.S1).Pairwise correlations among all traits further show that metabolic and classical traits do not covary (Fig. 4, A and B) but separate into six clusters describing unique aspects of either metabolic or classical trait variation.Hence, major axes of leaf metabolic specialization vary orthogonally to major axes of classical trait variation (fig.S4) and are at least equally relevant for describing variation in plant form and function among tropical and temperate plant species.
Leaf metabolism underpins energy production via photosynthesis (66) but, in doing so, yields tissue that is sensitive to light (67) and temperature (68) and is of high nutritional value to herbivores (65).Plants are thus under selective pressure to maximize photosynthetic rates while also protecting leaf tissue against damage.While this trade-off is consistent with life-history trade-offs derived from classical traits (7,15,16), it does not necessarily follow that individuals apply the same strategy in every organ to maintain fitness.We observe that species with "productive" trait values (e.g., high SLA and high leaf nitrogen) can produce leaves containing many complex aromatic metabolites, species with "hardy" trait values (e.g., high stem density and high leaf carbon) can produce leaves containing many simple polar metabolites, and species with trait values typical of a fast pace of life (e.g., low stature and small seeds) can produce leaves with many large stable or structural metabolites.Our findings thus question classical trait (16) and plant defense (12) theory that predicts relationships between the leaf  chemical phenotype, plant productivity, and pace of life.In short, metabolic functional traits describe unique dimensions of plant lifehistory variation that are complementary to, and independent from, those captured by classical functional traits.

DISCUSSION
In summary, we show that leaf metabolite chemistry is highly structured and can be described by five chemical properties.Using chemical properties as metabolic functional traits reveals that plants vary on two axes of leaf metabolic specialization-a leaf chemical defense spectrum (i.e., bond saturation and polarity) and leaf longevity (i.e., size and propensity for intermolecular interactions).Axes of metabolic specialization are similar in tropical and temperate species, with many combinations of trait values yielding successful life-history strategies.Axes are also consistent with expectations from life-history theory that plants invest strategically in tissue defense (12,13,16) and between a fast versus slow pace of life (7,64).Nevertheless, we find that metabolic trait variation is orthogonal to, not colinear with, classical trait variation.Metabolic functional traits thus reveal dimensions of plant life-history variation missed by existing functional traits.We note that metabolomics data were acquired differently for tropical and temperate species and were matched to open functional trait databases.Nevertheless, we find consistent variation in the leaf metabolomes of 457 tropical and 339 temperate species that originate from distinct geographic distributions and evolutionary histories (fig.S1,  A and B), as well as that encompass most documented functional trait variation (fig.S1G, red ribbons).We thus propose that distilling the leaf metabolome into metabolic functional traits captures macroecological patterns of metabolic variation widely across the plant kingdom, enhances the explanatory power of the functional trait concept, and offers a new set of tools for the discovery of species or genotypes with trait combinations adapted to societal needs.

Sample collection and preparation
We used two existing datasets describing the leaf metabolomes of tropical and temperate plant species (as the unit of observation).The tropical dataset originated from a random subsample of tropical leaves from the Pierre Fabre Sample Library (69), which is an archive of ~17,000 plant samples from in situ communities around the globe that was originally collected for drug discovery and is a registered collection of the European Union (registration code: 03-FR-2020) (70).Samples were oven-dried (55°C for 3 days) and extracted using overnight maceration (8 g of sample and 80 ml of ethyl acetate).Extracts were dried once (Genevac, SP Industries, Warminster, PA, USA) and eluted [30 mg of extract plus 200 mg of silica, 1-g silica solid phase extraction cartridges, and 6 ml of dichloromethane (DCM) plus 85:15 DCM:MeOH] and then dried again and resuspended in dimethyl sulfoxide at a final concentration of 5 mg ml −1 .The temperate dataset originated from a study of landscape-scale ecological variation in phytochemical diversity (37), wherein leaves were sampled from in situ grassland communities across Switzerland.Briefly, leaves were dried (40°C for 5 days), milled (Retsch TissueLyser, QIAGEN, Hilden, Germany), and extracted (20 mg of tissue and 0.5 ml of 80:19.5:0.5 MeOH:H 2 O:H 2 CO 2 ).Extracts were homogenized (glass beads; 30 Hz for 3 min) and centrifuged (14,000 rpm for 3 min), and the supernatant was isolated for measurement.

Metabolomics measurements
Untargeted metabolomics was performed on all samples using ultra high-performance LC-MS (UHPLC-MS).Tropical extracts were analyzed using a Waters Acquity UHPLC system coupled to a Q-Exactive Focus MS (Thermo Fisher Scientific, Bremen, Germany) with a heated electrospray ionization (HESI-II) source in positive mode.Extracts (2 μl) were separated on an Acquity C18 column (50 mm by 2.1 mm, 1.7 μm; Waters, Milford, MA, USA) at a flow rate of 600 μl min −1 using a binary solvent system (A: H 2 O plus 0.1% H 2 CO 2 ; B: acetonitrile plus 0.1% H 2 CO 2 ) and a linear gradient elution (5 to 100% B, 7 min) plus isocratic elution (100% B, 1 min).MS data were acquired in data-dependent acquisition mode, in which MS/ MS scans were performed on the three most intense ions detected during repeated full MS scans [full scans = 35,000 full width at half maximum (FWHM) at mass/charge ratio (m/z) 200; MS/MS scans = 17,000 FWHM; MS/MS isolation window width = 1 Da; and normalized collision energy = 15, 30, and 45 units].The MS was calibrated using a quality control (QC) mix containing caffeine, methionine-arginine-phenylalanine-alanine-acetate, SDS, sodium taurocholate, and Ultramark 1621 dissolved in acetonitrile/methanol/H 2 O plus 1% H 2 CO 2 .Temperate extracts were analyzed using a Waters Acquity UHPLC system coupled to a SYNAPT G2 MS (Waters, Milford, MA, USA) with a heated electrospray ionization (HESI-II) source in positive mode.Extracts (2.5 μl) were separated on an Acquity C18 column (50 mm by 2.1 mm, 1.7 μm; Waters, Milford, MA, USA) at a flow rate of 600 μl min −1 using a binary solvent system (A: H 2 O plus 0.05% H 2 CO 2 ; B: acetonitrile plus 0.05% H 2 CO 2 ) and a linear gradient elution (2 to 100% B, 6 min).MS data were acquired in data-independent acquisition mode, in which all precursor ions across the full mass range (85 to 1200 Da) were fragmented to yield MS/MS spectra.The MS was calibrated using a QC mix derived from all sample extracts.

Metabolomics data preprocessing
Tropical MS data were treated using MZMine (version 2.53) (71).Peak detection was performed using the centroid mass detector (MS noise = 1 × 10 4 ; MS/MS noise = 0) and the ADAP chromatogram builder [minimum scan group size = 5, min.group intensity = 1 × 10 4 , min.highest intensity = 5 × 10 5 , and m/z tolerance = 12 parts per million (ppm)] (72).Peaks were deconvoluted using the wavelets algorithm in ADAP with a signal to noise intensity window (SN) [S/N threshold = 10; min.feature height = 5 × 10 5 , coefficient/area threshold = 130, peak duration range = 0 to 5 min, and retention time (RT) wavelet range = 0.01 to 0.03 min].Isotopes were detected using the isotopes peak grouper (m/z tolerance = 12 ppm, absolute RT tolerance = 0.01 min absolute, and maximum charge = 2), and, where present, the most intense isotope was chosen.Peaks were filtered to retain features having an MS/MS scan and a RT of between 0.5 and 8 min, following which they were aligned using the join aligner method (m/z tolerance = 40 ppm, absolute RT tolerance = 0.2 min, m/z weight = 2, and RT weight = 1).Temperate MS data were treated using MS-DIAL (73).Peaks were detected with a minimum height of 300, and data were collected with an MS1 tolerance of 0.05 Da and an MS2 tolerance of 0.01 Da from 100 to 1200 Da and 0.5 to 12 min.Peaks were deconvoluted with a sigma window value of 0.5 and an MS/MS abundance cutoff of 50.Last, raw features were aligned using the QC mix with a retention time tolerance of 0.1 min.Following this, we separately clustered tropical and temperate metabolic features into sets of spectrally similar "consensus" features using molecular networks constructed on the Global Natural Products Social (GNPS; precursor and MS/MS fragment ion mass tolerances = 0.02 Da) (74,75).We retained edges with a cosine score of more than 0.7 and with more than six matched peaks, permitted edges that connected two nodes only if each node appeared in the neighbor's "top 10" most similar node list, and, where necessary, limited the total size of a cluster to the 100 highest scoring edges.Last, we converted peak height/area data to presence-absence data by summing peak heights/areas at the consensus metabolic feature level and expressing them as a one (present) or zero (absent) (tropical: N = 7343; temperate: N = 6682).

Metabolite annotation
We annotated all metabolic features with chemical information by in silico spectral matching (76) and taxonomically informed scoring (77) using a theoretically fragmented version of the LOTUS resource (78-80), retaining matches with a minimum score of 0.7 and at least six matched peaks.We assigned consensus metabolic features with a single consensus chemical classification, which we derived by taking the most common chemical classification across individual features contributing to a consensus feature.We filtered data to retain those features that were annotated with a metabolite identifier [i.e., SMILES notation; tropical: N = 3356 (46%); temperate: N = 2227 (33%)].We then derived the chemistry of annotated metabolites by using SMILES identifiers to query the Chemistry Development Kit (CDK) (27) with the package "rcdk" (81).The CDK comprises 51 chemical descriptors, but we focused on 21 chemical properties that describe metabolite constitution, geometry, and topology (table S1).We also derived values for annotated metabolites from the first three dimensions of the ChemGPS-NP tool (28), which is an existing framework for classifying natural products that separates molecules based on size, then aromaticity, and then lipophilicity (31).

Classical functional traits
All further data processing, analysis and visualization were undertaken in R (82).For core functionality, we used the R packages "cowplot" (83), "data.table"(84), "drake" (85), "furrr" (86), "future" (87), "parallel" (82), "psych" (88), and "tidyverse" (89) (see below for packages pertaining to specific steps).Plant species names were cleaned against the Catalogue of Life and Encyclopedia of Life using the global name resolver in the package "taxize" (90), following which we added a taxonomic hierarchy using "taxonlookup" (91).We removed 14 tropical (ferns, cycads, and conifers) and nine temperate (ferns and trees) species that were phylogenetically distinct from others within their respective datasets to prevent a small number of divergent species from overwhelming downstream analyses (fig.S2).We obtained data for classical functional traits with proven relevance to plant form and function from the open access version of the TRY database (version 5) (92)-namely, plant height (m), seed mass (mg), stem-specific density (mg mm −3 ), leaf area (mm 2 ), specific leaf area (mm 2 mg −1 ) and leaf carbon, and nitrogen and phosphorus concentrations (mg g −1 ) (7,16,93).After removing clear outliers (reported error risk ≥4; values falling outside of documented limits), we calculated species-wise means and gap-filled missing trait values for all species present in the dataset (N = 46,905 species) using a two-step process.First, we performed a Bayesian hierarchical probabilistic matrix factorization imputation [package "BHPMF" (94)], which is a taxonomically constrained gap-filling approach that has been regularly applied to the TRY database (7,24).The imputation was repeated 90 times, each time starting with a different combination of parameters (per-fold samples = 900 to 1000, cross-validation steps = 10 to 20, and burnin steps = 10% data length) and filtering out extreme (>1.5 times the maximum observed value for a trait) or highly uncertain (coefficient of variation of >1) gap-filled values.We then calculated means of remaining gap-filled values across all iterations and used these to replace missing cases in original trait data.Second, we performed five iterations of multivariate imputation by chained equations [package "mice" (95)] on the partially gapfilled dataset and replaced remaining missing cases with the mean values from all iterations.Following this, we filtered fully gap-filled datasets for selected tropical temperate species (see table S4 for details about missing data) using genus-level mean trait values where species-level data were not available (215 tropical species and 5 temperate species).Note that trait data coverage differed substantially between tropical and temperate species (table S4).While this undoubtedly reduced the strength of observed trait variation in tropical relative to temperate data, it is unlikely that the reduced resolution of the tropical data changed the nature of functional trait variation because artificially reducing the resolution of trait datasets to genus level did not change the resulting PC axes (fig.S3).Last, we accounted for non-normal distributions in plant height, seed mass, and leaf area data using log 10 transformations and summarized trait variation among tropical and temperate species separately using scores from the first two components of varimax-rotated PCAs performed on all traits (fig.S1).

Species distributions
We matched species names to Global Biodiversity Information Facility (GBIF) IDs using "rgbif" (29) and gathered all available species occurrences from the GBIF website (www.gbif.org;29 September 2020; GBIF occurrence download: tropical DOI = 10.15468/dl.824a9z, temperate DOI = 10.15468/dl.6wappm).Working with each species individually, we filtered occurrences to include human observations and living or preserved specimens recorded from the year 1945 to present day and accompanied by either no individual counts or counts with a reasonable value (>100) (96).We used the packages "CoordinateCleaner" (97) to discard records falling in the sea, outside the stated country, or within 10 km of capital cities, GBIF headquarters, or known biodiversity institutions; records lying more than five times the interquartile range of the minimum distances to the nearest neighbor; and records containing known issues with longitude/latitude information (i.e., zeros, rounding errors, or conversion errors from other coordinate systems) (96,97).Following this, we manually removed a further 20 tropical and 4 temperate records representing single occurrences within a continent, leaving a total of 239,223 and 1,251,500 cleaned occurrences for tropical and temperate species, respectively.We plotted cleaned occurrences to visualize the geographical extents of tropical and temperate plant species (fig.S1, A and B), although it is important to emphasize that samples for metabolomics analyses were taken from within natural (not ornamental) distributions.

Leaf metabolite chemistry
We collated all unique annotated metabolites detected in any tropical or temperate species into a single dataset capturing overall variation in the chemical properties of the leaf metabolome (N = 4292).We tested for interrelatedness between metabolite chemical properties using pairwise Pearson correlations coupled with hierarchical clustering (Ward; Fig. 1A, top) and estimated the number of discrete clusters of chemical properties using the maximal average silhouette width (Fig. 1A, dendrogram) (98).We described major dimensions of leaf chemical variation using a varimax-rotated PCA performed on a subset of five representative chemical properties [one from each cluster: C hybridization ratio, H-bond acceptor count, molecular weight, a-polarity (X log P), and aromatic atom count].The first three PC axes described 97% variation, which we anchored back to individual chemical properties using pairwise Pearson correlations (Fig. 1A, bottom).We tested whether the subset PCA was functionally equivalent to a full PCA performed on all 21 properties using two approaches.First, we performed pairwise Pearson correlations between subset versus full PC scores to determine whether the two PCAs separated metabolic features similarly on the same dimensions of chemical variation (Fig. 1B, left; see main text for statistics).Second, we used a Procrustes rotation test [package "vegan" (99); 999 permutations] to determine whether maximally rotated configurations of full versus subset PCAs were significantly (i.e., nonrandomly) associated (m 12 = 0.04, r 2 = 0.98, P < 0.001).We also confirmed that the similarity among subset and full PCAs was not sensitive to which chemical properties were selected from each cluster.This was achieved by repeating Procrustes rotation tests and PC score correlations for PCAs performed on every combination of five chemical properties that covered all five clusters and plotting distributions of resulting coefficients (fig.S5).We performed pairwise Pearson correlations between subset PC scores and values from the first three dimensions of the ChemGPS-NP framework (28) to test for correspondence between the subset PCA and existing descriptors of biologically relevant chemical space (Fig. 1B, right; see main text for statistics).We visualized chemical variation among metabolites using biplots of PC scores from the subset PCA, grouped by biochemical class (Fig. 2).We also illustrated the range of chemical structures captured by the five selected chemical properties by using the PubChem sketcher (https://pubchem.ncbi.nlm.nih.gov//edit3/index.html) to predict chemical graphs of metabolites with the lowest and highest values for each property (Fig. 1C).

Plant-level metabolic functional traits
We selected C hybridization ratio, H-bond acceptor count, molecular weight, X log P, and aromatic atom count to convert into five plant-level metabolic functional traits.For each species (i.e., unit of observation), we calculated the mean value of a chemical property considering all the metabolites present.In doing so, we generated datasets akin those capturing classical functional trait variation among tropical or temperate species but instead quantifying variation in the mean C hybridization ratio, H-bond acceptor count, molecular weight, a-polarity, and aromatic atom count of the annotated leaf metabolome among tropical or temperate species.We considered tropical and temperate datasets as separate entities for data analysis of trait variation, for four reasons.First, while in silico data treatment was identical for aligned MS data, sample collection and LC-MS measurements were not (see above), making it challenging to combine data on species-level metabolic variation without inflating differences between them.Second, unlike genomics and proteomics analyses, where only one type of molecule is isolated, untargeted metabolomics aims to simultaneously isolate thousands of molecules with a wide range of chemical properties (17,100).Technical decisions made during sample extraction and analysis discriminate for or against certain types of metabolites, which filters the resulting view of the metabolome.There is still no single accepted method for extracting samples or acquiring metabolomics data (101), so we chose here to use two datasets generated using different approaches to confirm that findings are not sensitive to the methods chosen.Third, classical functional trait and species occurrence data were sparser for tropical than temperate species (see the "Classical functional traits" section above), so we chose to keep datasets separate rather than artificially reduce the resolution of the temperate dataset.Last, maintaining a separation between the two datasets increased our interpretive power by providing the opportunity to characterize coupling between the metabolome and functional traits that held true not only within but also among tropical and temperate plant species.

Plant multidimensional space
Following (7), we explored constraints on the multidimensional space occupied by metabolic and classical functional traits separately for tropical and temperate species using an observed versus simulated hypervolume approach.Briefly, we calculated an n-dimensional convex hull volume [package "geometry"; (102)] representing the observed multidimensional trait space occupied by species, where n is the number of metabolic (n = 5) or classical (n = 8) functional traits.We compared the observed hypervolume to the same four simulated null hypervolumes used in (7), which tested the following null hypotheses: (i) Traits vary independently and have uniform distributions (i.e., each trait is a unique axis and no selection against extreme trait values); (ii) traits vary independently but have normal distributions (i.e., each trait is a unique axis and selection against extreme trait values); (iii) traits vary independently but distributions are as observed (i.e., each trait is a unique axis but no assumptions about selection on values); and (iv) traits covary as but have normal distributions (i.e., no assumptions about trait interdependence but selection against extreme values).We simulated null hypervolumes 999 times and used permutation tests [package "ade4"; (103)] to test for differences between the overall size of observed versus simulated hypervolumes and presented percent changes between observed and mean simulated hypervolume sizes (where positive = observed > simulated; table S2).We also calculated the "lumpiness" (i.e., aggregation around certain values) of species in multidimensional space by splitting n-dimensional space into 10 bins (i.e., 10 n cells), counting the number of species present in each cell, and calculating the minimum number of cells required to capture 10% of species (7).Again, we tested for differences between observed and simulated null hypervolumes using permutation tests and presented percent changes between observed and mean simulated hypervolumes (table S3).

Axes of leaf metabolic specialization
We characterized major axes of leaf metabolic specialization separately among tropical or temperate species using varimax-rotated PCAs performed on metabolic functional traits.The first two PCs explained 92% variation for both tropical and temperate species, respectively, which we displayed using biplots of PC scores (Fig. 3).We explored whether axes of leaf metabolic specialization (Fig. 3) were orthogonal to, or colinear with, axes of classical functional trait variation (Fig. 4) by doing varimax-rotated PCAs containing metabolic plus classical functional traits, again separately among tropical and temperate species.Four PCs were needed to explain 66 and 67% variation among tropical and temperate species, respectively, which we displayed in two ways.First, we created a biplot matrix of all pairwise combinations of the first four axes (fig.S4) to visualize the lack of overlap between metabolic and classical functional traits.Second, we performed Pearson correlations between PC scores and individual functional traits (Fig. 4, bottom panels).Last, we examined interdependence within and among metabolic and classical functional traits by performing pairwise Pearson correlations (Fig. 4, top panels), clustering correlation coefficients (hierarchical clustering; Ward), and identifying discrete clusters of traits using the maximal silhouette width (Fig. 4, dendrograms) (98).Dendrograms are organized on the basis of the outcome of a tanglegram that tested for correspondence between the clustering of tropical and temperate traits (Fig. 4, central connectors).

Supplementary Materials
This PDF file includes: Figs.S1 to S5 Tables S1 to S4 References

Fig. 1 .
Fig. 1.Variation in leaf metabolite chemistry can be captured by five chemical properties.(A) Coefficients for Pearson correlations among 21 chemical properties (tableS1) of identified annotated metabolites detected in tropical and temperate datasets (N = 4292).Properties are ordered on the basis of correlation similarity (hierarchical clustering; Ward), with colors separating distinct clusters of properties (maximal silhouette width: K = 5).The bottom panel shows correlations between chemical properties and scores from the first three axes of a PCA performed on a subset of five properties, as well as (B) between subset PC scores and either scores from a PCA performed on all 21 properties (left) or metabolite positions on the first three axes of ChemGPS-NP (31) space (right panel).Circle sizes indicate coefficient strengths, while fills discriminate between positive (black) and negative (white) correlations.(C) Chemical properties selected to represent each cluster, alongside chemical graphs of metabolites with the lowest and highest values for each trait.

Fig. 2 .
Fig. 2. Variation among and within eight metabolite families is described by five chemical properties.Biplots of PC1 scores and (A) PC2 scores or (B) PC3 scores from a PCA describing variation in five selected chemical properties (C hybridization, H-bond acceptors, molecular weight, polarity, and aromatic atoms) among all annotated leaf metabolites detected in 862 plant species (N = 4292; line ends: metabolites; centroids: means).The PCA is functionally equivalent to a PCA containing all 21 chemical properties (see main text), with axes separating leaf metabolites based on size (PC1), aromaticity (PC2), and a-polarity (PC3; see Fig. 1A, bottom).Colors separate metabolite families as indicated by marginal boxplot fills/ labels [center: median; box: 25 to 75% quantiles; whiskers: 1.5 × IQR (interquartile range); points: outliers beyond whiskers].

Fig. 3 .
Fig. 3. Metabolic traits describe two axes of leaf metabolic specialization among plant species.Biplots of PC1 and PC2 scores for PCAs describing variation in five metabolic functional traits (arrows) between (A) tropical (brown; N = 457) and (B) temperate (blue; N = 339) plant species.Axes separate species based on the mean aromaticity versus polarity (inverse X log P) and carbon bond saturation (PC1) and mean size/complexity (PC2) of all annotated leaf metabolites present (Materials and Methods).

Fig. 4 .
Fig. 4. Metabolic and classical traits describe unique dimensions of plant form and function.Correlation coefficients for pairwise Pearson correlations among metabolic (Fig. 3) and classical functional traits (fig.S1) in (A) tropical (N = 457) and (B) temperate (N = 339) species.Traits are ordered on the basis of correlation similarity (hierarchical clustering; Ward), with colors separating distinct clusters (maximal silhouette width: K = 6) and horizontal lines between matrices showing correspondence between tropical and temperate traits.Bottom panel shows correlations between traits and scores from the first four axes of a PCA of the same traits (tropical: 66%; temperate: 67%; LES, leaf economics spectrum).Circle sizes indicate coefficient strengths, while fills discriminate between positive (black) and negative (white) correlations.