INTRODUCTION
Breast cancer is a heterogeneous disease, associated with a large range of clinical outcomes within and across molecular subtypes. These disparities are mostly dependent on the distinct ability of individual cancer cells to metastasize to distant organs and to escape standard therapies (
1). Recent studies based on single-cell sequencing have shown that patient samples displayed varying degrees of heterogeneity (
2–
4) and that tumors with a high degree of intratumoral heterogeneity are often associated with poor outcomes because of their likelihood of containing aggressive and drug-resistant clones (
1). However, the dynamics of how seeding occurs in specific organs is still unclear, as are the molecular features of aggressive seeders.
A recent study using a large dataset of matched primary tumors and metastases suggested that cells disseminating early can lack metastasis-specific driver mutations and that the clonality of metastases depends on the organ of dissemination and the presence of treatment (
5). Several cellular and molecular processes may be accountable for the large degree of intra- and intermetastatic heterogeneity observed in breast cancer metastases. First, only a subset of cells present in primary tumors are able to spread to specific organs (
6–
8), and the selective growth of metastases is likely to be determined by an interplay between the intrinsic characteristics of metastatic clones and the properties of the metastatic niche. Second, disseminated cells can genetically evolve independently of primary tumors, a process known as parallel progression (
9), and the genomic, transcriptomic, and epigenetic characteristics of metastases might be influenced by the tumor microenvironment and by standard therapies (
5,
10,
11). Last, metastatic clones can interact within their niche, where both monoclonal and polyclonal seeding have been described (
5,
10,
12,
13). These cellular interactions may confer a survival advantage or disadvantage to cancer cells, depending on whether clonal cooperation or competition is involved (
14).
Understanding the clonal network that underpins tumor spread in different organs is of great clinical interest but remains a challenge when using patient samples. Tumor biopsies are often collected from patients who have previously received multiple rounds of treatment, which is likely to affect the clonality of metastases (
5). Furthermore, needle biopsies may not always provide a complete representation of the overall tumor and metastatic spread. Mouse models in which metastases can be isolated from whole organs are a suitable alternative. In this context, single-cell tracking methods are emerging as a powerful tool to disentangle the complexity of intratumoral heterogeneity, as they provide a unique opportunity to track the fate of individual cancer cells and study the spatiotemporal evolution of their progenies. One such approach, based on genetic barcoding, allowed the detection of cancer cells with differential abilities to extravasate, survive, and grow in different organs. As an example, barcoding of cancer cells from the breast using libraries of thousands to millions of genetic tags has demonstrated that cell lines and patient-derived xenografts (PDXs) contain multiple clones of cancer cells with a wide range of characteristics that confer different cellular fates following orthotopic engraftment (
15–
19). Unexpectedly, primary tumors and metastases were highly heterogeneous despite having different evolutionary life spans and were found to contain both minor and dominant clones at varying frequencies (
15–
18).
Recently, an optical barcoding approach based on a multicolor panel of Lentiviral Gene Ontology (LeGO) vectors was developed to enable better spatiotemporal tracking of individual clones within heterogeneous populations of tumor cells (
20–
23). Despite the limited number of optical barcodes available, a key advantage of this approach over genetic barcoding is the ability to image, quantify, and sort labeled populations by microscopy and flow cytometry at the single-clone level for further molecular characterization. Cell transduction with specific viral titers results in cells expressing specific combinations of fluorescent tags, generating up to 63 colors and allowing the long-term in vitro tracking of labeled cells and their progeny (
22,
24). However, the number of colors that can be used to track cancer cells in vivo is limited. In human cells, the combination of the three LeGO colors red-green-blue (RGB) resulting in seven colors was used to study the relapse of primary tumors in head and neck squamous cell carcinoma (
25) and the spatiotemporal distribution of clones in colorectal primary tumors (
26). However, increasing the number of color combinations that can be detected and quantified in vivo is a challenge because of the spectral overlap of fluorescent tags.
In this study, we used optical barcoding to explore metastatic heterogeneity in the context of triple-negative breast cancer (TNBC), as this molecular subtype is associated with poor prognosis and a higher risk of recurrence in the 5 years following diagnosis, compared to other subtypes (
27). Mounting evidence suggests that clonal heterogeneity plays an important role in metastatic growth in this subtype (
18,
28), with high rates of metastatic spread to visceral organs, such as lungs and liver (
29,
30). To investigate the way TNBC cells interact and adapt to these organs, we optimized the LeGO technology to visualize and quantify in vivo 31 “barcoded clones” (defined here as subpopulations of cells arising from cancer cells and their progenies barcoded with a specific set of fluorescent tags, regardless of their genomic profile). Our results indicate that this cell tracking strategy allows us not only to compare the dynamics of metastatic seeding between organs using high-resolution imaging but also to identify new site-driven gene expression signatures associated with lung and liver metastases.
DISCUSSION
We used optical barcoding to accurately detect, visualize, and quantify minor and dominant barcoded subclones across different metastatic organs in vivo. We found that specific subclones can display different behaviors in two clinically relevant metastatic sites, lungs and liver, in terms of clonal dynamics and transcriptomic profile. Overall, our results suggest that the fitness of the barcoded subclones and their ability to colocalize with other subclones within individual metastases were influenced by the nature of both the subclones and the metastatic niche. We found that the barcode repertoire detected in the lungs is highly heterogeneous compared to the liver, not only at the level of inter-organ heterogeneity, as previously shown (
12,
16–
18), but also at the level of intrametastatic heterogeneity. By mapping the interaction of the barcoded subclones within individual lung and liver metastases, we found that lung metastases were highly polychromatic compared to liver metastases. While abundant subclones were frequently colocalized with those that were less abundant in lung metastases, this was not the case in the liver. This suggests that the tissue-specific tumor microenvironment may dictate the degree of clonal heterogeneity of the corresponding metastases. The relative abundance and colocalization patterns of barcoded subclones within lung and liver metastases were highly reproducible between animals, indicating that these processes were not random. We also confirmed that the difference in polyclonality between lung and liver metastases observed using the barcoded MDA-MB-231 xenografts was corroborated in PDX models. The presence of both monochromatic and polychromatic lung metastases has been previously described in the context of breast cancer (
12,
33). Furthermore, a recent analysis of metastases collected from patients confirmed that untreated metastases in the liver were mostly monoclonal (
5). In this study, however, the number of treatment-naïve lung metastases was insufficient to study differences in polyclonality between lung and liver metastases. While further investigations are required to precisely determine the timing of polyclonal seeding and its importance in metastatic outgrowth, polyclonal metastases are more likely than monoclonal metastases to contain clones capable of surviving standard therapies (
1,
5).
Several mechanisms have been reported to underpin polyclonal seeding in lung metastases, including circulating tumor cell (CTC) clustering and active corecruitment of subclones to distant sites (
42). Although still controversial, the presence of CTC clusters is thought to be associated with poor prognosis (
43,
44), metastasis polyclonality (
12,
33), and increased lung tropism of metastases (
12,
44–
46). In addition, metastatic cells have been shown to compete or cooperate in lungs (
14,
47,
48). Recruitment mechanisms include interleukins (ILs) and growth factors such as IL-11 and vascular endothelial growth factor D, recently shown to favor the polyclonal growth of metastases (
28,
48,
49). In the future, detailed longitudinal studies involving CTC detection, time course analysis of metastasis “patchiness,” and molecular characterization of microenvironmental factors capable of fostering polyclonal metastasis recruitment or growth may fully elucidate the metastasis seeding process in breast cancer models. Regardless, our results indicate that these processes are unlikely to occur in the liver where the vast majority of macrometastases is monochromatic, although the barcoded subclones are the same as those found in the lungs. This disparity may be due to the anatomy of each organ, including differences in the architecture of the vasculature, resulting in different dynamics of extravasation between both organs (
50). Alternatively, this difference could be due to the nature of the tumor microenvironment itself, inducing changes in the transcriptome, proteome, epigenome, and secretome of metastatic cells, resulting in differences in the clonal seeding of metastases.
Here, the optical barcoding system allowed us to identify the transcriptomic changes induced by the microenvironment on metastatic subclones. We derived a niche-associated signature, distinguishing genes differentially expressed in lung versus liver metastases. We confirmed that this signature was sufficient to distinguish samples freshly collected from the lungs and liver of three patients with metastatic breast cancer, showing that its relevance extends to human disease. Some of the genes making up this signature [such as
inhibitor of DNA binding 1 (ID1), c-x-c motif chemokine ligand 1 (CXCL1), and
angiopoietin like 4 (ANGPTL4) in the lungs] were identified previously using serial passages of lung metastases to select clones from the same cell line that have a preferential tropism for this organ (
6–
8). The site-specific gene signature that we derived from barcoded subclones that metastasized to both liver and lungs was conserved after the lung metastasis retransplantation experiment, indicating that these genes represent a feature of niche adaptation by fundamentally plastic cells. Furthermore, genes from this signature were also differentially expressed in normal lung and liver tissues from healthy donors, suggesting that the transcriptional programs selectively displayed by MDA-MB-231 and patient’s breast cancer cells in given metastatic locations shared similarities with those of the microenvironment in which they are located. This indicates that tumor cells have the ability to adapt their gene expression landscape in response to their microenvironment, a feature reminiscent of organ mimicry.
Among these mechanisms of cellular adaptation, we found that the TNFα pathway was most significantly up-regulated in lung compared to liver metastases. The dysregulation of this pathway was confirmed in both retransplantation and autopsy datasets. Some of the genes involved in these pathways may be responsible for the cellular interactions, survival, polyclonality, and proliferation specifically observed in lung metastases.
c-x-c motif chemokine ligand 8 (CXCL8), IL-11, and
IL-6, for instance, which are known targets of the TNFα and inflammatory response pathways and were significantly up-regulated in lungs according to our BSVTK feature set, have previously been shown to foster polyclonal seeding and metastatic growth in lungs (
14,
28,
48,
51,
52). While our results suggest that the involvement of these specific genes may be model dependent, the up-regulation of the TNFα and inflammatory response pathways in lungs was confirmed in autopsy samples and could be due to the presence of TNFα in the lung microenvironment. Although etanercept-mediated TNFα inhibition did not have any significant impact on the survival of the subclones, it significantly decreased the diversity of metastases in lungs. This effect might be mediated by changes in the tumor microenvironment and/or may be directly attributed to changes in the transcriptome of the barcoded subclones. On the other hand, the inhibition of IAPs using birinapant, known to induce TNFα-dependent apoptosis, killed both lung and liver metastases. From a pharmacological point of view, both TNFα inhibitors and SMAC mimetics are likely to be used in the clinic. Anti-TNFα drugs have been proposed to be used in combination with immunotherapy (
53). Because the role of TNFα as an anti- or procancer cytokine is still a matter of debate [as reviewed in (
54)], it will be important to precisely characterize its function in the context of tumor immunology and heterogeneity. Last, birinapant has shown some efficacy in targeting TNBC primary tumors (
55), and our results now suggest that this compound could also be used to target metastases.
In conclusion, we have used an innovative optical barcoding approach to longitudinally monitor the clonal makeup of developing metastases and to identify a set of cellular and molecular features that are directly regulated by the metastatic niche. It will be interesting to investigate whether some of the pathways unraveled by this gene set may contribute to the differential sensitivity of metastatic cells to given drugs. This analysis pipeline holds promise for investigations of the impact of other targeted therapies on clonal cooperation and tumor survival across different microenvironments.
MATERIALS AND METHODS
Cell culture
The human cell line MDA-MB-231 was purchased from the American Type Culture Collection (ATCC; #HTB-26). Cells were maintained in RPMI 1640 medium with Hepes (#22400089, Thermo Fisher Scientific) supplemented with 10% fetal bovine serum (FBS) (Moregate Biotech) and penicillin-streptomycin (10,000 U/ml) (#15140122, Thermo Fisher Scientific) at 37°C in a humidified atmosphere with 5% CO2.
Lentivirus production, purification, and titration
Lentiviruses were produced by transient cotransfection of human embryonic kidney–293T cells with the LeGO vector of interest and the packaging plasmids (LeGO-EBFP2, Addgene, #85213; LeGO-S2, Addgene, #85211; LeGO-V2, Addgene, #27340; LeGO-T2, Addgene, #27342; LeGO-dKatushka2, Addgene, #85214; pCMVR8.74, Addgene, #22036; pMD2.G, Addgene, #12259) (
56), using a FuGENE HD transfection reagent (#E2311, Promega) following the manufacturer’s recommendations. The transfection and virus production were performed in serum-free Opti-MEM (#31985070, Thermo Fisher Scientific). After 48 hours, the supernatant of the cells was collected and concentrated using an Amicon Ultra-15 100-kDa centrifugal filter unit (#UFC910024, Merck). The virus titer was then measured using the fluorescence titrating assay and the formula TU/ml = (Number of cells transduced × % fluorescent)/(Virus volume in ml).
Establishment of the BSVTK-labeled MDA-MB-231 population
Human MDA-MB-231 breast cancer cells (passage 39 from ATCC) were infected with a mixture of five lentiviruses coding for eBFP2, tSapphire, Venus, tdTomato, and dKatushka2. The combination of the five different fluorophores resulted in 31 different colors. Viral titration was predetermined and normalized for each virus to achieve 60 to 75% of positive cells. The cells were harvested and sorted by flow cytometry, 48 hours after the infection. For each color, 100 cells were sorted, and the mixture of the 31 color-coded populations (3100 cells in total) was plated in a 96-well plate [day 0 (D0)]. As the frequency of MDA-MB-231 clones able to engraft in NSG mice was shown to vary between 1 of 7 and 1 of 3333 (
15), we sorted 100 cells per color to increase the purity of individual color-coded populations while maintaining a low likelihood of having multiple “genetically diverse subclones” of the same color successfully engrafting in vivo. This population was then expanded for 18 days (D18) and sorted by flow cytometry but only on the basis of positivity this time to ensure the absence of negative cells. After this second sort, 180,000 cells were plated in a six-well plate and expanded for 10 days (D28) before being frozen into several aliquots to allow multiple experiments to be completed using the same stock. A portion of the cells was kept in culture to observe the population drift.
Cell proliferation and viability assays
MTT assay or CellTiter-Glo was used to determine the proliferation of the cells. For both assays, 1500 cells were seeded per well in a 96-well plate in 100 μl of medium. To avoid the “edge effect” and evaporation, peripheral wells were filled with 200 μl of Dulbecco’s phosphate-buffered saline (DPBS). For each condition, three technical replicates were prepared. For the MTT assay, 10 μl of MTT (5 mg/ml) (#M5655, Sigma-Aldrich) in DPBS was added to each well and incubated for 4 hours at 37°C. To solubilize formazan crystals, 100 μl of solubilization buffer (0.1 N of hydrochloric acid in isopropanol) was added to each well, followed by 1 hour of incubation at room temperature. After homogenization, plates were analyzed with the EnSight Multimode Plate Reader by measuring the absorbance at 590 nm. For the CellTiter-Glo assay (#G7571, Promega), 50 μl of CellTiter-Glo lysis buffer were added to the cells, and the plates were incubated at room temperature with gentle mixing for 15 min. The luminescence was measured using the EnSight plate reader.
Colony formation assays
MDA-MB-231 parental cells and BSVTK-labeled MDA-MB-231 cells were plated at low density (1000 cells per well in six-well plates) with three technical replicates per experiment. After 1 week of incubation, the colonies were stained with crystal violet overnight at 4°C. After wash steps, the plates were scanned and colonies manually counted using FiJi software. For soft agar colony formation, 3 ml of bottom agarose layer, containing 0.75% agarose (#50101, Lonza) in RPMI medium (#23400-021, Thermo Fisher Scientific) was supplemented with FBS and penicillin-streptomycin. Once the bottom layer was solidified, 25,000 MDA-MB-231 parental or LeGO cells were mixed in top agarose layer (0.45% agarose) and seeded on top of the bottom layer. The wells were covered with 3 ml of culture media and incubated 2 weeks at 37°C in a humidified atmosphere with 5% CO2. After 2 weeks, 140 μl of MTT (5 mg/ml) in PBS was added into each well, and the plates were incubated for 4 hours. Last, the plates were scanned and manually counted using FiJi software.
Establishment of the BSVTK-labeled PDX cells
PDX-0066 was established at the Olivia Newton-John Cancer Research Institute (ONJCRI) from a malignant pleural effusion from a patient with breast cancer 1 (BRCA1)–mutated breast cancer, previously treated with Nab-paclitaxel. KCC-P-4295 model, generated from a drug-naïve TNBC primary tumor at the Kinghorn Cancer Centre and Garvan Institute, was obtained from BROCADE. The study was approved by the Austin Health Human Ethics Research Committee. Single-cell suspensions from early-passage PDXs (passage 3 for PDX-0066 and passage 4 for PDX KCC-P-4295) were prepared as described below. The cells were plated at the density of 300,000 cells per well in a 24-well flat-bottom ultralow attachment plate (#734-1584, Corning) and were kept in culture for 48 hours in 300 μl of mammosphere media containing Dulbecco’s modified Eagle’s medium F12 (#10565042, Thermo Fisher Scientific), B-27 supplement 1X (#17504001, Thermo Fisher Scientific), penicillin-streptomycin (100 U/ml) (#15140122, Thermo Fisher Scientific), insulin (5 μg/ml) (#11376497001, Sigma-Aldrich), hydrocortisone (1 μg/ml) (#H0396-100MG, Sigma-Aldrich), heparin (0.8 U/ml) (#H0878-100KcU, Sigma-Aldrich), basic fibroblast growth factor (20 ng/ml) (#01-106, Merck-Millipore), and epidermal growth factor (20 ng/ml) (#E9644, Sigma-Aldrich) in the presence of the BSVTK viruses. Cancer cells were then sorted before transplantation in NSG mice, as described below.
Mammary fat pad transplantation of BSVTK-labeled cancer cells
The BSVTK MDA-MB-231 population was thawed and expanded for 3 days before injection of 100,000 cells into the exposed fat pad from the fourth mammary gland of 7-week-old NSG female mice. The cells were resuspended in 10 μl of injection buffer: 25% growth factor–reduced Matrigel (#734-1101, Corning) 42.5% DPBS (#14190144, Thermo Fisher Scientific), 30% FBS (Moregate Biotech), and 2.5% Trypan blue (Bio-Rad Laboratories) for injection. For the retransplantation experiment, barcoded subclone 2 was isolated from the lungs of a mouse reaching ethical end point after being initially injected with the BSVTK population. Cells were sorted three times to ensure the purity of the population, and 10,000 cells were injected into the mammary fat pads of three NSG mice. For PDX-0066, 150,000 fluorescent cells were transplanted in the mammary fat pads of three mice. For PDX KCC-P-4295, 20,000 labeled cells were injected per mammary fat pad in two recipient mice. All animal procedures included in this manuscript were approved by the Austin Animal Ethics Committee.
Mouse monitoring, tumor resection, and organ collection
Tumor volume was measured twice weekly using electronic calipers. The tumor volume was estimated using the following formula: (minimum diameter)2 × (maximum diameter)/2. Tumors were resected when they reached a volume of about 100 mm3. For all experiments, the resection was completed approximately 28 days after injection. When the tumor was resected, it was cut into two pieces; one half was processed for imaging, and the other half was used for single-cell suspension preparation. The weight of the mice was also measured twice weekly or daily once they lost 10% of their body weight. When the weight loss reached 15%, the animals were euthanized by CO2 excess. To limit the background autofluorescence of tissues for imaging, the animals were transcardially perfused with a freshly made, ice-cold solution containing 0.45% NaCl (Sigma-Aldrich) and 10 U of heparin (Sigma-Aldrich) per milliliter in Milli-Q water. For imaging, three slices of 2-ml thickness were manually cut in three lobes of lungs and three lobes of liver and immediately put in freshly made, ice-cold 4% paraformaldehyde (#28908, Thermo Fisher Scientific) in DPBS. The rest of the organs were kept in ice-cold DPBS before single-cell suspension preparation and flow cytometry analysis.
In vivo drug testing with etanercept and birinapant
Mice were injected with the BSVTK MDA-MB-231 population (as described above), and the primary tumors were resected when they reached 100 mm3 before the treatment. For the etanercept experiment (two independent experiments), the control group received saline as vehicle; the treated group received etanercept (Enbrel) (10 mg/kg), three times weekly until the end of the experiment. For the birinapant experiment (one experiment), the treated group received birinapant (#S7015, Selleck Chemicals) at the dose of 30 mg/kg, three times weekly until the end of the experiment. The mice were euthanized when the control groups died from metastatic disease. Metastases were analyzed by imaging and flow cytometry, as described below.
Single-cell suspension preparation
The tissues were manually chopped into small pieces (about 1 mm by 1 mm) and resuspended in the digestion medium: collagenase IA (300 U/ml) (#C9891, Sigma-Aldrich), hyaluronidase (100 U/ml) (#H3506, Sigma-Aldrich), and deoxyribonuclease I (DNase I) (100 U/ml) (#LS002139, Worthington) in RPMI 1640 medium with Hepes (#22400089, Thermo Fisher Scientific). The tumor was resuspended in 5 ml of digestion medium, the lungs in 5 ml of digestion medium, and the liver in 10 ml of digestion medium. Samples were first incubated for 25 min at 37°C with agitation, then passed through an 18-gauge (G) needle three times, and incubated for another 25 min at 37°C with agitation before being passed through a 20G needle three times. The cell suspension was then filtered through a 70-μm cell strainer and spun down for 5 min at 500g.
Sample preparation for confocal imaging
Fixed samples were washed in DPBS and embedded in 3% low–melting point agarose (#1613111, Bio-Rad Laboratories). Sections (200 μm) were cut using a vibratome (#VT1000S, Leica Biosystems) and mounted on slides (#J1800AMNT, Thermo Fisher Scientific) with high-performance coverslips (#474030-9000-000, Carl Zeiss Microscopy) and antifade mounting medium (#P36961, Thermo Fisher Scientific). Images were acquired using LSM 880 (Carl Zeiss) and LSM 980 (Carl Zeiss) confocal microscopes equipped with 20×/0.8 objectives. Online fingerprinting was used to spectrally unmix the BSVTK fluorophores with reference spectra acquired in cancer cells expressing a single color only. The entire section was acquired as a tile scan, with a resolution of 0.47 μm in xy and 3 μm in z.
Flow cytometry
All samples were sorted/analyzed using a BD FACSAria III cell sorter with BD FACSDiva 8.0.2 software (Becton Dickinson and Company, BD Biosciences, San Jose, CA, USA), equipped with four lasers and capable of detecting up to 17 parameters. Samples were analyzed using the following excitation lasers and emission filters: eBFP2 (405 nm; 450/40), tSapphire (405 nm; 510/50), Venus (488 nm; 530/30), propidium iodide (488 nm; 695/40), tdTomato (561 nm; 582/15), and Katushka (561 nm; 670/14). For sorting, a 100-μm nozzle tip was used with a sheath pressure of 138 kPa and a drop drive frequency of 30 kHz. BD FACSFlow (BD Biosciences, San Jose, CA, USA) sheath fluid was used. The sample and collection tubes were maintained at 4°C.
Subclone identification and measurement in the imaging dataset
First, a 3D median filter (with radius in
xy = 8 and radius in
z = 2) was applied to all channels to reduce the noise and homogenize the intensity. A manually determined hysteresis thresholding was then applied to the five channels, and clone identity was reconstructed according to the signal combination of the five channels. Two thresholds were set, one to determine signal voxels with high confidence and one to determine signal voxels with medium confidence—i.e., above background noise—that may belong to residual noise or signal. The hysteresis procedure will consider medium-confidence voxels as positive if they are connected to high-confidence voxels. Medium-confidence voxels not connected to high-confidence voxels were labeled as background. 3D quantification of subclones was performed using algorithms and tools from the 3D ImageJ suite (
57). For untreated mice, structures with a volume of less than 1000 μm
3 were discarded, this volume being too small to correspond to more than one cell. For the etanercept-treated and control experiments, structures with a volume of less than 100,000 μm
3 were discarded to focus on macrometastases. Individual metastases were defined as populations of directly connected voxels within the dataset. Volumes of each subclone within each metastasis were then computed. Images were stored within an OMERO database (
58); processing and analysis were then automated using a custom-built analysis suite called towards an automated processing and analysis system (TAPAS) (
59). Results were validated by two observers.
Interaction analysis in the imaging dataset
An “interactome” was obtained by computing how often specific pairs of subclones were detected within all metastases. We also computed a “dominant” subclone for each interaction by comparing volumes of the two subclones within the metastasis. Analyses were completed using R software (
60), and interactome visualizations were completed within Cytoscape (
61).
Histology
Samples were fixed in 4% paraformaldehyde for 12 hours before embedding in paraffin. Sections were then stained with hematoxylin and eosin and scanned using an Aperio AT2 (Leica Biosystems).
Single-cell capture for scRNA-seq
To analyze the heterogeneity of the parental MDA-MB-231 population, a cell suspension of MDA-MB-231 was prepared using the TrypLE Express Enzyme (#12604021, Thermo Fisher Scientific). Using a Chromium controller machine (#120223, 10x Genomics) and a Chromium single-cell gel bead kit (#1000075, 10x Genomics), 10,000 cells were captured in gel bead in emulsion. The complementary DNA was then isolated, and the library was prepared as recommended by the manufacturer for the 3′ V3 kit (#1000078, 10x Genomics). Sequencing runs were completed on a NextSeq machine (Illumina) with the following specifications: Read1, 28 cycles; Read2, 91 cycles; Index Read, 8 cycles.
To verify the purity of the metastatic barcoded subclones and to investigate whether they were genomically distinct, subclones 13, 2, 9, 29, and 3 were FACS-purified from lung metastases. The number of cells sorted for single-cell analysis varied between the subclones on the basis of their frequency. For the dominant subclones 13, 2, and 9, 20,000 cells were sorted. For the minor subclones 29 and 3, 7000 and 1000 cells were sorted, respectively.
Identifying genomic heterogeneity using scRNA-seq and inferred CNVs
To investigate the genomic heterogeneity of the original MDA-MB-231 cell line, we (i) probed cells with single-cell mRNA sequencing, (ii) identified cell clusters on the basis of their transcriptional profiles, and (iii) inferred copy number profiles in clusters. For cell clustering, the software suite Seurat3 was used (
62). Briefly, the raw reads were aligned to the hg38 reference genome (
63), and the transcript counts were inferred using CellRanger (10x Genomics: Resolving biology to advance human health, 10x Genomics;
http://10xgenomics.com) with default parameters. The postprocessing analyses were conducted using Seurat3. Droplets were excluded from further analyses if including less than 200 detected transcripts or if they included a fraction of mitochondrial derived reads higher than 0.1. Cell cycle state was inferred for each cell using the CellCycleScoring function. The transcript abundance was adjusted for cell cycle stage and for mitochondrial sequences. PCA was used to reduce the dimensionality of the data. Cells were assigned to clusters using the shared nearest neighbor (SNN) method (
64) using a resolution of 0.7. For copy number inference, the algorithm inferCNV (infercnv, Bioconductor;
https://bioconductor.org/packages/release/bioc/html/infercnv.html) (
65) was used with the parameters scale_data = TRUE and cutoff = 0.1, providing as input the raw transcript abundance matrix as well as cell cluster labels. The dataset gencode_v21 was used as reference transcript location within the genome. The scRNA-seq dataset (GSE113197) including material from four noncancerous epithelial samples was used as a benign reference for inferCNV inference (
66,
67).
To investigate the genomic purity of the five barcoded subclones, we (i) probed cells with single-cell mRNA sequencing, (ii) aligned the sequencing reads to the hg38 reference genome and the transcript counts were inferred using CellRanger with default parameters as described above, (iii) inferred the copy number profile for each cell using inferCNV, and (iv) defined five cell clusters (as five clones were sequenced) on the basis of hierarchical similarity.
mRNA extraction and bulk RNA-seq
mRNA was extracted from sorted cells using an miRNEasy kit (#217084, Qiagen) according to the manufacturer’s recommendations. Briefly, mRNA was isolated using a guanidinium thiocyanate-phenol-chloroform extraction approach. The mRNA is then bound to an exchange column where the remaining genomic DNA was digested by the RNase-Free DNase (#79254, Qiagen) on the column. The RNA was finally washed and eluted in water. The quantity and the quality of the isolated mRNA were assessed using the TapeStation (4200 TapeStation System, Agilent), and 100 ng of the mRNA was used as an input for the library preparation using the TruSeq RNA Library Prep Kit (Illumina) according to the manufacturer’s recommendations. The indexed libraries were pooled and diluted to 1.5 pM for paired-end sequencing (2 × 81 cycles) on a NextSeq 500 instrument using the v2 150 cycle high output kit (Illumina) as per the manufacturer’s instructions. The base calling and quality scoring were determined using Real-Time Analysis on board software v2.4.6, while the FASTQ file generation and demultiplexing used bcl2fastq conversion software v2.15.0.4.
Collection of autopsy samples
Patients were recruited by their oncologists to the BROCADE rapid autopsy program (
www.petermac.org/research/research-cohort-studies/brocade), approved by the Human Research Ethics Committee of the Peter MacCallum Cancer Centre, Melbourne. Patients with advanced breast cancer provided written informed consent. Lungs and liver metastases were collected soon after death for three patients (table S3), and the mRNA was extracted and sequenced as described above.
Analysis of NFκB activation using subcellular fractionation and Western blotting
Metastatic cells were sorted from the lungs and liver as described above. The subcellular fractionation experiments were performed using the subcellular protein fractionation kit for Cultured Cells (#78840, Thermo Fisher Scientific) according to the manufacturer’s instructions. The cytoplasmic and nuclear fractions were then analyzed by Western blotting using the following primary antibodies: anti-p65 (#8242, Cell Signaling Technology), anti-specificity protein 1 (SP1) (#9389, Cell Signaling Technology), and anti–glyceraldehyde phosphate dehydrogenase (GAPDH) (#5174, Cell Signaling Technology). The bands were quantified using Image Lab (V6.0.1, Bio-Rad Laboratories), and the relative nuclear expression of p65 was determined by comparing the liver nuclear expression to the lung nuclear expression, normalized for SP1 expression.
Comparing the transcriptome of barcoded subclones at single-cell resolution
To demonstrate intrasubclone homogeneity and intersubclone heterogeneity, we clustered scRNA-seq from five barcoded subclones (2, 3, 9, 13, and 29) and performed differential expression analysis on pseudo-bulked samples to derive a gene signature. The dimensionality of the log-normalized gene counts was reduced, and to pseudo-bulk each subclone, reads were aggregated across cells into pseudo-bulk samples for each K-nearest neighbor (KNN) cluster. KNN clusters were identified separately for each barcoded subclone (
k = 20) by applying the walktrap method (
68) to SNN graphs. Subclone pseudo-bulk samples were filtered and normalized using edgeR (
69), where samples with low library sizes were discarded or merged and genes required a logCPM greater than or equal to 0.5 in at least 10% of pseudo-bulk samples. As above, differentially expressed genes were identified with
P value and false discovery rate (FDR) thresholds of 0.5, Benjamini and Hochberg (BH) correction, and using a generalized log-linear model. To apply the feature set, log-normalized counts were subset to these differentially expressed genes and visualized using a t-SNE. In
Fig. 2D, genes from the TNF signaling pathway (KEGG) and TNFα signaling by NFκB (hallmark) were used as a feature set. In fig. S7E, genes that are differentially expressed between barcoded subclones 13 and 29 (table S1) were used as a feature set.
Data processing for the LeGO, autopsy, and GTEx datasets
Read pairs were filtered using cutadapt (
70), mapped to the GRCh38 genome using STAR (
71), and quantified using featureCounts (
72). Genes without a current ensemble gene ID were removed. Preprocessing was completed using edgeR (
69), including gene filtering and summing over technical replicates. However, for the autopsy data, genes were only retained if they achieved a count per million (CPM) of 0.5 in at least four samples. Batch effect correction to remove the mouse effect (for the xenograft experiments) was performed using limma (
73), and counts were converted to logCPM with a prior count of 2 for visualization. The GTEx count data downloaded from
https://gtexportal.org/home/datasets were log-transformed and subset to liver and lung samples according to the associated annotation files.
Statistical analyses of the LeGO and autopsy datasets
Three separate subsets of the LeGO dataset were processed and analyzed in parallel, according to the same methodology. The first dataset included all the LeGO primary and metastatic samples, another dataset containing only the lung and liver metastases, and another for the metastases from the retransplanted subclone 2 tumors. Differential expression analyses were completed in edgeR (
60), all of them fitting a generalized log-linear model to uncorrected data using the glmQLfit function.
P values were adjusted using BH correction and were filtered using 0.05 thresholds for both
P value and FDR. Data were scaled for the heatmap visualizations, and mitochondrial genes were excluded.
Competitive gene set enrichment tests were performed in limma (
73) using the hallmark gene set collection from MSigDB (
74). Here, FDR and
P value thresholds of 0.05 were also used for each of the two-group comparisons, after excluding genes that did not map to Entrez Gene identifiers.
Overrepresentation analysis was used to identify GO terms from the “biological process (BP)” subontology that were enriched in three differential expression analyses. Using the same model, primary tumors were compared to lung and liver metastasis separately, and then the metastases were compared to each other. Enriched terms were visualized for the genes up-regulated in the two metastases compared to the primary tumors (636 genes for lungs and 314 for liver), as well as the differentially expressed genes between the metastases (464 for lung and 656 for liver). The GO enrichment plots were generated using the clusterProfiler (
75) by applying BH correction,
P value, and
q value cutoffs of 0.05. The minimum and maximum number of genes in a gene set to be tested were 10 and 500, respectively.
Applying the BSVTK feature set to other datasets
The BSVTK feature set was derived using the second data subset, comparing only lung (650 up-regulated genes) and liver (716 up-regulated genes) metastases, according to the differential expression analysis methods above. To demonstrate that the differentially expressed genes between liver and lung metastases can result in a separation of lung and liver samples in other datasets, a PCA was performed on these data subsets to the BSVTK feature set (1366 genes). This was applied to the retransplantation and autopsy datasets after batch effect correction (logCPM with a prior count of 1) and was also applied to the lung and liver samples from the GTEx (
34) dataset.
Acknowledgments
We are extremely grateful to the donors and their families for choosing to donate samples to the BROCADE program and the ONJCRI-Breast Cancer Biobank. We thank C. Bell, N. Lalaoui, D. Liew, and C. Shembrey for insightful discussions and technical support. We are grateful for the support of the BROCADE Rapid Autopsy Program, funded by the Australian National Breast Cancer Foundation (NBCF) under infrastructure grant IF-14-001. We also acknowledge the Victorian Institute of Forensic Medicine that conducted the rapid autopsies; the Peter MacCallum, Kinghorn Centre, Garvan Institute, and Tobin Brothers Funeral Directors for providing transport of donors; and L. Devereux, who manages BROCADE and who helped with the sample collection and processing. We acknowledge the Kinghorn Cancer Centre and Garvan for the KCC-P-4295 model. The authors and the Olivia Newton-John Cancer Research Institute gratefully acknowledges the generous support of the Love Your Sister Foundation. We acknowledge the ACRF Centre for Imaging the Tumour Environment at the Olivia Newton-John Cancer Research Institute for providing microscopy support. The Olivia Newton-John Cancer Research Institute acknowledges the support of the Operational Infrastructure Program of the Victorian Government.
Funding: D.M., F.H., and B.P. are supported by the NBCF (Investigator Initiated Research Grant IIRS-19-082). D.M. is supported by Susan G. Komen and Cancer Australia (CCR19606878). F.H. is supported by the National Health and Medical Research Council of Australia (grant no. 1164081) and by a Senior Research Grant from the Tour de Cure Foundation. D.M., M.J.D., and B.Y. are supported by the Grant-in-Aid Scheme administered by Cancer Council Victoria. D.M., B.Y., and R.L.A. are supported by the Love Your Sister Foundation. R.L.A. and M.J.D. also acknowledge fellowship support from NBCF. The contents of the published material are solely the responsibility of the individual authors and do not reflect the views of Cancer Australia or other funding agencies.
Author contributions: All the authors were involved in the data analysis and interpretation. J.B., V.C.W., K.L.R., F.H., and D.M. designed the experiments and wrote the paper. J.B. performed the experiments and analyzed the results. V.C.W., A.C.P., and K.L.R. optimized the imaging strategy. T.B. developed the image analysis software. T.B., J.B., and V.C.W. analyzed the imaging dataset. H.J.W., S.M., A.T.P., and M.J.D. analyzed the scRNA-seq and bulk RNA-seq results. A.S., M.M., and F.E.S. helped with the in vitro and in vivo experiments. D.B. provided some expert advice on flow cytometry analysis. B.Y., M.E., and R.L.A. contributed expertise and patient samples. J.W., S.W., and B.P. helped with the sequencing of the samples.
Competing interests: The authors declare that they have no competing interests.
Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. BROCADE PDX KCC-P-4295 and autopsy samples can be provided by BROCADE pending scientific review and a completed material transfer agreement. Requests should be submitted to the BROCADE Manager:
[email protected]. PDX-0066 can be provided by the ONJCRI, depending on availability and completing a material transfer agreement.