Machine learning predictions of T cell antigen specificity from intracellular calcium dynamics

Adoptive T cell therapies rely on the production of T cells with an antigen receptor that directs their specificity toward tumor-specific antigens. Methods for identifying relevant T cell receptor (TCR) sequences, predominantly achieved through the enrichment of antigen-specific T cells, represent a major bottleneck in the production of TCR-engineered cell therapies. Fluctuation of intracellular calcium is a proximal readout of TCR signaling and candidate marker for antigen-specific T cell identification that does not require T cell expansion; however, calcium fluctuations downstream of TCR engagement are highly variable. We propose that machine learning algorithms may allow for T cell classification from complex datasets such as polyclonal T cell signaling events. Using deep learning tools, we demonstrate accurate prediction of TCR-transgenic CD8+ T cell activation based on calcium fluctuations and test the algorithm against T cells bearing a distinct TCR as well as polyclonal T cells. This provides the foundation for an antigen-specific TCR sequence identification pipeline for adoptive T cell therapies.


INTRODUCTION
Adoptive T cell therapies are revolutionizing cancer treatment. In this context, T cells are engineered to redirect their specificity toward cancer antigens with either a chimeric antigen receptor or a predetermined, tumor-specific T cell receptor (TCR). TCR-T cell therapies have the potential to recognize antigens that arise from mutations, fusion proteins, and aberrantly expressed regions of the genome, thereby increasing the breadth of targets (1, 2). However, the identification of tumor-specific TCRs is challenging due, in part, to the need to recognize patient-specific tumor antigens presented in the context of highly polymorphic major histocompatibility complex (MHC) molecules.
Despite recent advances in the field of computational biology, in silico prediction of antigen-specific TCR sequences is still ineffective (3). Current TCR identification platforms rely on in vitro selection of antigen-specific T cells for subsequent TCR sequencing. These techniques often depend on the ability of individual T cells to bind peptide-MHC (pMHC) multimers or their capacity to proliferate and/or express activation markers after in vitro peptide stimulation (4). As such, these methods may introduce biases toward selection of high-affinity TCR sequences, which undergo more robust pMHC binding and proliferation. In addition, recent work has shown that T cells bearing antigen receptors of low and high affinity for a given antigen perform different functions. While T cells bearing high-affinity antigen receptors may, acutely, be more effective in their antitumor activity, they may also be more susceptible to inhibitory receptor-mediated dysfunction and, potentially, off-target cross-reactivity (2, 5-12). Thus, it may be important to consider engineering T cells with a breadth of TCR affinities for optimal therapeutic efficacy.
In this context, antigen-specific T cell identification based on calcium (Ca2+) oscillations downstream of TCR signaling is an alternative approach with notable potential. The kinetics of TCR-dependent increases in intracellular Ca2+ have been well described following in vitro and in vivo T cell activation (13). TCR engagement induces temporal oscillations in intracellular Ca2+ concentrations, with sustained, high intracellular Ca2+ levels associated with strong stimulation. The dynamics (amplitude, rate of oscillation, return to baseline, etc.) of intracellular Ca2+ fluctuations are dependent on the cellular system as well as the characteristics of the interaction between the T cells and the antigen-presenting cells (APCs) (duration of interaction, costimulation levels, cytokine milieu, etc.) (14-18). Furthermore, increases in intracellular Ca2+ concentrations are a proximal readout of TCR activation, occurring within seconds of antigen receptor stimulation, limiting the potential selection biases induced by prolonged interaction with an antigen and expansion of a potentially limited number of clones. Genetic reporters for Ca2+ signaling (e.g., nuclear factor of activated T cells-green fluorescent protein) have previously been used for the isolation of antigen-specific TCR-transduced T cells using a microfluidics system (19, 20), but their use for the discovery of antigen-specific TCR sequences from polyclonal T cells has not yet been achieved. The complexity of TCR-dependent Ca2+ signals and the possibility that TCR-independent processes affect intracellular Ca2+ levels are hurdles for the widespread use of Ca2+ as a marker for TCR activation.
The use of supervised machine learning (ML) tools to process highly complex phenomena is revolutionizing approaches to clinical and fundamental research (21-24). Several studies have shown that these methods can be used to characterize T cell antigen specificity from microscopy-based image datasets by monitoring the interaction of T cells with APCs or the autofluorescence changes that correlate with metabolic state (25-27). We propose to use ML algorithms, trained to identify TCR-dependent Ca2+ fluctuations, to provide a prediction of antigen specificity at the single-cell level.
Here, we present a proof-of-concept study for predicting T cell antigen specificity based on intracellular Ca2+ dynamics. We took advantage of TCR-transgenic T cells of known specificity, intracellular Ca2+ concentration indicator dyes, and simple imaging techniques to train and validate an ML model to accurately and efficiently predict antigen-specific T cells based on intracellular Ca2+ dynamics, which was then applied to polyclonal T cell responses. We show that convolutional neural networks (CNNs) allow for efficient and accurate prediction of T cell activation from intracellular Ca2+ fluctuations at early time points, matching or surpassing other ML approaches. This method also demonstrates the feasibility of training algorithms on monoclonal TCR-transgenic T cells, stimulated with model peptides, for the prediction of antigen specificity in polyclonal T cell responses.

In vitro T cell activation model to track intracellular Ca2+ dynamics
For the purpose of training an ML algorithm that predicts T cell antigen specificity based on Ca2+ dynamics, we developed a simple imaging and analysis pipeline (Fig. 1A). To generate a widely applicable and more physiologically relevant in vitro system, we chose to develop an assay that uses peptide rather than pan-T cell stimulation (e.g., anti-CD3ε/CD28 antibodies or phytohaemagglutinin). With this method of stimulation, polyclonal T cells are poorly suited for training ML algorithms due to the very low frequency of antigen-specific T cells and the inability to know a priori the antigen reactivity of individual cells. Standard ML training requires labeled ground-truth data and balanced datasets, with a similar number of positive and negative cells. Therefore, we used murine monoclonal OT-I and P14 TCR-transgenic naïve CD8+ T cells in combination with lipopolysaccharide (LPS)-matured bone marrow-derived dendritic cells (BMDCs), loaded with chicken ovalbumin (OVA) 257-264 peptide (fig. S1). CD8+ T cells were labeled with a ratiometric Ca2+ indicator dye, Indo-1, where the ratio between Ca2+-free and Ca2+-bound emission wavelengths is indicative of relative intracellular Ca2+ concentration. Unless otherwise noted, both OT-I and P14 T cells were cocultured in the same well at a 1:1 ratio; a vital cytoplasmic stain, CellTrace Far Red (CTFR), was used to label either OT-I or P14 T cells before Indo-1 staining to differentiate the two populations. Images for both Indo-1 and CTFR were captured over a period of 2 hours, beginning a few minutes after the start of the coculture, to monitor intracellular Ca2+ dynamics. An in silico analysis pipeline was generated to automatically identify each cell, track it over time, measure the fluorescence of Indo-1 at each time point, and assign a genotype based on CTFR fluorescence. Thus, we can measure the dynamics of intracellular Ca2+ concentration for individual cells and know their antigen specificity.
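The per-cell measurements described above can be sketched in a few lines. This is an illustration only, not the analysis pipeline used in the study: the function names, the direction of the ratio (Ca2+-bound over Ca2+-free emission, a common convention), and the single CTFR intensity cutoff are all assumptions made for the example.

```python
def indo1_ratio_trace(bound_series, free_series):
    """Return the Indo-1 emission ratio (bound / free) at each time point.

    Assumption: a higher bound/free ratio indicates higher intracellular Ca2+.
    Both inputs are per-frame fluorescence intensities for one tracked cell.
    """
    if len(bound_series) != len(free_series):
        raise ValueError("both channels need the same number of frames")
    ratios = []
    for bound, free in zip(bound_series, free_series):
        if free <= 0:
            raise ValueError("Ca2+-free channel intensity must be positive")
        ratios.append(bound / free)
    return ratios


def assign_genotype(ctfr_intensity, cutoff, stained="OT-I", unstained="P14"):
    """Assign a genotype from CTFR fluorescence using a hypothetical cutoff.

    Which population carries the CTFR label differs between movies, so the
    stained/unstained names are parameters rather than fixed values.
    """
    return stained if ctfr_intensity > cutoff else unstained
```

Swapping the `stained` and `unstained` arguments covers movies in which P14 rather than OT-I cells carry the CTFR label.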
For initial validation of the in vitro assay, we assessed T cell activation using well-established flow cytometric analysis of early (TCRβ down-regulation and CD69 up-regulation) and late (4-1BB expression) markers of T cell activation 3 and 15 hours after coculture initiation. For OT-I transgenic T cells, we find TCRβ down-regulation after 3 hours to be a high-fidelity readout of activation (Fig. 1B and fig. S2). This down-regulation reflects the internalization and degradation of the TCRαβ complex following strong affinity pMHC interaction, as previously described (fig. S2) (28). CD69 and 4-1BB expression, however, show slower kinetics; they not only require more time to be robustly expressed, but there is also evidence of TCR-independent CD69 expression possibly driven by cytokines and LPS in the culture, as has been previously documented (29-32). The distribution of Ca2+ concentration values, considering either all individual time points for all cells (left) or the average of each cell over the entire movie (right), shows a noticeable elevation in Ca2+ concentration only for antigen-specific T cells (Fig. 1C), while nonspecific T cells display baseline intracellular Ca2+ concentrations. Together, these results show that both the in vitro system and the analysis pipeline are appropriate.
Not all antigen-specific T cells up-regulate intracellular Ca2+ during the 2-hour imaging window. Because the development of an effective ML classifier, in principle, requires a high-quality training dataset, these non-activated antigen-specific T cells could potentially interfere with the performance of a predictive model. Therefore, we manually labeled the Ca2+ signals of each cell in the dataset as antigen-reactive or nonreactive based on visual inspection of the Indo-1 fluorescence ratio. Four independent evaluators blindly classified each cell based on its relative Ca2+ levels over time, and a majority vote determined the final assignment of reactivity status; 68.4% of all antigen-specific T cells and 4.11% of nonspecific T cells were labeled as antigen-reactive in the training datasets (Fig. 1D). This suggests that some Ca2+ fluctuation occurs in nonspecific T cells, although this would not ultimately result in productive activation (Fig. 1B). Using manual labeling as a method to classify T cell antigen specificity, we show a false discovery rate (FDR), i.e., the fraction of nonspecific T cells within all cells labeled as antigen-reactive, of 6.26% (Fig. 1E). Two unimodal distributions of average Ca2+ concentrations are observed on the basis of manual assignment of cell status as antigen-reactive or nonreactive. However, nonspecific cells manually labeled as antigen-reactive display an intermediate Ca2+ concentration distribution (Fig. 1F); while difficult for human evaluators to differentiate from antigen-specific T cells, it is possible that nonspecific cells with intracellular Ca2+ levels above baseline have Ca2+ fluctuation dynamics distinct from bona fide antigen-specific T cells. Appropriately trained ML algorithms should thus be able to distinguish these signaling events from TCR-dependent Ca2+ fluctuations. Last, we show a positive correlation between the activation efficiency of each independent culture well, determined by manual assignment of Ca2+ traces and molecular activation markers measured by flow cytometry, both at early (TCRβ: Pearson correlation coefficient, ρ = 0.976) and later (4-1BB: Pearson correlation coefficient, ρ = 0.649) time points (Fig. 1G), further validating the manual labeling process.
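The labeling and FDR definitions above can be made concrete with a short sketch. It is illustrative only: the text does not say how a 2-2 tie among the four evaluators was resolved, so counting a tie as nonreactive is an assumption of this example, as are the function names.

```python
def majority_vote(votes):
    """Return True when a strict majority of evaluators called the cell reactive.

    Assumption: with an even number of evaluators, a tie counts as nonreactive.
    """
    return 2 * sum(1 for v in votes if v) > len(votes)


def manual_fdr(labeled_reactive, is_antigen_specific):
    """Fraction of nonspecific cells among all cells labeled antigen-reactive.

    labeled_reactive: per-cell outcome of the majority vote (bool).
    is_antigen_specific: per-cell genotype-based ground truth (bool).
    """
    flagged = [spec for lab, spec in zip(labeled_reactive, is_antigen_specific) if lab]
    if not flagged:
        return 0.0
    return sum(1 for spec in flagged if not spec) / len(flagged)
```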
Increases in intracellular Ca2+ downstream of TCR engagement induce migration arrest (13, 33-36). Given the importance of migration patterns in other approaches to identify T cell activation (25, 26, 33), we computed the average speed of all cells for each movie. We show that cells manually labeled as antigen-reactive are slower, on average, than those labeled as nonspecific (fig. S3A). In addition, at time points where Ca2+ concentration is low in antigen-reactive T cells (before activation), the average and instantaneous velocities are identical to or above those of nonspecific T cells (fig. S3, B and C).
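For reference, per-cell speed can be derived from tracked positions as follows. This is a minimal sketch assuming 2D centroid coordinates sampled at a fixed frame interval `dt`; the function names and units are placeholders, not those of the study's pipeline.

```python
import math


def instantaneous_speeds(track, dt):
    """Speed between each pair of consecutive frames.

    track: list of (x, y) centroid positions, one per frame.
    dt: time between frames (any unit; the speed unit follows from it).
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    return [math.dist(p, q) / dt for p, q in zip(track, track[1:])]


def average_speed(track, dt):
    """Mean of the instantaneous speeds over the whole track."""
    speeds = instantaneous_speeds(track, dt)
    return sum(speeds) / len(speeds) if speeds else 0.0
```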

Deep learning approaches perform better than conventional methods for the classification of T cell activation based on Ca2+ fluctuations
We systematically tested a non-exhaustive list of ML models that have been extensively used for the classification of one-dimensional (1D) datasets. We divided all experiments into training and evaluation datasets, balancing the number of antigen-specific and nonspecific T cells, as well as the number of cells manually labeled as antigen-reactive and nonreactive. Despite having confirmed that CTFR staining of either OT-I or P14 did not affect the critical parameters of this coculture setup (fig. S4), we also balanced the number of movies with both CTFR staining conditions to prevent models from learning specific features of either condition (table S1). While the training datasets only contain cocultures of TCR-transgenic cells together with BMDCs presenting OVA, the test datasets consist of cocultures with either OVA or lymphocytic choriomeningitis virus gp33 (gp33-41) peptides to prevent overfitting and optimize the applicability of this model to a broader peptide repertoire. As expected, Ca2+ fluctuation of P14 T cells in gp33 cocultures is up-regulated as compared to their nonspecific OT-I TCR-transgenic counterparts (fig. S5). In addition, the manual labeling of the Ca2+ dynamics associated with gp33-stimulated T cell cocultures shows a similar FDR to the OVA cocultures (fig. S5).
To benchmark the algorithms and choose an optimal architecture, we computed the efficiency (fraction of cells correctly predicted as antigen-specific) and FDR (fraction of cells predicted as antigen-specific that are in fact nonspecific) for each model. Assuming that a subset of antigen-specific T cells has not been activated, the model efficiency is calculated by comparing the prediction to the manual labels, rather than the genotype of the cell (OT-I or P14). This allows for evaluation of the efficiency of the ML model to predict antigen specificity independently of the efficiency of the in vitro model used to activate antigen-specific T cells. On the other hand, to generate a model that best predicts whether a T cell is antigen-specific, the FDR compares the prediction to the genotype (i.e., nonspecific T cells predicted as antigen-specific) for a measurement of accuracy. To compare models, the efficiency and accuracy metrics for both OVA and gp33 datasets are used to compute an ad hoc weighted performance metric, allowing a choice of the relative importance of accuracy over efficiency (see Materials and Methods); the model that maximizes this metric is chosen as optimal.
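The two benchmarks can be written down explicitly. In the sketch below, efficiency is scored against the manual labels and FDR against the genotype, as described above. The exact form of the weighted performance metric is given in Materials and Methods and is not reproduced here; the weighted average in `weighted_performance` is a hypothetical stand-in used only to show how two weights could trade efficiency against accuracy.

```python
def efficiency(predicted, manually_reactive):
    """Fraction of manually labeled antigen-reactive cells predicted as antigen-specific."""
    calls = [p for p, m in zip(predicted, manually_reactive) if m]
    return sum(1 for p in calls if p) / len(calls) if calls else 0.0


def false_discovery_rate(predicted, genotype_specific):
    """Fraction of cells predicted as antigen-specific whose genotype is nonspecific."""
    called = [g for p, g in zip(predicted, genotype_specific) if p]
    return sum(1 for g in called if not g) / len(called) if called else 0.0


def weighted_performance(efficiencies, fdrs, w_eff=1.0, w_fdr=1.0):
    """Hypothetical weighted score combining per-dataset efficiency and FDR.

    Assumption: a weighted average of mean efficiency and (1 - mean FDR);
    the study's actual formula may differ.
    """
    mean_eff = sum(efficiencies) / len(efficiencies)
    mean_fdr = sum(fdrs) / len(fdrs)
    return (w_eff * mean_eff + w_fdr * (1.0 - mean_fdr)) / (w_eff + w_fdr)
```

Passing one efficiency and one FDR per dataset (for example, OVA and gp33) and raising `w_fdr` relative to `w_eff` would favor models with fewer false positives.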
We first evaluated the use of a simple thresholding approach to classify each cell as either antigen-specific or nonspecific based on relative intracellular Ca2+ concentration. Setting a first threshold on intracellular Ca2+ levels separating time points with high ([Ca2+]hi) and low ([Ca2+]lo) average Ca2+ concentrations, we computed the time each cell spent in each Ca2+ state. A second threshold is then set; any cell spending more than this amount of time in the [Ca2+]hi state was classified as antigen-specific (fig. S6A). For all possible pairs of thresholds, we computed the efficiency and accuracy of this method on the training dataset. The pair of thresholds that maximized the performance metric was then used to perform classification for the evaluation dataset (fig. S6B). This approach has high efficiency (95.2%) but relatively poor accuracy (FDR = 12.7%) for OVA cocultures (fig. S6C). Furthermore, this method is poorly applicable to cocultures where gp33 is used; despite good accuracy (FDR = 1.48%), a high percentage of antigen-specific cells were not predicted (70.4% efficiency), likely due to differences in average intracellular Ca2+ concentration between OVA- and gp33-specific TCR-transgenic T cells (fig. S6, C and D).
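The two-threshold baseline lends itself to a compact sketch. The version below counts frames rather than elapsed time and accepts any scoring function; plain accuracy could stand in for the performance metric of the study, which is an assumption of this example, as are all names.

```python
def frames_in_high_state(trace, ca_threshold):
    """Number of time points at which the Ca2+ signal exceeds the first threshold."""
    return sum(1 for value in trace if value > ca_threshold)


def threshold_classify(trace, ca_threshold, min_frames):
    """Call a cell antigen-specific if it spends more than min_frames in the high state."""
    return frames_in_high_state(trace, ca_threshold) > min_frames


def grid_search(traces, labels, ca_grid, frames_grid, score):
    """Try every threshold pair on the training data and keep the best-scoring one.

    score: callable(predictions, labels) -> float (higher is better).
    Returns (best_score, ca_threshold, min_frames); the first best pair wins ties.
    """
    best = None
    for ca_threshold in ca_grid:
        for min_frames in frames_grid:
            predictions = [threshold_classify(t, ca_threshold, min_frames) for t in traces]
            value = score(predictions, labels)
            if best is None or value > best[0]:
                best = (value, ca_threshold, min_frames)
    return best
```

The selected pair would then be applied unchanged to the evaluation dataset, mirroring the train-then-evaluate split described above.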
To find a more suitable approach for identifying antigen-specific cells based on intracellular Ca2+ levels, we tested a multitude of models for accuracy and efficiency, going from simpler to more complex architectures and assessing the need for preprocessing and data augmentation (Fig. 2A and table S2). Using the performance metric to choose an optimal model, we show that deep learning algorithms were generally superior to other ML approaches. In particular, CNN-based architectures performed much better than any other method with this dataset (fig. S6E), especially when the structure and training parameters are optimized (see Materials and Methods). The optimized CNN model using manually labeled cells as ground truth performs the best and is used for the rest of this study; it is referred to as optCNNman.
This systematic approach revealed several important insights. First, normalization of calcium concentration across independent experimental days (see Materials and Methods) is a critical factor, improving the efficiency of prediction of gp33 time lapses by over 28% (table S2). Reevaluating the thresholding method with data normalization shows improved performance relative to non-normalized data, but the method still lags behind the CNN (fig. S6F). Second, in this in vitro setup, as opposed to other similar studies, the addition of positional data (i.e., instantaneous cell speed) did not improve classification (table S2). Last, we observed that models trained with either the manual labels (T cells labeled as antigen-reactive versus labeled as nonreactive), the genotype (antigen-specific versus nonspecific T cells), or a combination of both (antigen-specific T cells labeled as antigen-reactive versus the rest) as ground truth all perform relatively well (table S2). When the training parameters are optimized (table S3), all three models show a very similar performance (fig. S6F), and their predictions overlap for 94.5% of the cells in the evaluation dataset (12,421 of 13,145 cells) (fig. S6G). Hence, it appears, for this application, that this architecture is not very sensitive to contamination of the dataset by negative (nonactivated OT-I) cells.
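The overlap figure quoted above is the fraction of cells on which all models agree. A minimal sketch of that computation, with hypothetical names and toy inputs, is:

```python
def agreement_fraction(*prediction_lists):
    """Fraction of cells for which every model makes the same call.

    Each argument is one model's per-cell predictions, in the same cell order.
    """
    if not prediction_lists:
        raise ValueError("at least one prediction list is required")
    n_cells = len(prediction_lists[0])
    if n_cells == 0 or any(len(p) != n_cells for p in prediction_lists):
        raise ValueError("prediction lists must be non-empty and equally long")
    agreeing = sum(1 for calls in zip(*prediction_lists) if len(set(calls)) == 1)
    return agreeing / n_cells
```

With the counts reported in the text, 12,421 agreeing cells out of 13,145 corresponds to `round(100 * 12421 / 13145, 1)`, which evaluates to 94.5.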
For each individual cell, optCNNman provides a prediction probability; a threshold on this probability was used to determine the classification (Fig. 2B). The distribution of the probability of being antigen-specific (P_antigen-spe) for all individual cells is bimodal, but classification of the rare cells that lie in between can markedly change the performance metrics. By varying the P_antigen-spe threshold, above which cells are predicted as antigen-specific, we show that the optimized architecture performs best when using the P_antigen-spe threshold of 0.47 (Fig. 2B).
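A threshold sweep of this kind can be sketched as follows. For brevity, the example scores efficiency and FDR against a single ground-truth vector and ranks thresholds by efficiency × (1 − FDR); both simplifications are assumptions of the sketch, and the toy probabilities are invented for illustration.

```python
def sweep_thresholds(probabilities, truth, thresholds):
    """Return (threshold, efficiency, fdr) for each candidate probability cutoff."""
    results = []
    n_positive = sum(1 for y in truth if y)
    for cutoff in thresholds:
        calls = [p > cutoff for p in probabilities]
        true_pos = sum(1 for c, y in zip(calls, truth) if c and y)
        false_pos = sum(1 for c, y in zip(calls, truth) if c and not y)
        eff = true_pos / n_positive if n_positive else 0.0
        fdr = false_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
        results.append((cutoff, eff, fdr))
    return results


def best_threshold(results):
    """Pick the cutoff maximizing efficiency * (1 - FDR) (assumed criterion)."""
    return max(results, key=lambda r: r[1] * (1.0 - r[2]))[0]
```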
The receiver operating characteristic curve shows the high sensitivity and specificity of the model with an area under the curve (AUC) of more than 0.95 (Fig. 2C). More specifically, optCNNman has high efficiency for both OVA and gp33 cocultures (94.1 and 88.6%, respectively) and low error rates (6.26 and 2.90%, respectively) (Fig. 2D). To further facilitate comparisons of performance between datasets, we used the metric efficiency × (1 − FDR) for each individual field of view; this metric shows an overall performance that is nearly identical for both conditions (Fig. 2D). Furthermore, these predictions are similar to the predictions made by the human evaluators (Fig. 2E). The intracellular Ca2+ concentration of the nonspecific cells mispredicted as antigen-specific overlaps with those of nonspecific T cells manually labeled as antigen-reactive (Fig. 2F). Given the low prediction probability assigned to these cells (Fig. 2G), using a more restrictive threshold on P_antigen-spe would likely remove a large number of the false positive predictions, at the cost of reduced efficiency.
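For readers who want to reproduce these summary statistics on their own predictions, the AUC can be computed directly from its rank interpretation (the probability that a randomly chosen positive cell scores higher than a randomly chosen negative one). The sketch below uses that standard definition together with the per-field-of-view metric efficiency × (1 − FDR); function names are placeholders.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def field_of_view_score(efficiency, fdr):
    """Combined metric efficiency * (1 - FDR) for one field of view."""
    return efficiency * (1.0 - fdr)
```

The pairwise loop is quadratic in the number of cells, which is acceptable for datasets of roughly ten thousand cells; a rank-based implementation would scale better.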
In an effort to validate the prediction algorithm at the single-cell level, i.e., by correlating the dynamics of Ca2+ fluctuation with indicators of T cell activation, we investigated motility changes in T cells predicted or not to be antigen-specific and as they relate to intracellular Ca2+ concentration. Using individual T cell trajectories from the analysis pipeline (Fig. 1A), we computed the instantaneous (between two frames) and the average (over the entire movie) velocity and compared cells predicted as antigen-specific to those predicted to be nonspecific. We show that, on average, antigen-specific T cells predicted as antigen-specific are slower than those predicted as nonspecific, as expected given their increased intracellular Ca2+ concentration (fig. S7, A and B) (33-36). At the single-cell level, we show that intracellular Ca2+ concentration and speed are inversely correlated, and a decrease in cell motility is detected as the cells presumably encounter cognate antigen (fig. S7C). We collated all 2443 cells that were predicted as antigen-specific by optCNNman and for which we could identify the initial spike in intracellular Ca2+ concentration, when this occurred during the imaging period. Comparing cell velocity with respect to this time point, we show that cell arrest is associated with an initial spike in intracellular Ca2+ concentration in cells predicted to be antigen-specific (fig. S7D).
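Aligning speed to the initial Ca2+ spike can be illustrated as below. How the initial spike was identified is not specified in this section, so the first crossing of a fixed threshold is used here purely as an assumption, and the window size and names are hypothetical.

```python
def first_spike_frame(trace, threshold):
    """Index of the first frame whose Ca2+ signal exceeds threshold, or None."""
    for index, value in enumerate(trace):
        if value > threshold:
            return index
    return None


def mean_speed_around_spike(speeds, spike_frame, window):
    """Mean speed in the `window` intervals before and after the spike frame.

    speeds[i] is the speed between frame i and frame i + 1.
    Returns (mean_before, mean_after); a side with no data yields None.
    """
    before = speeds[max(0, spike_frame - window):spike_frame]
    after = speeds[spike_frame:spike_frame + window]

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(before), mean(after)
```

Averaging the before and after values over all cells with a detectable spike would give a population-level view of arrest relative to spike onset.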

Biological validation of the Ca2+-based deep learning algorithm to predict antigen specificity
We next sought to validate optCNNman using an alternative approach. By altering the parameters of the BMDC:T cell coculture to modulate activation efficiency, we investigated how efficiently optCNNman can predict activation in suboptimal conditions. To manipulate the antigen availability in each culture condition, we mixed antigen-presenting (OVA- or gp33-loaded) BMDCs and BMDCs presenting only endogenous peptides at different ratios. The lower absolute number of peptide-loaded APCs should lead to an increase in the time required for T cells to find cognate antigen, which translates, given the short imaging window, into a smaller fraction of activated cells. Using TCRβ down-regulation as well as CD69 and 4-1BB up-regulation after 3 and 15 hours of incubation as indicators of activation, we show a positive correlation between the cells predicted as antigen-specific by optCNNman and the frequency of cells expressing these activation markers (Fig. 3A). Furthermore, we show, as anticipated, a dose-dependent relationship between the frequency of T cells predicted as antigen-specific and antigen availability (Fig. 3B). Similarly, decreasing the number of BMDCs in the coculture will reduce the probability of BMDC-T cell encounter and also leads to a reduction in the number of T cells predicted as antigen-specific (Fig. 3B).
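A correlation of this kind between per-well prediction frequencies and per-well marker frequencies can be quantified with the Pearson coefficient. The sketch below implements the standard formula; the assumption that one value per culture well is compared, and the function name, are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)
```

Here `xs` would hold, for each well, the fraction of cells predicted as antigen-specific, and `ys` the fraction of cells positive for a given activation marker.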
Given that the training dataset was generated with a TCR-pMHC pair with high affinity (Kd = 3.7 ± 0.7 nM) (37), we sought to evaluate the effectiveness of optCNNman for predicting the antigen specificity of T cells activated by lower affinity TCR-pMHC interactions. We stimulated naïve OT-I T cells with OVA (N4) altered peptide ligands (APLs; Q4 and T4) of decreasing affinity for the OT-I TCR (N4 > Q4 > T4) while keeping peptide concentration constant. We detected activation of OT-I T cells via flow cytometry, as indicated by CD69 and 4-1BB up-regulation after 3 and 15 hours, respectively (fig. S8A). As seen for stimulation of P14 T cells with gp33 (fig. S5A), lower avidity TCR engagement does not induce strong TCRβ down-regulation (fig. S8A). Despite subtle differences in the average intracellular calcium levels (fig. S8B), optCNNman efficiently and accurately predicts antigen specificity over a wide range of physiologically relevant TCR-pMHC interactions (Fig. 3C).
Because it is not possible to know a priori the antigen specificity of individual naïve polyclonal CD8+ T cells, the validation of the model on polyclonal responses to antigenic peptides is challenging without extensive experimental confirmation. Thus, we used a mixed lymphocyte reaction (MLR) that typically leads to a larger fraction of T cells being activated in a polyclonal fashion as compared to antigen-specific T cells. We cocultured C57BL/6J CD8+ T cells with MHC-matched C57BL/6J (autologous) or MHC-mismatched BALB/c (allogeneic) BMDCs (Fig. 4A). Using molecular markers of activation measured by flow cytometry after 3 and 20 hours, we show that allogeneic culture conditions lead to a higher fraction of T cells expressing activation markers than in autologous conditions. Although apparent as early as 3 hours, monitoring of activation by flow cytometry is more efficient after 20 hours of coculture (Fig. 4B and fig. S9).
Cell surface TCRβ down-regulation is not an obvious marker of T cell activation in the MLR setting (Fig. 4B). Using optCNNman to predict T cell activation based on Ca2+ fluctuations in this polyclonal system, we also show that T cells cultured in allogeneic culture conditions have a higher frequency of cells predicted as antigen-specific than when T cells are cultured with autologous BMDCs (Fig. 4C). In addition, there is a strong correlation between the ML predictions and the flow cytometry markers of activation, particularly after 20 hours of culture, confirming the accuracy of prediction (Fig. 4D). The distribution of intracellular Ca2+ concentrations in polyclonal T cells predicted as antigen-specific in the MLR is much wider than that of the monoclonal T cell populations used earlier (Fig. 4E), with an average calcium concentration closer to that of Q4-stimulated OT-I; this may be due to the wider range of affinities in the polyclonal T cell repertoire and differences in alloreactive TCR-pMHC binding biomechanics (38). Together, these data show the applicability of optCNNman, trained on monoclonal T cells responding to a single high-affinity peptide, for the prediction of responses to additional peptides as well as polyclonal T cell responses. Thus, simple models of T cell activation can be used to train ML architectures to recognize general features of Ca2+ fluctuation, which are common to T cell responses across a wider range of TCR-pMHC affinities.

DISCUSSION
The rapid identification of antigen-specific T cells from naïve polyclonal T cells presents a unique challenge due to the lack of reliable early markers of TCR-specific T cell activation before proliferation. Here, we demonstrate the feasibility of using the time-dependent fluctuations of intracellular Ca2+ concentration in individual T cells, a TCR-proximal readout, as a means to identify their antigen specificity. While increases in intracellular Ca2+ may not be strictly TCR-specific, we propose that ML algorithms, trained on T cells of known specificity and activated in an antigen-specific manner, can learn the features of Ca2+ fluctuation associated with TCR-pMHC engagement. We show that, once trained on monoclonal T cell responses, this model can be applied to predict activation of T cells over a relatively broad range of TCR-pMHC affinities and polyclonal T cells. We believe that this approach could be used to predict the antigen specificity of polyclonal T cells activated with specific peptides or peptide pools. The performance of deep learning models, once trained, is independent of the frequency of the event being predicted, i.e., antigen specificity predictions for T cells at a 1:1 antigen-specific:nonspecific ratio (as shown here) will be as efficient as at the much lower ratio of antigen-specific T cells in the polyclonal T cell population.
As compared to other methods of antigen-specific T cell identification, we suggest that the monitoring of intracellular Ca2+ signaling is a much faster and simpler approach to the identification of antigen-specific T cells. Although it does not require T cell stimulation, thereby bypassing the delay in the modulation of activation marker expression, pMHC multimer-based enrichment requires the engineering of a new reagent for every individual peptide/MHC combination, some of which may be problematic to manufacture (39). Early T cell activation is also challenging to measure by flow cytometry due to the absence of appropriate markers. We show that, while early (3 hours) surface TCRβ down-regulation and CD69 expression are useful for select TCRs in the in vitro model used here, they were not useful in the more physiologically relevant polyclonal MLR cultures and for lower avidity TCR-pMHC interactions. 4-1BB, although more specific, appears to be up-regulated much later and only maximally after some proliferation has occurred, limiting its usefulness for early isolation of antigen-specific T cells and introducing a bias from preferential expansion of high-affinity T cell clones. In comparison, increases in intracellular Ca2+, a very early indicator of the TCR signaling pathway, allow for rapid (within a 2-hour time frame) and accurate identification of antigen-specific T cells. The simple nature of the coculture setup, commonly used for antigen-specific T cell activation, allows for flexibility in the target antigen (e.g., cancer antigens) loaded and MHC/HLA-restriction, by varying the source of APCs. It is important to point out that the dynamics of Ca2+ fluctuations observed with this coculture system are not necessarily representative of those observed with physiological models (e.g., lower peptide concentration or in vivo activation). The decision to use high concentrations of peptides was based on the goal of optimizing antigen-specific T cell discovery rather than mimicking physiological Ca2+ dynamics; high concentrations of peptide, particularly in the case of low affinity interactions, allow for broader activation of T cells.
Recent studies have also investigated the use of imaging-based technologies and ML to predict antigen specificity, analyzing either the dynamics of interaction between antigen-specific T cells and APCs or the changes in metabolic state associated with pan-T cell activation (25-27). With an AUC of more than 0.95, the performance of the approach presented here is similar to, if not better than, these previously published studies. In addition, the stimulation of naïve CD8+ T cells with peptide-loaded BMDCs, rather than pan-T cell stimulation, ensures that the ML models learn features of Ca2+ fluctuation, which are generated during TCR engagement. This study also demonstrates the ability to predict T cell reactivity across a range of TCR-pMHC affinities.
In terms of experimental complexity, the only requirements are simple, inexpensive, and widely accessible fluorescent dyes and labware for conventional fluorescence microscopes, which represents a low cost of entry for using this technology in downstream applications. The use of the ratiometric Indo-1 dye rather than genetically encoded reporters facilitates the application of this technique to a much wider variety of monoclonal T cell models and translation to the human system. Extraction of Ca2+ fluctuations from these movies and the training of the 1D ML network also have the major advantage of requiring very little computing power and can be replicated with any desktop computer. Here, we made use of naïve T cells for the evaluation of Ca2+ fluctuations. Naïve T cells, as opposed to antigen-experienced T cells, are not restricted in their TCR repertoire as may be the case after clonal expansion and may allow for the identification of TCRs with a wider range of affinities for a peptide of interest. Furthermore, naïve and antigen-experienced T cells and even different antigen-experienced T cell subsets display distinct Ca2+ fluctuation patterns in response to TCR stimulation (40, 41). The use of purified naïve T cells, although themselves heterogeneous in nature (42), should allow for a more homogeneous and reproducible Ca2+ response to TCR stimulation, compared to antigen-experienced T cells.
In this proof-of-concept study, we demonstrated that performant ML algorithms can be trained on Ca2+ fluctuations in activated monoclonal T cells to predict polyclonal T cell responses, using a limited amount of data (~10,000 cells). Substantially increasing the size of the dataset with additional time lapses or through AI-assisted methods may further improve model performance, especially when it comes to differentiating the distinct pattern of Ca2+ fluctuation associated with nonspecific T cells mispredicted as antigen-specific from that of bona fide antigen-specific T cells (43–45). Furthermore, it has been previously demonstrated that Ca2+ oscillations contain information about the affinity of TCR-pMHC interactions; lower affinity TCR engagement will typically lead to more transient Ca2+ fluctuations and distinct early activation dynamics (14–16). Using a similar approach, we postulate that generating Ca2+ fluctuations from monoclonal T cells following TCR stimulation with pMHC of varying affinity at lower peptide concentrations should enable training of ML models to recognize specific features of Ca2+ fluctuation associated with low versus high affinity TCR binding to antigen. It has already been shown that an ML model, trained on the dynamics of cytokine release by T cells following stimulation over several weeks, can predict the antigen affinity of individual T cells (46). The use of intracellular Ca2+ dynamics would fast-track and simplify this approach. Notably, in the experiments presented here, we observed only modest differences in Ca2+ fluctuations when OT-I T cells were activated with APLs. However, it is important to note that the high peptide concentration used in this study, while optimizing the identification of low affinity antigen-specific cells, may mask differences in Ca2+ fluctuations between different stimulation conditions.
Combining ML approaches for the identification of antigen-specific T cells with technologies for their isolation will allow T cells of interest to be recovered for downstream single-cell TCR sequencing, identifying clinically relevant TCR sequences for use in adoptive therapy. Few methods allow for the isolation of single cells of interest following microscopy-based observations in a high-throughput and automated fashion. A few studies have demonstrated the use of microfluidics, micropipettes, and/or a microraft apparatus for the isolation of antigen-specific T cells, but these devices may be challenging to manufacture and are relatively low-throughput technologies (19, 47–50). However, we and others have recently described methods to tag and/or isolate cells with high specificity under the microscope using the targeted illumination of cells of interest (51–55). Particular attention will need to be paid to the compounding of errors at the various steps of the pipeline (tracking, ML prediction, and barcoding) to avoid contamination by nonspecific TCR sequences, which would increase the time required for downstream biological validation of identified TCR sequences. The weights w_eff and w_FDR used in the performance metric, which selects the optimal model for antigen-specific T cell prediction, can be adjusted to favor models that reduce the FDR or increase the efficiency of identification, depending on the downstream application. In addition, the advent of fast and reliable in vitro and in silico pipelines for TCR-pMHC screening mitigates the risk of contamination (56,57). Furthermore, the possibility of identifying TCR sequences with a specific affinity, fine-tuned either for acute antitumor activity (high affinity) or for longer-lasting, broader immune surveillance with reduced side effects (lower affinity), could facilitate improvement in the quality of care for patients requiring adoptive T cell therapy (7,9).
For T cell isolation, a cell suspension is harvested via physical dissociation from OT-I and P14 spleen and lymph nodes (male and female, 6 to 12 weeks old). Naïve CD8+ T cells are further isolated using a magnetic enrichment kit according to the manufacturer's specifications (STEMCELL Technologies, catalog no. 19858). OT-I or P14 cells are stained with 2 μM CTFR (Invitrogen, catalog no. C34572) at 10⁶ cells/ml for 15 min at 37°C and rested for 15 min at 37°C, 5% CO2 before being pooled. Unless otherwise stated, OT-I and P14 cells are mixed at a 1:1 ratio. The cell suspension is then stained with 10 μM Indo-1 for 30 min at 37°C and rested for 30 min at 37°C, 5% CO2. The isolation and staining procedures are identical for C57BL/6J and BALB/c naïve CD8+ T cells, but the T cells are kept separate at all times and are not stained with CTFR. For MLR experiments, sex-matched T cells and BMDCs are cocultured to avoid immune responses against sex-determining region Y (SRY).

Prediction of T cell antigen specificity
All code used for the prediction of T cell specificity was written in MATLAB (MathWorks).

In silico analysis pipeline
From raw images, segmentation and tracking of T cells are performed by adapting previously published methods (62–64). Briefly, in each frame, the centroid of each T cell is localized from the sum of the Indo-1 images using a combination of edge detection and watershed segmentation methods. On the basis of the position of all cells at each time point, we calculate individual T cell trajectories using particle tracking-based methods, adapted for our particular application. Filtering of tracks based on length (at least half of the duration of the movie) ensures that all cells in a time lapse are independent from each other. The distance traveled by a cell between two frames is reported as instantaneous speed. Manual quality control of the tracking is performed to remove mistracked cells.
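The track-length filter and the instantaneous speed described above can be sketched as follows. This is a minimal Python illustration, not the authors' MATLAB code; the representation of a track as a list of (x, y) centroids, one per frame, and the function names are assumptions made for the example.

```python
import math

def filter_tracks(tracks, n_frames, min_fraction=0.5):
    """Keep tracks that span at least `min_fraction` of the movie.

    `tracks` is a list of tracks; each track is a list of (x, y) centroids,
    one per frame (hypothetical data layout).
    """
    return [t for t in tracks if len(t) >= min_fraction * n_frames]

def instantaneous_speed(track):
    """Distance traveled between consecutive frames (distance units per frame)."""
    return [math.dist(p, q) for p, q in zip(track, track[1:])]

# Toy example: a 4-frame movie with one full-length track and one short track.
tracks = [[(0, 0), (3, 4), (3, 4), (6, 8)], [(1, 1)]]
kept = filter_tracks(tracks, n_frames=4)
speeds = [instantaneous_speed(t) for t in kept]
```

Only the first track survives the filter, since the second one covers less than half of the frames.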
For each cell, at each time point, the fluorescence intensity of both Indo-1 emission wavelengths is calculated by integrating pixel intensity values in a disk (60% of the cell's diameter) around the centroid. The background intensity (calculated locally for each cell) is subtracted from the fluorescence intensity before calculating the ratio of both wavelengths (405/447 nm). Assembling the ratio for each cell across all time points allows us to generate the intracellular Ca2+ dynamics. For genotype assignment, the fluorescence of CTFR is obtained 12 times throughout the imaging period for each cell; cell type is assigned only if the cell appears positive, or appears negative, for CTFR in at least eight of these time points.
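As an illustration of the two steps above, the following Python sketch computes a background-subtracted 405/447 nm ratio and assigns a CTFR status from repeated measurements. It is not the authors' MATLAB implementation: the positivity `threshold`, the function names, and the handling of a non-positive denominator are assumptions introduced for the example.

```python
def calcium_ratio(f405, f447, bg405, bg447):
    """Background-subtracted Indo-1 ratio (405/447 nm) at each time point.

    Inputs are per-time-point integrated intensities and local backgrounds.
    A non-positive denominator yields NaN (an assumption of this sketch).
    """
    ratios = []
    for a, b, ba, bb in zip(f405, f447, bg405, bg447):
        den = b - bb
        ratios.append((a - ba) / den if den > 0 else float("nan"))
    return ratios

def assign_ctfr(ctfr_values, threshold, min_votes=8):
    """Assign CTFR status from repeated reads (12 in the text).

    Returns "CTFR+" or "CTFR-" when at least `min_votes` reads agree,
    otherwise None (cell left unassigned). `threshold` is hypothetical.
    """
    positive = sum(v > threshold for v in ctfr_values)
    negative = len(ctfr_values) - positive
    if positive >= min_votes:
        return "CTFR+"
    if negative >= min_votes:
        return "CTFR-"
    return None
```

For example, a cell read as positive in 9 of 12 measurements would be assigned "CTFR+", whereas a 6/6 split would leave the cell unassigned.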
For manual labeling, four independent evaluators were shown the intracellular Ca2+ dynamics of all cells and were asked to classify them as antigen-reactive or nonreactive. A majority vote between evaluators was used to determine the final label of each cell.
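The majority vote can be sketched in a few lines of Python. With four evaluators a 2-2 tie is possible, and the text does not state how ties were resolved; this sketch therefore returns `None` for a tie, which is an assumption of the example rather than the authors' rule.

```python
def majority_label(votes):
    """Combine evaluator votes (True = antigen-reactive) by majority.

    Returns True or False when a strict majority exists, and None on a tie
    (tie handling is not specified in the text; None is a placeholder).
    """
    reactive = sum(votes)
    if 2 * reactive > len(votes):
        return True
    if 2 * reactive < len(votes):
        return False
    return None
```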

Training and evaluation of the ML algorithms
For all individual models, training was performed on the training dataset using the 1D intracellular Ca2+ signal as input and the genotype of each cell, their manual label, or a combination of both (antigen-specific T cells manually labeled as antigen-reactive) as ground truth. When indicated, Ca2+ dynamics were complemented either with the derivative of the Ca2+ dynamics, approximated by the absolute value of the difference in Ca2+ levels between two consecutive time points, or with the instantaneous cell speed, approximated by the Euclidean distance between the positions of the same cell at two consecutive time points. Evaluation of the model was performed on the evaluation dataset by predicting antigen specificity for each individual time lapse (field of view).
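The optional derivative channel described above can be illustrated with a short Python sketch. The leading zero used to keep the derivative the same length as the signal, and the channel-stacking helper `build_input`, are assumptions of this example and are not taken from the authors' MATLAB code.

```python
def abs_derivative(signal):
    """Absolute difference between consecutive time points.

    A leading 0.0 is prepended so the output has the same length as the
    input (padding choice is an assumption of this sketch).
    """
    return [0.0] + [abs(b - a) for a, b in zip(signal, signal[1:])]

def build_input(signal, extra=None):
    """Stack the Ca2+ signal with an optional second channel
    (derivative or instantaneous speed) into a list of channels."""
    channels = [list(signal)]
    if extra is not None:
        if len(extra) != len(signal):
            raise ValueError("channels must have the same length")
        channels.append(list(extra))
    return channels

ca = [1.0, 3.0, 2.0]
x = build_input(ca, abs_derivative(ca))
```

The same helper would accept a speed trace as the second channel, provided it is padded to the length of the Ca2+ signal.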

Performance metric
For each time lapse, the model efficiency (the frequency of cells manually labeled as antigen-reactive that were predicted as antigen-specific), the overall efficiency (the frequency of antigen-specific T cells predicted as antigen-specific), and the FDR (the frequency of nonreactive T cells predicted as antigen-specific) were measured. Throughout, the performance metric (pM) uses the average model efficiency (eff) and the average FDR across all time lapses in the evaluation dataset, and the optimal model is the one that maximizes pM. The weights w_eff and w_FDR are variable parameters that modulate the importance attributed to the efficiency or the error rate for each particular application. For this study, we use w_eff = 1 and w_FDR = 5.
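To make the role of the two weights concrete, the Python sketch below computes per-time-lapse efficiency and FDR and combines their averages into a single score. Two points are assumptions of this example and are not stated in the text: the FDR is computed with the cells predicted as antigen-specific as the denominator, and the score is taken to be the linear combination w_eff × eff − w_FDR × FDR. The published formula may differ.

```python
def efficiency(predicted, truth):
    """Fraction of truly positive cells that are predicted antigen-specific.

    With manual labels as `truth` this is the model efficiency; with
    genotype as `truth` it is the overall efficiency.
    """
    hits = [p for p, t in zip(predicted, truth) if t]
    return sum(hits) / len(hits) if hits else float("nan")

def false_discovery_rate(predicted, reactive):
    """Fraction of cells predicted antigen-specific that are nonreactive
    (denominator choice is an assumption of this sketch)."""
    called = [r for p, r in zip(predicted, reactive) if p]
    return sum(not r for r in called) / len(called) if called else 0.0

def performance_metric(effs, fdrs, w_eff=1.0, w_fdr=5.0):
    """ASSUMED linear form: w_eff * mean(eff) - w_fdr * mean(FDR)."""
    mean_eff = sum(effs) / len(effs)
    mean_fdr = sum(fdrs) / len(fdrs)
    return w_eff * mean_eff - w_fdr * mean_fdr
```

Under this assumed form, the weights used in the study (w_eff = 1, w_FDR = 5) would penalize a one-point increase in FDR five times more heavily than they reward a one-point gain in efficiency.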

Data normalization
When indicated, the Ca2+ concentration of each cell was normalized to the average value of the "resting" intracellular Ca2+ concentration. Briefly, for each time lapse, two Gaussian distributions are fitted to the probability distribution function of the intracellular Ca2+ concentration. The mean of each Gaussian distribution is used as the average Ca2+ concentration in the low ([Ca2+]lo) and high ([Ca2+]hi) state for this movie. For each cell at each time point, we divide its Ca2+ concentration by the average [Ca2+]lo value to compute the "fold change" of Ca2+ concentration over the resting state, as a means to reduce interexperiment variability.

Data augmentation procedure
When indicated, data augmentation was performed by artificially generating in silico Ca2+ fluctuations from real in vitro fluctuations. This is achieved by a combination of repeating existing data, adding noise to the existing data, shifting the start of the Ca2+ fluctuation forward or backward (in time), and increasing or decreasing the Ca2+ levels in the existing data.
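The normalization and augmentation steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' MATLAB code: the two-Gaussian fit is done here with a small expectation-maximization loop, and every augmentation parameter (`noise_sd`, `max_shift`, `scale_range`) is a hypothetical value chosen for the example.

```python
import math
import random

def fit_two_gaussians(values, n_iter=100):
    """Fit a two-component 1D Gaussian mixture by EM; return (mean_lo, mean_hi)."""
    xs = sorted(values)
    mu = [xs[len(xs) // 4], xs[3 * len(xs) // 4]]   # initial means: quartiles
    sd = [(xs[-1] - xs[0]) / 4 or 1.0] * 2          # initial spreads
    w = [0.5, 0.5]                                  # initial mixing weights
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value
        resp = []
        for x in xs:
            p = [w[k] * math.exp(-0.5 * ((x - mu[k]) / sd[k]) ** 2) / sd[k]
                 for k in range(2)]
            s = sum(p) or 1e-300
            resp.append([pk / s for pk in p])
        # M-step: update means, spreads, and weights
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sd[k] = max(math.sqrt(var), 1e-6)
            w[k] = nk / len(xs)
    return min(mu), max(mu)

def fold_change(signal, ca_lo):
    """Express a Ca2+ trace as fold change over the resting level."""
    return [x / ca_lo for x in signal]

def augment(signal, rng, noise_sd=0.05, max_shift=5, scale_range=(0.9, 1.1)):
    """Generate one synthetic trace: time shift, amplitude scaling, added noise.

    All parameter defaults are hypothetical. Shifted traces are padded
    with the edge value.
    """
    n = len(signal)
    shift = rng.randint(-max_shift, max_shift)
    scale = rng.uniform(*scale_range)
    shifted = [signal[min(max(i - shift, 0), n - 1)] for i in range(n)]
    return [scale * x + rng.gauss(0.0, noise_sd) for x in shifted]
```

In use, `fit_two_gaussians` would be applied to all Ca2+ values of one time lapse, the lower mean passed to `fold_change`, and `augment` called repeatedly with a seeded `random.Random` instance to generate reproducible synthetic traces.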

Hyperoptimization
For each parameter to be optimized, i.e., number of neurons, kernel size, optimizer, and mini batch size, a range of possibilities was determined according to commonly used values for that parameter in the literature. A model was trained and evaluated as previously described for each combination of these four parameters, across all the ranges. All the models were then evaluated using the weighted performance metric; the hyperoptimized model is the one maximizing the performance metric.
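The exhaustive search over the four parameters amounts to a grid search, which can be sketched in Python as below. The parameter values in `example_grid` and the `train_and_score` callback (which would train a model and return its performance metric) are hypothetical placeholders; the actual ranges used in the study are not given in this section.

```python
from itertools import product

def grid_search(grid, train_and_score):
    """Evaluate every combination in `grid` and return the best one.

    `grid` maps parameter names to lists of candidate values.
    `train_and_score` takes a dict of parameters and returns a score
    (here, the weighted performance metric) to be maximized.
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = train_and_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical ranges, for illustration only.
example_grid = {
    "neurons": [32, 64],
    "kernel_size": [3, 5],
    "optimizer": ["adam", "sgdm"],
    "mini_batch_size": [64, 128],
}
```

With two candidate values per parameter, this grid yields 16 models to train and score; the cost grows multiplicatively with each added value.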

Statistical analysis
Unless otherwise stated, a two-sample nonparametric Mann-Whitney U test was performed using Prism (GraphPad).

Supplementary Materials
This PDF file includes: Figs. S1 to S9; Tables S1 to S3

Fig. 1. In vitro T cell activation model for the study of intracellular Ca2+ dynamics. (A) Schematic representation of the analysis pipeline. (B) Flow cytometry assessment of surface TCRβ down-regulation, CD69 expression, and 4-1BB expression, 3 or 15 hours after coculture with OVA peptide. Error bars indicate SD (n = 6 to 10 independent wells over four independent experiments; Mann-Whitney U test). (C) Distribution of intracellular Ca2+ concentration for all time points (left) or the average Ca2+ concentration over the entire time lapse (right) according to antigen specificity assignment (n = 7173 antigen-specific and n = 7564 nonspecific T cells over eight independent experiments). a.u., arbitrary units. (D) Frequency of T cells manually labeled as antigen-reactive. Horizontal lines show the median and numbers below show the average. Individual fields of view are represented in gray (n = 111 fields of view over eight independent experiments). (E) Proportion of antigen-specific and nonspecific cells among those manually labeled as antigen-reactive. (F) Distribution of the average intracellular Ca2+ concentration according to manual assignment and genotype (n = 5038 OT-I and n = 347 P14 cells labeled as antigen-reactive; n = 9351 cells labeled as nonreactive over eight independent experiments). (G) Correlation between the frequency of cells manually labeled as antigen-reactive and the frequency of cells down-regulating surface TCRβ or expressing 4-1BB, as measured by flow cytometry, after 3 or 15 hours of incubation. For each well, data for antigen-specific and nonspecific T cells are shown, and the frequency of manual labeling for all fields of view per well is averaged. Error bars indicate SEM, full line shows linear regression on antigen-specific T cells, and dotted lines show 95% confidence interval [TCRβ: n = 13 independent wells over four independent experiments, coefficient of determination (R²) = 0.9534, ρ = 0.9764; 4-1BB: n = 21 independent wells over six independent experiments; R² = 0.442, ρ = 0.649]. OT-I, antigen-specific; P14, nonspecific. [Ca2+]i, intracellular calcium concentration. **P < 0.01 and ***P < 0.005.

Fig. 2. CNNs allow for efficient and accurate classification of T cell activation based on intracellular Ca2+ dynamics. (A) Model efficiency (frequency of cells manually labeled as antigen-reactive predicted as antigen-specific) and false discovery rate (FDR; frequency of nonreactive T cells predicted as antigen-specific) of all ML algorithms tested, for both OVA (left) and gp33 (right) time lapses. Performance of the selected models detailed in table S1 (black circles) is plotted along with the performance of the thresholding approach (fig. S6, B to D). Gray dots represent other models generated during the systematic evaluation of ML structures. The optimal optCNN_man model is shown in magenta. (B) Performance of optCNN_man (see Materials and Methods for the performance metrics) for varying thresholds (see fig. S6) on the probability of being antigen-specific (P_antigen-spe). a.u., arbitrary units. (C) Receiver operating characteristic curve of optCNN_man for both OVA (full line) and gp33 (dotted line) time lapses. Area under the curve (AUC) represents the overall performance of optCNN_man. (D) Detailed performance of optCNN_man using the prediction probability threshold of 0.47. Horizontal lines in the violin plot show the median and numbers below show the average of the distribution. Individual fields of view are represented in gray (n = 73 OVA and n = 15 gp33 fields of view over nine OVA and two gp33 independent experiments; Mann-Whitney U test). (E) Distribution of the overall efficiency (frequency of antigen-specific T cells predicted as antigen-specific) and FDR across the evaluation dataset for optCNN_man and the manual labeling process (n = 73 OVA and n = 15 gp33 fields of view over nine OVA and two gp33 independent experiments). (F and G) Distribution of intracellular Ca2+ concentration (F) and prediction probability P_antigen-spe (G) of the nonspecific cells mispredicted as antigen-specific (n = 174 OVA and n = 32 gp33 cells over nine OVA and two gp33 independent experiments). Not significant (ns), P > 0.05.

Fig. 3. Biological validation of antigen specificity predictions based on intracellular Ca2+ dynamics. Critical parameters of the coculture (e.g., number of antigen-presenting BMDCs and total number of BMDCs) are modulated, and optCNN_man is used to predict the frequency of antigen-specific T cells. For each independent culture well, the prediction percentage is averaged over three fields of view; wells are then harvested and analyzed by flow cytometry for expression of selected activation markers. (A) Correlation between ML prediction and the frequency of cells down-regulating surface TCRβ expression or expressing CD69 or 4-1BB, as measured by flow cytometry, after 3 or 15 hours of incubation. Error bars indicate SEM. Linear regression of the antigen-specific conditions is shown (full line) with 95% confidence error (dotted lines) (TCRβ: n = 12 independent wells over three independent experiments, R² = 0.959, ρ = 0.989; CD69: n = 12 independent wells over three independent experiments, R² = 0.866, ρ = 0.930; 4-1BB and OVA peptide: n = 21 independent wells over four independent experiments, R² = 0.493, ρ = 0.702; 4-1BB and gp33 peptide: n = 8 independent wells over four independent experiments, R² = 0.746, ρ = 0.863). (B) Percentage of antigen-specific and nonspecific T cells predicted as antigen-specific as a function of the ratio of antigen-presenting to non-presenting BMDCs. Full lines and dotted lines represent a sigmoidal curve fitted to the data. The nonspecific group pools data from both 1 × 10⁵ and 2 × 10⁵ BMDC conditions. Error bars indicate SD. (C) Model efficiency (compared to manual labeling) and FDR of optCNN_man when predicting T cell specificity to lower affinity antigens. Affinity (Kd) of the altered peptide ligands (APLs) for the OT-I TCR is indicated (37) (n = 11 independent fields of view over three independent experiments).

Fig. 4. Deep learning models predict polyclonal T cell responses. (A) Schematic representation of the mixed lymphocyte reaction (MLR) setup. Purified C57BL/6J naïve CD8+ T cells, stained with Indo-1, were overlaid on MHC-matched C57BL/6J BMDCs (autologous condition) or MHC-mismatched BALB/c BMDCs (allogeneic condition). (B) Flow cytometry-based measurement of TCRβ, CD69, and 4-1BB expression 3 or 20 hours after coculture. Error bars show SD (n = 4 to 10 culture wells over three to five independent experiments; Mann-Whitney U test). (C) Frequency of cells predicted as antigen-specific (Mann-Whitney U test). (D) Correlation between ML prediction of antigen specificity and the frequency of cells expressing CD69 or 4-1BB (B), as measured by flow cytometry after 3 or 20 hours of incubation, respectively. Each point is the average of the prediction of three independent fields of view and the average of two to six independent culture wells for flow cytometry data. Error bars indicate SEM of prediction and flow cytometry data. Linear regression is shown (full line) with 95% confidence error (dotted lines) and its correlation coefficient (R²) and slope (three to five independent experiments). (E) Probability distribution of the average Ca2+ concentration over the entire time lapse for all the cells according to the prediction by optCNN_man and culture condition, comparing prediction for MLR cultures and monoclonal T cell culture with OVA N4 and OVA Q4 antigens (replotted from fig. S8) (MLR: n = 3219 nonspecific and n = 541 antigen-specific cells over five independent experiments; OVA: n = 7266 nonspecific and n = 3333 antigen-specific cells over three independent experiments; Q4: n = 1161 nonspecific and n = 972 antigen-specific cells over three independent experiments). a.u., arbitrary units. ns, P > 0.05; **P < 0.01; and ***P < 0.005.
C57BL/6J (RRID:IMSR_JAX:000664) and BALB/c (RRID:IMSR_JAX:000651) mice were purchased from the Jackson Laboratory (Bar Harbor, ME, USA). C57BL/6J-Tg(OT-I)-Rag1<tm1Mom> (OT-I, RRID:IMSR_JAX:003831) mice were obtained through the National Institute of Allergy and Infectious Diseases Exchange Program, National Institutes of Health (Bethesda, MD, USA) (58, 59). P14 TCR Tg mice were provided by M. Richer (McGill University, Montreal, Canada) and crossed onto a TCRα knockout (KO) (the Jackson Laboratory, stock no. 002116) background (RRID:MMRRC_037394-JAX) (60, 61). All mice were bred and maintained in specific pathogen-free animal facilities at the Maisonneuve-Rosemont Hospital Research Centre and the Comparative Medicine and Animal Resource Center at McGill University. Both male and female mice 6 to 12 weeks of age were used. All animal protocols have been approved by the Animal Care Committee at the Maisonneuve-Rosemont Hospital Research Centre and McGill University. Experiments were performed in accordance with the Canadian Council on Animal Care guidelines.