Automated time-lapse data segmentation reveals in vivo cell state dynamics

Embryonic development proceeds as a series of orderly cell state transitions built upon noisy molecular processes. We defined gene expression and cell motion states using single-cell RNA sequencing data and in vivo time-lapse cell tracking data of the zebrafish tailbud. We performed a parallel identification of these states using dimensional reduction methods and a change point detection algorithm. Both types of cell states were quantitatively mapped onto embryos, and we used the cell motion states to study the dynamics of biological state transitions over time. The time average pattern of cell motion states is reproducible among embryos. However, individual embryos exhibit transient deviations from the time average forming left-right asymmetries in collective cell motion. Thus, the reproducible pattern of cell states and bilateral symmetry arise from temporal averaging. In addition, collective cell behavior can be a source of asymmetry rather than a buffer against noisy individual cell behavior.


INTRODUCTION
Aristotle first noted the astonishing reproducibility of embryogenesis in "Historia Animalium," where he observed, "Generation from the egg occurs in an identical manner in all birds." For example, in bilaterians, the left and right sides generally adopt an identical form. At the cellular level, development entails a reproducible series of cell state transitions representing changes in gene expression state, physical state, and cell fate. These processes can be noisy; for example, cell migration can be either ordered or disordered, and such disorder is part of normal orderly development (1). We now appreciate that gene networks control cell state transitions, but these networks are composed of stochastic molecular processes (2). Despite the remarkable progress in the field of developmental biology in recent decades, it has been difficult to study the reproducibility and robustness of cell state transitions in vivo because of the challenge of systematically defining cell states in space and time. Here, we analyze the spatiotemporal pattern of cell state transitions in the zebrafish tailbud.
The vertebrate anterior-posterior body axis develops continuously from head to tail. Progenitors of the trunk mesoderm and spinal cord are largely specified during gastrulation, whereas the tailbud contains a bipotential neuromesodermal progenitor (NMP) cell population that contributes to the posterior body axis (3,4). In zebrafish, fate mapping experiments indicate that the presomitic mesoderm (PSM) formed during gastrulation gives rise to the first 10 to 13 somites (5-7). During early somitogenesis, the mesodermal cells continue to join the PSM via medial convergence, and because of their long transit time through the mesodermal progenitor zone (PZ) and PSM, NMP-derived PSM cells primarily contribute to the tail somites. Neighboring NMP cells disperse to contribute to multiple somites along the tail (8,9).
Cells in the tailbud undergo a series of transitions in gene expression and migratory behavior during their differentiation (Fig. 1A, left) (9)(10)(11)(12). The dorsal-medial (DM) tailbud contains the sox2/brachyury expressing NMPs that contribute to both the spinal cord (Fig. 1A, yellow) and the PSM (3). In the zebrafish, cells in the DM migrate toward the posterior in a rapid orderly fashion (Fig.  1A, cyan) (8,9,13). At the tip of the tailbud, mesodermally fated DM cells down-regulate sox2, up-regulate mesodermal genes such as tbx16, and undergo an epithelial-mesenchymal transition (EMT) to migrate ventrally into the PZ (Fig. 1A, magenta). Cell movements in the PZ are more disorderly than the DM. Ultimately, cells leave the PZ, reduce their speed, and assimilate into the left and right PSM (Fig. 1A, green) (9,(14)(15)(16). Cells in the PSM down-regulate tbx16 and begin to express tbx6. Cell velocity in the anterior PSM declines further as the tissue decreases its volume and solidifies (9,(17)(18)(19). The transition from orderly to disorderly motion from the DM to PZ is necessary for proper body elongation (1,9,20). Excessively disordered motion in the DM [via experimental inhibition of bone morphogenetic protein (BMP) or fibroblast growth factor (FGF) signaling] impairs the flow of cells through the tailbud leading to a short body axis. Excessively ordered motion in the PZ (induced by moderate Wnt inhibition) produces prolonged anisotropic fluxes, unequal allotment of cells to the left or right PSM, and a bent body axis. Thus, understanding robustness and reproducibility of vertebrate body elongation requires understanding the nature of these tailbud cell state transitions.
In this study, we first define the trajectory of cell states in the zebrafish tailbud during body elongation using dimensional reduction algorithms and then segment the trajectory using a change point detection algorithm. We define gene expression state using singlecell RNA sequencing (scRNA-seq). We validate this algorithmic definition of cell states by comparing wild type and embryos with reduced Wnt, FGF, and BMP signaling and verify quantitative differences in gene expression cell states by multicolor fluorescent in situ hybridization. Next, we identify cell motion states by analyzing cell tracking data using a similar approach. We find that the cell tracking data can be segmented into reproducible cell motion states. We then perform an analysis of the pattern of cell motion states, as these datasets allow direct quantification of cell state dynamics over time. The pattern of cell motion state transitions in a 2-to 3-hour time average are the same in each wild-type embryo, indicating the reproducibility of the cell state dynamics. In addition, the cell state pattern for a single time point is typically the same as the pattern of a 2-to 3-hour time average, revealing some dynamic stability of this pattern. However, individual embryos exhibit transient deviations from the average pattern of cell states. Analysis of these transient deviations reveals them to be due to irregular collective cell migration, which produces bilaterally asymmetric convergence to the midline in the PSM. Thus, both the reproducible pattern of cell motion states and bilateral symmetry arise from temporal averaging. More generally, these results indicate that collective cell behavior does not necessarily buffer noisy individual cell behavior and that collective cell behavior can be a source of inappropriate asymmetry. Prospectively, our approach using a dimensional reduction method and a change point detection algorithm may prove useful in the analysis of other complex time series datasets. Tailbuds were dissected and pooled; scRNA-seq profiles were generated, and a one-dimensional pseudotime was created and segmented into gene expression cell states. (B) UMAP projection of scRNA-seq data colored by cell type. Arrow marks the path of pseudotime in (C). (C) Expression of selected markers over pseudotime. Vertical lines are transition points between cell states as defined by a Bayesian algorithm that minimizes within state statistical error. Note that the segment colors along the pseudotime axis correspond to the colors along the developmental trajectory (arrow) in (B) but with the PZ and PSM being further subdivided into two similarly colored segments in (C). (D) UMAP projection of scRNA-seq data colored by experimental treatment. (E) Quantification of the differences in the proportion of cells that are in a given cell state in each replicate of each experimental condition. See also figs. S1 and S2.

Gene expression states
We performed scRNA-seq on dissected tails from 10 to 12 somite stage zebrafish embryos (Fig. 1A). We used wild-type embryos and embryos subject to treatments known to alter tailbud cell migration, specifically inhibition of FGF, BMP, or Wnt signaling (1,9,13,20). For each treatment, we prepared four biological replicates, each consisting of 10 to 12 tailbuds and resulting in 30,000 to 35,000 singlecell profiles. In a uniform manifold approximation and projection (UMAP) dimension reduction plot of wild type, the neuronal and paraxial mesoderm form one large cluster with more differentiated cells at each end and common progenitors (cyan) in the middle [ Fig.  1B (arrow) and fig. S1]. Wild-type and experimental samples consist of the same cell transcription profiles highlighting the robustness of gene expression states (Fig. 1D). This result is also consistent with previous scRNA-seq analysis of zebrafish embryos, indicating that perturbation of cell signaling does not create novel cell transcription profiles (21,22).
To enable direct quantitative comparisons between experimental conditions, we pooled the data from all wild-type and experimental replicates and created one unified pseudotime to define a single standard for classifying cells. Specifically, the cells in the main cluster were aligned along a neuronal-mesodermal axis from sox3 expressing neuronal cells to mespaa expressing anterior PSM cells (Fig. 1B, arrow) (22). This approach avoids the requirement to define the NMP population a priori. Instead, NMPs will be located in the middle of the pseudotime sequence, and differentiation will proceed toward both ends, i.e., neuronal to the left and mesodermal to the right (Fig. 1C). Marker genes for neuronal and mesodermal development map with respect to pseudotime in the correct developmental sequence, indicating that the procedure was successful.
To define gene expression states, we extracted the wild-type data and then used a change point detection algorithm to divide pseudotime into a series of distinct states (23). The change point algorithm identified five transition points ( fig. S2). These transition points (vertical lines in Fig. 1C) divide the pseudotime sequence into six states that generally agree with those predicted previously from marker gene expression (24). The states include a neural state, NMPs, and a succession of mesodermal states. The transition points were then mapped to the full pseudotime sequence, and we calculated the relative abundance of each state in wild-type, Wnt-inhibited embryos, FGF-inhibited embryos, and BMP-inhibited embryos (Fig. 1D).
To determine whether this analysis of scRNA-seq data accurately quantifies changes in cell state, we mapped the transcriptional states back onto the embryo and measured their abundance using multicolor fluorescent in situ hybridization for marker genes for the first five states ( Fig. 2A). Sox2 single-positive cells localize in the neural tube (state 1). Sox2-and brachyury-positive NMPs (state 2) occupy the DM. Nascent mesodermal progenitors (state 3) expressing brachyury and tbx16 are located immediately ventral to the DM in the medial PZ. Mesodermal progenitors in the PZ (state 4) are tbx16 single-positive cells located in the ventral and lateral tailbud. The PSM (state 5) is anterior to the transition from tbx16 to tbx6 expression.
To validate the scRNA-seq analysis, we chose to test the predictions of changes in the abundance of neuronal and PZ states. First, the scRNA-seq predicts that Wnt-inhibited embryos would have more neuronal cells (Fig. 1E). This is consistent with reports that elimination of Wnt signaling leads NMPs to exclusively adopt a neuronal fate (3). In our partial inhibition of Wnt signaling, 6 of 16 embryos have an abnormal cap of neuronal tissue covering the embryos' posterior, confirming the scRNA-seq results ( fig. S3).
A second prediction of the scRNA-seq analysis is that the PZ is smaller in BMP-and Wnt-inhibited embryos but not in embryos subject to FGF inhibition. To test this prediction, we performed fluorescent in situ hybridization for a PZ marker, tbx16, and a PSM marker, tbx6 (Fig. 2B). In wild-type, BMP-inhibited, and FGF-inhibited embryos, the tbx16 and tbx6 signal was measured along the anterior-posterior axis of the embryo for both the left and right sides (Fig. 2C). The PZ/PSM transition was set to the value derived from the scRNA-seq analysis (20% of the maximum value of tbx6), and then, the PZ length was normalized to the total tail length (85% of maximum value of tbx6). Consistent with the scRNA-seq analysis, BMP-but not FGF-inhibited embryos exhibited a decrease in PZ length (Fig. 2D). Because of the bent body axis exhibited by most Wnt-inhibited embryos, the area of the PZ and PSM were quantified. As predicted, Wnt-inhibited embryos have a smaller PZ (Fig. 2E). Thus, this approach to analyzing scRNA-seq data accurately identifies cell states that can be quantitatively mapped back onto the embryo.

Cell motion states
We hypothesized that similar algorithms to those used to classify gene expression states could be applied to cell motion data to define the cell motion states (Fig. 3A). For this purpose, we used cell tracking data from confocal time-lapse imaging of cells in the DM through PSM collected over 1 to 3 hours in wild-type embryos, 7 to 10 somite embryos, and embryos subject to signaling perturbations (9,13,20). As with the gene expression analysis, a cell motion trajectory for each cell track was used to order the tracks in pseudotime, and the state transitions were defined using the change point detection algorithm and the cell motion statistics such as speed and displacement in a specific duration. The cell states were color coded and spatially mapped back onto the embryo using the original cell track position.
Because this is a novel approach, we separately optimized pseudotime assembly and segmentation starting with the wild-type embryos. Unlike gene expression, cell motion cannot be practically characterized instantaneously, and the choice of track interval involves trade-offs. Longer tracks individually contain more information, but the embryo contains fewer of them, and they may average out within-track transitions in cell motion state. We chose to use a sliding window of eight time points (21 min) as an empirically derived base unit. For each track, we calculated a distance matrix in which each element is equal to the cell's displacement between the i-th and j-th time points. To form a linear sequence, we used a variational autoencoder (VAE), a neural network-based machine learning method, as a dimensional reduction tool. It embedded distance matrices from four wild-type embryos into a onedimensional latent space. To construct a pseudotime sequence for each embryo, cells were assigned the rank of their latent space coordinate (z). This procedure successfully reproduced the known developmental sequence ( fig. S4). It indicates that there is a continuity in cell migratory behavior in the tailbud, like that of gene To segment the pseudotime sequences, we sought parameters with a large variance over pseudotime. We found that either cell speed and track straightness ( fig. S5) or, more simply, track displacement (Fig. 3, B and C) gives the best results. The change point detection algorithm classifies the tracks into three cell motion states ( fig. S6). We mapped these states back onto the embryo by plotting cell position at the start of the track and taking either a single time point (Fig. 3D) or a time average created by randomly sampling tracks throughout the time lapse (Fig. 3E). The time averages are reproducible between embryos. Each embryo is segmented into a high-displacement state principally located in the DM, an intermediate displacement state mostly located in the PZ, and a low cell motion state mostly in the PSM (Fig. 3, D and E, and movie S1). These results are consistent with previous manual segmentations of the tailbud into DM, PZ, and PSM (9).
We next analyzed the temporal dynamics of the cell motion states. We find that the relative abundance of PSM-type tracks increases over time, while the DM and PZ shrink (Fig. 3F). This is consistent with the gradual decrease in the size of the tailbud as progenitor cells (DM and PZ) differentiate into PSM faster than new progenitors are created (8,25). However, we observed notable fluctuations in cell state size in some embryos. For example, in wild-type embryo 4, patches of cells transition rapidly from PZ to PSM-and then to PSM to PZ-type movements ( Fig. 4A and movie S1). These fluctuations are asymmetric between the left and right PSM and appear to be tied to differences in collective cell motion as they convergence toward the midline (Fig. 4A). To quantify this effect, we created regions of interest (ROIs) for the left and right anterior PSM and measured the abundance of PZ-type tracks and track displacement along the medial-lateral axis (Fig. 4, B and C). We found large differences between experimental replicates. Two embryos (replicates 3 and 4) have antiphase oscillations in medial-lateral displacement between the left and right PSM, where one side efficiently converges toward the midline, while the other has less convergence or even a net lateral motion. These fluctuations roughly correlate with differences in the abundance of PZ-type tracks. Replicate 1 exhibits constant convergence in both the left and right PSM, although the right side is moving significantly faster (P < 0.001) and has more PZ-type tracks. In contrast, replicate 2 has no net movement toward the midline in either PSM and a constant, low abundance of PZ-type tracks. Given this variability, we generated three more wild-type replicates (numbers 5 to 7). Their cells were assigned pseudotime coordinates by mapping them to the nearest tracks in the preestablished pseudotime sequence. They recapitulate the original phenotypes. Thus, cell motion state abundance is variable over time within a single embryo even while the average pattern in each embryo is the same.
In addition to the creation of individual pseudotime sequences for each embryo, we also performed a pooled analysis by mapping all cell tracks in wild-type replicates 1 to 4 to one pseudotime sequence and then segmenting that sequence. This approach is analogous to the unified pseudotime assembly used for the scRNA-seq data. Using this approach yields the same results as separate segmentations for three of four replicates ( fig. S7). The outlier is replicate 1 where almost the entire PSM is classified as PZ-type tracks. This indicates that the converging cells of the PSM in this replicate move similarly to the high-displacement PZ cells in the other embryos. However, within a single replicate, the PZ and PSM do have different migratory statistics ( fig. S7), indicating that the relative differences between cell motion states is reproducible from embryo to embryo, whereas the absolute magnitude of the differences may vary (9). Thus, individual versus pooled analyses provide different information, e.g., average cell state patterns, differences within a single embryo over time, and differences among embryos.

The effect of signaling perturbations on cell motion state
We next sought to determine whether signaling perturbations known to affect cell migratory behaviors would alter the cell state transitions. To this end, we took cell tracks from embryos subject to FGF, Wnt, or BMP signaling inhibition and assigned them pseudotime coordinates by mapping them to the most similar wild-type tracks. We then colored the cells by pseudotime and segmented them using the change point detection algorithm ( fig. S4 and Fig.  5). The pseudotime sequence largely agrees with the developmental sequence, although FGF signaling-and BMP signaling-inhibited embryos have a noisier pattern of cell motion states than wildtype embryos (Fig. 5, A and B). This result is consistent with the observation that BMP inhibition suppresses the average differences in speed and track straightness between the DM, PZ, and PSM regions (9,20). However, the use of additional factors in the VAE algorithm likely aids in recovery of developmental sequences. Wnt signaling inhibition, a perturbation that affects the coordination of cell movement more than its cell-autonomous characteristics (1,9), largely retains the wild-type cell motion state pattern (Fig. 5C).
These results indicate that similarly to scRNA-seq, the embryos' cell motion statistics retain sufficient tissue identity to allow for data integration.
Inhibition of BMP or FGF signaling decreases coordination of cell motion in the DM, while inhibition of Wnt signaling prolongs left-right asymmetries in cell flux in the PZ leading to a bent body axis (1,9). We hypothesized that these perturbations could have similar effects in the PSM. Therefore, we repeated the measurements for PZ-type track abundance and medial-lateral displacement in the PSM in these backgrounds (Fig. 6). BMP-and FGFinhibited embryos have fluctuations in the abundance of PZ-type tracks and mean medial-lateral displacement similar to wild type, suggesting that this process is not regulated by these signals (Fig.  6, A and B). In the Wnt signaling inhibition case, three of the four embryos exhibited wild-type patterns of antiphase oscillations or sustained but low magnitude differences in convergence. They also generally had similar numbers of PZ-type tracks in the left and right PSM. The outlier, replicate 2, had major asymmetries where the left side was initially moving medially, while the right side moved in a posterior-lateral direction. Subsequently, the left PSM cells ceased motion and adopted almost exclusively PSMtype tracks, while the right PSM began moving medially. This was accompanied by complementary curvatures of the PSM-notochord interfaces. Thus, while embryos subject to moderate Wnt signaling inhibition generally have wild-type PSM behaviors, they can exhibit abnormalities that correlate with gross morphological defects.

DISCUSSION
The ability to perform automated classifications of cells into distinct states is a powerful tool both for organizing data and the discovery of underlying principles. Extensive effort has gone into developing techniques for identifying gene expression states from scRNA-seq data (26). Meanwhile, cell motion has traditionally been described more qualitatively. Our parallel analysis of gene expression and cell migration states using dimensional reduction algorithm followed by a change point detection algorithm demonstrates that these cell state transitions can be similarly systematically defined and mapped back onto the embryo.
Gene expression and cell motion states not only share some similarities but also have many differences. The gene expression transitions are spatially segregated and reproducible. They can be mapped onto the embryo using in situ hybridization (Figs. 1 and  2). The PZ-PSM transition is bilaterally symmetric and reproducible across 10 to 12 somite wild-type embryos (Fig. 2). Signaling perturbations do not create new states but do change the abundance of wild-type states. This phenomenon has been observed in other developmental contexts (27). Overall, gene expression states are very robust. The cell motion states have a somewhat different spatial pattern than the gene expression states. The transitions between DM, PZ, and PSM are roughly in the same position as the gene expression transitions but not always exactly (Fig. 7A). This is likely due to time delays inherent in mRNA processing and translation, posttranscriptional gene regulation, and physical constraints on cell motion. The pattern of cell motion transitions is also spatially noisier than for gene expression states. Some of this difference is technical. Cell motion has far fewer parameters than gene expression. However, much of the noise is biological and due to changes in collective cell behavior, not just the behavior of single cells. While our data do indicate a slowing of cell motion during PSM solidification from the anterior to the posterior, counter examples of the anterior PSM cells moving with a relatively rapid and processive PZ-type motion while the posterior PSM is more static can be readily identified. Thus, PSM solidification does not occur in a smooth anterior to posterior progression. These fluctuations between PZ and PSM type motion occur rapidly and involve small patches of tissue, suggesting that the anterior PSM remains close to the fluid-to-solid jamming transition (19).
Creation of a straight body axis requires equally sized left and right paraxial mesoderm. However, paradoxically, the two PSMs often have divergent behaviors in both cell speed and directionality. These sorts of asymmetries have also been observed in the nascent PSM during gastrulation, suggesting that this is a general feature of zebrafish convergent extension (28). These results conflict with models of body elongation (derived from chick) where coordinated convergence movements in the PSMs compress the notochord, forcing it to elongate into the tailbud, thereby displacing PZ cells (29). Furthermore, the notochord can elongate faster than the paraxial mesoderm, such as when loss of tbx16 prevents PZ cells from entering the PSM (30). In these mutants, the elongating notochord buckles. Thus, in zebrafish, notochord elongation does not appear to be driven by convergence of the PSM.
One method for achieving robust development from noisy processes is temporal averaging. If a parameter is integrated over a time span significantly longer than the period of fluctuations, then individual replicates will converge toward the mean (Fig. 7B). For gene expression states, cells are able to maintain their state in the face of transcriptional bursting by using the longer stability of proteins (31). Two drivers of PSM elongation are addition of cells to the PSM from the PZ and convergence of PSM cells. In wild-type embryos, the PZ has left-right cell movement oscillations on the order of tens of minutes (1). Wnt inhibition causes permanent asymmetries for an entire 3-hour time lapse and a bent body axis indicating that the reversals are necessary to evenly disperse mesodermal progenitors to the left and right side. We find similar fluctuations of convergence patterns in the PSM with periods on the order of tens of minutes to up to an hour. These fluctuations are not simply noisy single-cell behavior but rather abrupt changes in the collective motion of 10 to 100 cells. Thus, while a straight spine is essential for the biomechanics of the adult, transient small imbalances are tolerable in the embryo. How cell dynamics are tuned and averaged to obtain robust left-right symmetry remains an open question.

Data and code availability
The scRNA-seq data have been archived at National Center for Biotechnology Information Gene Expression Omnibus (NCBI GEO; accession no. GSE173894).

Zebrafish methods
Tüpfel long fin zebrafish were raised according to standard protocols and experiments approved by the Institutional Animal Care and Use Committee. Experiments were performed before sex determination in zebrafish (32). FGF, BMP, and Wnt signaling perturbations were performed using protocols previously developed to modulate cell migration (9,20). Specifically, starting at the sixsomite stage, embryos were incubated in 50 mM SU5402 or 40 mM DMH1 for 2 hours to inhibit FGF or BMP signaling, respectively. Wnt signaling was inhibited by injecting notum-1 mRNA at a concentration of 150 ng/ml into embryos at the single-cell stage and then incubating them until the 10-somite stage. This treatment yields a phenotypic spectrum, and embryos with nascent body elongation defects were chosen for further experiments (20).

Tailbud dissections and scRNA-seq
Embryos were incubated until the 10-to 12-somite stage and then dissected in ice-cold Hank's balanced salt solution (HBSS). The tail was collected by cutting immediately posterior to the last formed somite. Groups of tails consisting of 10 tails for wild-type, FGF inhibition, or BMP inhibition or 12 tails for Wnt inhibition were pooled together. Cells were dissociated by incubation in papain solution (20 U/ml; Worthing Biochemical) for 15 min at 29°C with gentle agitation. Halfway through the incubation, the solution was triturated 10 times with a P200 pipette. Cells were spun down at 300g for 5 min and then resuspended in 40 μl of cold HBSS. Cell concentration and viability were checked with a hemocytometer, and the volume of the solution was adjusted if required.

Construction of 10x Genomics single-cell 3′ RNA-seq libraries (version 3) and sequencing with an Illumina HiSeq4000
GEM generation and barcoding. Single-cell suspension in RT Master Mix was loaded on the Single Cell A Chip and partition with a pool of about 750,000 barcoded gel beads to form nanoliter-scale gel beads in emulsions (GEMs). Each gel bead has primers containing (i) an Illumina R1 sequence (read 1 sequencing primer), (ii) a 16-nt 10x barcode, (iii) a 10-nt unique molecular identifier (UMI), and (iv) a poly-dT primer sequence. Upon dissolution of the gel beads in a GEM, the primers are released and mixed with cell lysate and master mix. Incubation of the GEMs then produces barcoded, full-length cDNA from polyadenylated mRNA.
Post GEM-RT cleanup, cDNA amplification, and library construction. Silane magnetic beads were used to remove leftover biochemical reagents and primers from the post-GEM reaction mixture. Full-length, barcoded cDNA was then amplified by polymerase chain reaction (PCR) to generate sufficient mass for library construction. Enzymatic fragmentation and size selection were used to optimize the cDNA amplicon size before library construction. R1 was added to the molecules during GEM incubation. P5, P7, a sample index, and R2 (read 2 primer sequence) were added during library construction via end repair, A tailing, adaptor ligation, and PCR. The final libraries contain the P5 and P7 primers used in Illumina bridge amplification.
Sequencing libraries. The single-cell 3′ protocol produces Illumina-ready sequencing libraries. A single-cell 3′ library comprises standard Illumina paired-end constructs, which begin and end with P5 and P7. The single cell 3′ 16-base pair (bp) 10x barcode and 10-bp UMI are encoded in R1, while R2 is used to sequence the cDNA fragment. Sequencing a single-cell 3′ library produces a standard Illumina BCL data output folder. The BCL data include the paired-end R1 (containing the 16-bp 10x barcode and 10-bp UMI) and R2 and the sample index in the i7 index read.

Preprocessing of scRNA-seq data
We aligned the scRNA-seq data to Grcz11 and demultiplexed using Cell Ranger (10x Genomics). After the generation of expression matrices for each sample, we used Seurat v3 for preprocessing and clustering of scRNA-seq data (27). First, we excluded cells with an ectopic number of genes or exceeding a specified percentage of mitochondrial genes based on visual inspection for the distribution of these statistics. After the filtering genes, we conducted integration following Seurat's SCTransform integration (33).
We applied principal components analysis (PCA) and embedded the 30-dimensional PCA coordinates into two-dimensional UMAP. We clustered cells by Seurat function "FindClusters" with a resolution parameter of 0.5.

Pseudotime estimation of scRNA-seq
To recover cell state dynamics encoded in the gene expression data, we ordered a subset of scRNA-seq cells that belong to the axis from neural tube to PSM so that its ordering recapitulates the developmental trajectory during body elongation. In particular, we embedded the z scores of 30-dimensional PCA coordinates of cells belonging to specified clusters (Sox3 + , Sox2 + , DM, PZ, pPSM, and aPSM) into one-dimensional UMAP coordinates. Here, we expected that the most variable axis within gene expression space during this process would be the developmental trajectory. For UMAP embedding, we used the "umap-learn" package in Python and set "n_neighbors" as 400 and "min_dist" as 0.1.

Segmentation of scRNA-seq pseudotime
We segmented the pseudotime trajectory of scRNA-seq into several segments within which each cell c has similar z scores of 30-dimentional PCA coordinate x c to dissect the dynamics along the progression of cell state transitions during zebrafish body elongation. We used a Bayesian algorithm of change detection (23) to find break points of segments b k (k = 1, …, K ), which minimize the total error from the mean of profile of the segment ∑ k E k where and τ c is the discretized rank of the estimated pseudotime. We discretized the pseudotime rank into 30 bins for computational efficiency. We determined K as five scRNA-seq data using the elbow method (34), which chose a saturation point along the group variation curve as a function of the number of clusters.

Estimation and segmentation of cell movement pseudotime
To recover cell state changes along the developmental axis for the cell motion, we also ordered cell track trajectory into one-dimensional pseudotime. It was constructed using four wild-type embryos. For this purpose, we applied a VAE (35) to distance matrices of cell trajectories between eight subsequent time points. We defined the input for cell c at time point t as the distance matrix D c,t ∈ R 8×8 between the cell c's position at time point t, …, t + 7. Here, we assume that these distance matrices are generated from one-dimensional latent cell state z ∈ R Pðz c;t Þ ¼ Nðz c;t j 0; 1Þ PðD c;t j z c;t Þ ¼ N½D c;t j μ θ ðz c;t Þ; σ 2 θ ðz c;t Þ� where μ θ ; σ 2 θ : R ! R 8�8 is a decoder neural network with convolutional layers, the implementation details of which are shown in fig. S8.
To optimize this probabilistic model efficiently and approximate the posterior distribution P(z c,t | D c, t ), we defined variational posterior distribution, q ϕ ðz c;t j D c;t Þ ¼ N½z c;t j μ ϕ ðD c;t Þ; σ 2 ϕ ðD c;t Þ�, where μ ϕ , σ 2 ϕ : R 8�8 ! R are encoder neural network with convolutional layers, the implementation details of which are shown in fig. S8. Here, we derived evidence lower bound Lðθ; ϕÞ ¼ E q ϕ ðz c;t jD c;t Þ ½logP θ ðD c;t j z c;t Þ� þ D KL ½q ϕ ðz c;t j D c;t Þ j P θ ðz c;t Þ� � logP θ ðD c;t jz ϕ;c;t Þ þ D KL ½q ϕ ðz c;t j D c;t Þ j P θ ðz c;t Þ� wherez ϕ;c;t are derived from q ϕ (z c,t | D c,t ), using reparametrized sampling (35). We optimized parameters included in encoder and decoder networks using Adam implemented in PyTorch. We adopted early stopping with patience = 30, which was implemented in PyTorch lightning. We assigned z c,t as a cell movement pseudotime of cell c at time point t. We segmented the cell movement pseudotime using the same methodology for the segmentation of scRNA-seq pseudotime, except that the properties x c,t for minimizing within-group variation are the z scores of the track displacement or speed and track straightness for the eight time points. We specified the number of change points K as using the elbow method (34), which chose a saturation point along the group variation curve as a function of the number of clusters.
For three additional wild-type embryos and all embryos subject to signaling perturbations, we mapped their tracks into the preestablished pseudotime by assigning each track the pseudotime coordinate of the closest track in the pseudotime sequence. This approach produced consistent results and is more computationally efficient than continually reconstructing pseudotime each time new data are analyzed.

PSM quantification
ROIs consisting of the anterior 125 μm of the left and right PSM were constructed. Tracks were assigned to the ROI if the cell was within the region at the start of the track. Mean displacement along the medial-lateral axis and the relative abundance of PZtype tracks were calculated at each time point.

Multicolor fluorescent in situ hybridization
Probes for sox2, brachyury, tbx16, and tbx6 were purchased from Molecular Instruments. The hairpins and colors are listed in Table 1. Staining of 10 to 12 somite embryos was performed using their recommended protocol (36) with a few modifications. Specifically, batches of 15 embryos were stained simultaneously. The tbx6 probe was diluted 1:10 to avoid excessive bleed through the sox2 channel. DAPI (4′,6-diamidino-2-phenylindole) was added to the amplification mixture. After staining embryos were taken through a series of 25%/50%/75% glycerol in phosphate-buffered saline. The posterior half of the embryo was isolated and mounted dorsal side up in 75% glycerol. Embryos were imaged with a Zeiss LSM 880 Airyscan confocal microscope using a 20× objective.
Preprocessing of the microscopy images was done using ImageJ. The sox2 and tbx6 channels were subtracted from each other to eliminate bleed-through. Images were rotated to a consistent orientation, and a maximum intensity projection was created. Adaxial cells were identified in the DAPI channel and manually removed from the image. The midline separating the embryo into left and right halves was identified manually. Subsequent quantification was performed in MATLAB. The image was smoothed with a Gaussian filter, and the ROI was thresholded using Otsu's algorithm on both the tbx16 and tbx6 channels. For wild-type, BMP-inhibited, and FGF-inhibited embryos, average fluorescent intensity was measured along the x axis of the image and normalized to the maximum value. This was done separately for the left and right sides of the embryo. The PZ/PSM boundary was taken to be the first point with a value greater than 20% of the maximum tbx6 value. The anterior end of the PSM was defined as the last point greater than 85% of the maximum tbx6 value. The scaled PZ length was the PZ length divided by the distance from the end of the tail to the anterior boundary of the PSM. Wnt-inhibited embryos had some modifications to the quantification. In bent embryos, the boundary separating the left and right halves was taken to be a line through the midpoint of the tailbud brachyury signal to the notochord and then following the notochord toward the head. For wild-type and Wnt-inhibited embryos, the outer perimeter of the embryo was traced manually. The curve was smoothed with a Savitzky-Golay filter and defined as the embryo's axis. Pixels in the ROI were mapped to their closest points on the perimeter using the distance2curve function from J. D'Errico. The mean intensity along the axis was calculated using a sliding window. The same thresholds were used for the PZ and PSM, as described previously. The boundaries for these regions were taken to be a perpendicular dropped from the axis at the cutoff point. The scaled PZ area was the PZ area divided by the area of the PZ plus PSM. Statistics were calculated using Mann-Whitney U test.

Supplementary Materials
This PDF file includes: Figs. S1 to S8 Legend for movie S1 Other Supplementary Material for this manuscript includes the following: Movie S1 View/request a protocol for this paper from Bio-protocol.