Four-dimensional computational ultrasound imaging of brain hemodynamics

Four-dimensional ultrasound imaging of complex biological systems such as the brain is technically challenging because of the spatiotemporal sampling requirements. We present computational ultrasound imaging (cUSi), an imaging method that uses complex ultrasound fields that can be generated with simple hardware and a physical wave prediction model to alleviate the sampling constraints. cUSi allows for high-resolution four-dimensional imaging of brain hemodynamics in awake and anesthetized mice.


I. INTRODUCTION
The advent of ultrafast ultrasound (>5000 frames per second (fps)) imaging over the last 20 years, enabled by increased computational power and parallel receive electronics, has spurred the development of multiple novel imaging modes for biomedical ultrasound [1,2].The formation of complete images within a short (<1 ms) temporal window allows for accurate quantification of tissue, blood, and contrast agent motion.This facilitates measurement of tissue elasticity and arterial stiffness [3,4], super-resolution via localisation and tracking of individual micro-bubbles [5,6], and vastly enhanced imaging of blood flow over a wide field-of-view [7].The latter has resulted in the emergence of functional ultrasound imaging (fUS or fUSi) a neuro-imaging technique -which is able to detect small changes in cerebral blood volume (CBV) induced by neurovascular coupling [8,9].In comparison to other neuro-imaging modalities such as fMRI, fUS offers greater ease of use at significantly lower cost while delivering a higher spatial-temporal resolution, with a recent demonstration, in conjunction with contrast agents, of its capability to detect vascular activity with a 6.5 µm spatial resolution [10].
Ultrafast ultrasound imaging, however, remains, principally, a two-dimensional technique due to the stringent spatiotemporal sampling requirements of the imaging process.This imaging process necessitates the transmission of a sequence of planar or diverging waves at high frame rates (≥5kHz) while recording, in-parallel, the back-scattered signals over a surface sampled spatially and temporally at Nyquist rates [1].In the case of 3D imaging, this typically requires thousands of elements (as compared with 64-256 for 2D imaging) and a corresponding number of independent data channels with associated radio frequency (RF) digitisers.Recent works have reported 1024 channel systems for 3/4D cardiac imaging [11,12], super-resolution [13,14], and functional imaging in rats [15].However, these required the use and synchronisation of multiple data acquisition systems which is both extremely costly and technically complex, making it infeasible for most clinical applications.To reduce the number of necessary independent datachannels two approaches are conventionally considered: 1) retaining a fully-sampled array but using more complex readout schemes employing application-specific integrated circuits (ASICs) to combine and pre-beamform signals [16][17][18], 2) sparse arrays that strategically subsample an aperture with an element distribution designed to minimise side-lobes [19,20].However, both methods impose compromises on the image formation process and such arrays are technically complex to realise.Moreover, the small element sizes (on the order of the acoustic wavelength λ ∼ 100 − 300µm) imposed by Nyquist on all conventional arrays result in poor individual element sensitivity.For haemodynamic imaging, particularly in small animals that suffers from small signal amplitudes, this poses a significant challenge.
To address the aforementioned challenges, we introduce computational ultrasound imaging (cUSi).The complex hardware requirements and sensitivity challenges associated with sparse or ASIC addressed arrays are avoided with the use of a simple, fully populated, matrix probe that is under-sampled using acoustically large, and therefore highly sensitive, elements.Inherently these large elements are extremely directional (Fig. 1A) and thus incapable of high-resolution imaging.To compen- sate for this loss of resolution, we attach a smart, plastic encoding mask to the probe which scrambles the transmitted and received wave-fields of every element.By modifying the field with a random encoding mask, we more evenly sample the k-space of our imaging aperture while avoiding any symmetries that would give rise to artefacts such as grating lobes (Fig. 1B) [21].This more uniform sampling of k-space increases the lateral resolution of the imaging system at the expense of increased clutter/side-lobes.The challenge introduced, however, by imaging with complex wave-fields is that it prohibits the use of the conventional, geometry-based, processing (e.g., delay-and-sum) that is routinely used in ultrasound for image formation [22].Instead, we reconstruct images using a model-based approach after calibrating the 3D imaging response of our system using a one-time measurement (Fig 1C and Materials and Methods) [23].
cUSi falls into the category of computational imaging which covers techniques across a range of modalities that broadly aim to realise cheaper and/or faster imaging devices in part by shifting the burden of image formation from complex hardware onto computation [24][25][26][27][28][29].Within the ultrasound domain the exploitation of reverberant media to reduce the number of sensors required for imaging has been investigated since the 90's [30][31][32][33].However, the translation of these methods to in-vivo imaging has, traditionally, been hindered by challenges in separating back-scattered signals from the transducer cross-talk as well as the high-sensitivity of these media to small perturbations (e.g., temperature shifts).Recent work has demonstrated the application of reverberant media for 2D in-vivo Doppler imaging, however, as with earlier works [32], this utilised separate elements for transmit and receive and required the use of contrast agents due to poor SNR [34].In photoacoustics, where back-scattered cross-talk is not present, reverberant media have been successfully applied to ultrafast imaging of haemodynamics and functional activity via changes in optical absorption [35][36][37].This current work builds on proof-of-principle work on the use of a spatial encoding mask to mitigate sampling constraints [23,38].Here, we show for the first time a practical implementation that translates to 3D in-vivo recordings and ultrafast imaging.

II. RESULTS
We validated the potential of cUSi for in-vivo imaging of brain haemodynamics in both awake and anesthetised mice.In both experiments imaging was performed through a cranial window, which was covered with a TPX film (CS Hyde Company, IL, USA) in the awake case only.This craniotomy was applied to remove the distorting and attenuating effects of the skull on the wave-field.An 8x8-element matrix probe (1.25 mm pitch, 1x1 cm aperture, 13.8 MHz, Imasonic France) was used for all experiments.This was under-sampled by a factor of almost 500 compared to a fully populated array sampled at Nyquist rate of 32,000 sensors.The spatial encoding mask was fabricated in-house from Rexolite 1422 (owing to its favourable acoustic properties) via CNC micromachining.A aluminium waveguide was used to provide the required imaging offset and confine the transmitted field to the cranial window.The probe was driven using a synthetic-aperture transmission scheme while applying Hadamard encoding to boost SNR and attain full-information from the element array [39,40].Each transmit-receive acquisition decreased the spatial correlations between voxels in the imaging domain improving reconstruction quality (Fig. 1C and Supplementary Fig. S1).The transmission rate was 32 kHz which, including a short dead-time for data-transfer and storage, resulted in a volume rate of 407 Hz.We acquired data continuously at this volume rate for 60 seconds generating a data ensemble from which, after removing frames subject to instability, the Doppler images were formed.
Prior to image reconstruction we applied a spatiotemporal filter to the data ensemble to obtain the signals arising from blood-flow, eliminating the large component originating from static soft tissue (Fig. 1D) [41].For the anesthetised mouse brain, volumes of size 6.8×9.2×8mm with an isotropic voxel size of 40 µm were reconstructed by correlating the measurements with a calibrated model of the spatiotemporal impulse responses of our imaging system (Materials and Methods) providing a 4D volume ensemble data-set u(x, y, z, t).To form a power-Doppler image (PDI) we computed the average power for each voxel.To evaluate the direction of flow we computed a lag-1 auto-correlation of the temporal course of each voxel [42].
In Fig. 2, we highlight the capability of cUSi for capturing brain haemodynamics in a small rodent model.In Fig. 2A, a 3D rendering of the entire PDI is shown: where both small and large vessels are resolved throughout the entire cranial window.In Fig. 2B, we feature the information on flow direction that can be extracted from the Doppler ensemble showing draining cortical vessels.Fig. 2C shows axial, coronal, and sagittal slices through the power Doppler volume indicating the richness of the small vessels that can be detected, particularly in the cortex, as well as the isotropic lateral resolution that we achieve.Finally, Fig. 2D demonstrates the dependency of the imaging performance on increasing Doppler ensemble size.In the current implementation, we require a 1-2 second acquisition time to form reliable volumes that could be used for other modalities such as fUSi.Lateral, Sagittal and Coronal flythroughs of the converged PDI are shown in Supplementary Mov.1-2.In Supplementary Fig. S2, the power and colour Doppler volumes of an awake mouse are shown.Similar performance is obtained, however, there is a loss of fine detail in the cortex and of deeper vessels due to the loss in SNR.This loss of SNR is the result of attenuation generated in a TPX film that was used to cover the imaging window.

III. DISCUSSION
Traditionally, the clinical paradigm for ultrasound imaging has focused on robust hardware and simple, fast, processing that provides real-time feedback.However, there are clear economical and technical challenges with scaling this approach to 3/4D imaging that limit its use to specific targets (e.g., cardiac imaging).Moreover, there are emerging clinical applications, such as low-cost wearable devices [43] and trans-cranial imaging [44] for which this traditional paradigm is less appropriate.In this work, we have demonstrated for the first time the possibility for high-resolution 4D imaging of haemodynamics in the mouse brain using cUSi.This represents a proof-of-concept for such model-based approaches in biomedical ultrasound.In the future by designing custom hardware incorporating specific knowledge of the image reconstruction and clinical target there is the potential to realise computational imaging devices that address these contemporary clinical challenges.
The limitation in this work was the high sidelobes that originated from the large factor by which the array was under-sampled.This compressed the dynamic range compared to conventional 2D power Doppler.We were able to image the vasculature throughout the mouse brain with high resolution despite this.In the future, this dynamic range could be improved by firstly scaling up the number of matrix elements to bring it in-line with existing 2D systems, by further tuning of the physical media used to modulate the transmit/receive fields for the ultrasound array, and through joint optimisation of the driving scheme, mask design, and image reconstruction, particularly, leveraging developments in inverse design [45,46] and deep-learning based image reconstruction [47].

A. Animal Preparation.
In this study we assessed cUSi in-vivo imaging experiments in both anesthetized (n=1) and awake (n=1) adult C57BL6/J mice (8-10 weeks old).At arrival animals were group housed under a 12/12h light/dark cycle, with controlled temperature and humidity, and with access to water and food ad libitum.After surgery, mice were individually housed.The national authority (Centrale Commissie Dierproeven, The Hague, The Netherlands; license: AVD1010020197846) granted ethical approval prior the experiments, which were performed according to the institutional, national, and European Union guidelines.
Surgery: For both experiments a craniotomy was performed to facilitate imaging without distorting and attenuating effects of the skull.During surgery and the non-awake imaging experiments, mice were anesthetized using a isoflurane/oxygen mixture (5% induction, 1.75-2% maintenance), while body temperature was kept constant at 37 °C, and heart and respiration rate were monitored (Small Animal Physiological Monitoring System, Harvard Apparatus, MA, USA).The same device was used to fixate and level the head while drilling (Foredom) the cranial window.For the anesthetized imaging experiments a cranial window of 1x1 cm was made.For the awake experiments a smaller cranial window (+2 mm by -4 mm from Bregma, and ±4 mm width) was performed due to the placement of a pedestal (1x0.8 cm), which ensured head fixation in the experimental set-up during imaging.In the awake case only, the cranial window was covered with a TPX film (CS Hyde Company, IL, USA) and post-operative mice received 3-5 days of antibiotics (Baytril, 25mg/ml, Bayer, Germany) to prevent inflammation of the brain.Prior to the start of the imaging experiments, both the exposed and covered brain tissue was sprinkled with saline solution, after which ultrasound transmission gel (Aquasonic 100, Parker Laboratories, NJ, USA) was applied for acoustic contact between the brain and the the 8x8-element matrix probe positioned straight above the cranial window.

B. Matrix Probe.
Experiments used a custom built 64-element matrix probe (Fig. S3) (8×8 square elements, 1.25 mm lateral width, 10×10 mm aperture, 13.8 MHz central frequency, 65% -6 dB bandwidth, Imasonic, France).The element size was approximately 132λ 2 significantly larger than both the 16λ 2 used for 1D linear arrays (which while sampled at closed to Nyquist rate in-plane have a large elevational size) and the 4λ 2 used previously for 3D functional imaging of rats [15].The front-surface of the probe had a 15×25 mm footprint and included tapped holes on each corner for fixation of coding masks to the front surface.

C. Spatial Encoding Mask.
For cUSI we aim to perturb the field for each matrix element by introducing a spatially varying phase shift analogous to an (optical) spatial light modulator or acoustic hologram [48].We achieve this using a plastic coding mask that introduces a local, thickness-dependent delay to the transmitted field.
The coding mask material, ideally, would fit a number of criteria.Namely, impedance matching to water and low acoustic attenuation to minimise insertion losses, high sound speed contrast to soft-tissue to maximise phase delays, and mechanically machineable/castable/printable with high precision.Rexolite 1422 was selected to balance these competing requirements as it possesses low acoustic attenuation and good acoustic matching to water (soft-tissue) while being sufficiently rigid to machine [49].
The mask had a thin, smoothly varying, profile inspired by optical diffusers previously applied for 3D optical imaging [24].We generated a smoothly varying surface profile with an average lateral feature size of approximately 370 µm and a total height variation of 1.2 mm.The lateral feature size was constrained by the manufacturing method and the tool size while the ideal height scaling was determined empirically.We found that for masks that were too thin we insufficiently modulated the field of each element resulting in a broad point spread function (PSF) that was unable to resolve microvasculature.For masks that were too thick the increased losses due to greater internal reflection and increased absorption resulted in an SNR that was too low for imaging.
The encoding mask design challenge is similar to that of sparse array design in ultrasound where considerable effort is often devoted to optimising the element configuration prior to fabrication [20].With cUSi, however, we can retrospectively (and cheaply) modify our transmit/receive fields, and therefore the side-lobe distribution, simply by re-designing the encoding mask.While in this work we empirically designed our encoding mask, in the future, inverse-design principles [45] could be adopted to refine this approach by incorporating the system physics and reconstruction nonlinearities into the encoding mask optimisation [46,50].
The encoding mask was generated for a 12×12 mm area including a 1 mm lateral buffer around the active surface to eliminate sharp discontinuities that would create diffraction effects.This profile was exported as a point cloud to SolidWorks (SolidWorks 2018, Dassault Systèmes, Vélizy-Villacoublay, France) where it was converted to a solid part that was interpretable to the CNC software.A 0.4 mm axial offset was added to the mask profile for mechanical stability and its aperture was expanded to match the waveguide opening.To attach the mask to the probe a 2 mm thick 15×35 mm buffer was added around the encoding mask including four clearance holes to fixate it to the probe.The mask was then fabricated using a computer numerical control (CNC) machine (P60 HSC, Fehlmann) using a 200 µm drill-bit.A photograph of the coding mask can be seen in Fig. S3.

D. Waveguide.
A tapered waveguide was added to the coding mask to further modify the transmitted wave-field.This was introduced for two reasons.First, the waveguide allows for a controlled offset between the probe surface and the imaging medium.From analysis of the system matrix the PSF was found to degrade significantly close to (<6 mm) the probe (Supplementary Fig. S4A).This occurs as the field of each matrix element needs to propagate a certain distance after the encoding mask in order to diverge such that it overlaps with that of neighbouring elements.Close to the probe each voxel is seen by as few as 4 elements which results in an extremely poorly conditioned reconstruction.To eliminate this an offset between the probe and imaging target is required with a distance determined by the divergence to the field introduced by the encoding mask and the element size.For this work, we found 12 mm to be sufficient.Second, the cranial window that can be safely introduced to the mouse's skull is limited, sagitally and laterally, to ∼8×6 mm which is smaller than the probe aperture (10×10 mm).As such a large fraction of the transmitted energy (≥ 40%) falls outside the imaging window.This is compounded by the additional divergence introduced by the encoding mask.With the addition of a waveguide this energy can be funnelled onto the desired imaging window (Supplementary Fig. S4B).
The waveguide had input and output apertures of 14×14 mm and 7×7 mm respectively and a length of 10 mm.The minimum length was set by the discussed need for an offset for the imaging target as well as the requirement to avoid a sharp taper angle that would trap waves within the waveguide and reflect them toward the probe surface.There is no physical constraint on maximum length, however, an overly long waveguide suffers greater losses from absorption and, practically, is more prone to trapped air pockets in the ultrasound gel, which modify the system response from the calibration.The waveguide was fabricated from Aluminium using CNC micro-machining.Aluminium has a significantly higher acoustic impedance than both water and ultrasound gel (18 vs 1.5 MRayls) so near fully confines the transmitted field (Supplementary Fig. S4B).

E. Transmission Scheme.
A synthetic aperture transmission scheme [39] (also referred to as full-matrix capture [51]) was adopted for imaging.Each element sequentially transmits whilst recording the back-scattered signals on all the array elements in parallel allowing for each voxel in the imaging medium to be synthetically focused on during reconstruction.For the probe in this work each volume is therefore formed with a set of 64 transmissions.The principal disadvantage of synthetic aperture is poor SNR as each transmission only uses a single element.To improve this we applied spatial coding on transmit [40].For each of the 64 transmissions a different spatial code was applied to the matrix elements with each forming a column of a 64x64 matrix invertable matrix.We chose a Hadamard matrix as it is comprised of +/-1's so is easily implemented in hardware by flipping the polarity of the driving signal.By transmitting on all 64 elements each time we gain a factor of √ N or ×8 improvement in SNR.For conventional ultrasound it is necessary to decode the signals prior to reconstruction by applying the inverse of the Hadamard matrix.However, our image reconstruction fully models the relevant wave-physics so this step is not required.Instead we incorporate the spatial codes into our model of the transmission.cUSi would be equally feasible with alternative transmission schemes, for example, sequential focusing on each point in the imaging medium via time-reversal [33].While this would result in greater SNR it would not be compatible with the volume rate necessary for ultrafast imaging of blood flow.

F. Experimental Setup
The probe, encoding mask, and waveguide were each submerged and then assembled underwater using the four tapped holes on the front surface of the probe (Fig. 1A).Prior to assembly, if small air pockets were present on the mask, they were manually were manually removed from the encoding mask surface under a microscope using a syringe.Ultrasound coupling gel was then syringed into the end of the waveguide while still submerged to allow it to retain water when vertically mounted.Prior to application this syringe containing gel was centrifuged at 6000 rpm for 5 minutes to remove bubbles [52].The probe was then suspended vertically above the cranial window and lowered onto the exposed mouse-brain using a manual translation stage.Prior to imaging we assessed whether any air pockets were present in the waveguide by transmitting an impulse from each element and recording the back-scattered signals.The back-scattered signals were analysed to confirm that no signals originated within 10 mm of the probe surface.

G. Experimental Measurements.
The probe was driven using an ultrasound research system (Vantage 64-LE, Verasonics Redmond, WA, USA) with a 5 cycle tone-burst centred on 15.625 MHz.A single transmit sequence was used for both the awake and anesthetised mice imaging experiments.The pulserepetition frequency (PRF) was 32 kHz giving an underlying volume rate of 500 Hz.Transmissions were repeated in blocks of 6400 firings corresponding to 100 volumes.Between each block a dead time of approximately 50 ms was used to transfer the raw data buffer and write to a hard disk.The scheme was repeated for 60 seconds for both experiments resulting in 24400 volumes and an effective volume rate of 407 Hz.
H. Data pre-processing.
Signals were sampled in 50% bandwidth (BW) mode to reduce data requirements.For each transmission 512 samples at 50% bandwidth [53] corresponding to an imaging depth of 24.6 mm were recorded on each element.The 50% BW data was then Fourier transformed and downsampled by a factor of 2 to generate a set of 128 frequencies N ω that were used for reconstruction.These were evenly spaced between 11.7-19.4MHz (i.e., a 50% bandwidth centred on 15.625 MHz).
Next any frames subject to jitter from breathing or other motion were removed by evaluating a lag-1 difference over sequential frames and filtering the frames falling above an empirically determined threshold.At this stage an instability resulting from hardware was identified that necessitated the removal of the first 64 frames from each block of 100 from both data-sets.For the awake and anesthetised measurements respectively this left a total number of frames N f of 8041 and 7679 frames.
A clutter filter was then used to separate the static or quasi-static signals corresponding to tissue from those originating from blood which was accomplished using a singular value decomposition (SVD) [41].First, both acquisitions were reshaped into a 2D matrix with dimensions N ω ×N t ×N e , N f ) where N t is the number of transmissions and N e is the number of elements.Next, the SVD was evaluated over this full set of frames and the data above a manually determined threshold cut-off.For both data-sets this threshold was set at 65% of the singular values (e.g., for the anesthetised dataset the first 5226 values were discarded).It should be noted that this spatiotemporal filtering is typically applied after image reconstruction, however, we found no significant difference in applying it to our data as a pre-processing step.After this pre-processing our data consists of a 4D tensor v with dimensions (N ω , N t , N e , N f ).

I. Acoustic Model
Conventional ultrasound images are reconstructed geometrically.Simple, non-diffracting, spherical or planar fields are transmitted allowing for images to be reconstructed using only knowledge of the sensor positions and the speed of sound in the propagation medium.The use of a coding-mask and waveguide prohibits this with cUSi as the wave-field evolves in a complex manner with propagation.
Instead we assume that our data v can be linearly related to a 3D image u, via a matrix-vector multiplication v = Au. ( Here we assume the tensors u and v are vectorised and the matrix A contains the pulse-echo impulse response to our transmission scheme for each voxel in the imaging medium.The matrix A has number of rows equal to N ω N t N e , and number of columns equal to the number of voxels in the reconstructed image.Reconstructing an image requires precise knowledge of this matrix A.
Additionally, as the dimensions for A in this work are 8 × 10 6 × 5 × 10 5 it occupies over 10 Tb in memory so cannot be stored.As such we construct and apply it sequentially.
To construct A, first we perform a one-time experimental calibration of the forward field of each element.The probe with the encoding mask and waveguide attached was mounted in a custom-built scanning tank with a three-axis computer-controlled positioning system formed from 3 translation stages X-LSM200B, X-LSM100B, and X-LDA075A (Zaber, Vancouver, Canada) and driven using the ultrasound research system.The impulse response for each element was measured over a 12×12 mm plane parallel to the end of the waveguide with a 40 µm spacing using a broadband 0.2 mm needle hydrophone (Precision Acoustics, Dorchester).Signals were sampled using a programmable analog-to-digital converter (M4i.4450-x8,Spectrum, Germany) with 14 bits per sample and a 250-MHz sampling rate.The forward field for each of the 64 elements was measured in a single scan.To improve SNR, signals were averaged 8 times at each position, to reduce the amount of averaging required the probe was driven using the Hadamard encoded synthetic aperture scheme employed for imaging.To measure the probe's full bandwidth it was driven using a 32 Vpp 32 ns impulse, however, to further reduce the required averaging we used temporal Golay codes following Bae et al [54].After measurement both the temporal and spatial encoding were deconvolved.The calibration took approximately 6 hours and generated a set of 64 spatial temporal measurements p Ne (x, y, z s , t), where z s is the plane over which the calibration was performed.These wavefields were then mapped onto the 128 frequencies N ω that comprised the experimental data.The same calibration was used to reconstruct both data-sets, acquired on separate days, demonstrating that the calibration is robust to repetition.Maximum amplitude projections through each of the elements forward fields are included as Supplementary Movie 3.
The propagation is assumed to be linear, allowing us to predict the forward field at a location (x, y, z s ) for a frequency n ω for any arbitrary spatial apodisation vector H Ne and temporal delay vector T Ne applied on transmis-sion as a simple weighted summation In our case we apply no delays to any of the elements and simply use a 64×64 apodisation matrix for our transmissions H Nt Ne .So the forward field for any of the transmissions n t can be calculated simply as Additionally, reciprocity means that we can assume each element in the matrix probe behaves identically on transmit as on receive.Therefore, the pulse-echo impulse response for an element n e for a transmission n t for a position in the calibration plane can be calculated via temporal convolution reducing to a multiplication in the frequency domain.This allows us to evaluate u = A H v for any voxel in the calibration plane z s for a frame n f via (4) and, similarly, v = Au can be evaluated as v(n ω , n t , n e , n f ) = {p ne (x, y, z s , n ω ) To reconstruct other depths we use the angular spectrum method (ASM) to project the individual forward fields.For an acoustic field over a 2D plane p(x, y, z s ) at a frequency f this calculates the field over a parallel plane at a depth Here F xy and F −1 kxky denote the 2D discrete Fourier and inverse Fourier transforms respectively and H(k x , k y , d) is a propagator function given by where k is the wavenumber.Following Zeng and Mc-Gough [55] we apply an angular cutoff to the propagator to eliminate the under-sampled and evanescent spatial frequencies, where D is the lateral dimension of the propagated field.
Practically for reconstruction it is necessary to discretise our imaging domain.For the images in this work a 40×40×40 µm spacing was used matching the calibration measurement (to avoid resampling) and corresponding to a spacing of λ 2.4 .The image domain for the images in Fig. 2 was 6.8×9.2×8mm while for the images in Fig. S2 it was 7×6.2×7.2 mm.Finally, calculation of u = A H v and v = Au proceeds by first calculating the field for each transmission over the current depth using Eq. 3. Next, A H v or Au are evaluated for the current depth using Eq. 4 or 5. Finally, each of the individual elements forward fields are stepped to the next depth using Eq.6 and the process is repeated for this depth.
J. Image Reconstruction.
Our image reconstruction was formalised in a set of linear equations v = Au allowing a variety of different solvers to be used for reconstruction.We compared 3 different methods.
The first method was the least squares estimate, which finds an image minimising the square error between the modelled Au and the measured data v as follows: This was implemented using the LSMR algorithm [56], similar to the LSQR algorithm [57] which is regularly used for large-scale sparse systems, however, safer to use in the case of early termination that was necessary here.Coronal maximum amplitude projections through the complete PDI as well as a fixed 800 µm slice of the volume for increasing LSMR iteration are shown in Fig. S5.The underlying vasculature can be clearly resolved in each image, however, for early iterations there are large variations in the vessel intensity and background caused by spatial variation in the sensitivity of the underlying model [58].These are compensated for by later iterations and beyond iteration 8 there is minimal change in the reconstructed volume aside from a gradual loss of contrast as the solver fits the noise in the data and errors in the underlying model.This has been reported previously [23], however, occurs more rapidly for the in-vivo data reconstructed here.We attribute this more severe noise amplification to the lower SNR present in ultrafast imaging compared to the sparse, rigid, objects imaged in previous work.
To try to alleviate the loss of contrast we also tested a sparsity promoting reconstruction method choosing the iterative two-step shrinkage/thresholding algorithm (TwIST [59]) which minimises the following cost-function Here λ is a dimensionless scalar weighting the regularisation, and ∥u ′ ∥ 1 is the l1-norm.The regularisation param-eter was chosen empirically as λ = 0.2max(A H v). After convergence of the initial algorithm we applied a debiasing step to the non-zero elements of u using a conjugate gradient method following Figueiredo et al [60].However, TwIST with debiasing was found to be effectively equivalent with the early iterations of LSMR (Fig. S5B) which we attribute to the lack of spatial sparsity in the individual frames used to form each PDI.Both algorithms share two drawbacks.First, the calibration of the model A was performed using a 200 µm needle hydrophone due to the poor SNR of smaller sensors.This is larger than the acoustic wavelength of the matrix probe and, as a result, the higher-spatial frequencies of element forward fields are under-estimated [61].This results in a model that is less divergent than the actual experimental field and biases against vessels in the periphery of the volume.Second, given the large-size of the model matrix A iterative approaches that require it to be repeatedly constructed and applied are timeconsuming to evaluate.These drawbacks motivated the use of a third approach which we applied for the results presented in both Fig. 2 and Fig. S2.Taking inspiration from observations on one-bit time-reversal where the amplitude information was superfluous [31] we modified A to contain only the phase-information (i.e., A new = e arg(A) for each voxel and reconstructed images using a matched filter, i.e., u = A H new v.A comparison of this method with TwIST and LSMR is shown in Fig. S5B.Taking only the phase of the model re-weights the sensitivity to unity for each voxel.This re-weighting is able to partly compensate for the directivity errors introduced by the measurement method at the expense of diminished contrast.In addition this approach is significantly faster requiring only a single evaluation of A H v.
Once the full set of frames N f were reconstructed for both data-sets, we formed the PDI's from a simple summation of the power in each individual voxel as P DI(x, y, z) = The colour Doppler images (CDI) shown in Fig. 2B and S2B were calculated as arg( However, we found the CDI were noisy when we applied this to volumes reconstructed from separate groups of 64 transmissions.Instead we reconstructed a separate sub-frame for each set of 8 transmissions.A running average was then evaluated over these sub-frames to create a stack of 8N f images each formed from 64 orthogonal transmissions with a lag of 8 transmissions between images.The CDI processing was then applied to this new, larger, image stack.
Here we have shown that with a simple, quick, reconstruction method we are still able to form high-resolution 3D images of the vasculature.In the future there is significant scope to apply more advanced reconstruction methods that exploit statistical independence of the signals in neighbouring voxels [62,63].Additionally, the directivity and other errors associated with the experimental calibration of A used here could be eliminated with the use of blind calibration methods [64] or by deconvolution of the directivity response [61].

K. Image Resolution.
We approximated our image resolution with two distinct approaches.First, by calculating the decay rate of the lateral and axial correlations of our system matrix A at three different depths.Second, by approximating the length scale on which we can distinguish separate cortical vessels over a fixed depth.These are illustrated in Fig. S6.
The correlations were evaluated at depths corresponding to approximately 1.5 mm, 4.7 mm and 7.9 mm inside the mouse brain in the experimental data.In each case a single voxel was selected and the magnitude of the correlation coefficient of this voxel with each voxel of the system matrix A over a 2×2×1 mm region with a 40 µm spacing was calculated.The results demonstrate that both the lateral and axial resolution decrease with imaging depth, however, this occurs more slowly for the axial resolution which is additionally higher than the lateral resolution.This higher axial resolution is commonly the case for ultrasound imaging.Higher side-lobes are seen in the axial direction, however, these are largely confined to the lateral position of the voxel-of-interest (i.e., the sidelobes are not spherically symmetric).By zooming in on a fixed axial slice of the reconstructed PDI we can see that this correlation-based approximation of the lateral resolution corresponds well with the scale of structures reconstructed from the experimental data.With vessels in the cortex being resolvable, as defined by the Rayleigh criterion [24], with a spacing of less than 200 µm.frames comprising the anesthetised data-set, however, the data was filtered differently to the images presented in Fig. 2 and Fig. S2.Only the 5501-6500 spatial singular vectors were used to increase the computational efficiency of the iterative methods.The slices were formed from maximum amplitude projections through an 800 µm coronal cross section.With increasing iteration number the spatial variation in the amplitude diminishes at the expense of diminished contrast.(B) Axial, coronal, and sagittal maximum amplitude projections of PDIs formed with the three different reconstruction methods evaluated in this work.The LSMR projection was taken from iteration 16, the TwIST algorithm was first run to convergence followed by the use of the conjugate gradient method to debias the final volume.

FIG. 1 .
FIG. 1. (A)Rendering of the cUSi imaging hardware.A matrix probe populated with acoustically large, sensitive, elements is modified by attaching a smart, plastic, encoding mask that scrambles the transmit and receive wave-fields.The mask is coupled to a waveguide that confines the transmitted fields and provides the necessary imaging offset.(B) Scrambling the transmitted wave-field provides a broader sampling of k-space and allows us to trade lateral resolution and side-lobe intensity.(C) The probe is driven using a Hadamard-encoded synthetic aperture scheme, signals from different transmissions are separately reconstructed then coherently summed to form each 3D volume.Images are reconstructed using a model-based approach.The system response is calibrated with a one-time measurement and then correlated with each set of measurements to recover an image.(D) To generate Doppler images, we continuously transmit to acquire data that is used to reconstruct separate volumes at a rate of ∼400 Hz.The data is spatiotemporally filtered and the power associated with blood flow in each voxel over time is evaluated to form the PDI.

ḞIG. 2 .
Results of Haemodynamic imaging in the anesthetised mouse brain.(A) 3D rendering of the reconstructed PDI of the anesthetised mouse brain.Image was formed by compounding a filtered data-set comprised of 8041 volumes.(B) Axial, coronal, and sagittal maximum amplitude projections through the reconstructed Power (left) and Colour (right) Doppler volumes.(C) Sub-projections through the PDI rendered in (A).Each slice was formed from a maximum amplitude projection through a set of planes with a thickness of 480 µm (corresponding to 12 planes in the reconstructed volume).The approximate position of each projection in the brain is denoted on the image.For the coronal slices the positions are indicated relative to Bregma.(D) 3D rendering of vessels (bottom) and flow-direction (top) in a fixed region of the cortex formed via summation of an increasing number of frames.The time indicated assumes an underlying volume rate of 400 Hz (lower than that achieved in the experimental measurements) but neglects any frames removed to eliminate motion artefacts.
FIG.S1.Lateral correlations for a voxel at a fixed (x,y,z) position for increasing transmission number.Compounding additional orthogonal transmissions decreases these correlations and improves the image reconstruction.
FIG. S6. (A) Analysis of the axial and lateral correlations of the matrix A with a fixed position (x, y) at a 3 different depths.The correlations were evaluated on a grid with a 40µm step size over a 2×2×1 mm volume centred on the voxel of interest.The top-row shows maximum amplitude projections through the spatial correlations over this volume at each depth.The bottom row plots lateral and axial cross-sections through the point of interest for each depth.The values in each sub-figure of the bottom row denote the -6dB width.(B) Assessment of vessel resolution in the cortex approximately 0.6 mm inside the mouse brain.Separations of less than 200 µm can be identified supporting the assessment of the spatial resolution from the system matrix.