Learning the solution operator of parametric partial differential equations with physics-informed DeepONets

Enabling the rapid emulation of parametric differential equations with physics-informed deep operator networks.


Introduction
As machine learning (ML) methodologies take center stage across diverse disciplines in science and engineering, there is an increased interest in adopting data-driven methods to analyze, emulate, and optimize complex physical systems.The dynamic behavior of such systems is often described by conservation and constitutive laws expressed as systems of partial differential equations (PDEs) [1].A classical task then involves the use of analytical or computational tools to solve such equations across a range of scenarios, e.g.different domain geometries, input parameters, initial and boundary conditions (IBCs).Mathematically speaking, solving these so-called parametric PDE problems involves learning the solution operator that maps variable input entities to the corresponding latent solutions of the underlying PDE system.Tackling this task using traditional tools (e.g.finite element methods [2]) bears a formidable cost, as independent simulations need to be performed for every different domain geometry, input parameter, or IBCs.This challenge has motivated a growing literature on reduced-order methods [3,4,5,6,7,8,9] that leverage existing data-sets to build fast emulators, often at the price of reduced accuracy, stability, and generalization performance [10,11].More recently, ML tools are actively developed to infer solutions of PDEs [12,13,14,15,16,17,18], however most existing tools can only accommodate a fixed given set of input parameters or IBCs.Nevertheless, these approaches have found wide applicability across diverse applications including fluid mechanics [19,20], heat transfer [21], bio-engineering [22,23], materials [24,25,26], and finance [27,28], showcasing the remarkable effectiveness of ML techniques in learning black-box functions, even in high-dimensional contexts [29].A natural question then arises: can ML methods be effective in building fast emulators for solving parametric PDEs?Solving parametric PDEs requires learning operators (i.e.maps between infinite-dimensional function spaces) instead of functions (i.e.maps between finite-dimensional vector spaces), thus defining a new and relatively under-explored realm for ML-based approaches.Neural operator methods [30,31,32] represent the solution map of parametric PDEs as an integral Hilbert-Schmidt operator, whose kernel is parametrized and learned from paired observations, either using local message passing on a graph-based discretization of the physical domain [30,31], or using global Fourier approximations in the frequency domain [32].By construction, neural operators methods are resolution independent (i.e. the model can be queried at any arbitrary input location), but they require large training data-sets, while their involved implementation often leads to slow and computationally expensive training loops.More recently, Lu et al. [33] has presented a novel operator learning architecture coined as DeepOnet that is motivated by the universal approximation theorem for operators [34,35].DeepOnets still require large annotated data-sets consisting of paired input-output observations, but they provide a simple and intuitive model architecture that is fast to train, while allowing for a continuous representation of the target output functions that is independent of resolution.Beyond deep learning approaches, operator-valued kernel methods [36,37] have also been demonstrated as a powerful tool for learning nonlinear operators, and they can naturally be generalized to neural networks acting on function spaces [38], but their applicability is generally limited due to their computational cost.Here we should again stress that the aforementioned techniques enable inference in abstract infinite-dimensional Banach spaces [39]; a paradigm shift from current machine learning practice that mainly focuses on learning functions instead of operators.In fact, recent theoretical findings also suggest that the sample complexity of DeepOnets can circumvent the curse of dimensionality in certain scenarios [40].
While the aforementioned methodologies have demonstrated early promise across a range of applications [41,42,43], their application to solving parametric PDEs faces two fundamental challenges.First, they require a large corpus of paired input-output observations.In many realistic scenarios, the acquisition of such data involves the repeated evaluation of expensive experiments or costly high-fidelity simulators, thus generating sufficient large training data-sets may be prohibitively expensive.In fact, ideally, one would wish to be able to train such models without any observed data at all (i.e.given only knowledge of the PDE form and its corresponding ICBs).The second challenge relates to the fact that, by construction, the methods outlined above can only return a crude approximation to the target solution operator in the sense that the predicted output functions are not guaranteed to satisfy the underlying PDE.Recent efforts [44,16,45,46,47] attempt to address some of these challenges by designing appropriate architectures and loss functions for learning discretized operators (i.e.maps between high-dimensional Euclidean spaces).Although these approaches can relax the requirement for paired input-output training data, they are limited by the resolution of their underlying mesh discretization, and, consequently, need modifications to their architecture for different resolutions/discretizations in order to achieve consistent convergence (if at all possible, as demonstrated in [30]).
In this work, we aim to address the aforementioned challenges by exploring a simple, yet remarkably effective extension of the DeepONet framework [33].Drawing motivation from physics-informed neural networks [14], we recognize that the outputs of a DeepONet model are differentiable with respect to their input coordinates, therefore allowing us to employ automatic differentiation [48,49] to formulate an appropriate regularization mechanism for biasing the target output functions to satisfy the underlying PDE constraints.This yields a simple procedure for training physics-informed DeepONet models even in the absence of any training data for the latent output functions, except for the appropriate IBCs of a given PDE system.By constraining the outputs of a DeepONet to approximately satisfy an underlying governing law, we observe significant improvements in predictive accuracy (up to 1-2 orders of magnitude reduction in predictive errors), enhanced generalization performance even for out-of-distribution prediction and extrapolation tasks, as well as enhanced data-efficiency (up to 100% reduction in the number of examples required to train a DeepONet model).As such, we demonstrate how physics-informed DeepONet models can be used to solve parametric PDEs without any paired input-output observations; a setting for which existing approaches for operator learning in Banach spaces fall short.Moreover, a trained physics-informed DeepONet model can generate PDE solutions up to three orders of magnitude faster compared to traditional PDE solvers.Taken together, the computational infrastructure developed in this work can have broad technical impact in reducing computational costs and accelerating scientific modeling of complex non-linear, non-equilibrium processes across diverse applications including engineering design and control, Earth system science, and computational biology.
The remaining of this paper is structured as follows.In section 2, we provide an overview of the DeepONet framework put forth by Lu et al. [33].Section 3 provides a detailed discussion of our main technical contribution, namely the formulation of a physics-informed regularization mechanism for constraining the outputs of a DeepONet model to approximately satisfy an underlying PDE system.Our discussion is accompanied by an illustrative example that highlights the main advantages of the proposed approach in comparison to conventional DeepONets.Further, in section 4 we present a series of comprehensive numerical studies to assess the performance of the proposed physics-informed DeepONet framework.Finally, section 5 concludes with a discussion of our main findings, potential pitfalls, and shortcomings, as well as future research directions emanating from this study.

Learning operators with DeepONets
In this section, we present a brief overview of the DeepONet model architecture [33] with a special focus on learning solution operators of parametric PDEs.Here, the terminology "parametric PDEs" means that some parameters of a given PDE system are allowed to vary over a certain range.These input parameters may include, but are not limited to, the shape of the physical domain, the initial or boundary conditions, constant or variable coefficients (e.g.diffusion or reaction rates), source terms, etc.To describe such problems in their full generality, let (U, V, S) be a triplet of Banach spaces and N : U × S → V be a linear or nonlinear differential operator.We consider a parametric PDEs taking the form where u ∈ U denotes the parameters (i.e.input functions), and s ∈ S is the corresponding unknown solutions of the PDE system.Specifically, we assume that, for any u ∈ U, there exists an unique solution s = s(u) ∈ U to 2.1 (subject to appropriate initial and boundary conditions).Then, we can define the solution operator G : A → U as Following the original formulation of Lu et.al. [33], we represent the solution map G by an unstacked DeepONet G θ , where θ denotes all trainable parameters of the DeepOnet network.As illustrated in Figure 1, the unstacked DeepONet is composed of two separate neural networks referred to as the "branch net" and "trunk net", respectively.The branch net takes u as input and returns a features embedding [b 1 , b 2 , . . ., b q ] T ∈ R q as output, where u = [u(x 1 ), u(x 2 ), . . ., u(x m )] represents a function u ∈ U evaluated at a collection of fixed locations {x i } m i=1 .The trunk net takes the continuous coordinates y as inputs, and outputs a features embedding [t 1 , t 2 , . . ., t q ] T ∈ R q .To obtain the final output of the DeepONet, the outputs of the branch and trunk networks are merged together via a dot product.More specifically, a DeepONet G θ prediction of a function u evaluated at y can be expressed by where θ denotes the collection of all trainable weight and bias parameters in the branch and trunk networks.These parameters can be optimized by minimizing the following mean square error loss where {u (i) } N i=1 denotes N separate input functions sampled from U. For each u (i) , {y (i) } P j=1 are P locations in the domain of G(u (i) ), and G(u (i) )(y (i) j ) is the corresponding output data evaluated at y (i) j .Contrary to the fixed sensor locations of {x i } m i=1 , we remark that the locations of {y (i) } P j=1 may vary for different i, thus allowing us to construct a continuous representation of the output functions s ∈ S.  [33] consists of two sub-networks, the branch net for extracting latent representations of input functions, and the trunk net for extracting latent representations of input coordinates at which the output functions are evaluated.A continuous and differentiable representation of the output functions is then obtained by merging the latent representations extracted by each sub-network via a dot product.Automatic differentiation can then be employed to formulate appropriate regularization mechanisms for biasing the DeepOnet outputs to satisfy a given system of PDEs.
It is important to highlight that each input function ] repeats itself for P times.In other words, suppose that u ∈ U, G(u) ∈ V are scalar-valued functions, {u (i) } N i=1 are N sample functions, and, for each sample u (i) , G(u (i) ) is evaluated at P different locations {y (i) } P j=1 ⊂ R d , then the tensor dimensions constituting a DeepOnet training data-set u, y, G(u)(y) are (N × P, m), (N × P, d), (N × P, 1) respectively.
Although DeepONets [33] and their variants (e.g.DeepM&Mnets [41]) have demonstrated great potential in approximating operators and solving multi-physics and multi-scale problems, it is worth pointing out that the learned operator may not be consistent with the underlying physical laws that generated the observed data (e.g., due to the finite capacity of neural networks or lack of sufficient training data).To illustrate this, let us consider a pedagogical example involving a simple initial value problem with an initial condition s(0) = 0. Here, our goal is to learn the anti-derivative operator To generate a training data-set, we randomly sample 10,000 different functions u from a zero-mean Gaussian process prior with an exponential quadratic kernel using a length scale of l = 0.2 [50].We also obtain the corresponding 10,000 ODE solutions s by integrating the ODE 2.6 using an explicit Runge-Kutta method (RK45) [51].For each observed pair of (u, s), we choose m = 100 sensors {x i } m i=1 uniformly distributed on the time interval [0, 1] and randomly select P = 1 observations of s(•) in [0, 1].To generate the test data-set, we repeat the same procedure with m = 100 and P = 100.The final test data-set contains 1, 000 different samples of random input functions u.  [33] equipped with different activation functions after 40,000 iterations of gradient descent using the Adam optimizer.
We represent the operator G using the unstacked DeepONet G θ where both the branch net and the trunk net are two-layer fully-connected neural networks with 100 neurons per hidden layer.Each network is equipped with ReLU activation functions.The network parameters can be trained by minimizing the following loss where ] represent the input function, and s (i) (y (i) ) denotes the associated solution of equation 2.6 evaluated at y (i) .
We train the DeepONet model by minimizing the above loss function via gradient descent using the Adam optimizer [52] for 40, 000 iterations.Note that the final output of DeepONet is a function of input coordinates x.Thus, we can compute the residual ds(x) dx of the inferred solution s(x) using automatic differentiation [49], and compare the computed residual with u(x) at the sensors {x i } m i=1 .Figure 2 shows the comparison of the predicted s(x) and ds(x) dx against the ground truth for one representative random sample from the test data-set.We can observe a good agreement between the predicted and the exact solution s(x) when using ReLU activation functions.However, the predicted residual ds(x) dx seems to approximate u(x) with step functions, which leads to a large approximation error.One may postulate that this is due to the non-smoothness of ReLU activations.However, as shown in the same Figure 2, similar poor predictions of both u(x) and s(x) are obtained by repeating the same process using a DeepONet equipped with tanh activations, under exactly the same hyper-parameter settings.Thus, despite the guarantee of universal approximation theorem for operators [34], it is possible that DeepONet models may not appropriately learn the correct solution operator in the sense that the predicted output functions are not compatible with the ground truth operator that generated the training data.

Physics-informed DeepONets
Physics-informed neural networks (PINNs) [14] can seamlessly integrate the data measurements and physical governing laws by penalizing residuals of partial differential equations in the loss function of a neural network using automatic differentiation [49].Motivated by PINNs and our findings in the previous section, we propose a novel model class referred to as "physics-informed" DeepONets that enables the DeepONet output functions to be consistent with physical constraints via minimizing the residual of the underlying governing laws in the same manner as PINNs.Specifically, we consider minimizing the following composite loss function where L operator (θ) is defined exactly the same as in equation 2.4 and  where {y denotes a set of collocation points that are randomly sampled from the domain of G(u (i) ), and used to approximately enforce a set of given physical constraints, typically described by systems of PDEs.
To demonstrate the effectiveness of the proposed framework, let us revisit the numerical example shown in section 2 and use the proposed physics-informed DeepONet architecture to learn the anti-derivate operator.Specifically, we represent the operator G by a DeepONet where both branch net and the trunk are 5-layer fully-connected neural network with 50 units per hidden layer.We also equip both networks with tanh activation functions.For this particular example, L operator is exactly the same as in equation 2.8, while the physics loss can be formulated as In this example we consider Q = m and y (i) j = x j for j = 1, 2, . . ., Q.
Figure 3 presents the predicted s(x) and ds(x) dx for the same random sample (see Figure 2) by minimizing the loss function 3.1 for 40,000 iterations of gradient descent using the Adam optimizer.Evidently, both predictions achieve an excellent agreement with the corresponding reference solutions.This can be further verified by the mean of relative L 2 error of the model predictions reported table 1, from which we may conclude that the physics-informed DeepONet not only attains comparable accuracy to the original DeepONet, but also satisfies the underlying ODE constraint.Another crucial finding is that physics-informed DeepONets are data-efficient and therefore effective in small data regime.To illustrate this, we train both a DeepONet and a physics-informed DeepONet for different number of training data points (i.e, different number of samples u) and report the mean of the relative L 2 error of s(x) over 1,000 realizations from the test data-set in Figure 4. We observe that conventional DeepONets require more than 10x training data to achieve the same accuracy as their physics-informed counterpart.
The remarkable success of DeepONets is based on the assumption that there exists enough data to train the model offline.However, high-fidelity numerical simulations are often computationally expensive and the volume of useful experimental data is generally limited or even intractable for many practical scenarios.Strikingly, as will be demonstrated below, the proposed physics-informed DeepONet is capable of learning the solution operator of parametric PDEs even without any paired input-output data, except for appropriate initial or boundary conditions -a setting for which conventional DeepOnets [33] and other competing approaches for operator learning in Banach spaces [41,42,43] fall short.

Numerical results
To demonstrate the effectiveness of physics-informed DeepONets, we provide a series of comprehensive numerical studies for solving various types of parametric PDEs.In most examples, we model random input functions u(x) using mean-zero Gaussian random fields (GRF) [50] as The parameter l will be used to control the complexity of the sampled input functions, and in general larger l > 0 leads to smoother u.
Throughout all benchmarks we will employ hyperbolic tangent activation functions (Tanh) and initialize the DeepOnet networks using the Glorot normal scheme [53], unless otherwise stated.All networks are trained via mini-batch stochastic gradient descent using the Adam optimizer [52] with default settings.Particularly, we set the batch size to be 10,000 and use exponential learning rate decay with a decay-rate of 0.9 every 1, 000 training iterations.The detailed hyper-parameters and computational cost for all examples are listed in Appendix B and C. The code and data accompanying this manuscript is publicly available https://github.com/PredictiveIntelligenceLab/Physics-informed-DeepONets.

The anti-derivative operator
To illustrate the capability of physics-informed DeepONets in solving parametric differential equations, let us again consider a pedagogical example involving the anti-derivative operator 2.6 as discussed in section 2, 3, i.e.
Assuming that u(x) is exactly known, we aim to learn the solution operator from u(x) to the solution s(x) without any paired input-output data.To this end, we represent the operator by a DeepONet G θ where both branch net and trunk net are 5-layer fully-connected neural networks with 50 neurons per hidden layer and equipped with tanh activations.The corresponding loss function is expressed as Here, and we sample N = 10, 000 input functions u(x) from a GRF with length scale l = 0.2.Moreover, we take Q = m = 100 and {x j } Q j=1 are equi-spaced grid points in [0, 1].From the expression of the loss function, it is worth emphasizing that all "training data" comes the measurements of u(x), and the zero initial condition on s(0) (i.e.no other observations of s(x) are available).The test data-set is the same as discussed in section 2.
We train the physics-informed DeepONet by minimizing the above loss function for 40,000 iterations via gradient descent using the Adam optimizer.Results for one representative input sample from the test data-set are presented in Figure 5.Additional visualizations for different input samples are provided in Appendix Figure 17.As it can be seen, an excellent agreement can be achieved between the physics-informed DeepONet predictions and the ground truth.Furthermore, we investigate the performance of the original DeepONet [33] in solving this parametric ODE example.
To this end, we train a DeepONet by minimizing the loss function L operator (θ) under exactly the same hyper-parameter setting.Representative predicted solutions s(x) for different input samples u are shown in Figure 6.We observe that the conventional DeepONet learns a degenerate map that can fit the initial condition s(0) = 0, but returns erroneous predictions for all x > 0. These observations can be further quantified in Table 2, which reports the mean and standard deviation of the relative L 2 prediction error for the output functions s and their corresponding ODE residual u over 1,000 examples in the test data-set.Remarkably, the proposed physics-informed DeepONet is trained in the absence of any paired input-output data, but still obtains comparable accuracy to the results shown in Table 1, where the model is trained with a large amount of paired input-output observations.More impressively, below, we show that physics-informed DeepONets can accommodate extremely irregular input functions by using appropriate trunk network architectures.To illustrate this, we consider a GRF with a length scale l = 0.01 as a prior on the input function space.We take Q = m = 200 and repeat the same data generation procedure as before.In this example, the training data-set contains N = 10, 000 different u samples, while the test data-set contains 1,000 realizations.
Given that the input functions are sampled from a GRF with a relatively small length scale, the associated solutions are expected to exhibit high frequencies.Therefore, we represent the solution operator by a DeepONet with Fourier feature embeddings [54], which are able to learn high-frequency components more effectively.Generally, A random Fourier mapping γ is defined as where each entry in B ∈ R m×d is sampled from a Gaussian distribution N (0, σ 2 ) and σ > 0 is a user-specified hyper-parameter.Then, a Fourier feature network [54] can be simply constructed using a random Fourier features mapping γ as a coordinate embedding of the inputs, followed by a conventional fully-connected neural network.
In particular, we encode the input functions by a branch net that is a 5-layer fully-connected neural network with 200 neurons per hidden layer.In addition, we apply a Fourier feature embedding [54] initialized with σ = 50 to the input coordinates y before passing the embedded inputs through a trunk network with the same architecture as the branch net.Then we train the physics-informed DeepONet for 300,000 iterations of gradient descent using the Adam optimizer.As can be seen in Figure 5, the predicted solutions s(x) and their corresponding ODE residuals u(x) obtained by the physics-informed DeepONet with Fourier feature networks are in excellent agreement with the growth truth.Moreover, the results of training the same physics-informed DeepONet without Fourier feature embeddings are presented in Appendix Figure 19.One may observe that using a conventional fully-connected trunk networks cannot accurately capture the high-frequency oscillations, leading to a large prediction error.These observations can be further quantified in Table 3, which summarizes the mean and standard derivation of the relative L 2 prediction error of trained physics-informed DeepONets constructed with different network architectures.Although here we have illustrated that an appropriate network architecture plays a prominent role in the performance of DeepONets, a comprehensive investigation of DeepONet architectures is beyond the scope of the present study and will be investigated in future work.
It is also worth pointing out that the trained physics-informed DeepONets is even capable of yielding accurate predictions for out-of-distribution test data.To illustrate this, we create a test data-set by sampling input functions from a GRF with a larger length-scale of l = 0.2 (recall that the training data for this case is generated using l = 0.01).The corresponding relative L 2 prediction error averaged over 1, 000 test examples is measured as 7.12e − 03.Some visualizations of the model predictions for this out-of-distribution prediction task are shown in the Appendix, Figure 21.

Diffusion-reaction systems
Our next example involves an implicit operator described by a nonlinear diffusion-reaction PDE with a source term u(x) with the zero initial and boundary conditions, where D = 0.01 is the diffusion coefficient and k = 0.01 is the reaction rate.We aim to learn the solution operator for mapping source terms u(x) to the corresponding PDE solutions s(x) using a physics-informed DeepONet.
We approximate the operator by a physics-informed DeepONet architecture G θ , where the branch and trunk networks are two separate 5-layer fully-connected neural networks with 50 neurons per hidden layer.For a given input function u (i) , we define the corresponding PDE residual as where represents the input functions, and {x i } m i=1 is a collection of equi-spaced sensor locations in [0, 1].The parameters of the physics-informed DeepONet can be trained by minimizing the loss function (4.9) Here, for each u (i) , {(x u,j } P j=1 are uniformly sampled points from the boundary of [0, 1] × [0, 1] (excluding t = 1), while {(x r,j = x j , and {t (i) r,j )} Q j=1 are uniformly sampled in [0, 1].Consequently, L operator (θ) enforces the zero initial and boundary conditions, and L physics (θ) penalizes the parametric PDE residual at the Q collocation points.In this example, we set P = Q = 100 and randomly sample N = 10, 000 input functions u(x) from a GRF with length scale l = 0.2.To generate the test data-set, we sample N = 1, 000 input functions u(x) from the same GRF and solve the diffusion-reaction system using a second-order implicit finite difference method on a 100 × 100 equi-spaced grid [51].Hence, the test data-set will contain 1, 000 realizations evaluated on a 100 × 100 uniform grid.
We train the physics-informed DeepONet by minimizing the loss function 4.8 for 120, 000 iterations of gradient descent using the Adam optimizer with default settings.Figure 8 shows the comparison between the predicted and the exact solution for a random test input sample.More visualizations for different input samples can be found in Appendix Figure 23.We observe that the physics-informed DeepONet predictions achieve an excellent agreement with the corresponding reference solutions.Furthermore, we investigate the performance of a conventional DeepONet model [33] in the case where some training data are available.Specifically, we still use the same 10, 000 input functions sampled before and for each u, and we randomly select P = 100 solution measurements out of the associated reference numerical solutions on the 100 × 100 grid.Then, we train the conventional DeepONet model under exactly the same hyper-parameter settings.The mean and standard deviation of relative L 2 errors of trained DeepONet and physics-informed DeepONet over the test data-set are visualized in Figure 9.The average relative L 2 error of DeepONet and physics-informed DeepONet are ∼ 1.92% and ∼ 0.45%, respectively.Remarkably, in contrast to the conventional DeepONet that is trained on paired input-output measurements, the proposed physics-informed DeepONet can yield much more accurate predictions even without any paired training data (except for the initial and boundary conditions).In our experience, predictive accuracy can be generally improved by using a larger batch size during training.A study of the effect of batch size for training physics-informed DeepONets can be found in Appendix F.

Burgers' equation
To highlight the ability of the proposed framework to handle nonlinearity in the governing PDEs, we consider the 1D Burgers' equation benchmark investigated in  where t ∈ (0, 1), the viscosity is set to ν = 0.01, and the initial condition u(x) is generated from a GRF ∼ N 0, 25 2 (−∆ + 5 2 I) −4 satisfying the periodic boundary conditions.Our goal here is to use the proposed physicsinformed DeepONet model to learn the solution operator mapping initial conditions u(x) to the full spatio-temporal solution s(x, t) of the 1D Burgers' equation.
Suppose that the solution operator is approximated by a physics-informed DeepONet G θ .For a specific input function u (i) , the PDE residual is defined by where u (i) denotes the input function evaluated a collection of fixed sensors {x i } m i=1 that are uniformly spaced in [0, 1].Then, the physics-informed loss function is given by where bc,j ) 2 (4.17) Here, for every input sample u (i) , x ic,j = x j and {(0, t ic,j )} P j=1 and {(x r,j )} Q j=1 are randomly sampled in the computational domain for enforcing the initial and boundary conditions and the PDE residual, respectively.In this example, we take P = m = 100, Q = 2, 500.
To obtain a set of training and test data, we randomly sample 2,000 input functions from a GRF ∼ N 0, 25 2 (−∆ + 5 2 I) −4 , and select a subset of N = 1, 000 samples as training data.For each sample u, we solve the Burgers equation 4.10 using conventional spectral methods.Specifically, assuming periodic boundary conditions, we start from a given initial condition s(x, 0) = u(x), x ∈ [0, 1] and integrate the equation 4.10 up to the final time t = 1.Synthetic test data for this example are generated using the Chebfun package [55] with a spectral Fourier discretization and a fourth-order stiff time-stepping scheme (ETDRK4) [56] with time-step size 10 −4 .Temporal snapshots of the solution are saved every ∆t = 0.01 to give us 101 snapshots in total.Consequently, the test data-set contains 1,000 realizations evaluated at a 100 × 101 spatio-temporal grid.
We employ two separate 7-layer fully-connected neural networks to represent the branch net and the trunk net, respectively.Each network is equipped with Tanh activation functions and has 100 units per hidden layer.The physics-informed DeepONet is trained by minimizing the loss function 4.15 via gradient descent using the Adam optimizer for 200, 000 iterations. Figure 10c shows the predicted solution of a trained physics-informed DeepONet for the worst sample in the test data-set, with a resulting relative L 2 error of 1.71e − 01.Moreover, a discrepancy between the exact and the predicted initial condition u(x) can be observed in Figure 10a.This indicates that the physics-informed DeepONet cannot accurately reconstruct the initial condition, which results in a large prediction error of the full solution.To enforce the initial condition and improve the performance of physics-informed DeepONet, we consider assigning weights to L IC (θ) and use a more powerful network architecture as the backbone of the branch net and the trunk net.Specifically, we modify the loss function 4.15 as where λ is a hyper-parameter that aims to balance the interplay of different terms in the loss function.Moreover, we employ a simple modified fully-connected neural network proposed by Wang et.al [57], which has been empirically proven to outperform conventional multi-layer percerptron (MLP) networks.More details can be found in Appendix G.We train the physics-informed DeepONet using standard and modified MLP networks by minimizing the modified loss function 4.20 for different λ ∈ {1, 5, 10, 20, 50, 100} under exactly the same hyper-parameter setting.The resulting average relative L 2 prediction errors are summarized in Figure 11a.Compared against conventional MLPs, the modified MLP architecture is capable of consistently yielding much better prediction accuracy, which can be further improved by assigning appropriate weights in the loss function.Among all these hyper-parameters, the smallest test error ∼ 1.38% is obtained for the modified fully-connected neural network with λ = 20.It is worth noting that the physics-informed DeepONet achieves the comparable accuracy compared to Fourier operator methods [32], albeit the latter is trained on a large corpus of paired input-output data.Furthermore, visualizations corresponding to the worst example in the test data-set are shown in Figure 10b and Figure 10d, respectively.One can see that model predictions achieve a good agreement against the reference solutions, with a the relative L 2 error being reduced to 3.30%.
Here, we must also emphasize that a trained physics-informed DeepONet model can rapidly predict the entire spatiotemporal solution of a given Burgers equation in ∼10ms.Inference with physics-informed DeepONets is trivially parallelizable, allowing for the solution of O( 103 ) PDEs in a fraction of a second, yielding up to three orders of magnitude in speed up compared to a conventional spectral solver [55], see Appendix Figure 14.

Eikonal equation
Our last example aims to highlight the capability of the proposed physics-informed DeepONet to handle different type of input functions.To this end, let us consider a two-dimensional Eikonal equation of the form where x = (x, y) ∈ R 2 denotes 2D spatial coordinates, Ω is an open domain with a piece-wise smooth boundary ∂Ω.
A solution to the above equation is a signed distance function quantifying the distance of a point in Ω to the boundary ∂Ω, i.e Sign distance functions (SDFs) have recently sparked increased interest in the computer vision and graphics communities as a tool for shape representation learning [58].This is because SDFs can continuously represent abstract shapes or surfaces implicitly as their zero-level-set, yielding high quality shape representations, interpolation and completion from partial and noisy input data [58].
In this example, we seek to learn the solution map from a well-behaved closed curve Γ to its associated signed distance function, i.e the solution of the Eikonal equation defined in equation 4.21.To this end, we use a DeepONet G θ to represent the unknown operator.This allows us to define the PDE residual Here, 2 ), . . ., ( m )] denotes a parametrized curve evaluated at a set of fixed sensor locations {(x Then, a physics-informed DeepONet can be trained by minimizing the following loss  For example, suppose that Γ is a circle with radius r, then the signed distance function is given by To generate a set of training data, we randomly choose N = 1, 000 circles with radii sampled from a uniform distribution.Then, for each input circle Γ (i) with radius r (i) , we have {(x The branch net and the trunk networks are two separate 6-layer fully-connected neural network with 50 neurons per hidden layer.We train the physics-informed DeepONet by minimizing the loss function 4.24 for 80, 000 iterations of gradient descent using the Adam optimizer.As shown in Figure 12, an excellent agreement can be achieved between the exact and the predicted signed distance functions for a representative example in the test data-set.The relative L 2 prediction error averaged over 1,000 examples in the test data-set is 4.22e − 03.More model predictions for different input samples can be found in Appendix Figure 28.

Case II: Parametric airfoils
Next, we consider a more complex case where the parameterized curves correspond to airfoils of different shapes.To obtain a set of training and test data, we use the UIUC Airfoil Data Site [59] which contains a total of 1,552 airfoil geometries.We use the first 1,000 shapes as training data, and the rest are included in the test data-set.Without loss of generality, we normalize the airfoil shapes to have zero mean and unit variance.
In this example, the computation domain is the unit square [−3, 3] × [−3, 3] and the branch net and the trunk net are two separate 6-layer fully-connected neural networks with 100 neurons per hidden layer.Both two networks are equipped with ELU activation functions.We train the physics-informed DeepONet for 120, 000 iterations of gradient descent using the Adam optimizer.To evaluate the performance of the trained model, we visualize the zero-level set of the learned signed distance function and compare it with the exact airfoil geometry.As shown in Figure 13, the zero-level-sets achieve a good agreement with the exact airfoil geometries.From the results of these two case studies, one may conclude that the proposed framework is capable of achieving a relatively accurate approximation of the exact signed distance function.

Summary and Discussion
This paper presents physics-informed DeepONets; a new model class for nonlinear operator learning in infinitedimensional Banach spaces.We illustrate how automatic differentiation can be leveraged to bias the outputs of DeepONets towards physically-consistent predictions for systems whose evolution can be described by systems of differential equations.By doing so, we observe significant improvements in predictive accuracy (up to 1-2 orders of magnitude reduction in predictive errors), enhanced generalization performance, as well as enhanced data-efficiency (up to 100% reduction in the number of examples required to train a DeepONet model).Strikingly, the proposed framework can be employed to solve parametric PDEs in an unsupervised manner, i.e. without any paired input-output observations.A series of comprehensive numerical studies demonstrate not only significant improvements in terms of predictive accuracy, but also a remarkable reduction number of training data required to train a DeepONet model.Despite the encouraging results presented here, numerous questions remain open and require further investigation.Motivated by the successful application of Fourier feature networks [54] in section 4.1, it is natural to ask: For a given parametric PDE, what is the optimal features embedding or network architecture for physics-informed DeepONets?Recently, Wang et.al. [60] proposed a multi-scale Fourier feature network to tackle PDEs with multi-scale behavior.Such an architecture may be potentially used as the backbone of the physics-informed DeepONet to learn multi-scale operators and solve multi-scale parametric PDEs.Another question arises from the possibility of achieving improved performance by assigning weights in the physics-informed DeepOnet loss function, as discussed in section 4.3.It has been shown that these weights play an important role in enhancing the trainability of constrained neural networks [57,61,62].Therefore, it is natural to ask: What are the appropriate weights to use for training physics-informed DeepONets?How to design effective algorithms for accelerating training and ensuring accuracy and robustness in the predicted outputs?We believe that addressing these questions will not only enhances the performance of physicsinformed DeepONets, but pave a new way for modeling and simulating complex, non-linear and multi-scale physical systems across diverse applications in science and engineering.

B Hyper-parameter settings
In all examples considered in this work, the branch net and the trunk net are equipped with hyperbolic tangent activation functions (Tanh), except for the Eikonal benchmark (airfoils), where ELU activations were employed.Physics-informed DeepONet models are trained via mini-batch gradient descent with a batch-size of 10,000 using the Adam optimizer [52] with default settings and exponential learning rate decay with a decay-rate of 0.9 every 1,000 iterations.In this work, we tuned these hyper-parameters manually, without attempting to find the absolute best hyper-parameter setting.This process can be automated in the future leveraging effective techniques for meta-learning and hyper-parameter optimization [63].

Case
Input Inference: A trained physics-informed DeepONet model can rapidly predict the entire spatio-temporal solution of the Burgers equation in ∼10ms.Inference with DeepONets is trivially parallelizable, allowing for the solution of O( 103 ) PDEs in a fraction of a second, yielding up to three orders of magnitude in speed up compared to a traditional spectral solver [55] (see Figure 14).

Figure 1 :
Figure1: Making DeepOnets physics-informed: The DeepONet architecture[33] consists of two sub-networks, the branch net for extracting latent representations of input functions, and the trunk net for extracting latent representations of input coordinates at which the output functions are evaluated.A continuous and differentiable representation of the output functions is then obtained by merging the latent representations extracted by each sub-network via a dot product.Automatic differentiation can then be employed to formulate appropriate regularization mechanisms for biasing the DeepOnet outputs to satisfy a given system of PDEs.

Figure 2 :
Figure 2: Learning the anti-derivative operator: Predicted solution s(x) and residual u(x) versus the ground truth for a representative input function.The results are obtained by training a conventional DeepONet model[33] equipped with different activation functions after 40,000 iterations of gradient descent using the Adam optimizer.

Figure 3 :
Figure 3: Learning anti-derivative operator: Exact solution and residual versus the predictions of a trained physicsinformed DeepONet for the same input function as in Figure 2.

Figure 4 :
Figure4: Learning anti-derivative operator: Mean of the relative L 2 prediction error of the original DeepONet[33] and the physics-informed DeepONet as a function of the number of u samples.

Model Relative L 2 Figure 5 :
Figure 5: Solving a 1D parametric ODE: (a)(b) Exact solution and residual versus the predictions of a trained physics-informed DeepONet for a representative input sample.

Figure 6 :
Figure6: Solving a 1D parametric ODE: Exact solutions versus the predicted solutions of a trained DeepONet for four different input samples.We observe that the conventional DeepONet[33] learns a degenerate operator.

Figure 7 :
Figure 7: Solving a 1D parametric ODE with irregular input functions: (a)(b) Exact solutions and corresponding ODE residuals versus the predictions of a trained physics-informed DeepONet with Fourier feature embeddings (for a representative input function sampled from a GRF with length scale l = 0.01).Additional visualizations are provided in Appendix Figure 20.

Figure 8 : 1 Rel. L 2 errorFigure 9 :
Figure 8: Solving a parametric diffusion-reaction system: Exact solution versus the prediction of a trained physicsinformed DeepONet for a representative example in the test data-set.

Figure 11 :
Figure 11: Solving a parametric Burgers' equation: (a) The average relative L 2 error of training physics-informed DeepONets with standard or modified MLPs for different λ ∈ {1, 5, 10, 20, 50, 100} over 1,000 examples in the test data-set.The smallest errors for standard and modified MLPs are 2.80e − 02 and 1.38e − 02, respectively.(b) Computational cost (sec) for performing inference with a trained physics-informed DeepONet model (conventional or modified MLP architecture), as well as corresponding timing for solving a PDE with a conventional spectral solver [55].Strikingly, a trained physics informed DeepOnet model can predict the solution of O(10 3 ) time-dependent PDEs in a fraction of a second -up to three orders of magnitude faster compared to a conventional PDE solver.Reported timings are obtained on a single NVIDIA V100 GPU.

Figure 12 :
Figure 12: Solving a parametric Eikonal equation (circles): Exact solutions versus the predicted solutions of a trained physics-informed DeepONet for a representative input sample.The black dots represent the location of sensors on the circular boundary.

Figure 13 :
Figure 13: Solving a parametric Eikonal equation (airfoils): Top: Exact airfoil geometry versus the zero-level-set obtained from the predicted signed distance function for three different input examples in the test data-set.Bottom: Predicted signed distance function of a trained physics-informed DeepONet for three different airfoil geometries in the test data-set.

Figure 14 :Figure 17 :
Figure14: Solving a parametric Burgers' equation: Computational cost (sec) for performing inference with a trained physics-informed DeepONet model (conventional or modified MLP architecture), as well as corresponding timing for solving a PDE with a conventional spectral solver[55].Strikingly, a trained physics informed DeepOnet model can predict the solution of O(103 ) time-dependent PDEs in a fraction of a second -up to three orders of magnitude faster compared to a conventional PDE solver.Reported timings are obtained on a single NVIDIA V100 GPU.

Figure 18 :
Figure18: Solving a 1D parametric ODE with irregular input functions: (a)(b) Training loss convergence of a physicsinformed DeepONets using a conventional fully-connected neural network, and a Fourier feature network, respectively, for 300,000 iterations of gradient descent using the Adam optimizer.

Figure 19 :Figure 20 :
Figure 19: Solving a parametric ODE with irregular input functions: Predicted solutions s(x) and corresponding ODE residuals u(x) for a trained physics-informed DeepONet with a conventional fully-connected architecture, across four different examples in the test data-set.

Figure 23 :Figure 26 :
Figure 23: Solving a parametric diffusion-reaction system: Predicted solution of a trained physics-informed DeepONet for three different examples in the test data-set.

Figure 28 :
Figure 28: Solving a parametric Eikonal equation (circles): Predicted signed distance functions by a trained physicsinformed DeepONet for three different examples in the test data-set.

Table 1 :
Learning anti-derivative operator: Mean and standard deviation of relative L 2 prediction errors of DeepONet and physics-informed DeepONet equipped with ReLU or Tanh activations over 1,000 examples in the test data-set.

Table 4
summarizes the main symbols and notation used in this work.), u(x 2 ), . . ., u(x m )] an input of the branch net, representing the input function u

Table 4 :
Nomenclature: Summary of the main symbols and notation used in this work.

Table 5 :
Default hyper-parameter settings for each benchmark employed in this work (unless otherwise stated).

Table 6 :
Physics-informed DeepONet architectures for each benchmark employed in this work (unless otherwise stated).

Table 7 :
[33]entional DeepONet[33]architectures for each corresponding benchmark (unless otherwise stated).TableCsummarizes the computational cost (hours) of training DeepONet and physics-informed DeepONet models with different network architectures.The size of different models as well as network architectures are listed table 7 and 6, respectively.All networks are trained using a single V100 card.It can be observed that training a physics-informed DeepONet model is generally slower than training a conventional DeepONet.This is expected as physics-informed DeepONets require to compute the PDE residual via automatic differentiation, yielding a lager computational graph, and, therefore, a higher computational cost. Training:

Table 8 :
Computational cost (hours) for training DeepONet and physics-informed DeepONet models across the different becnhmarks and architectures employed in this work.Reported timings are obtained on a single NVIDIA V100 GPU.