Main Content

Solve Partial Differential Equation with L-BFGS Method and Deep Learning

This example shows how to train a physics informed neural network (PINN) to numerically compute the solution of the Burger's equation by using the limited-memory BFGS (L-BFGS) algorithm.

The Burger's equation is a partial differential equation (PDE) that arises in different areas of applied mathematics. In particular, fluid mechanics, nonlinear acoustics, gas dynamics, and traffic flows. The L-BFGS algorithm [1] is a quasi-Newton method that approximates the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm.

Given the computational domain[-1,1]×[0,1], this example uses a physics informed neural network (PINN) [2] and trains a multilayer perceptron neural network that takes samples (x,t) as input, where x[-1,1] is the spatial variable, and t[0,1] is the time variable, and returns u(x,t), where u is the solution of the Burger's equation:


with u(x,t=0)=-sin(πx)as the initial condition, and u(x=-1,t)=0 and u(x=1,t)=0 as the boundary conditions.

The example trains the model by enforcing that given an input (x,t), the output of the network u(x,t) fulfills the Burger's equation, the boundary conditions, and the initial condition. Training this model does not require collecting data in advance. You can generate data using the definition of the PDE and the constraints.

Generate Training Data

Training the model requires a data set of collocation points that enforce the boundary conditions, enforce the initial conditions, and fulfill the Burger's equation.

Select 25 equally spaced time points to enforce each of the boundary conditions u(x=-1,t)=0 and u(x=1,t)=0.

numBoundaryConditionPoints = [25 25];

x0BC1 = -1*ones(1,numBoundaryConditionPoints(1));
x0BC2 = ones(1,numBoundaryConditionPoints(2));

t0BC1 = linspace(0,1,numBoundaryConditionPoints(1));
t0BC2 = linspace(0,1,numBoundaryConditionPoints(2));

u0BC1 = zeros(1,numBoundaryConditionPoints(1));
u0BC2 = zeros(1,numBoundaryConditionPoints(2));

Select 50 equally spaced spatial points to enforce the initial condition u(x,t=0)=-sin(πx).

numInitialConditionPoints  = 50;

x0IC = linspace(-1,1,numInitialConditionPoints);
t0IC = zeros(1,numInitialConditionPoints);
u0IC = -sin(pi*x0IC);

Group together the data for initial and boundary conditions.

X0 = [x0IC x0BC1 x0BC2];
T0 = [t0IC t0BC1 t0BC2];
U0 = [u0IC u0BC1 u0BC2];

Select 10,000 points to enforce the output of the network to fulfill the Burger's equation.

numInternalCollocationPoints = 10000;

points = rand(numInternalCollocationPoints,2);

dataX = 2*points(:,1)-1;
dataT = points(:,2);

Define Neural Network Architecture

Define a multilayer perceptron neural network architecture with 9 fully connect operations with 20 hidden neurons each. The first fully connect operation has two input channels corresponding to the inputs x and t. The last fully connect operation has one output u(x,t).

Specify the network hyperparameters.

numLayers = 9;
numNeurons = 20;

Create the neural network.

layers = featureInputLayer(2);

for i = 1:numLayers-1
    layers = [

layers = [
layers = 
  18×1 Layer array with layers:

     1   ''   Feature Input     2 features
     2   ''   Fully Connected   20 fully connected layer
     3   ''   Tanh              Hyperbolic tangent
     4   ''   Fully Connected   20 fully connected layer
     5   ''   Tanh              Hyperbolic tangent
     6   ''   Fully Connected   20 fully connected layer
     7   ''   Tanh              Hyperbolic tangent
     8   ''   Fully Connected   20 fully connected layer
     9   ''   Tanh              Hyperbolic tangent
    10   ''   Fully Connected   20 fully connected layer
    11   ''   Tanh              Hyperbolic tangent
    12   ''   Fully Connected   20 fully connected layer
    13   ''   Tanh              Hyperbolic tangent
    14   ''   Fully Connected   20 fully connected layer
    15   ''   Tanh              Hyperbolic tangent
    16   ''   Fully Connected   20 fully connected layer
    17   ''   Tanh              Hyperbolic tangent
    18   ''   Fully Connected   1 fully connected layer

Convert the layer array to a dlnetwork object.

net = dlnetwork(layers)
net = 
  dlnetwork with properties:

         Layers: [18×1 nnet.cnn.layer.Layer]
    Connections: [17×2 table]
     Learnables: [18×3 table]
          State: [0×3 table]
     InputNames: {'input'}
    OutputNames: {'fc_9'}
    Initialized: 1

  View summary with summary.

Training a PINN can result in better accuracy when the learnable parameters have data type double. Convert the network learnables to double using the dlupdate function. Note that not all neural networks support learnables of type double, for example, networks that use GPU optimizations that rely on learnables with type single.

net = dlupdate(@double,net);

Define Model Loss Function

Create the function modelLoss, listed in the Model Loss Function section at the end of the example, that takes as input the neural network, the network inputs, and the initial and boundary conditions, and returns the gradients of the loss with respect to the learnable parameters and the corresponding loss.

Specify Training Options

Specify the training options:

  • Train for 1500 iterations

  • Stop training early when the norm of the gradients or steps are smaller than 0.00001.

  • Use the default options for the L-BFGS solver state.

maxIterations = 1500;
gradientTolerance = 1e-5;
stepTolerance = 1e-5;
solverState = lbfgsState;

Train Neural Network

Convert the training data to dlarray objects. Specify that the inputs X and T have format "BC" (batch, channel) and that the initial conditions have format "CB" (channel, batch).

X = dlarray(dataX,"BC");
T = dlarray(dataT,"BC");
X0 = dlarray(X0,"CB");
T0 = dlarray(T0,"CB");
U0 = dlarray(U0,"CB");

Accelerate the loss function using dlaccelerate.

accfun = dlaccelerate(@modelLoss);

Create a function handle containing the loss function for the L-BFGS update. In order to evaluate the dlgradient function inside the modelLoss function using automatic differentiation, use the dlfeval function.

lossFcn = @(net) dlfeval(accfun,net,X,T,X0,T0,U0);

Initialize the TrainingProgressMonitor object. At each iteration, plot the loss and monitor the norm of the gradients and steps. Because the timer starts when you create the monitor object, make sure that you create the object close to the training loop.

monitor = trainingProgressMonitor( ...
    Metrics="TrainingLoss", ...
    Info=["Iteration" "GradientsNorm" "StepNorm"], ...

Train the network using a custom training loop using the full data set at each iteration. For each iteration:

  • Update the network learnable parameters and solver state using the lbfgsupdate function.

  • Update the training progress monitor.

  • Stop training early if the norm of the gradients are less the specified tolerances or if the line search fails.

iteration = 0;
while iteration < maxIterations && ~monitor.Stop
    iteration = iteration + 1;
    [net, solverState] = lbfgsupdate(net,lossFcn,solverState);

    updateInfo(monitor, ...
        Iteration=iteration, ...
        GradientsNorm=solverState.GradientsNorm, ...


    monitor.Progress = 100 * iteration/maxIterations;

    if solverState.GradientsNorm < gradientTolerance || ...
            solverState.StepNorm < stepTolerance || ...
            solverState.LineSearchStatus == "failed"


Evaluate Model Accuracy

For values of t at 0.25, 0.5, 0.75, and 1, compare the predicted values of the deep learning model with the true solutions of the Burger's equation using the relative l2 error.

Set the target times to test the model at. For each time, calculate the solution at 1001 equally spaced points in the range [-1,1].

tTest = [0.25 0.5 0.75 1];
numPredictions = 1001;
XTest = linspace(-1,1,numPredictions);
XTest = dlarray(XTest,"CB");

Test the model.


for i=1:numel(tTest)
    t = tTest(i);
    TTest = t*ones(1,numPredictions);
    TTest = dlarray(TTest,"CB");

    % Make predictions.
    XTTest = cat(1,XTest,TTest);
    UPred = forward(net,XTTest);

    % Calculate target.
    UTest = solveBurgers(extractdata(XTest),t,0.01/pi);

    % Calculate error.
    UPred = extractdata(UPred);
    err = norm(UPred - UTest) / norm(UTest);

    % Plot prediction.
    ylim([-1.1, 1.1])

    % Plot target.
    hold on
    plot(XTest, UTest,"--",LineWidth=2)
    hold off

    title("t = " + t + ", Error = " + gather(err));

legend(["Prediction" "Target"])

Supporting Functions

Solve Burger's Equation Function

The solveBurgers function returns the true solution of Burger's equation at times t as outlined in [3].

function U = solveBurgers(X,t,nu)

% Define functions.
f = @(y) exp(-cos(pi*y)/(2*pi*nu));
g = @(y) exp(-(y.^2)/(4*nu*t));

% Initialize solutions.
U = zeros(size(X));

% Loop over x values.
for i = 1:numel(X)
    x = X(i);

    % Calculate the solutions using the integral function. The boundary
    % conditions in x = -1 and x = 1 are known, so leave 0 as they are
    % given by initialization of U.
    if abs(x) ~= 1
        fun = @(eta) sin(pi*(x-eta)) .* f(x-eta) .* g(eta);
        uxt = -integral(fun,-inf,inf);
        fun = @(eta) f(x-eta) .* g(eta);
        U(i) = uxt / integral(fun,-inf,inf);


Model Loss Function

The model is trained by enforcing that given an input (x,t) the output of the network u(x,t) fulfills the Burger's equation, the boundary conditions, and the initial condition. In particular, two quantities contribute to the loss to be minimized:


where MSEf=1Nfi=1Nf|f(xfi,tfi)|2 and MSEu=1Nui=1Nu|u(xui,tui)-ui|2.

Here, {xui,tui}i=1Nu correspond to collocation points on the boundary of the computational domain and account for both boundary and initial condition. {xfi,tfi}i=1Nf are points in the interior of the domain.

Calculating MSEf requires the derivatives ut,ux,2ux2 of the output u of the model.

The function modelLoss takes as input, the network net, the network inputs X and T, the initial and boundary conditions X0, T0, and U0, and returns the loss and the gradients of the loss with respect to the learnable parameters.

function [loss,gradients] = modelLoss(net,X,T,X0,T0,U0)

% Make predictions with the initial conditions.
XT = cat(1,X,T);
U = forward(net,XT);

% Calculate derivatives with respect to X and T.
gradientsU = dlgradient(sum(U,"all"),{X,T},EnableHigherDerivatives=true);
Ux = gradientsU{1};
Ut = gradientsU{2};

% Calculate second-order derivatives with respect to X.
Uxx = dlgradient(sum(Ux,"all"),X,EnableHigherDerivatives=true);

% Calculate mseF. Enforce Burger's equation.
f = Ut + U.*Ux - (0.01./pi).*Uxx;
zeroTarget = zeros(size(f),"like",f);
mseF = l2loss(f,zeroTarget);

% Calculate mseU. Enforce initial and boundary conditions.
XT0 = cat(1,X0,T0);
U0Pred = forward(net,XT0);
mseU = l2loss(U0Pred,U0);

% Calculated loss to be minimized by combining errors.
loss = mseF + mseU;

% Calculate gradients with respect to the learnable parameters.
gradients = dlgradient(loss,net.Learnables);



  1. Liu, Dong C., and Jorge Nocedal. "On the limited memory BFGS method for large scale optimization." Mathematical programming 45, no. 1 (1989): 503-528.

  2. Maziar Raissi, Paris Perdikaris, and George Em Karniadakis, Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations

  3. C. Basdevant, M. Deville, P. Haldenwang, J. Lacroix, J. Ouazzani, R. Peyret, P. Orlandi, A. Patera, Spectral and finite difference solutions of the Burgers equation, Computers & fluids 14 (1986) 23–41.

See Also

| |

Related Topics