Checking Validity of Gradients or Jacobians
Optimization solvers often compute more quickly and reliably when you provide first
derivatives of objective and nonlinear constraint functions. However, you can easily
make a mistake in programming these derivatives. To guard against incorrect derivatives,
check the output of the programmed derivative against a finite-difference approximation
by using the checkGradients function.
Check Gradient of Objective Function
Consider the objective Rastrigin's function
The following function computes Rastrigin's function and its gradient correctly.
function [f,g] = ras(x) f = 20 + x(1)^2 + x(2)^2 - 10*cos(2*pi*x(1) + 2*pi*x(2)); if nargout > 1 g(2) = 2*x(2) + 20*pi*sin(2*pi*x(1) + 2*pi*x(2)); g(1) = 2*x(1) + 20*pi*sin(2*pi*x(1) + 2*pi*x(2)); end end
Check that the ras function correctly computes the gradient
near a random initial point. To see details of the calculation, set the
Display name-value parameter to
"on".
rng default x0 = randn(1,2); valid = checkGradients(@ras,x0,Display="on")
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 7.32446e-08. checkGradients successfully passed. ____________________________________________________________ valid = logical 1
The badras function computes the gradient incorrectly.
function [f,g] = badras(x) f = 20 + x(1)^2 + x(2)^2 - 10*cos(2*pi*x(1) + 2*pi*x(2)); if nargout > 1 g(2) = 2*x(2) + 40*pi*sin(2*pi*x(1) + 2*pi*x(2)); g(1) = 2*x(1) + 40*pi*sin(2*pi*x(1) + 2*pi*x(2)); end end
checkGradients correctly reports that the
badras function computes gradients incorrectly.
valid = checkGradients(@badras,x0,Display="on")____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 0.494224. Supplied derivative element (1,1): 92.9841 Finite-difference derivative element (1,1): 47.0291 checkGradients failed. Supplied derivative and finite-difference approximation are not within 'Tolerance' (1e-06). ____________________________________________________________ valid = logical 0
Use the correct gradient to find a local minimum of
ras(x).
rng default x0 = randn(1,2); options = optimoptions("fminunc",SpecifyObjectiveGradient=true); [x,fval,exitflag,output] = fminunc(@ras,x0,options)
Local minimum found.
Optimization completed because the size of the gradient is less than
the value of the optimality tolerance.
x =
0.9975 0.9975
fval =
11.9949
exitflag =
1
output =
struct with fields:
iterations: 9
funcCount: 13
stepsize: 5.7421e-05
lssteplength: 1
firstorderopt: 1.2426e-05
algorithm: 'quasi-newton'
message: 'Local minimum found.↵↵Optimization completed because the size of the gradient is less than↵the value of the optimality tolerance.↵↵<stopping criteria details>↵↵Optimization completed: The first-order optimality measure, 2.482746e-07, is less ↵than options.OptimalityTolerance = 1.000000e-06.'The solver takes 9 iterations and only 13 function evaluations to reach a local minimum. Without the gradient, the solver takes more function evaluations.
[x,fval,exitflag,output] = fminunc(@ras,x0) % No optionsLocal minimum found.
...
output =
struct with fields:
iterations: 9
funcCount: 39
...This time, the solver takes 39 function evaluations to reach essentially the same solution as before.
Check Jacobian of Vector Objective Function
The lsqnonlin, lsqcurvefit, and
fsolve functions use vector-valued objective functions.
(The objective function that these solvers try to minimize is the sum of squares of
the vector-valued objective.) A Jacobian is the first
derivative of a vector-valued objective function. The Jacobian J
of a function F(x) with m components, where
the variable x has n components, is an
m-by-n matrix. J(i,j)
is the partial derivative of Fi(x) with
respect to xj.
For example, the vecfun code calculates the gradient for a
function of four parameters (a, b,
c, and d in the code) with a number of
variables that depends on the input data t. The
t vector corresponds to a set of times, and each time leads
to two entries in the objective function F(x,t).
function [F,J] = vecfun(x,t) t = t(:); % Reshape t to a column vector a = x(1); b = x(2); c = x(3); d = x(4); nt = length(t); F = [a*exp(-b*t) + c,... c*exp(-d*t)]; % numel(F) = 2*nt if nargout > 1 J = zeros(2*nt,4); J(1:nt,1) = exp(-b*t); % First nt components corresponding to a*exp(-b*t) + c J(1:nt,2) = (-t.*a.*exp(-b*t)); J(1:nt,3) = ones(nt,1); J(nt+1:2*nt,3) = exp(-d*t); % Second nt components corresponding to c*exp(-d*t) J(nt+1:2*nt,4) = (-t.*c.*exp(-d*t)); end end
Validate the gradient calculation in vecfun.
t = [-1/2 1/2 1]; % Three times x = [1 2 3 4]; % Arbitrary x point valid = checkGradients(@(x)vecfun(x,t),x,Display="on")
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1.51598e-08. checkGradients successfully passed. ____________________________________________________________ valid = logical 1
Create artificial response data corresponding to the vecfun
function at a set of times. Use the Jacobian to help fit parameters of the function
to the data.
t = linspace(1,5); x0 = [1 2 3 4]; rng default x01 = rand(1,4); y = vecfun(x0,t); y = y + 0.1*randn(size(y)); % Add noise to response options = optimoptions("lsqcurvefit",SpecifyObjectiveGradient=true); [x,resnorm,~,exitflag,output] = lsqcurvefit(@vecfun,x01,t,y,[],[],options)
Local minimum possible.
lsqcurvefit stopped because the final change in the sum of squares relative to
its initial value is less than the value of the function tolerance.
x =
0.9575 1.4570 2.9894 3.7289
resnorm =
2.2076
exitflag =
3
output =
struct with fields:
firstorderopt: 1.0824e-04
iterations: 11
funcCount: 12
cgiterations: 0
algorithm: 'trust-region-reflective'
stepsize: 0.0111
message: 'Local minimum possible.↵↵lsqcurvefit stopped because the final change in the sum of squares relative to ↵its initial value is less than the value of the function tolerance.↵↵<stopping criteria details>↵↵Optimization stopped because the relative sum of squares (r) is changing↵by less than options.FunctionTolerance = 1.000000e-06.'
bestfeasible: []
constrviolation: []The returned solution vector, [0.9575,1.4570,2.9894,3.7289], is
close to the original vector [1,2,3,4]. The solver takes just 12
function evaluations. Without the Jacobian, the solver takes more
evaluations.
[x,resnorm,~,exitflag,output] = lsqcurvefit(@vecfun,x01,t,y) % No optionsLocal minimum possible.
...
output =
struct with fields:
firstorderopt: 1.0825e-04
iterations: 11
funcCount: 60
...This time, the solver takes 60 function evaluations to reach essentially the same solution as before.
Modify Finite-Difference Options and checkGradients Arguments
Sometimes checkGradients can give an incorrect result:
For a correct gradient,
checkGradientscan give a false report of an invalid check. Typically, this occurs because the function has a relatively large second derivative, so the finite-difference estimate is inaccurate. The false report can also occur because the value of theTolerancename-value argument is too small.For an incorrect gradient,
checkGradientscan give a false report of a valid check. Typically, this false report occurs because the value of theTolerancename-value argument is too large, or because the results contain a coincidentally matching derivative.
To check the gradient more carefully when you get a false report of an invalid
check, change the finite differencing options. For example,
checkGradients incorrectly reports that the
ras gradient is incorrect starting from the point
[-2,4].
valid = checkGradients(@ras,[-2,4],Display="on")____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1.88497e-06. Supplied derivative element (1,2): 6.21515 Finite-difference derivative element (1,2): 6.21516 checkGradients failed. Supplied derivative and finite-difference approximation are not within 'Tolerance' (1e-06). ____________________________________________________________ valid = logical 0
Set the FiniteDifferenceType option to
"central" and test again.
opts = optimoptions("fmincon",FiniteDifferenceType="central"); valid = checkGradients(@ras,[-2,4],opts,Display="on")
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1.10182e-09. checkGradients successfully passed. ____________________________________________________________ valid = logical 1
Central finite differences are typically more accurate. In this case, the relative difference between the computed gradient and the central finite-difference estimate is about 1e-9. The default forward finite difference gives a relative difference of about 2e-6.
Check the validity of the gradient by loosening the tolerance from the default 1e-6 to 1e-5.
valid = checkGradients(@ras,[-2,4],Tolerance=1e-5,Display="on")____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1.88497e-06. checkGradients successfully passed. ____________________________________________________________ valid = logical 1
In this case, checkGradients correctly reports that the
gradient is valid. The looser tolerance does not cause the badras
function to pass the check.
valid = checkGradients(@badras,[-2,4],Tolerance=1e-5,Display="on")____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 0.400823. Supplied derivative element (1,2): 4.4368 Finite-difference derivative element (1,2): 6.21516 checkGradients failed. Supplied derivative and finite-difference approximation are not within 'Tolerance' (1e-05). ____________________________________________________________ valid = logical 0
Check Derivatives of Nonlinear Constraints
The unitdisk2 function correctly calculates a constraint
function and its gradient to keep x inside the disk of radius
1.
function [c,ceq,gc,gceq] = unitdisk2(x) c = x(1)^2 + x(2)^2 - 1; ceq = [ ]; if nargout > 2 gc = [2*x(1);2*x(2)]; gceq = []; end end
The unitdiskb function incorrectly calculates the gradient of
the constraint function.
function [c,ceq,gc,gceq] = unitdiskb(x) c = x(1)^2 + x(2)^2 - 1; ceq = [ ]; if nargout > 2 gc = [x(1);x(2)]; % Gradient incorrect: off by a factor of 2 gceq = []; end end
Set the IsConstraint name-value argument to
true and check the gradient at the point x0 =
randn(1,2).
rng default x0 = randn(1,2); valid = checkGradients(@unitdisk2,x0,IsConstraint=true,Display="on")
____________________________________________________________ Nonlinear inequality constraint derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1.53459e-08. checkGradients successfully passed. ____________________________________________________________ valid = 1×2 logical array 1 1
checkGradients correctly reports that the gradient of
unitdisk2 is valid.
valid = checkGradients(@unitdiskb,x0,IsConstraint=true,Display="on")____________________________________________________________ Nonlinear inequality constraint derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1. Supplied derivative element (2,1): 1.8324 Finite-difference derivative element (2,1): 3.66479 checkGradients failed. Supplied derivative and finite-difference approximation are not within 'Tolerance' (1e-06). ____________________________________________________________ valid = 1×2 logical array 0 1
checkGradients correctly reports that the gradient of
unitdiskb is not valid.
Check Gradients in Script
You can use the checkGradients function to stop a script when
a finite-difference approximation does not match the corresponding gradient
function.
rng default x0 = randn(1,2); opts = optimoptions("fmincon",FiniteDifferenceType="central"); assert(checkGradients(@ras,x0,opts,Display="on")) assert(all(checkGradients(@unitdisk2,x0,opts,... IsConstraint=true,Display="on"))) [x,fval] = fmincon(@ras,x0,[],[],[],[],[],[],@unitdisk2)
____________________________________________________________
Objective function derivatives:
Maximum relative difference between supplied
and finite-difference derivatives = 7.52301e-10.
checkGradients successfully passed.
____________________________________________________________
____________________________________________________________
Nonlinear inequality constraint derivatives:
Maximum relative difference between supplied
and finite-difference derivatives = 1.44672e-11.
checkGradients successfully passed.
____________________________________________________________
Local minimum found that satisfies the constraints.
Optimization completed because the objective function is non-decreasing in
feasible directions, to within the value of the optimality tolerance,
and constraints are satisfied to within the value of the constraint tolerance.
x =
0.4987 0.4987
fval =
10.4987Run the same script with the incorrect unitdiskb constraint
function. Note that the script stops early.
rng default x0 = randn(1,2); opts = optimoptions("fmincon",FiniteDifferenceType="central"); assert(checkGradients(@ras,x0,opts,Display="on")) assert(all(checkGradients(@unitdiskb,x0,opts,... IsConstraint=true,Display="on"))) [x,fval] = fmincon(@ras,x0,[],[],[],[],[],[],@unitdiskb)
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 7.52301e-10. checkGradients successfully passed. ____________________________________________________________ ____________________________________________________________ Nonlinear inequality constraint derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1. Supplied derivative element (2,1): 1.8324 Finite-difference derivative element (2,1): 3.66479 checkGradients failed. Supplied derivative and finite-difference approximation are not within 'Tolerance' (1e-06). ____________________________________________________________ Error using assert Assertion failed.