Optimization by adding a penalization term

Heborvi on 3 Oct 2016
Commented: Matt J on 4 Oct 2016

Hello, I wrote the following function, which minimizes an objective function by adding a penalization term:

function [theta_opt, fval] = optimization_P0_RMDF_Penalise_without_laguerreValue(X, D, a, eta, sigma)
[n, p] = size(X);

    function f = objectif_function(theta)
        % Smooth data-fit part of the objective
        fun = 0;
        lambda = zeros(n, n);
        for i = 1:n
            for j = i+1:n
                % Squared distance between the displaced points i and j
                S = 0;
                for k = 1:p
                    S = S + (X(i,k) + theta(i,k) - X(j,k) - theta(j,k))^2;
                end
                lambda(i,j) = S;
                fun = fun + D(i,j)^2 + 2*sigma^2*a^2*p + a^2*lambda(i,j) ...
                    - 2*sqrt(pi)*a*sigma*D(i,j)*laguerreL(1/2, p/2, -(lambda(i,j)/(4*sigma^2)));
            end
        end
        %%%%%%%% Penalization term %%%%%%%%%%%%
        % Count the nonzero entries of theta (an l0-type penalty)
        nb_disp = 0;
        for i = 1:n
            for k = 1:p
                if theta(i,k) ~= 0
                    nb_disp = nb_disp + 1;
                end
            end
        end
        f = fun + eta*nb_disp;
    end

theta0 = zeros(n, p);
options = optimset('Display', 'iter', 'Algorithm', 'active-set');
[theta_opt, fval] = fminunc(@objectif_function, theta0, options);
end

My problem is that the optimization is carried out without taking the penalization term into account. The result of the optimization for any value of eta is the same as the one obtained with eta = 0, which means that the penalization term is not considered during the optimization.

Can someone help me fix this problem?

Thanks in advance,

Heborvi

Answers (3)

Matt J on 3 Oct 2016
Your penalty term is a discrete function of theta, and hence not differentiable. This violates the assumptions of fminunc. Moreover, because the penalty function is piecewise flat, it has zero gradient almost everywhere, which would likely explain why it is hard to get it to move for small values of eta.
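For intuition, a small sketch (not from the original answer; the sizes and penalty weight are arbitrary) showing that the finite-difference gradient fminunc would form for this counting penalty is zero around almost every theta:

% Sketch: the counting penalty eta*nnz(theta) has a zero finite-difference
% gradient at almost every theta, so a derivative-based solver sees no
% incentive to change theta because of it.
eta   = 10;                     % any penalty weight
theta = randn(5, 3);            % a generic (almost surely all-nonzero) point
pen   = @(t) eta*nnz(t);        % l0-style penalty, as in the question

h = 1e-6;                       % finite-difference step
g = zeros(size(theta));
for idx = 1:numel(theta)
    tp      = theta;
    tp(idx) = tp(idx) + h;      % perturbing a nonzero entry leaves nnz unchanged
    g(idx)  = (pen(tp) - pen(theta)) / h;
end
disp(g)                         % all zeros: the penalty is locally flat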
  2 Comments
Heborvi on 3 Oct 2016
Edited: Heborvi on 3 Oct 2016
Thank you for your answer.
So do I need to choose a large value of eta? I have tested different values of eta and I still obtain the same result!
Matt J on 3 Oct 2016
Edited: Matt J on 3 Oct 2016
Because the gradient and Hessian of the penalty are zero almost everywhere, the search directions calculated by fminunc from function derivatives are, almost everywhere, the same as when eta = 0.



John D'Errico on 3 Oct 2016
Edited: John D'Errico on 3 Oct 2016
A major part of your problem is that you have created a discontinuous objective by adding discrete integer amounts to the objective as a penalty. Then you want to use fminunc to optimize the problem.
You clearly do not appreciate that fminunc REQUIRES a differentiable objective. That you call it a penalty term, as opposed to part of the objective, is irrelevant. fminunc sees a discontinuous objective function. That will cause it to do unpredictable things.
In fact, this is a bad thing to do to virtually ANY optimizer, so I am not sure who suggested the idea to you, or where you got it from.
My guess is that your penalty, if it is your goal that this term be close to zero, should be a simple function of the distance away from the goal. Quadratic penalties might make sense to you, maybe even exponential in some form. I can't say. Note that a LINEAR penalty function would again be wrong, since then your objective is again non-differentiable.
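As an illustration of that suggestion, a minimal sketch (the data term, sizes, and value of eta below are placeholders, not the asker's model) in which a smooth quadratic penalty replaces the nonzero count, so that fminunc can actually respond to eta:

% Sketch: eta*sum(theta(:).^2) is differentiable everywhere, unlike eta*nnz(theta).
% The data term is a hypothetical stand-in for the smooth part of objectif_function.
eta      = 0.5;
dataTerm = @(theta) sum((theta(:) - 1).^2);          % placeholder data-fit term
obj      = @(theta) dataTerm(theta) + eta*sum(theta(:).^2);

theta0   = zeros(4, 2);
opts     = optimoptions('fminunc', 'Display', 'off');
thetaHat = fminunc(obj, theta0, opts);
% With eta = 0 the minimizer is all ones; increasing eta shrinks it toward zero,
% i.e. the penalty now influences the solution (though it shrinks theta rather
% than making it exactly sparse).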
  2 Comments
Heborvi on 3 Oct 2016
Edited: Heborvi on 3 Oct 2016
Thank you for your answer.
In fact, my penalty term is related to the parsimonious choice of the vectors theta, so I use the l0-norm for this. So, in my case, I can't use fminunc for my optimization?
John D'Errico on 3 Oct 2016
Fminunc is absolutely out of the question. And since a smoothly increasing penalty is not an option, you cannot really use any optimizer that assumes differentiability.
I think you need to be looking for a mixed integer programming tool, that can handle nonlinear objectives.



Matt J on 4 Oct 2016
Edited: Matt J on 4 Oct 2016
In fact my penalty term is related to the parsimonious choice of vectors theta, therefore I use the l0-norm to do this.
Often people compromise by using the l1-norm instead. This can be formulated differentiably by minimizing
min. fun(theta)+eta*sum(r)
s.t. r(i) >= theta(i) ,
r(i) >= -theta(i)
where we have introduced additional unknown variables, r(i). This will require fmincon, as opposed to fminunc.
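A minimal fmincon sketch of this reformulation (the data term, sizes, and variable names below are placeholders for illustration, not the asker's model):

% Sketch of the l1 reformulation: unknowns packed as z = [theta(:); r(:)],
% with linear constraints enforcing r(i) >= |theta(i)|.
n = 4; p = 2; eta = 0.5;
m = n*p;                                    % number of theta entries
dataTerm = @(theta) sum((theta(:) - 1).^2); % hypothetical smooth data-fit term

obj = @(z) dataTerm(reshape(z(1:m), n, p)) + eta*sum(z(m+1:end));

% Linear constraints:  theta - r <= 0  and  -theta - r <= 0
A = [ eye(m), -eye(m);
     -eye(m), -eye(m)];
b = zeros(2*m, 1);

z0   = zeros(2*m, 1);
opts = optimoptions('fmincon', 'Display', 'off');
zHat = fmincon(obj, z0, A, b, [], [], [], [], [], opts);

theta_opt = reshape(zHat(1:m), n, p);       % shrunken/sparse-ish estimate
r_opt     = zHat(m+1:end);                  % equals abs(theta_opt) at a solution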
  6 Comments
Heborvi on 4 Oct 2016
And is the sparsity of theta taken into account by this formulation?
Matt J on 4 Oct 2016
Insofar as norm(theta,1) is an approximation of norm(theta,0), yes.

