How can I get splines subject to constraints?

Hello. I am trying to use splines in order to get the probability density function from a histogram, something like what can be seen in the picture:
I have to include some constraints, such as:
(Where is the mean of the observed data)
The code that I tryed (with the help of the gpt guy) is completely useless, nonetheless I attach it here:
[counts, edges] = histcounts(data, bins, 'Normalization', 'pdf');
Unrecognized function or variable 'data'.
bin_centers = (edges(1:end-1) + edges(2:end)) / 2;
x_min = min(data)
x_max = max(data)
% mitjana de les dades
x_mean = mean(data)
% Constraints
syms f(x) [1 num_bins] real
assume(f >= 0) % Constraint 2
eq1 = int(f, x_min, x_max) == 1
eq3 = int(x*f, x_min, x_max) == x_mean
eqs = [eq1; eq3];
vars = f;
sol = vpasolve(eqs, vars)
f_opt = double(subs(sol, bin_centers));
One of the problems is the following:
Error using symfun/assume
Assumptions on symbolic functions not supported. Make assumptions on symbolic variables and expressions instead.
Also, the spline function is nowhere to be seen. Can anyone give me a clue?
Thank you very much!

9 Kommentare

Torsten
Torsten am 25 Feb. 2025
Bearbeitet: Torsten am 25 Feb. 2025
Is your probability density function discrete (i.e. do you only get values 25, 75, 125, 175, ..., 675 from your random variable) ? Or are values "in-between" also possible ?
Matt J
Matt J am 25 Feb. 2025
Why not just use ksdensity?
Torsten
Torsten am 25 Feb. 2025
Bearbeitet: Torsten am 25 Feb. 2025
If the distribution is discrete, approximating with a continuous pdf doesn't make sense in my opinion. That's why I asked about the possible outcomes of the random experiment.
Carles
Carles am 26 Feb. 2025
@Torsten My probability density function (pdf) is continuous. I sorted the data in a histogram, with bins with "amplitude" equal to 50. The values 25, 75, 125, ..., 675 are just the mean values of the bin.
The idea is to get a pdf that I can use for doing Monte Carlo simulations later.
Carles
Carles am 26 Feb. 2025
@Matt J The ksdensity function works fine, but when I plot the cumulative distribution function, I get the following:
ksdensity(data, 'Function','cdf','NumPoints',200)
As you can see, the probability that , and I want to have a pdf that fulfill the following constrain:
Therefore, I have the same problem as before: I need to set the constrains.
Torsten
Torsten am 26 Feb. 2025
The idea is to get a pdf that I can use for doing Monte Carlo simulations later.
To perform a Monte-Carlo simulation, you need an approximate cdf, not pdf.
Carles
Carles am 26 Feb. 2025
@Torsten Yes, you are right, but I tought that it was more appropiate to find first the pdf and then the cdf. How can I get the cdf with splines?
Torsten
Torsten am 26 Feb. 2025
Bearbeitet: Torsten am 26 Feb. 2025
I gave a link to appropriate code below. Spline approximations are not useful in this case.
Carles
Carles am 26 Feb. 2025
@Torsten Sorry, I didn't see your answer before. It is not exactly what I was looking for, but I guess I'll make it work. Thank you very much!

Melden Sie sich an, um zu kommentieren.

 Akzeptierte Antwort

Torsten
Torsten am 26 Feb. 2025
Bearbeitet: Torsten am 26 Feb. 2025
This code looks helpful for your purpose:
% Generate 1000 normal random numbers
mu = 0; sigma = 1; nr = 1000;
givenDist = mu + sigma * randn(nr,1);
generatedDist = emprand(givenDist,nr,1);
% %
% % % Plot histogram to check given and generated distribution
[n,xout] = hist(givenDist);
hist(givenDist);
hold on
hist(generatedDist,xout)
% %
% Plot cdf to check given and generated distribution
figure
x = sort(givenDist(:)); % Given distribution
p = 1:length(x);
p = p./length(x);
plot(x,p,'color','r');
hold on
%
xr = sort(generatedDist(:)); % Generated distribution
pr = 1:length(xr);
pr = pr./length(xr);
%
plot(xr,pr,'color','b');
xlabel('x')
ylabel('cdf')
legend('Given Dist.','Generated Dist.')
title('1000 random numbers generated from given normal distribution of data');
function xr = emprand(dist,varargin)
%EMPRAND Generates random numbers from empirical distribution of data.
% This is useful when you do not know the distribution type (i.e. normal or
% uniform), but you have the data and you want to generate random
% numbers form the data. The idea is to first construct cumulative distribution
% function (cdf) from the given data. Then generate uniform random number and
% interpolate from cdf.
%
% USAGE:
% xr = EMPRAND(dist) - one random number
% xr = EMPRAND(dist,m) - m-by-m random numbers
% xr = EMPRAND(dist,m,n) - m-by-n random numbers
%
% INPUT:
% dist - vector of distribution i.e. data values
% m - generates m-by-m matrix of random numbers
% n - generates m-by-n matrix of random numbers
%
% OUTPUT:
% xr - generated random numbers
%
% EXAMPLES:
% % Generate 1000 normal random numbers
% mu = 0; sigma = 1; nr = 1000;
% givenDist = mu + sigma * randn(nr,1);
% generatedDist = emprand(givenDist,nr,1);
% %
% % % Plot histogram to check given and generated distribution
% [n,xout] = hist(givenDist);
% hist(givenDist);
% hold on
% hist(generatedDist,xout)
% %
% Plot cdf to check given and generated distribution
% figure
% x = sort(givenDist(:)); % Given distribution
% p = 1:length(x);
% p = p./length(x);
% plot(x,p,'color','r');
% hold on
%
% xr = sort(generatedDist(:)); % Generated distribution
% pr = 1:length(xr);
% pr = pr./length(xr);
%
% plot(xr,pr,'color','b');
% xlabel('x')
% ylabel('cdf')
% legend('Given Dist.','Generated Dist.')
% title('1000 random numbers generated from given normal distribution of data');
%
% HISTORY:
% version 1.0.0, Release 05-Jul-2005: Initial release
% version 1.1.0, Release 16-Oct-2007: Some bug fixes and improvement of help text
% 1. Can handle NaN values in dist
% 2. Extraplolate for out of range
% 3. Calling function EMPCDF is included within this function
%
% See also:
% Author: Durga Lal Shrestha
% UNESCO-IHE Institute for Water Education, Delft, The Netherlands
% eMail: durgals@hotmail.com
% Website: http://www.hi.ihe.nl/durgalal/index.htm
% Copyright 2004-2007 Durga Lal Shrestha.
% $First created: 05-Jul-2005
% $Revision: 1.1.0 $ $Date: 16-Oct-2007 21:47:47 $
% ***********************************************************************
%% INPUT ARGUMENTS CHECK
error(nargchk(1,3,nargin));
if ~isvector(dist)
error('Invalid data size: input data must be vector')
end
if nargin == 2
m = varargin{1};
n = m;
elseif nargin == 3
m = varargin{1};
n = varargin{2};
else
m = 1;
n = 1;
end
%% COMPUTATION
x = dist(:);
% Remove missing observations indicated by NaN's.
t = ~isnan(x);
x = x(t);
% Compute empirical cumulative distribution function (cdf)
xlen = length(x);
x = sort(x);
p = 1:xlen;
p = p./xlen;
% Generate uniform random number between 0 and 1
ur = rand(m,n);
% Interpolate ur from empirical cdf and extraplolate for out of range
% values.
xr = interp1(p,x,ur,[],'extrap');
end

Weitere Antworten (1)

You are trying to get vpasolve() to invent functions that satisfy certain purposes, trying various forms of f until it satisfies constraints on integrals.
However, solve() and vpasolve() never invent functions. The only routine that invents functions is dsolve()
syms f(x) [1 num_bins] real
It is not permitted to set assumptions on abstract functions.
assume(f >= 0) % Constraint 2
It is not permitted to set assumptions on abstract functions.
For that matter, if you had had something like
syms x
f(x) = x^3 - x^2 + 1
f(x) = 
then it would be permitted to set assumptions on the function turned into an expression,
assume(f(x) > 0)
assumptions
ans = 
but it would not be permitted to set assumptions on the symbolic function form
assume(f > 0)
Error using symfun/assume (line 12)
Assumptions on symbolic functions not supported. Make assumptions on symbolic variables and expressions instead.

1 Kommentar

syms f(x) [1 num_bins] real
That attempts to define f(x) as a single function that returns an array of length num_bins and which is real-valued.
eq1 = int(f, x_min, x_max) == 1
eq3 = int(x*f, x_min, x_max) == x_mean
The integral of a function returning a vector of length num_bins is a vector of length num_bins, so eq1 would be a vector of num_bins equations; likewise eq3 would be a vector of num_bins equations.
eqs = [eq1; eq3];
You [;] between two vectors 1 x num_bins, so eqns would be 2 x num_bins of equations.
vars = f;
If that were valid, it would be a function that returned a vector 1 x num_bins
sol = vpasolve(eqs, vars)
and there you would be trying to vpasolve() 2 x num_bins equations in a vector of 1 x num_bins pseudo-variables.
When vpasolve() works, you either need to specify no variables (in which case it solves over all of the symvar() variables), or else you need to specify the explicit list of all variables. (Specifying an explicit list of all variables is useful for the case where you provide initial values, as the initial value list must either be numvars x 1 or else numvars x 2 and you would want to be sure that each initial value is associated with the correct variable.)

Melden Sie sich an, um zu kommentieren.

Gefragt:

am 25 Feb. 2025

Kommentiert:

am 26 Feb. 2025

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by