"diff" function doesn't work properly with small numbers

129 views (last 30 days)
Sylwester
Sylwester on 22 Dec 2025 at 14:10
Commented: Fangjun Jiang 2 minutes ago
For some reason, when the difference between element n and element n+1 is too small, the diff function returns 0.
There are roughly 290 data points on the plot, and the precision of the values is 10^(-10). As far as I know, MATLAB works with 16 or 32 digits, so this shouldn't be a problem.
Technically there should be no constant sections on the plot, just increases and decreases in value.
Pomiary=cisnienie300920151701average300
Czas = Pomiary{:, 4};
Temperatura = Pomiary{:, 5};
CzasDMY= Czas / 86400 + datenum(1970, 1, 1);
y = Temperatura;
x = CzasDMY;
ydiff=diff(y,1);
wieksze = (ydiff > 0);
mniejsze = (ydiff < 0);
gora = y;
dol = y;
gora(~wieksze) = NaN;
dol(~mniejsze) = NaN;
plot(x,y,'b',x, gora, 'r', x, dol, 'g');
grid on;
xlim tight;
xlim("auto");
ylim("auto");
legend("Constant", "Increasing", "Decreasing");
legend("Position", [0.15754,0.1468,0.20438,0.12165]);
8 Comments
dpb
dpb about 1 hour ago
Edited: dpb 21 minutes ago
whos -file x
  Name        Size            Bytes  Class     Attributes
  x         288x1              2304  double
whos -file y
  Name        Size            Bytes  Class     Attributes
  y         288x1              2304  double
d=dir('cisni*.mat');
whos('-file',d.name)
  Name                                 Size            Bytes  Class    Attributes
  cisnienie300920151701average300      -               16275  table
load x
load y
X=[x y];
fprintf('%.12f %.12f\n',X(1:10,:).')
736236.625000000000 1022.575847900000
736236.628472222248 1022.556940100000
736236.631944444496 1022.566749900000
736236.635416666628 1022.576562000000
736236.638888888876 1022.592748100000
736236.642361111124 1022.587627700000
736236.645833333372 1022.544010600000
736236.649305555504 1022.502575900000
736236.652777777752 1022.478764300000
736236.656250000000 1022.475430000000
dy=diff(y);
iy=find(dy==0);
nnz(iy)
ans = 5
This shows there are 5 separate repeated instances in the y vector.
iy
iy = 5×1
    38
    91
   144
   198
   251
shows that there are never more than two repeated values in a row, in this data set at least, so the averaging technique in the earlier Answer would work to produce something with no zero differences, if that is the ultimate goal.
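Whether a data set contains runs of more than two identical values can also be checked directly rather than by reading the index list. The following is a minimal sketch on a made-up vector; the sample values and the names `runLen` and `longestRun` are illustrative assumptions, not part of the original data.

```matlab
% Hypothetical sample: one pair and one triple of repeated values
y = [1 2 2 3 3 3 4];
z = diff(y) == 0;                    % true where y(k+1) equals y(k)

% Pad with zeros so that runs touching either end are also detected
e = diff(double([0 z 0]));
starts = find(e == 1);               % first zero difference of each run
stops  = find(e == -1) - 1;          % last zero difference of each run
runLen = stops - starts + 1;         % zero differences per run
longestRun = max(runLen) + 1;        % identical samples in the longest run
```

A `longestRun` of 2 would correspond to the situation described here, where every repeated value still has a distinct neighbour on each side. If there are no repeats at all, `runLen` is empty and so is `longestRun`.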
Why this is significant, rather than just accepting the result as is, is so far unclear. But, as noted, the problem is not in diff() or machine precision; the data have been rounded such that there really are identical values.
fprintf('%.14f\n',y(iy(1)+[-1:2]))
1023.03861350000000
1023.01522350000005
1023.01522350000005
1022.96080629999994
plot(x(iy(1)+[-1:2]),y(iy(1)+[-1:2]),'*-')
This reproduces exactly the problem illustrated before: the data are identical to machine precision because the values were rounded to seven (7) decimal digits, and when read into memory from the input file containing those values, they were interpreted and stored identically. Ergo, the diff() between those consecutive positions is identically zero.
As my Answer over the same subset of the data shows, your only choices if you find this result unacceptable are to provide the input data with full precision, in the hope that the original values differ in later digits before the rounding, or, as illustrated there, to interpolate across the duplicated values to produce a different result for the second/repeated value so that a subsequent diff() is nonzero. The caveats noted there are still in play, of course.
The basic answer is that your data are, indeed, not changing at every point in either a positive or negative direction but are unchanging over at least two consecutive positions and diff() is just doing its job.
Fangjun Jiang
Fangjun Jiang 13 minutes ago
@dpb, @Sylwester, There is no problem with diff(). There is no problem with data accuracy or precision. It is a visual misconception.
First, as @dpb pointed out, in the whole set of 288 data points there are only 5 places where the data value is unchanged and thus regarded as a "Constant" trend.
@Sylwester had this idea: plot all the data in BLUE, plot all the "Increasing" trend data in RED, and plot all the "Decreasing" trend data in GREEN. Since RED and GREEN are going to overwrite BLUE, the resulting plot should show almost no BLUE sections, because only 5 of the 288 data points have a "Constant" trend.
But the resulting plot shows a lot of BLUE, so @Sylwester thought there was a problem in diff().
There is no problem with the diff() function, though. It is just a visual misconception, or rather, it is due to how the plot() function connects data points with a line when there are NaN values among them.
I changed only this line:
plot(x,y,'.',x, gora, 'r+', x, dol, 'g*');
and the resulting plot gives the correct visual impression (that there is almost no BLUE "Constant" data).
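The effect can be reproduced on a small made-up vector. A line without markers is drawn only between two consecutive non-NaN samples, so counting such pairs shows how much of the red curve can actually appear. This is a sketch with hypothetical data that reuses the variable names from the question; `nPoints` and `nSegments` are names introduced here for illustration.

```matlab
% Hypothetical data that alternates between rising and falling
y = [1 3 2 4 3 5 4];
ydiff = diff(y);
wieksze = (ydiff > 0);               % 1x6 logical, one shorter than y

gora = y;
gora(~wieksze) = NaN;                % only indices 1..6 can be affected

valid = ~isnan(gora);
nPoints   = nnz(valid);                            % samples kept in gora
nSegments = nnz(valid(1:end-1) & valid(2:end));    % drawable line pieces
```

Here four samples survive in `gora`, yet no two of them are adjacent, so a marker-free line style has nothing to draw and the blue curve underneath stays visible.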


Answers (3)

Fangjun Jiang
Fangjun Jiang on 22 Dec 2025 at 15:45
The data value and results make sense. There is no problem using diff() to process your data based on your example data.
%%
format long
y=[36 1023.08766260000
37 1023.03861350000
38 1023.01522350000
39 1023.01522350000
40 1022.96080630000]
y = 5×2
   1.0e+03 *
   0.036000000000000   1.023087662600000
   0.037000000000000   1.023038613500000
   0.038000000000000   1.023015223500000
   0.039000000000000   1.023015223500000
   0.040000000000000   1.022960806300000
ydiff=diff(y,1)
ydiff = 4×2
   1.000000000000000  -0.049049100000047
   1.000000000000000  -0.023389999999949
   1.000000000000000                   0
   1.000000000000000  -0.054417200000103
wieksze = (ydiff > 0)
wieksze = 4×2 logical array
   1   0
   1   0
   1   0
   1   0
mniejsze = (ydiff < 0)
mniejsze = 4×2 logical array
   0   1
   0   1
   0   0
   0   1
By default, MATLAB uses 64-bit (double-precision) floating-point numbers to represent numeric values.
Around the value 1023, the spacing between adjacent doubles is about 1e-13, which is sufficient to represent your data precision of 1e-10.
The problem you observed comes from your raw data. Note that y(3,2) and y(4,2) are exactly the same, as you can see by inspection.
eps(1023)
ans =
1.136868377216160e-13
Check the documentation for eps(). You will understand the issue better.
doc eps
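As a small check of that claim, the sketch below compares two perturbations against the spacing reported by eps(). The perturbation sizes 1e-10 and 1e-14 and the variable names are chosen here for illustration only.

```matlab
v = 1023.0152235;                    % a value taken from the sample data
spacing = eps(v);                    % distance from v to the next larger double

resolved = (v + 1e-10) ~= v;         % true: 1e-10 is far above the spacing
absorbed = (v + 1e-14) == v;         % true: 1e-14 is below half the spacing
```

So a genuine change of 1e-10 between two samples would survive in a double and give a nonzero diff(); a zero result means the two stored values really are the same.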
3 Comments
Fangjun Jiang
Fangjun Jiang about 2 hours ago
The length of diff() output is 1 smaller than its input length. Your code didn't seem to consider this.
diff(1:3)
ans = 1×2
1 1
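A minimal sketch of what the length mismatch means for logical indexing, using a made-up column vector. The names `g1`, `g2` and `rising` are hypothetical, and padding the mask with `false` is just one possible convention.

```matlab
y = [5; 7; 6; 6; 9];                 % hypothetical column vector
d = diff(y);                         % 4 elements for 5 samples

% Unpadded mask, as in the question: the last sample is never touched
g1 = y;
g1(~(d > 0)) = NaN;

% Padded mask: one flag per sample, the last one set to false
rising = [d > 0; false];
g2 = y;
g2(~rising) = NaN;
```

With the unpadded mask the final sample always keeps its value, whatever the trend; with the padded mask it is set to NaN because no difference exists for it.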
Fangjun Jiang
Fangjun Jiang 2 minutes ago
The length difference of 1 between the input and output of the diff() function is not an issue either in this case.
There is no issue with the diff() function or with data accuracy/precision. The OP has a visual misconception due to the way that the plot() function connects data points with color and line style when there are NaN data points in the plotted data (here gora and dol).



dpb
dpb about 19 hours ago
Edited: dpb about 3 hours ago
X=[
36 1023.08766260000
37 1023.03861350000
38 1023.01522350000
39 1023.01522350000
40 1022.96080630000];
dx=diff(X)
dx = 4×2
    1.0000   -0.0490
    1.0000   -0.0234
    1.0000         0
    1.0000   -0.0544
As hypothesized above, some of the temperature/pressure values are identical owing to the apparent rounding to seven (7) decimal digits.
You would have to have at least one more decimal place in the above between the 3rd and 4th data values in order for the difference to not be identically zero.
If you're transferring data from one place to another, to avoid this don't use text files but save the whole internal precision by using .mat files or binary formatted transfer if from some external source. Besides being able to retain full precision (note that precision does not necessarily imply accuracy), it's much more efficient in speed and memory/disk space.
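To illustrate the difference between the two transfer routes, here is a small sketch with two made-up values that differ only in the eighth decimal place. The values, the `%.7f` format and the temporary file name are assumptions for the example; the original export format is not known.

```matlab
% Hypothetical values that differ only in the 8th decimal place
v = [1023.01522351; 1023.01522354];

% Round trip through text with 7 decimals: the difference is lost
txt = sprintf('%.7f\n', v);
fromText = sscanf(txt, '%f');

% Round trip through a MAT-file: the doubles are stored exactly
f = [tempname '.mat'];
save(f, 'v');
s = load(f);
delete(f);

lostInText = diff(fromText) == 0;    % true
keptInMat  = isequal(s.v, v);        % true
```

After the text round trip both entries parse to the same double, which is the situation seen in the posted data.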
As for your comment above about the values, that "They are meant to be the same, The issue is that for some reason function for marking if value increased/decreased has holes in it and skips points unless difference is high enough", that makes no sense at all: the two values are identical, so how can there be any sense of the change that "increased/decreased" implies?
If you're trying to measure an overall change, then diff is entirely the wrong function, as it works on a pointwise basis and so will indeed notice any points for which the difference is actually zero.
Looking at your small subsample of data
plot(X(:,1),X(:,2),'*-')
indeed, there is an overall negative trend, but it isn't uniformly decreasing at every point, just overall. If you want indications of trends excluding such points, you'd have to do something like find the inflection points and then (say) the two points on either side and then use the adjusted temperature to compute the change.
Note that you would also have to locate any locations of more than two successive points being the same and then do something over those ranges. Also, in doing something like this you'll run into the issue that @Fangjun Jiang raised about the differenced vector being shorter than the original so the points are offset by one in the addressing.
For the simple example here
ix=find(dx(:,2)==0); % locate the zero difference
fprintf('%d %15.10f\n',X(ix+[0:1],:).') % display the two identical values
38 1023.0152235000
39 1023.0152235000
X(:,3)=X(:,2); % augment the X array
X(ix+1,3)=mean(X(ix+[0 2],3)); % replace the repeated value with the mean of its neighbors (linear interpolation)
hold on
plot(X(:,1),X(:,3),'rx-')
legend('Original','Interpolated','location','northeast')
diff(X)
ans = 4×3
    1.0000   -0.0490   -0.0490
    1.0000   -0.0234   -0.0234
    1.0000         0   -0.0272
    1.0000   -0.0544   -0.0272
Now you don't have any zeros in the 3rd column diff().
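For runs of arbitrary length, the same idea might be written with interp1 instead of a hand-placed mean. This is only a sketch under the assumption that the first sample of each run is kept and the repeats are replaced; `keep`, `yi` and `stillFlat` are names introduced here.

```matlab
X = [36 1023.0876626
     37 1023.0386135
     38 1023.0152235
     39 1023.0152235
     40 1022.9608063];

keep = [true; diff(X(:,2)) ~= 0];    % first sample of every run
yi = interp1(X(keep,1), X(keep,2), X(:,1));   % linear by default

stillFlat = any(diff(yi) == 0);      % false for this subset
```

For this subset the result at position 39 matches the mean-of-neighbors value used above. Two caveats: the interpolated values are constructed rather than measured, and a repeated value at the very end of the data falls outside the kept range, so interp1 returns NaN there.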

Paul
Paul 21 minutes ago
The data in gora and dol are on the plot, as can be seen below when using markers. However, if the y-data pattern is
increasing->decreasing->increasing ...
then gora and dol will each have the pattern data->NaN->data ...
and so the data points in gora and dol won't be connected on the plot (and won't be visible at all without markers).
load x
load y
ydiff=diff(y,1);
wieksze = (ydiff > 0);
mniejsze = (ydiff < 0);
gora = y;
dol = y;
gora(~wieksze) = NaN;
dol(~mniejsze) = NaN;
figure
plot(x,y,'b',x, gora, 'r-o', x, dol, 'g-x');
xlim([7.3623688,7.3623691]*1e5)
xl = xlim;
counts = (1:numel(x)).';
index = x>xl(1) & x < xl(2);
format long
[counts(index),x(index),y(index),gora(index),dol(index),wieksze(index),mniejsze(index)]
ans = 9×7
   1.0e+05 *
   0.000750000000000   7.362368819444445   0.010231555306000   0.010231555306000                 NaN   0.000010000000000                   0
   0.000760000000000   7.362368854166666   0.010232377609000                 NaN   0.010232377609000                   0   0.000010000000000
   0.000770000000000   7.362368888888889   0.010232186155000   0.010232186155000                 NaN   0.000010000000000                   0
   0.000780000000000   7.362368923611111   0.010232346412000                 NaN   0.010232346412000                   0   0.000010000000000
   0.000790000000000   7.362368958333334   0.010232249740000   0.010232249740000                 NaN   0.000010000000000                   0
   0.000800000000000   7.362368993055555   0.010232331298000                 NaN   0.010232331298000                   0   0.000010000000000
   0.000810000000000   7.362369027777778   0.010232141998000   0.010232141998000                 NaN   0.000010000000000                   0
   0.000820000000000   7.362369062500000   0.010232185551000                 NaN   0.010232185551000                   0   0.000010000000000
   0.000830000000000   7.362369097222222   0.010231700000000   0.010231700000000                 NaN   0.000010000000000                   0
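If connected, colored line pieces are wanted rather than markers, one possible approach is to draw each interval between sample k and sample k+1 as its own two-point piece, colored by the sign of the difference. The sketch below uses made-up row vectors; the helper `seg` and the index names are hypothetical and not taken from the thread.

```matlab
% Hypothetical row vectors; for column data use x(:).' and y(:).'
x = 1:7;
y = [1 3 2 2 3 5 4];
d = diff(y);

% Build [v(i) v(i+1) NaN ...] so every piece is drawn separately
seg = @(v, i) reshape([v(i); v(i+1); nan(1, numel(i))], 1, []);

iu = find(d > 0);                    % rising intervals
id = find(d < 0);                    % falling intervals
iz = find(d == 0);                   % flat intervals

plot(seg(x, iz), seg(y, iz), 'b', ...
     seg(x, iu), seg(y, iu), 'r', ...
     seg(x, id), seg(y, id), 'g');
legend("Constant", "Increasing", "Decreasing");
```

Because each piece carries both of its end points, an alternating increasing/decreasing pattern no longer leaves isolated samples, and blue appears only on intervals where the difference is exactly zero.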

Version

R2023b
