When I add two floating-point numbers, the result is not correct above 262144

Question

Mark Ekblad on 30 Apr 2018

https://de.mathworks.com/matlabcentral/answers/398272-when-i-add-two-floating-point-numbers-the-result-is-not-correct-above-262144

Commented: John D'Errico on 30 Apr 2018

When I execute the following, I always get back 262144. It is as if it is being treated as an integer:

a = single(262144.000);
for i = 1:1234
    a = a + single(0.01);   % add 0.01 a total of 1234 times
end
display(a - floor(a));      % fractional part of a
sprintf('%10.6f', a)

Output:
    single
       0
ans =
      '262144.000000'

Or, if I just do a = single(262144.000) + single(0.01), the result is 262144.

Answer 1

James Tursa on 30 Apr 2018

https://de.mathworks.com/matlabcentral/answers/398272-when-i-add-two-floating-point-numbers-the-result-is-not-correct-above-262144#answer_317968

Edited: James Tursa on 30 Apr 2018

The amount you are adding, 0.01, is less than half the eps of the number you are using (eps(a) is the spacing between a and the next larger single-precision number). E.g.,

>> a = single(262144.000)
a =
      262144
>> eps(a)
ans =
    0.0313
>> a + 0.01
ans =
      262144

So the result of the addition does not change the value of "a" because there is not enough precision in "a". I.e., the closest number to 262144.01 in IEEE single precision is in fact 262144. The next larger representable number is 262144.03125.
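A minimal sketch in plain MATLAB (the variable names a, x, s and d are illustrative, not from the thread) that prints the stored values with fprintf so the rounding is visible, and then reruns the loop from the question once in single and once in double precision for comparison:

```matlab
a = single(262144);

% The exact sum 262144.01 is rounded to the nearest single-precision value
x = a + single(0.01);
fprintf('%.5f\n', x);          % 262144.00000
fprintf('%.5f\n', a + eps(a)); % next larger single: 262144.03125

% The loop from the question, run once in single and once in double precision
s = single(262144);
d = 262144;
for i = 1:1234
    s = s + single(0.01);      % each addition rounds back to 262144
    d = d + 0.01;              % double has enough precision to accumulate the steps
end
fprintf('single: %.6f\n', s);  % single: 262144.000000
fprintf('double: %.6f\n', d);
```

The double-precision total comes out close to 262156.34 (262144 plus 1234 steps of 0.01). It is not exact either, but its accumulated rounding error is far smaller than the 0.01 step.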

Answer 2

the cyclist on 30 Apr 2018

https://de.mathworks.com/matlabcentral/answers/398272-when-i-add-two-floating-point-numbers-the-result-is-not-correct-above-262144#answer_317967

It is not possible to represent every decimal number exactly with a finite number of bits (in this case, 32 bits, because you are specifying single precision).

So, you will get that

single(262144) + single(0.1)

is most closely representable by

2.6214409e5

but that

single(262144) + single(0.01)

is most closely represented as

262144

The numbers are not "being treated as integers"; each result is stored as the closest value representable in this 32-bit format.
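Printing the two single-precision sums with five decimal places exposes their exact stored values. A short sketch (the variable name a is illustrative); near 262144 the representable singles are spaced 1/32 apart, so 0.1 is effectively rounded to 3/32 = 0.09375 and 0.01 to zero:

```matlab
a = single(262144);
fprintf('%.5f\n', a + single(0.1));   % 262144.09375
fprintf('%.5f\n', a + single(0.01));  % 262144.00000
```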

You would similarly see that

262144 + 1.e-10

is represented as

2.621440000000001e+05

but

262144 + 1.e-11

is represented (and displayed) as

262144

in double precision.
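The same rounding rule explains the double-precision case: in double, eps(262144) is 2^-34 (about 5.8e-11), so an added amount survives only if it exceeds half of that. A sketch checking this directly (the variable name a is illustrative):

```matlab
a = 262144;                        % double precision
disp(eps(a) == 2^-34)              % 1: spacing of doubles near 262144
% 1e-10 is more than half the spacing, so the sum rounds up to a + 2*eps(a)
disp((a + 1e-10) - a == 2*eps(a))  % 1
% 1e-11 is less than half the spacing, so the sum rounds back to a
disp(a + 1e-11 == a)               % 1
```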

Answer 3

John D'Errico on 30 Apr 2018

https://de.mathworks.com/matlabcentral/answers/398272-when-i-add-two-floating-point-numbers-the-result-is-not-correct-above-262144#answer_317969

Time to learn what eps means. After all, YOU were the one who chose to use single precision here. So that means you should also learn what the consequences of that decision will be.

a=single(262144.000)
a =
  single
      262144
eps(a)
ans =
  single
      0.03125

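eps(a) is essentially the smallest number that can be added to a and still give a different number. (Strictly, the sum is rounded to the nearest representable value, so any amount larger than eps(a)/2 already changes a.)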

a == a+eps(a)
ans =
  logical
   0
a == a+eps(a)/2
ans =
  logical
   1
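Because the sum is rounded to the nearest representable value, the real threshold is eps(a)/2. A sketch probing amounts on either side of it (0.015 and 0.016 are arbitrary illustrative values); the exact tie at eps(a)/2 is resolved by IEEE round-half-to-even, which keeps a, consistent with the output above:

```matlab
a = single(262144);
h = eps(a)/2;                          % half the spacing: 0.015625
disp(a + single(0.015) == a)           % 1: below h, rounds back to a
disp(a + single(0.016) == a + eps(a))  % 1: above h, rounds up to the next single
disp(a + h == a)                       % 1: exact tie, round-half-to-even keeps a
```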

Think of eps(a) as the size of the least significant bit in the number a.
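The least significant bit, and hence eps, doubles at every power of two. That is why 262144 = 2^18 is the threshold in the question: for every single at or above it, eps(a)/2 is at least 0.015625, which exceeds 0.01, while just below it eps(a)/2 is only 0.0078125. A sketch printing eps around that point (the loop bounds are an illustrative choice):

```matlab
for k = 16:19
    a = single(2^k);
    changed = (a + single(0.01)) ~= a;   % does adding 0.01 change a at all?
    fprintf('a = %6d   eps(a) = %.7f   changed: %d\n', 2^k, eps(a), double(changed));
end
```

Expected output:

```
a =  65536   eps(a) = 0.0078125   changed: 1
a = 131072   eps(a) = 0.0156250   changed: 1
a = 262144   eps(a) = 0.0312500   changed: 0
a = 524288   eps(a) = 0.0625000   changed: 0
```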

2 Comments

the cyclist on 30 Apr 2018

As an addendum to help your understanding, note that

a = single(262144);
e = eps(a);
b = log2(e)

results in

b = -5

So, eps() returns the distance to the next larger floating-point number (of the same precision), and in this case that distance is 2^(-5).

This illustrates the binary aspect of the representation.
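For a positive normalized number this generalizes to eps(a) = 2^(floor(log2(a)) - p), where p is the number of fraction bits: 23 for single, 52 for double. A quick check at 262144 (a sketch; the relation does not apply to subnormal numbers, and 262144 is chosen because it is an exact power of two, so floor(log2(a)) is unambiguous):

```matlab
a = 262144;
p = floor(log2(a));                  % exponent: 262144 = 2^18, so p = 18
disp(eps(single(a)) == 2^(p - 23))   % 1: single has 23 fraction bits
disp(eps(a) == 2^(p - 52))           % 1: double has 52 fraction bits
```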

John D'Errico on 30 Apr 2018

Excellent point.
