Why are 350*0.001 - 0.350 and 351*0.001 - 0.351 different from zero?
Hello. While writing code for a finite element analysis, I found a rounding error in an operation that should in principle be very simple. I tested several MATLAB versions (2012a, 2015b, 2018a, 2022a) and always got the same error. Does anyone know why this error occurs?
This happens for 350*0.001 - 0.350 and for 351*0.001 - 0.351.
Note that when the expression is written as 350/1000 - 0.350, the error does not appear.
350*0.001 - 0.350
351*0.001 - 0.351
Walter Roberson on 29 Jun 2022
Consider any finite-length positional notation with a fixed base, B. For example, in base 10, 0.193 means 1 * 10^-1 + 9 * 10^-2 + 3 * 10^-3 or, to put it another way, (1*10^2 + 9*10^1 + 3*10^0)/10^3, which is 193 / 10^3.
Consider a number between 0 and 1. Hypothesize that we can express it exactly with M digits of the base, that is, as a rational fraction N / B^M with N an integer from 0 to (B^M - 1). Now let the number be 1 divided by an integer P > 1 that is relatively prime to B, so N/B^M = 1/P. For example, 3 is relatively prime to 10, so N/10^M = 1/3 would be an example.
Now, cross multiply the denominator to get N = B^M / P . But our hypothesis is that B and P are relatively prime, so we know that B^M cannot be divided exactly by P.
Therefore, for any finite length M for fixed integer base B, there exist numbers (rational numbers even!) that cannot be exactly represented in the base. In the previous example, 1/3 cannot be exactly represented in any fixed length number of decimal digits; neither can 1/7 or 1/11 or 1/13 ...
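Now, let the base be 2 and the denominator be 10. Here 10 is not itself relatively prime to 2, but 10 = 2 * 5 and the factor 5 is, which is enough for the same argument: N = 2^M/10 would require 5 to divide 2^M, and that cannot happen for any finite number of digits. There is no power of 2 that is exactly divisible by 10.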
And therefore, there is no possible finite base-2 positional representation of 1/10 (or 1/100 or 1/1000). And so as long as you are using finite binary representation, 0.001 (base 10) can never exactly equal 1/1000 . So when you multiply 0.001 represented in finite positional binary by 350, you are never going to get exactly 350/1000 .
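As a quick numerical check of the divisibility claim, here is a minimal sketch. It only tests exponents 0 through 52, chosen because those powers of 2 are stored exactly in a double; it illustrates the argument but is not a proof, and the variable names are only for illustration.

```matlab
% Powers of 2 up to 2^52 are exactly representable as doubles,
% so mod() gives exact remainders here.
M = 0:52;
r10   = mod(2.^M, 10);     % remainders of 2^M divided by 10
r1000 = mod(2.^M, 1000);   % remainders of 2^M divided by 1000

% If any remainder were zero, 1/10 or 1/1000 would have a finite
% binary expansion with that many digits.
anyDivisible = any(r10 == 0) || any(r1000 == 0)
```

For every exponent tested the remainder is nonzero, which matches the argument that neither 1/10 nor 1/1000 has a finite base-2 representation.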
The question then becomes whether the value that you do get for 350*0.001 is the same approximation as you get for writing 0.350 . And the answer for that happens to be NO. And if it were the same, that would be by chance, and there would be different numbers that failed to work out.
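To see the two approximations side by side, something like the following can be run. It is a small sketch using num2hex to show the stored bit patterns; the variable names are only for illustration.

```matlab
x = 350*0.001;   % product of an integer and the rounded value of 0.001
y = 0.350;       % nearest double to 0.35
z = 350/1000;    % quotient of two exactly representable integers

% Show the raw IEEE 754 bit patterns as hexadecimal strings.
fprintf('%s\n%s\n%s\n', num2hex(x), num2hex(y), num2hex(z));

sameProduct  = (x == y)   % false: the product rounds to a different double
sameQuotient = (z == y)   % true: the division is correctly rounded
d = x - y                 % the small nonzero difference from the question
```

The division case works because 350 and 1000 are both exact, so the single rounding in 350/1000 lands on the same double as the literal 0.350.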
Given any particular rounding system, even with any fixed number of extra "guard" digits for multiplication, you can show that as long as you are using a finite positional integer-base system, there will be cases like this, where the rounded representations are not equal after a multiplication.
I am emphasizing that this is not a MATLAB bug: this is an inherent problem for every finite positional integer-base number system.
You could reduce problems if you immediately switch everything to indefinite-precision rational numbers and carry out the calculations as rationals, but (A) this would require growing amounts of memory as you went through the calculations; and (B) it would not completely solve the problems anyhow. (For example, if the user wrote 0.3333333333 then were they "intending" to write the rational 1/3, or were they "intending" to write the rational 3333333333/10000000000 ?)
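For comparison, here is a minimal sketch of the rational approach. It assumes the Symbolic Math Toolbox is installed, which is not part of base MATLAB, and it builds the rationals from integers so that no binary rounding happens before the conversion.

```matlab
% Assumption: Symbolic Math Toolbox is available.
% Build exact rationals from integers rather than from decimal literals.
oneThousandth = sym(1)/1000;       % exactly 1/1000
product       = sym(350) * oneThousandth;
target        = sym(350)/1000;     % exactly 7/20

d = product - target               % exact symbolic zero
isZero = isequal(d, sym(0))
```

This removes the rounding for this particular expression, at the cost of speed and of numerators and denominators that can grow as a calculation proceeds.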
MATLAB chose finite binary representation because that is what your computer hardware uses.
Chunru on 29 Jun 2022
Floating-point numbers are represented in a computer by a sequence of binary bits. In MATLAB, the default double type uses 64 bits to represent a number. There are infinitely many real numbers, but only a finite number of them can be represented exactly by the double type. Therefore, some numbers are represented approximately, with some rounding error.
For expressions like those below,
350*0.001 - 0.350 and 351*0.001 - 0.351
some of the numbers are already approximate at the beginning. The arithmetic operations may then introduce further rounding error. Therefore the result is not exactly 0. To check whether two floating-point results agree to within a certain tolerance, you can use the following:
abs(350*0.001 - 0.350) < eps
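where eps is the distance from 1.0 to the next larger double, about 2.2e-16 (see doc eps for more details). This fixed tolerance suits values whose magnitude is near 1; for much larger or much smaller values, scale the tolerance to the operands.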
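When the operands are not close to 1 in magnitude, a tolerance scaled to the operands is usually more appropriate. The following is a minimal sketch of one way to do that; the safety factor of 4 is an arbitrary assumption and should be chosen to match how much rounding error the calculation is expected to accumulate.

```matlab
a = 350*0.001;
b = 0.350;

% eps(x) returns the spacing of doubles at the magnitude of x.
% The factor 4 is an assumed safety margin, not a standard value.
tol = 4 * eps(max(abs(a), abs(b)));

isClose = abs(a - b) <= tol   % true: a and b differ only by rounding
isEqual = (a == b)            % false: exact comparison fails
```

The same pattern applies to 351*0.001 and 0.351, or to any pair of values that should be equal in exact arithmetic.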