Stop Bit growth for power computation

I need logic to check for bit growth when the word length is 32, 16, or 8 bits.
I am using the following code to check for bit growth and cap the result at the maximum and minimum 32-bit values, but it does not seem to work for Case 2.
% Case -1
C = power(-97,28)
C = 4.2620e+55
if (C >= (2^32-1))
C = min(C,2^31);
if (C >= -2^31)
C = -2^31; %min(C,-2^31)
C = -2.1475e+09
% Case -2
C = power(-97,29)
C = -4.1341e+57
if (C >= (2^32-1))
C = min(C,2^31)
if (C >= -2^31)
C = -2^31; %min(C,-2^31)
C = -4.1341e+57
I need logic that handles both positive and negative mantissa and exponent values. The result C must not exceed the 32-bit word size.
Thank you!!

Answers (2)

John D'Errico on 30 Sep 2021
Edited: John D'Errico on 30 Sep 2021
Case 1: You CANNOT raise a double precision number to a power such that the result exceeds flintmax (2^53 for doubles) and expect the result to be exactly correct.
flintmax
ans = 9.0072e+15
And that means when you execute this:
power(-97,28)
ans = 4.2620e+55
you should expect pure garbage if you expect the result to have correct digits.
sym(-97)^28 % correct
ans = 
power(-97,28) % mostly garbage
ans = 4.2620e+55
sprintf('%55f',power(-97,28)) % note the divergence in the lower digits
ans = '42619520516862345006904392299734156132045387227709046784.000000'
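To make the precision point concrete, here is a minimal sketch (not part of the original answer) that checks what happens at flintmax and how far apart adjacent doubles are near the size of power(-97,28). The variable names are only illustrative.

```matlab
% Above flintmax, consecutive integers are no longer all representable.
f = flintmax;                 % largest consecutive integer for double
sameAsNext = (f + 1 == f);    % true: f + 1 rounds back to f

% eps(x) returns the spacing between x and the next larger double.
% Near power(-97,28) that spacing is far larger than 1, so the
% low-order digits printed by sprintf carry no information.
gap = eps(power(-97, 28));

disp(sameAsNext)
disp(gap > 1)
```

Both checks display 1 (true), which is why the symbolic result and the double result disagree in their lower digits.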
Case 2: While you MAY think that -2^31 raises the number -2 to the power 31, in fact it forms 2^31 and then negates that result. If the power is odd, this does not matter, because the result is negative either way. But if the power is even, it does matter.
Raising a number to a power has a higher order of precedence than does unary minus. So these two operations are not the same:
-2^30
ans = -1.0737e+09
(-2)^30
ans = 1.0737e+09
I used an even power to show they are distinct there.
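Putting the precedence rule to work, the following is a hedged sketch of how the clamping from the question could be written so that both cases saturate. It is an illustration, not the asker's code: `lo`, `hi`, and `clamp` are hypothetical names, and the limits assume a signed 32-bit word.

```matlab
% Unary minus is applied after ^, so -2^31 already means -(2^31).
lo = -2^31;        % -2147483648, smallest signed 32-bit value
hi =  2^31 - 1;    %  2147483647, largest signed 32-bit value

% Clamp a double into [lo, hi]: min caps the top, max caps the bottom.
clamp = @(x) max(min(x, hi), lo);

C1 = clamp(power(-97, 28));   % large positive value, saturates at hi
C2 = clamp(power(-97, 29));   % large negative value, saturates at lo
```

The lower bound needs `max`, not a `>=` test followed by an assignment: a value such as power(-97,29) is below -2^31, so a test of the form `C >= -2^31` is false for it and it passes through unchanged.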

Steven Lord on 30 Sep 2021
If you want to saturate, one way to do this is to use integer arithmetic.
b = int32(-97)
b = int32 -97
C = power(b, 28)
C = int32 2147483647
Alternately you could use intmin and intmax as your limits. These functions can return the limits of any of the eight integer types (signed and unsigned 8, 16, 32, and 64 bit integers.)
q = 2^33
q = 8.5899e+09
q > intmax('int32') % true
ans = logical 1
q > intmax('int64') % false
ans = logical 0
But the points John D'Errico raised are also things you should consider when performing your calculations.
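For the 32/16/8-bit word lengths mentioned in the question, one possible sketch is to compute the power in double and then convert to the target integer class, relying on the fact that converting a double to an integer class saturates at that class's limits. `satpow` is a hypothetical helper name, not a built-in, and this is a variation on the approach above rather than the exact code from the answer.

```matlab
% Hypothetical helper: compute in double, then convert to the integer
% class named by cls ('int8', 'int16', 'int32', ...). The conversion
% saturates at intmin(cls)/intmax(cls) and rounds non-integer results.
satpow = @(base, expo, cls) cast(power(base, expo), cls);

a = satpow(-97, 28, 'int32');   % positive overflow, gives intmax('int32')
b = satpow(-97, 29, 'int32');   % negative overflow, gives intmin('int32')
c = satpow(-97, 3,  'int16');   % (-97)^3 = -912673, gives intmin('int16')
d = satpow(-97, 2,  'int8');    % 97^2 = 9409, gives intmax('int8')
```

The saturated value has the right sign and limit even when the double result has lost its low-order digits, but any result that does fit in the word is only exact while the intermediate double stays below flintmax.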
