Why do some calculations like the FFT produce different results when performed on a GPU?
I am using the Parallel Computing Toolbox and my computer's GPU to speed up calculations, but sometimes the results are not identical. For example, the FFT produces different results. Why is that?
8 comments
Mark Shore
on 8 Feb 2011
Can you give more details (operating system, MATLAB version, CUDA version and GPU hardware), as well as how, exactly, you are determining inconsistent results?
Karl
on 9 Feb 2011
Moshe Tur
on 21 Oct 2022
Hi,
I have an urgent, somewhat related question:
I am in the process of purchasing a GPU card to accelerate my floating-point, double-precision computations (actually double-precision COMPLEX numbers).
Which of these is recommended: Nvidia 3090Ti, A5000, or A6000?
Tnx, Prof. Moshe Tur
Moshe Tur
on 21 Oct 2022
Apologies for using the word 'urgent': I read the guidelines only AFTER I posted my message. Moshe
Walter Roberson
on 21 Oct 2022
https://develop3d.com/workstations/nvidia-quadro-rtx-a6000-gpu-launches-with-2x-performance-boost/
"The Nvidia RTX A6000 is focussed very much on graphics. It does not have accelerated double precision performance, which is important for applications including engineering simulation. Nvidia told DEVELOP3D that for customers that need double precision there’s the Quadro GV100 or Nvidia A100 for the data centre."
Walter Roberson
on 21 Oct 2022
https://askgeek.io/en/gpus/NVIDIA/RTX-A5000
The A5000's double-precision throughput is 1/32 of its single-precision throughput, which is the worst ratio Nvidia makes.
Walter Roberson
on 21 Oct 2022
I am having trouble finding explicit statements, but single- and double-precision figures are given in this link, and the double-precision throughput is 1/32 of the single-precision throughput: https://www.electroniclinic.com/nvidia-geforce-rtx-3090-ti-complete-review-with-benchmarks/
Walter Roberson
on 21 Oct 2022
In summary of the above three comments: none of the three cards is well suited to double-precision work.
Accepted Answer
More Answers (2)
Edric Ellis
on 8 Feb 2011
7 votes
Walter is quite right that any change to the order of operations changes the result. In the case of FFT on the GPU, we use NVIDIA's cuFFT library to provide a high-performance FFT implementation. The highly threaded, communicating nature of FFT on the GPU inevitably leads to discrepancies.
In general, we strive to make our GPU algorithms give the numerically consistent "MATLAB answer". For many of the elementwise non-communicating algorithms (sin, cos, plus, ...), we achieve that (within an "eps" or maybe two); but as the complexity of the algorithm increases, so does the discrepancy. (For example, the parallel version of "sum" on the GPU is a vastly different implementation compared to the obvious single-threaded approach).
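The "sum" point above can be illustrated outside MATLAB. Below is a small Python sketch (Python used purely for illustration; the names `sequential_sum` and `tree_sum` are my own, not MATLAB or cuFFT functions). A parallel reduction typically adds elements in a pairwise tree rather than left to right, and in double precision the two orders generally round differently, so the results disagree by a few eps even though both are "correct".

```python
import random

def sequential_sum(xs):
    """Left-to-right accumulation, like a naive single-threaded loop."""
    total = 0.0
    for x in xs:
        total += x
    return total

def tree_sum(xs):
    """Pairwise (tree) reduction, the shape a parallel sum often takes."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return tree_sum(xs[:mid]) + tree_sum(xs[mid:])

random.seed(0)
data = [random.uniform(-1.0, 1.0) for _ in range(1 << 16)]
a = sequential_sum(data)
b = tree_sum(data)
# The two totals agree to roughly machine precision but are usually
# not bit-identical, because the rounding happens in a different order.
print(a - b)
```

The same grouping effect is why a GPU `sum` can differ from the CPU answer while both remain within a few eps of the exact value.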
Walter Roberson
on 8 Feb 2011
3 votes
Nearly any change in the exact order of operations used to perform a calculation can result in different outcomes due to precision or round-off limitations.
If exact reproducibility of the calculation across different implementations is important, then you very likely should not be using the Parallel Computing Toolbox -- not unless you have studied numerical analysis for a few years.
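The order-of-operations point above can be shown in three lines; here is a minimal sketch (in Python rather than MATLAB, but IEEE 754 double precision behaves identically in both). Floating-point addition is not associative, so merely regrouping the same operands changes the rounded result:

```python
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c   # rounds 0.1 + 0.2 first
right = a + (b + c)  # rounds 0.2 + 0.3 first
# The two groupings round differently, so the results are unequal
# even though both are within one ulp of the exact sum 0.6.
print(left == right)  # False
```

A parallel implementation is, in effect, a large-scale regrouping of exactly this kind, which is why bit-for-bit agreement with a serial implementation cannot be expected.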