How can I get the computational cost (FLOPs) of a deep network? The Analyze Network app does not seem to report this metric.

In the MATLAB analyzeNetwork app, I can see the number of learnable parameters and the feature map sizes of a typical CNN model, but not the number of FLOPs.
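For reference, FLOPs can be estimated analytically from the same quantities analyzeNetwork already reports (layer hyperparameters and output feature map sizes). The formulas below use a common convention that is an assumption, not a MATLAB definition: one multiply-accumulate (MAC) counts as two FLOPs, the convolution is not grouped, and bias additions and activation functions are ignored. Some tools report MACs and call them FLOPs, so published figures can differ by a factor of two.

```latex
% 2-D convolution: K_h x K_w kernel, C_in input channels, C_out filters,
% output feature map of size H_out x W_out
\mathrm{MACs}_{\mathrm{conv}} = H_{\mathrm{out}}\, W_{\mathrm{out}}\, C_{\mathrm{out}}\, K_h\, K_w\, C_{\mathrm{in}},
\qquad
\mathrm{FLOPs}_{\mathrm{conv}} \approx 2\,\mathrm{MACs}_{\mathrm{conv}}

% Fully connected layer: N_in inputs, N_out outputs
\mathrm{FLOPs}_{\mathrm{fc}} \approx 2\, N_{\mathrm{in}}\, N_{\mathrm{out}}

% Worked example: first convolution of ResNet-50
% (7x7 kernel, 3 -> 64 channels, stride 2, 224x224 input -> 112x112 output)
\mathrm{MACs} = 112 \cdot 112 \cdot 64 \cdot 7 \cdot 7 \cdot 3 = 118\,013\,952 \approx 1.18 \times 10^{8},
\qquad
\mathrm{FLOPs} \approx 2.36 \times 10^{8}
```

Summing over all layers gives a total for the model itself; it says nothing about how fast a particular GPU executes those operations.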

Accepted Answer

Walter Roberson on 24 Sep. 2021
This is quite unlikely to happen in the near future, if ever.
The translation of CUDA calls into machine instructions depends on the optimization level, the capabilities of the compiler, and the CUDA version. The translation of machine instructions into GFLOPS depends on which other instructions are scheduled and on the exact GPU model: even within one architecture, NVIDIA releases models with different numbers of streaming multiprocessors (SMs) and very different double-precision implementations. The models with the highest double-precision performance are never the models with the highest single-precision performance, and it is not uncommon for the previous architecture's best double-precision model to outperform most models of the new architecture in double precision.
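To make the model dependence concrete, the theoretical peak throughput of a GPU is commonly estimated as below. The symbols are illustrative and not taken from any NVIDIA tool; the factor 2 assumes a fused multiply-add (FMA) is counted as two floating-point operations.

```latex
% N_SM : number of streaming multiprocessors
% n_p  : floating-point units per SM for precision p (FP32 or FP64)
% f    : clock frequency in Hz
P_{\mathrm{peak},p} = N_{\mathrm{SM}} \cdot n_p \cdot f \cdot 2 \quad [\mathrm{FLOP/s}]
```

Because the ratio n_FP64 / n_FP32 varies widely between models (from 1:2 on some data-center GPUs to 1:32 or 1:64 on many consumer GPUs), two cards with similar FP32 peaks can differ by more than an order of magnitude in FP64, and sustained throughput is usually well below either peak.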
3 Comments
Walter Roberson on 24 Sep. 2021
If MATLAB cannot predict GFLOPS, can it at least measure them? That clearly depends on what tools NVIDIA provides.
What NVIDIA provides are counters for several different classes of instructions. NVIDIA also provides a performance graph based on assigning a weight to each instruction class, and the person running the tool can configure the weights.
But the weights they use do not correspond to any actual GPU model, and all instructions in the same class get the same weight, even though different instructions can have very different throughputs (graduation rates). That is, some instructions are limited in how many can execute simultaneously, at rates much lower than their clock cycles per instruction would suggest. Square root and reciprocal square root are especially odd, because of the extra work needed to handle 0 and infinity according to the IEEE standard.
So you cannot convert between the counters and GFLOPS without knowing which instructions were executed, because members of the same class can have quite different performance.
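The argument above can be written as a short formula. Here N_i is the execution count of instruction i, N_c the counter value for instruction class c, w_c the single weight assigned to that class, w_i the weight instruction i would actually need, and t the elapsed time; the notation is illustrative and not NVIDIA's.

```latex
% Per-class counter: sum of the counts of all instructions in class c
N_c = \sum_{i \in c} N_i

% Estimate available from the counters (one weight per class)
\widehat{F} = \sum_{c} w_c\, N_c,
\qquad
\widehat{\mathrm{GFLOPS}} = \frac{\widehat{F}}{10^{9}\, t}

% What would actually be needed (one weight per instruction)
F = \sum_{c} \sum_{i \in c} w_i\, N_i
```

The two agree only if w_i = w_c for every instruction i in class c. When instructions within a class differ in cost, the per-class counters N_c alone do not determine F.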
The architecture of the 3000 series (Ampere) has some interesting changes to integer processing that must be taken into account when measuring GFLOPS.
Remember, though, that GFLOPS counts FLOATING-point operations per second, but a model might be implemented with integer arithmetic (for example, a quantized model). If a model is mostly integer, should its GFLOPS figure be near zero, since few floating-point operations were performed?


More Answers (2)

Shuyue JIA on 5 Jul. 2021 (edited 5 Jul. 2021)
Have you found a solution?

cui,xingxing on 7 Jul. 2021 (edited 30 Jul. 2021)
>> gpuDevice
Error using gpuDevice (line 26)
Failed to load graphics driver. Unable to load library 'libcuda.so.1'. The error was:
libcuda.so.1: cannot open shared object file: No such file or directory
Update or reinstall your graphics driver. For more information on GPU support, see GPU Support by Release.
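%% MATLAB R2021a
net50 = resnet50;
h = 224;
w = 224;
layer = 'fc1000';
%% Warm-up call, so one-time GPU initialization is not timed in the loop
X = gpuArray(rand(h,w,3));
features = activations(net50,X,layer);
dev = gpuDevice;  % handle to the currently selected GPU
%% Evaluate increasingly large inputs until an error (e.g. out of memory) occurs
nSteps = 100;
elapsedTime  = nan(1,nSteps);
availableMem = nan(1,nSteps);
sizeInput    = nan(1,nSteps);
for i = 1:nSteps
    scale = i;
    try % e.g. "Out of memory on device."
        X = rand(h*scale,w*scale,3,'gpuArray');  % create the input directly on the GPU
        % X = dlarray(X);
        t1 = tic;
        features = activations(net50,X,layer,...
            'Acceleration','none',...
            'ExecutionEnvironment','gpu');
        wait(dev);  % GPU work is asynchronous; let it finish before stopping the timer
        elapsedTime(i) = toc(t1);
        [H,W,~] = size(X);
        availableMem(i) = dev.AvailableMemory/(1024^2);  % bytes -> MB
        sizeInput(i) = H;
        fprintf('input size: (%i*%i), elapsed time: %.2f s, available GPU memory: %g MB\n',...
            H,W,elapsedTime(i),availableMem(i));
    catch err
        fprintf('Stopped at scale %d: %s\n',scale,err.message);
        break
    end
end
%% Plot
figure('Color','white');
yyaxis left;
plot(sizeInput,elapsedTime,'-o','LineWidth',2);
xlabel('Input image height (pixels)')
ylabel('Elapsed time (s)')
yyaxis right;
plot(sizeInput,availableMem,'-o','LineWidth',2);
ylabel('Available memory (MB)')
grid on;
title('Indirect evaluation of deep network computational cost and number of parameters')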
Elapsed time and available memory serve as indirect proxies for FLOPs and the number of parameters, respectively, so the plot answers the question only indirectly.
Tested in MATLAB R2021a on Windows 10.
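The reasoning behind this indirect approach can be made explicit with a rough scaling argument. It assumes that the input is enlarged by the integer factor s (the loop variable scale) on each side, that the spatial size of every convolution output scales proportionally, and that the effective throughput R of the GPU stays roughly constant; F_conv(1), F_const and R are illustrative symbols, not measured quantities.

```latex
% Convolution layers: H_out and W_out both scale with s
F_{\mathrm{conv}}(s) \approx s^{2}\, F_{\mathrm{conv}}(1)

% Layers after global average pooling (e.g. fc1000) do not depend on s
F(s) \approx s^{2}\, F_{\mathrm{conv}}(1) + F_{\mathrm{const}},
\qquad
t(s) \approx \frac{F(s)}{R}
```

Under these assumptions the elapsed-time curve grows roughly quadratically in s, and dividing an analytic FLOP count by the measured time gives an estimate of R. The number of learnable parameters does not change with s, so the drop in available memory mostly reflects the activations, which also grow roughly as s^2.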

Categories: Parallel and Cloud

Version: R2019a
