Potential bug with parfeval: cumulative slowdown after several hours of operation. Compute time can exceed 10x the initial value.
There seems to be a bug when using parfeval from the Parallel Computing Toolbox. After several hours of running, the compute time of the parallel tasks starts increasing.
I tried running everything in serial mode, and the compute time stays about the same even after hours of operation.
I have monitored memory usage, and it does not increase with time.
I also tried monitoring the parallel compute time, keeping track of the lowest and highest values seen. Once the difference between them exceeds a certain threshold (20%), I reset the parallel pool with:
delete(gcp('nocreate'));
POOL=parpool('local', NO_PAR_POOLS);
Resetting the parallel pool seems to bring the parallel compute time back to the expected value.
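The low/high threshold rule described above can be sketched in isolation, without a pool. This is only an illustration: the timings in iter_times and the names THRESHOLD_PCT, t_low, t_high and reset_at are hypothetical and not taken from my real code.

```matlab
% Hypothetical per-iteration timings in seconds (illustrative values only)
iter_times = [1.00 1.05 0.98 1.10 1.30];
THRESHOLD_PCT = 20;      % allowed spread between slowest and fastest iteration

t_low  = Inf;            % fastest iteration seen since the last reset
t_high = 0;              % slowest iteration seen since the last reset
reset_at = [];           % iterations at which a pool reset would be triggered

for k = 1:numel(iter_times)
    t = iter_times(k);
    t_low  = min(t_low,  t);
    t_high = max(t_high, t);
    if 100*(t_high - t_low)/t_low > THRESHOLD_PCT
        reset_at(end+1) = k; %#ok<AGROW>
        % In the real loop the pool would be recreated here:
        % delete(gcp('nocreate')); POOL = parpool('local', NO_PAR_POOLS);
        t_low  = Inf;    % start a fresh window after the reset
        t_high = 0;
    end
end
disp(reset_at)
```

In the real loop, t would be the value returned by toc for the current mini-batch instead of an entry of iter_times.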
Here is the pseudo code:
%%%%%%%%%%%%%%%%%%%%%%%%
tic_sum1 = 0;
tic_sum1_high = 0;
tic_sum1_low = 0;
for mini_batch_no = 1:NO_OF_MINI_BATCHES
    tic;
    % Launch (N-1) parallel asynchronous jobs
    job{1} = parfeval(POOL, @read_dataset_from_hdd, 1, mini_batch_no, CONST_DATA, 1);
    job{2} = parfeval(POOL, @compute_cpu_task, 1, BUFF_DATA_TRAIN.batch_file_read(:,:,:,:,set_cpu_num), CONST_DATA);
    job{3} = parfeval(POOL, @compute_cpu_dwt_task, 1, BUFF_DATA_TRAIN.batch_file_process_1(:,:,:,:,set_cpu_dwt_num), CONST_DATA.IM_RESIZE, CONST_DATA);
    % Perform the Nth parallel job on the host
    compute_gpu_task({BUFF_DATA_TRAIN.batch_file_process_2(:,:,:,:,set_gpu_num), BUFF_DATA_TRAIN.batch_file_label_read(:,:,:,:,set_gpu_num)});
    % Collect results from the parallel jobs (blocks until each one finishes)
    result{1} = fetchOutputs(job{1});
    result{2} = fetchOutputs(job{2});
    result{3} = fetchOutputs(job{3});
    tic_sum1 = tic_sum1 + toc;
    % Track the highest and lowest iteration time seen since the last reset
    if (tic_sum1 >= tic_sum1_high)
        tic_sum1_high = tic_sum1;
    end
    if (tic_sum1 <= tic_sum1_low)
        tic_sum1_low = tic_sum1;
    elseif (tic_sum1_low == 0)
        tic_sum1_low = tic_sum1;
    end
    tic_sum1 = 0;
    % Reset the parallel pool if the high-low difference exceeds the threshold percentage
    if (tic_sum1_high ~= 0 && tic_sum1_low ~= 0)
        if ((100*(tic_sum1_high - tic_sum1_low)/tic_sum1_low) > CPU_ALLOWABLE_COMPUTE_TIME_LOW_VS_HIGH_DIFF_PERCENTAGE)
            delete(gcp('nocreate'));
            POOL = parpool('local', NO_PAR_POOLS);
            tic_sum1_high = 0;
            tic_sum1_low = 0;
        end
    end
end
%%%%%%%%%%%%%%%%%%%%%%%%
The iterations run for 5-6 days, and after the first day the total time of operation exceeds 10x the initial time.
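One possible way to narrow down where the extra time goes is to compare the timestamps stored on each future, which separates the time a task waits for a worker from the time it actually runs. The sketch below is not part of my code and makes two assumptions: the release exposes the CreateDateTime, StartDateTime and FinishDateTime properties on futures, and @pause stands in for a real worker function.

```matlab
% Sketch only: requires Parallel Computing Toolbox.
pool = gcp();                         % reuse the current pool or start the default one
f = parfeval(pool, @pause, 0, 0.5);   % stand-in task; replace with a real worker function
wait(f);                              % block until the future has finished

% Time spent waiting for a free worker before execution started
queue_delay = seconds(f.StartDateTime - f.CreateDateTime);
% Time spent executing on the worker
run_time = seconds(f.FinishDateTime - f.StartDateTime);

fprintf('queued %.3f s, ran %.3f s\n', queue_delay, run_time);
```

Logging these two numbers for job{1} to job{3} over many mini-batches would show whether the growth is in queueing and data transfer or in the worker computation itself.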