Parallel execution gets stuck / hangs (on fetching data)
I'm solving a differential equation using ode45. Because I have to explore the parameter space and each run takes some time, I would like to solve it in parallel for different parameters. The problem I keep running into is that execution stops at random. My code essentially looks like this:
parfor k=1:N
[tt xx]=ode45(@(t,x) f(t,x,a(k), ...);
X(:,k)=xx;
end
I'm running it on a cluster of machines (12 x 8 cores) running Scientific Linux 6 (kernel 2.6.32-573) and MATLAB R2015a. After I start it, it runs for a while and then everything just stops: CPU load drops across the cluster and my MATLAB session (running on the head node of the cluster) freezes and doesn't recover within any reasonable amount of time. This sometimes happens after a few minutes, and sometimes after a few hours. The MJS, which is also running on the head node, reports all of the worker nodes as busy. If I force quit my MATLAB session, no error log is generated (or at least I couldn't find one). From what I can tell it doesn't appear to be a memory or a communication issue: all of the nodes are reachable and have plenty of free RAM.
If I replace parfor with parfeval I can submit my jobs, but then fetchNext hangs in the same way that parfor does.
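For reference, a minimal sketch of the parfeval variant with a timeout on fetchNext, so that a stall returns control to the client instead of blocking forever. The right-hand side f, the parameter vector a, tspan, x0 and the 60 second timeout are all hypothetical stand-ins, not my actual problem, and the timeout only helps to see which futures are stuck; it does not explain why they hang.

```matlab
% Hypothetical test problem standing in for the real f: dx/dt = -a*x.
f     = @(t, x, a) -a * x;
a     = [0.5, 5, 50];              % assumed parameter values
tspan = linspace(0, 10, 101);      % fixed output grid, so every result has 101 rows
x0    = 1;
N     = numel(a);

% Submit one ode45 call per parameter; request 2 outputs (t and x).
futures(1:N) = parallel.FevalFuture;
for k = 1:N
    futures(k) = parfeval(@ode45, 2, @(t, x) f(t, x, a(k)), tspan, x0);
end

% Collect results as they finish, waiting at most `timeout` seconds each time.
X       = zeros(numel(tspan), N);
timeout = 60;                      % assumed value, in seconds
nDone   = 0;
while nDone < N
    [idx, ~, xx] = fetchNext(futures, timeout);
    if isempty(idx)
        % Nothing finished within the timeout: report future states and stop.
        fprintf('Timed out; future states: %s\n', strjoin({futures.State}, ', '));
        cancel(futures);
        break
    end
    X(:, idx) = xx;
    nDone = nDone + 1;
end
```

If the loop hits the timeout branch, the State and Error properties of the individual futures show which parameter values never returned.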
I would greatly appreciate any help because I'm out of ideas at this point. If any additional information is required please let me know.
Many thanks!
5 Comments
Walter Roberson
on 22 Jul 2016
- If tspan has two elements, [t0 tf], then the solver returns the solution evaluated at each internal integration step within the interval.
- If tspan contains more than two elements [t0,t1,t2,...,tf], then the solver returns the solution evaluated at the given points.
You are specifying more than two elements, so you will get results at exactly the locations you specify, and the same number of them. If you had provided only two elements, you would have gotten the result at each internal integration step. The number of integration steps and their spacing vary as required to meet the integration tolerances, so the same function with two marginally different time spans or two marginally different initial conditions might produce very different numbers of internal points, because one of the two might skip a difficult-to-integrate point. This is especially true if there is a singularity in the time span: you can end up with a great many points as MATLAB tries to resolve the singularity.
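To illustrate the difference, here is a small sketch. The right-hand side f, the parameter values a, and x0 are made up for the demonstration and are not the poster's problem. With a two-element tspan the number of returned rows depends on the parameter, whereas with a fixed grid every solve returns the same number of rows and can be stored column by column.

```matlab
% Hypothetical test problem (not the poster's f): dx/dt = -a*x, x(0) = 1.
f  = @(t, x, a) -a * x;
a  = [0.5, 5, 50];                 % assumed parameter values
x0 = 1;

% Two-element tspan: output at every internal step, so the row count
% depends on how many steps the solver needed for each parameter.
nSteps = zeros(1, numel(a));
for k = 1:numel(a)
    [~, xx] = ode45(@(t, x) f(t, x, a(k)), [0 10], x0);
    nSteps(k) = size(xx, 1);
end
disp(nSteps)                       % row counts typically differ between parameters

% Multi-element tspan: every solve returns exactly numel(tspan) rows,
% so the results fit into a preallocated matrix.
tspan = linspace(0, 10, 101);
X = zeros(numel(tspan), numel(a));
for k = 1:numel(a)                 % this loop could equally be a parfor
    [~, xx] = ode45(@(t, x) f(t, x, a(k)), tspan, x0);
    X(:, k) = xx;
end
```

The fixed grid only changes where the solution is reported. The solver still chooses its internal step sizes from the tolerances, so a difficult parameter value can still take much longer than an easy one.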
Answers (0)