process based parpool: keep the data in the workers
I have a routine that creates a large matrix in compressed format as a nested cell structure and subsequently performs a number of matrix-vector products with this matrix. I am trying to parallelize this routine but I'm running into problems. Since the routine calls several mex functions to create the matrix, I cannot use a thread-based pool. So I use parpool('local') and parfor, looping over the sub-blocks and letting the workers fill the corresponding cells. When this is done I close the pool and open a thread-based pool for the multiplications, which do not need any mex functions.
All this works well for moderate-size problems (1 to 10 GB), but for larger problems the process increasingly slows down, until the parallelization no longer shows any speed-up for problems of 150 GB. I have tested without parallelization and it isn't the filling of the matrix that is slowing things down; it looks like it's the workers sending their slices of the matrix back to the client.
Is there any solution to this? Ideally the slices would stay in the workers, and each worker would do its part of the mat-vec multiplications in subsequent parfor calls, but I can't find a way to do this.
Thanks for any help
Edric Ellis on 22 Jul 2022
Mike has already suggested looking at parfeval. The other option, which may be appropriate for your problem, is to use spmd. This is quite a different parallel programming model, and converting a program from for or parfor to spmd is decidedly non-trivial. However, the advantage of spmd is that it is explicitly designed for keeping data on workers.
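To illustrate what "keeping data on workers" can look like, here is a minimal sketch, not a drop-in solution. It assumes a process-based pool, and the function names spmdSketch and buildBlock are hypothetical: buildBlock stands in for the mex-based assembly and simply builds a dense slice of a Hilbert matrix so the result can be checked. spmdIndex and spmdSize are the R2022b names; earlier releases use labindex and numlabs instead.

```matlab
function y = spmdSketch(n, x)
% Hypothetical sketch: build row slices on the workers, keep them there,
% and return only the small result vector to the client.
pool = gcp();                      % reuse the current pool, or start the default one

spmd
    % Each worker takes a contiguous range of rows.
    % (In releases before R2022b use labindex / numlabs.)
    edges  = round(linspace(0, n, spmdSize + 1));
    rows   = (edges(spmdIndex) + 1):edges(spmdIndex + 1);
    Ablock = buildBlock(rows, n);  % slice is created on the worker
end

% On the client, Ablock is now a Composite: a handle to data that still
% lives on the workers. Later spmd blocks can use it without any transfer.
spmd
    yLocal = Ablock * x;           % x is sent to each worker; Ablock is already there
end

% Collect only the per-worker result vectors.
parts = cell(pool.NumWorkers, 1);
for k = 1:pool.NumWorkers
    parts{k} = yLocal{k};
end
y = vertcat(parts{:});
end

function Ablock = buildBlock(rows, n)
% Stand-in (assumption) for the real mex-based assembly:
% rows of the n-by-n Hilbert matrix, A(i,j) = 1/(i+j-1).
Ablock = 1 ./ (rows(:) + (1:n) - 1);
end
```

Calling y = spmdSketch(50, ones(50,1)) should agree with hilb(50)*ones(50,1) up to rounding. The slices persist only while the same pool stays open, so the multiplications would have to run on the process-based pool rather than on a freshly opened thread-based one.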
Without more details of your particular application, it's hard to know exactly how things might go with spmd. Some things to consider though:
More Answers (2)
Mike Croucher on 21 Jul 2022
Parfor is great: it auto-parallelises for us and takes care of a lot of things, like data transfer to and from the workers. At some point, however, we can find ourselves fighting against or being limited by its automatic choices.
Whenever this happens to me, I start looking at other constructs in MATLAB's parallel language. Could you recast it to use parfeval for example? https://mathworks.com/help/parallel-computing/parallel.pool.parfeval.html
The function you'd run on parfeval might then do something like
- create its part of the matrix
- do the matrix-vector products as part of a for loop (no need for parfor); the parallelisation will come from running lots of these functions simultaneously
- return only what is required
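The steps above can be sketched roughly as follows. This is only an illustration under assumptions: parfevalSketch and blockTimesVectors are hypothetical names, the dense Hilbert slice stands in for the real mex-based assembly, and all vectors to multiply are assumed to be known up front as the columns of X.

```matlab
function Y = parfevalSketch(n, X, numBlocks)
% Hypothetical sketch: one parfeval task per row block. Each task builds
% its slice, multiplies it by every column of X, and returns only the result.
pool  = gcp();                                   % current or default pool
edges = round(linspace(0, n, numBlocks + 1));    % row ranges for the blocks

futures(1:numBlocks) = parallel.FevalFuture;     % preallocate the future array
for k = 1:numBlocks
    rows = (edges(k) + 1):edges(k + 1);
    futures(k) = parfeval(pool, @blockTimesVectors, 1, rows, n, X);
end

% Fetch the small result blocks in row order and stack them.
parts = cell(numBlocks, 1);
for k = 1:numBlocks
    parts{k} = fetchOutputs(futures(k));         % waits for task k to finish
end
Y = vertcat(parts{:});
end

function Yblock = blockTimesVectors(rows, n, X)
% 1) Create this task's part of the matrix (stand-in for the mex assembly):
%    rows of the n-by-n Hilbert matrix, A(i,j) = 1/(i+j-1).
Ablock = 1 ./ (rows(:) + (1:n) - 1);

% 2) Do the matrix-vector products in a plain for loop.
Yblock = zeros(numel(rows), size(X, 2));
for j = 1:size(X, 2)
    Yblock(:, j) = Ablock * X(:, j);
end
% 3) Only Yblock is returned; Ablock never travels back to the client.
end
```

In this form the slice exists only for the lifetime of its task, so it suits cases where the products do not depend on results from other blocks.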
parfeval isn't suitable for all problem types, but it's often the first thing I reach for when I run out of steam with parfor.