- Eliminate redundant calculations
- Vectorize any operations that you can
- Where possible, use bsxfun instead of repmat
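As a rough illustration of the repmat versus bsxfun point, here is a minimal sketch that centers the columns of a matrix both ways. The matrix `A` and its size are made up for the example; they are not from the original code.

```matlab
% Hypothetical data: 1000 observations of 50 variables.
A  = rand(1000, 50);
mu = mean(A, 1);                      % 1-by-50 row vector of column means

% repmat version: builds a full 1000-by-50 temporary copy of mu.
centeredRepmat = A - repmat(mu, size(A, 1), 1);

% bsxfun version: expands mu along the first dimension without
% creating that temporary copy.
centeredBsxfun = bsxfun(@minus, A, mu);

% Both approaches give the same result.
sameResult = isequal(centeredRepmat, centeredBsxfun);
```

In releases from R2016b onward, implicit expansion lets you write `A - mu` directly, so bsxfun is mainly relevant for older versions.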
Speeding up code: pre-allocation, vectorization, parfor, spmd....
I have written a large and very tricky bit of code. It's processing a data set of 5 million values. In outline, the code goes like this:
1. Outer parfor loop (1: 500K).
2. Next loop (1: ~100)
3. Test lots of conditions
4. Inner for loop (1:K). Assign values to a growing cell array, where K is its current length. Each cell contains a struct, which in turn contains cell arrays (it's a really high-dimensional data set!).
The problem is that it currently takes 6 seconds to carry out one run of the outer loop. I need to dramatically speed it up. (24 hours run time would be ok. 24 mins would be better :)).
I have used the profiler extensively, and other than a warning telling me to pre-allocate, it all looks OK.
I have read lots of material such as http://www.ee.columbia.edu/~marios/matlab/Writing_Fast_MATLAB_Code.pdf
Stuff I am NOT currently doing:
1. Pre-allocating the cell object. The reason is that I would need to search through the object to find which values are active each time I accessed it. I assume this would take more time than pre-allocation would save.
2. Vectorization. This is what I normally use when possible. However, this is such a complex bit of code, with many loops inside loops, that I don't even know where to start. Any hints?
3. I have the parallel toolbox, though only one 64-bit machine with enough RAM to load the dataset. Should I be using this? I have not done so before.
I am using Win7 on a quad-core machine with 16 GB of RAM.
Any sensible comments welcome. Thank you.
Jonathan Sullivan on 6 Feb 2013
It's really hard to say what exactly you can do without knowing the nature of what is inside those for-loops.
Some ideas come to mind:
But post your code. It's really hard to say what's applicable without seeing it.
Jason Ross on 6 Feb 2013
Edited: Jason Ross on 6 Feb 2013
Use the Resource Monitor to check whether you are swapping to disk while the code is running. Given that you say you have enough RAM to load the data set in memory, I'm wondering whether you also have enough RAM to deal with everything else that's going on. If you don't, get more RAM.
It might also be useful to look at CPU utilization to see if the cores are pegged. When you say you have four cores, are those physical compute cores or "hyper-threaded" ones?
As for the parallel toolbox, if you only have one machine available, and you are taking up all the RAM with your existing program, then parallelization isn't likely to generate a speedup. But if you look at your RAM utilization and CPU utilization and could fit multiple copies of your program on the machine, it might help -- but if you can't, you'll likely end up going slower.
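For reference, a parfor version of an outer loop might be shaped roughly like the sketch below. It assumes the Parallel Computing Toolbox is available; `data`, `nOuter` and the inner computation are placeholders, not the original code. Note that `data` is a broadcast variable, so every worker receives its own copy, which is exactly where the RAM concern above comes from.

```matlab
nOuter  = 200;                 % stand-in for the 500K outer iterations
data    = rand(1000, 1);       % hypothetical data, broadcast to every worker
results = zeros(nOuter, 1);    % sliced output: iteration i writes only results(i)

parfor i = 1:nOuter
    acc = 0;                   % temporary variable, private to each iteration
    for j = 1:100              % stand-in for the inner loop
        acc = acc + data(mod(i + j, 1000) + 1);
    end
    results(i) = acc;
end
```

If each worker's copy of the data does not fit in memory alongside the others, the workers will start swapping and the parallel version can end up slower than the serial one.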
If you are doing a lot of disk I/O, an SSD will beat a "traditional" hard drive on throughput and access time.
But it's likely that the above should be considered when you know your program is as tight as it can be. Sometimes throwing money at a problem works, but not always.
Dan K on 6 Feb 2013
Edited: Dan K on 6 Feb 2013
Make sure that it is a function and not a script... Huge speed difference there. When you pre-allocate, make sure you're allocating enough for the eventual usage. I've seen cases where someone pre-allocates, fills the array up, and then keeps adding a little more to the end.
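One possible way to pre-allocate a cell array without having to search for the active entries is to keep a counter of how many slots are filled and trim once at the end. This is only a sketch: the capacity, the loop condition and the struct fields (`id`, `values`) are invented for illustration.

```matlab
capacity = 50;                        % hypothetical initial guess at the final size
items    = cell(capacity, 1);         % pre-allocated cell array
nActive  = 0;                         % number of slots actually in use

for k = 1:250
    if mod(k, 3) == 0                 % stand-in for the real conditions
        nActive = nActive + 1;
        if nActive > numel(items)
            % Out of room: double the capacity in one step instead of
            % growing by one element per iteration.
            items = [items; cell(numel(items), 1)];
        end
        % Each cell holds a struct whose 'values' field is itself a cell array.
        items{nActive} = struct('id', k, 'values', {{k, 2*k}});
    end
end

items = items(1:nActive);             % trim the unused tail once, at the end
```

Because the active entries are always `items(1:nActive)`, nothing needs to be searched when the array is accessed.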
A few more thoughts...
If you are making many calls to a simple subroutine, consider putting its code inline rather than calling it over and over.
If there is a particular computation that is really the bottleneck, you could consider mex-ing it.
Hope it helps.
Jan on 6 Feb 2013
Concentrate on optimizing the bottlenecks only. If you spend hours improving a piece of code that accounts for 2% of the total processing time, even a speed-up by a factor of 1000 makes the whole program only about 2% faster. So first use the profiler, or better, some tic/toc calls, to locate the bottlenecks.
Unfortunately the profiler disables the JIT acceleration, because the JIT can change the processing order of lines, while the profiler must measure the lines in their original order.
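The tic/toc approach could look roughly like this: wrap each candidate section in its own timer and compare. The two sections here (a loop and its vectorized equivalent) are stand-ins, not the original code.

```matlab
n = 2e5;
x = rand(n, 1);                        % hypothetical input data

% Section 1: explicit loop with a pre-allocated output.
t1 = tic;
yLoop = zeros(n, 1);
for i = 1:n
    yLoop(i) = x(i)^2 + 1;
end
tLoop = toc(t1);

% Section 2: the same computation, vectorized.
t2 = tic;
yVec = x.^2 + 1;
tVec = toc(t2);

fprintf('loop: %.4f s, vectorized: %.4f s\n', tLoop, tVec);
```

Passing the handle returned by `tic` to `toc` keeps the timers independent, so nested or overlapping sections do not interfere with each other.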
Sean de Wolski on 7 Feb 2013
I'll also throw in the use of MATLAB Coder to generate MEX files from various pieces of code. Although your mileage will vary, you can sometimes see a pretty good speed-up with the MEXed version.