First, if you're using the local cluster type, a worker MATLAB process does need to be launched to run the batch job. It is not already running, which you can verify using Task Manager or similar. (Clusters of type MJS keep their workers running.) The time taken by the batch command itself is simply the time needed to create the parallel.Job and parallel.Task objects for the batch job and to save them to disk.
Roughly speaking, the total time from submitting the job to having the results available can be broken down like this:
- Time taken to create and submit the batch job to the scheduler
- Time taken to launch the worker process (unless you're using MJS)
- Time taken for the worker to load the job and task information
- Time for the worker to actually run the task
- Time for the worker to save the task results to disk (or database for MJS)
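One rough way to see how the total splits up is to time the submission, the wait, and the retrieval of results separately. The sketch below is only an illustration: it assumes the default cluster profile is the local one, and it uses @rand with a small 512x512x10 output as a stand-in for your own task function.

```matlab
% Assumption: the default profile returned by parcluster() is the local cluster.
c = parcluster();

% Time only the batch call (creating and submitting the job).
tSubmit = tic;
job = batch(c, @rand, 1, {512, 512, 10});   % @rand is a placeholder task
submitSeconds = toc(tSubmit);

% Time everything that happens on the worker side:
% launch, load job/task data, run the task, save the results.
tWait = tic;
wait(job);
waitSeconds = toc(tWait);

% Time reading the saved results back into the client.
tFetch = tic;
out = fetchOutputs(job);                    % cell array, one cell per output
fetchSeconds = toc(tFetch);

fprintf('submit: %.2f s, wait: %.2f s, fetch: %.2f s\n', ...
    submitSeconds, waitSeconds, fetchSeconds);

delete(job);                                % remove the job's files from disk
```

If the wait time is much larger than the time the task function takes on its own, the difference is being spent in worker startup, loading, or saving results.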
I suspect that the "missing" time is largely down to the last item in the list above (saving the task results). As you've written it, the 512x512x1000 array is returned by your task function @load, and this result gets saved to disk.
How long does your save('a') command take? I suspect saving the task results would take at least that long.
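To get a feel for that number, you could time a save of an array of the same size in the client session. This is just a sketch: the random double array (about 2 GB) and the temporary file name are stand-ins for your own data, so reduce sz if memory is tight or substitute your real variable.

```matlab
% Stand-in data: a 512x512x1000 double array (about 2 GB). Replace with your own.
sz = [512 512 1000];
a = rand(sz);

% Hypothetical temporary file used only for this timing test.
fileName = [tempname '.mat'];

tSave = tic;
save(fileName, 'a');
saveSeconds = toc(tSave);
fprintf('save took %.2f s\n', saveSeconds);

delete(fileName);   % clean up the temporary file
```

The time reported here is only a lower-bound estimate for what the worker spends writing the task output, since the worker's save may use different settings.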
Note that there are several additional properties on the job object that can help you work out what's going on - see the reference page. In particular, note CreateTime, SubmitTime, StartTime, and FinishTime. The underlying task object has the same properties (except SubmitTime).
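For example, the timestamps can be printed once the job has finished. This sketch uses the property names mentioned above and assumes they hold text values; depending on your release, the names or types may differ (newer releases may expose datetime-valued equivalents), so check the reference page for your version. The @rand task and the default cluster profile are again placeholders.

```matlab
% Assumption: the default profile is the local cluster; @rand is a placeholder task.
c = parcluster();
job = batch(c, @rand, 1, {512, 512, 10});
wait(job);

% Job-level timestamps (property names as given above; assumed to be text).
fprintf('Job created:   %s\n', job.CreateTime);
fprintf('Job submitted: %s\n', job.SubmitTime);
fprintf('Job started:   %s\n', job.StartTime);
fprintf('Job finished:  %s\n', job.FinishTime);

% Task-level timestamps (a task has no SubmitTime).
t = job.Tasks(1);
fprintf('Task created:  %s\n', t.CreateTime);
fprintf('Task started:  %s\n', t.StartTime);
fprintf('Task finished: %s\n', t.FinishTime);

jobFinished = strcmp(job.State, 'finished');   % keep the state before deleting
delete(job);
```

The gap between the job's SubmitTime and StartTime indicates how long the worker launch took, and the gap between the task's FinishTime and the job's FinishTime gives a hint about time spent after the task function returned.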