
Transfer Data with Job Methods and Properties

To transfer data to a cloud cluster, you can use the AttachedFiles or JobData properties, as you do for other clusters. For example:

  1. Place all required executable and data files in the same folder.

  2. Specify that folder in the AttachedFiles property of the job.

Submitting your job transfers the files to the cloud and makes them available to the workers running on the cloud cluster.
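The two steps above can be sketched for an independent job as follows. This is a minimal sketch. It assumes that your default profile points to a Cloud Center cluster, and that a folder named myProjectFolder and a function named myFunction exist in your current folder. Both names are hypothetical placeholders for your own code and data.

```matlab
% Cluster object for your default profile (assumed to be a Cloud Center cluster).
c = parcluster;

% Attach the whole folder. Submitting the job copies its contents to the workers.
% "myProjectFolder" is a hypothetical folder that holds your code and data files.
job = createJob(c,AttachedFiles={'myProjectFolder'});

% One task that calls the hypothetical function myFunction with no inputs
% and one output argument.
createTask(job,@myFunction,1,{});

submit(job);
```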

Data stored in job and task properties is available to the client. Therefore, you can access the results of your task or batch function from the finished job by using the fetchOutputs function, or from the OutputArguments property of each task. For batch jobs running on the cloud, access the job workspace variables with the load (Parallel Computing Toolbox) function in your client session.
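The following sketch shows how JobData, fetchOutputs, and OutputArguments fit together for an independent job. It assumes a hypothetical function scaledSum, saved in its own file scaledSum.m on your path. The function body is shown as comments because it must live in a separate file for the workers to run it. Inside the task, getCurrentJob returns the job that the worker is running, so the task can read the shared JobData.

```matlab
% --- Client session ---
c = parcluster;

% JobData holds data that every task in the job can read.
job = createJob(c,JobData=magic(4));

% Two tasks that call the hypothetical function scaledSum with inputs 1 and 2.
createTask(job,@scaledSum,1,{{1},{2}});
submit(job);
wait(job);

allResults = fetchOutputs(job);             % 2-by-1 cell array, one row per task
firstResult = job.Tasks(1).OutputArguments; % 1-by-1 cell array for task 1 only

% --- scaledSum.m (hypothetical, separate file) ---
% function y = scaledSum(k)
%     job = getCurrentJob;            % job that this worker is running
%     y = k*sum(job.JobData,"all");   % read the shared JobData
% end
```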

In this example, you run the function divideData as a batch job on a cluster in Cloud Center, using data files stored on your local machine.

Load Data

Copy the data for this example to your current working folder by opening the supporting function prepareSupportingFiles and running the code it contains.

openExample("parallel/RunBatchJobAndAccessFilesFromWorkersExample", ...

Your current working folder now contains 4 files: A.dat, B1.dat, B2.dat, and B3.dat.
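Before you submit the job, you can optionally confirm that the four files are present. This is a small sketch that uses only the file names listed above.

```matlab
% File names that this example expects in the current folder.
expected = ["A.dat","B1.dat","B2.dat","B3.dat"];
present = isfile(expected);     % logical array, one element per file
if ~all(present)
    error("Missing files: %s", strjoin(expected(~present),", "))
end
```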

Run Batch Job

Create and discover your Cloud Center cluster profile in MATLAB. Specify this profile as your default cluster profile. For more details, see Create and Discover Clusters.
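If you prefer not to change your default profile, you can list the available profiles and pass a profile name to parcluster directly. In this sketch, 'MyCloudCluster' is a hypothetical profile name. Replace it with the name of the profile you discovered.

```matlab
% List the names of the cluster profiles available in this MATLAB session.
profiles = parallel.clusterProfiles

% Create a cluster object from a named profile instead of the default one.
% 'MyCloudCluster' is a hypothetical name; use the name of your discovered profile.
c = parcluster('MyCloudCluster');
```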

Create a cluster object using parcluster (Parallel Computing Toolbox).

c = parcluster;

Place your code inside a function and submit it as a batch job by using batch (Parallel Computing Toolbox). Use the AttachedFiles name-value argument to transfer files from your local machine to the workers. This example uses a parallel pool with three workers and offloads the computations in the divideData function. batch uses one additional worker to run the function itself, so the job uses four workers in total.

filenames = "B" + string(1:3) + ".dat";
job = batch(c,@divideData,1,{}, ...
    Pool=3, ...
    AttachedFiles=["A.dat",filenames]);

To block MATLAB until the job completes, use the wait (Parallel Computing Toolbox) function on the job object.

wait(job);

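Instead of blocking indefinitely, you can pass a target state and a timeout in seconds to wait, and then check the Error property of the first task before you fetch the outputs. This sketch reuses the job object from the previous step and assumes that a 10-minute limit suits your job.

```matlab
% Wait up to 10 minutes for the job to finish. wait returns false if the
% timeout elapses before the job reaches the "finished" state.
ok = wait(job,"finished",600);

if ~ok
    disp("Job has not finished yet. Current state: " + job.State)
elseif ~isempty(job.Tasks(1).Error)
    % The task errored, for example because a worker could not find a file.
    disp(getReport(job.Tasks(1).Error))
end
```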
Retrieve Results and Clean Up Data

To retrieve the results of a batch job, use the fetchOutputs (Parallel Computing Toolbox) function. fetchOutputs returns a cell array containing the outputs of the function run in the batch job. You can also access the job workspace variables with the load (Parallel Computing Toolbox) function.

X = fetchOutputs(job)
X = 1×1 cell array 
    {40×207 double}
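When a batch job runs a script rather than a function, its variables remain in the job workspace, and load copies them to the client. In this sketch, myScript is a hypothetical script on your path that creates a variable X.

```matlab
c = parcluster;

% Run the hypothetical script myScript as a batch job.
scriptJob = batch(c,'myScript');
wait(scriptJob);

% Copy X from the job workspace into the client workspace.
load(scriptJob,'X');

% Alternatively, return the variables in a structure.
S = load(scriptJob,'X');
```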

When you have retrieved all the required outputs and do not need the job object anymore, delete it to clean up its data and avoid consuming resources unnecessarily.

delete(job)
clear job
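If you submit several jobs, you can delete all finished jobs on the cluster at once by using findJob. This sketch uses the cluster object c created earlier. Be aware that it also deletes finished jobs that you submitted to the same cluster from other sessions, so fetch any outputs you need first.

```matlab
% Find every job on the cluster whose State is 'finished'.
finishedJobs = findJob(c,'State','finished');

% Delete them to remove their data from the cluster.
delete(finishedJobs);
```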

For more details, see Run Batch Job and Access Files from Workers (Parallel Computing Toolbox).
