
Use Amazon S3 Buckets with MATLAB Deep Learning Container

This example shows how to train your deep learning model with training data stored in an Amazon S3™ Bucket and save the trained model to the cloud.

You can scale up an existing deep learning workflow by moving data and training to the cloud, where you can rent high-performance GPUs and store large data files. One way to do this is to use S3 buckets. You can read from and write to S3 buckets directly from MATLAB®. You can use this workflow to access data in an S3 bucket from a MATLAB Deep Learning Container, and to get variables into and out of the container. For example:

  • If you have data locally, you can use the workflow on this page to upload that data to an S3 bucket and access it from your MATLAB Deep Learning Container to train in the cloud on high-performance GPUs.

  • After training in the container in the cloud, you can save variables to the S3 bucket and access them from anywhere, even after you stop running the container.

Create Amazon S3 Bucket

To upload a model from your local installation of MATLAB to the MATLAB session running in the MATLAB Deep Learning Container on a GPU-enabled Amazon EC2 instance, you can use an S3 bucket. You can use the save function to save a model (and other workspace variables) as MAT files from your local installation of MATLAB to an S3 bucket. You can then use the load function to load the model into the deep learning container. Similarly, you can save a trained model from the deep learning container to an S3 bucket and load it into your local MATLAB session.

To get started using S3 buckets with MATLAB:

  1. Download and install the AWS® Command Line Interface tool on your local machine.

  2. Create AWS access keys on your local machine and set keys as environment variables.

  3. Create an S3 bucket for your data.

For detailed step-by-step instructions, including how to create AWS access keys, export the keys, and set up S3 buckets, see Transfer Data To Amazon S3 Buckets and Access Data Using MATLAB Datastore.
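
For example, once you have installed and configured the AWS CLI (steps 1 and 2 above), you can create the bucket by calling the CLI from MATLAB with the system function. This is a minimal sketch; the bucket name mynewbucket and the region are placeholders, and the command assumes your AWS credentials are already configured on the machine.

% Create an S3 bucket named mynewbucket using the AWS CLI (placeholder name and region)
[status,cmdout] = system('aws s3 mb s3://mynewbucket --region us-east-1');
if status ~= 0
    error('Failed to create the S3 bucket: %s',cmdout);
end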

Save and Load MATLAB Workspace Variables with Amazon S3

From your local installation of MATLAB, you can save an untrained neural network, for example untrainedNetwork, directly from your workspace to your S3 bucket, mynewbucket. You must set your AWS access key ID and secret access key (as well as your session token, if you are using an AWS temporary token) as environment variables in your local MATLAB installation.

setenv('AWS_ACCESS_KEY_ID','YOUR_AWS_ACCESS_KEY_ID'); 
setenv('AWS_SECRET_ACCESS_KEY','YOUR_AWS_SECRET_ACCESS_KEY');
setenv('AWS_SESSION_TOKEN','YOUR_AWS_SESSION_TOKEN'); % optional
setenv('AWS_DEFAULT_REGION','YOUR_AWS_DEFAULT_REGION'); % optional

save('s3://mynewbucket/untrainedNetwork.mat','untrainedNetwork','-v7.3');

Load this untrained network from the S3 bucket into the MATLAB session running in the deep learning container on AWS. Again, you must set your AWS access key ID and secret access key (and your session token, if you are using an AWS temporary token) as environment variables in the MATLAB session in the container.

setenv('AWS_ACCESS_KEY_ID','YOUR_AWS_ACCESS_KEY_ID'); 
setenv('AWS_SECRET_ACCESS_KEY','YOUR_AWS_SECRET_ACCESS_KEY');
setenv('AWS_SESSION_TOKEN','YOUR_AWS_SESSION_TOKEN'); % optional
setenv('AWS_DEFAULT_REGION','YOUR_AWS_DEFAULT_REGION'); % optional

load('s3://mynewbucket/untrainedNetwork.mat')

Note that saving and loading MAT files to and from remote file systems using the save and load functions is supported in MATLAB R2021a and later, provided the MAT files are version 7.3. Ensure that you are running MATLAB R2021a or later on both your local machine and in the deep learning container.
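
If you are unsure which release a session is running, a quick check such as the following can help. This is a sketch using verLessThan, where MATLAB version 9.10 corresponds to release R2021a.

% Check that this MATLAB session is R2021a (version 9.10) or later
if verLessThan('matlab','9.10')
    warning('Remote save and load of v7.3 MAT files requires MATLAB R2021a or later.');
end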

Save your training, testing, and validation data from your local MATLAB workspace to an S3 bucket and load it into the MATLAB Deep Learning Container by following the same steps as above. You can then train your model, save the trained network to the S3 bucket, and load the trained network back into your local MATLAB installation.
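
For example, after training in the container, a minimal sketch of the round trip might look like the following. The variable name trainedNetwork and the bucket name mynewbucket are placeholders, and the AWS credentials must be set as environment variables in each session, as shown above.

% In the container: save the trained network to the S3 bucket as a v7.3 MAT file
save('s3://mynewbucket/trainedNetwork.mat','trainedNetwork','-v7.3');

% On your local machine: load the trained network back from the S3 bucket
load('s3://mynewbucket/trainedNetwork.mat')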

Save and Access Training Data with Amazon S3

You can train your network using data hosted in an S3 bucket from either your local installation of MATLAB or your MATLAB session running in the deep learning container. This method is useful if you already have data in S3, or if your datasets are too large to download to your local machine or into the container.

For an example of how to upload the CIFAR-10 data set from your local machine to an S3 bucket, see Work with Deep Learning Data in AWS.

After you store your data in Amazon S3, you can use datastores to access the data from your MATLAB session, either on your local machine or in the deep learning container (make sure the appropriate AWS access keys are set as environment variables). Create a datastore that points to the URL of the S3 bucket. The following sample code shows how to use an imageDatastore to access an S3 bucket. Replace 's3://MyExampleCloudData/cifar10/train' with the URL of your S3 bucket.

setenv('AWS_ACCESS_KEY_ID','YOUR_AWS_ACCESS_KEY_ID'); 
setenv('AWS_SECRET_ACCESS_KEY','YOUR_AWS_SECRET_ACCESS_KEY');
setenv('AWS_SESSION_TOKEN','YOUR_AWS_SESSION_TOKEN'); % optional
setenv('AWS_DEFAULT_REGION','YOUR_AWS_DEFAULT_REGION'); % optional

imds = imageDatastore('s3://MyExampleCloudData/cifar10/train', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');

You can now use the CIFAR-10 data set stored in Amazon S3. For an example using the CIFAR-10 data set, see Train Residual Network for Image Classification.
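
Before you start training, you can confirm that the datastore can read from the bucket by inspecting it with a few standard datastore functions. This is a quick check, assuming the bucket layout shown above.

% Count the images per class label to confirm the datastore can see the data
labelCounts = countEachLabel(imds);
disp(labelCounts)

% Read one image from the S3 bucket to verify access
img = readimage(imds,1);
size(img)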

Note that training is always faster when the training data is hosted locally. Accessing remote data adds overhead, especially when the data consists of many small files, as in the digit classification example. Training time also depends on network speed and on the proximity of the S3 bucket to the machine running the MATLAB container. Larger data files (greater than 200 KB per file) make more efficient use of bandwidth in EC2. If you have sufficient memory, copy the data locally for the best training speed.
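
If you decide to copy the data into the container, one option is to copy the bucket contents to local storage and point the datastore at the local copy. This is a sketch that assumes the AWS CLI is installed inside the container, your credentials are configured, and the local path /tmp/cifar10/train is a placeholder.

% Copy the training data from the S3 bucket to local storage in the container
[status,cmdout] = system('aws s3 cp s3://MyExampleCloudData/cifar10/train /tmp/cifar10/train --recursive');
if status ~= 0
    error('Copy from S3 failed: %s',cmdout);
end

% Point an imageDatastore at the local copy of the data
imdsLocal = imageDatastore('/tmp/cifar10/train', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');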
