saveAsTallDatastore

Class: matlab.compiler.mlspark.RDD
Namespace: matlab.compiler.mlspark

Save RDD as a MATLAB tall array to a binary file that can be read back using the datastore function

Syntax

saveAsTallDatastore(obj,path)

Description

saveAsTallDatastore(obj,path) saves obj as a MATLAB® tall array in a binary file that can be read back using the datastore function. path specifies the directory location in which to save the binary file.

Input Arguments

obj: An input RDD, specified as an RDD object.

path: Directory location in which to save the binary file, specified as a character vector enclosed in single quotes ('').

Data Types: char

Examples

Save an RDD as a MATLAB tall array to a binary file that can be read back using the datastore function.

%% Connect to Spark
sparkProp = containers.Map({'spark.executor.cores'}, {'1'});
conf = matlab.compiler.mlspark.SparkConf('AppName','myApp', ...
                        'Master','local[1]','SparkProperties',sparkProp);
sc = matlab.compiler.mlspark.SparkContext(conf);

%% saveAsTallDatastore 

% May require setting HADOOP_PREFIX or HADOOP_HOME environment variables to a
% valid Hadoop installation folder even if running locally.
% For example:
% setenv('HADOOP_PREFIX','/share/hadoop/hadoop-2.5.2')

inRDD = sc.parallelize({1,2,3,4,5});
% Store RDD in a file as a tall array that can be read back into MATLAB using datastore
inRDD.saveAsTallDatastore('myDir'); 
ds = datastore(['myDir' '/part*'], 'Type', 'tall');
ds.readall()
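
As a follow-up to the example, the saved files can also be wrapped in a tall array rather than read all at once. The following is a minimal sketch, not part of the original example: it assumes 'myDir' already contains the part files written by saveAsTallDatastore, and the variable names t and data are illustrative.

```matlab
% Assumption: 'myDir' was populated by inRDD.saveAsTallDatastore('myDir').
% Build the file pattern with fullfile so the separator matches the platform.
ds = datastore(fullfile('myDir','part*'), 'Type', 'tall');

% Wrap the datastore in a tall array; evaluation is deferred at this point.
t = tall(ds);

% gather triggers the deferred evaluation and brings the data into memory.
data = gather(t);
```

Using tall and gather keeps evaluation deferred until the data is actually needed, which may be preferable to readall when the saved data is large.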

Version History

Introduced in R2016b