Deploying Applications to CLOUDERA Spark Using the MATLAB API for Spark
This example shows you how to deploy a MATLAB® application developed using the MATLAB API for Spark™ to a CLOUDERA® Spark-enabled Hadoop® cluster.
The application flightsByCarrierDemo.m computes the number of airline carrier types from airline data. The inputs to the application are:

master: URL to the Spark cluster
inputFile: the file containing the input data
Note
The complete code for this example is in the file flightsByCarrierDemo.m.
Prerequisites
Install the MATLAB Runtime in the default location on the desktop. This example uses /usr/local/MATLAB/MATLAB_Runtime/R2025a as the default location for the MATLAB Runtime. If you don't have the MATLAB Runtime, see Download and Install MATLAB Runtime for installation instructions.
Install the MATLAB Runtime on every worker node.
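Before deploying, it can help to confirm that the MATLAB Runtime folder exists on the desktop and on each worker node. The following is a minimal sketch, assuming passwordless ssh access from the desktop to every worker; the function name check_runtime and the host names worker01 and worker02 are hypothetical placeholders, not part of the example.

```shell
#!/usr/bin/env bash
# Sketch: verify that the MATLAB Runtime folder exists locally and on each
# worker node. ASSUMPTION: passwordless ssh to every worker is configured.

# MATLAB Runtime location used throughout this example.
MCR_ROOT="/usr/local/MATLAB/MATLAB_Runtime/R2025a"

# check_runtime ROOT [HOST...]   (hypothetical helper)
# Prints one status line per machine; returns 1 if any machine lacks ROOT.
check_runtime() {
    local root="$1"; shift
    local status=0
    local host

    # Check the desktop first.
    if [ -d "$root" ]; then
        echo "localhost: OK"
    else
        echo "localhost: MISSING $root"
        status=1
    fi

    # Then check each worker node over ssh.
    for host in "$@"; do
        if ssh "$host" test -d "$root"; then
            echo "$host: OK"
        else
            echo "$host: MISSING $root"
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_runtime "$MCR_ROOT" worker01 worker02` reports any node that still needs the MATLAB Runtime installed.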
Copy the airlinesmall.csv file from the toolbox/matlab/demos folder of your MATLAB installation into the Hadoop Distributed File System (HDFS™) folder /datasets/airlinemod.
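One way to perform this copy from a Linux terminal is with the standard hadoop fs -mkdir and hadoop fs -put commands. The sketch below assumes the hadoop client is on your PATH; the function name copy_airline_data is hypothetical, and its argument is the root folder of your MATLAB installation.

```shell
#!/usr/bin/env bash
# Sketch: stage airlinesmall.csv in HDFS for this example.
# ASSUMPTION: the `hadoop` client is on PATH.

# copy_airline_data MATLAB_ROOT   (hypothetical helper)
copy_airline_data() {
    local matlab_root="$1"
    local src="$matlab_root/toolbox/matlab/demos/airlinesmall.csv"
    local dest="/datasets/airlinemod"

    # Fail early if the sample file is not where we expect it.
    if [ ! -f "$src" ]; then
        echo "airlinesmall.csv not found under $matlab_root" >&2
        return 1
    fi

    # Create the target folder; -p means no error if it already exists.
    hadoop fs -mkdir -p "$dest" || return 1
    # Copy the file into the folder; -f overwrites an existing copy.
    hadoop fs -put -f "$src" "$dest/" || return 1
}
```

After running it, `hadoop fs -ls /datasets/airlinemod` should list airlinesmall.csv.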
Deploy Applications to CLOUDERA Spark
At the MATLAB command prompt, use the mcc command to generate a jar file and a shell script for the MATLAB application flightsByCarrierDemo.m.

>> mcc -C -W 'Spark:flightsByCarrierDemoApp' flightsByCarrierDemo.m

This action creates a jar file named flightsByCarrierDemoApp.jar and a shell script named run_flightsByCarrierDemoApp.sh.

Execute the shell script in either yarn-client mode or yarn-cluster mode. In yarn-client mode, the driver runs on the desktop. In yarn-cluster mode, the driver runs in the Application Master process in the cluster. In both cases, the results of the computation are saved as text on HDFS by calling the saveAsTextFile method on the RDD.

yarn-client mode

Run the following command from a Linux® terminal:
$ ./run_flightsByCarrierDemoApp.sh \
    /usr/local/MATLAB/MATLAB_Runtime/R2025a \
    yarn-client \
    hdfs://hadoop01glnxa64:54310/datasets/airlinemod/airlinesmall.csv
To examine the results, enter the following from a Linux terminal:
$ hadoop fs -cat flightsByCarrierResults/*
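saveAsTextFile writes a folder of part files (plus an empty _SUCCESS marker) rather than a single file, which is why the command above uses a wildcard. Because flightsByCarrierResults is a relative path, HDFS resolves it against your HDFS home folder (typically /user/<username>). To list the output or merge it into one local file, a sketch like the following can be used; the function name fetch_results and the local file name flightsByCarrierResults.txt are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: inspect and download the output written by saveAsTextFile.
# ASSUMPTION: the `hadoop` client is on PATH.

# fetch_results [HDFS_DIR] [LOCAL_FILE]   (hypothetical helper)
fetch_results() {
    local hdfs_dir="${1:-flightsByCarrierResults}"      # folder used in this example
    local local_file="${2:-flightsByCarrierResults.txt}"

    # List the part-* files and the _SUCCESS marker that Spark wrote.
    hadoop fs -ls "$hdfs_dir" || return 1
    # Concatenate every file in the folder into one local text file
    # (the empty _SUCCESS marker contributes nothing).
    hadoop fs -getmerge "$hdfs_dir" "$local_file" || return 1
    echo "Merged results written to $local_file"
}
```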
yarn-cluster mode

Run the following command from a Linux terminal:
$ ./run_flightsByCarrierDemoApp.sh \
    /usr/local/MATLAB/MATLAB_Runtime/R2025a \
    --deploy-mode cluster --master yarn yarn-cluster \
    hdfs://hadoop01glnxa64:54310/datasets/airlinemod/airlinesmall.csv
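The two modes differ only in the arguments placed between the MATLAB Runtime folder and the input file, so a small wrapper can select between them. This is a hypothetical convenience sketch, not one of the generated files: the function name run_demo is an assumption, it reuses exactly the arguments shown in the two commands above, and it assumes run_flightsByCarrierDemoApp.sh is in the current folder.

```shell
#!/usr/bin/env bash
# Sketch: run the generated script in either deployment mode.
# run_demo is a HYPOTHETICAL wrapper; the mode-specific arguments are the
# ones shown in the yarn-client and yarn-cluster commands.

MCR_ROOT="/usr/local/MATLAB/MATLAB_Runtime/R2025a"
INPUT="hdfs://hadoop01glnxa64:54310/datasets/airlinemod/airlinesmall.csv"

# run_demo MODE   where MODE is "client" or "cluster"
run_demo() {
    local -a mode_args
    case "$1" in
        client)  mode_args=(yarn-client) ;;
        cluster) mode_args=(--deploy-mode cluster --master yarn yarn-cluster) ;;
        *) echo "usage: run_demo client|cluster" >&2; return 2 ;;
    esac
    # Pass the runtime folder, the mode arguments, and the input file in order.
    ./run_flightsByCarrierDemoApp.sh "$MCR_ROOT" "${mode_args[@]}" "$INPUT"
}
```

Keeping the arguments in a bash array preserves each one as a separate word, so the cluster-mode flags reach the script exactly as they appear in the command above.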
