parallel.cluster.Hadoop

Create Hadoop cluster object

Description

hadoopCluster = parallel.cluster.Hadoop creates a parallel.cluster.Hadoop object representing the Hadoop® cluster.

Use the resulting object as input to the mapreduce and mapreducer functions to specify the Hadoop cluster as the parallel execution environment for tall arrays and mapreduce.

hadoopCluster = parallel.cluster.Hadoop(Name,Value) uses the specified names and values to set properties on the created parallel.cluster.Hadoop object.

Examples

This example shows how to create and use a parallel.cluster.Hadoop object to set a Hadoop cluster as the mapreduce parallel execution environment.

hadoopCluster = parallel.cluster.Hadoop('HadoopInstallFolder','/host/hadoop-install');
mr = mapreducer(hadoopCluster);
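As a rough sketch of how the resulting mapreducer might then be passed to mapreduce, the script below counts the rows of a data set stored in HDFS. The HDFS paths, the output folder, and the function names countRowsMapper and countRowsReducer are hypothetical placeholders and are not part of this page; the script assumes a tabular text file at that location.

```matlab
% Sketch only: paths and helper function names are illustrative assumptions.
hadoopCluster = parallel.cluster.Hadoop('HadoopInstallFolder','/host/hadoop-install');
mr = mapreducer(hadoopCluster);

% Hypothetical input location in HDFS
ds = datastore('hdfs:///data/airlinesmall.csv','TreatAsMissing','NA');

% Run the job on the cluster; the output folder is a hypothetical HDFS path
outds = mapreduce(ds,@countRowsMapper,@countRowsReducer,mr, ...
    'OutputFolder','hdfs:///tmp/rowcount-output');
result = readall(outds);

function countRowsMapper(data,~,intermKVStore)
    % Emit the number of rows in this chunk under a single key
    add(intermKVStore,'rowCount',height(data));
end

function countRowsReducer(key,valueIter,outKVStore)
    % Sum the per-chunk counts for the key
    total = 0;
    while hasnext(valueIter)
        total = total + getnext(valueIter);
    end
    add(outKVStore,key,total);
end
```

Passing mr as the fourth argument selects the Hadoop cluster for this call only, rather than relying on the global execution environment.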

This example shows how to create and use a parallel.cluster.Hadoop object to set a Hadoop cluster as the tall array parallel execution environment.

hadoopCluster = parallel.cluster.Hadoop(...
    'HadoopInstallFolder','/host/hadoop-install', ...
    'SparkInstallFolder','/host/spark-install');
mr = mapreducer(hadoopCluster);
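Once the mapreducer is set, tall array operations can be evaluated on the cluster. The following is a minimal sketch under stated assumptions: the HDFS path and the ArrDelay variable name are hypothetical and depend on your data.

```matlab
% Sketch only: the data location and variable name are illustrative assumptions.
hadoopCluster = parallel.cluster.Hadoop(...
    'HadoopInstallFolder','/host/hadoop-install', ...
    'SparkInstallFolder','/host/spark-install');
mr = mapreducer(hadoopCluster);   % sets the global execution environment

% Hypothetical tabular data stored in HDFS
ds = datastore('hdfs:///data/airlinesmall.csv', ...
    'TreatAsMissing','NA', ...
    'SelectedVariableNames',{'ArrDelay'});

tt = tall(ds);                              % tall table backed by the datastore
meanDelay = mean(tt.ArrDelay,'omitnan');    % deferred; nothing is computed yet
meanDelay = gather(meanDelay);              % triggers evaluation on the cluster
```

Tall array calculations are deferred until gather is called, so the cluster is used only in the last line.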

Input Arguments

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'HadoopInstallFolder','/share/hadoop/a1.2.1'
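To illustrate the two syntaxes described above, the following sketch creates the same cluster object both ways. The installation path is a placeholder taken from the examples on this page; the Name=Value form requires R2021a or later.

```matlab
% R2021a and later: Name=Value syntax
hadoopCluster = parallel.cluster.Hadoop(HadoopInstallFolder='/host/hadoop-install');

% All releases: quoted name and value separated by a comma
hadoopCluster = parallel.cluster.Hadoop('HadoopInstallFolder','/host/hadoop-install');
```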

Path to MATLAB for workers, specified as the comma-separated pair consisting of 'ClusterMatlabRoot' and a character vector. This points to the installation of MATLAB Parallel Server™ for the workers, whether local to each machine or on a network share.

Path to Hadoop application configuration file, specified as the comma-separated pair consisting of 'HadoopConfigurationFile' and a character vector.

Path to Hadoop installation on the local machine, specified as the comma-separated pair consisting of 'HadoopInstallFolder' and a character vector. If this property is not set, the default is the value specified by the environment variable HADOOP_PREFIX, or if that is not set, then HADOOP_HOME.
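As a sketch of the environment-variable fallback described above, the snippet below sets HADOOP_HOME for the current MATLAB session and then creates the object without specifying 'HadoopInstallFolder'. The path is a placeholder, and the snippet assumes HADOOP_PREFIX is not set, because that variable would take precedence.

```matlab
% Sketch only: the path is a placeholder, and HADOOP_PREFIX is assumed unset.
setenv('HADOOP_HOME','/host/hadoop-install');

% No 'HadoopInstallFolder' given, so the default comes from the environment
hadoopCluster = parallel.cluster.Hadoop;

% Inspect which folder the object picked up
disp(hadoopCluster.HadoopInstallFolder)
```

Setting the property explicitly in the constructor call avoids any dependence on the environment of the MATLAB session.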

Path to Spark-enabled Hadoop installation on worker machines, specified as the comma-separated pair consisting of 'SparkInstallFolder' and a character vector. If this property is not set, the default is the value specified by the environment variable SPARK_PREFIX, or if that is not set, then SPARK_HOME.

Output Arguments

Hadoop cluster, returned as a parallel.cluster.Hadoop object.

Version History

Introduced in R2014b