mapreducer

Define parallel execution environment for mapreduce and tall arrays

Description

mapreducer defines the execution environment for mapreduce or tall arrays. Use the mapreducer function to change the execution environment to use a different cluster or to switch between serial and parallel development.

The default execution environment uses either the local MATLAB® session, or a parallel pool if you have Parallel Computing Toolbox™. If you have Parallel Computing Toolbox installed, when you use the tall or mapreduce functions, MATLAB automatically starts a parallel pool of workers, unless you have changed the default preferences. By default, a parallel pool uses local workers, typically one worker for each core in your machine. If you turn off the Automatically create a parallel pool option, then you must explicitly start a pool if you want to use parallel resources. See Specify Your Parallel Preferences.

When working with tall arrays, use mapreducer to set the execution environment prior to creating the tall array. Tall arrays are bound to the current global execution environment when they are constructed. If you subsequently change the global execution environment, then the tall array is invalid, and you must recreate it.
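The ordering described above can be sketched as follows. This is a minimal illustration that assumes only base MATLAB and uses the serial environment, so no parallel pool is required; the variable names x, tx, and total are chosen for the example.

```matlab
% Set the execution environment first (serial, local MATLAB session).
mapreducer(0);

% Create the tall array only after the environment is set, so that it is
% bound to that environment.
x = (1:1000)';
tx = tall(x);

% Evaluate a deferred calculation in the environment chosen above.
total = gather(sum(tx));

% If you later call mapreducer to switch environments, rebuild the tall
% array from its source (here, tx = tall(x)) before using it again.
```

Keeping the source data or datastore available makes it cheap to recreate the tall array after an environment change.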

Note

In MATLAB, you do not need to specify configuration settings using mapreducer because mapreduce algorithms and tall array calculations automatically run in the local MATLAB session only. If you also have Parallel Computing Toolbox, then you can use the additional mapreducer configuration options listed on this page for running in parallel. If you have MATLAB Compiler™, then you can use separate mapreducer configuration options for running in deployed environments.

See: mapreducer in the MATLAB documentation, or mapreducer (MATLAB Compiler) in the MATLAB Compiler documentation.

mapreducer with no input arguments creates a new mapreducer execution environment with all the defaults and sets it as the current mapreduce or tall array execution environment. You can use gcmr to get the current mapreducer configuration.

  • If you have default preferences (Automatically create a parallel pool is enabled), and you have not opened a parallel pool, then mapreducer opens a pool using the default cluster profile, sets gcmr to a mapreducer based on this pool and returns this mapreducer.

  • If you have opened a parallel pool, then mapreducer sets gcmr to a mapreducer based on the current pool and returns this mapreducer.

  • If you have disabled Automatically create a parallel pool, and you have not opened a parallel pool, then mapreducer sets gcmr to a mapreducer based on the local MATLAB session, and mapreducer returns this mapreducer.
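As a rough sketch of the cases above, the following calls mapreducer with no inputs and then queries the result with gcmr. Which kind of object you get depends on your installation and preferences; if Parallel Computing Toolbox is installed and automatic pool creation is enabled, the first line can start a parallel pool.

```matlab
% Create the default execution environment and make it the current one.
mr = mapreducer;

% Query the current global execution environment.
current = gcmr;

% Both should describe the same kind of environment: one based on a
% parallel pool or one based on the local MATLAB session.
disp(class(mr))
disp(class(current))
```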

mapreducer(0) specifies that mapreduce or tall array calculations run in the MATLAB client session without using any parallel resources.

mapreducer(poolobj) specifies a parallel pool for parallel execution of mapreduce or tall arrays. poolobj is a parallel.Pool object. The default pool is the current pool that is returned or opened by gcp.

mapreducer(hadoopCluster) specifies a Hadoop® cluster for parallel execution of mapreduce or tall arrays. hadoopCluster is a parallel.cluster.Hadoop object.

mapreducer(mr) sets the global execution environment for mapreduce or tall arrays, using a previously created MapReducer object, mr, if its ObjectVisibility property is 'On'.

mr = mapreducer(___) returns a MapReducer object to specify the execution environment. You can define several MapReducer objects, which enables you to swap execution environments by passing one as an input argument to mapreduce or mapreducer.

mr = mapreducer(___,'ObjectVisibility','Off') creates the MapReducer object, mr, with its visibility turned off, using any of the previous syntaxes. Use this syntax to create new MapReducer objects without affecting the global execution environment of mapreduce.
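A minimal sketch of the hidden-object syntax, assuming base MATLAB and the serial environment only. The names mrHidden, ds, mapFcn, and reduceFcn are hypothetical and chosen for illustration; the mapreduce call is shown as a comment because it needs a datastore and map and reduce functions of your own.

```matlab
% Make a serial environment the visible, global one.
mapreducer(0);

% Create a second serial environment that stays hidden, so the global
% execution environment is not replaced by it.
mrHidden = mapreducer(0,'ObjectVisibility','Off');

% Inspect the property that controls visibility.
disp(mrHidden.ObjectVisibility)

% A hidden object is intended to be passed explicitly to one mapreduce
% call, for example (ds, mapFcn, and reduceFcn are placeholders):
%   outds = mapreduce(ds, @mapFcn, @reduceFcn, mrHidden);
```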

Examples

Develop in Serial and Then Use Local Workers or Cluster

If you want to develop in serial and not use local workers or your specified cluster, enter:

mapreducer(0);

If you use mapreducer to change the execution environment after creating a tall array, then the tall array is invalid and you must recreate it. To use local workers or your specified cluster again, enter:

mapreducer(gcp);

mapreducer with Automatically Create a Parallel Pool Switched Off

If you have turned off the Automatically create a parallel pool option, then you must explicitly start a pool if you want to use parallel resources. See Specify Your Parallel Preferences for details.

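The following code uses mapreducer to set the execution environment to your local MATLAB session and then starts a local parallel pool. As the output shows, the tall expression is still evaluated in the local MATLAB session, because starting a pool does not change the execution environment already set by mapreducer(0):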

>> mapreducer(0)
>> parpool('Processes',1);
Starting parallel pool (parpool) using the 'Processes' profile ... connected to 1 workers.
>> gather(min(tall(rand(1000,1))))
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 0.75 sec
Evaluation completed in 0.96 sec

ans =

   5.2238e-04

Input Arguments

poolobj

Pool for parallel execution, specified as a parallel.Pool object.

Example: poolobj = gcp

hadoopCluster

Hadoop cluster for parallel execution, specified as a parallel.cluster.Hadoop object.

Example: hadoopCluster = parallel.cluster.Hadoop

Output Arguments

mr

Execution environment for mapreduce and tall arrays, returned as a MapReducer object.

If the ObjectVisibility property of mr is set to 'On', then mr defines the default execution environment for all mapreduce algorithms and tall array calculations. If the ObjectVisibility property is 'Off', you can pass mr as an input argument to mapreduce to explicitly specify the execution environment for that particular call.

You can define several MapReducer objects, which enables you to swap execution environments by passing one as an input argument to mapreduce or mapreducer.

Tips

One of the benefits of developing your algorithms with tall arrays is that you only need to write the code once. You can develop your code locally, then use mapreducer to scale up and take advantage of the capabilities offered by Parallel Computing Toolbox, MATLAB Parallel Server™, or MATLAB Compiler, without needing to rewrite your algorithm.

Version History

Introduced in R2014b