AWS MATLAB Parallel Server - What is the best strategy?

Corrado on 8 Dec 2023
Answered: Ayush on 2 Jan 2024
Hi everyone,
I am working with MATLAB Simulink and I want to enable parallel computing on AWS in order to run very complex simulations simultaneously.
The strategy I am adopting is a hybrid approach: a MATLAB client runs on an EC2 c4.2xlarge instance with 1 TB of storage, and from it I use MATLAB Parallel Server on AWS. The cluster has an m5.xlarge head node and a set of 32-vCPU EC2 c6a.8xlarge instances, each with a maximum of 16 workers and 1 TB of storage.
From my experiments, I could see that:
1 simulation/iteration = 1 worker = 1 CPU core = 1 hour and 30 minutes of execution time
Since I need to run 1,000 simulations, I would need about 63 EC2 c6a.8xlarge instances (1000/16 = 62.5, rounded up) to run them all in parallel in about 1 hour and 30 minutes.
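As a rough check of this arithmetic, the sketch below compares a few cluster sizes using only the figures above (1,000 simulations, 16 workers per instance, 1.5 hours per simulation). It assumes every instance is billed for the full duration of each wave of simulations and ignores head-node, storage, licensing, and startup costs. The cluster sizes 8, 16, and 32 are arbitrary illustration values.

```matlab
% Back-of-envelope sizing; the three inputs come from the post above.
nSims          = 1000;   % simulations to run
workersPerNode = 16;     % workers per c6a.8xlarge instance
hoursPerSim    = 1.5;    % measured run time of one simulation on one worker

% Instances needed to run every simulation in a single wave.
nodesOneWave = ceil(nSims / workersPerNode);

% Wall-clock time and billed instance-hours for a few cluster sizes.
nodes     = [8 16 32 nodesOneWave];            % 8, 16, 32 are illustrative
waves     = ceil(nSims ./ (nodes * workersPerNode));
wallHours = waves * hoursPerSim;
nodeHours = nodes .* wallHours;

for k = 1:numel(nodes)
    fprintf('%3d nodes: %d wave(s), %5.1f h wall clock, %5.1f node-hours\n', ...
        nodes(k), waves(k), wallHours(k), nodeHours(k));
end
```

Under these assumptions the billed instance-hours stay close to the 93.75 instance-hours of pure compute (1000 × 1.5 / 16) for every cluster size. A smaller cluster mainly trades wall-clock time for a lower peak instance count, rather than reducing the total compute bill.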
Honestly, I fear that this is too costly and computationally intensive.
Is my analysis correct, or is there a better strategy? I am using MATLAB Parallel Computing and AWS Parallel Server for the first time.
Thank you very much.

Answers (1)

Ayush on 2 Jan 2024
Here are a few strategies for enabling parallel computing on AWS so that many complex MATLAB and Simulink simulations can run simultaneously:
  • Profiling and Optimization: Before scaling up to a large number of AWS instances, make sure your code is optimized. Use MATLAB's Profiler to identify bottlenecks and make the simulation code run as efficiently as possible. See: https://www.mathworks.com/help/matlab/matlab_prog/profiling-for-improving-performance.
  • Batch Processing: Consider using MATLAB's batch processing capabilities, which can queue work for the parallel pool. This lets you queue many simulations to run on a smaller number of workers, which takes longer but could reduce costs. See: https://www.mathworks.com/help/parallel-computing/batch-processing.html
  • Scaling Strategy: Instead of launching all the instances at once, start with fewer instances and add more only if needed, based on the queue length and how long your jobs wait before starting.
  • Checkpointing: If simulations are long-running, implement checkpointing. If an instance fails or is interrupted, you can then resume from the last checkpoint rather than starting over. See: https://www.mathworks.com/help/gads/work-with-checkpoint-files.html
  • Parallel Efficiency: Make sure your simulations are embarrassingly parallel, meaning each one runs independently with little or no inter-process communication, since communication adds overhead and reduces efficiency.
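The batch-processing and checkpointing ideas above could be combined roughly as follows. This is a minimal sketch, not a tested setup: the model name 'myModel', the workspace variable 'gain' and its values, the cluster profile name 'MyAWSCluster', and the chunk and pool sizes are all hypothetical placeholders. It uses parsim with Simulink.SimulationInput objects on a pool smaller than the number of simulations, and saves results per chunk so that a restart skips completed chunks. This is coarse, per-chunk checkpointing. It does not resume a simulation that was interrupted midway.

```matlab
% Sketch only. 'myModel', the workspace variable 'gain', its values, and the
% cluster profile 'MyAWSCluster' are hypothetical placeholders.
model      = 'myModel';
nSims      = 1000;
gainValues = linspace(0.1, 10, nSims);   % one hypothetical parameter value per run
chunkSize  = 256;                        % runs whose results are saved together

cluster = parcluster('MyAWSCluster');    % profile for the AWS cluster
pool    = parpool(cluster, 128);         % far fewer workers than simulations

for first = 1:chunkSize:nSims
    idx     = first:min(first + chunkSize - 1, nSims);
    outFile = sprintf('results_%04d_%04d.mat', idx(1), idx(end));
    if isfile(outFile)
        continue;                        % chunk finished in an earlier session
    end

    in  = buildInputs(model, gainValues(idx));
    % parsim distributes the runs over the open pool; runs beyond the
    % number of workers wait in its queue until a worker is free.
    out = parsim(in, 'ShowProgress', 'on', 'StopOnError', 'off');
    save(outFile, 'out', 'idx');         % coarse checkpoint: one file per chunk
end

delete(pool);                            % release the workers when done

function in = buildInputs(model, values)
    % Create one Simulink.SimulationInput per parameter value.
    for k = numel(values):-1:1
        in(k) = Simulink.SimulationInput(model);
        in(k) = setVariable(in(k), 'gain', values(k));
    end
end
```

The chunking here exists only to limit how much work is lost if the session or an instance is interrupted, because parsim already queues runs when there are more simulations than workers. If the client should not stay connected for the whole run, batchsim may be a better fit than an interactive parsim call.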
Thanks,
Ayush
