MATLAB seg fault on distributed computing cluster

My department has a computing cluster with MATLAB R2009a and the Distributed Computing Server installed. However, no one has taken the time to get remote submission working, which is what I am trying to do.
My PC has MATLAB R2011a installed. I have tried both the R2009a and the R2011a submission scripts, but both produce the same seg fault on the cluster nodes when I run the distributed job validation tool. The seg fault appears in the task log, for example .../Job21/Task1.log:
Executing: /fsys2/projects/cluster/mathworks_r2009a/bin/worker
which: no shopt in (/u/ihincks/bin:/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/bin:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:.)
< M A T L A B (R) >
Copyright 1984-2009 The MathWorks, Inc.
Version 7.8.0.347 (R2009a) 64-bit (glnxa64)
February 12, 2009
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
About to construct the storage object using constructor "makeFileStorageObject" and location "/u/ihincks/remotematlab/ihincks/"
Saved wrapper =
matlabroot: '/usr/local/matlab'
separator: '/'
sentinel: '@'
function_handle: [1x1 struct]
Struct =
function: 'distcomp.simplejob'
type: 'classsimple'
file: ''
class: 'distcomp.filestorage'
------------------------------------------------------------------------
Segmentation violation detected at Tue Jul 19 09:39:25 2011
------------------------------------------------------------------------
Configuration:
MATLAB Version: 7.8.0.347 (R2009a)
MATLAB License: 308754
Operating System: Linux 2.6.16.46-0.12-smp #1 SMP Thu May 17 14:00:09 UTC 2007 x86_64
GNU C Library: 2.4 development
Window System: No active display
Current Visual: None
Processor ID: x86 Family 6 Model 7 Stepping 6, GenuineIntel
Virtual Machine: Java 1.6.0_04-b12 with Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM mixed mode
Default Encoding: UTF-8
Fault Count: 1
Register State:
rax = 00002ac5a1fd4dc0 rbx = 0000000000000000
rcx = 0000000001487390 rdx = 0000000000000021
rbp = 00000000407c43d0 rsi = 0000000000000000
rdi = 0000000002d0e2d0 rsp = 00000000407c43c0
r8 = feff29c3ff646b6f r9 = 00000000407c38f0
r10 = 00000000008bc600 r11 = 00002ac5a0e7c7b0
r12 = 0000000002d0e2d0 r13 = 0000000001487390
r14 = 00002ac5a2219120 r15 = 00002aaac41fb3b8
rip = 00002ac5a1fd530e flg = 0000000000010206
Stack Trace:
[0] libmwm_dispatcher.so:fillFunctionHandle(function_handle_tag*, mdMxarrayFunctionHandle*)(0x2ac5a13078a0, 0x02d0e2d0 ", 0x0118b6d0, 0x2aaac41f8df0) + 30 bytes
[1] libmwm_dispatcher.so:mdFunctionHandleFromStruct(0x2ac5a10b3460, 100, 4, 0x2ac5a342c980) + 101 bytes
[2] libmx.so:_HandleArrayForStream(miStreamRec_tag*, miItem_tag*, miStreamCommandType, int)(0x2ac50000000e, 776, 0, 0) + 1689 bytes
[3] libmx.so:miGetCurrentItem(14, 0, 0, 0) + 341 bytes
[4] libmat.so:matGetValueAtOffset(MATFile_tag*, char*, unsigned long)(0x407c5760 ", 0x2ac5a10b38e4, 128, 0) + 58 bytes
[5] libmat.so:matGetVariable5(MATFile_tag*, char const*)(0x407c57a0, 0x407c5788, 0x01487cf8 "/u/ihincks/remotematlab/ihincks/..", 0x02d23910) + 57 bytes
I have omitted the rest of the trace so as not to clutter the post. Let me know if you want to see more output or hear any more details.
Has anyone come across this? Can anyone suggest something to try? I don't have root permissions on the cluster, and the people who maintain it are more or less useless.
Thanks!

Answers (1)

Edric Ellis on 19 Jul 2011


Parallel Computing Toolbox is designed to work only with the same release of MATLAB Distributed Computing Server, as per the documentation. Could you try installing R2009a on your machine and using that? I suspect it's not enough simply to use the R2009a submission scripts; it looks like the files created by R2011a cannot be loaded correctly on the R2009a workers.
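One quick way to confirm such a mismatch is to compare the release strings on both sides. The following is only a minimal sketch: it assumes the cluster release is the R2009a reported in the worker log in the question, and `clusterRelease` and `releasesMatch` are illustrative names, not toolbox functions.

```matlab
% Release of the MATLAB client that submits the job, e.g. '2011a'.
clientRelease = version('-release');

% Assumption: release of the cluster workers, read off the worker log.
clusterRelease = '2009a';

% Hypothetical helper: true only when the two release strings agree
% (ignoring case and surrounding whitespace).
releasesMatch = @(a, b) strcmpi(strtrim(a), strtrim(b));

if releasesMatch(clientRelease, clusterRelease)
    fprintf('Client R%s matches the cluster.\n', clientRelease);
else
    fprintf('Client is R%s but the cluster is R%s; use the matching release.\n', ...
        clientRelease, clusterRelease);
end
```

Run on the R2011a client described in the question, this would take the second branch.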

5 Comments

Ian on 19 Jul 2011
Thanks for the response. I am downloading R2009a as I type; happily, it was easier to find on the website than I expected. A segmentation fault seems like a pretty extreme response, as opposed to just an error.
The documentation says "Note that the toolbox and server software must be the same version". Do you think this means that if I replaced the /toolbox/distcomp folder on my PC (currently R2011a) with the R2009a version, it would work? Or am I naive in thinking that a toolbox consists only of what's in those folders?
Ian
Jason Ross on 19 Jul 2011
You can't mix and match toolboxes from different releases. The code within the toolbox can be dependent on other parts of MATLAB that changed between releases and won't be compatible.
Ian on 19 Jul 2011
Good to know.
Okay, I have R2009a (the exact same release number as the cluster) installed alongside R2011a. Now MATLAB is seg faulting on my _client machine_, with a similar error, just after I start the validation. It does this in the "Find Resource" phase, before it reaches the submit script, so it is getting less far than R2011a did. Any suggestions? Is it stable to have two different versions installed alongside each other?
Edric Ellis on 20 Jul 2011
It should be fine to have multiple releases of MATLAB installed simultaneously on your machine (we at MathWorks do that all the time). Do you get the seg-fault when you attempt to validate the 'local' configuration?
Ian on 20 Jul 2011
The 'local' validation runs properly. But hurrah! I have the cluster validation working (except matlabpool, since the nodes are not allowed to see the outside world).
The fix is weird. I keep my submit scripts in my home directory, say /home/username/matlabcode/distcomp, instead of toolbox/local. I had been setting DataLocation to /home/username/matlabcode, and it seg faulted every time. If I change it to any other folder that exists (I tried /home/username, /home/username/somefolder, /home/username/matlabcode/distcomp, and the relative path somefolder), it does not seg fault and validates properly.
Thanks for your help
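
For anyone hitting the same crash, the workaround amounts to pointing the scheduler's DataLocation at a different folder that already exists. Below is a minimal sketch, assuming the cluster is reached through a generic scheduler object obtained with the R2009a-era `findResource` interface. The path is the placeholder from the comment above, and nothing here guarantees the seg fault goes away on another setup.

```matlab
% Placeholder path from the comment above; substitute a folder you own.
dataLocation = '/home/username/matlabcode/distcomp';

% The workaround relied on the folder already existing, so check first.
if exist(dataLocation, 'dir') ~= 7
    error('DataLocation folder does not exist: %s', dataLocation);
end

% Assumption: the cluster is configured as a generic scheduler.
sched = findResource('scheduler', 'type', 'generic');
set(sched, 'DataLocation', dataLocation);

% Show the value the scheduler will actually use.
disp(get(sched, 'DataLocation'));
```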

Asked: Ian on 19 Jul 2011
