MDCS Issue: Client can't connect to server. Failing "LockDown Test"

I am resetting my MDCS server and am running into some difficulty with the process. The server was working until recently, so I believe an update has broken the connections to the service, though I can't say what changed. So far I have run the "mdce install" and "mdce start" commands successfully. Normally I would then be able to start job managers, but I can't. In Admin Center I see that the host is recognized, but the MDCE service is reported as unavailable. I ran the connectivity tests and got this error:
LockDown Test ERROR com.mathworks.toolbox.distcomp.admincenter.testing.infra.util.TestFailureException: Test execution on fsd0j5a02:27352 failed. Cause: com.mathworks.toolbox.distcomp.control.ControlExceptionFactory$HostPortException: An error occurred while connecting to the mdce service on the host fsd0j5a02.
Any suggestions on what might cause this error would be appreciated. The error message itself is not particularly revealing to me. The previous test indicated that the port was open, so I don't believe this is an issue with port 27352. This is a 32-bit installation of version 2009a.
Thanks,
Stephen

Answers (1)

Jason Ross on 2 Sep 2011

  1. Are you running any firewalls? Software that might be acting like one? (Malware scanner, virus scanner -- they might also include a firewall)
  2. In Admin Center, there is a suite of connectivity tests (among other things). Do these pass or generate warnings? Look especially for DNS issues such as incorrect name resolution.
  3. Is it possible that something else is taking the port? If you run "netstat -a" at a command prompt, you can see what is on each port.
It definitely sounds like the communication can't be established and that's what is causing the issue.
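
For a quick check outside Admin Center, the name-resolution and port questions above can be sketched in a few lines of Python. This is only an illustration: the function name check_endpoint and the 3-second timeout are made up for this example, and the host and port are the ones from the error message. A successful connect only shows that something is listening on the port. It does not show that the mdce service behind it is healthy.

```python
import socket

def check_endpoint(host, port, timeout=3.0):
    """Resolve host, then attempt a plain TCP connect; return a small report."""
    report = {"host": host, "port": port, "addresses": [],
              "connected": False, "error": None}
    try:
        # Name resolution: this is where a DNS or hosts-file problem shows up.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        report["error"] = f"name resolution failed: {exc}"
        return report
    report["addresses"] = sorted({info[4][0] for info in infos})
    try:
        # TCP connect: a firewall or a missing listener shows up here.
        with socket.create_connection((host, port), timeout=timeout):
            report["connected"] = True
    except OSError as exc:
        report["error"] = f"connect failed: {exc}"
    return report
```

Calling check_endpoint("fsd0j5a02", 27352) from the client machine and again from the server itself would show whether both resolve the name to the same address.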

3 Comments

1) I've shut these off for the time being.
2) It passed the first 7 tests in Admin Center, then failed this one, which is the error I posted above. All tests after the failed test get skipped.
3) When I run netstat -a as you suggest, I get some results that seem interesting, though I don't know how to interpret them.
The port in question, port 27352, has seven line items, all in a state labeled "CLOSE_WAIT". I'm assuming that I need to get these to finish closing? The foreign addresses on these are all ports on the same server. Is there anything I can do to help these connections finish closing? Also of potential relevance is that 27351 and 27353 appear as the foreign address of at least one connection in the "TIME_WAIT" state.
Any sage advice you might be able to provide on this information is appreciated.
Thanks,
Stephen
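
As a rough aid for reading that output, the netstat lines for one port can be tallied by state. The sketch below assumes the four-column Windows layout (Proto, Local Address, Foreign Address, State) produced without the -o switch. The function name summarize_port is made up for this example. As general TCP background, CLOSE_WAIT means the remote end has closed and the local program has not yet closed its own socket, so those entries usually point to a process that is still running, while TIME_WAIT entries expire on their own.

```python
from collections import Counter

def summarize_port(netstat_text, port):
    """Count TCP connection states for a port, split by local/foreign side."""
    suffix = f":{port}"
    counts = Counter()
    for line in netstat_text.splitlines():
        parts = line.split()
        # Assumed layout: Proto, Local Address, Foreign Address, State.
        if len(parts) != 4 or parts[0].upper() != "TCP":
            continue
        _, local, foreign, state = parts
        if local.endswith(suffix):
            counts[("local", state)] += 1
        if foreign.endswith(suffix):
            counts[("foreign", state)] += 1
    return counts
```

Feeding it the saved text of "netstat -a -n" and port 27352 gives a count per state, which makes it easy to see whether the CLOSE_WAIT entries disappear after the services are stopped.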
It sounds like something is still running and needs to be killed off.
If you run "nodestatus" (in matlabroot\toolbox\distcomp\bin), do you show anything? You might want to try tearing everything down and then rebuilding:
1) stopworker (depending on the number of workers, you may have to run this a few times for different workers, but nodestatus will tell you)
2) stopjobmanager (likely only once)
3) mdce uninstall
Check the netstat output. If it's not clean, reboot. Now stand things back up:
1) mdce install
2) mdce start
3) startjobmanager
4) startworker
You might also want to use the "-clean" flag when you are starting up again, which will clear out old information, checkpoints, etc.
Note that I left out the switches to name the workers and job manager. You'll need those (although you likely already know the names).
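
To keep the order straight, the teardown and rebuild sequence above can be written down as a small dry-run plan. This sketch only builds and prints the command lines and does not run anything. The function names are made up for this example, the naming switches are left out just as they are above, and appending "-clean" to the three start commands is an assumption that should be checked against the documentation for the installed release.

```python
def rebuild_plan(clean=False):
    """Return (teardown, rebuild) command lists in the order described above."""
    teardown = [
        ["stopworker"],        # repeat once per worker that nodestatus reports
        ["stopjobmanager"],
        ["mdce", "uninstall"],
    ]
    # Assumption: -clean is applied to the start commands only.
    flag = ["-clean"] if clean else []
    rebuild = [
        ["mdce", "install"],
        ["mdce", "start"] + flag,
        ["startjobmanager"] + flag,
        ["startworker"] + flag,
    ]
    return teardown, rebuild

def format_plan(clean=False):
    """Render the plan as text, with the netstat check between the two halves."""
    teardown, rebuild = rebuild_plan(clean)
    lines = [" ".join(cmd) for cmd in teardown]
    lines.append("# check netstat output here; reboot if it is not clean")
    lines += [" ".join(cmd) for cmd in rebuild]
    return "\n".join(lines)
```

Printing format_plan(clean=True) gives a checklist to work through by hand from matlabroot\toolbox\distcomp\bin, adding the name switches for each worker and the job manager.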

Asked: 2 Sep 2011
