This repository has been archived by the owner on May 24, 2018. It is now read-only.

error when running multi-machine example #211

Open
weihaoxie opened this issue Jul 27, 2015 · 2 comments


@weihaoxie

I ran the multi-machine example and got the error below.
I don't know how to deal with it.
Can anyone help me?

  • ../../dmlc-core/tracker/dmlc_mpi.py -H hosts -n 1 -s 1 ../../bin/cxxnet.ps bowl.conf update_on_server=1 param_server=dist
    2015-07-27 11:04:10,909 INFO start listen on 127.0.1.1:9091
    mpirun: Error: unknown option "-env"
    Type 'mpirun --help' for usage.
    Exception in thread Thread-3:
    Traceback (most recent call last):
    File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
    File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
    File "../../dmlc-core/tracker/dmlc_mpi.py", line 63, in run
    subprocess.check_call(cmd, shell = True, env = env)
    File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
    CalledProcessError: Command 'mpirun -n 2 --hostfile hosts -env DMLC_NUM_SERVER 1 -env DMLC_NUM_WORKER 1 -env DMLC_PS_ROOT_PORT 9092 -env DMLC_PS_ROOT_URI 127.0.1.1 -env DMLC_TRACKER_URI 127.0.1.1 -env DMLC_TRACKER_PORT 9091 ../../bin/cxxnet.ps bowl.conf update_on_server=1 param_server=dist ' returned non-zero exit status 213

F0727 11:04:21.168474 12551 manager.cc:55] Timeout (10 sec) to wait all other nodes initialized. See commmets for more information
*** Check failure stack trace: ***
@ 0x64343a google::LogMessage::Fail()
@ 0x64523c google::LogMessage::SendToLog()
@ 0x643078 google::LogMessage::Flush()
@ 0x645b6e google::LogMessageFatal::~LogMessageFatal()
@ 0x5ab37e ps::Manager::Run()
@ 0x5b1d6e ps::Postoffice::Run()
@ 0x40f475 main
@ 0x7f8327388ec5 (unknown)
@ 0x4115bf (unknown)
Aborted (core dumped)
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/meitu/weihao/cxxnet/dmlc-core/tracker/tracker.py", line 345, in
self.thread = Thread(target = (lambda : subprocess.check_call(self.cmd, env=env, shell=True)), args = ())
File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '../../bin/cxxnet.ps bowl.conf update_on_server=1 param_server=dist ' returned non-zero exit status 134

@TangXing

me too!!!

@zyzhong

zyzhong commented Oct 21, 2017

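This happens because you are using Open MPI, whose mpirun rejects the -env option (hence the "unknown option" error above). Edit the file ../../dmlc-core/tracker/dmlc_mpi.py and change the line that passes environment variables to mpirun.

For MPICH2 (what the script currently does):

cmd += ' -env %s %s' % (k, v)

For Open MPI:

cmd += ' -x %s' % k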
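
If you want the script to handle both MPI implementations, the two one-liners above could be folded into a small helper. This is only a sketch, not the actual code in dmlc_mpi.py: the function names `detect_mpi_flavor` and `mpi_env_flags` are made up for illustration, and the detection assumes that `mpirun --version` prints the string "Open MPI" under Open MPI. Note that `-x NAME` forwards the variable from the environment mpirun itself runs in, so it relies on the tracker launching mpirun with `env=env`, as the traceback shows it does.

```python
def detect_mpi_flavor(version_text):
    """Guess the MPI flavor from the output of `mpirun --version`.

    Assumption: Open MPI includes the string "Open MPI" in that output;
    anything else is treated as MPICH-style here.
    """
    return 'openmpi' if 'Open MPI' in version_text else 'mpich'


def mpi_env_flags(envs, flavor):
    """Build the mpirun flags that pass the variables in `envs`."""
    flags = ''
    for k, v in sorted(envs.items()):
        if flavor == 'openmpi':
            # Open MPI: export an existing variable by name
            flags += ' -x %s' % k
        else:
            # MPICH2: set the variable explicitly on the command line
            flags += ' -env %s %s' % (k, v)
    return flags


if __name__ == '__main__':
    envs = {'DMLC_NUM_SERVER': 1, 'DMLC_NUM_WORKER': 1}
    print('mpirun -n 2' + mpi_env_flags(envs, 'openmpi'))
```

The version text could be obtained with something like `subprocess.check_output(['mpirun', '--version'], stderr=subprocess.STDOUT)`. If the variables are not already in mpirun's environment, Open MPI also accepts the `-x NAME=value` form.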
