Python API (advanced)¶
In some rare cases experts may want to create Scheduler
and Worker
objects explicitly in Python manually. This is often necessary when making
tools to automatically deploy Dask in custom settings.
However, often it is sufficient to rely on the Dask command line interface.
Scheduler¶
Start the Scheduler, provide the listening port (defaults to 8786) and Tornado
IOLoop (defaults to IOLoop.current()
)
from distributed import Scheduler
from tornado.ioloop import IOLoop
from threading import Thread
s = Scheduler()
s.start('tcp://:8786') # Listen on TCP port 8786
loop = IOLoop.current()
loop.start()
Alternatively, you may want the IOLoop and scheduler to run in a separate
thread. In that case you would replace the loop.start()
call with the
following:
t = Thread(target=loop.start, daemon=True)
t.start()
Worker¶
On other nodes start worker processes that point to the URL of the scheduler.
from distributed import Worker
from tornado.ioloop import IOLoop
from threading import Thread
w = Worker('tcp://127.0.0.1:8786')
w.start() # choose randomly assigned port
loop = IOLoop.current()
loop.start()
Alternatively, replace Worker
with Nanny
if you want your workers to be
managed in a separate process by a local nanny process. This allows workers to
restart themselves in case of failure, provides some additional monitoring, and
is useful when coordinating many workers that should live in different
processes to avoid the GIL.