Scheduler Plugins
class distributed.diagnostics.plugin.SchedulerPlugin

Interface to extend the Scheduler
The scheduler operates by triggering and responding to events like task_finished, update_graph, task_erred, etc. A plugin enables custom code to run at each of those same events. The scheduler will run the analogous methods on this class when each event is triggered. This runs user code within the scheduler thread, which can perform arbitrary operations in synchrony with the scheduler itself.
Plugins are often used for diagnostics and measurement, but have full access to the scheduler and could in principle affect core scheduling.
To implement a plugin, implement some of the methods of this class and add the plugin to the scheduler with Scheduler.add_plugin(myplugin).

Examples
>>> class Counter(SchedulerPlugin):
...     def __init__(self):
...         self.counter = 0
...
...     def transition(self, key, start, finish, *args, **kwargs):
...         if start == 'processing' and finish == 'memory':
...             self.counter += 1
...
...     def restart(self, scheduler):
...         self.counter = 0
>>> c = Counter()
>>> scheduler.add_plugin(c)  # doctest: +SKIP
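Once registered, the counter increments each time a task result lands in memory. A hypothetical usage sketch (the client, the inc function, and the resulting count are illustrative assumptions, not part of the original example):

>>> futures = client.map(inc, range(10))  # doctest: +SKIP
>>> client.gather(futures)  # doctest: +SKIP
>>> c.counter  # doctest: +SKIP
10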
transition(key, start, finish, *args, **kwargs)

Run whenever a task changes state
Parameters:

key: string

start: string
    Start state of the transition. One of released, waiting, processing, memory, error.

finish: string
    Final state of the transition.

*args, **kwargs: More options passed when transitioning
    This may include worker ID, compute time, etc.
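Because the extra arguments vary by transition type, plugins usually read kwargs defensively. A minimal sketch; the 'worker' key is an assumption based on the note above, not a guaranteed part of the signature:

from distributed.diagnostics.plugin import SchedulerPlugin

class WorkerLogger(SchedulerPlugin):
    def transition(self, key, start, finish, *args, **kwargs):
        if start == 'processing' and finish == 'memory':
            # Extra details arrive via kwargs; use .get() since the
            # exact contents depend on the transition type
            worker = kwargs.get('worker')
            print('%s completed on %s' % (key, worker))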
RabbitMQ Example
RabbitMQ is a distributed messaging queue that we can use to post updates about task transitions. By posting transitions to RabbitMQ, we let other machines process the transitions and keep work on the scheduler itself to a minimum. See the RabbitMQ tutorial for more information on RabbitMQ and how to consume the messages.
import json

import click
import pika

from distributed.diagnostics.plugin import SchedulerPlugin


class RabbitMQPlugin(SchedulerPlugin):
    def __init__(self):
        # Update host to be your RabbitMQ host
        self.connection = pika.BlockingConnection(
            pika.ConnectionParameters(host='localhost'))
        self.channel = self.connection.channel()
        self.channel.queue_declare(queue='dask_task_status', durable=True)

    def transition(self, key, start, finish, *args, **kwargs):
        # Publish each task transition to the dask_task_status queue
        message = dict(
            key=key,
            start=start,
            finish=finish,
        )
        self.channel.basic_publish(
            exchange='',
            routing_key='dask_task_status',
            body=json.dumps(message),
            properties=pika.BasicProperties(
                delivery_mode=2,  # make message persistent
            ))


@click.command()
def dask_setup(scheduler):
    plugin = RabbitMQPlugin()
    scheduler.add_plugin(plugin)
Run with: dask-scheduler --preload <filename.py>
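On the consuming side, a separate process can pull these messages off the queue. A minimal consumer sketch, assuming pika >= 1.0 and RabbitMQ running on localhost (following the pattern in the RabbitMQ tutorial):

import json

import pika

connection = pika.BlockingConnection(
    pika.ConnectionParameters(host='localhost'))
channel = connection.channel()
channel.queue_declare(queue='dask_task_status', durable=True)

def handle_message(ch, method, properties, body):
    # Each message is the JSON payload published by the plugin above
    message = json.loads(body)
    print(message['key'], message['start'], '->', message['finish'])
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue='dask_task_status',
                      on_message_callback=handle_message)
channel.start_consuming()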
Accessing Full Task State
If you would like to access the full distributed.scheduler.TaskState stored in the scheduler, you can do this by passing and storing a reference to the scheduler, like so:
import click

from distributed.diagnostics.plugin import SchedulerPlugin


class MyPlugin(SchedulerPlugin):
    def __init__(self, scheduler):
        self.scheduler = scheduler

    def transition(self, key, start, finish, *args, **kwargs):
        # Get full TaskState
        ts = self.scheduler.tasks[key]


@click.command()
def dask_setup(scheduler):
    plugin = MyPlugin(scheduler)
    scheduler.add_plugin(plugin)
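With the TaskState in hand, the transition body can inspect scheduling details beyond what the transition arguments carry. A sketch of what that body might do; the attributes used (dependencies, nbytes) come from distributed.scheduler.TaskState, but treat the snippet as illustrative:

    def transition(self, key, start, finish, *args, **kwargs):
        ts = self.scheduler.tasks[key]
        if finish == 'memory':
            # dependencies is a set of TaskState objects; nbytes is
            # the size of the stored result in bytes
            dep_keys = {dep.key for dep in ts.dependencies}
            print(key, 'finished; size:', ts.nbytes, 'deps:', dep_keys)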