document updated 17 years ago, on Dec 25, 2007
Sometimes I have a task that needs to run for a long time (hours or days). I want it to tolerate reboots and process restarts, and I may even want it to run across a distributed network.
The usual way to implement this seems to be an asynchronous, persistent job queue.
- This concept is broadly known as message-oriented middleware.
- It uses the "command pattern", where each message contains all the information needed to start a job.
- Messages may also specify constraints that have to be met before the job can be processed. For instance, a message might say that the job shouldn't be processed until 3:00pm, or that it needs exclusive use of a specific resource (so it can't run alongside another job that's using that resource, or while the resource is unavailable).
- In all my incarnations, the queue is stored on disk, so that if the process dies before a job finishes running, or if the computer reboots, we can painlessly restart where we left off. Also, each job takes at most a few minutes to complete, and any further work that needs to be done is carried on in new job-messages. Thus, after a restart we have to redo a small amount of work, but never more than the time it takes to complete one job-message.
- Job handlers should be as functional (side-effect-free) as possible. That is, the main output of processing a job should be the creation of more job-messages.
- If the only goal of the larger task is to generate a report, you can stick to this rule by having a job called "Have the human review it", with the constraint that it can't be finished until a human stops by to review things. It just sits in the queue until then, so when the larger task is done, the queue is empty except for these messages.
- If the job handlers are side-effect-free, then it becomes trivial to run multiple jobs in parallel. It also becomes possible to run jobs across a distributed network.
- Most programming languages have enough features to implement the above and get almost any task done, albeit sometimes in a hard-coded and awkward fashion. For languages that go further and support serializable continuations, job handlers can be written to look nearly like any other code: any function can fire off a bunch of messages, suspend itself in a continuation until all of the messages have been processed, get a response from each of them, and compute a single value from the large number of responses. It's that last part, collecting lots of results down into one, that's awkward to do without continuations.
- It's somewhat related to batch queues/batch processing, and to distributed high-throughput computing.
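As a rough illustration of the points above, here is a minimal sketch in Python. It assumes a SQLite file as the on-disk queue, which is only one of many ways to do it. The names `JobQueue`, `run_one`, and `count_down` are made up for this example, and constraints and parallelism are left out.

```python
import json
import sqlite3


class JobQueue:
    """A tiny persistent job queue backed by a SQLite file (illustrative only)."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "name TEXT NOT NULL, args TEXT NOT NULL)"
        )
        self.db.commit()

    def put(self, name, **args):
        # Command pattern: the message carries everything needed to start the job.
        with self.db:
            self.db.execute(
                "INSERT INTO jobs (name, args) VALUES (?, ?)",
                (name, json.dumps(args)),
            )

    def pending(self):
        return self.db.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]

    def run_one(self, handlers):
        """Run the oldest job. Returns False when the queue is empty."""
        row = self.db.execute(
            "SELECT id, name, args FROM jobs ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return False
        job_id, name, args = row
        # The handler is side-effect-free: its only output is new messages,
        # given as a list of (job name, argument dict) pairs.
        follow_ups = handlers[name](**json.loads(args))
        # Delete the finished job and enqueue its follow-ups in one transaction.
        # A crash before this commit leaves the job in the queue to be redone.
        with self.db:
            self.db.execute("DELETE FROM jobs WHERE id = ?", (job_id,))
            self.db.executemany(
                "INSERT INTO jobs (name, args) VALUES (?, ?)",
                [(n, json.dumps(a)) for n, a in follow_ups],
            )
        return True


def count_down(n):
    # Each job does one small step and hands the rest of the work to a new message.
    return [("count_down", {"n": n - 1})] if n > 0 else []
```

A driver would call `q.put("count_down", n=3)` once and then loop on `q.run_one({"count_down": count_down})` until it returns False. Killing and restarting the driver at any point loses at most the job that was in flight, because that job's row is still in the file.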
Similar to
- mainframe batch queues
- RTOS scheduler
Modules
Implementations
Terminology
- output queue — a file or a printer, where the final results of a job are sent
- dispatcher — the bit of code that watches the job queue, and decides when/which jobs will be run
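The dispatcher's decision can be sketched as a single function. This is a hypothetical illustration, not taken from any particular implementation: `Job`, `not_before`, `needs`, and `pick_runnable` are invented names, and time is simply seconds since midnight. It covers the two kinds of constraint mentioned earlier, a start time and exclusive use of a resource.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    not_before: float = 0.0         # earliest time the job may start
    needs: frozenset = frozenset()  # resources the job must hold exclusively


def pick_runnable(queue, now, busy):
    """Return the first queued job whose constraints are met, or None.

    queue: list of Job, in arrival order
    now:   current time, in the same units as Job.not_before
    busy:  set of resource names that are in use or unavailable
    """
    for job in queue:
        if job.not_before <= now and not (job.needs & busy):
            return job
    return None


queue = [
    Job("nightly-report", not_before=15 * 3600),  # not until 3:00pm
    Job("reindex", needs=frozenset({"disk-a"})),  # needs disk-a to itself
    Job("send-mail"),
]
```

At 9:00am with `disk-a` busy, `pick_runnable(queue, 9 * 3600, {"disk-a"})` skips the first two jobs and returns `send-mail`. A real dispatcher would also add the chosen job's resources to `busy` while it runs and release them afterwards.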