Document updated on May 3, 2006

SharedMessenger

SharedMessenger is a message-oriented framework for performing long-running tasks (over hours or days) in a way that allows the program to be shut down (even abnormally) and then resumed. It is suitable for tasks that can be segmented into smaller sub-tasks that don't need ongoing communication with each other (somewhat similar to the way that parallel tasks can be split up).

The framework also manages the use of external tools, to keep from overusing them. A long-running task may need to query a search engine or website, or access a database, a very large number of times, and the program should use these shared resources in a non-disruptive way.

Definitions

Message — A block of data that is the input or output of a sub-task. It's the main type of data that's persisted when the program isn't running.

Worker — A piece of code that performs a single type of sub-task. When it runs, it consumes a single message, does some work on it, and typically outputs one or more messages (with a message-type indicating they should be sent to other workers). A worker may need data from an external resource to do its job.
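
As a rough illustration of the two definitions above, here is a minimal Python sketch. The names `Message`, `msg_type`, `payload`, `WORKERS` and `split_lines_worker` are all made up for this example; the framework's actual interface isn't specified here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Message:
    """Input or output of a sub-task; the unit of persisted state."""
    msg_type: str                      # selects which worker handles the message
    payload: Dict[str, Any] = field(default_factory=dict)


# A worker consumes one message and returns zero or more new messages.
Worker = Callable[[Message], List[Message]]


def split_lines_worker(msg: Message) -> List[Message]:
    """Hypothetical worker: turns a document into one 'line' message per line."""
    return [
        Message("line", {"text": line})
        for line in msg.payload["text"].splitlines()
        if line.strip()                # skip blank lines
    ]


# One worker per message type (an assumption of this sketch).
WORKERS: Dict[str, Worker] = {"document": split_lines_worker}

out = WORKERS["document"](Message("document", {"text": "alpha\n\nbeta\n"}))
```

Here `out` holds two messages of type "line", which would then be routed to whatever worker is registered for that type.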

Resource — An external tool that performs useful work for us. Often it runs slowly, and it is something that we don't want to overuse or that we only want to use at night (typically because it's shared with other users). For example:

in the case of a website, the owners may get upset if we access their site a million times in a few hours (but may not mind if we do so over a week or a month)
in the case of Google, they explicitly won't let us access the API more than 10,000 times a day
in the case of a shared database, it may be better to run large queries at night; when running during the day, it's generally good to take breaks between queries rather than run continuously for a long period, to lessen our overall load
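
One way such usage limits could be expressed is sketched below in Python. The `Resource` class, its `max_per_day` and `min_gap` parameters, and the rolling 24-hour window are assumptions made for illustration (a night-only window is left out to keep it short). The point is that the resource answers "now" or "not until time T".

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class Resource:
    """Hypothetical usage policy for one external resource."""
    name: str
    max_per_day: int                       # e.g. an API quota
    min_gap: timedelta = timedelta(0)      # pause between consecutive uses
    _uses: List[datetime] = field(default_factory=list, repr=False)

    def next_allowed(self, now: datetime) -> datetime:
        """Return `now` if the resource may be used, else the earliest later time."""
        day_ago = now - timedelta(days=1)
        self._uses = [t for t in self._uses if t > day_ago]  # rolling 24h window
        earliest = now
        if self._uses:
            earliest = max(earliest, self._uses[-1] + self.min_gap)
        if len(self._uses) >= self.max_per_day:
            # the quota frees up once an old enough use leaves the window
            earliest = max(earliest, self._uses[-self.max_per_day] + timedelta(days=1))
        return earliest

    def record_use(self, now: datetime) -> None:
        """Remember that the resource was used at time `now`."""
        self._uses.append(now)


api = Resource("search-api", max_per_day=2, min_gap=timedelta(seconds=10))
t0 = datetime(2006, 5, 3, 12, 0, 0)
api.record_use(t0)
wait_for_gap = api.next_allowed(t0)                            # 10 seconds after t0
api.record_use(t0 + timedelta(seconds=10))
wait_for_quota = api.next_allowed(t0 + timedelta(seconds=20))  # one day after t0
```

Passing `now` in explicitly, rather than reading the clock inside the class, keeps the policy easy to test. In a real run the usage history would also have to be persisted, since the program may be restarted within the same day.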

Message filter — A piece of code (specific to a message type) that quickly decides whether to keep a message or not (e.g. in seconds or minutes, not hours). Typically it's used to remove duplicate messages, so that work isn't performed twice. It runs as soon as a new message is created, so that the queue doesn't have to store excess data. Unlike workers, multiple filters can be attached to a given message type (and all filters must say "keep" for the message to be kept).
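
The "all filters must say keep" rule can be sketched in a few lines of Python. The tuple-based message, the `FILTERS` table and both example filters are hypothetical, and the in-memory `seen` set is a simplification: a real duplicate filter would need to persist what it has seen.

```python
from typing import Callable, Dict, List, Set, Tuple

# A message is modelled here as a (msg_type, payload) tuple for brevity.
Message = Tuple[str, str]
Filter = Callable[[Message], bool]        # True means "keep"


def make_dedup_filter() -> Filter:
    """Return a filter that keeps only the first occurrence of each message."""
    seen: Set[Message] = set()

    def keep(msg: Message) -> bool:
        if msg in seen:
            return False
        seen.add(msg)
        return True

    return keep


def nonempty_filter(msg: Message) -> bool:
    """Discard messages with an empty payload."""
    return bool(msg[1])


# Several filters may be attached to one message type.
FILTERS: Dict[str, List[Filter]] = {"url": [nonempty_filter, make_dedup_filter()]}


def should_keep(msg: Message) -> bool:
    """A message is kept only if every filter for its type says 'keep'."""
    return all(f(msg) for f in FILTERS.get(msg[0], []))


incoming = [("url", "a"), ("url", ""), ("url", "a"), ("url", "b")]
kept = [m for m in incoming if should_keep(m)]
```

Only the first "a" and the "b" survive: the empty payload is rejected by the first filter, and the repeated "a" by the second.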

Workgroup — Something that identifies several sub-tasks as being part of a larger task (e.g. identifies a message as originating from a single common parent message). Workgroups are purely optional, because often messages don't care if they came from a common parent. A message can also be identified as being part of multiple (nested) workgroups.

Once all messages in a workgroup are processed, a workgroup handler can perform some final processing on the output data (for example, to summarize the data, or to indicate to the user that a larger task has been completed). This handler is analogous to a destructor.

The only two things that contain persistent data are messages and workgroups. So that global data storage is always available, a default (root) workgroup is created that is the parent of all messages.

Workgroups can be used by filters to remove duplicate sub-tasks within a single larger task, while still allowing a sub-task to be worked on again if the parent task is intentionally repeated later. (Yes, workgroups are intended to be an analogue of objects.)
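
A workgroup with a completion handler might look roughly like the Python sketch below. The `Workgroup` class, the pending-message counter and the `on_complete` callback are assumptions for illustration, not a description of the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Workgroup:
    """Hypothetical workgroup: tracks outstanding messages and shared data."""
    name: str
    parent: Optional["Workgroup"] = None
    on_complete: Optional[Callable[["Workgroup"], None]] = None
    pending: int = 0                        # messages not yet processed
    results: List[str] = field(default_factory=list)

    def message_added(self) -> None:
        """Count a new message against this group and all enclosing groups."""
        group: Optional[Workgroup] = self
        while group is not None:
            group.pending += 1
            group = group.parent

    def message_done(self, result: str) -> None:
        """Record a result; run the handler of every group that becomes empty."""
        self.results.append(result)
        group: Optional[Workgroup] = self
        while group is not None:
            group.pending -= 1
            if group.pending == 0 and group.on_complete is not None:
                group.on_complete(group)    # the "destructor" step
            group = group.parent


finished: List[str] = []
root = Workgroup("root")
crawl = Workgroup("crawl", parent=root, on_complete=lambda g: finished.append(g.name))
crawl.message_added()
crawl.message_added()
crawl.message_done("page 1")                # one message still pending, no handler yet
crawl.message_done("page 2")                # group is now empty, handler runs
```

With a counter like this, a worker's output messages have to be registered with `message_added` before the worker's own message is marked done; otherwise the count could reach zero while sub-tasks are still being created.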

End-user interaction

The user starts the machine up, and gives it an instruction to start working on (often a single instruction, but it could be multiple instructions).
The machine creates many sub-tasks from the starting instruction, and starts working away at all its tasks (creating sub-tasks of sub-tasks, etc).
The user may stop the machine at any time, since the queue is always remembered. When the machine is started again, it simply resumes working on the queue, and therefore doesn't lose much work when terminated. (If need be, a cron job can start the machine up periodically, in cases where the machine may want to wait a long time between doing work.)
Eventually, the machine finishes the task, and often it either prints out a final report or at least says "okay, that task is done" (or maybe it just says "there are no tasks left to do").
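
The "queue is always remembered" behaviour could be obtained in many ways; the Python sketch below uses SQLite purely as an example. The `PersistentQueue` class, its table layout and its method names are hypothetical.

```python
import json
import os
import sqlite3
import tempfile
from typing import Any, Dict, Optional, Tuple


class PersistentQueue:
    """Hypothetical FIFO queue stored in SQLite so it survives restarts."""

    def __init__(self, path: str) -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS queue ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, msg_type TEXT, payload TEXT)"
        )
        self.db.commit()

    def push(self, msg_type: str, payload: Dict[str, Any]) -> None:
        """Append a message to the bottom of the queue."""
        self.db.execute(
            "INSERT INTO queue (msg_type, payload) VALUES (?, ?)",
            (msg_type, json.dumps(payload)),
        )
        self.db.commit()

    def peek(self) -> Optional[Tuple[int, str, Dict[str, Any]]]:
        """Return the message at the top of the queue without removing it."""
        row = self.db.execute(
            "SELECT id, msg_type, payload FROM queue ORDER BY id LIMIT 1"
        ).fetchone()
        return None if row is None else (row[0], row[1], json.loads(row[2]))

    def done(self, msg_id: int) -> None:
        """Remove a message once its worker has finished."""
        self.db.execute("DELETE FROM queue WHERE id = ?", (msg_id,))
        self.db.commit()


path = os.path.join(tempfile.mkdtemp(), "queue.db")
q = PersistentQueue(path)
q.push("start", {"url": "http://example.com/"})
q.db.close()                       # simulate the program being stopped

q = PersistentQueue(path)          # a later run picks up where it left off
first = q.peek()
```

In this sketch a message is deleted only after its worker finishes, so an abnormal shutdown in the middle of a sub-task means that sub-task is repeated rather than lost. That is a design choice of the example, in keeping with "doesn't lose much work when terminated".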

Details

The main message loop:

gets a message off the top of the queue
figures out what resources might be needed, based on its message type, and a message-type ⇒ resource map
asks the resource manager whether it's okay to use those resources now (or, if not now, when)
if the message can't be processed now, we put it in a separate queue that's ordered by the time at which each message should be able to run
once a message is found that can be run now, we start up the worker for that message type
if the worker creates new messages as output, we call the filters for their message types, and if a message isn't to be discarded, we add it to the bottom of the queue
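
The steps above can be sketched as a single Python function. Everything here is a simplification under stated assumptions: messages are plain tuples, the resource manager is reduced to a `next_allowed` callback, the filters to a `keep` callback, and all parameter names are invented. The sketch makes one pass at a fixed time `now` and does not move deferred messages back into the main queue later.

```python
import heapq
from collections import deque
from itertools import count
from typing import Callable, Deque, Dict, List, Tuple

Message = Tuple[str, str]                          # (msg_type, payload)


def run(
    queue: Deque[Message],
    workers: Dict[str, Callable[[Message], List[Message]]],
    resources_for: Dict[str, List[str]],           # message type -> resource names
    next_allowed: Callable[[str, float], float],   # (resource, now) -> earliest time
    keep: Callable[[Message], bool],               # combined filter decision
    now: float,
) -> List[Tuple[float, int, Message]]:
    """Process everything runnable at time `now`; return the deferred heap."""
    deferred: List[Tuple[float, int, Message]] = []
    tie = count()                                  # breaks ties between equal times
    while queue:
        msg = queue.popleft()                      # top of the queue
        needed = resources_for.get(msg[0], [])
        ready_at = max([next_allowed(r, now) for r in needed], default=now)
        if ready_at > now:
            # not runnable yet: park it, ordered by the time it can run
            heapq.heappush(deferred, (ready_at, next(tie), msg))
            continue
        for out in workers[msg[0]](msg):
            if keep(out):
                queue.append(out)                  # bottom of the queue
    return deferred


log: List[str] = []


def fetch(msg: Message) -> List[Message]:
    log.append("fetch " + msg[1])
    return [("parse", msg[1])]


def parse(msg: Message) -> List[Message]:
    log.append("parse " + msg[1])
    return []


deferred = run(
    queue=deque([("fetch", "a"), ("query", "b")]),
    workers={"fetch": fetch, "parse": parse, "query": parse},
    resources_for={"query": ["database"]},
    next_allowed=lambda resource, now: now + 3600.0,  # database busy for an hour
    keep=lambda msg: True,
    now=0.0,
)
```

In this run the "fetch" and "parse" messages are processed immediately, while the "query" message ends up in the deferred heap with a ready time one hour out.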

Parallelizable. Because work is broken out into separate sub-tasks, and the sub-tasks are performed independently of each other (with no ongoing communication between them), the framework can be run on a networked cluster, on a multiprocessor computer, or in a multithreaded environment (which may be useful if we spend large amounts of time waiting for external resources to do work for us).
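
For the multithreaded case, a minimal Python sketch using the standard library's thread pool is shown below. The `slow_worker` function and the short sleep that stands in for an external call are invented for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def slow_worker(payload: str) -> str:
    """Hypothetical worker that mostly waits on an external resource."""
    time.sleep(0.05)                  # stands in for a network or database call
    return payload.upper()


payloads: List[str] = ["a", "b", "c", "d"]

# Messages are independent, so their workers can run concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(slow_worker, payloads))   # map preserves input order
```

Threads help here because the workers spend their time waiting rather than computing. The queue and the resource bookkeeping would still need to be shared safely between threads, which this sketch does not cover.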