This disclosure is to establish prior art on the methods described here. There may be patents that cover these methods, and the reader should not take the existence of this document as an indication that the methods can be freely used. The reader should make their own enquiries.

Suppose you have set of processors, p0 thru pn-1, and each piece of work to be performed by a processor has some number k associated with it. The problem is to allocate the work roughly equally across the subset of processors that are actually functioning. Further, over a period of time, a series of related pieces of work may arrive with the same k. To the maximum possible extent you want each of the related pieces of work to be handled by the same processor. If a processor fails, you want its work to be distributed across the remaining processors, but still maintaining the property that pieces of work with a given value for k are handled by the same processor. In general we assume that the k values are randomly spread through a large number space.

The motivation for these requirements is that for a given k the processor may be caching information that improves performance. Or it may be enforcing some invariant, such as in a lock manager where each request for a given lock must go to the same processor, or it clearly won't function.

To achieve this, construct a list of integers of size n. Element i contains i if processor i is functional, and -1 otherwise.

Calculate k mod n, and use the result as an index into the list. If the value contained there is non-negative, then it is the number of the processor to use. If it is -1, remove the element from the list, decrement the value of n and repeat. Continue until a processor number is found.

This scheme is fault tolerant to a degree, in that the resulting system has a high level of availability.

It also has the property that the failure of a processor only impacts on the allocation of pieces of work that would have been allocated to the failed processor. It does not result in a complete rearrangement of the work allocations. This makes things a lot simpler when dealing with things like distributed lock managers.

The fault tolerance can be improved by an extension of the algorithm that allows a distributed master/slave arrangement, where the master number for a given k is determined as above, and a slave number is obtained by treating the master as if it were not functioning. Each processor is a master for some subset of the k values, and is a slave for another subset. For any given master, each of the other processors is a slave for a roughly equal portion of the given master's subset of the k values.

There are some boring details that I've not discussed, such as how an entity wanting work to be done determines which processors are functioning, and the stuff related to the exact sequence of steps that must be performed when a processor breaks, or is repaired. I don't believe anyone could patent them because once you start thinking about it, the steps are pretty obvious.