A New Algorithm
Jonathan Shapiro (shap@cobra.cis.upenn.edu)
Wed, 31 May 1995 15:22:36 -0400

After several days of working it out, and with some help from Norm Hardy and Jonathan Smith, I appear to have a working algorithm to implement Scalable Distributed Recoverable Execution. The algorithm provides distributed consistent checkpoints with bounded rollback. It does this by defining and maintaining consistency guarantees across nodes while allowing the nodes to checkpoint independently.

At this point, I allow cross-machine messages but not cross-machine object sharing; I believe the latter can be added, but it's more complex than I want to deal with right now.

I have several starts at a writeup. I'm still groping my way toward a straightforward explanation, but what I have so far appears to be holding up. I'll send it out for comments when it is ready.

Jonathan Smith suggested the proposed name for the algorithm:

'SNO-CRASH: A Scalable Approach to Distributed Recoverable Execution

The leading ' is silent. The original name came from three notions:

	'SNO-CRASH: as in: "There's no crash!"
	                         ^^ ^^ ^^^^^
	SNO: By reference to Dave Farber, one of the originators of
		SNOBOL.

	The obvious sci-fi reference, since a distributed net might
	conceivably be built on this class of architecture.

For the twisted few, there is also an acronym:

Save New Objects to Combat RAndom System Hazards

The 'SNO-CRASH kernel will probably be called DIMSUM, and will be an extension of EROS to distribution.

Historical note:

Puns are in keeping with the tradition of my prior system names. Some pun-oriented names from my dubious past:

pSCAM: Pseudo-Cognitive Access Mechanism, a quasi-English
	front end for relational databases.

GHOTI: Internal name for AT&T's UNIX debugger. The obvious pun on
	"fish," since it seemed the group was always fishing for
	survival, but we were also told that the Italian homonym is a
	slang term for "screw."  Ghoti was originally named screw:

		If your program doesn't work, screw it.

EROS: Extremely Reliable Operating System
      A labor of love.

DIMSUM: DIstributed Meta SystUm Microkernel
	A reference to composable computing: choose one from column A
	and two from column B.