5 Uses of Determinism

Mark S. Miller markm@caplet.com
Thu, 06 May 1999 03:57:42 -0700


The Turing and Von Neumann models of computation are strictly deterministic 
(as is that other Von Neumann model, cellular automata).  In all of these 
systems, given the initial state of a computation and the specification of 
the machine it ran on, one can always deterministically replay the 
computation.

Of course, this is true only because these models idealize-away many of the 
messy issues that plague real computation.  Actors are an alternative 
foundational model of computation that idealizes-in many of the issues that 
these systems idealize-away.

The first such issue, explored by many besides Actors, is the contrast of 
"transformational" vs "reactive" computation.  A compiler is 
transformational: all its inputs exist before it begins, it runs 
deterministically producing output, and terminates.  A compiler that doesn't 
terminate is buggy.  

A calculator is reactive: new inputs come in after it's started, these 
inputs affect both its state and its outputs.  Its outputs may affect what 
further inputs it gets, so these outputs need to be transmitted before 
the calculator terminates.  We do not require the calculator to terminate in 
order to be correct.  Indeed, for a calculator, the opposite is true.  Even 
though you can replay a calculator from a log of its inputs, you cannot 
model a calculator as something that computes from pre-existing inputs, 
since the inputs depend (via the external human) on previous outputs.

Actors are inherently distributed and asynchronous, and so are pervasively 
non-deterministic.  Actors successfully idealize all non-determinism as 
"arrival order non-determinism":  Messages race towards an actor from 
multiple asynchronous sources.  As Einstein would insist, these messages 
had no predetermined intrinsic total order.  Nevertheless, they are 
delivered to the recipient actor in some *partially arbitrary* fully 
serialized order. In order to replay an actor, you not only need the 
messages sent to it by the external world, you need to know how these races 
were resolved.
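
To make this concrete, here's a minimal sketch (in Python, not E; all the 
names are made up for the sketch, and receive() is assumed deterministic) of 
an event loop that resolves such a race into one serialized delivery order, 
records that order, and can replay the run from the record:

    import queue

    class LoggingEventLoop:
        def __init__(self):
            self.inbox = queue.Queue()   # racing senders all enqueue here
            self.arrival_log = []        # the resolved race, in delivery order

        def send(self, sender, message):
            # may be called concurrently; the queue serializes the race
            self.inbox.put((sender, message))

        def run_one(self, actor):
            sender, message = self.inbox.get()   # one arbitrary serialization
            self.arrival_log.append((sender, message))
            actor.receive(sender, message)       # deterministic given the log

        def replay(self, actor):
            # rerun the computation from the recorded arrival order
            for sender, message in self.arrival_log:
                actor.receive(sender, message)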

An E Vat is a lot like an Actor, and at Vat-to-Vat granularity, E 
computation is a lot like Actor computation.  Within a Vat we have 
approximately the singly threaded synchronous deterministic world of Von 
Neumann.  It seems this is a good enough approximation that we could, under 
constrained but useful circumstances, arrange to segregate all the 
non-deterministic inputs to a Vat (or a subgraph of objects within a Vat) 
such that the remainder of the Vat's computation is deterministically 
replayable from these non-deterministic inputs.  Why might we want to 
*eventually* do this?


1) Cheaper Commitment.

Databases often use this technique by writing a "transaction log".  Once a 
transaction request has been successfully logged, the transaction can be 
committed even though a crash may destroy its result state, given that the 
transaction can be deterministically reproduced from the logged request.  I 
first saw this technique applied to committing general purpose 
computation by Rob Strom: Do a full checkpoint occasionally, and cheaply  
log inputs until you take the next checkpoint.  
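
Here's a minimal sketch of that technique, assuming a picklable state object 
whose apply() method is deterministic; the names are illustrative, not 
Strom's actual machinery:

    import pickle

    class CheapCommit:
        def __init__(self, state):
            self.state = state
            self.checkpoint = pickle.dumps(state)  # occasional full snapshot
            self.log = []                          # cheap per-input commits

        def handle(self, input):
            self.log.append(input)     # in practice, flushed to stable storage
            self.state.apply(input)    # assumed deterministic

        def take_checkpoint(self):
            self.checkpoint = pickle.dumps(self.state)
            self.log = []              # older log entries are now redundant

        def recover(self):
            state = pickle.loads(self.checkpoint)
            for input in self.log:     # deterministic replay of logged inputs
                state.apply(input)
            return state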

It's not clear whether this wins over KeyKOS/EROS MMU-based checkpointing, 
since that's also incremental.  But it sure beats the crap out of 
non-incremental checkpointing, which is the only reasonable alternative for 
those sitting above the JVM.


2) Cheaper Fault Tolerance.

This is really a variation of #1, where the understudy is separated in space 
rather than time.  I first saw this technique applied by Auragen (which has 
since disappeared), and since then, rather brilliantly, by Rob Strom.


3) Inward Bit Confinement. 

Once we've plugged all means of communication accounted for by causality 
in our computational model (or system specification), all remaining 
ability to communicate must ride on informational inputs that are modelled 
as uncaused ("spontaneous"?).  These are exactly the non-deterministic 
inputs to computation.  In the confinement scenario 
http://www.erights.org/elib/capability/confinement.html , Alice generally 
isn't in a position to constrain Mallet (or verify that he is constrained), 
but she is in a position to constrain Bob (or verify that he is constrained).

If she constrains Bob to be deterministic except for non-deterministic 
inputs that she believes Mallet cannot influence, then she knows that Bob 
cannot receive covert information from Mallet.  Therefore, Bob cannot 
receive instructions from Mallet for making use of those powers given to him 
by Alice, but which Alice wishes Mallet not to exercise.  Even though Bob 
wishes to use these powers to do Mallet's bidding, he cannot find out what 
that bidding is.


4) Debugging.  

When a compiler demonstrates a bug, since you presumably have the program 
which elicited the bug, you can reproduce it.  Assuming your debugger 
doesn't affect computation as experienced by the compiler, you should be 
able, after the fact, to reproduce the bug under the debugger; even though 
it's later, even though the original run didn't happen under the debugger, 
and even though no extra cost was spent to instrument the compiler.  This 
effect, normally taken for granted, 
shows the power of deterministic replay at its best.

For non-deterministic computation, if we don't log non-deterministic inputs, 
we can't replay past computations.  Such a non-deterministic program may 
drive us crazy with bugs that occasionally manifest themselves, but that we 
can never catch under the debugger, because under the debugger it happens to 
take other paths.

If we are willing to pay the instrumentation costs to log these inputs, then 
we can deterministically replay the remainder of the computation under a 
debugger, and watch a previous bug occur under the scrutiny of the debugger. 
Unfortunately, these instrumentation costs can be large.  Fortunately, 
tricks #1 (commit) or #2 (fault tolerance) would already have us pay these 
costs, because they're still cheaper than the alternative.
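
Continuing the earlier sketch's made-up names, the replay step itself is 
tiny: walk the recorded arrival order again and drop into the debugger at 
whichever event you want to watch.

    import pdb

    def replay_under_debugger(actor, arrival_log, break_at=None):
        for index, (sender, message) in enumerate(arrival_log):
            if index == break_at:
                pdb.set_trace()              # step through the suspect event
            actor.receive(sender, message)   # same path as the original run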

For example, one can combine these techniques to build a server that is 
extremely resilient to its own bugs, and that really helps in finding these 
bugs.  (We came up with this plan originally at EC, but never came close to 
implementing it.)  The server process occasionally checkpoints, and also 
logs all non-deterministic inputs needed to replay it from one checkpoint to 
the next.  It keeps at most three checkpoints on disk simultaneously.  In 
order from oldest to youngest:

	a) the one committed to.  
	b) the one being checked.
	c) the one being written out.

as well as the two log-sequences a..b and b..c.  Unlike scenario #1, the log 
doesn't let us commit at finer granularity than a full checkpoint.  Rather, 
while it typically allows us to recover (from a crash) at this fine 
granularity, our only real commitment is to checkpoint #a.  Once #b is 
completely written out, it is read-in/revived in a checking process which 
asks all the objects that are able to do so to check their internal consistency.  If all 
these objects say everything's cool, we commit to #b and throw away #a and 
a..b.  If we're not cool, then we save away #a, a..b, and #b for debugging, 
crash the server, revive it from #a, and continue the service forward.  
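
Here's a minimal sketch of that check-then-commit step, with pickling 
standing in for the real checkpoint machinery and a hypothetical self_check() 
standing in for asking every object to verify its own consistency:

    import pickle

    def try_commit(checkpoint_a, log_ab, checkpoint_b, save_for_debugging):
        # Revive #b in a checking process and ask its objects to check
        # themselves.  On success, #b becomes the committed checkpoint and
        # #a and a..b can be discarded.  On failure, keep them all for
        # debugging and tell the caller to crash and revive from #a.
        revived = pickle.loads(checkpoint_b)
        if revived.self_check():
            return checkpoint_b
        save_for_debugging(checkpoint_a, log_ab, checkpoint_b)
        return None                   # caller crashes and revives from #a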

This time, we hope that non-determinism will allow the live service to avoid 
hitting the bug again.  We can even use our knowledge of where the 
non-determinism lies to ensure that other correct choices are made whenever 
possible, to make path divergence more likely.  The very irreproducibility 
of non-deterministic bugs, that's normally such a liability to robustness, is 
potentially an asset for staying in operation while we figure out what's wrong.

To do this figuring out, we've got #a, which reported itself to be 
consistent, or we wouldn't have committed to it.  We've got #b, with a known 
inconsistency.  And we have the a..b log, which should allow us to 
scrutinize the computation as it proceeds from an apparently consistent 
state to an inconsistent one.  Given the E model that non-deterministic 
inputs only enter as discrete events on the event loop, and given enough 
spare cycles, one could even use the self-consistency checking to 
automatically isolate the event at which the inconsistency first becomes 
evident.  
Only the execution of this individual event would then need to be stepped 
through manually in a debugger.
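
Sketching that isolation step with the same made-up self_check(): replay 
from the consistent checkpoint #a and run the consistency check after every 
event; the first event after which the check fails is the one to examine.

    import pickle

    def isolate_bad_event(checkpoint_a, log_ab):
        state = pickle.loads(checkpoint_a)   # starts out self-consistent
        for index, event in enumerate(log_ab):
            state.apply(event)               # deterministic re-execution
            if not state.self_check():
                return index                 # inconsistency first shows here
        return None                          # inconsistency not reproduced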


5) Contract Verification.

This is a new discovery, and is the most interesting.  It will be covered in 
later email.


----------------------------------------------------------------------------

			Hash Tables and Free Will


Also to be covered in future email:

I don't expect to do any of these immediately, but #3 (confinement) and #5 
(contract verification) do become relevant in the near future.  It turns out 
that #5 has a lot in common with #1 (checkpoint), #2 (fault tolerance), and 
#4 (debugging), but #3 (confinement) is curiously different.  To do any of 
these, we need to do a good job identifying all the hidden sources of 
non-determinism.  I thought I had, but recently hit a rude surprise: hash 
table enumeration order.  Do KeyKOS and EROS also have this covert channel?  
(I think so.)  Was this channel identified?  Are its dangers understood?

To plug this, I came up with a scheme, refined by Ping, to make hash tables 
deterministic.  It's cheap but it's not free.  However, it does force a 
non-upwards compatible change to the determination of whether two ConstMaps 
are the same.  In all other ways, the proposed hash tables are upwards 
compatible with the current ones.  I look forward to writing this up.
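
For illustration only, here is one simple way to get deterministic 
enumeration (not necessarily the scheme mentioned above): keep a side list 
of keys in insertion order and always enumerate from that list, never from 
the hash buckets themselves.

    class OrderedTable:
        def __init__(self):
            self.buckets = {}   # fast lookup; its internal order is never exposed
            self.order = []     # keys in insertion order; the only enumeration order

        def put(self, key, value):
            if key not in self.buckets:
                self.order.append(key)
            self.buckets[key] = value

        def get(self, key):
            return self.buckets[key]

        def keys(self):
            return list(self.order)   # identical on every run and every replay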


	Cheers,
	--MarkM