[cap-talk] Session failures (was: Re: Persistence as a cap value)

Jed Donnelley capability at webstart.com
Fri Mar 14 12:14:37 EDT 2008


At 08:38 AM 3/14/2008, Mark Miller wrote:
>On Fri, Mar 14, 2008 at 7:19 AM, Jonathan S. Shapiro <shap at eros-os.com> wrote:
> >  On Fri, 2008-03-14 at 12:44 +0100, BELLIZZOMI VALERIO wrote:
> >  > With atomic-action systems, the protocol to reestablish consistency
> >  > should be simple. It should only be a question of rendezvous and synch
> >  > of commit-points...
> >
> >  This is surprisingly hard to get right. There is an inherent hazard of
> >  unbounded roll-back during resynchronization unless cycles in the
> >  consistency dependency graph can be eliminated.
> >
> >  Achieving this reliably for communicating systems that checkpoint
> >  independently was my originally intended dissertation topic. I abandoned
> >  it as too hard, and put my attention into merely building one of the
> >  world's fastest microkernels and formally verifying an important security
> >  property instead.
>
>I also avoided solving this in E, using broken references as a way to
>punt this to the application's responsibility. But with increasingly
>many computers now shipping with non-volatile solid state memory --
>which has a much lower latency for commit than does a disk seek -- I
>find the Waterken approach to this issue increasingly attractive:
>
>* vats/sites/machines/whatever are only asynchronously coupled to each other.
>* Outgoing messages may only be released after the state which
>generated them has checkpointed.
>* Unacknowledged outgoing messages are buffered and retried forever.

<Right here lies the Two Generals' Problem:
http://en.wikipedia.org/wiki/Two_Generals'_Problem

which I think helps to clarify the difficulty, indeed
the General insolubility.>

>* Acks may only be released after the state caused by their reception
>has checkpointed.
>
>The complexity which has deterred prior solutions can be seen, in
>retrospect, as being caused by the perceived need for an optimistic
>strategy in order to overcome the horrendous latency of disk seeks on
>commitment. Asynchrony + pessimism is simple: no distributed rollback
>ever. Now it may finally also be affordable.
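
To make the four rules quoted above concrete, here is a minimal
single-process sketch, not Waterken's or E's actual code. Every name
in it (Vat, send, checkpoint, receive, handle_ack, crash,
deliver_round) is hypothetical. It also assumes that each message
carries a unique id so that a retried message can be recognized as a
duplicate; the quoted message does not say how duplicates are handled,
so treat that part as my assumption.

```python
import copy


class Vat:
    """Hypothetical vat following the pessimistic rules quoted above."""

    def __init__(self, name):
        self.name = name
        # Volatile working state; lost on crash unless checkpointed.
        self.state = {"log": [], "seen": set(), "next_id": 0}
        self.durable = copy.deepcopy(self.state)  # last checkpoint
        self.pending_out = []   # generated, but state not yet checkpointed
        self.outbox = {}        # released and unacknowledged: id -> body
        self.pending_acks = []  # acks held back until the next checkpoint

    def send(self, body):
        # Rule: a message is not released until its generating state commits.
        msg_id = (self.name, self.state["next_id"])
        self.state["next_id"] += 1
        self.pending_out.append((msg_id, body))

    def checkpoint(self):
        """Commit state, release held messages, return the acks now allowed."""
        self.durable = copy.deepcopy(self.state)
        for msg_id, body in self.pending_out:
            self.outbox[msg_id] = body  # stays here until acknowledged
        self.pending_out = []
        acks, self.pending_acks = self.pending_acks, []
        return acks

    def receive(self, msg_id, body):
        # Assumption: duplicates from retries are detected by message id.
        if msg_id not in self.state["seen"]:
            self.state["seen"].add(msg_id)
            self.state["log"].append(body)
        # Rule: the ack is held until the state caused by reception commits.
        self.pending_acks.append(msg_id)

    def handle_ack(self, msg_id):
        self.outbox.pop(msg_id, None)  # stop retrying this message

    def crash(self):
        # Roll back to the last checkpoint. Nothing unreleased escaped,
        # so no other vat has to roll back with us.
        self.state = copy.deepcopy(self.durable)
        self.pending_out = []
        self.pending_acks = []


def deliver_round(sender, receiver, drop_acks=False):
    """One retry round: resend every unacked message, then maybe lose acks."""
    for msg_id, body in list(sender.outbox.items()):
        receiver.receive(msg_id, body)
    for msg_id in receiver.checkpoint():
        if not drop_acks:
            sender.handle_ack(msg_id)
```

In this sketch a crash only ever discards work that no other vat has
seen, which is the "no distributed rollback" property. What it does
not give the sender is certainty about when a message arrived: with
drop_acks=True the sender keeps retrying a message the receiver has
already committed, which is where the Two Generals limit shows up.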

--Jed  http://www.webstart.com/jed-signature.html 


