[E-Lang] Deadlock versus Datalock
Mark S. Miller
Thu, 17 May 2001 23:45:33 -0700
At 04:52 PM Thursday 5/17/01, Karp, Alan wrote:
>[...] Imagine my disappointment when
>I read about datalock in EiaW.
>I don't see any real difference between [...]
In recent private correspondence that asked a similar question, I wrote
>On http://www.erights.org/elib/concurrency/event-loop.html we list three
>liveness issues in addition to deadlock: livelock, datalock, and gridlock. In
>this context, what we call deadlock (or deadly embrace) might be renamed
>control-lock. A cycle of mutually dependent promises is a datalock, and a
>good example is described at
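The shape of a datalock can be sketched in JavaScript, whose promises descend from E's (this is a sketch for illustration, not E code): two promises, each resolved with the other, form a dependency cycle that can never settle.

```javascript
// Datalock sketch: a cycle of mutually dependent promises.
// Each promise is resolved *with* the other, so each waits for the other
// to settle first. Neither ever does; no thread is blocked anywhere, but
// nothing depending on these values can ever run.
let resolveA, resolveB;
const a = new Promise(res => { resolveA = res; });
const b = new Promise(res => { resolveB = res; });
resolveA(b); // a will take on b's eventual value...
resolveB(a); // ...and b will take on a's: a cycle, hence datalock
a.then(v => console.log("unreachable:", v)); // never runs
```

Note that the event loop stays fully responsive throughout; the cycle silently starves only the computations downstream of `a` and `b`.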
>If you take a conventional thread-based program that contains a deadlock bug
>you're unaware of, and you transliterate it as literally as reasonable into a
>corresponding E program, most of your deadlock bugs will vanish, even though
>you never knew they were there. Those that don't vanish will turn into
>datalocks.
>How could some bugs vanish in an attempted "literal" translation of a
>program? Because, in attempting such a translation, E will lead you into
>making subsystems responsive to requests during times between having sent
>off (with "<-") a previous request and receiving a response, whereas the
>conventional approach would have led you to locking the subsystem. E will
>also lead you to writing subsystems so their invariants are in order at such
>times, making it fairly painless to allow them to react to further requests
>while in such a state.
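That contrast can be sketched in JavaScript (a hypothetical Counter, not code from E): in the event-loop style, the send returns control immediately, and because the invariants are in order at that point, the subsystem can safely serve other requests while the reply is outstanding — exactly the window a lock-based design would have closed off.

```javascript
// Sketch of the contrast (illustrative names only). In a lock-based
// design, query() would hold the subsystem's lock until the reply
// arrived, refusing all other requests in between. In the event-loop
// style, the send returns at once and the subsystem stays responsive.
class Counter {
  constructor() { this.count = 0; }
  // Event-loop style "eventual send": fire off the request, register a
  // callback for the reply, and return without blocking.
  query(remote) {
    remote().then(reply => { this.count += reply; });
  }
  // Still serviceable while a query() reply is outstanding.
  increment() { this.count += 1; }
}

const c = new Counter();
c.query(() => Promise.resolve(10)); // reply arrives later, via the event loop
c.increment();                      // accepted while the reply is pending
```

Here `increment()` runs before the reply is delivered, so `count` goes 0, then 1, then 11 — an interleaving the locked transliteration would have forbidden, along with some of its deadlocks.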
The above is the best explanation I have for a curious empirical fact: In
all the code written in the E language, which includes some significant
distributed apps with plenty of concurrency issues (most written by MarcS),
so far we have only seen one accidental datalock bug. In EC Habitats,
written in Original-E (which had essentially the same concurrency control
philosophy), I can recall plenty of deadlock bugs in the small amount of
Java and C code that dealt with threads, but I can't recall a single E-based
deadlock or datalock. Maybe I just forgot, but they were at least rare.
I don't know whether the above explanation is adequate to account for this.
>The former locks up the involved threads, but the latter prevents any work
>from getting done. The good news is that the data lock code can continue to
>send its requests to the promises, but they never resolve, so no actual work
>gets done. The bad news is that the data lock code can continue to send its
>requests to the promises, so when I get frustrated and interrupt the
>program, I have no idea where the error is. At least when I deadlock, the
>program counter tells me what source line I'm stuck at.
Very good point. It would be good to try to imagine what diagnostic hooks
would help us to understand what went wrong when a datalock happens.
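One conceivable hook, purely as a sketch in JavaScript (`tracedPromise` and `reportStuck` are made-up names, not E facilities): record the creation site of every promise that is still unresolved, so that when a datalock is suspected, each stuck promise can report where it came from — recovering some of what the program counter gives you for a deadlock.

```javascript
// Diagnostic sketch: tag each pending promise with the stack trace from
// its creation site, and drop the tag once it settles. On a suspected
// datalock, reportStuck() lists the source locations of everything that
// never resolved.
const pending = new Set();

function tracedPromise(executor) {
  const entry = { origin: new Error("promise created here").stack };
  pending.add(entry);
  return new Promise((res, rej) => {
    executor(
      v => { pending.delete(entry); res(v); },
      e => { pending.delete(entry); rej(e); }
    );
  });
}

function reportStuck() {
  // Creation sites of all promises that have not settled.
  return [...pending].map(e => e.origin);
}
```

This only points at where the stuck promises were made, not at the cycle among them, but that is already more than an interrupted datalocked program gives you today.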
>Actually, there is an essential difference. The deadlock code will work
>properly almost every time while the datalock code will fail every time.
>Are there also race conditions with datalock?
Also a very good point.
The E vat implementation ideally has only one source of non-determinism:
messages entering from the outside world (both other vats and "devices"),
and where in the queue order these messages get queued (corresponding to
Hewitt's "arrival order non-determinism"). (In actuality, there are some
powerful capabilities that allow a subsystem holding them to be
non-deterministic by sensing GC, but the only subsystems to which these
should be provided are ones trusted to maintain determinism for their
unprivileged clients.) E is designed so that unprivileged E programs cannot
prevent themselves from being deterministically replayed. The price is only
a moderate amount of logging (of the above source of non-determinism). But
much work remains before this is an engineering reality.
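A toy illustration of that logging idea (hypothetical names; E's actual vat machinery is more involved): if the only non-determinism is the order in which external messages get queued, then recording that order is enough to replay a run deterministically.

```javascript
// Sketch: the vat's sole source of non-determinism is external message
// arrival order, so logging the enqueue order suffices for replay.
class Vat {
  constructor(handler) {
    this.handler = handler; // deterministic message processor
    this.log = [];          // recorded arrival order
    this.state = [];
  }
  // Live mode: log each externally-arriving message, then process it.
  deliver(msg) {
    this.log.push(msg);
    this.handler(this.state, msg);
  }
  // Replay mode: re-run the same deterministic handler over the logged
  // arrival order; the resulting state matches the live run exactly.
  static replay(log, handler) {
    const state = [];
    for (const msg of log) handler(state, msg);
    return state;
  }
}
```

The moderate logging cost mentioned above is just `this.log`; everything downstream of the queue is deterministic and so needs no recording at all.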
So, because of non-determinism, computation can in general diverge.
However, for computation that doesn't diverge, the occurrence of datalock
bugs is in fact deterministic. Yes, this last statement is technically