[E-Lang] Deadlock-free [was what is good about E?]

zooko@zooko.com
Tue, 17 Jul 2001 15:51:59 -0700


[adding Cc: Guido van Rossum <guido@digicool.com>. --Z]


 Ken Kahn wrote:
>
> I'd like to understand this deadlock issue better.

<snip>

> Languages like E seem to have some advantage over languages like Java
> regarding deadlock but I haven't seen a clear description of what it really
> is.


I wish I could answer this question, Ken, but I don't understand deadlock well
enough.  I *do* have some recent experience with using first the multithreaded
model and then the event-based model in the same application, and I would like
to share that experience in the form of a rant about why deadlock-freeness is
so important (but still doesn't make my list of the Top Five Features of E).


 --- Why Multithreading Sucks, Using Mojo Nation As An Example

At the beginning of the Mojo Nation project, we used the standard way of doing
things: multithreading.  Each request that a Mojo Nation peer received from
another peer over the network was processed in a separate thread, and each data
structure that needed to be accessed in a thread-safe way was protected with
locks, so that if one thread wanted to access the data structure while another
thread was in the middle of accessing it, the latter thread would block until
the former thread was done using the data structure.
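In Python (the language Mojo Nation was written in), that lock-per-structure pattern looks roughly like the following sketch.  The class and method names here are invented for illustration; this is not the actual Mojo Nation code.

```python
import threading

class PeerTable:
    """A shared data structure guarded by a lock, in the
    one-thread-per-request style described above."""

    def __init__(self):
        self._lock = threading.Lock()
        self._peers = {}

    def add_peer(self, peer_id, address):
        # Any thread calling this while another thread holds the
        # lock blocks here until the lock is released.
        with self._lock:
            self._peers[peer_id] = address

    def count(self):
        with self._lock:
            return len(self._peers)

table = PeerTable()
threads = [threading.Thread(target=table.add_peer, args=(i, "addr%d" % i))
           for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(table.count())  # 10: every update was serialized by the lock
```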

We soon entered into "Multi-Threaded Hell".  Data structures that did not have
locks around them would get accessed in an inconsistent way, resulting in
corrupted data and inexplicable behaviour.  It was damned difficult to figure
out how this kind of corruption could have happened, because when looking at
the source code of an object, you have to imagine that two or more threads
could be accessing the object's state at the same time.  Therefore you can
rely on very little in the way of "known facts about the state" while trying
to debug a piece of code.

In order to fix the inconsistency (or perhaps even just to try to generate some
stable "known facts about the state" so that you can read the code and have
some confidence that you know what it will do), you add locks around the data
structure.  This appears to fix the problem (although really there is no way
you can be sure, because multithreading inconsistency bugs are almost always
non-reproducible).

But then the next thing that happens is that a deadlock occurs.  One of the
threads is holding on to the lock around data A, while waiting for the lock
around data B to become available.  Another thread is holding on to the lock
around data B, while waiting for the lock around data A to become available.

Okay, so two threads are now permanently stopped, which is bad, but worse is
that they will never release the locks around A and B.  Thus *any* other thread
that comes along and waits for either A *or* B will also become permanently
stopped, adding any locks that it was holding to the "deadlock pileup".  As
MarcS wrote in _E_in_a_Walnut_: "[deadlock] chokes ... a set of 'critical
sections' which have been named 'critical' in thread programming for good
reasons".

This is "Multi-Threaded Hell".  As your application evolves, or as different
programmers encounter the sporadic and non-reproducible corruption or deadlock
bugs, they will add or remove locks around different data structures, causing
your code base to veer back and forth along the line drawn by MarkM [1], erring
first on the side of more deadlocking, and then on the side of more corruption.
This kind of thrashing is bad for the quality of the code, bad for the forward
progress of the project, and bad for morale.


 --- How To Simply Solve the Problem

So, back to the history of Mojo Nation, we realized at some point that we just
didn't *need* multithreading!  A multithreaded application, when it manages
to steer between Scylla and Charybdis, can in theory be faster than the
equivalent event-loop-based application, because one thread can utilize the CPU
at the same time that another thread is blocked on disk access or network.
However, in practice this is almost never a performance win, because (a) the
locking and unlocking costs CPU time, (b) you can use a non-blocking interface
to the network so you never have to wait on it, and (c) the time you save on
Multi-Threaded Hell thrashing can be spent optimizing your real performance
hotspots and implementing faster algorithms.

So for Mojo Nation, we implemented a simple event-loop in Python, and gradually
(over the course of many many months and many releases of the beta product), we
removed all threads and replaced all multithreaded code with event-loop-based
code.
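The core of such an event loop is tiny.  Here is a toy sketch in the same spirit (not the actual Mojo Nation code): callbacks go into a queue, and the loop runs them one at a time, so no application code ever runs concurrently and no locks are needed.

```python
from collections import deque

class EventLoop:
    """A toy single-threaded event loop: a queue of callbacks,
    processed strictly one at a time."""

    def __init__(self):
        self._queue = deque()

    def call_soon(self, fn, *args):
        # Schedule fn to run later; never run it re-entrantly.
        self._queue.append((fn, args))

    def run(self):
        # Only one callback ever runs at once, so the application's
        # data structures need no locks at all.
        while self._queue:
            fn, args = self._queue.popleft()
            fn(*args)

log = []
loop = EventLoop()
loop.call_soon(log.append, "handle request")
loop.call_soon(loop.call_soon, log.append, "send reply")  # events may schedule events
loop.run()
print(log)  # ['handle request', 'send reply']
```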


 --- Why Deadlock-Freeness Is Not On My List of the Five Features of E

Now this is why I do not include deadlock-freeness in my list of E's salient
features: because it is easy to implement an event-loop on top of a
multithreaded runtime.
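For instance, a single thread-safe queue is all it takes to funnel work from helper threads back into one event-loop thread.  A hedged sketch, with invented names; the queue does its own locking, and all application state is touched only from the loop thread:

```python
import queue
import threading

events = queue.Queue()   # the only shared object between threads

def blocking_fetch(url):
    # Stand-in for a blocking network call; runs in a helper thread
    # and posts its result back to the event loop as an event.
    result = "data from " + url
    events.put(lambda: handle_result(result))

received = []

def handle_result(result):
    # Runs only in the event-loop thread, so it needs no locks.
    received.append(result)
    events.put(None)     # tell the loop to stop, for this demo

threading.Thread(target=blocking_fetch, args=("example.org",)).start()

# The event loop: pull events off the queue and run them one by one.
while True:
    event = events.get()     # blocks until a helper thread posts something
    if event is None:
        break
    event()

print(received)  # ['data from example.org']
```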

I agree with MarcS -- I believe that multithreading will go down in
history next to manual memory management as one of the biggest sources of bugs
ever, and as one of the biggest preventable wastes of programmer time and
customer money ever.

I am very excited about coding in E because I will never be forced to deal with
someone else's multithreading code, to argue with a teammate about the value of
multithreading, or to wrap non-blocking APIs on top of blocking APIs to modules
that I need to use.

But I already have five salient features on my list: distributed, persistent,
secure, object oriented, and dynamic, and that is enough for me.


Regards,

Zooko

[1] http://erights.org/elib/concurrency/event-loop.html

P.S.  An interesting and non-intuitive fact is that while replacing one
"coarse-grained" lock with multiple "smaller-grained" locks is *usually* moving
along the axis from deadlocking toward corruption, it *can* sometimes introduce
new deadlocks which didn't occur with the single coarse-grained lock!