transparent parallelism (was: Re: [E-Lang] Internal vs. external iteration)

zooko@zooko.com zooko@zooko.com
Sun, 01 Apr 2001 13:17:27 -0700


[To: `e-lang' list, Cc: Felix Croes]

 Dean wrote:
>
> BTW A weird alternative for iteration cleanup is for the collection to 
> E.send itself a message to cleanup from the iteration.  The loop is 
> guaranteed to have completed before the message arrives.  This would allow 
> external iteration with guaranteed eventual cleaup.
[This quote was edited by Zooko.]


Hm.  If you introduce a feature which relies on this sequential-ticks property,
you may preclude some possibilities for future parallelism.

Wait -- no, it is possible to constrain such parallelism so that it never
surprises programmers who are doing things like what Dean just suggested.

This added constraint might or might not eliminate some of the performance
gained by such parallelism, but since E values not surprising programmers so
much more than it values high performance, I think we can safely ignore this
issue while doing the current E design and implementation.



-------

Nevertheless, in case someone is interested in the subject, I'll explain my
thoughts about it.  (One reason to think about it now is that it interacts with
certain notions of deterministic computation that I know MarkM has been
thinking about, although hopefully we can safely put off having to commit to
anything.)


Someday high-powered E implementations might want to do transparent parallelism
to compute multiple E ticks at once.  This can be done without changing the
programmer's abstraction (i.e. without introducing any notion of
multithreading, and its attendant consistency problems, to the programmer).  In
fact my friend Felix Croes recently told me that he has implemented exactly
this kind of transparent parallelism for his own event-based dynamic
programming language in the DGD server[1].

Transparent parallelism for an event-based language requires the implementation
to track side effects so that it can automatically roll-back and then retry any
event which attempts to cause inconsistency.  If you are familiar with
traditional transaction systems you can think of this as getting a non-blocking
read lock on every data item you read, getting a non-blocking write lock on
every data item you update, keeping the locks until the end of the tick, and
then in the case that it can't non-blockingly acquire a lock it rolls back the
entire transaction (the entire event) and then retries that event.


Now in Dean's example of putting a "clean-up" function as a future event on
your event queue, this transparent parallelism might screw up what Dean is
trying to do.

Of course, if the "clean-up" function does not interact inconsistently with the
iterator (for example if the clean-up function is empty, or more interestingly
if the clean-up function examines some data and then exits without changing
anything in the case that the data is in a certain state), then the transparent
parallelism allows the implementation to compute the iterator and the clean-up
function in parallel.

But if the clean-up function *does* need to change something on which the
iterator depends or vice versa, then this technique guarantees that the two
functions will be "isolated", which means that either the end result will be
the same as if the iterator ran completely and then the clean-up function ran
completely *or* the end result will be the same as if the clean-up function ran
completely and then the iterator ran completely.

It does not guarantee that the iterator will go first, however, which is almost
certainly what the programmer expects.


Now when writing this letter I realized that it should be possible to constrain
the parallelism so that programmers are never surprised by it.  The rules would
be that if an event A spawns event B (in E, by sending a message), then in case
of a collision event A always wins.  Furthermore, if event A spawns event B and
then spawns event C, then in a collision between B and C, B always wins.

Hm.  Unfortunately in order to really be sure you won't surprise a programmer,
you also need these precedence rules to be transitive: if A spawns B then C,
and B spawns D and C spawns E, then in a collision between D and E, D has to
win.  (If you like graphs, imagine a tree in which events are nodes and
spawning a new event is creating a child node.  When you create a child node,
it appears to the right of any previous child nodes you created.  The
programmer's view of the event structure is the view which results from a
breadth-first left-to-right traversal -- i.e. a queue of events in which each
newly spawned event is immediately appended to the queue.  The implementation,
however, can compute any of the nodes in parallel as long as the result is
consistent with the programmer's view.)


Now with all of these added constraints, would there still be a performance win
to be had by transparent parallelism?  This strongly depends on how much added
speed you get out of parallel computation (note that the next big advance in
CPU architecture appears to be "Simultaneous Multi Threading" or "SMT", which I
understand only barely but which seems to reward parallelism better than
current CPU architectures do).  It also strongly depends on how often E
programs include separate events that need to perform mutually inconsistent
operations.

I would like to hear what Felix thinks about this.  Felix clearly believes that
there is a win in doing the side-effect tracking and the transparent
parallelism that he has already implemented, but perhaps his version is not
constrained by the transitive precedence rules, and perhaps the addition of
those constraints would dramatically reduce the available parallelism.


Finally, there are two further complications:

The first complication is that, in order to be sure that you don't surprise the
programmer, you have to track *all* side-effects, not just within the language
implementation but also things like file IO, network IO, user IO, and (very
important to E) remote invocations.  This requirement may be so onerous to an E
implementation that it outweighs the value of the whole transparent parallelism
idea!  For example, programmers will probably expect that if A spawns B then C,
and B and C both send messages to a remote object that the message from B
always arrives first.  This implies (keeping in mind the transitive precendence
requirement), that all outgoing messages be queued up until the entire event
queue up until the sending event has been computed.


The second complication is that MarkM has been toying with notions of
deterministic computation in order to prevent certain kinds of covert channels.
The requirements of that feature would surely interact heavily with transparent
parallelism.


Regards,

Zooko

[1]  DGD Home Page
     http://www.dworkin.nl/dgd/