[e-lang] 'for' loop security concerns

Richard Uhtenwoldt e-lang@mail.eros-os.org
Sun Mar 14 22:38:17 EST 2004


Although other commitments have kept me away from e-lang, I continue
to admire and feel warmly towards Mark M, Marc S, Shap, Dean T and the
others here pursuing the agoric-computing, smart-contracts vision.

On the general subject of iterating over a collection, back in Nov,
Mark M expressed interest in a paper by Oleg K, namely,

http://okmij.org/ftp/papers/LL3-collections-enumerators.txt

He asked the list for a translation of that paper's section six,
entitled "How to invert an enumerator in a language without
first-class continuations", which uses Haskell, which Mark has not
picked up yet.

I'm only now seeing Mark's request.  I read Oleg's paper to see if I
could help out, then I penned this post.  

I thought I would give a summary of Oleg's paper in my own words,
rather than just provide a translation of just section six because
enough time has gone by that Mark would have to reread much of Oleg's 
paper anyway to make sense of my translation, so he might as well
read my summary instead --and because I think I can make the main
points in a much crisper way than the paper does.

the following summary does not in fact include a translation of all
the material in section six, but I invite people to ask me to
elaborate on any points in my humble summary, and that elaboration
might well entail my translating the rest of section 6.

the summary is 175 lines long.

Disclaimers.  

I did not bother to learn about "generators such as the ones in
Icon, Ruby and Python" but I doubt this is essential to understanding
Oleg's thesis.

I do not have enough experience to evaluate all of Oleg's claims that

>such an interface is superior: in efficiency; in ease of
>programming; in more predictable resource utilization and avoidance of
>resource leaks.

* * *

Let us let Oleg start us off, with a quote from his abstract:

>A programming language system gives us typically one of the
>two interfaces to systematically access elements of a collection. One
>traversal API is based on enumerators -- e.g., for-each, map, filter
>higher-order procedures -- of which the most general is fold. The
>second approach relies on streams, a.k.a. cursors, lazy
>lists. 

E's iterate method is an example of the former.

The essential difference between the two types of interfaces, BTW, is
where the loop is.  In the former, "fold type" interface, the loop is
in the definition of the "fold like" method or construct.  in the
latter, "stream type" interface, the loop is on the other side of the
interface, namely, in the code that invokes the "stream like"
method(s).  Anyway, back to Oleg.

Oleg points out that for-each, map, filter, E's iterate, E's for loop,
streams, cursors, lazy lists, etc, are all methods for
iterating over a collection.

Wouldn't it be great, O asks, if there was one method to rule them
all?

Well, there is.  let us call it "superfold" for reasons that will
become apparent by the end of this post.  (What we will call
"superfold" O refers to in section six as "a function of the type
CFoldLeft'".)

superfold "rules them all" in the sense that whenever the programmer
wants to define a new kind of collection object, the programmer need
only define a superfold method for that object (class).  Then the
superfold method can serve as a primitive "on top of" (or "in terms
of") which any other construct for iterating over that object can be
defined --in the same way that E's for loop is defined "on top of" E's
iterate method.

To say it in O's words:

>This article presents a design of the overall optimal collection
>traversal interface, which is based on a left-fold-like combinator
>with premature termination. We argue from the practical standpoint
>that such an interface is superior...

* * *

Now, if your language has call/cc, it turns out that --modulo
providing for premature termination of the iteration-- for the
superfold method it suffices to use an ordinary "left fold".  Left and
right folds are common in functional programming and to explain a left
fold to you guy it probably suffices to give one example of the use
of a left fold.

The name (foldl) I use for the left fold and the order of its three
arguments comes straight from Haskell, the home of right and left
folds, but I have translated into Scheme notation because I figure
Scheme notation is more widely known than Haskell.

So, for example, the following use of foldl

    (foldl + 0 (list a b c)) 

is just another way of writing

     (+ (+ (+ 0 a) b) c)

A more E-like way to write last use of foldl might be

     [a,b,c].foldl(add, 0)

--where add denotes the function that takes 2 arguments and adds them
together.

foldl is not that different from E's iterate method.  The analog
of the above using iterate would be this next (untested because I 
do not yet have E installed):

    var accum := 0
    def assocFunc (key, _) { accum += key }
    [a,b,c].iterate(assocFunc)
    return accum

--the big difference being the use of a mutable piece of state (named
accum) whereas foldl in particular and O's paper in general are very
much in the pure-functions paradigm.  The arguments (and return
values) named "seed" in O's paper are needed because of the desire to
stay in the pure-functions paradigm.

* * *

If your language has _no_ call/cc, then the superfold method gets a
little more complicated: namely, it gets "non-recursive".  Namely, it
gets an additional argument, which O calls "self", which is intended
to hold the recursive version of the function itself.

This sound a little hairy but it is a standard trick in lambda
calculus and function programming that should be in most computer
scientist's bag of tricks.

I'll demonstrate by defining a superfold method for the Scheme list
--a common enough collection object.  Then I'll define the corresponding
foldl method on top of that just-defined superfold method.  Often the Y
combinator is employed to do that, but Darius pointed out that my code
would be much shorter without Y.  (Thanks to Darius for commenting on
the first draft of this post!)

    (define (superfold self op zero collection)
      (cond
        ((null? collection) zero)
        (#t (op (self op zero (cdr collection)) (car collection))) ))
    
    (define (foldl op zero collection) 
      (superfold foldl op zero collection))

And so to test the above definition, we observe that if you evaluate
    
    (foldl + 0 '(1 2 3))

--the result is 6.  Ta da.

Now it sort of looks to me like you can define a nonrecursive version
of iterate (which we will calll of course superiterate) and use _that_
instead of superfold as your "method to rule them all", provided that
"users" of superiterate can allocate pieces of state (using E's var
construct) before calling superiterate and can in the allocFunc mutate
that state.

In other words, my initial impression is that the extra arguments and
extra generality of superfold relative to superiterate is needed only
if you eschew side effects.  let me know if you want me to delve here.

Finally, allow me to voice one opinion.  O writes

>For concreteness, this section demonstrates our
>approach for one particular language without first-class
>continuations: Haskell. Applications to other languages are
>straightforward.

I am not convinced that it is "straightforward" because I can imagine
that for many languages _efficiently_ running code that uses a
function (self) as argument to a function often (eg, in the expansion
of every for loop) would prove quite tricky to arrange for.

But then I know almost nothing about how it is planned to make E
run efficiently (if it is not sufficiently efficient already, that is)

* * *

In summary, the point of the first half of O's section six is that you
can define an ordinary left fold (and consequently any enumerator) on
top of the superfold method.  I just illustrated how, in Scheme.

The second half says that you can define anthing like streams,
cursors, lazy lists, on top of the superfold method.

I would be happy to demonstrate; it looks like an interesting exercise
in lambda calculus.  The key seems to be that the argument that O
calls "self" which is "supposed to" get (the recursive version of) the
function itself can be given something entirely different .

But you guys have to decide if it is relevant to the E project.  I do
not know enough about the E project to say.

-- 
Text by me above is hereby placed in the public domain

Richard Uhtenwoldt

A computer is just a backup tape's way of making another, updated backup tape



More information about the e-lang mailing list