Beowulf + E ?

Chip Morningstar chip@communities.com
Thu, 6 May 1999 14:49:03 -0700 (PDT)


Bill Frantz and I were chatting over lunch yesterday and it occurred to me to
wonder if you have investigated/looked into/heard of the community of folks
doing parallel computation with what they call Beowulf clustering (see, for
example, www.beowulf.org).

What they're doing is basically building ultra-cheap MIMD supercomputers by
strapping together hordes of Linux boxes using off-the-shelf interconnect
technology (e.g., 100 Mb/s Ethernet) and a growing collection of open source glue
software.

E is a platform for secure distributed computing. Our thinking (especially the
stuff you're now working on) has tended to emphasize SECURE DISTRIBUTED
computing. But one can also think of it as a platform for secure DISTRIBUTED
COMPUTING.

On its face, I would say that there isn't a lot of immediate compatibility
between the Beowulf vision of the world and ours: they tend to be efficiency
weenies who try to squeeze the last ounce of performance out of their machines
by doing things like hand-crafting special hot-rodded Ethernet drivers, whereas
we are using a VM with multiple layers of language interpretation. They also
look upon having complete control over their intrasystem interconnect as a way
to avoid having to consider security issues at all (unlike, say, some of these
Internet distributed DES cracking projects), whereas for us the security issues
are *the* defining constraint in distributed system design.

However, I think this movement heralds a change that may be advantageous to us.

A one or two order of magnitude drop in the cost of supercomputers shifts the
tradeoff between two kinds of investment: effort spent coaxing more performance
out of the system, and effort spent developing the actual distributed
supercomputer applications themselves. The balance tips increasingly toward the
latter. In other words, as it becomes more cost effective to deal with
performance by simply throwing more (relatively cheap) hardware at the problem,
application development starts to dominate the overall cost structure.

This is the same shift we have seen in the past thirty years in the transition
from mainframes to personal computers. It is the shift that made it
economically sensible to move from assembly language to FORTRAN to C to things like
Perl and Java. (Even though we now treat C pretty much as if it *is* assembly
language, I can recall a time when the "can this application afford the
overhead of a compiled language" debate was very real. It wasn't that long
ago.)

Beowulf clustering is making it more cost effective for small and medium sized
engineering schools to teach distributed programming. And it has the potential
to increase the number of projects that want to hire the kids who took those
courses. But as the great unwashed masses of second and third string
programmers (i.e., 95% of programmers) move into distributed programming, the
level of technical sophistication we can expect of them in dealing with the
subtleties and idiosyncrasies of distributed programming will drop.
The same general concept applies to all programming,
as evidenced by the fact that there's a big market for Visual Basic, but up
until now the distributed supercomputing world has remained the turf of a
relative elite owing to the substantial hardware costs.

My experience is that doing distributed systems in the E paradigm is *way*
easier than doing it the hard old way (i.e., by hand). (And I am deeply
unconvinced by certain programmers' macho protests that *they* don't find
dealing with threads and locks and distributed consistency that hard. I heard
essentially the same protests a few years back from the assembly-language
diehards.)

So this suggests two things to me:

First, it may be worthwhile looking into what it would take to turn E into a
tool for programming a Beowulf-type system (I think it mainly means coming up
with a good object migration story to enable load balancing, but I haven't
thought about the problem deeply yet).

Second, I'm thinking about the model presented in _Crossing The Chasm_. Perhaps
we should think about more consciously positioning E as the Perl or Visual
Basic of distributed systems: its job is not to run fast, its job is to enable
unsophisticated programmers of average competence to quickly and fairly easily
whip together boring but valuable small-to-medium sized (distributed)
applications. Being the hardcore hacker early adopter types that we are, we
*like* to think of what we are doing here as edge-of-the-art stuff, but it may
be more advantageous to posture as the second wave of technology, making this
class of systems accessible to the mainstream.