Anonymous Unspoofable Cyclic Distributed Static Linking: Part 2
Mark S. Miller (markm@caplet.com)
Fri, 14 Jan 2000 12:46:05 -0800

                 Distributed Linking: ClassLoaders vs Anonymous Hashes

Before we get on to the cycle problem, a word about the obvious alternative to importing/linking using anonymous hashes. Java deserves a lot of credit, and a place of honor in language history, for the ClassLoader concept. This is the first (the only?) programming language module system, allowing both separate compilation and delayed loading, in which there is no global namespace of module names. In Java, each ClassLoader creates a separate namespace. Ultimate identity is always grounded in the anonymous object-reference (eq-ness) identity of a ClassLoader object. As explained in Vijay Saraswat's http://www.research.att.com/~vj/bug.html , two Java Classes are the same iff they have the same fully-qualified-class-name and are loaded by the same ClassLoader. (Vijay's paper also explains that Java failed to follow through on the premise, thereby creating security holes. But in so arguing, he corroborates that this is a premise worth following through on.)
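
A minimal sketch of that identity rule, for readers who want to see it run. Everything named here (LoaderIdentityDemo, OneClassLoader, demo.Widget) is hypothetical and made up for illustration. To stay self-contained it writes out the bytes of an empty class by hand rather than compiling one, then defines those same bytes in two separate loaders.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class LoaderIdentityDemo {

    /** Emits a minimal class file: an empty public class extending Object. */
    static byte[] emptyClassFile(String internalName) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(0xCAFEBABE);                  // magic
            out.writeShort(0);                         // minor version
            out.writeShort(52);                        // major version (Java 8)
            out.writeShort(5);                         // constant pool count = entries + 1
            out.writeByte(7); out.writeShort(2);       // #1 Class -> #2
            out.writeByte(1); out.writeUTF(internalName);        // #2 Utf8
            out.writeByte(7); out.writeShort(4);       // #3 Class -> #4
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #4 Utf8
            out.writeShort(0x0021);                    // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                         // this_class
            out.writeShort(3);                         // super_class
            out.writeShort(0);                         // interfaces
            out.writeShort(0);                         // fields
            out.writeShort(0);                         // methods
            out.writeShort(0);                         // attributes
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** A loader that can define exactly one class from the given bytes. */
    static final class OneClassLoader extends ClassLoader {
        private final String className;
        private final byte[] classBytes;

        OneClassLoader(String className, byte[] classBytes) {
            this.className = className;
            this.classBytes = classBytes;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            if (!name.equals(className)) throw new ClassNotFoundException(name);
            return defineClass(name, classBytes, 0, classBytes.length);
        }
    }

    /** Loads demo.Widget through a brand-new loader on every call. */
    static Class<?> loadInFreshLoader() {
        try {
            return new OneClassLoader("demo.Widget", emptyClassFile("demo/Widget"))
                    .loadClass("demo.Widget");
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> first = loadInFreshLoader();
        Class<?> second = loadInFreshLoader();
        System.out.println(first.getName().equals(second.getName())); // prints true
        System.out.println(first == second);                          // prints false
    }
}
```

Same bytes, same fully-qualified name, yet two distinct classes, because the two loaders are distinct objects.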

In a mobile code scenario, this allows Bob's vat to create a separate ClassLoader for every source name-space from which code might be imported. This would mean at least a ClassLoader reflecting names received from Alice's vat and one reflecting Carol's vat. But it's worse than that. Alice and Carol might each have multiple ClassLoaders that have loaded pass-by-copy objects being passed to Bob. Bob would need to create a ClassLoader reflecting each of these foreign ClassLoaders. But it's worse than that! This multiplication of ClassLoaders by Bob may cause further multiplication elsewhere, in a positive feedback loop. For example, among Carol's many ClassLoaders may be one reflecting a ClassLoader from Alice. Does Bob need to reflect this in his own ClassLoader in addition to his direct reflection of the original ClassLoader on Alice? What a mess.

(Note: RMI must do something along these lines, but I'm not sure what. It may be that RMI has successfully avoided the above problem, as I've never heard anyone complain about such an explosive multiplication of ClassLoaders. If anyone knows what RMI does for this issue, please post an explanation. Thanks.)

These ClassLoaders cannot in fact multiply ad infinitum. The limit is one per module (in Java, "per class"). At that point, we would have dispensed with fully-qualified-class-names as a basis for linking, and would be dealing only with anonymous ClassLoader/module identities. From there, it's a small step to getting rid of ClassLoaders and using instead anonymous module hashes.
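
To make "anonymous module hashes" concrete, here is a rough sketch of a store in which a module's only name is the hash of its bytes. This is an illustration under assumptions, not the Repository design itself: the class name HashRepository, its methods, and the choice of SHA-256 are all mine. The backing map stands in for any untrusted storage or network peer.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;

final class HashRepository {
    private final Map<String, byte[]> untrustedStore;

    HashRepository(Map<String, byte[]> untrustedStore) {
        this.untrustedStore = untrustedStore;
    }

    /** Hex SHA-256 of the module bytes (the hash function is an assumption). */
    static String hashOf(byte[] moduleBytes) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(moduleBytes);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every Java platform
        }
    }

    /** Stores the module under its own hash and returns that hash. */
    String publish(byte[] moduleBytes) {
        String hash = hashOf(moduleBytes);
        untrustedStore.put(hash, moduleBytes.clone());
        return hash;
    }

    /** Fetches by hash and re-hashes, so a store that lies is detected. */
    byte[] fetch(String hash) {
        byte[] bytes = untrustedStore.get(hash);
        if (bytes == null) throw new IllegalArgumentException("unknown module " + hash);
        if (!hashOf(bytes).equals(hash)) throw new SecurityException("hash mismatch for " + hash);
        return bytes.clone();
    }
}
```

Because the importer checks the bytes against the hash it already holds, it does not matter who supplied them, and no per-source namespace (ClassLoader) is needed.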

                               (dynamic vs static) (linking vs loading)

Another property of ClassLoaders that differs from anonymous hashing is dynamic vs static linking. ClassLoaders do both dynamic linking and dynamic loading. By contrast, the anonymous-hash-based Repository explained earlier does static linking and dynamic loading.

Normally linking and loading are considered indivisible. So what do I mean by linking vs loading? By linking I mean the determination of what concrete module a development-time module-name in a module-source import statement, or its equivalent, refers to. By loading, I mean obtaining the designated concrete module and causing its entry points to be usable. (A conventional static linker does both static linking and (causes) static loading. Conventional dynamic linking (as in Unix *.so and MSWindows *.dll files) does both dynamic linking and (enables) dynamic loading, with intolerable global namespace conflicts.) My apologies if I'm using these terms in ways that conflict with their standard meaning.
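
The two steps as defined above can be separated in a toy model. This is only a sketch: StaticLinkLazyLoad, link, and LazyLoader are hypothetical names, module ids are opaque strings standing in for hashes, and a "module" is just a string.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class StaticLinkLazyLoad {

    /** Link step (build time): bind each import name to one concrete module id, once. */
    static Map<String, String> link(Iterable<String> importNames,
                                    Map<String, String> developerNameTable) {
        Map<String, String> bound = new HashMap<>();
        for (String name : importNames) {
            String id = developerNameTable.get(name);
            if (id == null) throw new IllegalStateException("unresolved import: " + name);
            bound.put(name, id);
        }
        // A snapshot: later changes to the developer's table do not affect it.
        return Collections.unmodifiableMap(bound);
    }

    /** Load step (run time): fetch a module by id on first use, then reuse it. */
    static final class LazyLoader {
        private final Function<String, String> fetchById;
        private final Map<String, String> loaded = new HashMap<>();
        int fetchCount = 0;

        LazyLoader(Function<String, String> fetchById) {
            this.fetchById = fetchById;
        }

        String load(String moduleId) {
            String module = loaded.get(moduleId);
            if (module == null) {
                module = fetchById.apply(moduleId);
                fetchCount++;
                loaded.put(moduleId, module);
            }
            return module;
        }
    }
}
```

Here which module "B" means is fixed when link runs, while the cost of obtaining it is deferred until load is first called.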

In the absence of name-space confusions, dynamic linking can be valuable to shorten build times as developers go around their edit-debug cycle. Dynamic loading can be valuable both for development and production, as it lets the program start running earlier. However, I claim that dynamic linking is ill-conceived for production use. Its "flexibility" -- to allow module A to be linked with a newer version of module B than A's developer knew about -- is a bug, not a feature. Such version mismatch is a source of bugs. When combined (as in Unix & MSWindows) with a global namespace, and the possibility of importing a separately written module C that depends on a different version of B than does A, we have madness. At least ClassLoaders give us separate namespaces so we *can* avoid this madness, but they still provide the "flexibility" of allowing A to link against a different version of B than A's developer had in mind. In an object system, the way A's developer can achieve similar flexibility, but have it work, is to use normal object-oriented polymorphism. A can interact with objects that instantiate modules written after A was created, given that they implement subtypes of types A knows how to invoke. This flexibility, obtained through polymorphism, is in no conflict with static-only linking.
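
A small illustration of the polymorphism point, with made-up names (PolymorphismDemo, Greeter, moduleA, LaterGreeter). Module A is bound only to the type it knows how to invoke; an implementation written afterwards is simply handed to it as an object.

```java
final class PolymorphismDemo {

    /** The type module A was written and linked against. */
    interface Greeter {
        String greet(String who);
    }

    /** Module A: knows only Greeter, not any particular implementation. */
    static String moduleA(Greeter greeter) {
        return greeter.greet("world") + "!";
    }

    /** Written after A was created; A is neither relinked nor recompiled. */
    static final class LaterGreeter implements Greeter {
        @Override
        public String greet(String who) {
            return "hej " + who;
        }
    }

    public static void main(String[] args) {
        System.out.println(moduleA(new LaterGreeter())); // prints hej world!
    }
}
```

A's links never change; the new behavior arrives at run time as an argument.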

Also, dynamic linking doesn't go far enough for development use anyway -- it's insufficiently dynamic. What's really needed is what Smalltalk does, which might be called dynamic relinking, or (my term) "upgrade-for-prototyping". In dynamic linking, whether of the conventional *.so/*.dll variety or with ClassLoaders, the linking (determination) step is delayed until loading, but once done is irrevocable. The version of B being run is the one that was "current" as of this first use. What is normally far more valuable to the developer is to always be running the *current* version of B within the system he's developing/debugging in. Smalltalk goes even further -- upgrading previous instances of B to now be instances of the latest B. Only this lets the programmer avoid throwing away and recreating all instance state between each bug fix.
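
The difference between one-time dynamic linking and always-current relinking can be shown in a toy model. The names (BindingDemo, Workspace, OneTimeBinding, LiveBinding) are hypothetical, and this is not how Smalltalk implements it; in particular it does not model upgrading existing instances.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class BindingDemo {

    /** Development-time table: module name -> its current implementation. */
    static final class Workspace {
        private final Map<String, Supplier<String>> current = new HashMap<>();

        void define(String name, Supplier<String> impl) {
            current.put(name, impl);
        }

        Supplier<String> lookup(String name) {
            Supplier<String> impl = current.get(name);
            if (impl == null) throw new IllegalStateException("undefined module: " + name);
            return impl;
        }
    }

    /** One-time dynamic linking: resolve on first use, then never again. */
    static final class OneTimeBinding {
        private final Workspace workspace;
        private final String name;
        private Supplier<String> bound; // null until the first call

        OneTimeBinding(Workspace workspace, String name) {
            this.workspace = workspace;
            this.name = name;
        }

        String call() {
            if (bound == null) bound = workspace.lookup(name);
            return bound.get();
        }
    }

    /** Dynamic relinking: resolve on every use, so a redefinition takes effect at once. */
    static final class LiveBinding {
        private final Workspace workspace;
        private final String name;

        LiveBinding(Workspace workspace, String name) {
            this.workspace = workspace;
            this.name = name;
        }

        String call() {
            return workspace.lookup(name).get();
        }
    }
}
```

After B is redefined in the workspace, a OneTimeBinding that has already been called keeps running the old B, while a LiveBinding runs the new one.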

In this message, I'm not going to go further into upgrade-for-prototyping, except to say that it requires well thought out mechanisms beyond those required for dynamic linking (much of which E already has), that it's a development-time-only phenomenon (and so doesn't have to deal with name resolution issues across trust or machine boundaries), and finally that it removes the remaining rationale for dynamic one-time linking.

In Part 3: On to cycles. No really, I mean it this time.

         Cheers,
         --MarkM