Anonymous Unspoofable Cyclic Distributed Static Linking: Part 1

Mark S. Miller markm@caplet.com
Thu, 13 Jan 2000 11:06:12 -0800


Secure mobile code presents many interesting problems.  Among them is the 
issue of "distributed linking".  How does module "a" import module "b"?  How 
is this import resolved?   In what namespace?  Can an attacker preload his 
own "b" into some target vat's namespace, such that "a" imports that version 
rather than the version he desired?  Can we protect "a" against such 
attacks, but still benefit from distributed replicated caching of code that 
someone might import?

It turns out that the design of the old Electric Communities (aka 
Communities.com) "Repository System", created mostly by Monica Anderson 
(cc'ed on this message) can solve most of these problems, even though it was 
built to solve a "different" problem.  Actually, I just never noticed before 
that the problem the Repository set out to solve is the general case of the 
mobile code distributed-linking problem.

                                               Repository Recap

The Repository was built to solve (and, in the beta of EC Habitats, solved) 
the problem of distributed unspoofable caching of acyclicly-linked large 
data blobs.  Engineering observation: cryptographic hashes don't collide.  
Therefore, if you store and retrieve data by its hash, you need never worry 
about collision management.  In other words, data blobs and hashes of those 
blobs are in one-to-one correspondence.  (Of course, Shannon would have a 
fit, but, as Sussman and Abelson might say, this is the difference between 
engineering and science.)  Therefore, a hash can be used as a compact 
designator of any data blob.  The data can be looked up from any source, 
whether trusted or not, as long as the recipient of the data re-hashes it to 
ensure that it is indeed the data designated by the hash.

In Original-E, on which EC Habitats was built, capabilities & symbolic 
traffic were transmitted over the equivalent of today's Pluribus protocol.  
However, when Alice wanted to send a large bitmap to Bob, she did not 
transmit the bitmap over Pluribus.  Rather, she "published" the bitmap to 
her Repository, and sent to Bob over Pluribus a designator of the bitmap: 
the hash of the bitmap + a "location hint" pointing at her Repository as a 
plausible place to obtain the bitmap.

Bob, on receiving this bitmap designator, asked his own Repository for the 
corresponding data blob, and got back a promise for the answer.  If his 
Repository already had an entry for this hash, this promise would be 
fulfilled promptly.  Otherwise, his Repository would contact at least 
Alice's Repository to obtain an alleged answer.  On receiving data from 
Alice's Repository, Bob's Repository would re-hash to ensure it was correct, 
and if so, both cache it locally and fulfill the promise handed to Bob.  
Bob's Repository is part of Bob's TCB, but Alice's Repository is not.  
Therefore Bob's Repository had to rehash the data obtained from Alice's 
Repository, but Bob doesn't need to re-hash data obtained from his own 
Repository.

Advantages:

* Large data fetches are kept out-of-line, and so can proceed in parallel 
with further symbolic traffic.  In practice, this is a big deal.  One might 
eventually want to prioritize the use of bandwidth, so that the Pluribus 
traffic can be higher priority than the data traffic.  (MarcS: Notice 
possible applicability to Edesk.)

* Data can be cached in a replicated manner by untrusted caching services, 
since integrity doesn't depend on the source.

* Unlike DNS or a zillion other schemes, there is no namespace or assigned 
number space that requires global coordination.  Hashes have the same 
anonymous first-class designation-with-integrity nature for data that 
capabilities/object-references do for objects.

Security Issues:

* Since Bob's cache will re-fetch from Alice's cache on a cache miss, this 
gives more opportunity for computation in Bob's vat to wall-bang information 
out of Bob's vat.  Since E's security premises include giving up on 
preventing wall-banging anyway, this isn't a big deal.  However, we must be 
sure not to lose our ability to do deterministic replay (ie, E must continue 
to have loggable non-determinism).

* What crypto should be used to transfer the data between Repositories?  
Curiously, we need encryption for privacy, but the protocol doesn't need to 
protect our integrity, since the re-hashing on reception is all the 
integrity we might need.  Why do we need privacy?  With privacy, and given 
that crypto-hashes are also adequately unguessable (is this generally 
true?), we can take the stance than knowledge of the hash of a published 
blob, plus access to any Repository, is authorization to obtain the blob.  
Why "plus access to any Repository"?  Confined computation will normally not 
have access to any Repository, just as it normally won't have access to any 
Introducer http://www.erights.org/elib/capability/dist-confine.html 
http://www.erights.org/elang/concurrency/introducer.html .  This potentially 
alleviates the above wall-banging issue as well.


                                                  Acyclic Linking

Of course, besides pure data per se, the data of a blob can include such 
hash-based designators of other blobs.  The Repository required no mechanism 
to support this -- just as it was up to the client to interpret data as a 
bitmap, it was up to the client to interpret nested data designators and 
perform further fetches from the Repository.

Of course, during development the source form of each such data blob would 
usually need to specify not a particular edition of some other data blob, 
but (implicitly) the "current version".  In building blobs from 
blob-sources, such development-time inter-blob names (in some 
development-time namespace) gets turned into namespace-free hash-based 
inter-blob designators.  (Lightbulb) This can be seen as a build-time static 
linking step.  Unfortunately, this solution works only for acyclic blob 
structures, which was fine for all the data we dealt with then, but doesn't 
work at all for code.  (The EC Habitats beta didn't do mobile code.)


On to Cycles in Part 2.


         Cheers,
         --MarkM