Default database / namespace
Karp, Alan
alan_karp@hp.com
Fri, 7 Jul 2000 16:26:52 -0700
I believe that most distributed systems have given naming short shrift. The
discussion in this thread illustrates the importance of getting it right.
I've just browsed through the mail archive to catch up on the discussion,
and I hope I understand what people are saying. I'd like to explain how we
solved these problems when developing e-speak. I've seen pieces of our
approach in the discussion, but we managed to produce an integrated whole
(an integrated hole if you listen to Scott McNealy).
E-speak took a very different approach to naming. I'll describe e-speak
Alpha 1.0 as the purest implementation of these ideas. Many of them
survived to Beta 2.2, but only some of them appear in Beta 3.0. These
changes occurred as e-speak was retargeted to be more acceptable as a
platform for B2B interactions. Buy me a beer, and I'll tell the whole
story.
E-speak is an integrated system; everything depends on everything else.
This fact makes it difficult to describe the system without forward
references. I put together a tutorial that simply lies early on and
introduces the concepts piecemeal. I don't think that's necessary for this
audience. If you have a problem, just read what follows twice. I've
capitalized the words that have special meaning in e-speak.
In e-speak everything is a Resource with a Repository Entry in the e-speak
Repository. The only way to access a Resource is via the e-speak Engine,
and the only way to refer to a Resource is by Name. Each request to the
Engine is interpreted in the context of a Protection Domain. The Protection
Domain contains a number of elements, one of them being a root Name Space.
In e-speak, a Name Space consists of an ordered list of Name Frames or Name
Spaces. Each Name Frame contains a list of bindings that associate a string
with a Mapping Object. (OK. I lied. The Mapping Object is not a
Resource.) The Mapping Object contains zero or more Repository Handles and
an optional Search Request. The Search Request is used for discovering
Resources, so I won't discuss it further, but it's one way to populate a
Name Frame. In addition, I'll assume the Mapping Object only contains a
single Repository Handle, what we call a Simple Binding. The Repository,
Name Spaces, and Name Frames are kept in the address space of the Engine, so
there is no way to tamper with them.
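To make the structure concrete, here is a minimal Python sketch. It is not
e-speak code. The class names follow the terms above, but the lookup method
and the first-match-wins search order are assumptions made for illustration;
the description only says the list is ordered.

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass(frozen=True)
class MappingObject:
    # Simple Binding: exactly one Repository Handle, no Search Request.
    handle: str


@dataclass
class NameFrame:
    # Bindings associate a string with a Mapping Object.
    bindings: dict[str, MappingObject] = field(default_factory=dict)

    def lookup(self, name: str) -> Optional[MappingObject]:
        return self.bindings.get(name)


@dataclass
class NameSpace:
    # Ordered list of Name Frames or nested Name Spaces.
    members: list[Union[NameFrame, "NameSpace"]] = field(default_factory=list)

    def lookup(self, name: str) -> Optional[MappingObject]:
        # Assumption: search members in order and stop at the first match.
        for member in self.members:
            found = member.lookup(name)
            if found is not None:
                return found
        return None


# Hypothetical handles and names, for illustration only.
private = NameFrame({"report": MappingObject("handle-17")})
shared = NameFrame({"report": MappingObject("handle-99"),
                    "printer": MappingObject("handle-3")})
root = NameSpace([private, shared])
```

Under the assumed search order, the private binding for "report" shadows the
shared one, while "printer" is found in the shared frame.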
When a Client wants to use a Resource, it constructs a message consisting of
two parts. The payload contains the application API; the envelope contains
e-speak-related information. Among the information in the envelope is a
list of Name Fields. Each Name Field consists of an e-speak Name and a
Label. In general, each Label corresponds to a resource reference in the
payload. When this message is sent to the Engine, the Engine resolves the
Name to find the corresponding Repository Handle. Thus, if there is no
Mapping Object containing a particular Repository Handle, there is no way
for the Client to reference the corresponding Resource. In this sense, an
e-speak Name is a capability representing the right to name the Resource
with a Repository Handle in the corresponding Mapping Object.
Each Repository Entry designates a Handler to which the message is to be
delivered. On delivering the message, the Engine puts a name binding for
each Name Field into a Name Frame associated with the message. Each binding
consists of the Label bound to a copy of the Mapping Object. We could have
used the string used by the sender, but we wanted to be able to have things
named "the_boss_is_dumb" without getting into trouble. More seriously,
names can leak information, as in "resume.txt". We can also do things like
have the string be /u/ahk/foo and the corresponding Label be d:\u\ahk\foo.
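The send and deliver steps can be sketched roughly as follows. This is an
illustration in Python, not the e-speak interface: the deliver function, the
plain dict used for a Name Frame, and the handle values are all hypothetical.

```python
import copy
from dataclasses import dataclass


@dataclass
class MappingObject:
    handle: str  # single Repository Handle (a Simple Binding)


@dataclass(frozen=True)
class NameField:
    name: str   # string the sender uses in its own Name Frame
    label: str  # string the receiver sees; matches a reference in the payload


def deliver(sender_frame: dict[str, MappingObject],
            name_fields: list[NameField]) -> dict[str, MappingObject]:
    """Build the per-message Name Frame handed to the Handler."""
    message_frame: dict[str, MappingObject] = {}
    for nf in name_fields:
        mapping = sender_frame.get(nf.name)
        if mapping is None:
            # No binding means the sender cannot name the Resource at all.
            raise KeyError(f"no binding for {nf.name!r}")
        # The Label, not the sender's string, is bound to a copy.
        message_frame[nf.label] = copy.copy(mapping)
    return message_frame


sender_frame = {"the_boss_is_dumb": MappingObject("handle-42")}
message_frame = deliver(sender_frame,
                        [NameField("the_boss_is_dumb", "review_doc")])
```

The receiver sees only "review_doc"; the sender's private string never leaves
the sender's Name Frame.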
The upshot is that by default each client has a separate Name Space. Two
Clients can have the same Name for different Resources or different Names
for the same one. Bindings for Name Spaces and Name Frames can be shared,
so that two clients can use the same set of name bindings by referring to
the same Name Frame or Name Space. Anyone can change a Name at any time by
changing the string in a Name Frame, without affecting anyone else's use of
the same Resource, unless the other Client is using the same Name Frame to
reference the Resource. (I was going to title a paper on this subject "No More 404".)
Names are handled by pairwise translation - Client to Engine, Engine to
Client. The model works just as well when the Clients are using different
Engines, each with its own Repository.
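A small Python sketch of the renaming behavior, under simplifying assumptions:
a Name Frame is a plain dict from string to Repository Handle, and the client
names, handle, and helper functions are invented for the example.

```python
from typing import Optional

# Repository Handle -> Resource (hypothetical contents).
repository = {"handle-7": "quarterly report"}

alice_frame = {"q3": "handle-7"}
bob_frame = {"numbers": "handle-7"}   # different Name, same Resource
carol_frame = alice_frame             # Carol shares Alice's Name Frame


def rename(frame: dict[str, str], old: str, new: str) -> None:
    """Change the string in one Name Frame; the handle is untouched."""
    frame[new] = frame.pop(old)


def resolve(frame: dict[str, str], name: str) -> Optional[str]:
    """Return the Resource a Name refers to in this frame, if any."""
    handle = frame.get(name)
    return None if handle is None else repository[handle]


rename(alice_frame, "q3", "q3-final")
```

After the rename, Bob's Name still resolves because his frame was never
touched. Carol sees the new Name and loses the old one, since she chose to
share Alice's frame.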
Some people have had trouble understanding how this all works. In fact, one
of the early developers who had been on the project about a month said to me
one day, "There are no global names in e-speak. You can't build a
distributed system without global names." It took him another week or so to
convince himself that you could and that we had.
What's all this got to do with the discussion? Well, it means that the
system designer doesn't have to worry about how names are structured. Each
Client can make its own decision. One can put all its bindings into a
single frame; another can use a complex hierarchy. If you want everyone to
share a set of names, put a binding for a particular Name Frame in
everyone's root Name Frame. Thus, e-speak naming covers the cases discussed
in the note, as well as addressing other problems such as name changes, name
conflicts, and hiding the semantic content of names.
_________________________
Alan Karp
Decision Technology Department
Hewlett-Packard Laboratories MS 1U-2
1501 Page Mill Road
Palo Alto, CA 94304
(650) 857-3967, fax (650) 857-6278
> -----Original Message-----
> From: Bill Frantz [mailto:frantz@communities.com]
> Sent: Thursday, July 06, 2000 6:48 PM
> To: Jonathan S. Shapiro; eros-arch@eros-os.org; Kragen Sitaker
> Subject: Re: Default database / namespace
>
>
> Let me start by describing the KeyKOS directory structure. As Jonathan
> notes, the directory objects associated names with capabilities. It did
> not give users access to capabilities they didn't already have, it merely
> allowed them to use a human comprehensible string, rather than a number, to
> name them. It should also be noted that this directory structure was not
> hierarchical, and directories did not have a .. entry.
>
> Each user had a unique "home" directory. By default, one of the entries in
> that directory, pub/, was a key to the (shared, unmodifiable) directory of
> publicly available factories.
>
> At 09:50 PM 7/3/00 -0400, Jonathan S. Shapiro wrote:
> >2. A single global namespace is an unmitigated disaster and should be
> >avoided at all costs, because it is an unclosable security hole for reasons
> >discussed in the previous messages on this topic.
> >
> >I should acknowledge that Paul Karger does not agree with me about (2). He
> >argues, and I think his argument needs to be understood and carefully
> >examined, that a unified namespace is essential to providing (human)
> >auditability so that one can determine what has been done in a system and
> >also so that one can find all nameable references to a sensitive object.
> >
> >In a persistent system, I am not convinced that locating all nameable
> >references has the same utility as in a non-persistent system, as there are
> >likely to be references from persistent processes that are very long lived.
> >This said, the issue of auditability is an important one. Paul's argument, I
> >think, comes in the context of multilevel secure systems, but the audit
> >issue is a general issue.
>
> For general auditing, all you need is names, and the kernel's
> representation of capabilities serve these purposes well. There is one
> other suggestion which needs access to the raw representation of the
> capabilities, which is garbage collection.
>
> Audit is somewhat less scary because it does not need access to the
> capabilities as capabilities, it only needs the representations in a static
> snap-shot of the system. It can use the snap-shot to follow the capability
> links.
>
> In order to make the audit report comprehensible to a human, it needs to be
> able to translate certain capability representations into strings, the
> reverse of the directory object function. Providing this reverse lookup
> operation over a set of directory objects is certainly no more
> objectionable than providing the representation of all the capabilities in
> the system. This set of directories would include the subjects, and the
> objects being audited.
>
> >In rejecting a global namespace, I want to be clear that I do not reject
> >shared namespaces in general. If you and I wish to share a common directory,
> >one of us can build the directory and ship a capability to it to the other
> >*provided* that we are authorized to communicate. This is a fine design
> >pattern. However, there is absolutely no reason why your name for that
> >directory and mine need to agree (though this raises a secondary design
> >issue of "environment variables").
>
> It also brings up issues of human communication. In the KeyKOS world, we
> developed the habit of having everyone use the same name for shared
> directories.