Static Types vs Dynamic Capabilities (was: a few thoughts)
Mark S. Miller
markm@caplet.com
Sun, 16 Jul 2000 21:14:11 -0700
Note: Since this is a capability-language-oriented discussion, rather than
an OS-oriented discussion, I'm cc'ing the e-lang list. For reasons that
will be clear, I'm also cc'ing Jonathan Rees.
At 10:57 AM 7/16/00 , Jonathan S. Shapiro wrote:
>In any case, my objection was not to downcasting inherently. It was to the
>fact that downcasting is implicitly a feature of all interfaces. I have no
>objection to a keyword on the interface that says "downcast from interface
>to class is permitted". My objection is that there are places where for
>security reasons such a downcast should *not* be permitted because the type
>thinning represented by the interface must be strongly enforced. Therefore,
>there needs to be some form of interface-like abstraction in which the
>downcast is disallowed. In Java, this requires the introduction of proxy
>objects. Proxy objects in turn break representation identity.
As Jonathan & I have discussed on the phone, Jonathan's objection comes from
a perspective that may well yield a coherent language with capability-like
security and that might look like Java, but such a language would not have a
Java-like semantics, and it would also not have a semantics like other
capability languages (Joule http://www.agorics.com/joule.html , E
http://www.erights.org , and W7 -- the Scheme of Rees's thesis
http://mumble.net/jar/pubs/secureos/ ). The crucial difference between
Jonathan's language perspective and these other systems is whether static or
dynamic properties are held to be primary. On this issue, Jonathan has the
static-oriented perspective, and the above systems embody the dynamic
perspective. Curiously, EROS & KeyKOS faithfully implement the purely
dynamic perspective (though I believe Jonathan disagrees with this claim,
and possibly with my construal of our perspective difference).
Purely Dynamic Capability Languages
Joule, E, & W7 have no static type system, so only dynamic properties of
these languages can produce their security. A capability corresponds to an
object/closure reference. An object/closure combines state and behavior.
* We can think of the "state" as being the object's instance variables, or
the set of variable locations used freely by the lambda expression that
evaluated to this closure (the variable locations "closed over"). This
state corresponds to the domain identity encoded into an EROS/KeyKOS start
key.
* We can think of the "behavior" as the vtable-pointer (thinking object
implementation) or code compiled from the lambda expression. This
corresponds to the KeyKOS "data byte" or the EROS ?? (sorry, I forgot what
it's called again).
In any of these implementation domains, when multiple capabilities designate
the same state but associate this state with different behavior, we refer to
the set of capabilities jointly as a "composite", and each of them
individually as "forseen facets" of this composite. The important thing to
keep in mind is that the language notion of "object" does *not* correspond
to a domain, but to a forseen facet of a domain.
(See http://www.erights.org/elib/capability/ode/ode-objects.html and
http://www.erights.org/elib/capability/ode/ode-capabilities.html for a more
complete explanation of this mapping between language concepts and
capability concepts. The "forseen" terminology was introduced at
http://www.eros-os.org/~majordomo/e-lang/1514.html and elaborated some at
http://www.eros-os.org/~majordomo/e-lang/1526.html .)
Thinning, Take 1
The "thinning" Jonathan refers to is a forseen less powerful facet
obtained by request to a more powerful facet. The thinner facet's behavior
is to respond successfully to a subset of the messages the thicker facet
responds to -- the thicker facet's type is a subtype of the thinner facet's
type, ie, the set of objects that implement the thicker type are a
subset of the set of objects that implement the thinner type.
For those messages it does respond to successfully, it does the same thing
the thicker facet would have done -- the message has the same meaning as it
would have were it sent to the thicker facet instead. When this isn't true,
we still have facets, and one might still be a subtype of the other, but it
doesn't fit our definition of the "thinning" pattern. As we'll see, this
precise definition is important to our disagreement, because only in the
thinning pattern may an implementation use the thick vs thin difference
*only* in deciding whether to reject a message.
Java
Despite Norm's unfortunate statements at
http://www.mediacity.com/~norm/CapTheory/CapBits.html , Java is not a
capability language. It does not have the security properties required of
capability security. In particular, mutable static variables provide an
unconditional communications pathway not subject to capability discipline.
However, Norm knows all this as well as I do. (Norm, please correct this
page!) What I think he's getting at is that Java is a great start on the
design of a capability language. Any secure system, language or not and
capability or not, must provide perimeter security
http://www.erights.org/elib/capability/delegations.html -- keeping the bad
guys on the outside of some encapsulation boundary. For an object language,
the natural boundary is the object. In Java terminology -- the difference
between "public" and "private". Java's unforgeable pointers and static
verification provides this beautifully.
In addition, capability security requires a protection-crossing invocation
mechanism with the properties of the Granovetter Operator (the ability to
convey capabilities in messages, etc) and it requires scoping rules
consistent with capability rules -- such as strict lexical scoping. (As
Jonathan Rees says (paraphrasing): Names in the lambda calculus serve the
same function as clist indexes in capability OSes.) Java has all these,
even though Java's designers were not trying to create a capability
platform. It got most of the way there just by addressing the applet
perimeter security problem, combined with reasonable taste in modern lexical
object-oriented safe language design. (Indeed, Scheme and Smalltalk got
even closer just on good taste, without thinking about security at all.)
Java, Static Types, and Security
The one security-relevant feature of Java absent from Joule, E, or
W7 is a static type system. How might this static type system benefit
Java's security (or the security of a capability-secure variant of Java)?
Arguably, it allows the verifier to do more of the
perimeter-security-enforcing calculations statically, enabling the language
implementation to do fewer of them at dynamically, and so achieves protection
with greater efficiency. How else might Java's security benefit from static
types? In a variety of minor ways, all of which but one have dynamic
equivalents that are just as good (and, I would claim, clearer to
understand). The one exception? Java's package scope enables the "Sibling
Communications" style of Rights Amplification (
http://www.mediacity.com/~norm/CapTheory/Synergy.html and
http://www.gwydiondylan.org/drm/drm_26.htm ) to be used with no additional
runtime cost. By contrast, Joule, E, and W7 do rights amplification with
sealer/unsealer pairs, which (at least naively) entails a bit of runtime
computation. (Either primitive is easily built out of the other. The only
expressiveness issue in choosing between these two styles is that
sealer/unsealer pairs provide finer-grained control of authority.)
How should we regard a static type system? The dynamic-first perspective
(as I hereby define it ;) ) holds that the "real" semantics of the program
is what happens at runtime. In any program accepted by the static type
checker, were one to remove the variable type declarations, it would still
mean the same thing. The variable type declarations are merely a notation
for stating assertions that the programmer believes must be provably true
for all possible executions of all possible instantiations of the program.
Since these assertions are supposed to be provably true for all possible
instantiations, their proof can and should happen prior to any actual
instantiation. The programmer typically also depends on the development
system to prevent instantiation of any program for which a proof of these
assertions could not be generated.
This is a good first approximation to the meaning of type declarations in
Java, but with some exceptions (feel free to skip):
* Support for Sibling Communication via package scope. Were this done
purely on the runtime package membership of invoker and invokee, as opposed
to the static type of the variable with which invoker designates an invokee,
a Confused Deputy problem could result
http://www.mediacity.com/~norm/CapTheory/ConfusedDeputyM.html . (Java's
reflective invocation mechanism does exactly do only the runtime check, and
so suffers from Confused Deputy.) The sealer/unsealer alternative rights
amplification style does not have this danger.
* Overload resolution. Java methods are overloaded on the static type of
the invoking argument expressions, rather than the dynamic type of the
argument values. Because Java has subtyping, these can give different
answers. However, were we to replace the message name at the call site with
the signature resulting from the static overload resolution, static types
would have no further effect on the runtime semantics of invocation.
* What else? Java's a complex language, so I'm sure there are a few more
things like this I haven't thought of. But I'll speculate that the others
are no more fundamental than the above two issues. At least for a
hypothetical capability-secure Java-like language.
Thinning, Take 2: The Seduction
The only difference between a thick facet and a derived thin facet is that
the thin facet allows only a subset of the operations allowed by the thick
facet. And for this subset, the operations have the same meaning. In
describing sensible thinning patterns, especially of forseen thinnings, the
thin interface should be an explicit named type, and the thick interface an
explicit named subtype. In mapping all this to Java, it is perfectly
sensible to map each thin supertype as a separate Interface type, and have
the thick type 'extend' or 'implement' all the types for which it forsees a
thinning. All good so far.
The seduction comes in our interpretation of
Thick p1 = ...;
Thin p2 = p1;
Is p2 a thinning of p1? Certainly Jonathan & I agree that, in current Java
it is not. p2 provides all the authority of p1 because the programmer may
at any later time do:
Thick p3 = (Thick)p2;
Our difference is that Jonathan views this property as a bug, whereas I view
it as a feature. p2 seems seductively like a thinning of p1 because, in the
absence of the use of the cast operation, the program is indeed prevented
from recovering from p2 the authority to invoke the operations described by
Think but not Thin. However, from my perspective, p1, p2, p3 all have the
same dynamic meaning -- they are all pointers to the same object -- and so
they are, and should be, security equivalent. "p1 == p2" should succeed.
The difference in static types of these variables is merely an artifact of
trying to reason statically about all possible instantiations/executions of
this code. p2 represents less static information than does p1. A cast is
effectively an if:
if (p2 instanceof Thick) {
Thick p3 = p2;
} else {
throw ...
}
Before the 'if', we only know (or we can only prove to the type checker)
that p2 is a kind of Thin. The 'if' effectively allows us to recover static
information from dynamic. Statically, we don't know which branch of the
'if' will be taken, but we know that we'll be in the then-branch iff p2 is a
kind of Thick. As Norm points out, the situation isn't very different from
if (d != 0.0) {
... n / d ...
} else {
...
}
In the code before the test, let's say that d may be any floating point
value. Within the then-branch of the if, d may be any non-zero value. Note
that these are static statements. Dynamically, in any execution d may only
have one value. The 'if' reasons dynamically about the actual value of d.
In describing the meaning of the if, we reason statically about the possible
values of d. We could describe our greater knowledge of d's values within
the then-branch as a subrange type if we wish. This subrange type would be
a subtype of the type of d outside the branch.
Of course, I'm not advocating that anyone define a type system to do this.
And I'm certainly not advocating that one design a language that prohibits
division in the absence of a static proof that the denominator is not zero!
But that's part of the point. The programmer will always have more static
knowledge of properties of their program than are captured in any type
system. Sure, you could add parameterized types to Java. But what about an
array in which we store alternating keys and values -- a common
implementation technique?
No type system I'm aware of lets us say that the even element contain
key-typed objects and the odd contain value-typed objects. Nor should
anyone design such a type system. But in the absence of type systems that
can carry out any static proof the programmer can, the type system will
always lose knowledge the programmer did not lose. The cast in Java is
1) a way for the programmer to re-introduce static knowledge that he
couldn't get the type checker to carry for him, and
2) an 'if' as above to abort the execution path, via a thrown exception, if
the programmer was wrong.
This is all useful merely as a conversation between the programmer and the
type checker helping each other to reason statically about the program. The
knowledge loss from p1 to p2 is very different than the authority loss that
a thinning operation needs to provide. The thinning operation must effect
what's possible at runtime, rather than merely reason about it.
Stone Casting (from Communities.com)
In Original-E http://www.communities.com/products/tools/e/e_white_paper.html
, an Java variant attempting to fix Java's security problems, we also felt,
like Jonathan, that thinning was a useful authority reduction pattern, and
that Java interface types & interface type inheritance was a useful way to
express the thinnings you want to be possible. However, we made thinning
into an operation distinct from static type loss -- "stone casting" as in
"cast in stone". With the proxies of Java 1.3
http://java.sun.com/j2se/1.3/docs/guide/reflection/proxy.html , we can now
express this in Java itself without building a new language. Contrast the
above
Thin p2 = p1;
with
Thin p2 = (Thin)StoneCast.new(p1, Thin.class);
This latter returns a new object that implements the Thin interface but not
Thick, and forward any This message on to the object it wraps -- the object
designated by p1. An attempt to recover the authority absent from p2:
Thick p3 = (Thick)p2;
will fail, as p2 is not a kind of Thick. Jonathan knows about the Stone
Casting alternative, writing:
>Therefore,
>there needs to be some form of interface-like abstraction in which the
>downcast is disallowed. In Java, this requires the introduction of proxy
>objects. Proxy objects in turn break representation identity.
It indeed has an effect on object identity. Whereas previously "p1 == p2"
would and should succeed, since they dynamically designate the same object,
in the stone-casting case "p1 == p2" would fail, since they designate
different objects. I believe this is the only correct answer. Whether two
objects are forseen facets on the same state, or whether they share state
through other means is their business, not the business of their clients.
Cast in Steele
Why is this the only stance in the spirit of Java's semantics? To anyone
who still feels confused about the relationship of static and dynamic types
in Java, I highly recommend Chapter 4 of Guy Steele's "The Java Language
Specification". It's first few pages make clear that the type of a variable
is simply a static constraint on the possible values the variable may have,
and that a variable of type T may hold a (reference to) an object of a
subtype of T. The type of the variable has no effect on the object it
designates.
I know Jonathan knows all this. He's just proposing a Java variant that
doesn't work according to this description. I simply want to say that this
wouldn't be a minor semantic tweak -- it would be an earthquake -- even if
only three lines of code need to be changed in the implementation and three
lines of text in the tutorial. But the proof of the pudding and all that.
Should there come to be a serious secure language proposal along
the lines Jonathan suggests, I'm sure we'd all learn a great deal. In the
meantime, I will continue to approach the relationship between capabilities
and types in the traditional way -- according to the dynamic perspective.
Cheers,
--MarkM