Static Types vs Dynamic Capabilities (was: a few thoughts)

Mark S. Miller markm@caplet.com
Sun, 16 Jul 2000 21:14:11 -0700


Note: Since this is a capability-language-oriented discussion, rather than 
an OS-oriented discussion, I'm cc'ing the e-lang list.  For reasons that 
will be clear, I'm also cc'ing Jonathan Rees.

At 10:57 AM 7/16/00 , Jonathan S. Shapiro wrote:
>In any case, my objection was not to downcasting inherently. It was to the
>fact that downcasting is implicitly a feature of all interfaces. I have no
>objection to a keyword on the interface that says "downcast from interface
>to class is permitted". My objection is that there are places where for
>security reasons such a downcast should *not* be permitted because the type
>thinning represented by the interface must be strongly enforced. Therefore,
>there needs to be some form of interface-like abstraction in which the
>downcast is disallowed. In Java, this requires the introduction of proxy
>objects. Proxy objects in turn break representation identity.

As Jonathan & I have discussed on the phone, Jonathan's objection comes from 
a perspective that may well yield a coherent language with capability-like 
security and that might look like Java, but such a language would not have a 
Java-like semantics, and it would also not have a semantics like other 
capability languages (Joule http://www.agorics.com/joule.html , E 
http://www.erights.org , and W7 -- the Scheme of Rees's thesis 
http://mumble.net/jar/pubs/secureos/ ).  The crucial difference between 
Jonathan's language perspective and these other systems is whether static or 
dynamic properties are held to be primary.  On this issue, Jonathan has the 
static-oriented perspective, and the above systems embody the dynamic 
perspective.  Curiously, EROS & KeyKOS faithfully implement the purely 
dynamic perspective (though I believe Jonathan disagrees with this claim, 
and possibly with my construal of our perspective difference).


                         Purely Dynamic Capability Languages


Joule, E, & W7 have no static type system, so only dynamic properties of 
these languages can produce their security.  A capability corresponds to an 
object/closure reference.  An object/closure combines state and behavior.

* We can think of the "state" as being the object's instance variables, or 
   the set of variable locations used freely by the lambda expression that 
   evaluated to this closure (the variable locations "closed over").  This 
   state corresponds to the domain identity encoded into an EROS/KeyKOS start 
   key.

* We can think of the "behavior" as the vtable-pointer (thinking object 
   implementation) or code compiled from the lambda expression.  This 
   corresponds to the KeyKOS "data byte" or the EROS ?? (sorry, I forgot what 
   it's called again).

In any of these implementation domains, when multiple capabilities designate 
the same state but associate this state with different behavior, we refer to 
the set of capabilities jointly as a "composite", and each of them 
individually as "forseen facets" of this composite.  The important thing to 
keep in mind is that the language notion of "object" does *not* correspond 
to a domain, but to a forseen facet of a domain.  

(See http://www.erights.org/elib/capability/ode/ode-objects.html and 
http://www.erights.org/elib/capability/ode/ode-capabilities.html for a more 
complete explanation of this mapping between language concepts and 
capability concepts.  The "forseen" terminology was introduced at 
http://www.eros-os.org/~majordomo/e-lang/1514.html and elaborated some at 
http://www.eros-os.org/~majordomo/e-lang/1526.html .)


                                       Thinning, Take 1


The "thinning" Jonathan refers to is a forseen less powerful facet 
obtained by request to a more powerful facet.  The thinner facet's behavior 
is to respond successfully to a subset of the messages the thicker facet 
responds to -- the thicker facet's type is a subtype of the thinner facet's 
type, ie, the set of objects that implement the thicker type are a 
subset of the set of objects that implement the thinner type.

For those messages it does respond to successfully, it does the same thing 
the thicker facet would have done -- the message has the same meaning as it 
would have were it sent to the thicker facet instead.  When this isn't true, 
we still have facets, and one might still be a subtype of the other, but it 
doesn't fit our definition of the "thinning" pattern.  As we'll see, this 
precise definition is important to our disagreement, because only in the 
thinning pattern may an implementation use the thick vs thin difference 
*only* in deciding whether to reject a message.


                                              Java


Despite Norm's unfortunate statements at 
http://www.mediacity.com/~norm/CapTheory/CapBits.html , Java is not a 
capability language.  It does not have the security properties required of 
capability security.  In particular, mutable static variables provide an 
unconditional communications pathway not subject to capability discipline.  

However, Norm knows all this as well as I do.  (Norm, please correct this 
page!)  What I think he's getting at is that Java is a great start on the 
design of a capability language.  Any secure system, language or not and 
capability or not, must provide perimeter security 
http://www.erights.org/elib/capability/delegations.html -- keeping the bad 
guys on the outside of some encapsulation boundary.  For an object language, 
the natural boundary is the object.  In Java terminology -- the difference 
between "public" and "private".  Java's unforgeable pointers and static 
verification provides this beautifully.

In addition, capability security requires a protection-crossing invocation 
mechanism with the properties of the Granovetter Operator (the ability to 
convey capabilities in messages, etc) and it requires scoping rules 
consistent with capability rules -- such as strict lexical scoping.  (As 
Jonathan Rees says (paraphrasing): Names in the lambda calculus serve the 
same function as clist indexes in capability OSes.)  Java has all these, 
even though Java's designers were not trying to create a capability 
platform.  It got most of the way there just by addressing the applet 
perimeter security problem, combined with reasonable taste in modern lexical 
object-oriented safe language design.  (Indeed, Scheme and Smalltalk got 
even closer just on good taste, without thinking about security at all.)


                              Java, Static Types, and Security


The one security-relevant feature of Java absent from Joule, E, or 
W7 is a static type system.  How might this static type system benefit 
Java's security (or the security of a capability-secure variant of Java)?  
Arguably, it allows the verifier to do more of the 
perimeter-security-enforcing calculations statically, enabling the language 
implementation to do fewer of them at dynamically, and so achieves protection 
with greater efficiency.  How else might Java's security benefit from static 
types?  In a variety of minor ways, all of which but one have dynamic 
equivalents that are just as good (and, I would claim, clearer to 
understand).  The one exception?  Java's package scope enables the "Sibling 
Communications" style of Rights Amplification (
http://www.mediacity.com/~norm/CapTheory/Synergy.html and 
http://www.gwydiondylan.org/drm/drm_26.htm ) to be used with no additional 
runtime cost.  By contrast, Joule, E, and W7 do rights amplification with 
sealer/unsealer pairs, which (at least naively) entails a bit of runtime 
computation.  (Either primitive is easily built out of the other.  The only 
expressiveness issue in choosing between these two styles is that 
sealer/unsealer pairs provide finer-grained control of authority.)

How should we regard a static type system?  The dynamic-first perspective 
(as I hereby define it ;) ) holds that the "real" semantics of the program 
is what happens at runtime.  In any program accepted by the static type 
checker, were one to remove the variable type declarations, it would still 
mean the same thing.  The variable type declarations are merely a notation 
for stating assertions that the programmer believes must be provably true 
for all possible executions of all possible instantiations of the program.  
Since these assertions are supposed to be provably true for all possible 
instantiations, their proof can and should happen prior to any actual 
instantiation.  The programmer typically also depends on the development 
system to prevent instantiation of any program for which a proof of these 
assertions could not be generated.

This is a good first approximation to the meaning of type declarations in 
Java, but with some exceptions (feel free to skip):

* Support for Sibling Communication via package scope.  Were this done 
   purely on the runtime package membership of invoker and invokee, as opposed 
   to the static type of the variable with which invoker designates an invokee, 
   a Confused Deputy problem could result 
   http://www.mediacity.com/~norm/CapTheory/ConfusedDeputyM.html .  (Java's 
   reflective invocation mechanism does exactly do only the runtime check, and 
   so suffers from Confused Deputy.)  The sealer/unsealer alternative rights 
   amplification style does not have this danger.

* Overload resolution.  Java methods are overloaded on the static type of 
   the invoking argument expressions, rather than the dynamic type of the 
   argument values.  Because Java has subtyping, these can give different 
   answers.  However, were we to replace the message name at the call site with 
   the signature resulting from the static overload resolution, static types 
   would have no further effect on the runtime semantics of invocation.

* What else?  Java's a complex language, so I'm sure there are a few more 
   things like this I haven't thought of.  But I'll speculate that the others 
   are no more fundamental than the above two issues.  At least for a 
   hypothetical capability-secure Java-like language.


                         Thinning, Take 2: The Seduction


The only difference between a thick facet and a derived thin facet is that 
the thin facet allows only a subset of the operations allowed by the thick 
facet.  And for this subset, the operations have the same meaning.  In 
describing sensible thinning patterns, especially of forseen thinnings, the 
thin interface should be an explicit named type, and the thick interface an 
explicit named subtype.  In mapping all this to Java, it is perfectly 
sensible to map each thin supertype as a separate Interface type, and have 
the thick type 'extend' or 'implement' all the types for which it forsees a 
thinning.  All good so far.

The seduction comes in our interpretation of 

     Thick p1 = ...;
     Thin p2 = p1;

Is p2 a thinning of p1?  Certainly Jonathan & I agree that, in current Java 
it is not.  p2 provides all the authority of p1 because the programmer may 
at any later time do:

     Thick p3 = (Thick)p2;

Our difference is that Jonathan views this property as a bug, whereas I view 
it as a feature.  p2 seems seductively like a thinning of p1 because, in the 
absence of the use of the cast operation, the program is indeed prevented 
from recovering from p2 the authority to invoke the operations described by 
Think but not Thin.  However, from my perspective, p1, p2, p3 all have the 
same dynamic meaning -- they are all pointers to the same object -- and so 
they are, and should be, security equivalent.  "p1 == p2" should succeed.  
The difference in static types of these variables is merely an artifact of 
trying to reason statically about all possible instantiations/executions of 
this code.  p2 represents less static information than does p1.  A cast is 
effectively an if:

     if (p2 instanceof Thick) {
         Thick p3 = p2;
     } else {
         throw ...
     }

Before the 'if', we only know (or we can only prove to the type checker) 
that p2 is a kind of Thin.  The 'if' effectively allows us to recover static 
information from dynamic.  Statically, we don't know which branch of the 
'if' will be taken, but we know that we'll be in the then-branch iff p2 is a 
kind of Thick.  As Norm points out, the situation isn't very different from

     if (d != 0.0) {
         ... n / d ...
     } else {
         ...
     }

In the code before the test, let's say that d may be any floating point 
value.  Within the then-branch of the if, d may be any non-zero value.  Note 
that these are static statements.  Dynamically, in any execution d may only 
have one value.  The 'if' reasons dynamically about the actual value of d.  
In describing the meaning of the if, we reason statically about the possible 
values of d.  We could describe our greater knowledge of d's values within 
the then-branch as a subrange type if we wish.  This subrange type would be 
a subtype of the type of d outside the branch.

Of course, I'm not advocating that anyone define a type system to do this.  
And I'm certainly not advocating that one design a language that prohibits 
division in the absence of a static proof that the denominator is not zero!  
But that's part of the point.  The programmer will always have more static 
knowledge of properties of their program than are captured in any type 
system.  Sure, you could add parameterized types to Java.  But what about an 
array in which we store alternating keys and values -- a common 
implementation technique?

No type system I'm aware of lets us say that the even element contain 
key-typed objects and the odd contain value-typed objects.  Nor should 
anyone design such a type system.  But in the absence of type systems that 
can carry out any static proof the programmer can, the type system will 
always lose knowledge the programmer did not lose.  The cast in Java is 

1) a way for the programmer to re-introduce static knowledge that he 
   couldn't get the type checker to carry for him, and 
2) an 'if' as above to abort the execution path, via a thrown exception, if 
   the programmer was wrong.

This is all useful merely as a conversation between the programmer and the 
type checker helping each other to reason statically about the program.  The 
knowledge loss from p1 to p2 is very different than the authority loss that 
a thinning operation needs to provide.  The thinning operation must effect 
what's possible at runtime, rather than merely reason about it.


                          Stone Casting (from Communities.com)


In Original-E http://www.communities.com/products/tools/e/e_white_paper.html 
, an Java variant attempting to fix Java's security problems, we also felt, 
like Jonathan, that thinning was a useful authority reduction pattern, and 
that Java interface types & interface type inheritance was a useful way to 
express the thinnings you want to be possible.  However, we made thinning 
into an operation distinct from static type loss -- "stone casting" as in 
"cast in stone".  With the proxies of Java 1.3 
http://java.sun.com/j2se/1.3/docs/guide/reflection/proxy.html , we can now 
express this in Java itself without building a new language.  Contrast the 
above

     Thin p2 = p1;

with

     Thin p2 = (Thin)StoneCast.new(p1, Thin.class);

This latter returns a new object that implements the Thin interface but not 
Thick, and forward any This message on to the object it wraps -- the object 
designated by p1.  An attempt to recover the authority absent from p2:

     Thick p3 = (Thick)p2;

will fail, as p2 is not a kind of Thick.  Jonathan knows about the Stone 
Casting alternative, writing:

>Therefore,
>there needs to be some form of interface-like abstraction in which the
>downcast is disallowed. In Java, this requires the introduction of proxy
>objects. Proxy objects in turn break representation identity.

It indeed has an effect on object identity.  Whereas previously "p1 == p2" 
would and should succeed, since they dynamically designate the same object, 
in the stone-casting case "p1 == p2" would fail, since they designate 
different objects.  I believe this is the only correct answer.  Whether two 
objects are forseen facets on the same state, or whether they share state 
through other means is their business, not the business of their clients.  


                               Cast in Steele

Why is this the only stance in the spirit of Java's semantics?  To anyone 
who still feels confused about the relationship of static and dynamic types 
in Java, I highly recommend Chapter 4 of Guy Steele's "The Java Language 
Specification".  It's first few pages make clear that the type of a variable 
is simply a static constraint on the possible values the variable may have, 
and that a variable of type T may hold a (reference to) an object of a 
subtype of T.  The type of the variable has no effect on the object it 
designates.

I know Jonathan knows all this.  He's just proposing a Java variant that 
doesn't work according to this description.  I simply want to say that this 
wouldn't be a minor semantic tweak -- it would be an earthquake -- even if 
only three lines of code need to be changed in the implementation and three 
lines of text in the tutorial.  But the proof of the pudding and all that.  
Should there come to be a serious secure language proposal along 
the lines Jonathan suggests, I'm sure we'd all learn a great deal.  In the 
meantime, I will continue to approach the relationship between capabilities 
and types in the traditional way -- according to the dynamic perspective.



         Cheers,
         --MarkM