[E-Lang] Draft Kernel-E DTD & Sketch of translation to debuggable Java

Dan Bornstein danfuzz@milk.com
Wed, 27 Sep 2000 15:31:40 -0700 (PDT)


MarkM writes:
>It seems there are two cases: the non-mobile-code-supporting case, and the 
>mobile-code-supporting case.  For the first, it seems JOSS already does the 
>right thing: the unserialization fails immediately upon encountering an 
>instance of the unknown class.  What else could it, or anything else, do?

I'll grant that at the api level, it does the right thing in this case, but
I won't grant that the amount of underlying bytes that had to be written is
necessarily the best thing, nor will I grant that the amount of code that
has to be run in order to generate those bytes is necessarily the best
thing.

>For the mobile-code supporting case, here's my approximate plan:
>
>Have each side of a connection maintain a table of the serialvers of the 
>classes it believes the other side already knows how to unserialize.

The thing I'd avoid like the plague is trying to pretend that two classes
with different serializations should ever be considered the "same" class,
even if they have the same name. If what you're saying is "classes won't
have any name except for the serialver id," then I think you're probably
safe. RMI got this wrong.

>If the class isn't in the table and this class isn't the result of
>compiling E to Java, then fail, since only E code is mobile.

Note that one thing you'll have to be careful with now is that, across
versions of E, your E-to-Java translator has to ensure that it produces
serialver-equivalent code, or at least goes to extra lengths to deal with
the problems. (Did you fix a bug that used to spit out an extra instance
variable because it should've been marked transient? Did you fix a bug that
affects the names of the Java instance variables?) That is, without putting
JOSS in the picture, the requirements had to do with verifying Kernel E
trees. With JOSS, the two Es have to agree, to at least some extent, on how
to turn Kernel E into Java classes.

If it were me designing the format (and, again I know it's not), I'd just
stick with a predefined set of agreed-upon classes at the Java layer. In
terms of serializing the state of E-written objects, I'd just have an
<object> tag that looked something like this:

    <object>
      <class>9393821930da43983</class>
      <iv><name>x</name><int>10</int></iv>
      <iv><name>y</name><int>22</int></iv>
    </object>

Which would directly serialize from/deserialize to a state bundle object of
some sort, which you could then turn into the real object you cared about.
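
To make the "state bundle" idea a little more concrete, here is a minimal
sketch in Java. Everything in it is an assumption for illustration: the
StateBundle class, its field names, and the choice to handle only <int> and
<string> values are mine, not part of any existing E or JOSS API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical state bundle: an opaque class id plus named instance-variable
// values. It is plain data; turning it into the "real" object is a later step.
final class StateBundle {
    final String classId;
    final Map<String, Object> ivs = new LinkedHashMap<>();

    private StateBundle(String classId) { this.classId = classId; }

    // Parses one <object> element of the shape shown above.
    static StateBundle parse(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        if (!root.getTagName().equals("object")) {
            throw new IllegalArgumentException("expected <object>");
        }
        String classId = null;
        Map<String, Object> ivs = new LinkedHashMap<>();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element)) continue;   // skip whitespace text nodes
            Element e = (Element) n;
            if (e.getTagName().equals("class")) classId = e.getTextContent().trim();
            else if (e.getTagName().equals("iv")) readIv(e, ivs);
        }
        if (classId == null) throw new IllegalArgumentException("missing <class>");
        StateBundle bundle = new StateBundle(classId);
        bundle.ivs.putAll(ivs);
        return bundle;
    }

    // Reads one <iv><name>..</name><int|string>..</..></iv> entry.
    private static void readIv(Element iv, Map<String, Object> into) {
        String name = null;
        Object value = null;
        for (Node n = iv.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element)) continue;
            Element e = (Element) n;
            String text = e.getTextContent();
            switch (e.getTagName()) {
                case "name":   name = text.trim(); break;
                case "int":    value = Integer.valueOf(text.trim()); break;
                case "string": value = text; break;
                default: throw new IllegalArgumentException(
                    "unsupported tag in <iv>: " + e.getTagName());
            }
        }
        if (name == null || value == null) {
            throw new IllegalArgumentException("<iv> needs a name and a value");
        }
        into.put(name, value);
    }
}
```

The point of the design is that the parser never needs to load or even know
about the class named by the id; it only produces data.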

>If the object's behavior is defined in E, then serialize its Kernel-E parse 
>tree into a packet to be transmitted ahead of this one, and remember its 
>serialver in the table.

Modulo the above considerations, that sounds fine (and I'd be much
happier if RMI did that rather than what it actually does today).
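
For what it's worth, the bookkeeping in that plan might look roughly like
the following. This is only a sketch under my own assumptions: the
PeerClassTable name, the use of strings for packets, and the "define"
packet prefix are all made up for illustration. Note that classes are keyed
by serialver alone, never by name.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical per-connection table of the serialvers this side believes
// the other side already knows how to unserialize.
final class PeerClassTable {
    private final Set<String> knownToPeer = new HashSet<>();

    // preAgreed: serialvers of the classes both sides ship with.
    PeerClassTable(Collection<String> preAgreed) {
        knownToPeer.addAll(preAgreed);
    }

    // Returns the packets to send, in order, for one object.
    // kernelETree is null when the class did not come from compiling E.
    List<String> packetsFor(String serialver, String kernelETree, String statePacket) {
        List<String> out = new ArrayList<>();
        if (!knownToPeer.contains(serialver)) {
            if (kernelETree == null) {
                // Only E code is mobile, so there is nothing we could send ahead.
                throw new IllegalStateException(
                    "class " + serialver + " is unknown to the peer and is not E code");
            }
            // Send the behavior ahead of the state, then remember that we did.
            out.add("define " + serialver + " " + kernelETree);
            knownToPeer.add(serialver);
        }
        out.add(statePacket);
        return out;
    }
}
```

The first object of a given E behavior costs two packets and every later
one costs a single packet, which is the same one-time-overhead shape that
JOSS has, except that here the overhead carries the behavior itself.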

>>If it were me designing the wire protocol (and I know it's not), then I'd
>>do something more like design the XML DTD first, having it contain things
>>directly meaningful, and then construct the mapping from that DTD to
>>appropriate Java objects, which would have to be instances of globally
>>agreed-upon immutable-forever (at least in terms of their serialization
>>format) classes. The JOSS format would be the happenstance of how those
>>objects serialized, rather than the essential thing from which the other
>>formats would be derived. 
>
>JOSS is well defined.  I don't understand what the "happenstance" above, or 
>the "whatever a two element Vector consisting of an Integer and a String 
>happens to serialize into" below is meant to imply.  

What I meant was that your spec will be easier to deal with if you *don't*
base it on JOSS, because of the overhead in the JOSS format and the fact
that its semantics are pretty opaque. Since JOSS interoperability is
important, though, you'd still provide a mapping to it, but the primary
spec wouldn't be driven by details of JOSS. That is, you'd design the DTD,
pick Java classes that correspond, and define the "E JOSS spec" as "the
Java serialization system, plus the fact that instances of these particular
classes are the only ones that will ever be encoded."
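
On the reading side, "these particular classes are the only ones" could be
enforced with something like the sketch below. The class names
AgreedClassesInputStream and EJoss are hypothetical; the only real API
relied on is ObjectInputStream.resolveClass(), which the standard library
calls once for each class descriptor it reads from the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.util.Set;

// Hypothetical reader that refuses any class outside the agreed-upon set.
// Dynamic proxy descriptors go through resolveProxyClass() instead and are
// not covered by this sketch.
final class AgreedClassesInputStream extends ObjectInputStream {
    private final Set<String> agreed;

    AgreedClassesInputStream(InputStream in, Set<String> agreed) throws IOException {
        super(in);
        this.agreed = agreed;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        if (!agreed.contains(desc.getName())) {
            throw new InvalidClassException(desc.getName(), "not an agreed-upon class");
        }
        return super.resolveClass(desc);
    }
}

// Small helpers so the reader can be tried out against in-memory bytes.
final class EJoss {
    static byte[] write(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                oos.writeObject(o);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object read(byte[] data, Set<String> agreed) {
        try (ObjectInputStream in =
                 new AgreedClassesInputStream(new ByteArrayInputStream(data), agreed)) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that superclasses count too: reading an Integer requires both
java.lang.Integer and java.lang.Number to be in the set, since each gets
its own class descriptor in the stream.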

>Isn't your proposed XML encoding equally the happenstance of how you (or
>hypothetically, I) decide to encode a vector of an integer and a string?

Yes, but what I meant to imply is that if you design a DTD for it, there
will be a better correspondence between concepts you wish to convey and
elements of the spec. Perhaps I should have contrasted it with what JOSS or
XMLized JOSS would have looked like for the same thing. I hereby rectify
that:

    import java.util.Vector;
    import java.io.FileOutputStream;
    import java.io.ObjectOutputStream;
    
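    // Serializes a two-element Vector (an Integer and a String) to foo.joss
    // so that the raw JOSS bytes can be inspected.
    public class Foo
    {
        static public void main (String[] args)
            throws Exception
        {
            Vector v = new Vector ();
            v.addElement (new Integer (1));
            v.addElement ("foo");
            FileOutputStream fos = new FileOutputStream ("foo.joss");
            ObjectOutputStream oos = new ObjectOutputStream (fos);
            oos.writeObject (v);
            oos.close ();   // flushes, and also closes the underlying fos
            fos.close ();   // redundant but harmless
        }
    }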

    Raw output (247 bytes):

    00: aced 0005 7372 0010 6a61 7661 2e75 7469  |,m..sr..java.uti|
    10: 6c2e 5665 6374 6f72 d997 7d5b 803b af01  |l.VectorY.}[.;/.|
    20: 0200 0349 0011 6361 7061 6369 7479 496e  |...I..capacityIn|
    30: 6372 656d 656e 7449 000c 656c 656d 656e  |crementI..elemen|
    40: 7443 6f75 6e74 5b00 0b65 6c65 6d65 6e74  |tCount[..element|
    50: 4461 7461 7400 135b 4c6a 6176 612f 6c61  |Datat..[Ljava/la|
    60: 6e67 2f4f 626a 6563 743b 7870 0000 0000  |ng/Object;xp....|
    70: 0000 0002 7572 0013 5b4c 6a61 7661 2e6c  |....ur..[Ljava.l|
    80: 616e 672e 4f62 6a65 6374 3b90 ce58 9f10  |ang.Object;.NX..|
    90: 7329 6c02 0000 7870 0000 000a 7372 0011  |s)l...xp....sr..|
    a0: 6a61 7661 2e6c 616e 672e 496e 7465 6765  |java.lang.Intege|
    b0: 7212 e2a0 a4f7 8187 3802 0001 4900 0576  |r.b $w..8...I..v|
    c0: 616c 7565 7872 0010 6a61 7661 2e6c 616e  |aluexr..java.lan|
    d0: 672e 4e75 6d62 6572 86ac 951d 0b94 e08b  |g.Number.,....`.|
    e0: 0200 0078 7000 0000 0174 0003 666f 6f70  |...xp....t..foop|
    f0: 7070 7070 7070 70                        |ppppppp|

    Approximate interpretation as JOSS/XML:

    <object>
      <class>
        <name>java.util.Vector</name>
        <iv>
          <name>capacityIncrement</name>
          <type>int</type>
        </iv>
        <iv>
          <name>elementCount</name>
          <type>int</type>
        </iv>
        <iv>
          <name>elementData</name>
          <type><array>java.lang.Object</array></type>
        </iv>
      </class>
      <iv><int>0</int></iv>
      <iv><int>2</int></iv>
      <iv>
        <array>
          <class>java.lang.Object</class>
          <el>
            <object>
              <class>
                  <name>java.lang.Integer</name>
                  <super-name>java.lang.Number</super-name>
              </class>
              <int>1</int>
            </object>
          </el>
          <el>
            <string>foo</string>
          </el>
        </array>
      </iv>
    </object>

I tried to preserve as much as possible of the structure in the XML (e.g.,
the first instance of a previously-unencountered class gets a description
of its serialization structure prepended to it, instance variable values
are set positionally (not by name), and strings get encoded specially
rather than just being instances of java.lang.String that contain arrays of
char inside them). I hope you agree that (politics aside) it'd be much
better if the XML form were more like this:

     <vector>
       <el><int>1</int></el>
       <el><string>foo</string></el>
     </vector>
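
A mapping from Java objects to that simpler form is short enough to sketch
here. The VectorXml class is a made-up name, and handling only Integer and
String elements is an assumption to keep the example small.

```java
import java.util.List;

// Hypothetical encoder from a list of Integers and Strings to the
// <vector>/<el> form shown above.
final class VectorXml {
    static String encode(List<?> elements) {
        StringBuilder sb = new StringBuilder("<vector>\n");
        for (Object o : elements) {
            sb.append("  <el>");
            if (o instanceof Integer) {
                sb.append("<int>").append(o).append("</int>");
            } else if (o instanceof String) {
                sb.append("<string>").append(escape((String) o)).append("</string>");
            } else {
                throw new IllegalArgumentException("unsupported element: " + o);
            }
            sb.append("</el>\n");
        }
        return sb.append("</vector>\n").toString();
    }

    // Escapes the characters that are significant in XML character data.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Nothing in the output mentions capacityIncrement, elementData, or any other
detail of how java.util.Vector happens to be implemented.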

Extra tidbit: I did another quick test where I wrote out a second
vector-of-int-and-string after the first to see what was up, and it looks
like there is indeed a lot of one-time overhead in the above. The XML for
the second vector (and children) would look something like this:

    <object>
      <class><backref>1</backref></class>
      <iv><int>0</int></iv>
      <iv><int>2</int></iv>
      <iv>
        <array>
          <class><backref>2</backref></class>
          <el>
            <object>
              <class><backref>3</backref></class>
              <int>1</int>
            </object>
          </el>
          <el>
            <string>bar</string>
          </el>
        </array>
      </iv>
    </object>

where the <backref>s refer to objects that had previously been encountered.
I'm dubious of the utility of this as a format to aid human debugging, at
least without some sort of comments (that, e.g., indicated what backref 7
actually refers to).
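
If you want to see the one-time overhead for yourself, a sketch along these
lines will measure it. The JossOverhead class is a hypothetical name, and
this differs slightly from my quick test in that the second vector holds a
different integer, so that the two Integer objects are guaranteed to be
distinct. I'm deliberately not quoting byte counts, since they may vary
across Java versions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Vector;

// Hypothetical measurement helper: writes two similar vectors to one
// ObjectOutputStream and reports how many bytes each one added.
final class JossOverhead {
    // Returns { bytes after the first vector, extra bytes for the second }.
    // The first figure includes the stream header.
    static int[] measure() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bytes);
            oos.writeObject(vectorOf(1, "foo"));
            oos.flush();
            int first = bytes.size();
            // Same stream, so class descriptors are now back-referenced
            // instead of being written out in full again.
            oos.writeObject(vectorOf(2, "bar"));
            oos.flush();
            int second = bytes.size() - first;
            oos.close();
            return new int[] { first, second };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static Vector<Object> vectorOf(int i, String s) {
        Vector<Object> v = new Vector<>();
        v.addElement(Integer.valueOf(i));
        v.addElement(s);
        return v;
    }

    public static void main(String[] args) {
        int[] m = measure();
        System.out.println("first vector:  " + m[0] + " bytes");
        System.out.println("second vector: " + m[1] + " bytes");
    }
}
```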

In terms of making your JOSS/XML use something more symbolic instead of
<backref>, you'd be moving away from the claim that the XML form is a
transparent translation of JOSS. And, if you're willing to go for that, I'd
say, just go the rest of the way and do something in XML that doesn't
necessarily look *anything* like the underlying JOSS but *does* have a
straightforward translation back and forth.

>In any case, I don't see how this deals with the problem you originally 
>raised: What to do when one side serializes a pass-by-copy object which 
>is an instance of a behavior the other side has never heard of?

I wasn't saying that I was solving that problem. I was just saying
JOSS doesn't bring anything to the table to solve the problem, and so
you'd be paying overhead and not getting a benefit.

>And most of all, they are the only way within the Pure Java rules to make
>use of their native methods for reaching into an object's instance
>variables efficiently.

You control the horizontal. You control the vertical. If you're spitting
out bytecode for these things, surely you can spit out a writeState()
instance method and a constructFromState() static method that access
whatever you need to.
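
Written out as Java source rather than bytecode, the generated class might
look something like this sketch. EPoint, its two slots, and the use of a
Map as the state representation are all assumptions for illustration; only
the writeState()/constructFromState() pairing comes from what I said above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for a class the E-to-Java translator might emit for an E object
// with two instance variables. No native methods or reflection are needed,
// because the generated methods have direct access to the private fields.
final class EPoint {
    private final int x;
    private final int y;

    EPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Instance method: copies the instance variables out into plain data.
    Map<String, Object> writeState() {
        Map<String, Object> state = new LinkedHashMap<>();
        state.put("x", x);
        state.put("y", y);
        return state;
    }

    // Static method: rebuilds an instance from previously written state.
    static EPoint constructFromState(Map<String, Object> state) {
        return new EPoint((Integer) state.get("x"), (Integer) state.get("y"));
    }

    int x() { return x; }
    int y() { return y; }
}
```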

>Ok, I give up.  I'm new to this XML DTD design thing, so help me out here.  
>Why didn't you say:
>
>     <vector>
>       <int>1</int>
>       <string>foo</string>
>     </vector>

No particularly good reason in the trivial example given. However,
off-the-cuff, perhaps the format allows for sparse population, so you might
be able to say this:

    <vector>
      <size>100</size>
      <el><index>5</index><int>1</int></el>
      <el><index>40</index><string>foo</string></el>
    </vector>

(That is, it'd be weird to stick the <index> tag in the <int> tag, but the
index needs to be associated with the element it modifies.) My example
above could be a degenerate form. Anyway, I wasn't *actually* trying to
design your DTD, just giving a quick example.
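
In the same quick-example spirit, here is a sketch of a decoder that
accepts both the sparse form and the degenerate dense form. The
SparseVectorXml name and the rules it applies (a missing <size> means "as
many slots as there are <el>s", a missing <index> means "the slot after the
previous one") are my own guesses at reasonable semantics, not a spec.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical decoder for <vector> with optional <size> and <index> tags.
final class SparseVectorXml {
    static Object[] decode(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        int size = -1;
        List<Element> els = new ArrayList<>();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element)) continue;   // skip whitespace text nodes
            Element e = (Element) n;
            if (e.getTagName().equals("size")) {
                size = Integer.parseInt(e.getTextContent().trim());
            } else if (e.getTagName().equals("el")) {
                els.add(e);
            }
        }
        Object[] out = new Object[size >= 0 ? size : els.size()];
        int next = 0;   // slot used when an <el> carries no explicit <index>
        for (Element el : els) {
            int index = next;
            Object value = null;
            for (Node n = el.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (!(n instanceof Element)) continue;
                Element c = (Element) n;
                String text = c.getTextContent();
                switch (c.getTagName()) {
                    case "index":  index = Integer.parseInt(text.trim()); break;
                    case "int":    value = Integer.valueOf(text.trim()); break;
                    case "string": value = text; break;
                    default: throw new IllegalArgumentException(
                        "unsupported tag in <el>: " + c.getTagName());
                }
            }
            out[index] = value;
            next = index + 1;
        }
        return out;
    }
}
```

Unpopulated slots come back as null, and the <el> wrapper is what gives the
<index> somewhere sensible to live.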

-dan