[E-Lang] Draft Kernel-E DTD & Sketch of translation to debuggable Java

Karp, Alan alan_karp@hp.com
Wed, 27 Sep 2000 14:22:20 -0700


> -----Original Message-----
> From: Mark S. Miller [mailto:markm@caplet.com]
> Sent: Wednesday, September 27, 2000 11:50 AM
> To: Karp, Alan
> Cc: dnm@pobox.com; e-lang@eros-os.org
> Subject: RE: [E-Lang] Draft Kernel-E DTD & Sketch of translation to debuggable Java
> 
> 
> At 09:05 AM 9/27/00 , Karp, Alan wrote:
> >In the first incarnation, the protocol was ASCII strings on sockets.  People
> >snickered because it was so simple, but isn't that what XML is?  ...
> 
> Yes.  But it's no longer simple, so the snickering stopped ;)
> 
> >Our first hope was that we could define the communication as the Java
> >serialization of an object.  It's a well-defined format that covers
> >everything you'd like to represent.  All you need is to have code in your
> >language to produce the right bytes.  It's not hard.  I've written a Perl
> >script to produce the format and a Python script to parse it.
> 
> Could you please please please send me this code??

Well, the Perl code was written to help with repackaging, so it actually
parses the class files.  That's quite similar to parsing the serialization,
though.  The main difference is that I was only interested in the import
statements I needed to generate, not the instance variables.  The Python
code is not complete, since I didn't put in anything I didn't need to parse
the sample on the serialization page at the Sun site.  I only wrote it as an
exercise to learn Python.  I'll send it for your amusement.  

> 
> But please, only under Mozilla-compatible open-source terms.  This usually
> means any open source license but GPL.
> 

There are no restrictions on its use, primarily because you'll have to do a
lot of work to finish it off.

> 
> >Hence, RMI serialization is as "language independent" as any other binary
> >format.
> 
> Indeed!

You'd be surprised at the number of people who don't believe it.

> 
> 
> >Unfortunately, we found this format to be overly verbose.  For example,
> >sending a single byte resulted in a 380-byte message.  That's why e-speak
> >Beta 2.2 used its own serialization.  This format used a single byte to
> >denote the base and e-speak types and an escape mechanism to extend this
> >flag to more than one byte for user-defined types.
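The one-byte type flag with an escape can be sketched as below.  The
message does not give e-speak's actual byte assignments, so the layout
here (type ids 0-254 as a single byte, 0xFF as the escape followed by a
2-byte big-endian user-defined type id) is a hypothetical illustration of
the idea, not the Beta 2.2 format.

```python
# Hypothetical layout: ids below ESCAPE are written as one byte; larger
# (user-defined) ids are written as ESCAPE plus a 2-byte offset.
ESCAPE = 0xFF


def encode_tag(type_id: int) -> bytes:
    """Encode a type id, using one byte whenever possible."""
    if 0 <= type_id < ESCAPE:
        return bytes([type_id])
    user_id = type_id - ESCAPE
    if not 0 <= user_id <= 0xFFFF:
        raise ValueError(f"type id out of range: {type_id}")
    return bytes([ESCAPE]) + user_id.to_bytes(2, "big")


def decode_tag(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (type_id, offset of the first byte after the tag)."""
    first = data[offset]
    if first != ESCAPE:
        return first, offset + 1
    if offset + 3 > len(data):
        raise ValueError("truncated escaped tag")
    user_id = int.from_bytes(data[offset + 1:offset + 3], "big")
    return ESCAPE + user_id, offset + 3


print(encode_tag(7).hex())    # 07
print(encode_tag(300).hex())  # ff002d
```

The design choice is that common types cost one byte on the wire, while
the escape keeps the type space open-ended for user-defined types.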
> 
> Yeah, it's got its problems too.  However, in order to play by the pure-Java
> rules, I feel stuck with it.  JOSS internally uses some private native
> methods for reaching into an object's instance variables.  I've looked
> carefully at the hooks JOSS provides for customization, which have grown as
> of 1.3.  But even with 1.3, the only way to make use of these native methods
> for high-speed serialization is to use the JOSS format.  But I don't know
> how the time gained by these native methods trades off against the time lost
> in generating and parsing a more verbose format.  Do you have any data on that?

Sorry, no, but I do know that going through JNI costs a substantial amount of
time (~ms?).  Hence, if you're doing something reasonably small, it isn't
worth it.

> 
> 
> >We got beaten up for being "proprietary", whatever that means.  People said
> >we should use HTTP.  In my view, the protocol part of HTTP deals only with
> >the transport part of the problem, not the payload part.  That criticism
> >disappeared when XML came onto the stage.  Then we were beaten up for not
> >using XML.  At least this criticism has some merit.  Anyone with a generic
> >XML parser can process the document.  Of course, they still need
> >application-specific knowledge to know what to do with the fields.
> 
> This experience corroborates my expectations of the politics of rolling out
> a protocol these days.  Thanks.
> 
> 
> >XML is good, not for the human-readable part, but because it solves the
> >syntax problem.  The parties need not all produce the same sequence of bytes
> >as long as they obey the same schema.  That's good.  It makes the whole
> >communication piece less fragile.  Misplace a byte in a binary format, and
> >the entire message is unusable; do it in an XML document, and at most one
> >field is garbage.
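The fragility of a fixed binary layout is easy to demonstrate.  In this
sketch the record layout (a 2-byte id, a 4-byte count, and an 8-byte
price, big-endian with no padding) is a hypothetical example; a single
stray byte written after the id shifts every later field.

```python
import struct

# Hypothetical fixed layout: 2-byte id, 4-byte count, 8-byte double price.
RECORD = struct.Struct(">HId")

good = RECORD.pack(7, 3, 9.5)
# Simulate a construction error: the writer emits one stray byte after the
# id, and the message is cut back to the expected 14 bytes.
bad = good[:2] + b"\x00" + good[2:-1]

print(RECORD.unpack(good))  # (7, 3, 9.5)
print(RECORD.unpack(bad))   # the id survives; count and price come out wrong
```

Nothing in the bytes tells the reader where the damage begins, whereas a
tagged format such as XML delimits each field separately.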
> 
> I don't see this as an advantage.  Once layered on top of a reliable byte
> stream, protocols should be fail-stop.  To continue after something you
> didn't understand is to risk madness.  Or, as I used to say at EC:
> 
>      Death Before Confusion!
> 
> Ironically, it looks like the JOSS format has more error recovery ability
> than XML, but Sun wisely doesn't make use of that property.
> 

I wasn't talking about transmission errors, but about construction errors.
I agree that it is wrong to silently accept errors, but you'd like to do a
complete job reporting what's wrong.  That's hard if a single byte screws up
the rest of the data stream.  Hence, I would change your motto to 

	Symptoms Before Confusion!
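A receiver can honor both mottos: check every field, report all the
symptoms at once, and still refuse the message.  The sketch below uses
Python's standard xml.etree.ElementTree; the field names and their
converters are a hypothetical schema chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical message schema: field name -> converter for its text.
REQUIRED_FIELDS = {"id": int, "price": float, "name": str}


def check_message(xml_text: str) -> dict:
    """Parse a message, collecting every bad field before refusing it."""
    root = ET.fromstring(xml_text)  # malformed XML still stops here at once
    values, problems = {}, []
    for field, convert in REQUIRED_FIELDS.items():
        elem = root.find(field)
        if elem is None or elem.text is None:
            problems.append(f"{field}: missing")
            continue
        try:
            values[field] = convert(elem.text)
        except ValueError:
            problems.append(f"{field}: cannot parse {elem.text!r}")
    if problems:
        # Still fail-stop: nothing is returned, but the sender sees every symptom.
        raise ValueError("; ".join(problems))
    return values
```

For example, a message with `<id>x7</id>` and no `<name>` element is
rejected with both problems listed, rather than only the first one.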

The other place where XML wins over a binary format is when versions change.
If you're hit with a strange byte in a binary protocol, all you can do is
punt.  If you're hit with a missing field in XML, and you know the other guy
is back-level, you can provide an architected default if that makes sense.
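A minimal sketch of such an architected default, again using
xml.etree.ElementTree.  The message shape (an `<order>` with a `version`
attribute, where version 2 added a `<currency>` element defaulting to
"USD") is a hypothetical example, not anything from e-speak.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: version 2 of <order> added <currency>.  Version-1
# senders never emit it, so the receiver supplies a default fixed in
# advance by the (hypothetical) protocol specification.
ARCHITECTED_DEFAULTS = {"currency": "USD"}


def read_order(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    version = int(root.get("version", "1"))

    amount = root.findtext("amount")
    if amount is None:
        raise ValueError("amount: missing")  # required in every version
    order = {"amount": float(amount)}

    currency = root.findtext("currency")
    if currency is not None:
        order["currency"] = currency
    elif version < 2:
        # Back-level sender: the field did not exist yet, so use the default.
        order["currency"] = ARCHITECTED_DEFAULTS["currency"]
    else:
        # A current-level sender left it out: that is an error, not an old peer.
        raise ValueError("currency: missing from a version-2 message")
    return order


print(read_order('<order version="1"><amount>12.5</amount></order>'))
# {'amount': 12.5, 'currency': 'USD'}
```

The default applies only when the version says the field could not have
been sent; otherwise the missing field is still treated as an error.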

> 
> >The downside is latency.  It takes about 100 ms to parse an XML document of
> >modest size.  You only save 1/2 to 3/4 of that time if you pass the DOM tree.
> >This overhead would have been unacceptable in e-speak Beta 2.2, but DR 3.0
> >is focused on B2B communication, which can tolerate such latencies.
> 
> Until I have good reason to believe otherwise, I'm going to proceed assuming
> protocol speed is important for E.
> 

"Premature optimization is the root of all evil."
						- Don Knuth

> 
> 
>          Cheers,
>          --MarkM
> 

_________________________
Alan Karp
Decision Technology Department
Hewlett-Packard Laboratories MS 1U-2
1501 Page Mill Road
Palo Alto, CA 94304
(650) 857-3967, fax (650) 857-6278