[e-lang] Indecision is the mother of convention (was: Handling Symbolic Data)

Mark S. Miller markm at caplet.com
Sun Apr 25 14:57:16 EDT 2004


I am desperate to put the Unicode issue to bed adequately for 0.9/alpha, and 
proceed to specify the more interesting aspects of Kernel-E. However, specs 
proceed bottom-up for good reasons, and there were genuine correctness and 
security issues at stake, and Unicode is a whole world unto itself (which I 
hope most E programmers will never need to learn about). So pinning this 
down was much harder than I thought it would be. The new draft proposal can 
be found at http://www.erights.org/data/common-syntax/index.html . I have 
documented there the draft proposed decisions, the rationale for them, and 
remaining open questions. I await your comments.


At 09:31 PM 4/18/2004  Sunday, Mark S. Miller wrote:
>At 08:58 PM 4/18/2004  Sunday, Jonathan S. Shapiro wrote:
>>PLEASE let us not re-invent string representation! This issue has been
>>firmly beaten to death in the context of the design of other languages,
>>and there are established solutions. Even if they are inferior, use
>>them.
>>[out of order]
>>There simply *isn't* a right solution here. Pick a language that you
>>respect -- perhaps Perl or Python, both of which have extensive
>>experience with this issue -- and follow their lead.
>
>I would very much like to delegate to an adequate prior design. 
>Pointers please?

From web searching, I found nothing relevant about Perl. Perhaps I didn't 
search well enough. Pointers would still be appreciated.

I include on these draft pages the relevant links I found into the 
literatures of Python, Java, W3C, and of course Unicode itself. (See 
especially 
http://www.erights.org/data/common-syntax/baking-chars.html#only_bmp .) 
I found surprisingly little from Python that was helpful. I was very 
pleasantly surprised at how helpful various W3C documents were.

In writing this, I think I found a bug in RFC 2119.



At 09:09 PM 4/18/2004  Sunday, Mark S. Miller wrote:
>At 07:31 PM 4/18/2004  Sunday, David Hopwood wrote:
>>Mark S. Miller wrote:
>>>Should any surrogate code points remain after these steps, the source text MUST be statically rejected.
>>>E 0.9 implementation limit: If the above decoding would result in a character larger than an E platform's implementation limits, that source text MUST be statically rejected. Given that E 0.9's limits are 0..0xFFFF, it suffices to do a UTF-8 decode and reject the source text if this produces either any large characters or any surrogate code points.
>>
>>Agreed for the time being.
>
>It sounds like we have agreement (for the time being) on the proposed 
>operation of E 0.9, but disagreement on what these operational rules mean in 
>the larger Unicode context.

Are these pages consistent with your recommendations and our agreements? 
(AFAIK, they are, but I'd appreciate having you look it over. Thanks.)
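
To make the quoted E 0.9 implementation limit concrete, here is a minimal 
sketch in Python rather than E. It assumes the source text arrives as raw 
UTF-8 bytes; the names check_source_text and E09_MAX_CODE_POINT are 
hypothetical, chosen only for illustration, and are not taken from the draft 
pages or from any E implementation.

```python
# Upper bound on code points, per the quoted E 0.9 limits of 0..0xFFFF.
E09_MAX_CODE_POINT = 0xFFFF


def check_source_text(raw: bytes) -> str:
    """Decode UTF-8 source bytes, rejecting text outside the E 0.9 limits.

    Raises ValueError (UnicodeDecodeError is a subclass) on rejection.
    """
    # "surrogatepass" lets encoded surrogates through the decoder, so that
    # the explicit check below, not the codec, is what rejects them.
    text = raw.decode("utf-8", errors="surrogatepass")
    for index, ch in enumerate(text):
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(
                f"surrogate code point U+{cp:04X} at index {index}")
        if cp > E09_MAX_CODE_POINT:
            raise ValueError(
                f"character U+{cp:04X} at index {index} exceeds the limit")
    return text
```

Python's default strict UTF-8 decoder would already refuse encoded 
surrogates; the sketch performs the check by hand only to mirror the two 
rejection conditions in the quoted rule.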

Btw, after learning more about Unicode than I could stand, I went back and 
read our correspondence. It all makes a lot more sense to me now!


Altogether, I came out of this experience with a newfound respect for how 
good a job a committee can do. Unlike IEEE floating point, the Unicode 
consortium took on a problem too large to ever fit in the head of one person 
or small group. Although many committees allow politics to distract them 
from technical issues, in this case, they took on a problem that was 
inherently political in its nature. They nevertheless found (or imposed) 
abstractions across languages, in order to produce a set of rules that one 
person can hold in their head, so that they can write programs that, for 
example, a Uyghur speaker can use to write text, even though the programmer 
has never heard of Uyghur and the user may not know any English.

http://www.unicode.org/standard/WhatIsUnicode.html

The extent of the market is also limited by the divisibility of labor.
However flawed, their efforts to find cross-language abstractions that enable 
such an astonishing division of labor are nothing less than heroic.


-- 
Text by me above is hereby placed in the public domain

        Cheers,
        --MarkM


