[e-lang] Indecision is the mother of convention (was: Handling
Symbolic Data)
Mark S. Miller
markm at caplet.com
Sun Apr 25 14:57:16 EDT 2004
I am desperate to put the Unicode issue to bed adequately for 0.9/alpha, and
proceed to specify the more interesting aspects of Kernel-E. However, specs
proceed bottom-up for good reasons, and there were genuine correctness and
security issues at stake, and Unicode is a whole world unto itself (which I
hope most E programmers will never need to learn about). So pinning this
down was much harder than I thought it would be. The new draft proposal can
be found at http://www.erights.org/data/common-syntax/index.html . I have
documented there the draft proposed decisions, the rationale for them, and the
remaining open questions. I await your comments.
At 09:31 PM 4/18/2004 Sunday, Mark S. Miller wrote:
>At 08:58 PM 4/18/2004 Sunday, Jonathan S. Shapiro wrote:
>>PLEASE let us not re-invent string representation! This issue has been
>>firmly beaten to death in the context of the design of other languages,
>>and there are established solutions. Even if they are inferior, use
>>them.
>>[out of order]
>>There simply *isn't* a right solution here. Pick a language that you
>>respect -- perhaps Perl or Python, both of which have extensive
>>experience with this issue -- and follow their lead.
>
>I would very much like to delegate to an adequate prior design.
>Pointers please?
From web searching, I found nothing relevant about Perl. Perhaps I didn't
search well enough. Pointers would still be appreciated.
I include on these draft pages the relevant links I found into the
literatures of Python, Java, W3C, and of course Unicode itself. (See
especially
http://www.erights.org/data/common-syntax/baking-chars.html#only_bmp .)
I found surprisingly little from Python that was helpful. I was very
pleasantly surprised at how helpful various W3C documents were.
In writing this, I think I found a bug in RFC 2119.
At 09:09 PM 4/18/2004 Sunday, Mark S. Miller wrote:
>At 07:31 PM 4/18/2004 Sunday, David Hopwood wrote:
>>Mark S. Miller wrote:
>>>Should any surrogate code points remain after these steps, the source text MUST be statically rejected.
>>>E 0.9 implementation limit: If the above decoding would result in a character larger than an E platform's implementation limits, that source text MUST be statically rejected. Given that E 0.9's limits are 0..0xFFFF, it suffices to do a UTF-8 decode and reject the source text if this produces either any large characters or any surrogate code points.
>>
>>Agreed for the time being.
>
>It sounds like we have agreement (for the time being) on the proposed
>operation of E 0.9, but disagreement on what these operational rules mean in
>the larger Unicode context.
Are these pages consistent with your recommendations and our agreements?
(AFAIK, they are, but I'd appreciate having you look it over. Thanks.)
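The E 0.9 rule quoted above (UTF-8 decode, then reject any surrogate or any character above 0xFFFF) can be sketched roughly as follows. This is only an illustration in Python, not the E implementation: the names `decode_source`, `SourceRejected`, and `E09_MAX_CODE_POINT` are hypothetical, and the sketch covers only the simplified 0.9 check, not any earlier decoding steps the draft pages may specify.

```python
# Hypothetical sketch of the E 0.9 source-text check quoted above.
# None of these names come from the E implementation.

E09_MAX_CODE_POINT = 0xFFFF  # E 0.9 implementation limit: 0..0xFFFF
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF


class SourceRejected(ValueError):
    """Raised when source text must be statically rejected."""


def decode_source(raw: bytes) -> str:
    """Decode UTF-8 source bytes, rejecting surrogates and large characters."""
    try:
        # "surrogatepass" lets encoded surrogates through the decoder so
        # that this function can report them itself; other malformed
        # UTF-8 still raises UnicodeDecodeError.
        text = raw.decode("utf-8", errors="surrogatepass")
    except UnicodeDecodeError as exc:
        raise SourceRejected(f"not well-formed UTF-8: {exc}") from exc

    for index, ch in enumerate(text):
        code_point = ord(ch)
        if SURROGATE_FIRST <= code_point <= SURROGATE_LAST:
            raise SourceRejected(
                f"surrogate code point U+{code_point:04X} at index {index}")
        if code_point > E09_MAX_CODE_POINT:
            raise SourceRejected(
                f"character U+{code_point:04X} at index {index} "
                f"exceeds the implementation limit")
    return text
```

Under these assumptions, ordinary BMP text decodes unchanged, while a supplementary character such as U+10000 or a UTF-8-encoded lone surrogate raises `SourceRejected`.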
Btw, after learning more about Unicode than I could stand, I went back and
read our correspondence. It all makes a lot more sense to me now!
Altogether, I came out of this experience with a newfound respect for how
good a job a committee can do. Unlike the designers of IEEE floating point, the
Unicode Consortium took on a problem too large to ever fit in the head of one person
or small group. Although many committees allow politics to distract them
from technical issues, in this case, they took on a problem that was
inherently political in its nature. They nevertheless found (or imposed)
abstractions across languages, in order to produce a set of rules that one
person can hold in their head, so that they can write programs that, for
example, a Uyghur speaker can use to write text, even though the programmer
has never heard of Uyghur and the user may not know any English.
http://www.unicode.org/standard/WhatIsUnicode.html
The extent of the market is also limited by the divisibility of labor.
However flawed, their efforts to find cross-language abstractions that enable
such an astonishing division of labor are nothing less than heroic.
--
Text by me above is hereby placed in the public domain
Cheers,
--MarkM