[e-lang] Source code character sets and Unicode

Ka-Ping Yee e-lang at zesty.ca
Mon May 28 22:29:30 EDT 2007


On Mon, 28 May 2007, Mark S. Miller wrote:
> I have been thinking of requiring something like a
>
>      pragma.charset("unicode")
>
> before allowing non-ascii characters. This discussion is probably
> an opportune time to decide on this matter.

I'm thinking of something like:

1.  Choose a "basic character set" whose characters are all
    clearly distinguishable in (almost all) commonly available
    fonts on every supported platform, and whose characters are all
    supported by almost all editors (e.g. ASCII or Wysiwyg-ASCII).

2.  By default, forbid non-basic characters in source files.

3.  Choose a well-defined, bounded region of the source file,
    preferably guaranteed to be visible in common contexts (e.g.
    the first 80 characters or first two lines, whichever is shorter).

4.  Always allow only the basic character set within that region,
    regardless of any configurable settings or declarations.

5.  Require a declaration to appear within that region in order to
    allow non-basic characters in the rest of the source file.
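
As a rough sketch only, the five steps above could be checked like this.
Everything named here is an assumption made for illustration: the basic
character set is taken to be newline plus printable ASCII, the region is
the first 80 characters or first two lines (whichever is shorter), and the
declaration is the pragma.charset("unicode") form from Mark's message. The
names BASIC, DECLARATION, header_region and check_source are hypothetical.

```python
import re

# Assumed basic character set: newline plus printable ASCII (space..tilde).
BASIC = re.compile(r"[\n -~]*")
# Assumed declaration text, borrowed from the quoted proposal.
DECLARATION = 'pragma.charset("unicode")'
REGION_CHARS = 80
REGION_LINES = 2


def header_region(source: str) -> str:
    """Return the first 80 characters or first two lines, whichever is shorter."""
    by_chars = source[:REGION_CHARS]
    by_lines = "\n".join(source.split("\n", REGION_LINES)[:REGION_LINES])
    # Both candidates are prefixes of the source, so the shorter one wins.
    return min(by_chars, by_lines, key=len)


def check_source(source: str) -> bool:
    """Return True if the source obeys the character-set rules sketched above."""
    region = header_region(source)
    # Step 4: the region itself may only ever contain basic characters.
    if BASIC.fullmatch(region) is None:
        return False
    # Step 5: a declaration inside the region unlocks the rest of the file.
    if DECLARATION in region:
        return True
    # Step 2: without a declaration, the whole file must be basic.
    return BASIC.fullmatch(source) is not None
```

Note that a declaration appearing after the region is deliberately ignored,
so a reader who sees only the top of the file always knows which rules apply.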

Side question: why does Wysiwyg-ASCII allow '\r'?  Why not simply
define Wysiwyg-ASCII as the regular expression [\n -~]*?
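
To make the question concrete, here is a small sketch of what that regular
expression accepts and rejects. It implements only the definition proposed
in the question, not the existing Wysiwyg-ASCII definition; the names
PROPOSED and is_proposed_ascii are hypothetical.

```python
import re

# The expression from the question: newline, plus the ASCII range from
# space (0x20) through tilde (0x7E). Tab and carriage return are excluded.
PROPOSED = re.compile(r"[\n -~]*")


def is_proposed_ascii(text: str) -> bool:
    """Return True if every character of text is newline or printable ASCII."""
    return PROPOSED.fullmatch(text) is not None


print(is_proposed_ascii("def x := 1\n"))  # Unix line ending
print(is_proposed_ascii("a\r\nb"))        # contains a carriage return
print(is_proposed_ascii("a\tb"))          # contains a tab
```

Under this definition, a file with CRLF line endings or tab indentation
would be rejected outright.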


-- ?!ng

