[e-lang] Source code character sets and Unicode
Ka-Ping Yee
e-lang at zesty.ca
Mon May 28 22:29:30 EDT 2007
On Mon, 28 May 2007, Mark S. Miller wrote:
> I have been thinking of requiring something like a
>
> pragma.charset("unicode")
>
> before allowing non-ascii characters. This discussion is probably
> an opportune time to decide on this matter.
I'm thinking of something like:
1. Choose a "basic character set" whose characters are all
   clearly distinguishable in (almost all) commonly available
   fonts on every supported platform, and whose characters are all
   supported by almost all editors (e.g. ASCII or Wysiwyg-ASCII).
2. By default, forbid non-basic characters in source files.
3. Choose a well-defined, bounded region of the source file,
   preferably guaranteed to be visible in common contexts (e.g.
   the first 80 characters or first two lines, whichever is shorter).
4. Always allow only the basic character set within that region,
   regardless of any configurable settings or declarations.
5. Require a declaration to appear within that region in order to
   allow non-basic characters in the rest of the source file.
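A rough sketch of how steps 1 through 5 could fit together, written in Python purely for illustration. Everything specific here is an assumption: the basic set is taken to be newline plus printable ASCII, the region is the first 80 characters or first two lines (whichever is shorter), and the declaration is the literal text pragma.charset("unicode") from Mark's message. The names header_region and check_source are hypothetical.

```python
import re

# Assumption: the "basic character set" is newline plus printable ASCII.
BASIC = re.compile(r'[\n -~]*')

# Assumption: the declaration is this literal text, found anywhere in the region.
DECLARATION = 'pragma.charset("unicode")'


def header_region(source: str, max_chars: int = 80, max_lines: int = 2) -> str:
    """Return the bounded region: first max_lines lines or max_chars chars, whichever is shorter."""
    pos = 0
    for _ in range(max_lines):
        nl = source.find('\n', pos)
        if nl == -1:
            # Fewer than max_lines lines: the region extends to end of file.
            pos = len(source)
            break
        pos = nl + 1
    return source[:min(pos, max_chars)]


def check_source(source: str) -> bool:
    """Return True if the source obeys the character-set rules sketched above."""
    header = header_region(source)
    # Step 4: the region itself must always be basic, declaration or not.
    if not BASIC.fullmatch(header):
        return False
    # Step 5: a declaration inside the region unlocks the rest of the file.
    if DECLARATION in header:
        return True
    # Step 2: with no declaration, the whole file must be basic.
    return BASIC.fullmatch(source) is not None
```

With this sketch, a file whose first line is the declaration may contain non-basic characters later on, while a file that puts a non-basic character in its first two lines is rejected even if the declaration is present.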
Side question: why does Wysiwyg-ASCII allow '\r'? Why not simply
define Wysiwyg-ASCII as the regular expression [\n -~]*?
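For concreteness, here is what that regular expression would and would not admit, checked with Python's re module (used only as a convenient way to run the pattern; the name is_plain is hypothetical):

```python
import re

# The pattern from the question: newline plus the ASCII range space through tilde.
PLAIN = re.compile(r'[\n -~]*')


def is_plain(text: str) -> bool:
    """Return True if every character of text is '\\n' or printable ASCII."""
    return PLAIN.fullmatch(text) is not None
```

Under this definition, text with '\n' line endings passes, while a '\r\n' line ending, a tab, or any non-ASCII character causes rejection.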
-- ?!ng