[e-lang] Causeway JSON format doc?
david.hopwood at industrial-designers.co.uk
Fri Aug 1 14:48:11 CDT 2008
Kevin Reid wrote:
> On Aug 1, 2008, at 10:46, Tyler Close wrote:
>> On Fri, Aug 1, 2008 at 5:43 AM, Kevin Reid <kpreid at mac.com> wrote:
>>> * Is the column counting scalar values or grapheme clusters?
>> Unicode code points.
> This is a bad idea. Code points include surrogates, and as such, the
> count of code points depends on whether you are considering UTF-16 or
> some other encoding.
Although you're technically correct according to the definitions in the
standard, my experience is that whenever people -- even Unicode standards
weenies -- say that they're counting code points, they always mean
Unicode scalar values (or equivalently, "encoded characters").
The distinction between "code point" and "scalar value" is a historical
one that should really have been dropped when all the UTFs were put
on an equal footing, around Unicode 3.x.
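
To make the distinction concrete, here is a minimal sketch in Python. It is
only an illustration, not anything from the Causeway format itself, and the
function names are hypothetical. It assumes a Python 3 str, which is indexed
by code point, so the scalar-value count is the string length once lone
surrogates are ruled out. The UTF-16 code unit count is what varies with
the encoding.

```python
def count_scalar_values(text: str) -> int:
    """Count Unicode scalar values in text (hypothetical helper).

    A Python 3 str is a sequence of code points, so len() is the
    scalar-value count provided no surrogate code points are present.
    """
    for ch in text:
        if 0xD800 <= ord(ch) <= 0xDFFF:
            raise ValueError("surrogate code point is not a scalar value")
    return len(text)


def count_utf16_code_units(text: str) -> int:
    """Count 16-bit code units in the UTF-16 encoding of text."""
    # Each UTF-16 code unit is two bytes; "utf-16-le" emits no BOM.
    return len(text.encode("utf-16-le")) // 2


# 'a' followed by U+1D11E MUSICAL SYMBOL G CLEF, which lies outside the BMP.
sample = "a\U0001D11E"

print(count_scalar_values(sample))     # 2 scalar values
print(count_utf16_code_units(sample))  # 3 code units (one surrogate pair)
print(len(sample.encode("utf-8")))     # 5 bytes in UTF-8
```

A column counted in scalar values is the same whichever UTF the file is
stored in. A column counted in code units is not.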