[e-lang] Causeway JSON format doc?

David-Sarah Hopwood david.hopwood at industrial-designers.co.uk
Fri Aug 1 14:48:11 CDT 2008


Kevin Reid wrote:
> On Aug 1, 2008, at 10:46, Tyler Close wrote:
>> On Fri, Aug 1, 2008 at 5:43 AM, Kevin Reid <kpreid at mac.com> wrote:
>>>  * Is the column counting scalar values or grapheme clusters?
>> Unicode code points.
> 
> This is a bad idea. Code points include surrogates, and as such, the  
> count of code points depends on whether you are considering UTF-16 or  
> some other encoding.

Although you're technically correct according to the definitions in the
standard, my experience is that whenever people -- even Unicode standards
weenies -- say that they're counting code points, they always mean
Unicode scalar values (or equivalently, "encoded characters").
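
To make the distinction concrete, here is a minimal Python sketch of the
two counts. The function names are hypothetical and this is only an
illustration, not anything taken from Causeway. It assumes a Python 3 str,
which is a sequence of code points and may contain lone surrogates.

```python
def count_scalar_values(text: str) -> int:
    """Count Unicode scalar values: code points outside U+D800..U+DFFF."""
    return sum(1 for ch in text if not 0xD800 <= ord(ch) <= 0xDFFF)


def count_utf16_code_units(text: str) -> int:
    """Count 16-bit code units in the UTF-16 encoding of text."""
    # "surrogatepass" lets a lone surrogate through instead of raising.
    return len(text.encode("utf-16-le", "surrogatepass")) // 2


# Hypothetical sample: "a" followed by U+1D11E (MUSICAL SYMBOL G CLEF),
# a supplementary character that UTF-16 encodes as a surrogate pair.
sample = "a\U0001D11E"
print(count_scalar_values(sample))     # 2
print(count_utf16_code_units(sample))  # 3
```

For well-formed text the scalar value count is the same whichever UTF is
used. Only the code unit count changes with the encoding.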

The distinction between "code point" and "scalar value" is a historical
one that should really have been got rid of when all the UTFs were put
on an equal footing, around Unicode 3.x.

-- 
David-Sarah Hopwood

