[e-lang] Joe-E taming error in java.lang.String.taming

David Hopwood david.hopwood at industrial-designers.co.uk
Sat Sep 22 13:38:23 EDT 2007


David Hopwood wrote:
> Tyler Close wrote:
>> So now that I've tamed away all the string encoding methods from
>> java.lang.String, I find I need another method in the
>> org.joe_e.charset.ASCII API. In particular,
>>
>>     /**
>>      * Decodes a US-ASCII string.
>>      * @return The corresponding string
>>      */
>>     static public String
>>     decode(final byte[] buffer, final int off, final int len) {
>>         try {
>>             return new String(buffer, off, len, "US-ASCII");

From the documentation of this constructor
<http://java.sun.com/javase/6/docs/api/java/lang/String.html#String(byte[],%20int,%20int,%20java.lang.String)>:

# The behavior of this constructor when the given bytes are not valid in the
# given charset is unspecified. The CharsetDecoder class should be used when
# more control over the decoding process is required.

This behaviour needs to be specified for Joe-E. Here is a possible
implementation (the Charset.decode method uses a thread-locally cached
CharsetDecoder):

  import java.nio.charset.Charset;
  import java.nio.ByteBuffer;

  static private final Charset charset = Charset.forName("US-ASCII");

  /**
   * Decodes a US-ASCII string. Each byte not corresponding to a US-ASCII
   * character decodes to the Unicode replacement character U+FFFD.
   * @return The corresponding string
   * @throws java.lang.IndexOutOfBoundsException if off and len do not
   *         describe a valid range within buffer
   */
  static public String
  decode(final byte[] buffer, final int off, final int len) {
      return charset.decode(ByteBuffer.wrap(buffer, off, len)).toString();
  }
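
For reference, here is a self-contained sketch of the same behaviour that
can be compiled and run on its own. The class name AsciiDecodeSketch and
the main method are hypothetical and only for illustration; they are not
part of the Joe-E API. Note that Charset.decode returns a CharBuffer, so
the sketch converts the result with toString().

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical stand-in for org.joe_e.charset.ASCII; illustration only.
final class AsciiDecodeSketch {

    private static final Charset charset = Charset.forName("US-ASCII");

    private AsciiDecodeSketch() {}

    /**
     * Decodes a US-ASCII string. Each byte not corresponding to a US-ASCII
     * character decodes to the Unicode replacement character U+FFFD.
     */
    static String decode(final byte[] buffer, final int off, final int len) {
        // Charset.decode replaces malformed input instead of throwing.
        return charset.decode(ByteBuffer.wrap(buffer, off, len)).toString();
    }

    public static void main(final String[] args) {
        // 'A', one byte outside the US-ASCII range, then 'B'.
        final byte[] bytes = { 0x41, (byte) 0x80, 0x42 };
        final String s = decode(bytes, 0, bytes.length);
        // Reports whether the invalid byte became exactly one U+FFFD.
        System.out.println(s.equals("A\uFFFDB"));
    }
}
```

If rejecting invalid input is preferred over replacing it, a CharsetDecoder
configured with CodingErrorAction.REPORT would be the place to start; the
sketch above only covers the replacement behaviour.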

> A corresponding method should also be added to org.joe_e.charset.UTF8.

Same issue here, and in addition, it should be specified that the
UTF8 class does nothing special with initial byte-order marks.
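
To make the byte-order mark point concrete, here is a minimal sketch of a
UTF-8 counterpart, assuming it is built on Charset.decode in the same way.
The class name Utf8BomSketch is hypothetical, not the actual
org.joe_e.charset.UTF8 source. With Java's UTF-8 charset, an initial
EF BB BF sequence is decoded as an ordinary U+FEFF character and is not
stripped.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical stand-in for org.joe_e.charset.UTF8; illustration only.
final class Utf8BomSketch {

    private static final Charset charset = Charset.forName("UTF-8");

    private Utf8BomSketch() {}

    /**
     * Decodes a UTF-8 string. Malformed input decodes to U+FFFD, and an
     * initial byte-order mark is kept as the character U+FEFF.
     */
    static String decode(final byte[] buffer, final int off, final int len) {
        return charset.decode(ByteBuffer.wrap(buffer, off, len)).toString();
    }

    public static void main(final String[] args) {
        // UTF-8 encoding of U+FEFF followed by 'A'.
        final byte[] bytes = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x41 };
        final String s = decode(bytes, 0, bytes.length);
        // Reports whether the mark survived as the first character.
        System.out.println(s.length() == 2 && s.charAt(0) == '\uFEFF');
    }
}
```

A caller that wants the mark removed would have to strip a leading U+FEFF
itself after decoding.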

-- 
David Hopwood <david.hopwood at industrial-designers.co.uk>



