aicas logo Jamaica 3.4 release 8

sun.text.normalizer
Class UTF16

java.lang.Object
  extended by sun.text.normalizer.UTF16

public final class UTF16
extends Object

Standalone utility class providing UTF16 character conversions and indexing conversions.

Code that uses strings alone rarely needs modification. By design, UTF-16 does not allow overlap, so searching for strings is a safe operation. Similarly, concatenation is always safe. Substringing is safe if the start and end are both on UTF-32 boundaries. In normal code, the values for start and end are on those boundaries, since they arose from operations like searching. If not, the nearest UTF-32 boundaries can be determined using bounds().
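The boundary rule can be illustrated with a minimal sketch. The helper name snapToBoundary is hypothetical and is not part of this class; it uses only java.lang.Character and moves an index that falls on the trail half of a surrogate pair back to the start of that pair, so that a following substring() call does not split a supplementary character.

```java
final class BoundarySketch {
    // Hypothetical helper, not part of UTF16: returns index unchanged when it
    // already lies on a code point boundary, otherwise moves it back by one
    // so that it points at the lead surrogate of the pair.
    static int snapToBoundary(String s, int index) {
        if (index > 0 && index < s.length()
                && Character.isLowSurrogate(s.charAt(index))
                && Character.isHighSurrogate(s.charAt(index - 1))) {
            return index - 1;
        }
        return index;
    }
}
```

For the string "a\uD835\uDD38b", index 2 lies inside the surrogate pair and is moved back to 1; indexes 1 and 3 are already on boundaries and are returned unchanged.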

Examples:

The following examples illustrate use of some of these methods.

 // iteration forwards: Original
 for (int i = 0; i < s.length(); ++i) {
     char ch = s.charAt(i);
     doSomethingWith(ch);
 }

 // iteration forwards: Changes for UTF-32
 int ch;
 for (int i = 0; i < s.length(); i+=UTF16.getCharCount(ch)) {
     ch = UTF16.charAt(s,i);
     doSomethingWith(ch);
 }

 // iteration backwards: Original
 for (int i = s.length() -1; i >= 0; --i) {
     char ch = s.charAt(i);
     doSomethingWith(ch);
 }

 // iteration backwards: Changes for UTF-32
 int ch;
 for (int i = s.length() -1; i >= 0; i-=UTF16.getCharCount(ch)) {
     ch = UTF16.charAt(s,i);
     doSomethingWith(ch);
 }
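Where sun.text.normalizer.UTF16 is not accessible from application code, the same two loops can be sketched with the public java.lang.Character and String methods. This is an illustration under that assumption, not the implementation of this class. Note one difference: String.codePointAt() does not look backwards from a trail surrogate the way UTF16.charAt() is described to, so the backward loop below uses codePointBefore() instead.

```java
import java.util.ArrayList;
import java.util.List;

final class IterationSketch {
    // Collects the code points of s from first to last.
    static List<Integer> forwards(String s) {
        List<Integer> out = new ArrayList<>();
        int ch;
        for (int i = 0; i < s.length(); i += Character.charCount(ch)) {
            ch = s.codePointAt(i);
            out.add(ch);
        }
        return out;
    }

    // Collects the code points of s from last to first.
    // i is always the offset just past the code point being read.
    static List<Integer> backwards(String s) {
        List<Integer> out = new ArrayList<>();
        int ch;
        for (int i = s.length(); i > 0; i -= Character.charCount(ch)) {
            ch = s.codePointBefore(i);
            out.add(ch);
        }
        return out;
    }
}
```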
 
Notes:


Field Summary
static int CODEPOINT_MAX_VALUE
          The highest Unicode code point value (scalar value) according to the Unicode Standard.
static int CODEPOINT_MIN_VALUE
          The lowest Unicode code point value.
static int LEAD_SURROGATE_MAX_VALUE
          Lead surrogate maximum value
static int LEAD_SURROGATE_MIN_VALUE
          Lead surrogate minimum value
static int SUPPLEMENTARY_MIN_VALUE
          The minimum value for Supplementary code points
static int SURROGATE_MIN_VALUE
          Surrogate minimum value
static int TRAIL_SURROGATE_MAX_VALUE
          Trail surrogate maximum value
static int TRAIL_SURROGATE_MIN_VALUE
          Trail surrogate minimum value
 
Constructor Summary
UTF16()
           
 
Method Summary
static StringBuffer append(StringBuffer target, int char32)
          Append a single UTF-32 value to the end of a StringBuffer.
static int charAt(char[] source, int start, int limit, int offset16)
          Extract a single UTF-32 value from a substring.
static int charAt(String source, int offset16)
          Extract a single UTF-32 value from a string.
static int getCharCount(int char32)
          Determines how many chars this char32 requires.
static char getLeadSurrogate(int char32)
          Returns the lead surrogate.
static char getTrailSurrogate(int char32)
          Returns the trail surrogate.
static boolean isLeadSurrogate(char char16)
          Determines whether the character is a lead surrogate.
static boolean isSurrogate(char char16)
          Determines whether the code value is a surrogate.
static boolean isTrailSurrogate(char char16)
          Determines whether the character is a trail surrogate.
static int moveCodePointOffset(char[] source, int start, int limit, int offset16, int shift32)
          Shifts offset16 by the argument number of codepoints within a subarray.
static String valueOf(int char32)
          Convenience method corresponding to String.valueOf(char).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CODEPOINT_MIN_VALUE

public static final int CODEPOINT_MIN_VALUE
The lowest Unicode code point value.

See Also:
Constant Field Values

CODEPOINT_MAX_VALUE

public static final int CODEPOINT_MAX_VALUE
The highest Unicode code point value (scalar value) according to the Unicode Standard.

See Also:
Constant Field Values

SUPPLEMENTARY_MIN_VALUE

public static final int SUPPLEMENTARY_MIN_VALUE
The minimum value for Supplementary code points

See Also:
Constant Field Values

LEAD_SURROGATE_MIN_VALUE

public static final int LEAD_SURROGATE_MIN_VALUE
Lead surrogate minimum value

See Also:
Constant Field Values

TRAIL_SURROGATE_MIN_VALUE

public static final int TRAIL_SURROGATE_MIN_VALUE
Trail surrogate minimum value

See Also:
Constant Field Values

LEAD_SURROGATE_MAX_VALUE

public static final int LEAD_SURROGATE_MAX_VALUE
Lead surrogate maximum value

See Also:
Constant Field Values

TRAIL_SURROGATE_MAX_VALUE

public static final int TRAIL_SURROGATE_MAX_VALUE
Trail surrogate maximum value

See Also:
Constant Field Values

SURROGATE_MIN_VALUE

public static final int SURROGATE_MIN_VALUE
Surrogate minimum value

See Also:
Constant Field Values
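
The field descriptions above refer to the Constant Field Values page without listing the numbers. As a hedged reference, the sketch below gives the values these names would take under the Unicode Standard's definition of UTF-16. The class name Utf16Constants is hypothetical, and the values actually compiled into sun.text.normalizer.UTF16 are assumed, not confirmed, to be the same.

```java
final class Utf16Constants {
    // Values follow the Unicode Standard; assumed to match the fields of UTF16.
    static final int CODEPOINT_MIN_VALUE = 0x000000;
    static final int CODEPOINT_MAX_VALUE = 0x10FFFF;
    static final int SUPPLEMENTARY_MIN_VALUE = 0x010000;
    static final int LEAD_SURROGATE_MIN_VALUE = 0xD800;
    static final int LEAD_SURROGATE_MAX_VALUE = 0xDBFF;
    static final int TRAIL_SURROGATE_MIN_VALUE = 0xDC00;
    static final int TRAIL_SURROGATE_MAX_VALUE = 0xDFFF;
    // The surrogate block starts with the first lead surrogate.
    static final int SURROGATE_MIN_VALUE = LEAD_SURROGATE_MIN_VALUE;
}
```

The same numbers are available in public API as Character.MIN_CODE_POINT, Character.MAX_CODE_POINT, Character.MIN_SUPPLEMENTARY_CODE_POINT, and the Character.MIN_/MAX_HIGH_SURROGATE and MIN_/MAX_LOW_SURROGATE constants.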
Constructor Detail

UTF16

public UTF16()
Method Detail

charAt

public static int charAt(String source,
                         int offset16)
Extract a single UTF-32 value from a string. Used when iterating forwards or backwards (with UTF16.getCharCount()), as well as for random access. If a validity check is required, use UCharacter.isLegal() on the return value. If the char retrieved is part of a surrogate pair, its supplementary character will be returned. If a complete supplementary character is not found, the incomplete character will be returned.

Parameters:
source - string of UTF-16 chars
offset16 - UTF-16 offset to the start of the character.
Returns:
UTF-32 value of the code point that contains the char at offset16. The boundaries of that codepoint are the same as in bounds32().
Throws:
IndexOutOfBoundsException - thrown if offset16 is out of bounds.
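
The behavior described above can be sketched as follows. This is a minimal illustration written against java.lang.Character, assuming the semantics stated in the description; it is not the source of this method, and the class name CharAtSketch is hypothetical.

```java
final class CharAtSketch {
    // Returns the code point containing the char at offset16.
    // A lead surrogate is combined with the following trail surrogate,
    // a trail surrogate with the preceding lead surrogate.
    // An unpaired surrogate is returned as is.
    static int charAt(String source, int offset16) {
        // String.charAt throws an IndexOutOfBoundsException subclass when out of range.
        char single = source.charAt(offset16);
        if (Character.isHighSurrogate(single)) {
            if (offset16 + 1 < source.length()) {
                char trail = source.charAt(offset16 + 1);
                if (Character.isLowSurrogate(trail)) {
                    return Character.toCodePoint(single, trail);
                }
            }
        } else if (Character.isLowSurrogate(single)) {
            if (offset16 > 0) {
                char lead = source.charAt(offset16 - 1);
                if (Character.isHighSurrogate(lead)) {
                    return Character.toCodePoint(lead, single);
                }
            }
        }
        return single;
    }
}
```

With the string "a\uD835\uDD38b", offsets 1 and 2 both yield the supplementary code point 0x1D538, which is why the result can be passed to getCharCount() when stepping in either direction.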

charAt

public static int charAt(char[] source,
                         int start,
                         int limit,
                         int offset16)
Extract a single UTF-32 value from a substring. Used when iterating forwards or backwards (with UTF16.getCharCount()), as well as for random access. If a validity check is required, use UCharacter.isLegal() on the return value. If the char retrieved is part of a surrogate pair, its supplementary character will be returned. If a complete supplementary character is not found, the incomplete character will be returned.

Parameters:
source - array of UTF-16 chars
start - offset in the source array at which the substring to analyze begins
limit - offset in the source array at which the substring to analyze ends
offset16 - UTF-16 offset relative to start
Returns:
UTF-32 value of the code point that contains the char at offset16. The boundaries of that codepoint are the same as in bounds32().
Throws:
IndexOutOfBoundsException - thrown if offset16 is not within the range of start and limit.

getCharCount

public static int getCharCount(int char32)
Determines how many chars this char32 requires. If a validity check is required, use isLegal() on char32 before calling.

Parameters:
char32 - the input codepoint.
Returns:
2 if char32 is in the supplementary space, otherwise 1.
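
A minimal sketch of this rule, assuming only the return values documented above: code points from 0x10000 upwards need a surrogate pair, everything below fits in one char. The class name is hypothetical. No validity check is made, matching the note that callers should validate char32 themselves.

```java
final class CharCountSketch {
    // 0x10000 is the first supplementary code point.
    static int getCharCount(int char32) {
        return char32 < 0x10000 ? 1 : 2;
    }
}
```

For valid code points this agrees with the public method Character.charCount(int).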

isSurrogate

public static boolean isSurrogate(char char16)
Determines whether the code value is a surrogate.

Parameters:
char16 - the input character.
Returns:
true iff the input character is a surrogate.

isTrailSurrogate

public static boolean isTrailSurrogate(char char16)
Determines whether the character is a trail surrogate.

Parameters:
char16 - the input character.
Returns:
true iff the input character is a trail surrogate.

isLeadSurrogate

public static boolean isLeadSurrogate(char char16)
Determines whether the character is a lead surrogate.

Parameters:
char16 - the input character.
Returns:
true iff the input character is a lead surrogate.
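
The three surrogate tests above amount to range checks on the char value. The sketch below writes them out with the ranges from the Unicode Standard; it is an illustration under that assumption, with a hypothetical class name, and not the source of these methods.

```java
final class SurrogateChecks {
    // Lead (high) surrogates occupy 0xD800..0xDBFF.
    static boolean isLeadSurrogate(char char16) {
        return char16 >= 0xD800 && char16 <= 0xDBFF;
    }

    // Trail (low) surrogates occupy 0xDC00..0xDFFF.
    static boolean isTrailSurrogate(char char16) {
        return char16 >= 0xDC00 && char16 <= 0xDFFF;
    }

    // Any surrogate: the union of the two ranges.
    static boolean isSurrogate(char char16) {
        return char16 >= 0xD800 && char16 <= 0xDFFF;
    }
}
```

The public equivalents are Character.isHighSurrogate(char), Character.isLowSurrogate(char) and Character.isSurrogate(char).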

getLeadSurrogate

public static char getLeadSurrogate(int char32)
Returns the lead surrogate. If a validity check is required, use isLegal() on char32 before calling.

Parameters:
char32 - the input character.
Returns:
the lead surrogate if getCharCount(char32) is 2;
and 0 otherwise (note: 0 is not a valid lead surrogate).

getTrailSurrogate

public static char getTrailSurrogate(int char32)
Returns the trail surrogate. If a validity check is required, use isLegal() on char32 before calling.

Parameters:
char32 - the input character.
Returns:
the trail surrogate if getCharCount(char32) is 2;
otherwise the character itself.
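
The split of a supplementary code point into its two surrogates can be sketched with the standard UTF-16 arithmetic. The code below assumes the return conventions documented for getLeadSurrogate() and getTrailSurrogate() (0 and the character itself, respectively, for code points below 0x10000); the class name is hypothetical and this is not the source of these methods.

```java
final class SurrogateSplitSketch {
    // 0xD7C0 is 0xD800 - (0x10000 >> 10): subtracting 0x10000 and adding the
    // lead surrogate base are folded into one constant.
    static char getLeadSurrogate(int char32) {
        if (char32 >= 0x10000) {
            return (char) (0xD7C0 + (char32 >> 10));
        }
        return 0;
    }

    // The low 10 bits select the trail surrogate within 0xDC00..0xDFFF.
    static char getTrailSurrogate(int char32) {
        if (char32 >= 0x10000) {
            return (char) (0xDC00 + (char32 & 0x3FF));
        }
        return (char) char32;
    }
}
```

For U+1D538 this gives the pair 0xD835, 0xDD38, the same result as Character.highSurrogate(int) and Character.lowSurrogate(int).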

valueOf

public static String valueOf(int char32)
Convenience method corresponding to String.valueOf(char). Returns a one or two char string containing the UTF-32 value in UTF16 format. If a validity check is required, use isLegal() on char32 before calling.

Parameters:
char32 - the input character.
Returns:
string value of char32 in UTF16 format
Throws:
IllegalArgumentException - thrown if char32 is an invalid codepoint.
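
A minimal sketch of the documented contract, using the public method Character.toChars(int) for the conversion. The class name and the exception message are hypothetical; the range check reflects the documented IllegalArgumentException and is an assumption about where the limits lie.

```java
final class ValueOfSketch {
    // Returns a one- or two-char string holding char32 in UTF-16 form.
    static String valueOf(int char32) {
        if (char32 < 0 || char32 > 0x10FFFF) {
            throw new IllegalArgumentException("Illegal codepoint: " + char32);
        }
        return new String(Character.toChars(char32));
    }
}
```

A BMP code point such as 0x61 yields a string of length 1, and a supplementary code point such as 0x1D538 yields the two-char string "\uD835\uDD38".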

append

public static StringBuffer append(StringBuffer target,
                                  int char32)
Append a single UTF-32 value to the end of a StringBuffer. If a validity check is required, use isLegal() on char32 before calling.

Parameters:
target - the buffer to append to
char32 - value to append.
Returns:
the updated StringBuffer
Throws:
IllegalArgumentException - thrown when char32 does not lie within the range of the Unicode codepoints
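
The documented behavior of append() can be sketched as below: one char is appended for a BMP code point and a surrogate pair for a supplementary one. This is an illustration with a hypothetical class name, not the source of the method. The public method StringBuffer.appendCodePoint(int) offers comparable behavior.

```java
final class AppendSketch {
    // Appends char32 in UTF-16 form and returns the same buffer for chaining.
    static StringBuffer append(StringBuffer target, int char32) {
        if (char32 < 0 || char32 > 0x10FFFF) {
            throw new IllegalArgumentException("Illegal codepoint: " + char32);
        }
        if (char32 >= 0x10000) {
            target.append((char) (0xD7C0 + (char32 >> 10)));    // lead surrogate
            target.append((char) (0xDC00 + (char32 & 0x3FF)));  // trail surrogate
        } else {
            target.append((char) char32);
        }
        return target;
    }
}
```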

moveCodePointOffset

public static int moveCodePointOffset(char[] source,
                                      int start,
                                      int limit,
                                      int offset16,
                                      int shift32)
Shifts offset16 by the argument number of codepoints within a subarray.

Parameters:
source - char array
start - start position of the subarray to operate on
limit - end position of the subarray to operate on
offset16 - UTF16 position to shift relative to start
shift32 - number of codepoints to shift
Returns:
new shifted offset16 relative to start
Throws:
IndexOutOfBoundsException - if the new offset16 is out of bounds with respect to the subarray or the subarray bounds are out of range.
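
The offset arithmetic can be illustrated with the public method Character.offsetByCodePoints(char[], int, int, int, int), which takes a count instead of a limit and an absolute index instead of one relative to start. The wrapper below is a sketch under two assumptions that the text above does not confirm: that limit is exclusive, and that unpaired surrogates count as one code point each. The class name is hypothetical.

```java
final class MoveOffsetSketch {
    // Shifts offset16 (relative to start) by shift32 code points inside
    // source[start, limit) and returns the new offset relative to start.
    static int moveCodePointOffset(char[] source, int start, int limit,
                                   int offset16, int shift32) {
        int moved = Character.offsetByCodePoints(
                source, start, limit - start, start + offset16, shift32);
        return moved - start;
    }
}
```

For the array of "xa\uD835\uDD38b" with start 1 and limit 5, shifting offset 0 forward by two code points returns 3, because the second code point occupies two chars. Shifting offset 3 back by one code point returns 1.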


aicas GmbH, Karlsruhe - Germany    www.aicas.com
Copyright 2001-2009 aicas GmbH. All Rights Reserved.