java.lang.Object
  sun.text.normalizer.UTF16
public final class UTF16
Standalone utility class providing UTF16 character conversions and indexing conversions.
Code that uses strings alone rarely needs modification.
By design, UTF-16 does not allow overlap, so searching for strings is a safe
operation. Similarly, concatenation is always safe. Substringing is safe if
the start and end are both on UTF-32 boundaries. In normal code, the values
for start and end are on those boundaries, since they arose from operations
like searching. If not, the nearest UTF-32 boundaries can be determined
using bounds().
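As a rough illustration of what "nearest boundary" means, the sketch below snaps an offset that points at the trail half of a surrogate pair back to the start of that pair. It uses only java.lang.String and java.lang.Character; the class name BoundarySnap and both method names are hypothetical, and this is not the bounds() method mentioned above.

```java
final class BoundarySnap {
    /** Returns offset16, moved back by one if it points at the trail half of a surrogate pair. */
    static int snapToCodePointStart(String s, int offset16) {
        if (offset16 > 0 && offset16 < s.length()
                && Character.isLowSurrogate(s.charAt(offset16))
                && Character.isHighSurrogate(s.charAt(offset16 - 1))) {
            return offset16 - 1;
        }
        return offset16;
    }

    /** Takes a substring after snapping both ends to code point boundaries. */
    static String safeSubstring(String s, int start, int end) {
        return s.substring(snapToCodePointStart(s, start), snapToCodePointStart(s, end));
    }
}
```

Snapping backwards is one possible policy; a caller could equally round forwards, depending on whether the split character should fall inside or outside the substring.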
The following examples illustrate use of some of these methods.
// iteration forwards: Original
for (int i = 0; i < s.length(); ++i) {
    char ch = s.charAt(i);
    doSomethingWith(ch);
}
// iteration forwards: Changes for UTF-32
int ch;
for (int i = 0; i < s.length(); i += UTF16.getCharCount(ch)) {
    ch = UTF16.charAt(s, i);
    doSomethingWith(ch);
}
// iteration backwards: Original
for (int i = s.length() - 1; i >= 0; --i) {
    char ch = s.charAt(i);
    doSomethingWith(ch);
}
// iteration backwards: Changes for UTF-32
int ch;
for (int i = s.length() - 1; i >= 0; i -= UTF16.getCharCount(ch)) {
    ch = UTF16.charAt(s, i);
    doSomethingWith(ch);
}
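Because sun.text.normalizer is a JDK-internal package that application code usually cannot call directly, the same traversal can be sketched with the public java.lang.String and java.lang.Character methods. This is a substitute technique, not the UTF16 class itself; the class name CodePointIteration and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class CodePointIteration {
    /** Collects the code points of s from first to last. */
    static List<Integer> forward(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);        // combines a lead/trail pair starting at i
            out.add(cp);
            i += Character.charCount(cp);     // 1 for BMP, 2 for supplementary
        }
        return out;
    }

    /** Collects the code points of s from last to first. */
    static List<Integer> backward(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = s.length(); i > 0; ) {
            int cp = s.codePointBefore(i);    // combines a lead/trail pair ending before i
            out.add(cp);
            i -= Character.charCount(cp);
        }
        return out;
    }
}
```

The backward loop keeps i as an exclusive end index, so the first character of the string is visited when i reaches 1 or 2 and the loop ends at 0.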
Notes:
Naming: for clarity, high and low surrogates are called Lead and Trail
in the API, which gives a better sense of their ordering in a string.
offset16 and offset32 are used to distinguish offsets to UTF-16
boundaries from offsets to UTF-32 boundaries. An int char32 holds a
UTF-32 character, as opposed to a char16, which is a UTF-16 code unit.
Round-tripping offsets: a UTF-16 offset can be converted to a UTF-32
offset and back if and only if bounds(string, offset16) != TRAIL.
Validity: UCharacter.isLegal() can be used to check for validity if desired.
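To make the offset16 versus offset32 distinction concrete, here is a minimal sketch that converts between the two using String.codePointCount and String.offsetByCodePoints. The class OffsetRoundTrip and its method names are hypothetical and are not part of UTF16; the sketch only assumes that an offset32 counts code points from the start of the string.

```java
final class OffsetRoundTrip {
    /** Number of code points that precede the UTF-16 offset. */
    static int toOffset32(String s, int offset16) {
        return s.codePointCount(0, offset16);
    }

    /** UTF-16 offset at which the code point with the given index starts. */
    static int toOffset16(String s, int offset32) {
        return s.offsetByCodePoints(0, offset32);
    }

    /** True if converting offset16 to an offset32 and back yields the same offset16. */
    static boolean roundTrips(String s, int offset16) {
        return toOffset16(s, toOffset32(s, offset16)) == offset16;
    }
}
```

With the string "a\uD83D\uDE00b", offset16 values 1 and 3 survive the round trip, while offset16 2, which points at the trail surrogate, comes back as 3.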
| Field Summary | |
|---|---|
| static int | CODEPOINT_MAX_VALUE: The highest Unicode code point value (scalar value) according to the Unicode Standard. |
| static int | CODEPOINT_MIN_VALUE: The lowest Unicode code point value. |
| static int | LEAD_SURROGATE_MAX_VALUE: Lead surrogate maximum value. |
| static int | LEAD_SURROGATE_MIN_VALUE: Lead surrogate minimum value. |
| static int | SUPPLEMENTARY_MIN_VALUE: The minimum value for supplementary code points. |
| static int | SURROGATE_MIN_VALUE: Surrogate minimum value. |
| static int | TRAIL_SURROGATE_MAX_VALUE: Trail surrogate maximum value. |
| static int | TRAIL_SURROGATE_MIN_VALUE: Trail surrogate minimum value. |
| Constructor Summary | |
|---|---|
| UTF16() | |
| Method Summary | |
|---|---|
| static StringBuffer | append(StringBuffer target, int char32): Append a single UTF-32 value to the end of a StringBuffer. |
| static int | charAt(char[] source, int start, int limit, int offset16): Extract a single UTF-32 value from a substring. |
| static int | charAt(String source, int offset16): Extract a single UTF-32 value from a string. |
| static int | getCharCount(int char32): Determines how many chars this char32 requires. |
| static char | getLeadSurrogate(int char32): Returns the lead surrogate. |
| static char | getTrailSurrogate(int char32): Returns the trail surrogate. |
| static boolean | isLeadSurrogate(char char16): Determines whether the character is a lead surrogate. |
| static boolean | isSurrogate(char char16): Determines whether the code value is a surrogate. |
| static boolean | isTrailSurrogate(char char16): Determines whether the character is a trail surrogate. |
| static int | moveCodePointOffset(char[] source, int start, int limit, int offset16, int shift32): Shifts offset16 by the argument number of codepoints within a subarray. |
| static String | valueOf(int char32): Convenience method corresponding to String.valueOf(char). |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final int CODEPOINT_MIN_VALUE
public static final int CODEPOINT_MAX_VALUE
public static final int SUPPLEMENTARY_MIN_VALUE
public static final int LEAD_SURROGATE_MIN_VALUE
public static final int TRAIL_SURROGATE_MIN_VALUE
public static final int LEAD_SURROGATE_MAX_VALUE
public static final int TRAIL_SURROGATE_MAX_VALUE
public static final int SURROGATE_MIN_VALUE
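The field values are not listed in this section. The sketch below shows the corresponding constants from java.lang.Character, whose values are fixed by the Unicode Standard. That the UTF16 fields hold exactly these values is an assumption based on their names and descriptions; the class SurrogateRanges and its field names are hypothetical.

```java
final class SurrogateRanges {
    static final int CODEPOINT_MIN = Character.MIN_CODE_POINT;                   // U+0000
    static final int CODEPOINT_MAX = Character.MAX_CODE_POINT;                   // U+10FFFF
    static final int SUPPLEMENTARY_MIN = Character.MIN_SUPPLEMENTARY_CODE_POINT; // U+10000
    static final int LEAD_MIN = Character.MIN_HIGH_SURROGATE;                    // U+D800
    static final int LEAD_MAX = Character.MAX_HIGH_SURROGATE;                    // U+DBFF
    static final int TRAIL_MIN = Character.MIN_LOW_SURROGATE;                    // U+DC00
    static final int TRAIL_MAX = Character.MAX_LOW_SURROGATE;                    // U+DFFF
    static final int SURROGATE_MIN = Character.MIN_SURROGATE;                    // U+D800

    public static void main(String[] args) {
        // Print each range in the usual U+XXXX notation.
        System.out.printf("code points:   U+%04X..U+%04X%n", CODEPOINT_MIN, CODEPOINT_MAX);
        System.out.printf("supplementary: from U+%04X%n", SUPPLEMENTARY_MIN);
        System.out.printf("lead:          U+%04X..U+%04X%n", LEAD_MIN, LEAD_MAX);
        System.out.printf("trail:         U+%04X..U+%04X%n", TRAIL_MIN, TRAIL_MAX);
    }
}
```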
| Constructor Detail |
|---|
public UTF16()
| Method Detail |
|---|
public static int charAt(String source,
                         int offset16)
Extract a single UTF-32 value from a string. Used when iterating forwards
or backwards (with UTF16.getCharCount()), as well as for random access.
If a validity check is required, use UCharacter.isLegal() on the return value.
If the char retrieved is part of a surrogate pair, its supplementary
character will be returned. If a complete supplementary character is
not found, the incomplete character will be returned.
Parameters:
source - string of UTF-16 chars
offset16 - UTF-16 offset to the start of the character.
Returns:
the UTF-32 value that contains the char at offset16. The boundaries of
that codepoint are the same as in bounds32().
Throws:
IndexOutOfBoundsException - thrown if offset16 is out of bounds.
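The pairing rule described for charAt can be sketched as follows. This is an illustration written against java.lang.Character, not the UTF16 source; the class name CharAtSketch is hypothetical, and the treatment of unpaired surrogates (returned unchanged) follows the "incomplete character will be returned" wording above.

```java
final class CharAtSketch {
    /** Returns the code point that contains the char at offset16. */
    static int charAt(String source, int offset16) {
        // String.charAt throws an IndexOutOfBoundsException subtype when offset16 is out of range.
        char single = source.charAt(offset16);
        if (Character.isHighSurrogate(single)) {
            // Lead surrogate: look forwards for its trail.
            int next = offset16 + 1;
            if (next < source.length() && Character.isLowSurrogate(source.charAt(next))) {
                return Character.toCodePoint(single, source.charAt(next));
            }
        } else if (Character.isLowSurrogate(single)) {
            // Trail surrogate: look backwards for its lead.
            int prev = offset16 - 1;
            if (prev >= 0 && Character.isHighSurrogate(source.charAt(prev))) {
                return Character.toCodePoint(source.charAt(prev), single);
            }
        }
        return single; // BMP character or unpaired surrogate
    }
}
```

Unlike String.codePointAt, which only looks forwards, this sketch also resolves an offset that lands on the trail half of a pair, which is what makes random access and backward iteration work.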
public static int charAt(char[] source,
                         int start,
                         int limit,
                         int offset16)
Extract a single UTF-32 value from a substring. Used when iterating
forwards or backwards (with UTF16.getCharCount()), as well as for random
access. If a validity check is required, use UCharacter.isLegal()
on the return value.
If the char retrieved is part of a surrogate pair, its supplementary
character will be returned. If a complete supplementary character is
not found, the incomplete character will be returned.
Parameters:
source - array of UTF-16 chars
start - start offset of the substring to analyze within the source array
limit - end offset of the substring to analyze within the source array
offset16 - UTF-16 offset relative to start
Returns:
the UTF-32 value that contains the char at offset16. The boundaries of
that codepoint are the same as in bounds32().
Throws:
IndexOutOfBoundsException - thrown if offset16 is not within
the range of start and limit.
public static int getCharCount(int char32)
Determines how many chars this char32 requires. If a validity check is
required, use isLegal() on char32 before calling.
Parameters:
char32 - the input codepoint.
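public static boolean isSurrogate(char char16)
Determines whether the code value is a surrogate.
Parameters:
char16 - the input character.
public static boolean isTrailSurrogate(char char16)
Determines whether the character is a trail surrogate.
Parameters:
char16 - the input character.
public static boolean isLeadSurrogate(char char16)
Determines whether the character is a lead surrogate.
Parameters:
char16 - the input character.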
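public static char getLeadSurrogate(int char32)
Returns the lead surrogate. If a validity check is required, use isLegal()
on char32 before calling.
Parameters:
char32 - the input character.
public static char getTrailSurrogate(int char32)
Returns the trail surrogate. If a validity check is required, use isLegal()
on char32 before calling.
Parameters:
char32 - the input character.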
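The arithmetic behind getCharCount, getLeadSurrogate and getTrailSurrogate is defined by the UTF-16 encoding form and can be sketched as below. The class SurrogateMath and its method names are hypothetical. For input outside the supplementary range this sketch throws IllegalArgumentException; that is a choice made for the example, since this section does not say what UTF16 returns in that case.

```java
final class SurrogateMath {
    /** 1 for a BMP code point, 2 for a supplementary one. */
    static int charCount(int char32) {
        return char32 < 0x10000 ? 1 : 2;
    }

    /** Lead surrogate: 0xD800 plus the top 10 bits of (char32 - 0x10000). */
    static char leadSurrogate(int char32) {
        requireSupplementary(char32);
        return (char) (0xD800 + ((char32 - 0x10000) >> 10));
    }

    /** Trail surrogate: 0xDC00 plus the low 10 bits of (char32 - 0x10000). */
    static char trailSurrogate(int char32) {
        requireSupplementary(char32);
        return (char) (0xDC00 + ((char32 - 0x10000) & 0x3FF));
    }

    // Sketch-only policy: reject anything that is not U+10000..U+10FFFF.
    private static void requireSupplementary(int char32) {
        if (char32 < 0x10000 || char32 > 0x10FFFF) {
            throw new IllegalArgumentException(
                    "not a supplementary code point: 0x" + Integer.toHexString(char32));
        }
    }
}
```

For example, U+1F600 minus 0x10000 is 0xF600; its top 10 bits are 0x3D and its low 10 bits are 0x200, giving the pair 0xD83D, 0xDE00.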
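public static String valueOf(int char32)
Convenience method corresponding to String.valueOf(char). If a validity
check is required, use isLegal() on char32 before calling.
Parameters:
char32 - the input character.
Throws:
IllegalArgumentException - thrown if char32 is an invalid codepoint.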
public static StringBuffer append(StringBuffer target,
                                  int char32)
Append a single UTF-32 value to the end of a StringBuffer. If a validity
check is required, use isLegal() on char32 before calling.
Parameters:
target - the buffer to append to
char32 - value to append.
Throws:
IllegalArgumentException - thrown when char32 does not lie
within the range of the Unicode codepoints.
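For code that cannot use this class, valueOf and append have close counterparts in the public JDK API. The sketch below wraps Character.toChars and StringBuffer.appendCodePoint; the class name AppendSketch is hypothetical. One difference to keep in mind: the JDK methods accept any value from 0 to 0x10FFFF, including lone surrogate values, so they are not a substitute for an isLegal() check.

```java
final class AppendSketch {
    /** String of one or two chars holding char32; rejects values above 0x10FFFF or below 0. */
    static String valueOf(int char32) {
        return new String(Character.toChars(char32));
    }

    /** Appends char32 as one or two chars and returns the same buffer. */
    static StringBuffer append(StringBuffer target, int char32) {
        return target.appendCodePoint(char32);
    }
}
```

Both underlying JDK calls throw IllegalArgumentException for an out-of-range value, which matches the exception documented here.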
public static int moveCodePointOffset(char[] source,
                                      int start,
                                      int limit,
                                      int offset16,
                                      int shift32)
Shifts offset16 by the argument number of codepoints within a subarray.
Parameters:
source - char array
start - start position of the subarray to operate on
limit - end position of the subarray to operate on
offset16 - UTF-16 position to shift, relative to start
shift32 - number of codepoints to shift
Throws:
IndexOutOfBoundsException - if the new offset16 is out of
bounds with respect to the subarray or the subarray bounds
are out of range.
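A comparable operation exists in the public API as Character.offsetByCodePoints(char[], int, int, int, int). The sketch below adapts it to the parameter shape documented here. Two points are assumptions, because this section does not state them: that limit is an exclusive end index, and that the result is relative to start, like offset16. The class name MoveOffsetSketch is hypothetical.

```java
final class MoveOffsetSketch {
    /**
     * Moves offset16 (relative to start) by shift32 code points within
     * source[start, limit) and returns the new offset, also relative to start.
     */
    static int moveCodePointOffset(char[] source, int start, int limit,
                                   int offset16, int shift32) {
        if (start < 0 || limit < start || limit > source.length) {
            throw new IndexOutOfBoundsException("bad subarray: " + start + ".." + limit);
        }
        // The JDK method works with absolute indices and a count, so translate both ways.
        int moved = Character.offsetByCodePoints(
                source, start, limit - start, start + offset16, shift32);
        return moved - start;
    }
}
```

A negative shift32 moves backwards; Character.offsetByCodePoints throws IndexOutOfBoundsException when the subarray does not contain enough code points in the requested direction.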