Jamaica 3.2 release 62

sun.text.normalizer
Class UCharacter

java.lang.Object
  extended by sun.text.normalizer.UCharacter

public final class UCharacter
extends Object

The UCharacter class provides extensions to the java.lang.Character class. These extensions provide support for Unicode 3.2 properties and, together with the UTF16 class, support for supplementary characters (those with code points above U+FFFF).

Code points are represented in this API as ints. While a separate primitive data type for them would be more convenient in Java, ints suffice in the meantime.
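As a rough illustration of handling code points as ints, the sketch below walks a string by code point rather than by UTF-16 char. It uses only java.lang.String and java.lang.Character (Java 5 or later), not UCharacter itself; the class name CodePointSketch and the method countCodePoints are hypothetical.

```java
final class CodePointSketch {
    // Counts code points (not UTF-16 chars) by reading each one as an int.
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);      // an int; may be above 0xFFFF
            i += Character.charCount(cp);   // 1 char for BMP, 2 for supplementary
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'A' followed by U+1D538, stored as the surrogate pair D835 DD38
        String s = "A\uD835\uDD38";
        System.out.println(s.length());          // 3 UTF-16 chars
        System.out.println(countCodePoints(s));  // 2 code points
    }
}
```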

To use this class, add the jar file icu4j.jar to the class path, since it contains the data files that supply the information used by this class.
For example, on Windows:
set CLASSPATH=%CLASSPATH%;$JAR_FILE_PATH/ucharacter.jar
Alternatively, copy the files uprops.dat and unames.icu from the icu4j source subdirectory $ICU4J_SRC/src/com.ibm.icu.impl.data to your class directory $ICU4J_CLASS/com.ibm.icu.impl.data.

Aside from the additions for UTF-16 support, and the updated Unicode 3.1 properties, the main differences between UCharacter and Character are:

Further detailed differences can be determined with the program com.ibm.icu.dev.test.lang.UCharacterCompare.

This class is not subclassable.

See Also:
com.ibm.icu.lang.UCharacterEnums

Nested Class Summary
static interface UCharacter.ECharacterCategory
          Deprecated. This is a draft API and might change in a future release of ICU.
static interface UCharacter.HangulSyllableType
          Hangul Syllable Type constants.
static interface UCharacter.NumericType
          Numeric Type constants.
 
Field Summary
static int MAX_VALUE
          The highest Unicode code point value (scalar value) according to the Unicode Standard.
static int MIN_VALUE
          The lowest Unicode code point value.
static double NO_NUMERIC_VALUE
          Special value that is returned by getUnicodeNumericValue(int) when no numeric value is defined for a code point.
static int SUPPLEMENTARY_MIN_VALUE
          The minimum value for supplementary code points.
 
Method Summary
static int digit(int ch, int radix)
          Retrieves the numeric value of a decimal digit code point.
static String foldCase(String str, boolean defaultmapping)
          The given string is mapped to its case folding equivalent according to UnicodeData.txt and CaseFolding.txt; if any character has no case folding equivalent, the character itself is returned.
static VersionInfo getAge(int ch)
          Get the "age" of the code point.
static int getCodePoint(char lead, char trail)
          Returns a code point corresponding to the two UTF16 characters.
static int getDirection(int ch)
          Returns the bidirectional property of a code point.
static int getIntPropertyValue(int ch, int type)
          Gets the property value for a Unicode property type of a code point.
static int getType(int ch)
          Returns a value indicating a code point's Unicode category.
static double getUnicodeNumericValue(int ch)
          Get the numeric value for a Unicode code point as defined in the Unicode Character Database.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MIN_VALUE

public static final int MIN_VALUE
The lowest Unicode code point value.

See Also:
Constant Field Values

MAX_VALUE

public static final int MAX_VALUE
The highest Unicode code point value (scalar value) according to the Unicode Standard. This is a 21-bit value (21 bits, rounded up).
Up-to-date Unicode implementation of java.lang.Character.MAX_VALUE.

See Also:
Constant Field Values
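The 21-bit claim can be checked with a small sketch. It assumes that MAX_VALUE equals the Unicode maximum 0x10FFFF, which java.lang.Character exposes as MAX_CODE_POINT; the actual constant value is listed under Constant Field Values and is not reproduced here. The class name CodePointRangeSketch and the method bitsNeeded are hypothetical.

```java
final class CodePointRangeSketch {
    // Number of bits needed to represent a positive int value.
    static int bitsNeeded(int value) {
        return 32 - Integer.numberOfLeadingZeros(value);
    }

    public static void main(String[] args) {
        int max = Character.MAX_CODE_POINT;               // 0x10FFFF
        System.out.println(Integer.toHexString(max));     // 10ffff
        System.out.println(bitsNeeded(max));              // 21
    }
}
```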

SUPPLEMENTARY_MIN_VALUE

public static final int SUPPLEMENTARY_MIN_VALUE
The minimum value for supplementary code points.

See Also:
Constant Field Values

NO_NUMERIC_VALUE

public static final double NO_NUMERIC_VALUE
Special value that is returned by getUnicodeNumericValue(int) when no numeric value is defined for a code point.

See Also:
getUnicodeNumericValue(int), Constant Field Values
Method Detail

digit

public static int digit(int ch,
                        int radix)
Retrieves the numeric value of a decimal digit code point.
This method observes the semantics of java.lang.Character.digit(). Note that this will return positive values for code points for which isDigit returns false, just like java.lang.Character.
Semantic Change: In release 1.3.1 and prior, this method did not treat the European letters as having a digit value, and also treated numeric letters and other numbers as digits. This has been changed to conform to the Java semantics.
A code point is a valid digit if and only if:

Parameters:
ch - the code point to query
radix - the radix
Returns:
the numeric value represented by the code point in the specified radix, or -1 if the code point is not a decimal digit or if its value is too large for the radix
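Since this method is documented as observing the semantics of java.lang.Character.digit(), the following sketch uses that JDK method as a stand-in to show the expected behaviour. It does not call UCharacter; the class name DigitSketch and the wrapper digitOf are hypothetical.

```java
final class DigitSketch {
    // Thin wrapper over the JDK method whose semantics UCharacter.digit is said to follow.
    static int digitOf(int ch, int radix) {
        return Character.digit(ch, radix);
    }

    public static void main(String[] args) {
        System.out.println(digitOf('7', 10));     // 7
        System.out.println(digitOf('a', 16));     // 10, although isDigit('a') is false
        System.out.println(Character.isDigit('a'));
        System.out.println(digitOf('9', 8));      // -1, value too large for radix 8
        System.out.println(digitOf(0x0663, 10));  // 3, ARABIC-INDIC DIGIT THREE
    }
}
```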

getUnicodeNumericValue

public static double getUnicodeNumericValue(int ch)

Get the numeric value for a Unicode code point as defined in the Unicode Character Database.

A "double" return type is necessary because some numeric values are fractions, negative, or too large for int.

For characters without any numeric values in the Unicode Character Database, this function will return NO_NUMERIC_VALUE.

API Change: In release 2.2 and prior, this API had a return type of int and returned -1 when the argument ch did not have a corresponding numeric value. This has been changed to stay in sync with ICU4C.

This corresponds to the ICU4C function u_getNumericValue.

Parameters:
ch - Code point to get the numeric value for.
Returns:
numeric value of ch, or NO_NUMERIC_VALUE if none is defined.
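To see why an int return type is limiting, the sketch below uses the JDK method java.lang.Character.getNumericValue(int) for contrast. It returns -2 when a numeric value exists but is not a non-negative integer, so the fraction itself is lost. This is a comparison only and does not call UCharacter; the class name NumericValueSketch and the method jdkNumericValue are hypothetical.

```java
final class NumericValueSketch {
    // JDK counterpart with an int return type, shown for contrast only.
    static int jdkNumericValue(int ch) {
        return Character.getNumericValue(ch);
    }

    public static void main(String[] args) {
        System.out.println(jdkNumericValue('7'));     // 7
        System.out.println(jdkNumericValue(0x00BD));  // -2: U+00BD is one half, not an int
        System.out.println(jdkNumericValue('$'));     // -1: no numeric value
    }
}
```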

getType

public static int getType(int ch)
Returns a value indicating a code point's Unicode category. Up-to-date Unicode implementation of java.lang.Character.getType() except for the above-mentioned code points that had their category changed.
Return results are constants from the interface UCharacterCategory.
NOTE: the UCharacterCategory values are not compatible with those returned by java.lang.Character.getType. UCharacterCategory values match the ones used in ICU4C, while java.lang.Character type values, though similar, skip the value 17.

Parameters:
ch - code point whose type is to be determined
Returns:
category which is a value of UCharacterCategory
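The gap at 17 on the java.lang.Character side can be observed directly from the JDK constants, as in this sketch. It inspects only java.lang.Character and makes no claim about the numeric values of the UCharacterCategory constants; the class name CategorySketch and the method javaType are hypothetical.

```java
final class CategorySketch {
    // Category as reported by the JDK, for comparison with UCharacter.getType.
    static int javaType(int codePoint) {
        return Character.getType(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(Character.FORMAT);        // 16
        System.out.println(Character.PRIVATE_USE);   // 18, so 17 is skipped
        System.out.println(javaType('A') == Character.UPPERCASE_LETTER);  // true
        System.out.println(javaType(0xE000) == Character.PRIVATE_USE);    // true
    }
}
```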

getCodePoint

public static int getCodePoint(char lead,
                               char trail)
Returns a code point corresponding to the two UTF16 characters.

Parameters:
lead - the lead char
trail - the trail char
Returns:
code point if surrogate characters are valid.
Throws:
IllegalArgumentException - thrown when the argument characters do not form a valid code point
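The combination of a lead and a trail surrogate follows the standard UTF-16 arithmetic. The sketch below is one way to write it, under the assumption that validity means lead in D800..DBFF and trail in DC00..DFFF; it is not the library's source. The class name SurrogatePairSketch and the method toCodePoint are hypothetical.

```java
final class SurrogatePairSketch {
    // Combines a UTF-16 surrogate pair into one supplementary code point.
    static int toCodePoint(char lead, char trail) {
        boolean leadOk = lead >= 0xD800 && lead <= 0xDBFF;
        boolean trailOk = trail >= 0xDC00 && trail <= 0xDFFF;
        if (!leadOk || !trailOk) {
            throw new IllegalArgumentException("Not a valid surrogate pair");
        }
        // Each surrogate carries 10 bits; the result is offset by 0x10000.
        return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00);
    }

    public static void main(String[] args) {
        int cp = toCodePoint('\uD835', '\uDD38');
        System.out.println(Integer.toHexString(cp));  // 1d538
    }
}
```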

getDirection

public static int getDirection(int ch)
Returns the bidirectional property of a code point. For example, 0x0041 (letter A) has the LEFT_TO_RIGHT directional property.
The result returned belongs to the interface UCharacterDirection.

Parameters:
ch - the code point whose direction is to be determined
Returns:
direction constant from UCharacterDirection.
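For comparison, the JDK offers a similar query through java.lang.Character.getDirectionality(int). The sketch below uses it to reproduce the letter A example. The JDK constants are bytes and are not assumed to share numeric values with UCharacterDirection; the class name DirectionSketch and the method isLeftToRight are hypothetical.

```java
final class DirectionSketch {
    // True if the JDK reports a strong left-to-right directionality.
    static boolean isLeftToRight(int codePoint) {
        return Character.getDirectionality(codePoint)
                == Character.DIRECTIONALITY_LEFT_TO_RIGHT;
    }

    public static void main(String[] args) {
        System.out.println(isLeftToRight(0x0041));  // true: LATIN CAPITAL LETTER A
        System.out.println(isLeftToRight(0x05D0));  // false: HEBREW LETTER ALEF is right-to-left
    }
}
```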

foldCase

public static String foldCase(String str,
                              boolean defaultmapping)
The given string is mapped to its case folding equivalent according to UnicodeData.txt and CaseFolding.txt; if any character has no case folding equivalent, the character itself is returned. "Full", multiple-code point case folding mappings are returned here. For "simple" single-code point mappings use the API foldCase(int ch, boolean defaultmapping).

Parameters:
str - the String to be converted
defaultmapping - Indicates whether all mappings defined in CaseFolding.txt are to be used; otherwise the mappings for dotted I and dotless i marked with 'I' in CaseFolding.txt will be skipped.
Returns:
the case folding equivalent of the string, if any; otherwise the string itself.
See Also:
foldCase(int, boolean)
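The effect of a "full", multiple-code point mapping can be illustrated without UCharacter. The JDK has no case folding method, so the sketch below swaps in a different technique: uppercasing and then lowercasing with Locale.ROOT. This only approximates case folding, does not consult CaseFolding.txt, and has no counterpart to the defaultmapping flag. The class name CaseFoldSketch and the method approxFold are hypothetical.

```java
import java.util.Locale;

final class CaseFoldSketch {
    // Approximation only: upper-then-lower with a locale-neutral mapping.
    // This is NOT UCharacter.foldCase and may differ from it for some characters.
    static String approxFold(String s) {
        return s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // U+00DF (sharp s) expands to two letters, a multiple-code point mapping.
        System.out.println(approxFold("Stra\u00DFe"));  // strasse
        System.out.println(approxFold("STRASSE").equals(approxFold("Stra\u00DFe")));  // true
    }
}
```

Comparing the folded forms of two strings, rather than the strings themselves, gives a caseless match.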

getAge

public static VersionInfo getAge(int ch)

Get the "age" of the code point.

The "age" is the Unicode version when the code point was first designated (as a non-character or for Private Use) or assigned a character.

This can be useful to avoid emitting code points to receiving processes that do not accept newer characters.

The data is from the UCD file DerivedAge.txt.

Parameters:
ch - The code point.
Returns:
the Unicode version number

getIntPropertyValue

public static int getIntPropertyValue(int ch,
                                      int type)

Gets the property value for a Unicode property type of a code point. Also returns binary and mask property values.

Unicode, especially in version 3.2, defines many more properties than the original set in UnicodeData.txt.

The properties APIs are intended to reflect Unicode properties as defined in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR). For details about the properties see http://www.unicode.org/.

For names of Unicode properties see the UCD file PropertyAliases.txt.

 Sample usage:
 // c is the code point to query
 int ea = UCharacter.getIntPropertyValue(c, UProperty.EAST_ASIAN_WIDTH);
 int ideo = UCharacter.getIntPropertyValue(c, UProperty.IDEOGRAPHIC);
 boolean b = (ideo == 1); // binary properties return 0 or 1
 

Parameters:
ch - code point to test.
type - UProperty selector constant, identifies which property to check. Must be UProperty.BINARY_START <= type < UProperty.BINARY_LIMIT or UProperty.INT_START <= type < UProperty.INT_LIMIT or UProperty.MASK_START <= type < UProperty.MASK_LIMIT.
Returns:
numeric value that is directly the property value or, for enumerated properties, corresponds to the numeric value of the enumerated constant of the respective property value enumeration type (cast to enum type if necessary). Returns 0 or 1 (for false / true) for binary Unicode properties. Returns a bit-mask for mask properties. Returns 0 if 'type' is out of bounds or if the Unicode version does not have data for the property at all, or not for this code point.
See Also:
UProperty, hasBinaryProperty, getIntPropertyMinValue, getIntPropertyMaxValue, getUnicodeVersion

aicas GmbH, Karlsruhe - Germany    www.aicas.com
Copyright 2001-2008 aicas GmbH. All Rights Reserved.