Jamaica 3.2 release 62

sun.text.normalizer
Class Trie

java.lang.Object
  extended by sun.text.normalizer.Trie
Direct Known Subclasses:
CharTrie, IntTrie

public abstract class Trie
extends Object

A trie is a kind of compressed, serializable table of values associated with Unicode code points (0..0x10ffff).

This class defines the basic structure of a trie and provides methods to retrieve the offsets to the actual data.

Data will be in the form of an array of a basic type, char or int.

The actual data format will have to be specified by the user in the inner static interface com.ibm.icu.impl.Trie.DataManipulate.

This trie implementation is optimized for getting offsets while walking forward through a UTF-16 string. Therefore, the simplest and fastest access macros are the fromLead() and fromOffsetTrail() methods. The fromBMP() methods are a little more complicated; they get offsets even for lead surrogate code points, while the fromLead() methods get special "folded" offsets for lead surrogate code units if there is relevant data associated with them. From such a folded offset, an offset needs to be extracted to supply to the fromOffsetTrail() methods. To handle such supplementary code points, some offset information is kept in the data.

Methods in com.ibm.icu.impl.Trie.DataManipulate are called to retrieve that offset from the folded value for the lead surrogate unit.
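The forward walk described above can be illustrated with a minimal sketch. It only shows how a caller might decide, per UTF-16 code unit, whether a plain BMP lookup or a lead/trail lookup would apply; it does not call this class. The class name Utf16WalkSketch and the labels "BMP" and "PAIR" are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of this API: classifies the code units of a
// string the way a forward walk would before choosing a trie lookup path.
class Utf16WalkSketch {
    static List<String> classify(String s) {
        List<String> kinds = new ArrayList<String>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i++);
            if (Character.isHighSurrogate(c) && i < s.length()
                    && Character.isLowSurrogate(s.charAt(i))) {
                // Lead unit followed by a trail unit: one supplementary code
                // point, which would need the folded-offset path.
                i++;
                kinds.add("PAIR");
            } else {
                // Ordinary BMP code unit. An unpaired surrogate also lands
                // here in this sketch and is treated as a plain unit.
                kinds.add("BMP");
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 encoded as a surrogate pair, then 'b'.
        System.out.println(classify("a\uD83D\uDE00b")); // prints [BMP, PAIR, BMP]
    }
}
```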

For examples of use, see com.ibm.icu.impl.CharTrie or com.ibm.icu.impl.IntTrie.

Since:
release 2.1, Jan 01 2002
See Also:
com.ibm.icu.impl.CharTrie, com.ibm.icu.impl.IntTrie

Nested Class Summary
static interface Trie.DataManipulate
          Character data in com.ibm.impl.Trie have different user-specified formats for different purposes.
 
Field Summary
protected static int INDEX_STAGE_1_SHIFT_
          Shift size for shifting right the input index. Valid range: 1..9.
protected static int INDEX_STAGE_2_SHIFT_
          Shift size for shifting left the index array values.
protected static int INDEX_STAGE_3_MASK_
          Mask for getting the lower bits from the input index.
protected static int LEAD_INDEX_OFFSET_
          Lead surrogate code points' index displacement in the index array: (0x10000 - 0xd800) >> INDEX_STAGE_1_SHIFT_, where 0x10000 - 0xd800 = 0x2800.
protected  int m_dataLength_
          Length of the data array
protected  Trie.DataManipulate m_dataManipulate_
          Internal TrieValue which handles the parsing of the data value.
protected  int m_dataOffset_
          Start index of the data portion of the trie.
protected  char[] m_index_
          Index or UTF-16 characters
protected static int SURROGATE_MASK_
          Surrogate mask to use when shifting offset to retrieve supplementary values
 
Constructor Summary
protected Trie(char[] index, int options, Trie.DataManipulate dataManipulate)
          Trie constructor
protected Trie(InputStream inputStream, Trie.DataManipulate dataManipulate)
          Trie constructor for CharTrie use.
 
Method Summary
protected  int getBMPOffset(char ch)
          Gets the offset to the data which the BMP character points to. Treats a lead surrogate as a normal code point.
protected  int getCodePointOffset(int ch)
          Internal trie getter from a code point.
protected abstract  int getInitialValue()
          Gets the default initial value
protected  int getLeadOffset(char ch)
          Gets the offset to the data which this lead surrogate character points to.
protected  int getRawOffset(int offset, char ch)
          Gets the offset to the data that ch points to, with ch used as an index relative to the given offset.
protected abstract  int getSurrogateOffset(char lead, char trail)
          Gets the offset to the data which the surrogate pair points to.
protected abstract  int getValue(int index)
          Gets the value at the argument index
protected  boolean isCharTrie()
          Determines if this is a 16-bit trie
protected  boolean isIntTrie()
          Determines if this is a 32-bit trie
protected  void unserialize(InputStream inputStream)
          Parses the input stream and creates the trie index with it.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LEAD_INDEX_OFFSET_

protected static final int LEAD_INDEX_OFFSET_
Lead surrogate code points' index displacement in the index array: (0x10000 - 0xd800) >> INDEX_STAGE_1_SHIFT_, where 0x10000 - 0xd800 = 0x2800.

See Also:
Constant Field Values
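As a worked example of the arithmetic above, the following sketch evaluates the displacement for one shift value. The shift of 5 is an assumption chosen for illustration; this page only states that the shift lies in the range 1..9. The class and method names are hypothetical.

```java
// Hypothetical sketch: evaluates (0x10000 - 0xd800) >> shift for a given shift.
class LeadIndexOffsetSketch {
    // Assumed value for illustration only; the documented range is 1..9.
    static final int ASSUMED_STAGE_1_SHIFT = 5;

    static int leadIndexOffset(int stage1Shift) {
        // 0x10000 - 0xd800 = 0x2800, then scaled down to index-array units.
        return (0x10000 - 0xd800) >> stage1Shift;
    }

    public static void main(String[] args) {
        System.out.println(leadIndexOffset(ASSUMED_STAGE_1_SHIFT)); // prints 320
    }
}
```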

INDEX_STAGE_1_SHIFT_

protected static final int INDEX_STAGE_1_SHIFT_
Shift size for shifting right the input index. Valid range: 1..9.

See Also:
Constant Field Values

INDEX_STAGE_2_SHIFT_

protected static final int INDEX_STAGE_2_SHIFT_
Shift size for shifting left the index array values. Increases the possible data size with 16-bit index values at the cost of compactability. This requires blocks of stage 2 data to be aligned by DATA_GRANULARITY. Valid range: 0..INDEX_STAGE_1_SHIFT_.

See Also:
Constant Field Values

INDEX_STAGE_3_MASK_

protected static final int INDEX_STAGE_3_MASK_
Mask for getting the lower bits from the input index. DATA_BLOCK_LENGTH_ - 1.

See Also:
Constant Field Values

SURROGATE_MASK_

protected static final int SURROGATE_MASK_
Surrogate mask to use when shifting offset to retrieve supplementary values

See Also:
Constant Field Values
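A minimal sketch of how such a mask is typically applied to a trail surrogate follows. The mask value 0x3ff (the low ten bits of a surrogate code unit) is an assumption; this page does not state the constant's value, and the class and method names are hypothetical.

```java
// Hypothetical sketch: isolates the ten payload bits of a trail surrogate.
class SurrogateMaskSketch {
    // Assumed mask: a surrogate code unit carries ten bits of code point data.
    static final int ASSUMED_SURROGATE_MASK = 0x3ff;

    static int trailBits(char trail) {
        return trail & ASSUMED_SURROGATE_MASK;
    }

    public static void main(String[] args) {
        // 0xDE00 is the trail surrogate of U+1F600.
        System.out.println(Integer.toHexString(trailBits('\uDE00'))); // prints 200
    }
}
```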

m_index_

protected char[] m_index_
Index or UTF-16 characters


m_dataManipulate_

protected Trie.DataManipulate m_dataManipulate_
Internal TrieValue which handles the parsing of the data value. This class is to be implemented by the user.


m_dataOffset_

protected int m_dataOffset_
Start index of the data portion of the trie. CharTrie combines index and data into a char array, so this is used to indicate the initial offset to the data portion. Note this index always points to the initial value.


m_dataLength_

protected int m_dataLength_
Length of the data array

Constructor Detail

Trie

protected Trie(InputStream inputStream,
               Trie.DataManipulate dataManipulate)
        throws IOException
Trie constructor for CharTrie use.

Parameters:
inputStream - ICU data file input stream which contains the trie
dataManipulate - object containing the information to parse the trie data
Throws:
IOException - thrown when input stream does not have the right header.

Trie

protected Trie(char[] index,
               int options,
               Trie.DataManipulate dataManipulate)
Trie constructor

Parameters:
index - array to be used for index
options - used by the trie
dataManipulate - object containing the information to parse the trie data
Method Detail

getSurrogateOffset

protected abstract int getSurrogateOffset(char lead,
                                          char trail)
Gets the offset to the data which the surrogate pair points to.

Parameters:
lead - lead surrogate
trail - trailing surrogate
Returns:
offset to data

getValue

protected abstract int getValue(int index)
Gets the value at the argument index

Parameters:
index - value at index will be retrieved
Returns:
32 bit value

getInitialValue

protected abstract int getInitialValue()
Gets the default initial value

Returns:
32 bit value

getRawOffset

protected final int getRawOffset(int offset,
                                 char ch)
Gets the offset to the data that ch points to, with ch used as an index relative to the given offset. To locate the data offset of a non-supplementary character, calling

getRawOffset(0, ch);

is sufficient. For a supplementary character formed by the surrogates lead and trail, getRawOffset() must instead be called with the offset obtained from getFoldingIndexOffset(). See getSurrogateOffset().

Parameters:
offset - index offset which ch is to start from
ch - index to be used after offset
Returns:
offset to the data
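The following is a minimal sketch of a staged lookup of this kind, combining the stage 1 shift, stage 2 shift, and stage 3 mask described under Field Detail. It is not this class's confirmed implementation: the constant values (5, 2, and a block length of 32) and the way the stages are combined are assumptions, and the class, method, and array contents are hypothetical.

```java
// Hypothetical sketch of a two-step index lookup followed by an in-block offset.
class RawOffsetSketch {
    static final int STAGE_1_SHIFT = 5;                       // assumed, range 1..9
    static final int STAGE_2_SHIFT = 2;                       // assumed
    static final int STAGE_3_MASK = (1 << STAGE_1_SHIFT) - 1; // assumed block length 32, minus 1

    static int rawOffset(char[] index, int offset, char ch) {
        // Upper bits of ch select an index entry (relative to offset); the entry,
        // shifted left, is the start of a data block; lower bits of ch select
        // the position inside that block.
        return (index[offset + (ch >> STAGE_1_SHIFT)] << STAGE_2_SHIFT)
                + (ch & STAGE_3_MASK);
    }

    public static void main(String[] args) {
        char[] index = new char[0x10000 >> STAGE_1_SHIFT]; // one entry per 32 code units
        index['A' >> STAGE_1_SHIFT] = 8;  // block covering 0x40..0x5f starts at 8 << 2 = 32
        System.out.println(rawOffset(index, 0, 'A')); // prints 33
    }
}
```

In this sketch an all-zero index entry maps every code unit of its block to data offsets 0..31, which is one way a shared default block could be represented.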

getBMPOffset

protected final int getBMPOffset(char ch)
Gets the offset to the data which the BMP character points to. Treats a lead surrogate as a normal code point.

Parameters:
ch - BMP character
Returns:
offset to data

getLeadOffset

protected final int getLeadOffset(char ch)
Gets the offset to the data which this lead surrogate character points to. Data at the returned offset may contain folding offset information for the next trailing surrogate character.

Parameters:
ch - lead surrogate character
Returns:
offset to data

getCodePointOffset

protected final int getCodePointOffset(int ch)
Internal trie getter from a code point. Gets the offset to the data which the code point points to. Implementation note: this could be faster(?) but longer with a special case of the form if ((c32) <= 0xd7ff) { (result) = _TRIE_GET_RAW(trie, data, 0, c32); }.

Parameters:
ch - codepoint
Returns:
offset to data
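A code point lookup of this kind has to choose a path by range. The sketch below shows one plausible dispatch, assuming that code points up to 0xd7ff take the raw path (as the note about 0xd7ff suggests), the rest of the BMP takes the BMP path, and supplementary code points are split into a lead and trail surrogate. The route names and the class are hypothetical, and the real method's handling of out-of-range input is not documented on this page.

```java
// Hypothetical sketch: picks a lookup path for a code point by range.
class CodePointDispatchSketch {
    static String route(int cp) {
        if (cp < 0 || cp > 0x10ffff) {
            return "INVALID";            // outside the Unicode range 0..0x10ffff
        }
        if (cp < 0xd800) {
            return "RAW";                // below the surrogate block
        }
        if (cp < 0x10000) {
            return "BMP";                // surrogate block and the rest of the BMP
        }
        return "SURROGATE_PAIR";         // supplementary: needs lead and trail units
    }

    public static void main(String[] args) {
        System.out.println(route('A'));      // prints RAW
        System.out.println(route(0x1F600));  // prints SURROGATE_PAIR
        // Character.toChars yields the lead and trail units a pair lookup would use.
        char[] units = Character.toChars(0x1F600);
        System.out.println(Integer.toHexString(units[0]) + " "
                + Integer.toHexString(units[1])); // prints d83d de00
    }
}
```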

unserialize

protected void unserialize(InputStream inputStream)
                    throws IOException

Parses the input stream and creates the trie index with it.

This is overridden by the child classes.

Parameters:
inputStream - input stream containing the trie information
Throws:
IOException - thrown when data reading fails.

isIntTrie

protected final boolean isIntTrie()
Determines if this is a 32-bit trie

Returns:
true if options specifies this is a 32 bit trie

isCharTrie

protected final boolean isCharTrie()
Determines if this is a 16-bit trie

Returns:
true if this is a 16 bit trie


aicas GmbH, Karlsruhe - Germany    www.aicas.com
Copyright 2001-2008 aicas GmbH. All Rights Reserved.