Have you ever wondered what the Unicode value of a special character might
be? Or have you come across 'magic' character codes in some code? Then you might want
to use these Java constants: UCC, the UniCode Constants, are derived from the
Unicode Database. For every character there is a constant with its official name
and the corresponding char or int value. All characters of version 4.2.0 up to
U+1FFFF are covered, except CJK Ideographs.
To install UCC, simply extract the ucc-*.zip and add ucc.jar to your
classpath. UCC is JDK 1.1 compliant and does not depend on any other
libraries. To use characters from U+10000 upward (supplementary characters,
which do not fit into a char and are therefore defined as int code points),
you need Java 1.5 or newer.
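
Why such characters need an int, and Java 1.5, can be seen with the standard java.lang.Character API alone. The following is only a minimal sketch and not part of UCC: the class name SupplementaryDemo and the constant AEGEAN_NUMBER_EIGHT are hypothetical stand-ins, with the value U+1010E taken from the Aegean Numbers block.

```java
class SupplementaryDemo {
    // Hypothetical stand-in for a UCC constant: U+1010E, AEGEAN NUMBER EIGHT.
    // The value does not fit into a 16-bit char, so it is held in an int.
    static final int AEGEAN_NUMBER_EIGHT = 0x1010E;

    /** Converts a code point to a String, using a surrogate pair where needed. */
    static String asString(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = asString(AEGEAN_NUMBER_EIGHT);
        System.out.println(Character.isSupplementaryCodePoint(AEGEAN_NUMBER_EIGHT)); // true
        System.out.println(s.length());                            // 2 (two chars)
        System.out.println(s.codePointCount(0, s.length()));       // 1 (one character)
        System.out.println(Integer.toHexString(s.codePointAt(0))); // 1010e
    }
}
```

All methods used here (Character.toChars, Character.isSupplementaryCodePoint, String.codePointCount, String.codePointAt) were introduced in Java 1.5, which is why older JDKs can only work with the char constants.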
For each Unicode block, e.g. Basic Latin (U+0000..U+007F) or Aegean
Numbers (U+10100..U+1013F), there is a separate interface named after the
block that defines all code points of that block. First you need to
import the blocks:
import unicode.AegeanNumbers;
import unicode.BasicLatin;
import unicode.NumberForms;
Then you can use the constants in your code:
println("count=" + Character.charCount(BasicLatin.DIGIT_NINE)); // -> 1
println("value=" + Character.getNumericValue(BasicLatin.DIGIT_NINE)); // -> 9
println("count=" + Character.charCount(NumberForms.ROMAN_NUMERAL_FIVE_HUNDRED)); // -> 1
println("value=" + Character.getNumericValue(NumberForms.ROMAN_NUMERAL_FIVE_HUNDRED)); // -> 500
println("count=" + Character.charCount(AegeanNumbers.NUMBER_EIGHT)); // -> 2
println("value=" + Character.getNumericValue(AegeanNumbers.NUMBER_EIGHT)); // -> 8
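
To try the idea without ucc.jar on the classpath, the block interfaces can be imitated by hand. This is a hedged sketch only: the interfaces BasicLatinSketch, NumberFormsSketch and AegeanNumbersSketch and the helper describe are made up for illustration, and the real generated UCC sources may be declared differently. The constant values are the official code points of the three characters.

```java
// Hypothetical imitation of UCC block interfaces; not the real UCC sources.
interface BasicLatinSketch {
    char DIGIT_NINE = '\u0039';                 // U+0039, fits into a char
}

interface NumberFormsSketch {
    char ROMAN_NUMERAL_FIVE_HUNDRED = '\u216E'; // U+216E, fits into a char
}

interface AegeanNumbersSketch {
    int NUMBER_EIGHT = 0x1010E;                 // U+1010E, needs an int
}

class BlockSketchDemo {
    /** Reports how many chars a code point occupies and its numeric value. */
    static String describe(int codePoint) {
        return "count=" + Character.charCount(codePoint)
             + " value=" + Character.getNumericValue(codePoint);
    }

    public static void main(String[] args) {
        // char constants widen to int automatically when passed to describe(int).
        System.out.println(describe(BasicLatinSketch.DIGIT_NINE));
        System.out.println(describe(NumberFormsSketch.ROMAN_NUMERAL_FIVE_HUNDRED));
        System.out.println(describe(AegeanNumbersSketch.NUMBER_EIGHT));
    }
}
```

Because interface fields are implicitly public, static and final, such constants are inlined by the compiler, so the pattern adds no runtime cost.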
UCC is open source, licensed under the GPL.