JSRs: Java Specification Requests
JSR 204: Unicode Supplementary Character Support

Original Java Specification Request (JSR)

Section 1. Identification

Submitting Member: Sun Microsystems, Inc

Name of Contact Person: Brian Beck

E-Mail Address:

Telephone Number: +1 408 276 7017

Fax Number:

Specification Lead: Masayoshi Okutsu

E-Mail Address:

Telephone Number: +81 45 227 9127

Fax Number:

Initial Expert Group Membership:

Masayoshi Okutsu - Sun Microsystems, Mark Davis - IBM, Craig Cummings - Oracle

Supporting this JSR:

Sun Microsystems

Section 2: Request

2.1 Please describe the proposed Specification:

The proposed specification will define a mechanism to support Supplementary Characters as defined in the Unicode 3.1 specification. The new APIs will most likely be a collection of small extensions to the existing Java class library APIs that seek to preserve and extend the platform's existing character processing model and thus provide compatibility with existing programs.

2.2 What is the target Java platform? (i.e., desktop, server, personal, embedded, card, etc.)

Java 2, Standard Edition (J2SE)

2.3 What need of the Java community will be addressed by the proposed specification?

Support for supplementary characters is required in several important markets, including Japan, China and Hong Kong. These countries have new national standard character sets (e.g., JIS X 0213, GB 18030, HKSCS-2001) requiring mapping to Unicode supplementary characters.

2.4 Why isn't this need met by existing specifications?

Existing Unicode support in the J2SE assumes that every Unicode code value can be stored in 16 bits as a single char value. Many APIs accept or return individual char values and therefore cannot handle supplementary characters.
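The limitation can be illustrated with a minimal sketch. It assumes a J2SE 5.0 or later runtime, where Character.toChars and the int overload of Character.isLetter exist; those methods are not named in this request and are used here only to build and inspect a sample string. The class name, the helper lettersByChar and the sample code point U+20000 are illustrative choices.

```java
class CharApiLimit {
    // Illustrative sample: U+20000, a CJK ideograph that does not fit in 16 bits.
    static final String SAMPLE = new String(Character.toChars(0x20000));

    // Counts letters by inspecting one char value at a time,
    // which is the model the existing char-based APIs assume.
    static int lettersByChar(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(SAMPLE.length());             // 2: one character, two char values
        System.out.println(lettersByChar(SAMPLE));       // 0: neither half is a letter on its own
        System.out.println(Character.isLetter(0x20000)); // true: the whole code point is a letter
    }
}
```

A char-at-a-time loop sees two values that are each meaningless alone, so any API limited to a single char cannot classify the character correctly.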

2.5 Please give a short description of the underlying technology or technologies:

The Java programming language and APIs use the Unicode standard as the foundation of their character representation. The primitive data type char is defined as a 16-bit Unicode character, and the classes java.lang.Character, java.lang.String and various other classes directly implement text handling following the Unicode standard. Unicode is an evolving standard, and the Java platform has tracked the standard so that it now supports Unicode 3.0 in J2SE 1.4.

The transition to Unicode 3.1 and later, however, presents a special problem: Unicode has given up its first design principle, that of using fixed-width 16-bit characters, in order to allow the representation of more than 65,536 characters (U+0000-U+FFFF). As a coded character set, Unicode now uses character codes up to U+10FFFF. The original range of Unicode character codes, U+0000-U+FFFF, is now often referred to as the Basic Multilingual Plane (BMP). Unicode 3.1 is the first version to assign characters outside the BMP. These characters cannot be represented by individual char values. Characters outside the BMP are called supplementary characters and Planes 1 through 16 are called Supplementary Planes in the Unicode specification.
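The arithmetic behind these ranges can be sketched as follows. The plane calculation follows directly from the ranges above. The two-unit split uses the UTF-16 encoding form defined by the Unicode Standard, which is only one possible representation; this request does not commit to it. The class and method names are hypothetical.

```java
class CodeSpace {
    static final int MAX_CODE_POINT = 0x10FFFF;

    // Plane 0 is the BMP; planes 1 through 16 are the supplementary planes.
    static int plane(int codePoint) {
        return codePoint >>> 16;
    }

    // Encodes a code point as UTF-16: one 16-bit unit inside the BMP,
    // two units (a surrogate pair) for a supplementary character.
    static char[] toUtf16(int codePoint) {
        if (codePoint < 0 || codePoint > MAX_CODE_POINT) {
            throw new IllegalArgumentException("not a Unicode code point: " + codePoint);
        }
        if (codePoint <= 0xFFFF) {
            return new char[] { (char) codePoint };
        }
        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char) (0xD800 + (offset >>> 10));  // upper 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // lower 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        System.out.println(MAX_CODE_POINT + 1);  // 1114112 code points = 17 planes * 65536
        System.out.println(plane(0x20000));      // 2
        char[] units = toUtf16(0x20000);
        System.out.println(Integer.toHexString(units[0]) + " "
                + Integer.toHexString(units[1])); // d840 dc00
    }
}
```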

There are a number of different possible approaches to enable support for supplementary characters, with different trade-offs for compatibility, API ease of use and performance. This JSR will define the overall approach for supporting supplementary characters in the J2SE APIs. Based on the needs of the Java community, full backwards compatibility with earlier J2SE releases will be a strong requirement.

2.6 Is there a proposed package name for the API Specification? (i.e., javapi.something, org.something, etc.)

The JSR will affect some existing text manipulation APIs in the J2SE platform. The primary target will be significant APIs that accept or return individual "char" values, starting with the java.lang.Character class. We do not plan to add any new packages.
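As an illustration of what int-based counterparts to char-based methods can look like, the sketch below walks a string one character at a time. It uses String.codePointAt and Character.charCount, which shipped in J2SE 5.0; this request itself does not name any specific methods, so the example shows one eventual outcome rather than the design proposed here. The class name and helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointWalk {
    // Returns the characters of s as int code points, stepping over
    // two char values at once when a supplementary character is found.
    static List<Integer> codePoints(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            result.add(codePoint);
            i += Character.charCount(codePoint); // 1 for BMP, 2 for supplementary
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "A" + new String(Character.toChars(0x20000)) + "B";
        System.out.println(text.length());           // 4 char values
        System.out.println(codePoints(text).size()); // 3 characters
    }
}
```

Existing code that indexes by char keeps working unchanged, which fits the compatibility requirement stated in 2.5.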

2.7 Does the proposed specification have any dependencies on specific operating systems, CPUs, or I/O devices that you know of?


2.8 Are there any security issues that cannot be addressed by the current security model?


2.9 Are there any internationalization or localization issues?

This JSR addresses an important internationalization need - see 2.3.

2.10 Are there any existing specifications that might be rendered obsolete, deprecated, or in need of revision as a result of this work?

Yes - see 2.6.

2.11 Please describe the anticipated schedule for the development of this specification.

The results of this JSR will be included in the J2SE Tiger release.

2.12 Please describe the anticipated working model for the Expert Group working on developing this specification.

We expect that the expert group will primarily select a generic model for handling supplementary characters in Java APIs. Once this generic model has been selected, the design of specific APIs and their implementation should be fairly straightforward.

Since we expect to have a geographically distributed expert group, discussions will be conducted primarily by email.

2.13 Please describe how the RI and TCK will be delivered, i.e. as part of a profile or platform edition, or stand-alone, or both. Include version information for the profile or platform in your answer.

This JSR will be delivered as part of J2SE 1.5 "Tiger".

2.14 Please state the rationale if previous versions are available stand-alone and you are now proposing in 2.13 to only deliver RI and TCK as part of a profile or platform edition (See sections 1.1.5 and 1.1.6 of the JCP 2 document).


2.15 Please provide a description of the business terms for the Specification, RI and TCK that will apply when this JSR is final.

This JSR will be delivered as part of J2SE 1.5 "Tiger". The proposed J2SE 1.5 licensing terms are available at J2SE 1.5 licensing terms.

Section 3: Contributions

3.1 Please list any existing documents, specifications, or implementations that describe the technology. Please include links to the documents if they are publicly available.

Gosling, Joy, Steele, Bracha: The Java Language Specification -

Java 2 Standard Edition API Specification -

The Unicode Consortium: The Unicode Standard Version 3.0. Addison-Wesley, 2000.

Unicode 3.1 -

Unicode Glossary -

3.2 Explanation of how these items might be used as a starting point for the work.

The first two items describe the programming language's relationship to the Unicode standard and the specific APIs that will need to be updated. The remaining items describe the Unicode Standard, including the version that requires the specification updates.

Section 4: Additional Information (Optional)

4.1 This section contains any additional information that the submitting Member wishes to include in the JSR.

A software request for enhancement (RFE) has been submitted as bugid 4533872.