From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-prs-return-29880-listarch-gcc-prs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 13382 invoked by alias); 22 Feb 2003 13:46:00 -0000
Mailing-List: contact gcc-prs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive: <http://gcc.gnu.org/ml/gcc-prs/>
List-Post: <mailto:gcc-prs@gcc.gnu.org>
List-Help: <mailto:gcc-prs-help@gcc.gnu.org>
Sender: gcc-prs-owner@gcc.gnu.org
Received: (qmail 13362 invoked by uid 71); 22 Feb 2003 13:46:00 -0000
Date: Sat, 22 Feb 2003 13:46:00 -0000
Message-ID: <20030222134600.13361.qmail@sources.redhat.com>
To: nobody@gcc.gnu.org
Cc: gcc-prs@gcc.gnu.org,
From: Mark Wielaard <mark@klomp.org>
Subject: Re: libgcj/9802: Bug in surrogate handling in Unicode to UTF-8
	conversion
Reply-To: Mark Wielaard <mark@klomp.org>
X-SW-Source: 2003-02/txt/msg01117.txt.bz2
List-Id: <gcc-prs.sourceware.org>

The following reply was made to PR libgcj/9802; it has been noted by GNATS.

From: Mark Wielaard <mark@klomp.org>
To: gcc-gnats@gcc.gnu.org, jjc@jclark.com, java-prs@gcc.gnu.org,  gcc-bugs@gcc.gnu.org, nobody@gcc.gnu.org, gcc-prs@gcc.gnu.org
Cc:  
Subject: Re: libgcj/9802: Bug in surrogate handling in Unicode to UTF-8
	conversion
Date: 22 Feb 2003 14:38:56 +0100

 Thanks for the bug report.
 Your suggested fix seems obviously correct, and I verified that making
 sure that avail is always decremented makes String.getBytes("UTF-8")
 work (read: it no longer throws an ArrayIndexOutOfBoundsException).
 
 But while creating a test case I noticed that for your example we return
 two bytes, {0xf0, 0x90}, whereas other implementations return four bytes,
 {0xf0, 0x90, 0x8c, 0x80}. I don't know enough about Unicode and UTF-8
 encoding to know which is correct or why.
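 
 For reference, a minimal sketch of the arithmetic involved, not the
 libgcj converter itself. The four bytes quoted above decode to U+10300,
 so the sketch assumes the input was the surrogate pair 0xD800 0xDF00;
 the class and method names are made up for illustration. A code point
 above U+FFFF takes four bytes in UTF-8, and the two surrogates have to
 be combined into one code point before encoding.

```java
public class SurrogateUtf8Sketch {
    // Hypothetical helper (not part of libgcj): encodes one surrogate
    // pair as the four-byte UTF-8 sequence for its code point.
    static byte[] encodePair(char hi, char lo) {
        // Combine the two 16-bit code units into one code point
        // in the range 0x10000..0x10FFFF.
        int cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
        return new byte[] {
            (byte) (0xF0 | (cp >> 18)),          // lead byte: top 3 bits
            (byte) (0x80 | ((cp >> 12) & 0x3F)), // next 6 bits
            (byte) (0x80 | ((cp >> 6) & 0x3F)),  // next 6 bits
            (byte) (0x80 | (cp & 0x3F))          // low 6 bits
        };
    }

    public static void main(String[] args) {
        // Assumed input: U+10300 as the pair 0xD800 0xDF00.
        byte[] b = encodePair((char) 0xD800, (char) 0xDF00);
        for (byte x : b) {
            System.out.printf("0x%02x ", x & 0xFF);
        }
        System.out.println(); // prints: 0xf0 0x90 0x8c 0x80
    }
}
```

 With that input the sketch yields all four bytes, of which the two
 bytes we currently return are only the first half.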
 
 If someone has a quick reference to the relevant definitions and/or a
 testsuite for this kind of thing, that would be highly appreciated.
 
 http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=9802