public inbox for gcc-prs@sourceware.org
From: jjc@jclark.com
To: gcc-gnats@gcc.gnu.org
Subject: libgcj/9802: Bug in surrogate handling in Unicode to UTF-8 conversion
Date: Sat, 22 Feb 2003 09:56:00 -0000
Message-ID: <20030222095110.15975.qmail@sources.redhat.com>


>Number:         9802
>Category:       libgcj
>Synopsis:       Bug in surrogate handling in Unicode to UTF-8 conversion
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    unassigned
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Feb 22 09:56:01 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator:     jjc@jclark.com
>Release:        gcc version 3.3 20030217 (prerelease)
>Organization:
>Environment:
Red Hat Linux 8.0
>Description:
The following program

class Bug {
    // High (leading) surrogate for a supplementary code point c >= 0x10000.
    static public char surrogate1(int c) {
        return (char)(((c - 0x10000) >> 10) | 0xD800);
    }
    // Low (trailing) surrogate for the same code point.
    static public char surrogate2(int c) {
        return (char)(((c - 0x10000) & 0x3FF) | 0xDC00);
    }

    static public void main(String[] args) throws java.io.UnsupportedEncodingException {
        int ch = 0x10300;
        char[] v = new char[2];
        v[0] = surrogate1(ch);
        v[1] = surrogate2(ch);
        String str = new String(v);
        // The conversion itself fails; the result is not needed.
        str.getBytes("UTF-8");
    }
}

when compiled and executed, throws an exception:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
   at gnu.gcj.convert.Output_UTF8.write(char[], int, int) (/home/jjc/gcc/lib/libgcj.so.4.0.0)
   at gnu.gcj.convert.UnicodeToBytes.write(java.lang.String, int, int, char[]) (/home/jjc/gcc/lib/libgcj.so.4.0.0)
   at java.lang.String.getBytes(java.lang.String) (/home/jjc/gcc/lib/libgcj.so.4.0.0)
   at Bug.main(java.lang.String[]) (Unknown Source)
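
For comparison, the bytes a correct conversion should produce can be worked
out by hand. The following is a minimal sketch that does not use libgcj's
converter; the class and method names (ExpectedUtf8, encodePair, hex) are
made up for illustration. It recombines the two surrogates and emits the
four-byte UTF-8 form, which for U+10300 is F0 90 8C 80.

```java
class ExpectedUtf8 {
    // Hypothetical helper: recombine a surrogate pair and encode the
    // resulting code point as four UTF-8 bytes.
    static byte[] encodePair(char hi, char lo) {
        int value = (hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000;
        return new byte[] {
            (byte) (0xF0 | (value >> 18)),
            (byte) (0x80 | ((value >> 12) & 0x3F)),
            (byte) (0x80 | ((value >> 6) & 0x3F)),
            (byte) (0x80 | (value & 0x3F))
        };
    }

    // Hypothetical helper: format bytes as space-separated lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02x", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+10300 splits into the pair D800 DF00.
        System.out.println(hex(encodePair('\uD800', '\uDF00'))); // prints "f0 90 8c 80"
    }
}
```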
 
>How-To-Repeat:

>Fix:
I haven't tested this, but I suspect the following should fix it:

*** gcc/libjava/gnu/gcj/convert/Output_UTF8.java~	2000-08-09 00:35:32.000000000 +0700
--- gcc/libjava/gnu/gcj/convert/Output_UTF8.java	2003-02-22 16:38:52.000000000 +0700
***************
*** 104,109 ****
--- 104,110 ----
  	      {
  		value = (hi_part - 0xD800) * 0x400 + (ch - 0xDC00) + 0x10000;
  		buf[count++] = (byte) (0xF0 | (value >> 18));
+ 		avail--;
  		bytes_todo = 3;
  		hi_part = 0;
  	      }
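
Why the missing decrement matters can be illustrated with a simplified
model of such a converter. This is a sketch, not the libgcj source: the
class Utf8OutputSketch and its field names are assumptions, and it expects
well-formed input (unpaired surrogates are not handled). The point is the
invariant that every byte stored in buf is matched by a decrement of avail;
if the lead byte of a four-byte sequence is written without one, the loop
believes one more byte of room exists than really does. With a two-byte
buffer (a size assumed here because it matches the index 2 in the trace),
that would put the third byte at buf[2].

```java
class Utf8OutputSketch {
    final byte[] buf;
    int count;      // bytes already stored in buf
    int value;      // code point whose continuation bytes are pending
    int bytesTodo;  // continuation bytes still to be written for value
    char hiPart;    // pending high surrogate, or 0 if none

    Utf8OutputSketch(int size) { buf = new byte[size]; }

    // Encodes as much of in[pos .. pos+len) as fits; returns chars consumed.
    int write(char[] in, int pos, int len) {
        int start = pos;
        int end = pos + len;
        int avail = buf.length - count;
        while (avail > 0 && (bytesTodo > 0 || pos < end)) {
            if (bytesTodo > 0) {
                // Flush one pending continuation byte.
                bytesTodo--;
                buf[count++] = (byte) (0x80 | ((value >> (bytesTodo * 6)) & 0x3F));
                avail--;
                continue;
            }
            char ch = in[pos++];
            if (hiPart != 0 && ch >= 0xDC00 && ch <= 0xDFFF) {
                value = (hiPart - 0xD800) * 0x400 + (ch - 0xDC00) + 0x10000;
                buf[count++] = (byte) (0xF0 | (value >> 18));
                avail--;  // the decrement corresponding to the proposed fix
                bytesTodo = 3;
                hiPart = 0;
            } else if (ch >= 0xD800 && ch <= 0xDBFF) {
                hiPart = ch;  // nothing written until the low surrogate arrives
            } else if (ch < 0x80) {
                buf[count++] = (byte) ch;
                avail--;
            } else if (ch < 0x800) {
                value = ch;
                buf[count++] = (byte) (0xC0 | (ch >> 6));
                avail--;
                bytesTodo = 1;
            } else {
                value = ch;
                buf[count++] = (byte) (0xE0 | (ch >> 12));
                avail--;
                bytesTodo = 2;
            }
        }
        return pos - start;
    }

    public static void main(String[] args) {
        Utf8OutputSketch out = new Utf8OutputSketch(2);
        int consumed = out.write(new char[] {'\uD800', '\uDF00'}, 0, 2);
        // The buffer fills without overrunning; two bytes remain pending.
        System.out.println(consumed + " chars consumed, " + out.count
                           + " bytes written, " + out.bytesTodo + " pending");
    }
}
```

In this model the caller would drain or enlarge the buffer and call write
again to flush the two pending continuation bytes.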


>Release-Note:
>Audit-Trail:
>Unformatted:

