public inbox for java-prs@sourceware.org
* [Bug java/14687] New: Incorrect UTF-8 byte->String conversion
@ 2004-03-23  4:44 joeclark at iastate dot edu
  2004-03-23 17:28 ` [Bug java/14687] " tromey at gcc dot gnu dot org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: joeclark at iastate dot edu @ 2004-03-23  4:44 UTC (permalink / raw)
  To: java-prs

The following code snippet, given a valid sequence of ASCII bytes, works as
expected on Sun's JDK. With gcj 3.3.1 (Cygwin/mingw special), however, the
resulting program prints only "????" as the result string.  The same code using
the "iso-8859-1" encoding works on both platforms.

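try {
  String byteStr = new String(bytes, "UTF-8");
  System.out.println("byteStr = " + byteStr);
} catch (java.io.UnsupportedEncodingException e) {
  // Thrown only if the runtime does not know the named charset.
  e.printStackTrace();
}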

This problem was reported on the gcc java mailing list here:
http://gcc.gnu.org/ml/java/2003-09/msg00116.html.  However, apparently the
problem still exists, and the thread didn't end with any plans to fix the problem.

-- 
           Summary: Incorrect UTF-8 byte->String conversion
           Product: gcc
           Version: 3.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: java
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: joeclark at iastate dot edu
                CC: gcc-bugs at gcc dot gnu dot org,java-prs at gcc dot gnu
                    dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14687



* [Bug java/14687] Incorrect UTF-8 byte->String conversion
  2004-03-23  4:44 [Bug java/14687] New: Incorrect UTF-8 byte->String conversion joeclark at iastate dot edu
@ 2004-03-23 17:28 ` tromey at gcc dot gnu dot org
  2004-06-26 20:39 ` pinskia at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: tromey at gcc dot gnu dot org @ 2004-03-23 17:28 UTC (permalink / raw)
  To: java-prs


------- Additional Comments From tromey at gcc dot gnu dot org  2004-03-23 17:28 -------
What is the sequence of bytes you feed in?

Our ASCII converter only handles bytes from 0x0 to 0x7f.
If you want to use bytes outside this range, you must
use the proper encoding.  I suspect this is why 8859-1
works for you.
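
One way to answer the question above is to check whether every input byte
falls in the 0x0 to 0x7f range before decoding. The following is only a
minimal sketch; the class name AsciiRange, the method isSevenBitAscii, and
the sample byte arrays are hypothetical and are not taken from the report.

```java
public class AsciiRange {
    // Returns true if every byte is in 0x00..0x7f, i.e. plain 7-bit ASCII.
    // Java bytes are signed, so mask with 0xff before comparing.
    static boolean isSevenBitAscii(byte[] bytes) {
        for (byte b : bytes) {
            if ((b & 0xff) > 0x7f) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] plain = { 109, 110, 111 };   // 'm', 'n', 'o' (sample data)
        byte[] highBit = { (byte) 0xe9 };   // outside the 7-bit range (sample data)
        System.out.println("plain is 7-bit ASCII: " + isSevenBitAscii(plain));
        System.out.println("highBit is 7-bit ASCII: " + isSevenBitAscii(highBit));
    }
}
```

If the check returns true for the failing input, the "wrong encoding"
explanation would not apply, since bytes in that range have the same meaning
in ASCII, ISO-8859-1 and UTF-8.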


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14687



* [Bug java/14687] Incorrect UTF-8 byte->String conversion
  2004-03-23  4:44 [Bug java/14687] New: Incorrect UTF-8 byte->String conversion joeclark at iastate dot edu
  2004-03-23 17:28 ` [Bug java/14687] " tromey at gcc dot gnu dot org
@ 2004-06-26 20:39 ` pinskia at gcc dot gnu dot org
  2004-06-26 22:10 ` joeclark at iastate dot edu
  2004-06-27  0:43 ` [Bug libgcj/14687] [win32] " pinskia at gcc dot gnu dot org
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-06-26 20:39 UTC (permalink / raw)
  To: java-prs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-06-26 20:35 -------
Not a bug; Sun's JDK is just less strict about what it accepts for ASCII.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |INVALID


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14687



* [Bug java/14687] Incorrect UTF-8 byte->String conversion
  2004-03-23  4:44 [Bug java/14687] New: Incorrect UTF-8 byte->String conversion joeclark at iastate dot edu
  2004-03-23 17:28 ` [Bug java/14687] " tromey at gcc dot gnu dot org
  2004-06-26 20:39 ` pinskia at gcc dot gnu dot org
@ 2004-06-26 22:10 ` joeclark at iastate dot edu
  2004-06-27  0:43 ` [Bug libgcj/14687] [win32] " pinskia at gcc dot gnu dot org
  3 siblings, 0 replies; 5+ messages in thread
From: joeclark at iastate dot edu @ 2004-06-26 22:10 UTC (permalink / raw)
  To: java-prs


------- Additional Comments From joeclark at iastate dot edu  2004-06-26 21:56 -------
Okay, I've attached a java file that illustrates the bug.  For me, the UTF
character set does *NOT* work for *ANY* ASCII characters.  Perhaps I'm entering
the bytes wrong or something; if this is the case, the attached code should make
my error clear.

When I run this with Sun's JDK, I get the same result from both encodings for
all 127 values.  With gcj, *every* UTF-8 conversion results in a "?" string;
here's a quick snippet:

b 109 --> utf ? & iso m
b 110 --> utf ? & iso n
b 111 --> utf ? & iso o
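
The attached file is not part of this archive. A program that prints lines in
the format above might look roughly like the sketch below; it is an assumed
reconstruction, not the reporter's attachment, and the class name Utf8Check
and the helper decode are hypothetical.

```java
import java.io.UnsupportedEncodingException;

public class Utf8Check {
    // Decodes a single byte under the named charset using the same
    // String(byte[], String) constructor as the original snippet.
    static String decode(int b, String charset) {
        try {
            return new String(new byte[] { (byte) b }, charset);
        } catch (UnsupportedEncodingException e) {
            // The runtime does not know this charset name.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only the three byte values quoted in the report are shown here.
        for (int b = 109; b <= 111; b++) {
            System.out.println("b " + b + " --> utf " + decode(b, "UTF-8")
                    + " & iso " + decode(b, "ISO-8859-1"));
        }
    }
}
```

On a runtime with a working UTF-8 converter the two columns should match for
these bytes, since 109 to 111 are 7-bit ASCII and decode identically under
both charsets.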

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14687



* [Bug libgcj/14687] [win32] Incorrect UTF-8 byte->String conversion
  2004-03-23  4:44 [Bug java/14687] New: Incorrect UTF-8 byte->String conversion joeclark at iastate dot edu
                   ` (2 preceding siblings ...)
  2004-06-26 22:10 ` joeclark at iastate dot edu
@ 2004-06-27  0:43 ` pinskia at gcc dot gnu dot org
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-06-27  0:43 UTC (permalink / raw)
  To: java-prs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-06-26 22:10 -------
Works for me on i686-pc-linux-gnu; maybe this is a Windows bug.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|java                        |libgcj
            Summary|Incorrect UTF-8 byte->String|[win32] Incorrect UTF-8
                   |conversion                  |byte->String conversion


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14687

