public inbox for java-prs@sourceware.org
* [Bug java/14687] New: Incorrect UTF-8 byte->String conversion
@ 2004-03-23 4:44 joeclark at iastate dot edu
2004-03-23 17:28 ` [Bug java/14687] " tromey at gcc dot gnu dot org
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: joeclark at iastate dot edu @ 2004-03-23 4:44 UTC (permalink / raw)
To: java-prs
The following code snippet, when given a valid sequence of ASCII bytes, works as
expected on Sun's JDK. However, when built with gcj 3.3.1 (Cygwin/mingw special),
the program prints only "????" as the result string. The same code using the
"iso-8859-1" encoding works on both platforms.
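try {
    String byteStr = new String(bytes, "UTF-8");
    System.out.println("byteStr = " + byteStr);
} catch (java.io.UnsupportedEncodingException e) {
    // Not expected: every Java platform is required to support UTF-8.
    e.printStackTrace();
}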
This problem was previously reported on the gcc java mailing list:
http://gcc.gnu.org/ml/java/2003-09/msg00116.html
However, the problem apparently still exists, and that thread ended without any
plan to fix it.
--
Summary: Incorrect UTF-8 byte->String conversion
Product: gcc
Version: 3.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: java
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: joeclark at iastate dot edu
CC: gcc-bugs at gcc dot gnu dot org, java-prs at gcc dot gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14687
* [Bug java/14687] Incorrect UTF-8 byte->String conversion
From: tromey at gcc dot gnu dot org @ 2004-03-23 17:28 UTC (permalink / raw)
To: java-prs
------- Additional Comments From tromey at gcc dot gnu dot org 2004-03-23 17:28 -------
What is the sequence of bytes you feed in?
Our ASCII converter only handles bytes from 0x0 to 0x7f.
If you want to use bytes outside this range, you must
use the proper encoding. I suspect this is why 8859-1
works for you.
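To answer the question about the input bytes, one quick check is whether every byte falls in the 0x0 to 0x7f range mentioned above. The following is only a sketch: the class and method names are made up for illustration and are not part of libgcj or the reporter's program. It relies on Java's `byte` being signed, so any value from 0x80 to 0xff appears as a negative number.

```java
public class AsciiRangeCheck {
    // Returns true when every byte lies in 0x00..0x7f.
    // Java bytes are signed, so 0x80..0xff show up as negative values.
    static boolean isSevenBitAscii(byte[] bytes) {
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] plain = { 109, 110, 111 };            // "mno"
        byte[] high = { 109, (byte) 0xe9, 111 };     // contains a byte above 0x7f
        System.out.println(isSevenBitAscii(plain));  // true
        System.out.println(isSevenBitAscii(high));   // false
    }
}
```

If the check returns true and the conversion still yields "?", the input bytes are not the cause.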
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14687
* [Bug java/14687] Incorrect UTF-8 byte->String conversion
From: pinskia at gcc dot gnu dot org @ 2004-06-26 20:39 UTC (permalink / raw)
To: java-prs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-06-26 20:35 -------
Not a bug; Sun's JDK is just less strict about what it accepts for ASCII.
--
          What|Removed                     |Added
----------------------------------------------------------------------------
        Status|UNCONFIRMED                 |RESOLVED
    Resolution|                            |INVALID
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14687
* [Bug java/14687] Incorrect UTF-8 byte->String conversion
From: joeclark at iastate dot edu @ 2004-06-26 22:10 UTC (permalink / raw)
To: java-prs
------- Additional Comments From joeclark at iastate dot edu 2004-06-26 21:56 -------
Okay, I've attached a Java file that illustrates the bug. For me, the UTF-8
character set does *NOT* work for *ANY* ASCII characters. Perhaps I'm entering
the bytes wrong or something; if this is the case, the attached code should make
my error clear.
When I run this with Sun's JDK, both encodings give the same character for all
127 values. With gcj, *every* UTF-8 conversion results in a "?" string; here's a
quick snippet:
b 109 --> utf ? & iso m
b 110 --> utf ? & iso n
b 111 --> utf ? & iso o
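The attachment itself is not reproduced in this thread. A minimal sketch of a test that would print lines in the same shape as the snippet above might look like the following; the class name, method name, and the choice to loop over printable ASCII only are assumptions, not the reporter's actual code.

```java
import java.io.UnsupportedEncodingException;

public class Utf8VsIso {
    // Decode one byte under both encodings and format the pair like the
    // "b 109 --> utf ? & iso m" lines quoted above.
    static String describe(int value) {
        byte[] one = { (byte) value };
        try {
            String utf = new String(one, "UTF-8");
            String iso = new String(one, "ISO-8859-1");
            return "b " + value + " --> utf " + utf + " & iso " + iso;
        } catch (UnsupportedEncodingException e) {
            // Both charsets are mandatory on every Java platform.
            throw new IllegalStateException(e.toString());
        }
    }

    public static void main(String[] args) {
        // Printable ASCII only, so the output stays readable.
        for (int value = 32; value < 127; value++) {
            System.out.println(describe(value));
        }
    }
}
```

On a runtime whose UTF-8 decoder behaves correctly, the two columns should match for every value in this range, since UTF-8 and ISO-8859-1 both encode ASCII characters as the same single byte.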
--
          What|Removed                     |Added
----------------------------------------------------------------------------
        Status|RESOLVED                    |UNCONFIRMED
    Resolution|INVALID                     |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14687
* [Bug libgcj/14687] [win32] Incorrect UTF-8 byte->String conversion
From: pinskia at gcc dot gnu dot org @ 2004-06-27 0:43 UTC (permalink / raw)
To: java-prs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-06-26 22:10 -------
Works for me on i686-pc-linux-gnu; maybe this is a Windows bug.
--
          What|Removed                     |Added
----------------------------------------------------------------------------
     Component|java                        |libgcj
       Summary|Incorrect UTF-8 byte->String|[win32] Incorrect UTF-8
              |conversion                  |byte->String conversion
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14687