public inbox for gcc-prs@sourceware.org
* java/1356: gcj mangles composed characters
@ 2000-12-20 12:25 doko
From: doko @ 2000-12-20 12:25 UTC
To: java-gnats
>Number: 1356
>Category: java
>Synopsis: gcj mangles composed characters
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: tromey
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Dec 20 12:19:17 PST 2000
>Closed-Date: Thu Sep 14 11:36:15 PDT 2000
>Last-Modified: Thu Sep 14 11:40:00 PDT 2000
>Originator: Stephane Bortzmeyer <bortz@pasteur.fr>
>Release: gcj/libgcj 2.95
>Organization:
>Environment:
Debian Linux/GNU unstable ix86
>Description:
[Please see http://www.debian.org/Bugs/db/42/42895.html for the original report]
Java is supposed to be Unicode for its character strings. A program such
as this one:
public class Hello {
    public static void main ( String []arguments)
    {
        System.out.println ("Liberté, égalité, fraternité !");
    }
}
works fine (both with JDK-java and kaffe) when compiled with
JDK-javac or jikes, but prints strange stuff instead of my letters
when compiled with gcj (both with JDK-java and kaffe):
ishtar:~/tmp/Java> jikes Hello.java
ishtar:~/tmp/Java> kaffe Hello
Liberté, égalité, fraternité !
ishtar:~/tmp/Java> gcj -C Hello.java
ishtar:~/tmp/Java> kaffe Hello
Libertþ, þgalitþ, fraternitþ !
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
Formerly PR gcj/33
From: Tom Tromey <tromey@cygnus.com>
To: Java Gnats Server <java-gnats@sourceware.cygnus.com>
Cc: Stephane Bortzmeyer <bortz@pasteur.fr>
Subject: gcj/33
Date: 02 Sep 1999 20:54:20 -0600
I looked at this problem.
gcj assumes that the input file is itself Utf-8 encoded. Is this the
case in your example? My guess is that the other compilers assume
that the input file is encoded according to your locale's charset, and
that your input file is Latin-1. To gcj, a Latin-1 file looks like a
file with encoding errors.
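To make that failure mode concrete, here is a minimal Java sketch of what happens when bytes saved as Latin-1 are read back as UTF-8. It uses a modern JDK's decoder, not gcj 2.95's code, and the class and method names are hypothetical. A strict decoder rejects the bytes outright; the lenient `new String(bytes, UTF_8)` constructor substitutes U+FFFD for each bad byte, which differs from the `þ` that gcj's output showed above. The text is written with `\u00e9` escapes so the sketch itself does not depend on the compiler's source encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Latin1AsUtf8 {
    // Encode text the way an editor in a Latin-1 locale would save it.
    static byte[] saveAsLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Lenient decoding: each malformed sequence becomes U+FFFD.
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Strict decoding: report malformed input instead of replacing it.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Escapes keep this file independent of the compiler's source encoding.
        String text = "Libert\u00e9, \u00e9galit\u00e9, fraternit\u00e9 !";
        byte[] latin1 = saveAsLatin1(text);
        System.out.println(isValidUtf8(latin1));                                // false
        System.out.println(isValidUtf8(text.getBytes(StandardCharsets.UTF_8))); // true
        System.out.println(decodeLenient(latin1).equals(text));                 // false
    }
}
```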
As a workaround you can convert your program to Utf-8 using GNU recode
(or iconv if you have it).
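For readers without recode or iconv, the same conversion can be sketched in Java as decode-then-re-encode. This is only an illustration of roughly what `recode Latin1..UTF-8 FILE` does (rewrite the file in place), not recode's implementation, and `RecodeToUtf8` and its methods are hypothetical. Decoding as ISO-8859-1 cannot fail, because every byte value maps to a character, so no error handling is needed on that side.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RecodeToUtf8 {
    // Decode with the source charset, then re-encode with the target one.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    // Rewrite a file in place, roughly like `recode Latin1..UTF-8 FILE`.
    static void recodeFile(Path file, Charset from, Charset to) throws IOException {
        byte[] original = Files.readAllBytes(file);
        Files.write(file, transcode(original, from, to));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RecodeToUtf8 FILE");
            return;
        }
        recodeFile(Paths.get(args[0]), StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
    }
}
```

Usage: `java RecodeToUtf8 Hello.java`, after which `gcj -C Hello.java` should see valid UTF-8.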
In the long term I agree that gcj should read the file using the
locale (possibly augmented with a new flag to indicate the encoding).
For instance, we could do this quite easily using libunicode (Java
hackers, contact me for details).
Tom
From: Stephane Bortzmeyer <bortzmeyer@pasteur.fr>
To: tromey@cygnus.com
Cc: Java Gnats Server <java-gnats@sourceware.cygnus.com>,
bortzmeyer@pasteur.fr, 42895@bugs.debian.org
Subject: Re: gcj/33
Date: Fri, 03 Sep 1999 09:58:37 +0200
On Thursday 2 September 1999, at 20 h 54, the keyboard of Tom Tromey
<tromey@cygnus.com> wrote:
> I looked at this problem.
[BTW, where can I read the PR on gcj? I find nothing on
http://sourceware.cygnus.com .]
> gcj assumes that the input file is itself Utf-8 encoded. Is this the
> case in your example?
No, they were in ISO-8859-1 (Latin 1).
> My guess is that the other compilers assume
> that the input file is encoded according to your locale's charset, and
Hmmm, this is certainly specified in the Java Language Definition, no ? If so,
this is just a matter of finding who is right.
> As a workaround you can convert your program to Utf-8 using GNU recode
The workaround works:
ishtar:~/tmp/Java> recode Latin1..UTF-8 Hello.java
ishtar:~/tmp/Java> more Hello.java
public class Hello {
    public static void main ( String []arguments)
    {
        System.out.println ("Liberté, égalité, fraternité !");
    }
}
ishtar:~/tmp/Java> gcj -C Hello.java
ishtar:~/tmp/Java> kaffe Hello
Liberté, égalité, fraternité !
> In the long term I agree that gcj should read the file using the
> locale (possibly augmented with a new flag to indicate the encoding).
Well, the most important thing for me is that all Java compilers behave the same way and follow the Java Language Definition.
Apart from that, practically speaking, I would say that no one can type UTF-8 on her keyboard, while many people can enter Latin-*.
From: Jason Molenda <crash@dollarCOMPANYNAME.com>
To: Stephane Bortzmeyer <bortzmeyer@pasteur.fr>
Cc: java-gnats@sourceware.cygnus.com
Subject: Re: gcj/33
Date: Fri, 3 Sep 1999 01:57:51 -0700
> [BTW, where can I read the PR on gcj? I find nothing on
> http://sourceware.cygnus.com .]
It looks like there isn't a link to it on the java web page. Look at
http://sourceware.cygnus.com/cgi-bin/gnatsweb.pl?database=java&user=guest&password=guest&cmd=login
and bring up PR # 33.
Jason
From: Tom Tromey <tromey@cygnus.com>
To: Stephane Bortzmeyer <bortzmeyer@pasteur.fr>
Cc: tromey@cygnus.com, Java Gnats Server <java-gnats@sourceware.cygnus.com>,
42895@bugs.debian.org
Subject: Re: gcj/33
Date: Fri, 3 Sep 1999 10:02:05 -0700
Stephane> [BTW, where can I read the PR on gcj? I find nothing on
Stephane> http://sourceware.cygnus.com .]
Go to http://sourceware.cygnus.com/java and follow the link to the
Gnats database.
Stephane> Hmmm, this is certainly specified in the Java Language
Stephane> Definition, no ? If so, this is just a matter of finding who
Stephane> is right.
I doubt this is specified, though I can't look right now (I don't know
where my copy of the JLS is).
Stephane> Apart from that, practically speaking, I would say that
Stephane> no one can type UTF-8 on her keyboard, while many people can
Stephane> enter Latin-*.
This is a file encoding issue, not an input issue.
Still, I agree -- I just doubt anybody has time to implement this
right now.
Tom
From: Alexandre Petit-Bianco <apbianco@cygnus.com>
To: java-gnats@sourceware.cygnus.com
Cc:
Subject: Re: gcj/33
Date: Fri, 3 Sep 1999 10:48:38 -0700
Stephane Bortzmeyer writes:
>> My guess is that the other compilers assume that the input file is
>> encoded according to your locale's charset, and
> Hmmm, this is certainly specified in the Java Language Definition,
> no ? If so, this is just a matter of finding who is right.
The JLS says that Java programs must be written in Unicode, but also
defines Unicode escape sequences so that any Unicode character can be
expressed using only ASCII characters
( http://java.sun.com/docs/books/jls/html/3.doc.html#95413 ). That
in effect defines the minimum encoding one has to support (plain
"printable" ASCII).
I've never found any spec on how the locale should be consulted to
interpret the input stream. We happen to try to read UTF-8 for
character values greater than or equal to 128.
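Why a Latin-1 `é` trips that UTF-8 reader can be worked out at the byte level. In ISO-8859-1, `é` is the single byte 0xE9 (binary 1110 1001), whose `1110` prefix announces a three-byte UTF-8 sequence; the next byte of `Liberté,` is `,` (0x2C, binary 0010 1100), which lacks the `10` continuation prefix, so the sequence is malformed. Correctly UTF-8 encoded, U+00E9 would be the two bytes 0xC3 0xA9. A simplified Java sketch of those two checks follows; the class and method names are hypothetical.

```java
public class Utf8LeadByte {
    // Length of the UTF-8 sequence that lead byte b introduces, or -1 if b
    // cannot start a sequence. Simplified: it does not reject the lead bytes
    // 0xC0, 0xC1 and 0xF5 to 0xF7 that RFC 3629 forbids.
    static int sequenceLength(int b) {
        b &= 0xFF;
        if (b < 0x80) return 1;           // 0xxxxxxx: plain ASCII
        if ((b & 0xE0) == 0xC0) return 2; // 110xxxxx
        if ((b & 0xF0) == 0xE0) return 3; // 1110xxxx
        if ((b & 0xF8) == 0xF0) return 4; // 11110xxx
        return -1;                        // 10xxxxxx (continuation) or 0xF8 to 0xFF
    }

    // Continuation bytes have the bit pattern 10xxxxxx.
    static boolean isContinuation(int b) {
        return (b & 0xC0) == 0x80;
    }

    public static void main(String[] args) {
        int eAcuteLatin1 = 0xE9; // U+00E9 (e acute) as saved in ISO-8859-1
        int comma = 0x2C;        // the ',' that follows it in the Latin-1 bytes
        System.out.println(sequenceLength(eAcuteLatin1)); // 3
        System.out.println(isContinuation(comma));        // false
    }
}
```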
> > As a workaround you can convert your program to Utf-8 using GNU recode
Or express `é' as the unicode escape sequence `\u00e9'.
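Escaping by hand gets tedious for a whole file; the JDK's `native2ascii` tool automated it at the time. A minimal Java sketch of the same idea, assuming the source has already been decoded correctly into a `String`: every UTF-16 code unit above 0x7F is replaced by a six-character backslash-u escape, which the compiler translates back before tokenizing, so escaping inside comments or string literals is harmless. The class name is hypothetical.

```java
public class UnicodeEscaper {
    // Replace each UTF-16 code unit above 0x7F with a backslash-u escape.
    // Characters outside the BMP become two escapes (a surrogate pair),
    // which is what the JLS expects, since its escapes name UTF-16 units.
    static String escapeNonAscii(String source) {
        StringBuilder out = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "System.out.println (\"Libert\u00e9, \u00e9galit\u00e9, fraternit\u00e9 !\");";
        System.out.println(escapeNonAscii(line));
    }
}
```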
./A
From: Stephane Bortzmeyer <bortzmeyer@pasteur.fr>
To: Tom Tromey <tromey@cygnus.com>
Cc: Java Gnats Server <java-gnats@sourceware.cygnus.com>,
bortzmeyer@pasteur.fr
Subject: Re: gcj/33
Date: Mon, 06 Sep 1999 10:38:39 +0200
On Friday 3 September 1999, at 10 h 2, the keyboard of Tom Tromey
<tromey@cygnus.com> wrote:
> Go to http://sourceware.cygnus.com/java and follow the link to the
> Gnats database.
There is none.
State-Changed-From-To: open->analyzed
State-Changed-By: tromey
State-Changed-When: Mon Mar 6 13:41:29 2000
State-Changed-Why:
I wrote a patch for this. The patch isn't perfect
(the encoding doesn't default to the current encoding
from your locale; I don't know how to find that information)
but it does seem to work. It is pending approval:
http://gcc.gnu.org/ml/gcc-patches/2000-03/msg00190.html
From: tromey@cygnus.com
To: bortz@pasteur.fr, 42895@bugs.debian.org, apbianco@cygnus.com,
doko@debian.org, java-gnats@sourceware.cygnus.com
Cc:
Subject: Re: gcj/33
Date: 6 Mar 2000 21:41:29 -0000
Synopsis: gcj mangles composed characters
http://sourceware.cygnus.com/cgi-bin/gnatsweb.pl?cmd=view&pr=33&database=java
Responsible-Changed-From-To: apbianco->tromey
Responsible-Changed-By: tromey
Responsible-Changed-When: Sat Jun 24 09:36:38 2000
Responsible-Changed-Why:
This is actually mine -- I have an unfinished patch
to fix it.
From: tromey@cygnus.com
To: bortz@pasteur.fr, 42895@bugs.debian.org, apbianco@cygnus.com,
doko@debian.org, java-gnats@sourceware.cygnus.com, tromey@cygnus.com
Cc:
Subject: Re: gcj/33
Date: 24 Jun 2000 16:36:38 -0000
Synopsis: gcj mangles composed characters
http://sourceware.cygnus.com/cgi-bin/gnatsweb.pl?cmd=view&pr=33&database=java
State-Changed-From-To: analyzed->feedback
State-Changed-By: tromey
State-Changed-When: Tue Sep 12 15:13:20 2000
State-Changed-Why:
I'm finally checking in my patch to fix this problem.
My patch changes gcj to use the current locale's encoding
by default. If it can't find the locale's encoding it
assumes UTF-8. The patch also adds a `--encoding' switch
to gcj so that the default encoding can be changed from
the command line.
If you can try this, please do.
If not, tell me and I will simply close the PR.
The patch only works on systems with a working iconv.
I don't currently intend to make it work elsewhere
(though eventually I may by using libiconv).
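The default-selection policy described here (explicit switch first, then the locale's encoding, then UTF-8) can be sketched in Java. This is a hypothetical illustration of the policy, not gcj's actual C implementation: the class and method names are made up, and the `--encoding=NAME` spelling is an assumption. For the locale's codeset the demo reads the `native.encoding` system property, which only exists on JDK 17 and later; on older JDKs it is null and the sketch falls back to UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

public class SourceEncodingPolicy {
    // Assumed spelling of the switch; treat it as illustrative.
    static final String FLAG = "--encoding=";

    // Returns the charset for a name, or null if the JDK does not know it.
    static Charset lookup(String name) {
        if (name == null || name.isEmpty()) {
            return null;
        }
        try {
            return Charset.isSupported(name) ? Charset.forName(name) : null;
        } catch (IllegalCharsetNameException e) {
            return null;
        }
    }

    // Hypothetical policy: explicit switch, then locale codeset, then UTF-8.
    static Charset choose(String[] args, String localeCodeset) {
        for (String arg : args) {
            if (arg.startsWith(FLAG)) {
                String name = arg.substring(FLAG.length());
                Charset explicit = lookup(name);
                if (explicit == null) {
                    throw new IllegalArgumentException("unknown encoding: " + name);
                }
                return explicit;
            }
        }
        Charset fromLocale = lookup(localeCodeset);
        return fromLocale != null ? fromLocale : StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        // native.encoding reflects the locale on JDK 17+; it is null on older JDKs.
        String localeCodeset = System.getProperty("native.encoding");
        System.out.println("reading sources as " + choose(args, localeCodeset));
    }
}
```

An unknown name given explicitly is treated as an error here, while an unknown locale codeset silently falls back to UTF-8; that split is a design choice of the sketch.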
From: tromey@cygnus.com
To: bortz@pasteur.fr, 42895@bugs.debian.org, doko@debian.org,
java-gnats@sourceware.cygnus.com, tromey@cygnus.com
Cc:
Subject: Re: gcj/33
Date: 12 Sep 2000 22:13:20 -0000
Synopsis: gcj mangles composed characters
http://sources.redhat.com/cgi-bin/gnatsweb.pl?cmd=view&pr=33&database=java
State-Changed-From-To: feedback->closed
State-Changed-By: tromey
State-Changed-When: Thu Sep 14 11:36:15 2000
State-Changed-Why:
Reporter can't verify but I know it is fixed.
From: tromey@cygnus.com
To: bortz@pasteur.fr, 42895@bugs.debian.org, doko@debian.org,
java-gnats@sourceware.cygnus.com, tromey@cygnus.com
Cc:
Subject: Re: gcj/33
Date: 14 Sep 2000 18:36:15 -0000
Synopsis: gcj mangles composed characters
http://sources.redhat.com/cgi-bin/gnatsweb.pl?cmd=view&pr=33&database=java
>Unformatted: