* outputting iso-8859-1 chars
@ 2002-04-24 15:56 Morten Poulsen
2002-04-24 22:48 ` Tom Tromey
0 siblings, 1 reply; 7+ messages in thread
From: Morten Poulsen @ 2002-04-24 15:56 UTC (permalink / raw)
To: java
Hi,
I want to output a char (e.g. the Danish å, 229 in ISO-8859-1). It works
just fine when the class is compiled with javac or gcj
--encoding=ISO_8859-1 -C, and executed with a normal JVM. The class is
this:
class Hello {
    public final static void main(String[] args) {
        System.out.println("xxxåxxx");
    }
}
However, if I compile it to a native binary, it outputs a question mark
where the å should have been.
mortenp@marvin:/tmp$ gcj --encoding=iso-8859-1 --main=Hello Hello.java
mortenp@marvin:/tmp$ ./a.out
xxx?xxx
The closest (I guess) hint I got when searching Google was a patch at
http://gcc.gnu.org/ml/java-patches/2000-q4/msg00077.html which has the
line
+ buf[count++] = (byte) ((c > 0xff) ? '?' : c);
in "public class Output_8859_1 extends UnicodeToBytes". It outputs a
question mark if the character is out of range - but å shouldn't be out
of range.
I have compiled the class and looked at the string in the assembler
code. It looks unicode-ish?
mortenp@marvin:/tmp$ gcj --encoding=iso-8859-1 -S Hello.java
mortenp@marvin:/tmp$ fgrep xxx Hello.s
.ascii "xxx\303\245xxx"
I am using gcj 3.0.4 from Debian.
Isn't it possible to use ISO-8859-1 characters in strings, when using
gcj, or am I doing something wrong?
Thanks,
Morten
* Re: outputting iso-8859-1 chars
2002-04-24 15:56 outputting iso-8859-1 chars Morten Poulsen
@ 2002-04-24 22:48 ` Tom Tromey
2002-04-25 2:00 ` Morten Poulsen
0 siblings, 1 reply; 7+ messages in thread
From: Tom Tromey @ 2002-04-24 22:48 UTC (permalink / raw)
To: Morten Poulsen; +Cc: java
>>>>> "Morten" == Morten Poulsen <morten@afdelingp.dk> writes:
Morten> I want to output a char (eg. the Danish å, 229 in
Morten> ISO-8859-1). It works just fine when the class is compiled
Morten> with javac or gcj
You don't say what platform you're on. I assume you're on Linux.
On Linux the Sun JVM assumes that the C locale uses ISO-8859-1, when
in fact it uses ASCII. libgcj respects this difference and outputs
just ASCII, meaning that any character > 0x7f is printed as `?'.
This is sort of a pedantic difference, I guess, but I think it is the
cause of your problem.
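
To illustrate the substitution (a sketch in plain Java on a modern JDK, not
libgcj's own converter code): encoding through a US-ASCII charset replaces
anything above 0x7f with `?'. The class name AsciiDemo is made up for the
example, and \u00e5 is used so that the encoding of the source file doesn't
matter.

```java
import java.nio.charset.StandardCharsets;

class AsciiDemo {
    // Encode a string as US-ASCII and decode it again. Characters the
    // charset cannot represent are replaced by its default replacement
    // byte, which is '?'.
    static String throughAscii(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(throughAscii("xxx\u00e5xxx"));
    }
}
```

This prints xxx?xxx whatever the locale is, since the charset is named
explicitly.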
Morten> mortenp@marvin:/tmp$ gcj --encoding=iso-8859-1 --main=Hello Hello.java
FYI, `gcj --encoding' tells gcj the encoding of your .java file. It
doesn't affect the runtime behavior of your program (well, it can,
since a given sequence of bytes in the input file can have a different
meaning).
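
A sketch of that parenthetical, assuming a modern JDK: the same two bytes
decode to one character or to two, depending on which encoding the reader
assumes. DecodeDemo and its method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // The two bytes that encode U+00E5 (the Danish a-ring) in UTF-8.
    static final byte[] BYTES = { (byte) 0xC3, (byte) 0xA5 };

    // Read as UTF-8, the bytes form a single character.
    static String asUtf8() {
        return new String(BYTES, StandardCharsets.UTF_8);
    }

    // Read as ISO-8859-1, every byte is a character of its own.
    static String asLatin1() {
        return new String(BYTES, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(asUtf8().length());
        System.out.println(asLatin1().length());
    }
}
```

A compiler that guesses the wrong source encoding would put the second
string into the class instead of the first.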
Morten> I have compiled the class and looked at the string in the
Morten> assembler code. It looks unicode-ish?
Yes. Internally all string constants are represented as UTF-8. That
is how they are written to the assembler as well. At runtime they are
turned into UCS-2 (Java String encoding).
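
The octal escapes in the .s file can be reproduced with a few lines of Java.
This is only a sketch of the byte arithmetic, assuming a modern JDK; it is
not how gcj itself emits the constant, and Utf8Octal is a made-up name.

```java
import java.nio.charset.StandardCharsets;

class Utf8Octal {
    // Render a string's UTF-8 bytes roughly the way an assembler .ascii
    // directive shows them: printable ASCII as-is, everything else as a
    // three-digit octal escape.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xff; // treat the byte as unsigned
            if (v >= 0x20 && v < 0x7f && v != '\\' && v != '"') {
                sb.append((char) v);
            } else {
                sb.append(String.format("\\%03o", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("xxx\u00e5xxx"));
    }
}
```

U+00E5 becomes the bytes 0xC3 0xA5 in UTF-8, which are 303 and 245 in octal,
matching the "xxx\303\245xxx" seen in Hello.s.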
Morten> Isn't it possible to use ISO-8859-1 characters in strings,
Morten> when using gcj, or am I doing something wrong?
Your problem is almost certainly on the printing end of things. Try
setting your locale to something that uses ISO-8859-1. Or try using
`new OutputStreamWriter (System.out, "ISO-8859-1")'
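
A minimal sketch of the second suggestion. The helper writeLatin1 and the
class name Latin1Hello are invented for illustration; only the
OutputStreamWriter constructor comes from the suggestion above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

class Latin1Hello {
    // Write s to out as ISO-8859-1, regardless of the default encoding
    // chosen from the locale.
    static void writeLatin1(OutputStream out, String s) throws IOException {
        Writer w = new OutputStreamWriter(out, "ISO-8859-1");
        w.write(s);
        w.flush(); // flush but do not close: out may be System.out
    }

    public static void main(String[] args) throws IOException {
        // \u00e5 keeps the source file itself pure ASCII.
        writeLatin1(System.out, "xxx\u00e5xxx\n");
    }
}
```

Because the helper takes any OutputStream, the same wrapper can be put
around other byte streams too. Whether the terminal then shows the right
glyph still depends on the terminal expecting ISO-8859-1.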
Tom
* Re: outputting iso-8859-1 chars
2002-04-24 22:48 ` Tom Tromey
@ 2002-04-25 2:00 ` Morten Poulsen
2002-04-25 3:49 ` Oskar Liljeblad
0 siblings, 1 reply; 7+ messages in thread
From: Morten Poulsen @ 2002-04-25 2:00 UTC (permalink / raw)
To: tromey; +Cc: java
On Thu, 2002-04-25 at 01:02, Tom Tromey wrote:
> You don't say what platform you're on. I assume you're on Linux.
Yes, Linux 2.4.14 on a PIII and a 7400.
> On Linux the Sun JVM assumes that the C locale uses ISO-8859-1, when
> in fact it uses ASCII. libgcj respects this difference and outputs
> just ASCII, meaning that character > 0x7f is printed as `?'.
When I set my locale to da it still outputs a '?'.
> FYI, `gcj --encoding' tells gcj the encoding of your .java file. it
> doesn't affect the runtime behavior of your program (well, it can,
> since a given sequence of bytes in the input file can have a different
> meaning).
If I don't use it, I get
Hello.java:4: unrecognized character in input stream.
even for ISO-8859-1 characters in comments.
> Your problem is almost certainly on the printing end of things. Try
> setting your locale to something that uses ISO-8859-1. Or try using
> `new OutputStreamWriter (System.out, "ISO-8859-1")'
Thanks, that fixed it. I had the same problem when writing Danish
characters to a socket. Inserting the above code fixed it there too.
Thanks for the help,
Morten
--
Morten Poulsen <morten@afdelingp.dk>
http://www.afdelingp.dk/
* Re: outputting iso-8859-1 chars
2002-04-25 2:00 ` Morten Poulsen
@ 2002-04-25 3:49 ` Oskar Liljeblad
2002-04-25 20:25 ` Morten Poulsen
0 siblings, 1 reply; 7+ messages in thread
From: Oskar Liljeblad @ 2002-04-25 3:49 UTC (permalink / raw)
To: Morten Poulsen; +Cc: java
On Thursday, April 25, 2002 at 10:08, Morten Poulsen wrote:
>
> > On Linux the Sun JVM assumes that the C locale uses ISO-8859-1, when
> > in fact it uses ASCII. libgcj respects this difference and outputs
> > just ASCII, meaning that character > 0x7f is printed as `?'.
>
> When I set my locale to da it still outputs a '?'.
$ LC_CTYPE=C ./a.out
xxx?xxx
$ LC_CTYPE=en_GB ./a.out
xxxåxxx
Setting locale here fixes it...
Oskar Liljeblad (oskar@osk.mine.nu)
* Re: outputting iso-8859-1 chars
2002-04-25 3:49 ` Oskar Liljeblad
@ 2002-04-25 20:25 ` Morten Poulsen
2002-04-30 0:22 ` Tom Tromey
0 siblings, 1 reply; 7+ messages in thread
From: Morten Poulsen @ 2002-04-25 20:25 UTC (permalink / raw)
To: Oskar Liljeblad; +Cc: java
On Thu, 2002-04-25 at 12:03, Oskar Liljeblad wrote:
> $ LC_CTYPE=C ./a.out
> xxx?xxx
> $ LC_CTYPE=en_GB ./a.out
> xxxåxxx
>
> Setting locale here fixes it...
Are you using a newer version than 3.0.4 ?
$ LC_CTYPE=C ./a.out
xxx?xxx
$ LC_CTYPE=en_GB ./a.out
xxx?xxx
Morten
--
Morten Poulsen <morten@afdelingp.dk>
http://www.afdelingp.dk/
* Re: outputting iso-8859-1 chars
2002-04-25 20:25 ` Morten Poulsen
@ 2002-04-30 0:22 ` Tom Tromey
2002-05-01 4:09 ` Morten Poulsen
0 siblings, 1 reply; 7+ messages in thread
From: Tom Tromey @ 2002-04-30 0:22 UTC (permalink / raw)
To: Morten Poulsen; +Cc: Oskar Liljeblad, java
>>>>> "Morten" == Morten Poulsen <morten@afdelingp.dk> writes:
Morten> Are you using a newer version than 3.0.4 ?
Morten> $ LC_CTYPE=C ./a.out
Morten> xxx?xxx
Morten> $ LC_CTYPE=en_GB ./a.out
Morten> xxx?xxx
I looked at the 3.0 branch sources using cvsweb. They have enough
support in them that this should work.
The default output encoding is chosen in libjava/java/lang/natSystem.cc.
Well, it is if various things are found by configure; in your case
this almost certainly happens since (1) Linux has all the features in
question, and (2) if we don't find the features we need we default to
ISO-8859-1 (which would contradict the results you see).
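
From the Java side, the encoding the runtime settled on can be printed
directly. This is a sketch for a JVM with java.nio.charset available, not a
view into natSystem.cc, and the class name DefaultEncoding is made up. Note
that newer JDKs may default to UTF-8 regardless of the locale, so the result
is only a hint.

```java
import java.nio.charset.Charset;

class DefaultEncoding {
    // Canonical name of the charset the runtime uses when no encoding
    // is given explicitly.
    static String defaultName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(defaultName());
        // The system property that traditionally carries the same choice.
        System.out.println(System.getProperty("file.encoding"));
    }
}
```

Running it once with LC_CTYPE=C and once with LC_CTYPE=en_GB shows whether
the locale setting reaches the runtime at all.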
So I think the question is why you aren't seeing what we'd expect.
Try compiling and running this program:
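#include <stdio.h>
#include <locale.h>
#include <langinfo.h>

int main (void)
{
  char *x;
  /* Adopt the locale named by the environment (LC_ALL, LC_CTYPE, LANG).  */
  setlocale (LC_CTYPE, "");
  /* Ask libc which character set that locale uses.  */
  x = nl_langinfo (CODESET);
  printf ("%s\n", x);
  return 0;
}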
I get this:
creche. ./a
ANSI_X3.4-1968
creche. LC_CTYPE=en_GB ./a
ISO-8859-1
If this program doesn't print ISO-8859-1 (or some alias) when
LC_CTYPE=en_GB, then I think the problem is in your libc. Otherwise
maybe the problem is in libjava; you'd have to do some debugging to
figure out what is going wrong.
Tom
* Re: outputting iso-8859-1 chars
2002-04-30 0:22 ` Tom Tromey
@ 2002-05-01 4:09 ` Morten Poulsen
0 siblings, 0 replies; 7+ messages in thread
From: Morten Poulsen @ 2002-05-01 4:09 UTC (permalink / raw)
To: tromey; +Cc: Oskar Liljeblad, java
On Mon, 2002-04-29 at 22:21, Tom Tromey wrote:
> If this program doesn't print ISO-8859-1 (or some alias) when
> LC_CTYPE=en_GB, then I think the problem is in your libc. Otherwise
> maybe the problem is in libjava; you'd have to do some debugging to
> figure out what is going wrong.
Oskar Liljeblad gave me a hint about setting up locales correctly.
$ ./a.out
ANSI_X3.4-1968
$ LC_CTYPE=en_GB ./a.out
ISO-8859-1
$ ./hello
xxx?xxx
$ LC_CTYPE=en_GB ./hello
xxxåxxx
Thanks for the help,
Morten
--
Morten Poulsen <morten@afdelingp.dk>
http://www.afdelingp.dk/