public inbox for java@gcc.gnu.org
* outputting iso-8859-1 chars
@ 2002-04-24 15:56 Morten Poulsen
  2002-04-24 22:48 ` Tom Tromey
  0 siblings, 1 reply; 7+ messages in thread
From: Morten Poulsen @ 2002-04-24 15:56 UTC (permalink / raw)
  To: java

Hi,

I want to output a char (e.g. the Danish å, 229 in ISO-8859-1). It works
just fine when the class is compiled with javac or gcj
--encoding=ISO_8859-1 -C, and executed with a normal JVM. The class is
this:

class Hello {
    public final static void main(String[] args) {
        System.out.println("xxxåxxx");
    }
}

However, if I compile it to a native binary, it outputs a question mark
where the å should have been.

mortenp@marvin:/tmp$ gcj --encoding=iso-8859-1 --main=Hello Hello.java
mortenp@marvin:/tmp$ ./a.out 
xxx?xxx

The closest (I guess) hint I got when searching Google was a patch at
http://gcc.gnu.org/ml/java-patches/2000-q4/msg00077.html which has the
line

+	buf[count++] = (byte) ((c > 0xff) ? '?' : c);

in "public class Output_8859_1 extends UnicodeToBytes". It outputs a
question mark if the character is out of range - but å shouldn't be out
of range.
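
For illustration, here is a minimal sketch of what a converter built around that one line would do. Only the assignment inside the loop comes from the patch; the class name `Latin1Sketch`, the method `toLatin1` and the loop around it are hypothetical and are not the libgcj source.

```java
class Latin1Sketch {
    // Narrow each UTF-16 code unit to one byte. Anything above 0xff has
    // no ISO-8859-1 equivalent and is replaced by '?'.
    static byte[] toLatin1(String s) {
        byte[] buf = new byte[s.length()];
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // This is the line quoted from the patch.
            buf[count++] = (byte) ((c > 0xff) ? '?' : c);
        }
        return buf;
    }

    public static void main(String[] args) {
        // \u00e5 is å (in range), \u20ac is the euro sign (out of range).
        byte[] out = toLatin1("xxx\u00e5xxx\u20ac");
        for (byte b : out) {
            System.out.print((b & 0xff) + " ");
        }
        System.out.println();
    }
}
```

With this sketch å comes out as byte 229 and only the euro sign becomes 63 (`?`), which is consistent with the observation that å is not out of range for this converter.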

I have compiled the class and looked at the string in the assembler
code. It looks Unicode-ish?

mortenp@marvin:/tmp$ gcj --encoding=iso-8859-1 -S Hello.java            
mortenp@marvin:/tmp$ fgrep xxx Hello.s 
        .ascii  "xxx\303\245xxx"

I am using gcj 3.0.4 from Debian.

Isn't it possible to use ISO-8859-1 characters in strings, when using
gcj, or am I doing something wrong?

Thanks,
Morten

* Re: outputting iso-8859-1 chars
  2002-04-24 15:56 outputting iso-8859-1 chars Morten Poulsen
@ 2002-04-24 22:48 ` Tom Tromey
  2002-04-25  2:00   ` Morten Poulsen
  0 siblings, 1 reply; 7+ messages in thread
From: Tom Tromey @ 2002-04-24 22:48 UTC (permalink / raw)
  To: Morten Poulsen; +Cc: java

>>>>> "Morten" == Morten Poulsen <morten@afdelingp.dk> writes:

Morten> I want to output a char (e.g. the Danish å, 229 in
Morten> ISO-8859-1). It works just fine when the class is compiled
Morten> with javac or gcj

You don't say what platform you're on.  I assume you're on Linux.

On Linux the Sun JVM assumes that the C locale uses ISO-8859-1, when
in fact it uses ASCII.  libgcj respects this difference and outputs
just ASCII, meaning that any character > 0x7f is printed as `?'.

This is sort of a pedantic difference, I guess, but I think it is the
cause of your problem.
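
The effect of an ASCII-only output encoding can be reproduced on any JVM. The following is a minimal sketch, assuming a JDK that provides `java.nio.charset.StandardCharsets`; the class name `AsciiFallback` and the method `throughAscii` are made up for the example and have nothing to do with libgcj internals.

```java
import java.nio.charset.StandardCharsets;

class AsciiFallback {
    // Encode to US-ASCII and decode again. String.getBytes(Charset)
    // substitutes the charset's replacement byte for characters it
    // cannot represent.
    static String throughAscii(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // \u00e5 is å; the escape keeps the source file pure ASCII.
        System.out.println(throughAscii("xxx\u00e5xxx"));
    }
}
```

This prints `xxx?xxx`, the same output as the native binary in the original report.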

Morten> mortenp@marvin:/tmp$ gcj --encoding=iso-8859-1 --main=Hello Hello.java

FYI, `gcj --encoding' tells gcj the encoding of your .java file.  It
doesn't affect the runtime behavior of your program (well, it can,
since a given sequence of bytes in the input file can have a different
meaning).
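
To make the parenthetical concrete, here is a small sketch of how the same two bytes turn into different strings depending on the encoding used to read them. It assumes a JDK with `StandardCharsets`; the class name `SourceEncoding` and the helper `codePoints` are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SourceEncoding {
    // Decode raw bytes with the given charset and list the resulting
    // characters as hex code points, to stay independent of the terminal.
    static String codePoints(byte[] raw, Charset cs) {
        String s = new String(raw, cs);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0xC3, (byte) 0xA5 };
        // Read as UTF-8, the two bytes are the single character å.
        System.out.println(codePoints(raw, StandardCharsets.UTF_8));
        // Read as ISO-8859-1, they are two characters, Ã and ¥.
        System.out.println(codePoints(raw, StandardCharsets.ISO_8859_1));
    }
}
```

The first line printed is `U+00E5`, the second is `U+00C3 U+00A5`. A compiler told the wrong source encoding would therefore embed a different string constant, which is the sense in which the flag can change runtime behavior.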

Morten> I have compiled the class and looked at the string in the
Morten> assembler code. It looks Unicode-ish?

Yes.  Internally all string constants are represented as UTF-8.  That
is how they are written to the assembler as well.  At runtime they are
turned into UCS-2 (Java String encoding).
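
The `\303\245` in the assembler output is just the UTF-8 form of å written as octal escapes. A short sketch that prints the bytes in that notation, assuming a JDK with `StandardCharsets` (the class name `EncodingBytes` and the method `octal` are made up for this example):

```java
import java.nio.charset.StandardCharsets;

class EncodingBytes {
    // Render each byte as a three-digit octal escape, the notation
    // used in the .ascii directive.
    static String octal(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("\\%03o", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00e5"; // å
        System.out.println(octal(s.getBytes(StandardCharsets.UTF_8)));
        System.out.println(octal(s.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

The UTF-8 line prints `\303\245`, matching Hello.s, while the ISO-8859-1 line prints `\345` (decimal 229).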

Morten> Isn't it possible to use ISO-8859-1 characters in strings,
Morten> when using gcj, or am I doing something wrong?

Your problem is almost certainly on the printing end of things.  Try
setting your locale to something that uses ISO-8859-1.  Or try using
`new OutputStreamWriter (System.out, "ISO-8859-1")'
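
The OutputStreamWriter route could look roughly like this. It is a sketch only: the class name `Latin1Hello` and the helper `printLatin1` are hypothetical, and wrapping the writer in a `PrintWriter` is one choice among several.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

class Latin1Hello {
    // Write one line to the stream as ISO-8859-1, whatever the
    // locale's default encoding happens to be.
    static void printLatin1(OutputStream out, String text) throws IOException {
        PrintWriter w = new PrintWriter(new OutputStreamWriter(out, "ISO-8859-1"));
        w.println(text);
        w.flush(); // the writer buffers, so push the bytes out explicitly
    }

    public static void main(String[] args) throws IOException {
        // \u00e5 is å; the escape avoids depending on the source encoding.
        printLatin1(System.out, "xxx\u00e5xxx");
    }
}
```

Because the helper takes any `OutputStream`, the same wrapper can be placed around a file or socket stream. The terminal still has to interpret byte 0xE5 as ISO-8859-1 for the å to display correctly.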

Tom

* Re: outputting iso-8859-1 chars
  2002-04-24 22:48 ` Tom Tromey
@ 2002-04-25  2:00   ` Morten Poulsen
  2002-04-25  3:49     ` Oskar Liljeblad
  0 siblings, 1 reply; 7+ messages in thread
From: Morten Poulsen @ 2002-04-25  2:00 UTC (permalink / raw)
  To: tromey; +Cc: java

On Thu, 2002-04-25 at 01:02, Tom Tromey wrote:
> You don't say what platform you're on.  I assume you're on Linux.

Yes, Linux 2.4.14 on a PIII and a 7400.

> On Linux the Sun JVM assumes that the C locale uses ISO-8859-1, when
> in fact it uses ASCII.  libgcj respects this difference and outputs
> just ASCII, meaning that any character > 0x7f is printed as `?'.

When I set my locale to da it still outputs a '?'.

> FYI, `gcj --encoding' tells gcj the encoding of your .java file.  It
> doesn't affect the runtime behavior of your program (well, it can,
> since a given sequence of bytes in the input file can have a different
> meaning).

If I don't use it, I get
Hello.java:4: unrecognized character in input stream.
even for ISO-8859-1 characters in comments.

> Your problem is almost certainly on the printing end of things.  Try
> setting your locale to something that uses ISO-8859-1.  Or try using
> `new OutputStreamWriter (System.out, "ISO-8859-1")'

Thanks, that fixed it. I had the same problem when writing Danish
characters to a socket. Inserting the above code fixed it there too.

Thanks for the help,
Morten

-- 
Morten Poulsen <morten@afdelingp.dk>
http://www.afdelingp.dk/

* Re: outputting iso-8859-1 chars
  2002-04-25  2:00   ` Morten Poulsen
@ 2002-04-25  3:49     ` Oskar Liljeblad
  2002-04-25 20:25       ` Morten Poulsen
  0 siblings, 1 reply; 7+ messages in thread
From: Oskar Liljeblad @ 2002-04-25  3:49 UTC (permalink / raw)
  To: Morten Poulsen; +Cc: java

On Thursday, April 25, 2002 at 10:08, Morten Poulsen wrote:
> 
> > On Linux the Sun JVM assumes that the C locale uses ISO-8859-1, when
> > in fact it uses ASCII.  libgcj respects this difference and outputs
> > just ASCII, meaning that any character > 0x7f is printed as `?'.
> 
> When I set my locale to da it still outputs a '?'.

$ LC_CTYPE=C ./a.out 
xxx?xxx
$ LC_CTYPE=en_GB ./a.out 
xxxåxxx

Setting locale here fixes it...

Oskar Liljeblad (oskar@osk.mine.nu)

* Re: outputting iso-8859-1 chars
  2002-04-25  3:49     ` Oskar Liljeblad
@ 2002-04-25 20:25       ` Morten Poulsen
  2002-04-30  0:22         ` Tom Tromey
  0 siblings, 1 reply; 7+ messages in thread
From: Morten Poulsen @ 2002-04-25 20:25 UTC (permalink / raw)
  To: Oskar Liljeblad; +Cc: java

On Thu, 2002-04-25 at 12:03, Oskar Liljeblad wrote:
> $ LC_CTYPE=C ./a.out 
> xxx?xxx
> $ LC_CTYPE=en_GB ./a.out 
> xxxåxxx
> 
> Setting locale here fixes it...

Are you using a newer version than 3.0.4 ?

$ LC_CTYPE=C ./a.out 
xxx?xxx
$ LC_CTYPE=en_GB ./a.out 
xxx?xxx

Morten

-- 
Morten Poulsen <morten@afdelingp.dk>
http://www.afdelingp.dk/

* Re: outputting iso-8859-1 chars
  2002-04-25 20:25       ` Morten Poulsen
@ 2002-04-30  0:22         ` Tom Tromey
  2002-05-01  4:09           ` Morten Poulsen
  0 siblings, 1 reply; 7+ messages in thread
From: Tom Tromey @ 2002-04-30  0:22 UTC (permalink / raw)
  To: Morten Poulsen; +Cc: Oskar Liljeblad, java

>>>>> "Morten" == Morten Poulsen <morten@afdelingp.dk> writes:

Morten> Are you using a newer version than 3.0.4 ?
Morten> $ LC_CTYPE=C ./a.out 
Morten> xxx?xxx
Morten> $ LC_CTYPE=en_GB ./a.out 
Morten> xxx?xxx

I looked at the 3.0 branch sources using cvsweb.  They have enough
support in them that this should work.

The default output encoding is chosen in libjava/java/lang/natSystem.cc.
Well, it is if various things are found by configure; in your case
this almost certainly happens since (1) Linux has all the features in
question, and (2) if we don't find the features we need we default to
ISO-8859-1 (which would contradict the results you see).

So I think the question is why you aren't seeing what we'd expect.

Try compiling and running this program:

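#include <stdio.h>
#include <locale.h>
#include <langinfo.h>

int main (void)
{
  char *x;
  /* Adopt the locale named by the environment (LC_ALL, LC_CTYPE, LANG).  */
  setlocale (LC_CTYPE, "");
  /* Ask libc which character encoding that locale uses.  */
  x = nl_langinfo (CODESET);
  printf ("%s\n", x);
  return 0;
}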


I get this:

    creche. ./a
    ANSI_X3.4-1968
    creche. LC_CTYPE=en_GB ./a
    ISO-8859-1


If this program doesn't print ISO-8859-1 (or some alias) when
LC_CTYPE=en_GB, then I think the problem is in your libc.  Otherwise
maybe the problem is in libjava; you'd have to do some debugging to
figure out what is going wrong.
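
As a Java-side counterpart to the C probe above, a program can report which default encoding the runtime settled on. This is a sketch under assumptions: the class name `DefaultEncoding` and the method `describe` are made up, `Charset.defaultCharset()` needs a runtime that provides it, and whether a given libgcj release exposes its choice through the `file.encoding` property is assumed here rather than verified.

```java
import java.nio.charset.Charset;

class DefaultEncoding {
    // Report the runtime's idea of the default encoding, both as the
    // file.encoding system property and as the default Charset.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
            + " defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once with LC_CTYPE=C and once with LC_CTYPE=en_GB, next to the C program, would show whether libc and the Java runtime agree. Recent JDKs may not derive the default charset from the locale at all, so the comparison is mainly meaningful for runtimes that do.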

Tom

* Re: outputting iso-8859-1 chars
  2002-04-30  0:22         ` Tom Tromey
@ 2002-05-01  4:09           ` Morten Poulsen
  0 siblings, 0 replies; 7+ messages in thread
From: Morten Poulsen @ 2002-05-01  4:09 UTC (permalink / raw)
  To: tromey; +Cc: Oskar Liljeblad, java

On Mon, 2002-04-29 at 22:21, Tom Tromey wrote:
> If this program doesn't print ISO-8859-1 (or some alias) when
> LC_CTYPE=en_GB, then I think the problem is in your libc.  Otherwise
> maybe the problem is in libjava; you'd have to do some debugging to
> figure out what is going wrong.

Oskar Liljeblad gave me a hint about setting up locales correctly.

$ ./a.out 
ANSI_X3.4-1968
$ LC_CTYPE=en_GB ./a.out 
ISO-8859-1

$ ./hello
xxx?xxx
$ LC_CTYPE=en_GB ./hello
xxxåxxx

Thanks for the help,
Morten

-- 
Morten Poulsen <morten@afdelingp.dk>
http://www.afdelingp.dk/
