public inbox for gcc-bugs@sourceware.org
* [Bug java/14636] New: Problem with UTF-8 in IOConverter/iconv on cygwin
From: erwin at klomp dot org @ 2004-03-18 16:42 UTC
  To: gcc-bugs

I'm not sure whether to report this with cygwin or gcc, but my hunch is that the
problem is more generic than just cygwin.

I have a test class that I'll attach that shows the problem. When I try to
convert a UTF-8 byte array to a Java String, the byte order in the Java chars
is wrong. (This is on an Intel platform running MS Windows XP.)
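
The attached test class is not reproduced in this message. Purely as an
illustration of the kind of check described above, here is a minimal sketch;
the class and method names are hypothetical and are not taken from the
reporter's attachment. It decodes the two-byte UTF-8 encoding of U+00E9 and
prints the resulting char value, so a char whose two bytes had been exchanged
would show up as e900 rather than e9.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical check, not the reporter's attached test class.
public class Utf8DecodeCheck {
    // Decode UTF-8 bytes into a String via the named-charset constructor.
    static String decode(byte[] utf8) {
        try {
            return new String(utf8, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is 0xC3 0xA9 in UTF-8.
        byte[] utf8 = { (byte) 0xC3, (byte) 0xA9 };
        String s = decode(utf8);
        System.out.println("length = " + s.length());
        System.out.println("char   = " + Integer.toHexString(s.charAt(0)));
    }
}
```

With a correct converter the decoded string has length 1 and the char value
prints as e9.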

However, the field iconv_byte_swap in gnu.gcj.convert.IOConverter is true, as
the test program shows.

An additional complication is that on most platforms, iconv isn't used for
UTF-8, but on cygwin with statically linked binaries, the Input_UTF8 converter
class isn't used because the linker throws it away, so IOConverter falls back
on iconv.

I wonder if the native method gnu::gcj::convert::Input_iconv::read in
natIconv.cc does the byte swapping correctly. It reads characters from a local
variable of type jchar*, swaps the bytes, and then writes the result back
through a variable of type char*.
Isn't a char 8 bits wide and a jchar 16 bits wide?
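
To make the width concern concrete, here is a small sketch in Java rather than
the C++ of natIconv.cc. It is not the libgcj code; the class and method names
are hypothetical, and a Java byte stands in for an 8-bit C++ char while a Java
char stands in for a 16-bit jchar. It only shows what is lost if a correctly
swapped 16-bit value is stored into an 8-bit slot.

```java
// Hypothetical illustration of the width question, not the libgcj source.
public class ByteSwapWidth {
    // Exchange the high and low bytes of a 16-bit value.
    static char swap16(char c) {
        return (char) (((c & 0xff) << 8) | ((c >> 8) & 0xff));
    }

    // Swap correctly, then store into an 8-bit slot: the high byte is dropped.
    static byte swapThenStoreAs8Bit(char c) {
        return (byte) swap16(c);
    }

    public static void main(String[] args) {
        // U+20AC (euro sign) with its two bytes in the wrong order.
        char wrongOrder = (char) 0xAC20;
        System.out.println(Integer.toHexString(swap16(wrongOrder)));                     // prints 20ac
        System.out.println(Integer.toHexString(swapThenStoreAs8Bit(wrongOrder) & 0xff)); // prints ac
    }
}
```

If the real code stores each swapped value through an 8-bit pointer, only the
low byte of each character would survive, as in the second line of output.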

Also, this piece of code hasn't changed between release 3.3.1 and the HEAD.


There is a workaround: include a reference to the class that implements the UTF8
converter in Java, to force the linker to include it in the executable.

- Erwin



Full gcj -v information:


Configured with: /GCC/gcc-3.3.1-3/configure --with-gcc --with-gnu-ld --with-gnu-as --prefix=/usr --exec-prefix=/usr --sysconfdir=/etc --libdir=/usr/lib --libexecdir=/usr/sbin --mandir=/usr/share/man --infodir=/usr/share/info --enable-languages=c,ada,c++,f77,pascal,java,objc --enable-libgcj --enable-threads=posix --with-system-zlib --enable-nls --without-included-gettext --enable-interpreter --enable-sjlj-exceptions --disable-version-specific-runtime-libs --enable-shared --disable-win32-registry --enable-java-gc=boehm --disable-hash-synchronization --verbose --target=i686-pc-cygwin --host=i686-pc-cygwin --build=i686-pc-cygwin
Thread model: posix
gcc version 3.3.1 (cygming special)

-- 
           Summary: Problem with UTF-8 in IOConverter/iconv on cygwin
           Product: gcc
           Version: 3.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: java
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: erwin at klomp dot org
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: 3.3.1
  GCC host triplet: i686-pc-cygwin
GCC target triplet: i686-pc-cygwin


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14636

