From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (qmail 6239 invoked by alias); 18 Mar 2004 16:42:48 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 6231 invoked by uid 48); 18 Mar 2004 16:42:47 -0000
Date: Thu, 18 Mar 2004 16:42:00 -0000
From: "erwin at klomp dot org"
To: gcc-bugs@gcc.gnu.org
Message-ID: <20040318164228.14636.erwin@klomp.org>
Reply-To: gcc-bugzilla@gcc.gnu.org
Subject: [Bug java/14636] New: Problem with UTF-8 in IOConverter/iconv on cygwin
X-Bugzilla-Reason: CC
X-SW-Source: 2004-03/txt/msg02215.txt.bz2

I'm not sure whether to report this with cygwin or gcc, but my hunch is
that the problem is more generic than just cygwin. I have a test class
that I'll attach that shows the problem.

When I try to convert a UTF-8 byte array to a Java String, the byte order
in the Java chars is wrong. (This is on an Intel platform with MS Windows
XP.) However, the field iconv_byte_swap in gnu.gcj.convert.IOConverter is
true, as the test program shows.

An additional complication is that on most platforms iconv isn't used for
UTF-8, but on cygwin with statically linked binaries the Input_UTF8
converter class isn't used because the linker throws it away, so
IOConverter falls back on iconv.

I wonder whether the native method gnu::gcj::convert::Input_iconv::read in
natIconv.cc does the byte swapping correctly. It reads characters from a
local variable of type jchar*, swaps the bytes, and then writes them back
through a variable of type char*. Isn't a char 8 bits wide and a jchar
16 bits wide? Also, this piece of code hasn't changed between release
3.3.1 and the HEAD.

There is a workaround: include a reference to the class that implements
the UTF8 converter in Java, to force the linker to include it in the
executable.
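
The symptom described above can be illustrated with a minimal sketch. This is not the attached test class, and it assumes a current JDK (Java 7 or later for StandardCharsets) rather than gcj 3.3.1. The names Utf8SwapDemo, decodeUtf8 and swapBytes are hypothetical. It decodes a two-byte UTF-8 sequence and shows what the resulting char looks like if the two bytes of the 16-bit code unit end up swapped, which is roughly the kind of wrong value the report describes.

```java
import java.nio.charset.StandardCharsets;

public class Utf8SwapDemo {
    // Swap the two bytes of one 16-bit code unit. The swap operates on the
    // whole 16-bit value, which is the width of a Java char (a jchar in CNI).
    static char swapBytes(char c) {
        return (char) (((c & 0xFF) << 8) | ((c >>> 8) & 0xFF));
    }

    // Decode a UTF-8 byte array with the standard library decoder.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xC3 0xA9 is the UTF-8 encoding of U+00E9.
        byte[] utf8 = { (byte) 0xC3, (byte) 0xA9 };
        char decoded = decodeUtf8(utf8).charAt(0);

        // A correct decoder yields e9; a byte-swapped result would be e900.
        System.out.println("decoded: " + Integer.toHexString(decoded));
        System.out.println("swapped: " + Integer.toHexString(swapBytes(decoded)));
    }
}
```

Comparing the hex values printed by a gcj-built binary against the expected code points is one way to tell a byte-order problem apart from a plain decoding failure.
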

- Erwin

Full gcj -v information:

Configured with: /GCC/gcc-3.3.1-3/configure --with-gcc --with-gnu-ld --with-gnu-as --prefix=/usr --exec-prefix=/usr --sysconfdir=/etc --libdir=/usr/lib --libexecdir=/usr/sbin --mandir=/usr/share/man --infodir=/usr/share/info --enable-languages=c,ada,c++,f77,pascal,java,objc --enable-libgcj --enable-threads=posix --with-system-zlib --enable-nls --without-included-gettext --enable-interpreter --enable-sjlj-exceptions --disable-version-specific-runtime-libs --enable-shared --disable-win32-registry --enable-java-gc=boehm --disable-hash-synchronization --verbose --target=i686-pc-cygwin --host=i686-pc-cygwin --build=i686-pc-cygwin
Thread model: posix
gcc version 3.3.1 (cygming special)

-- 
           Summary: Problem with UTF-8 in IOConverter/iconv on cygwin
           Product: gcc
           Version: 3.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: java
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: erwin at klomp dot org
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: 3.3.1
  GCC host triplet: i686-pc-cygwin
GCC target triplet: i686-pc-cygwin


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14636