public inbox for java-prs@sourceware.org
* [Bug libgcj/31939] New: Command line arguments are byteswapped before being passed to the program running in custom locale.

From: serg at vostok dot net @ 2007-05-15 16:35 UTC
To: java-prs

Conversion to the UCS-2 encoding in GNU libiconv returns bytes in network order (big-endian), but gcj expects them in host order. A hack in libgcj swaps the bytes when necessary. However, command line arguments slip past the swapper (or go through it an even number of times), so the program receives command line arguments it cannot recognize.

There is a non-zero probability that this produces invalid UTF-16 strings, e.g. strings with unpaired surrogates. That, in turn, may cause segmentation faults during string operations, especially during I/O character encoding conversion.

--
Summary: Command line arguments are byteswapped before being passed to the program running in custom locale.
Product: gcc
Version: 4.1.3
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libgcj
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: serg at vostok dot net
GCC build triplet: i386-portbld-freebsd6.2
GCC host triplet: i386-portbld-freebsd6.2
GCC target triplet: i386-portbld-freebsd6.2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31939
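To make the failure mode concrete, here is a minimal C sketch (not libgcj code) that decodes big-endian UCS-2 bytes, as GNU libiconv's "UCS-2" emits them, into host-order 16-bit units, and checks a unit sequence for unpaired surrogates. The helper names `swap16`, `ucs2be_to_units` and `has_unpaired_surrogate` are hypothetical. It illustrates why one swap too many (or too few) is dangerous: any Latin-1 character from U+00D8 to U+00DF turns into a surrogate code unit once its two bytes are exchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libgcj: swap the two bytes of one
   16-bit code unit. Applying it twice is a no-op, so a string that passes
   through a swapper an even number of times is as wrong as one that
   skips it entirely. */
static uint16_t swap16(uint16_t u)
{
    return (uint16_t)((u << 8) | (u >> 8));
}

/* Decode `nunits` big-endian UCS-2 code units from `in` into host-order
   values. Building each unit arithmetically works on any host byte order. */
static void ucs2be_to_units(const unsigned char *in, size_t nunits, uint16_t *out)
{
    for (size_t i = 0; i < nunits; i++)
        out[i] = (uint16_t)((in[2 * i] << 8) | in[2 * i + 1]);
}

/* Return true if `s` contains a high surrogate not followed by a low
   surrogate, or a low surrogate not preceded by a high one. */
static bool has_unpaired_surrogate(const uint16_t *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF) {          /* high surrogate */
            if (i + 1 < n && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
                i++;                                     /* valid pair, skip it */
                continue;
            }
            return true;
        }
        if (s[i] >= 0xDC00 && s[i] <= 0xDFFF)            /* stray low half */
            return true;
    }
    return false;
}
```

For example, the big-endian bytes `00 D8` decode to U+00D8 ('Ø'); swapping that unit once more yields 0xD800, a lone high surrogate of exactly the kind the report warns about.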
* [Bug libgcj/31939] Command line arguments are byteswapped before being passed to the program running in custom locale.

From: serg at vostok dot net @ 2007-05-15 17:05 UTC
To: java-prs

------- Comment #1 from serg at vostok dot net 2007-05-15 18:05 -------
This bug is relevant only for iconv implementations whose "UCS-2" encoding uses a byte order different from the platform's native byte order, e.g. GNU libiconv on i386.

There are actually two subjects here:
1. Fix the source code to use a UCS-2* encoding with the platform's native byte order, e.g. UCS-2-INTERNAL for GNU libiconv on i386.
2. Find the bug that may leave command line arguments in the wrong byte order, and fix it.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31939
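Subject 1 amounts to asking iconv for the host's byte order up front instead of swapping afterwards. Below is a minimal C sketch, assuming the iconv at hand accepts the explicit-endian names UCS-2LE and UCS-2BE (GNU libiconv and glibc both do); `to_native_ucs2` and `host_is_little_endian` are hypothetical helpers, not libgcj functions. Note that some older GNU libiconv headers declare iconv's input-buffer parameter as `const char **`, which would need a different cast.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true when the host stores the least significant
   byte of a uint16_t first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Hypothetical helper: convert the NUL-terminated string `in`, encoded as
   `fromcode`, into at most `max` UCS-2 code units in host byte order.
   Returns the number of units written, or (size_t)-1 on any iconv error
   (unknown encoding, invalid input, or output buffer too small). No byte
   swap is needed afterwards because the target already matches the host. */
static size_t to_native_ucs2(const char *fromcode, const char *in,
                             uint16_t *out, size_t max)
{
    const char *tocode = host_is_little_endian() ? "UCS-2LE" : "UCS-2BE";
    iconv_t cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;              /* POSIX iconv takes char ** */
    size_t inleft = strlen(in);
    char *outp = (char *)out;
    size_t outleft = max * sizeof *out;
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;
    return (max * sizeof *out - outleft) / sizeof *out;
}
```

With UTF-8 input "A\xC3\x98" ("AØ"), this sketch yields the units 0x0041 and 0x00D8 on either kind of host, which is the form a jchar array needs.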
* [Bug libgcj/31939] Command line arguments are byteswapped before being passed to the program running in custom locale.

From: serg at vostok dot net @ 2007-05-15 17:11 UTC
To: java-prs

------- Comment #2 from serg at vostok dot net 2007-05-15 18:11 -------
Created an attachment (id=13559)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13559&action=view)
A hack to use UCS-2-INTERNAL instead of plain UCS-2

For subject 1. This works only for those who actually have UCS-2-INTERNAL, which includes at least all GNU libiconv users. It is shown here to point out that only two source files need to change.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31939
* [Bug libgcj/31939] Command line arguments are byteswapped before being passed to the program running in custom locale.

From: serg at vostok dot net @ 2007-05-15 17:33 UTC
To: java-prs

------- Comment #3 from serg at vostok dot net 2007-05-15 18:33 -------
For subject 1.

Can java/gcj be used without iconv at all? Considering that java.io.File assumes all file names in the OS are encoded in UTF-8, and java.lang.String stores its data in UCS-2 (UTF-16), the answer should be "NO".

TODO: Add a configure check for the presence of a UCS-2* encoding in iconv:
1. Test several well-known encoding names (UCS-2-INTERNAL, UCS-2-LE, UCS-2-BE, UCS-2LE, UCS-2BE) and, if one with native byte order is found, set HAVE_UCS2 to its name.
2. If none is found, test for plain UCS-2.
3. If that is not found either, bail out with a message saying that Java needs an iconv with a UCS-2 encoding.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31939
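The first step of the TODO could be approximated by a small C probe of the kind a configure test compiles and runs. This is a sketch under assumptions, not an actual gcc configure check: it tries each candidate name with iconv_open(), converts the single character "A", and accepts the candidate only if the result is exactly one 16-bit unit equal to 0x0041 in host byte order, so a byte-order mark or a swapped result is rejected. `find_native_ucs2` and `yields_native_ucs2` are hypothetical names, and turning the result into HAVE_UCS2 is left to the configure machinery.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert "A" with `cd` and report whether the output
   is exactly one 16-bit unit equal to 0x0041 when read in host order. */
static int yields_native_ucs2(iconv_t cd)
{
    char in[] = "A";
    char out[8];
    char *inp = in, *outp = out;
    size_t inleft = 1, outleft = sizeof out;

    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        return 0;
    if (sizeof out - outleft != 2)      /* e.g. a BOM was prepended */
        return 0;
    uint16_t unit;
    memcpy(&unit, out, sizeof unit);
    return unit == 0x0041;              /* 0x4100 would mean swapped order */
}

/* Hypothetical helper: return the first of `n` candidate encoding names
   that this iconv knows and that produces host-order UCS-2, or NULL. */
static const char *find_native_ucs2(const char *const *candidates, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        iconv_t cd = iconv_open(candidates[i], "UTF-8");
        if (cd == (iconv_t)-1)
            continue;                   /* name unknown to this iconv */
        int ok = yields_native_ucs2(cd);
        iconv_close(cd);
        if (ok)
            return candidates[i];
    }
    return NULL;
}
```

Following the order in the TODO, a caller would pass {"UCS-2-INTERNAL", "UCS-2-LE", "UCS-2-BE", "UCS-2LE", "UCS-2BE"}; if that returns NULL, it would check that plain "UCS-2" can at least be opened (in which case the existing swap hack is still needed), and fail with an explanatory message otherwise.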
* [Bug libgcj/31939] Command line arguments are byteswapped before being passed to the program running in custom locale.

From: serg at vostok dot net @ 2007-05-18 19:07 UTC
To: java-prs

------- Comment #4 from serg at vostok dot net 2007-05-18 20:07 -------
For subject 2. The point is to find where the arguments of "int main(int argc, char** argv)" are converted into java.lang.String to be passed to "static void main(String[] args)".

Trail:

gcc/java/jvgenmain.c:
  int main(int argc, char **argv)
    constructs C code that calls the main Java method with JvRunMain(classname, argc, argv)

libjava/prims.cc:
  void JvRunMain (jclass klass, int argc, const char **argv)
    simply calls _Jv_RunMain (klass, NULL, argc, argv, false)
  void _Jv_RunMain (jclass klass, const char *name, int argc, const char **argv, bool is_jar)
    simply calls _Jv_RunMain (NULL, klass, name, argc, argv, is_jar)
  void _Jv_RunMain (JvVMInitArgs *vm_args, jclass klass, const char *name, int argc, const char **argv, bool is_jar)
    calls JvConvertArgv (argc - 1, argv + 1)
  JArray<jstring> * JvConvertArgv (int argc, const char **argv)
    copies each argument into a jbyteArray bytes, then calls new java::lang::String (bytes, 0, len) to make a String from it with the default encoding

libjava/java/lang/String.java:
  public String(byte[] data, int offset, int count)
    calls init (data, offset, count, System.getProperty("file.encoding", "8859_1"))

libjava/java/lang/natString.cc:
  void java::lang::String::init (jbyteArray bytes, jint offset, jint count, jstring encoding)
    uses gnu::gcj::convert::BytesToUnicode::getDecoder(encoding) to make a converter, and converter->read(array, outpos, avail) to convert the data

libjava/gnu/gcj/convert/BytesToUnicode.java:
  class BytesToUnicode extends IOConverter
  public static BytesToUnicode getDecoder (String encoding)
    uses either new Input_iconv(encoding) or new BytesToCharsetAdaptor(Charset.forName(encoding))

It looks like I will have to test both Input_iconv and BytesToCharsetAdaptor to see whether either of them is buggy.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31939
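For the comparison planned at the end of Comment #4, one hypothetical approach is to feed the same argument bytes through each decoder, export the resulting chars as 16-bit units (e.g. `(int) s.charAt(i)` on the Java side), and compare them with a known-good reference. The C sketch below covers only the classification step; `compare_units` and the `unit_diff` codes are illustrative names, and obtaining the units from Input_iconv or BytesToCharsetAdaptor is left out. Separating "exactly byte-swapped" from other corruption points directly at a doubled or missing swap rather than a wrong encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for comparing decoder output to a reference. */
enum unit_diff {
    UNITS_EQUAL,     /* decoder output matches the reference */
    UNITS_SWAPPED,   /* every unit is the byte-swapped reference unit */
    UNITS_OTHER      /* differs in some other way (length or content) */
};

/* Exchange the two bytes of one 16-bit code unit. */
static uint16_t swap16(uint16_t u)
{
    return (uint16_t)((u << 8) | (u >> 8));
}

/* Hypothetical helper: compare `ngot` decoder units with `nwant` expected
   units. Sequences identical in both readings count as UNITS_EQUAL. */
static enum unit_diff compare_units(const uint16_t *got, size_t ngot,
                                    const uint16_t *want, size_t nwant)
{
    if (ngot != nwant)
        return UNITS_OTHER;
    int equal = 1, swapped = 1;
    for (size_t i = 0; i < ngot; i++) {
        if (got[i] != want[i])
            equal = 0;
        if (got[i] != swap16(want[i]))
            swapped = 0;
    }
    if (equal)
        return UNITS_EQUAL;
    return swapped ? UNITS_SWAPPED : UNITS_OTHER;
}
```

Running each decoder on an argument such as "AØ" and getting UNITS_SWAPPED back (0x4100, 0xD800 instead of 0x0041, 0x00D8) would identify that decoder as the one leaving arguments in the wrong byte order.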