public inbox for java-prs@sourceware.org
* [Bug libgcj/31939]  New: Command line arguments are byteswapped before being passed to the program running in a custom locale.
@ 2007-05-15 16:35 serg at vostok dot net
  2007-05-15 17:05 ` [Bug libgcj/31939] " serg at vostok dot net
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: serg at vostok dot net @ 2007-05-15 16:35 UTC (permalink / raw)
  To: java-prs

Conversion to the UCS-2 encoding in GNU libiconv returns bytes in network
order, while gcj wants them in host order. A hack in libgcj swaps the bytes
when necessary. However, command line arguments slip past the swapper (or
go through it an even number of times).

As a result, the program receives command line arguments it cannot recognize.
There is also a nonzero probability of producing invalid UTF-16 strings, e.g.
with unpaired surrogates. That, in turn, may result in segmentation faults
during string operations, especially I/O character encoding conversion.
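
The effect described above can be reproduced in plain Java. This is only an
illustrative sketch, not libgcj code: the class ByteSwapDemo and its methods
are hypothetical names. It swaps the two bytes of every UTF-16 code unit and
then checks the result for unpaired surrogates.

```java
final class ByteSwapDemo {
    /** Swaps the two bytes of every UTF-16 code unit in s. */
    static String swapBytes(String s) {
        char[] units = s.toCharArray();
        for (int i = 0; i < units.length; i++) {
            units[i] = Character.reverseBytes(units[i]);
        }
        return new String(units);
    }

    /** Returns true if s contains a surrogate code unit without its partner. */
    static boolean hasUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // valid pair, skip the low half
                } else {
                    return true;
                }
            } else if (Character.isLowSurrogate(c)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String plain = swapBytes("-v");     // ASCII option: 0x002D becomes 0x2D00
        String latin = swapBytes("\u00D8"); // U+00D8 becomes 0xD800, a lone high surrogate
        System.out.println(Integer.toHexString(plain.charAt(0))); // prints 2d00
        System.out.println(hasUnpairedSurrogate(latin));          // prints true
    }
}
```

An ASCII option such as "-v" stays valid UTF-16 but no longer matches anything
the program expects, while a character whose low byte falls in 0xD8..0xDF
turns into a surrogate code unit.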


-- 
           Summary: Command line arguments are byteswapped before being
                    passed to the program running in a custom locale.
           Product: gcc
           Version: 4.1.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgcj
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: serg at vostok dot net
 GCC build triplet: i386-portbld-freebsd6.2
  GCC host triplet: i386-portbld-freebsd6.2
GCC target triplet: i386-portbld-freebsd6.2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31939



* [Bug libgcj/31939] Command line arguments are byteswapped before being passed to the program running in a custom locale.
From: serg at vostok dot net @ 2007-05-15 17:05 UTC (permalink / raw)
  To: java-prs



------- Comment #1 from serg at vostok dot net  2007-05-15 18:05 -------
This bug is relevant only for iconv implementations whose "UCS-2" encoding
uses a byte order different from the native byte order of the platform, e.g.
GNU libiconv on i386.

There are actually two subjects here.

1. Fix the source code to use a UCS-2* encoding with the platform's native
byte order, e.g. UCS-2-INTERNAL for GNU libiconv on i386.
2. Find the bug that may leave command line arguments in the wrong byte order
and fix it.
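
The byte-order mismatch behind subject 1 can be illustrated in a few lines of
Java. This is a sketch under the assumption that the converter hands back the
two bytes of each code unit in network (big-endian) order; ByteOrderDemo and
readUnit are hypothetical names, not part of libgcj or libiconv.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderDemo {
    /** Interprets two bytes as one UTF-16 code unit in the given byte order. */
    static char readUnit(byte[] twoBytes, ByteOrder order) {
        return ByteBuffer.wrap(twoBytes).order(order).getChar();
    }

    public static void main(String[] args) {
        byte[] networkOrder = { 0x00, 0x41 }; // 'A' as big-endian UCS-2
        // Read in the order the bytes were written: the intended character.
        System.out.println(Integer.toHexString(
                readUnit(networkOrder, ByteOrder.BIG_ENDIAN)));    // prints 41
        // Read as a little-endian host (e.g. i386) would without a swap.
        System.out.println(Integer.toHexString(
                readUnit(networkOrder, ByteOrder.LITTLE_ENDIAN))); // prints 4100
    }
}
```

Asking iconv for an encoding that already uses the host byte order would make
the second read unnecessary, which is the point of subject 1.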



* [Bug libgcj/31939] Command line arguments are byteswapped before being passed to the program running in a custom locale.
From: serg at vostok dot net @ 2007-05-15 17:11 UTC (permalink / raw)
  To: java-prs



------- Comment #2 from serg at vostok dot net  2007-05-15 18:11 -------
Created an attachment (id=13559)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13559&action=view)
A hack to use UCS-2-INTERNAL instead of plain UCS-2

For subject 1.
This works only for those who actually have UCS-2-INTERNAL, which includes at
least all GNU libiconv users. It is shown here to point out that only 2 source
files need to change.



* [Bug libgcj/31939] Command line arguments are byteswapped before being passed to the program running in a custom locale.
From: serg at vostok dot net @ 2007-05-15 17:33 UTC (permalink / raw)
  To: java-prs



------- Comment #3 from serg at vostok dot net  2007-05-15 18:33 -------
For subject 1.

Can java/gcj even be used without iconv in general?
Considering that java.io.File assumes all file names in the OS are encoded in
UTF-8 and java.lang.String stores its data in UCS-2 (UTF-16), the answer should
be "NO".

TODO: Add a check in configure for the presence of a UCS-2* encoding in iconv.

There we can test several well-known encoding names (UCS-2-INTERNAL, UCS-2-LE,
UCS-2-BE, UCS-2LE, UCS-2BE) and set HAVE_UCS2 to the name of the one with native
byte order, if found. If none is found, test for plain UCS-2. If that is not
found either, bail out with a message that java needs an iconv with a UCS-2
encoding.
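
The selection order proposed above can be sketched as follows. This is not the
configure check itself, only the decision logic written out in Java. The class
Ucs2NameProbe, its methods and the iconvHas predicate are hypothetical; a real
check would have to ask iconv whether each name can be opened.

```java
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

final class Ucs2NameProbe {
    /** Candidate names in order of preference for the given host byte order. */
    static List<String> candidates(ByteOrder order) {
        boolean little = order == ByteOrder.LITTLE_ENDIAN;
        String dashed = little ? "UCS-2-LE" : "UCS-2-BE";
        String compact = little ? "UCS-2LE" : "UCS-2BE";
        // Plain "UCS-2" comes last: its byte order may not be the native one.
        return Arrays.asList("UCS-2-INTERNAL", dashed, compact, "UCS-2");
    }

    /** Returns the first candidate the probe accepts, or empty if none is. */
    static Optional<String> pick(ByteOrder order, Predicate<String> iconvHas) {
        return candidates(order).stream().filter(iconvHas).findFirst();
    }

    public static void main(String[] args) {
        // Stand-in for a real probe: pretend only these two names exist.
        Predicate<String> fakeIconv = Arrays.asList("UCS-2", "UCS-2LE")::contains;
        Optional<String> name = pick(ByteOrder.nativeOrder(), fakeIconv);
        System.out.println(name.orElse("no UCS-2 encoding found, bail out"));
    }
}
```

If only plain UCS-2 is found, the byte swapping in libgcj would still be
needed, so HAVE_UCS2 alone would not say whether the bytes are in host order.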



* [Bug libgcj/31939] Command line arguments are byteswapped before being passed to the program running in a custom locale.
From: serg at vostok dot net @ 2007-05-18 19:07 UTC (permalink / raw)
  To: java-prs



------- Comment #4 from serg at vostok dot net  2007-05-18 20:07 -------
For subject 2.

The point is to find where the arguments of "int main(int argc, char** argv)"
are converted into java.lang.String to be passed to "static void main(String[]
args)".




Trail:


gcc/java/jvgenmain.c:

int main(int argc, char **argv)
generates C code that calls the main Java method with
JvRunMain(classname, argc, argv)


libjava/prims.cc:

void JvRunMain (jclass klass, int argc, const char **argv)
simply calls
_Jv_RunMain (klass, NULL, argc, argv, false)

void _Jv_RunMain (jclass klass, const char *name, int argc, const char **argv,
bool is_jar)
simply calls
_Jv_RunMain (NULL, klass, name, argc, argv, is_jar)

void _Jv_RunMain (JvVMInitArgs *vm_args, jclass klass, const char *name, int
argc, const char **argv, bool is_jar)
calls
JvConvertArgv (argc - 1, argv + 1)

JArray<jstring> * JvConvertArgv (int argc, const char **argv)
copies each argument into jbyteArray bytes, then calls
new java::lang::String (bytes, 0, len)
to make a String from it with a default encoding.


libjava/java/lang/String.java:

public String(byte[] data, int offset, int count)
calls
init (data, offset, count, System.getProperty("file.encoding", "8859_1"))


libjava/java/lang/natString.cc:

void java::lang::String::init (jbyteArray bytes, jint offset, jint count,
jstring encoding)
uses
gnu::gcj::convert::BytesToUnicode::getDecoder(encoding)
to make a converter
and
converter->read(array, outpos, avail)
to convert data


libjava/gnu/gcj/convert/BytesToUnicode.java:

class BytesToUnicode extends IOConverter
public static BytesToUnicode getDecoder (String encoding)
uses either
new Input_iconv(encoding)
or
new BytesToCharsetAdaptor(Charset.forName(encoding))

It looks like I will have to test both Input_iconv and BytesToCharsetAdaptor to
see whether either of them is buggy.
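
The last two steps of the trail can be mirrored in standard Java to get a
reference result to compare against. This is a sketch only: ArgvDecodeSketch,
convertArgv and hexUnits are hypothetical names, and it uses the JDK's own
charset decoders rather than gcj's Input_iconv or BytesToCharsetAdaptor, so it
shows what a correct decoder should return, not how libgcj behaves.

```java
import java.nio.charset.Charset;

final class ArgvDecodeSketch {
    /** Decodes each raw argument the way String(byte[], int, int) would. */
    static String[] convertArgv(byte[][] rawArgs, String encoding) {
        Charset cs = Charset.forName(encoding);
        String[] out = new String[rawArgs.length];
        for (int i = 0; i < rawArgs.length; i++) {
            out[i] = new String(rawArgs[i], 0, rawArgs[i].length, cs);
        }
        return out;
    }

    /** Formats the UTF-16 code units of s as hex, so a byte swap is visible. */
    static String hexUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same default as in String.java: file.encoding, falling back to 8859_1.
        String encoding = System.getProperty("file.encoding", "8859_1");
        byte[][] raw = { { 0x2D, 0x76 } }; // the bytes of "-v" as found in argv
        for (String arg : convertArgv(raw, encoding)) {
            System.out.println(hexUnits(arg));
        }
    }
}
```

With an ASCII-compatible file.encoding, a correct decoder yields the code
units 002d 0076 for "-v". A result of 2d00 7600 from one of the two gcj
converters would point at the one that skips or doubles the swap.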

