From: "mec dot gnu at mindspring dot com"
To: gcc-bugs@gcc.gnu.org
Date: Mon, 19 Jan 2004 00:30:00 -0000
Message-ID: <20040119003022.18083.qmail@sources.redhat.com>
In-Reply-To: <20040116155120.13708.mec.gnu@mindspring.com>
References: <20040116155120.13708.mec.gnu@mindspring.com>
Reply-To: gcc-bugzilla@gcc.gnu.org
Subject: [Bug libgcj/13708] [3.4/3.5 regression] java program crashes at startup, UTF-8 environment

------- Additional Comments From mec dot gnu at mindspring dot com  2004-01-19 00:30 -------

jmisc.current.pinskia.exe executes with no segfault on my system.

I figured out the difference between jmisc.current.pinskia.exe and
jmisc.current.static.exe.

=== Short version of the analysis:

. _Jv_CreateJavaVM calls java.lang.ClassLoader.<clinit>
. java.lang.ClassLoader.<clinit> calls
  java.lang.VMClassLoader.getSystemClassLoader
. java.lang.VMClassLoader.getSystemClassLoader calls System.getProperty
. java.lang.System.getProperty invokes java.lang.System.<clinit>
. java.lang.System.<clinit> constructs some java.io.PrintStream objects
. java.io.PrintStream.PrintStream calls
  gnu.gcj.convert.UnicodeToBytes.getDefaultEncoder
. getDefaultEncoder loads a class whose name depends on the
  configuration options of libjava as well as the environment.
. This class is "gnu.gcj.convert.Output_" + getProperty("file.encoding")

. In jmisc.current.pinskia.exe:
  . getProperty("file.encoding") is "8859_1".
  . getDefaultEncoder tries to load "gnu.gcj.convert.Output_8859_1".
  . This class is already present in the static-linked executable.
  . Works fine.

. In jmisc.current.static.exe:
  . getProperty("file.encoding") is "UTF-8".
  . getDefaultEncoder tries to load "gnu.gcj.convert.Output_UTF8".
  . This class is not already present in the static-linked executable.
  . So _Jv_FindClass uses the system class loader to go get it.
  . We are still in java.lang.ClassLoader.<clinit>, remember!
  . So the recursive call to ClassLoader.getSystemClassLoader
    returns NULL!
  . _Jv_FindClass segfaults on sys->loadClass because sys is NULL!

Here is the difference between Andrew's executable and my executable.

jmisc.current.pinskia.exe was built with a libgcj where the system
property "file.encoding" is statically set to "8859_1".

jmisc.current.static.exe was built with a libgcj where the system
property "file.encoding" is dynamically initialized with a call to
"setlocale", and my "file.encoding" is "UTF-8".

This difference comes from a configuration check in natRuntime.cc
for DEFAULT_FILE_ENCODING.

From there, it turns out that gnu.gcj.convert.Output_8859_1 is
statically linked into the program, but gnu.gcj.convert.Output_UTF8
is not. And the unicode code is running inside ClassLoader.<clinit>,
so if it reaches for an output converter that is not already
statically linked, then it dies.

=== Long version of the analysis:

Start with this code in gcc/libjava/java/lang/natRuntime.cc:

  #if ! defined (DEFAULT_FILE_ENCODING) && defined (HAVE_ICONV) \
      && defined (HAVE_NL_LANGINFO)

  static char *
  file_encoding ()
  {
    setlocale (LC_CTYPE, "");
    char *e = nl_langinfo (CODESET);
    if (e == NULL || *e == '\0')
      e = "8859_1";
    return e;
  }

  #define DEFAULT_FILE_ENCODING file_encoding ()

  #endif

  #ifndef DEFAULT_FILE_ENCODING
  #define DEFAULT_FILE_ENCODING "8859_1"
  #endif

  static char *default_file_encoding = DEFAULT_FILE_ENCODING;

In my executable, jmisc.current.static.exe, the "file_encoding"
function is defined. There is a global ctor for "default_file_encoding".
The global ctor runs at the right time and initializes
"default_file_encoding" to the string "UTF-8".

In your executable, jmisc.current.pinskia.exe, there is no
"file_encoding" function. The string "default_file_encoding" is
statically initialized to "8859_1".

Note that this initialization depends on the values of HAVE_ICONV
and HAVE_NL_LANGINFO when gcj was built.

Next, java::lang::Runtime::insertSystemProperties does a simple:

  SET ("file.encoding", default_file_encoding);

Later on, gnu.gcj.convert.UnicodeToBytes gets called. Breakpoint on
that and do a stack trace. This is the killer stack trace!

Here is the stack trace in both executables, jmisc.current.static.exe
and jmisc.current.pinskia.exe.

  #0  gnu.gcj.convert.UnicodeToBytes.getDefaultEncoder
  #1  java.io.PrintStream.PrintStream
  #2  java.lang.System.<clinit>
  #3  java::lang::Class::initializeClass
  #4  java.lang.System.getProperty
  #5  java.lang.VMClassLoader.getSystemClassLoader
  #6  java.lang.ClassLoader.<clinit>
  #7  java::lang::Class::initializeClass
  #8  _Jv_CreateJavaVM
  #9  _Jv_RunMain
  #10 JvRunMain
  #11 main

See, the runtime is still initializing ClassLoader.<clinit>.
ClassLoader.systemClassLoader is going to be NULL until this
initialization is finished.

ClassLoader.<clinit> has dragged in a bunch of other run-time
initialization. getSystemClassLoader called getProperty, which
runs System.<clinit>. System.<clinit> constructs the three standard
streams (standard input, standard output, standard error). That
invokes the Unicode encoder.

UnicodeToBytes.getDefaultEncoder says:

  if (defaultEncoding == null)
    {
      String encoding
        = canonicalize (System.getProperty("file.encoding", "8859_1"));
      String className = "gnu.gcj.convert.Output_" + encoding;
      try
        {
          Class defaultEncodingClass = Class.forName(className);
          defaultEncoding = encoding;
        }

In jmisc.current.pinskia.exe, the property "file.encoding" has the
value "8859_1". className is "gnu.gcj.convert.Output_8859_1".

In my executable, jmisc.current.static.exe, the property
"file.encoding" has the value "UTF-8".
className is "gnu.gcj.convert.Output_UTF8".

Next, getDefaultEncoder calls Class.forName(className). This gets
down to _Jv_FindClass.

Look at _Jv_FindClass (in natClassLoader.cc):

  jclass
  _Jv_FindClass (_Jv_Utf8Const *name, java::lang::ClassLoader *loader)
  {
    jclass klass = _Jv_FindClassInCache (name, loader);

    if (! klass)
      {
        jstring sname = _Jv_NewStringUTF (name->data);

        java::lang::ClassLoader *sys
          = java::lang::ClassLoader::getSystemClassLoader ();

        if (loader)
          {
            ...
          }
        else
          {
            // Load using the bootstrap loader jvmspec 5.3.1.
            klass = sys->loadClass (sname, false);

            // Register that we're an initiating loader.
            if (klass)
              _Jv_RegisterInitiatingLoader (klass, 0);
          }

If klass is in the cache, then _Jv_FindClass is happy and just
returns. If klass is not in the cache, and loader is NULL (which it
is), then this code attempts to use the system loader. But the system
loader is NULL because we are still initializing it! See the stack
trace above. That causes a segfault on "sys->loadClass (...)".

So what determines whether a class is in the cache?
_Jv_FindClassInCache has two hash tables, "loaded_classes" and
"initiated_classes". If you break on _Jv_RegisterClassHookDefault,
you can see this stack trace:

  _Jv_RegisterClassHookDefault
  _Jv_RegisterClasses
  frame_dummy
  completed.1

frame_dummy is in gcc/crtstuff.c. It calls _Jv_RegisterClasses on
__JCR_LIST__ to register all the classes in __JCR_LIST__. This is
simply the list of classes that are linked into the executable.

I dumped all the classes in __JCR_LIST__ in jmisc.current.static.exe.
There are 664 classes. The classes from gnu.gcj.convert are:

  gnu.gcj.convert.BytesToUnicode
  gnu.gcj.convert.Input_8859_1
  gnu.gcj.convert.Input_iconv
  gnu.gcj.convert.IOConverter
  gnu.gcj.convert.UnicodeToBytes
  gnu.gcj.convert.Output_8859_1
  gnu.gcj.convert.Output_iconv

My executable does not have gnu.gcj.convert.Output_UTF8 linked in.
So when getDefaultEncoder attempts to dynamically load it, it crashes,
because the system class loader is still being initialized.

Next, let me change the example program slightly:

  import gnu.gcj.convert.*;

  public class j2
  {
    public static void main (String[] args)
    {
      Output_UTF8 foo = new gnu.gcj.convert.Output_UTF8 ();
      return;
    }
  }

This program works!

This code will work if any of these conditions is true:

  The executable is linked with a shared libgcj.so. I suspect that
  in a shared library, every class is in the __JCR_LIST__ for that
  library, whether it is used or not (how else could it work?).
  That is jmisc.current.shared.exe.

  The executable is built with a Java that was configured such that
  default_file_encoding is always "8859_1" instead of dynamically
  fetched from the locale. That is jmisc.current.pinskia.exe.

  The program contains an explicit reference to the output converter,
  such as gnu.gcj.convert.Output_UTF8, or whatever the user will
  select at runtime via the locale environment variables.

=== How To Reproduce This

Make sure that libgcj is built on a system where HAVE_ICONV and
HAVE_NL_LANGINFO are true. Then build jmisc.exe with -static.
Set $LANG to "en_US.UTF-8" and run the test program.

It would be handy to change the test program to print the value of
java.lang.System.getProperty("file.encoding").

=== How To Fix This

This needs some thought.

ClassLoader.<clinit> wants to run very early. While this is running,
it is restricted to classes that are linked into the program, either
statically or shared. It can't class-load any classes.

This is the cycle:

  ClassLoader.<clinit> calls VMClassLoader.getSystemClassLoader
  getSystemClassLoader calls System.getProperty
  System.getProperty invokes System.<clinit>
  System.<clinit> constructs some new PrintStreams
  PrintStream.PrintStream drags in the unicode stuff
  unicode stuff needs the class loader

We could break this cycle at any of those links.
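To make the cycle concrete, here is a minimal standalone Java sketch
of the same effect. It is not libgcj code: the names CycleDemo, Loader
and Props are hypothetical stand-ins for ClassLoader and the
System/PrintStream/encoder chain. It only illustrates the general rule
that a re-entrant call into a class whose static initializer is still
running on the same thread sees the not-yet-assigned (null) static
field.

```java
// Illustration only; all names here are hypothetical, not libgcj's.
public class CycleDemo {

    // Stands in for ClassLoader: its static field is set by its
    // static initializer, which calls out to other code first.
    static class Loader {
        static Object systemLoader = Props.createLoader();

        static Object getSystemLoader() {
            return systemLoader;
        }
    }

    // Stands in for the code that runs during Loader's initialization
    // and then asks for the system loader again.
    static class Props {
        static boolean sawNullDuringInit = false;

        static Object createLoader() {
            // Re-entrant call: Loader is still being initialized by
            // this thread, so the field has not been assigned yet.
            sawNullDuringInit = (Loader.getSystemLoader() == null);
            return new Object();
        }
    }

    public static void main(String[] args) {
        // First use of Loader triggers its static initializer.
        Object loader = Loader.getSystemLoader();
        System.out.println("null during init: " + Props.sawNullDuringInit);
        System.out.println("non-null after init: " + (loader != null));
    }
}
```

In this sketch the null is only observed and recorded. In
_Jv_FindClass the equivalent null is dereferenced by sys->loadClass,
which is where the segfault comes from.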
My first idea is to change VMClassLoader.getSystemClassLoader to use
some low-level function rather than System.getProperty, so that it
does not depend on all of System.<clinit>. That's a lot of code!

My second idea is to re-organize System so that System.<clinit> is
more lightweight and safe to invoke before the class loader has been
initialized.

Another idea is to change UnicodeToBytes.getDefaultEncoder so that it
uses if/else logic rather than calling Class.forName. getDefaultEncoder
is invoked during class loader initialization, so it has to work
without using the class loader:

  String encoding
    = canonicalize (System.getProperty("file.encoding", "8859_1"));
  // Only needed for the error message below.
  String className = "gnu.gcj.convert.Output_" + encoding;

  // Compare string contents with equals(); == compares references.
  if (encoding.equals ("8859_1")) {
    return new Output_8859_1 ();
  } else if (encoding.equals ("ASCII")) {
    return new Output_ASCII ();
  } else if (encoding.equals ("EUCJIS")) {
    return new Output_EUCJIS ();
  } else if (encoding.equals ("JavaSrc")) {
    return new Output_JavaSrc ();
  } else if (encoding.equals ("SJIS")) {
    return new Output_SJIS ();
  } else if (encoding.equals ("UTF8")) {
    return new Output_UTF8 ();
  } else if (encoding.equals ("iconv")) {
    return new Output_iconv ();
  } else {
    throw new NoClassDefFoundError (
      "missing default encoding " + encoding
      + " (class " + className + " not found)");
  }

Yeah, it's ugly. From my perspective, the ugly part is that so much
code is invoked before the class loader is initialized.

Alternatively, you could declare that "-static" does not work with
libgcj, and libgcj must be a shared library.

It would be nice to have a better diagnostic than a core dump.
Specifically, if native C++ code is about to use a NULL system class
loader, then throw a diagnostic rather than dereferencing a NULL
pointer.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13708