public inbox for java@gcc.gnu.org
* libSegFault.so and gcj
@ 2009-05-14 15:28 Ben Gardiner
  2009-05-14 16:04 ` Andrew Haley
  2009-05-14 16:10 ` David Daney
  0 siblings, 2 replies; 6+ messages in thread
From: Ben Gardiner @ 2009-05-14 15:28 UTC
  To: GCJ

Hello all,

We are running a gcj-compiled application on an embedded platform
(MPC852T). For reference our versions are gcc-4.0.1, glibc-2.3.3 and
linux-2.4.24 -- I know these versions are ancient, but please don't stop
reading here.

We sometimes encounter segfaults in our application; that is to say that
it will terminate with 'Segmentation fault' on the console and return
139. These occur rather infrequently, and we have yet to find a reliable
way to reproduce them. To make things more difficult, we do not have
room for core dumps on our filesystem.

I thought that we could get some information about these segfaults
by using the preload library libSegFault.so; I tested it and integrated
it with our init scripts and let it loose into our releases hoping that
a backtrace or two would come back to me. None did; there was no output
produced by libSegFault.so at all.

I think that since gcj registers its own segfault handler which
translates segv signals into NullPointerExceptions, the original signals
never make it to libSegFault.so's handler. Gcj registers its handler,
catch_segv (from prims.cc:146 in our version of gcj), in INIT_SEGV
(powerpc-signal.h:62) called from _Jv_CreateJavaVM (prims.cc:1211). Here
is a snippet of INIT_SEGV:

#define INIT_SEGV                            \
do                                    \
  {                                    \
    struct kernel_old_sigaction kact;                    \
    kact.k_sa_handler = catch_segv;                    \
    kact.k_sa_mask = 0;                            \
    kact.k_sa_flags = 0;                        \
    if (syscall (SYS_sigaction, SIGSEGV, &kact, NULL) != 0)        \
      __asm__ __volatile__ (".long 0");                    \
  }                                    \
while (0)

and of catch_segv:

SIGNAL_HANDLER (catch_segv)
{
  java::lang::NullPointerException *nullp
    = new java::lang::NullPointerException;
  unblock_signal (SIGSEGV);
  MAKE_THROW_FRAME (nullp);
  throw nullp;
}

I don't know a whole lot about signal handlers, so please correct me if
I'm wrong. I think that since the syscall (SYS_sigaction, ...) passes
NULL as its fourth argument (the old-action pointer), gcj is disregarding
any previously registered signal handlers. I also think that since the
flags are zero, catch_segv is executed on the same stack as the code
that raised the signal instead of on an alternate stack.
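
For comparison, here is a minimal sketch of a registration that does both
things INIT_SEGV leaves out: it keeps the previously registered action and
asks for the handler to run on an alternate stack. It uses the libc
sigaction() wrapper rather than the raw syscall with kernel_old_sigaction,
and every name in it (install_segv_handler, previous_segv_action,
sketch_segv_handler) is hypothetical, not something from gcj.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; none of this is gcj code. */

/* Whatever action was registered for SIGSEGV before ours. */
static struct sigaction previous_segv_action;

static void sketch_segv_handler(int sig, siginfo_t *info, void *context)
{
    (void)sig;
    (void)info;
    (void)context;
    /* A real handler would report or translate the fault here. */
}

/* Returns 0 on success, -1 on failure. */
static int install_segv_handler(void)
{
    /* Give the handler its own stack so that it can still run after
       the normal stack has overflowed. */
    stack_t alt;
    alt.ss_sp = malloc(SIGSTKSZ);
    if (alt.ss_sp == NULL)
        return -1;
    alt.ss_size = SIGSTKSZ;
    alt.ss_flags = 0;
    if (sigaltstack(&alt, NULL) != 0) {
        free(alt.ss_sp);
        return -1;
    }

    struct sigaction act;
    memset(&act, 0, sizeof act);
    act.sa_sigaction = sketch_segv_handler;
    act.sa_flags = SA_SIGINFO | SA_ONSTACK;  /* SA_ONSTACK selects the alternate stack */
    sigemptyset(&act.sa_mask);

    /* A non-NULL last argument makes the kernel hand back the old
       action instead of silently discarding it. */
    return sigaction(SIGSEGV, &act, &previous_segv_action);
}
```

With flags of zero, as in INIT_SEGV, neither SA_ONSTACK nor a saved old
action is in play, which matches the reading above.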

I reason from this that the segfaults are likely stack overflows. Could
anyone confirm this?

Could we patch INIT_SEGV somehow so that signals not caught by
catch_segv will be passed up so that libSegFault.so can catch them? Is
there another way to catch the cause of these segfaults?

Regards,

Ben Gardiner
Nanometrics Seismological Instruments
250 Herzberg Rd
Kanata ON CA
K2K 2A1
613 592 6776 x239


* Re: libSegFault.so and gcj
  2009-05-14 15:28 libSegFault.so and gcj Ben Gardiner
@ 2009-05-14 16:04 ` Andrew Haley
  2009-05-15  9:17   ` Andrew Haley
  2009-05-15 20:23   ` Ben Gardiner
  2009-05-14 16:10 ` David Daney
  1 sibling, 2 replies; 6+ messages in thread
From: Andrew Haley @ 2009-05-14 16:04 UTC
  To: Ben Gardiner; +Cc: GCJ

Ben Gardiner wrote:

> We are running a gcj-compiled application on an embedded platform
> (MPC852T). For reference our versions are gcc-4.0.1, glibc-2.3.3 and
> linux-2.4.24 -- I know these versions are ancient, but please don't stop
> reading here.
> 
> We sometimes encounter segfaults in our application; that is to say that
> it will terminate with 'Segmentation fault' on the console and return
> 139. These occur rather infrequently, and we have yet to find a reliable
> way to reproduce them. To make things more difficult, we do not have
> room for core dumps on our filesystem.
> 
> I thought that we could get some information about these segfaults
> by using the preload library libSegFault.so; I tested it and integrated
> it with our init scripts and let it loose into our releases hoping that
> a backtrace or two would come back to me. None did; there was no output
> produced by libSegFault.so at all.
> 
> I think that since gcj registers its own segfault handler which
> translates segv signals into NullPointerExceptions, the original signals
> never make it to libSegFault.so's handler. Gcj registers its handler,
> catch_segv (from prims.cc:146 in our version of gcj), in INIT_SEGV
> (powerpc-signal.h:62) called from _Jv_CreateJavaVM (prims.cc:1211). Here
> is a snippet of INIT_SEGV:
> 
> #define INIT_SEGV                            \
> do                                    \
>  {                                    \
>    struct kernel_old_sigaction kact;                    \
>    kact.k_sa_handler = catch_segv;                    \
>    kact.k_sa_mask = 0;                            \
>    kact.k_sa_flags = 0;                        \
>    if (syscall (SYS_sigaction, SIGSEGV, &kact, NULL) != 0)        \
>      __asm__ __volatile__ (".long 0");                    \
>  }                                    \
> while (0)
> 
> and of catch_segv:
> 
> SIGNAL_HANDLER (catch_segv)
> {
>  java::lang::NullPointerException *nullp
>    = new java::lang::NullPointerException;
>  unblock_signal (SIGSEGV);
>  MAKE_THROW_FRAME (nullp);
>  throw nullp;
> }
> 
> I don't know a whole lot about signal handlers -- please correct me if
> I'm wrong: I think that since the syscall (SYS_sigaction,...) passes
> NULL as the fourth argument, that gcj is disregarding the presence of
> any previously registered signal handlers.

Correct.  gcj treats all segfaults as null pointer exceptions.

> I also think that since the
> flags are zero that catch_segv is executed on the same stack as the
> process that threw the signal instead of the alternate stack.

Also correct.

> I reason from this that the segfaults are likely stack overflows. Could
> anyone confirm this?

That's quite possible.  Do you not have a debugger?

Clearly if it really is a stack overflow then you're not going to be
able to call the null pointer handler.  There is a way around this,
though.  If you use the -fstack-check option, gcc generates a probe
at the start of every method that writes a zero some 12 kbytes below
the stack pointer.  This will give you enough stack space for the
catch_segv handler to run.

> Could we patch INIT_SEGV somehow so that signals not caught by
> catch_segv will be passed up so that libSegFault.so can catch them?

No.  They're all caught.

Andrew.


* Re: libSegFault.so and gcj
  2009-05-14 15:28 libSegFault.so and gcj Ben Gardiner
  2009-05-14 16:04 ` Andrew Haley
@ 2009-05-14 16:10 ` David Daney
  1 sibling, 0 replies; 6+ messages in thread
From: David Daney @ 2009-05-14 16:10 UTC
  To: Ben Gardiner; +Cc: GCJ

Ben Gardiner wrote:
> Hello all,
> 
> We are running a gcj-compiled application on an embedded platform
> (MPC852T). For reference our versions are gcc-4.0.1, glibc-2.3.3 and
> linux-2.4.24 -- I know these versions are ancient, but please don't stop
> reading here.
> 
> We sometimes encounter segfaults in our application; that is to say that
> it will terminate with 'Segmentation fault' on the console and return
> 139. These occur rather infrequently, and we have yet to find a reliable
> way to reproduce them. To make things more difficult, we do not have
> room for core dumps on our filesystem.
> 
> I thought that we could get some information about these segfaults
> by using the preload library libSegFault.so; I tested it and integrated
> it with our init scripts and let it loose into our releases hoping that
> a backtrace or two would come back to me. None did; there was no output
> produced by libSegFault.so at all.
> 
> I think that since gcj registers its own segfault handler which
> translates segv signals into NullPointerExceptions,
> 

That's right.

Usually if you die with a SIGSEGV, it is due to stack overflow.
Probably for one reason or another you are getting a fault during the
NullPointerException processing which causes the signal handler to be
reentered recursively.  This goes on until the stack overflows and the
kernel then kills the process.  If you could attach a debugger to the
process, that might shed some light on exactly what is happening.
Assuming that it is not normal for your application to take
NullPointerExceptions, it shouldn't be too tedious.
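
The recursion described above can be illustrated with a small sketch. It is
not gcj's catch_segv (which leaves by throwing rather than returning); it
only shows a handler that notices it has been re-entered and falls back to
the default action instead of recursing until the stack is gone. The names
guarded_segv_handler, inside_handler and run_nested_fault_demo are
hypothetical, SA_NODEFER stands in for gcj's unblock_signal call, and
raise() stands in for a real fault.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical names; this is not gcj's catch_segv. */
static volatile sig_atomic_t inside_handler = 0;

static void guarded_segv_handler(int sig)
{
    if (inside_handler) {
        /* A second fault arrived while the first was being handled.
           Stop recursing: restore the default action and re-raise so
           the kernel terminates the process straight away. */
        static const char msg[] = "nested SIGSEGV inside handler\n";
        ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
        (void)ignored;
        signal(sig, SIG_DFL);
        raise(sig);
        _exit(1);  /* only reached if the re-raise did not terminate us */
    }
    inside_handler = 1;
    /* Stand-in for a fault that happens during exception processing. */
    raise(sig);
    inside_handler = 0;
}

/* Forks a child that takes a nested SIGSEGV and returns the child's
   wait status, or -1 if fork or waitpid failed. */
static int run_nested_fault_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction act;
        memset(&act, 0, sizeof act);
        act.sa_handler = guarded_segv_handler;
        act.sa_flags = SA_NODEFER;  /* leave SIGSEGV unblocked inside the handler */
        sigemptyset(&act.sa_mask);
        if (sigaction(SIGSEGV, &act, NULL) != 0)
            _exit(2);
        raise(SIGSEGV);
        _exit(0);  /* not reached: the nested fault kills the child first */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```

The child ends up killed by SIGSEGV, which a shell reports as exit status
139 (128 + 11), the same status seen in the original report.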

David Daney



* Re: libSegFault.so and gcj
  2009-05-14 16:04 ` Andrew Haley
@ 2009-05-15  9:17   ` Andrew Haley
  2009-05-15 20:23   ` Ben Gardiner
  1 sibling, 0 replies; 6+ messages in thread
From: Andrew Haley @ 2009-05-15  9:17 UTC
  To: Ben Gardiner; +Cc: GCJ

Andrew Haley wrote:
> Ben Gardiner wrote:
> 
>> We are running a gcj-compiled application on an embedded platform
>> (MPC852T). For reference our versions are gcc-4.0.1, glibc-2.3.3 and
>> linux-2.4.24 -- I know these versions are ancient, but please don't stop
>> reading here.
>>
>> We sometimes encounter segfaults in our application; that is to say that
>> it will terminate with 'Segmentation fault' on the console and return
>> 139. These occur rather infrequently, and we have yet to find a reliable
>> way to reproduce them. To make things more difficult, we do not have
>> room for core dumps on our filesystem.
>>
>> I thought that we could get some information about these segfaults
>> by using the preload library libSegFault.so; I tested it and integrated
>> it with our init scripts and let it loose into our releases hoping that
>> a backtrace or two would come back to me. None did; there was no output
>> produced by libSegFault.so at all.
>>
>> I think that since gcj registers its own segfault handler which
>> translates segv signals into NullPointerExceptions, the original signals
>> never make it to libSegFault.so's handler. Gcj registers its handler,
>> catch_segv (from prims.cc:146 in our version of gcj), in INIT_SEGV
>> (powerpc-signal.h:62) called from _Jv_CreateJavaVM (prims.cc:1211). Here
>> is a snippet of INIT_SEGV:
>>
>> #define INIT_SEGV                            \
>> do                                    \
>>  {                                    \
>>    struct kernel_old_sigaction kact;                    \
>>    kact.k_sa_handler = catch_segv;                    \
>>    kact.k_sa_mask = 0;                            \
>>    kact.k_sa_flags = 0;                        \
>>    if (syscall (SYS_sigaction, SIGSEGV, &kact, NULL) != 0)        \
>>      __asm__ __volatile__ (".long 0");                    \
>>  }                                    \
>> while (0)
>>
>> and of catch_segv:
>>
>> SIGNAL_HANDLER (catch_segv)
>> {
>>  java::lang::NullPointerException *nullp
>>    = new java::lang::NullPointerException;
>>  unblock_signal (SIGSEGV);
>>  MAKE_THROW_FRAME (nullp);
>>  throw nullp;
>> }
>>
>> I don't know a whole lot about signal handlers -- please correct me if
>> I'm wrong: I think that since the syscall (SYS_sigaction,...) passes
>> NULL as the fourth argument, that gcj is disregarding the presence of
>> any previously registered signal handlers.
> 
> Correct.  gcj treats all segfaults as null pointer exceptions.
> 
>> I also think that since the
>> flags are zero that catch_segv is executed on the same stack as the
>> process that threw the signal instead of the alternate stack.
> 
> Also correct.
> 
>> I reason from this that the segfaults are likely stack overflows. Could
>> anyone confirm this?
> 
> That's quite possible.  Do you not have a debugger?
> 
> Clearly if it really is a stack overflow then you're not going to be
> able to call the null pointer handler.  There is a way around this,
> though.  If you use the -fstack-check option gcc generates a probe
> at the start of every method that writes a zero some 12kbytes below
> the stack pointer.  This will give you enough stack space for the
> catch_segv handler to run.

Although you'll have to make very sure that the catch_segv handler is built
*without* the -fstack-check option!

One other thing that you can use to detect stack overflow: compile with
`-finstrument-functions'.  This might be the easiest way to do it.
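
As a rough sketch of how that might look: with -finstrument-functions, gcc
calls __cyg_profile_func_enter and __cyg_profile_func_exit around every
instrumented function, so an entry hook can record how far the stack has
grown. The bookkeeping below (stack_base, record_stack_address,
deepest_stack_use_bytes) is hypothetical, assumes a stack that grows
downward, and tracks a single thread only; a multi-threaded gcj application
would need per-thread state.

```c
#include <stddef.h>
#include <stdint.h>

/* The hooks themselves must not be instrumented, or they would recurse. */
#define NO_INSTRUMENT __attribute__((no_instrument_function))

/* Hypothetical bookkeeping; single-threaded only. */
static uintptr_t stack_base = 0;      /* frame address seen at the first hook call */
static size_t deepest_stack_use = 0;  /* largest distance below stack_base so far */

/* Records one observed stack address. Assumes the stack grows downward. */
NO_INSTRUMENT static void record_stack_address(uintptr_t here)
{
    if (stack_base == 0)
        stack_base = here;
    if (here < stack_base) {
        size_t used = (size_t)(stack_base - here);
        if (used > deepest_stack_use)
            deepest_stack_use = used;
    }
}

/* Called by gcc-generated code on entry to every instrumented function. */
NO_INSTRUMENT void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    record_stack_address((uintptr_t)__builtin_frame_address(0));
}

/* Called on exit from every instrumented function; nothing to do here. */
NO_INSTRUMENT void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
}

/* Deepest stack use observed so far, in bytes. */
NO_INSTRUMENT static size_t deepest_stack_use_bytes(void)
{
    return deepest_stack_use;
}
```

Logging deepest_stack_use_bytes() periodically, or comparing it against a
limit inside the entry hook, would show whether the application gets close
to its stack limit before a crash happens.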

Andrew.




* Re: libSegFault.so and gcj
  2009-05-14 16:04 ` Andrew Haley
  2009-05-15  9:17   ` Andrew Haley
@ 2009-05-15 20:23   ` Ben Gardiner
  2009-05-15 20:49     ` David Daney
  1 sibling, 1 reply; 6+ messages in thread
From: Ben Gardiner @ 2009-05-15 20:23 UTC
  To: Andrew Haley; +Cc: GCJ

Andrew Haley wrote:
>> I reason from this that the segfaults are likely stack overflows. Could
>> anyone confirm this?
>>     
> That's quite possible.  Do you not have a debugger?
>
> Clearly if it really is a stack overflow then you're not going to be
> able to call the null pointer handler.  There is a way around this,
> though.  If you use the -fstack-check option gcc generates a probe
> at the start of every method that writes a zero some 12kbytes below
> the stack pointer.  This will give you enough stack space for the
> catch_segv handler to run.

David Daney wrote:
> Usually if you die with a SIGSEGV, it is due to stack overflow.
> Probably for one reason or another you are getting a fault during the
> NullPointerException processing which causes the signal handler to be
> reentered recursively.  This goes on until the stack overflows and the
> kernel then kills the process.  If you could attach a debugger to the
> process, that might shed some light on exactly what is happening.
> Assuming that it is not normal for your application to take
> NullPointerExceptions it shouldn't be too tedious.

Andrew and David, thank you for your insights and for the speed with
which they were provided.

About the debugger: I agree it would be the easiest way to figure out
what's going on here. Since I don't know how to reproduce the problem, I
was hoping to get some information from our devices in the field if and
when they die of a segmentation fault.

Would it be possible -- and if so, are there any significant drawbacks 
-- to store the previous handler in INIT_SEGV and register it when 
catch_segv is entered, then re-register catch_segv on the way out? Would 
this allow the segv signal to be passed up to libSegFault.so's handler 
when it would have otherwise resulted in a recursive dead-end?

Ben Gardiner

 


* Re: libSegFault.so and gcj
  2009-05-15 20:23   ` Ben Gardiner
@ 2009-05-15 20:49     ` David Daney
  0 siblings, 0 replies; 6+ messages in thread
From: David Daney @ 2009-05-15 20:49 UTC
  To: Ben Gardiner; +Cc: Andrew Haley, GCJ

Ben Gardiner wrote:
> Andrew Haley wrote:
>>> I reason from this that the segfaults are likely stack overflows. Could
>>> anyone confirm this?
>>>     
>> That's quite possible.  Do you not have a debugger?
>>
>> Clearly if it really is a stack overflow then you're not going to be
>> able to call the null pointer handler.  There is a way around this,
>> though.  If you use the -fstack-check option gcc generates a probe
>> at the start of every method that writes a zero some 12kbytes below
>> the stack pointer.  This will give you enough stack space for the
>> catch_segv handler to run.
> David Daney wrote:
>> Usually if you die with a SIGSEGV, it is due to stack overflow.
>> Probably for one reason or another you are getting a fault during the
>> NullPointerException processing which causes the signal handler to be
>> reentered recursively.  This goes on until the stack overflows and the
>> kernel then kills the process.  If you could attach a debugger to the
>> process, that might shed some light on exactly what is happening.
>> Assuming that it is not normal for your application to take
>> NullPointerExceptions it shouldn't be too tedious.
> Andrew and David, thank you for your insights and for the speed with 
> which they were provided.
> 
> About the debugger; I agree it would be the easiest way to figure out 
> what's going on here. Since I don't know how to reproduce the problem I 
> was hoping to get some information from our devices in the field if and 
> when they die of a segmentation fault.
> 
> Would it be possible -- and if so, are there any significant drawbacks 
> -- to store the previous handler in INIT_SEGV and register it when 
> catch_segv is entered, then re-register catch_segv on the way out? Would 
> this allow the segv signal to be passed up to libSegFault.so's handler 
> when it would have otherwise resulted in a recursive dead-end?
> 

I think it would be very difficult to get that to work.  You would have
to restore the handler in catch_segv, but where would you reregister
catch_segv?  I am not sure, but it would have to be in either the Java
personality routine or the unwinder in libgcc.  The problem is:  what do
you do for a multi-threaded application?  Thread A has to be able to
handle SIGSEGV while thread B is unwinding its own exceptions.  Making
sure that there are no race conditions could be difficult.
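
One variation that avoids swapping the registration back and forth is to
keep the old action and call it directly from inside the new handler, so
the process-wide handler never changes while other threads are running.
The sketch below shows only that chaining mechanism, on SIGUSR1 so that it
is safe to run. All names are hypothetical, earlier_handler merely stands
in for whatever libSegFault.so had installed, and the hard part for gcj,
deciding when a fault should be chained instead of turned into a
NullPointerException, is not addressed.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

/* Hypothetical names; earlier_handler stands in for a handler that was
   installed before ours, such as the one a preloaded library sets up. */
static volatile sig_atomic_t earlier_handler_ran = 0;
static struct sigaction saved_action;

static void earlier_handler(int sig)
{
    (void)sig;
    earlier_handler_ran = 1;
}

static int install_earlier_handler(int sig)
{
    struct sigaction act;
    memset(&act, 0, sizeof act);
    act.sa_handler = earlier_handler;
    sigemptyset(&act.sa_mask);
    return sigaction(sig, &act, NULL);
}

/* Does its own work, then calls the saved handler directly instead of
   re-registering it. */
static void chaining_handler(int sig, siginfo_t *info, void *context)
{
    if (saved_action.sa_flags & SA_SIGINFO) {
        if (saved_action.sa_sigaction != NULL)
            saved_action.sa_sigaction(sig, info, context);
    } else if (saved_action.sa_handler != SIG_DFL &&
               saved_action.sa_handler != SIG_IGN) {
        saved_action.sa_handler(sig);
    }
}

/* Installs chaining_handler and remembers the action it replaces. */
static int install_chaining_handler(int sig)
{
    struct sigaction act;
    memset(&act, 0, sizeof act);
    act.sa_sigaction = chaining_handler;
    act.sa_flags = SA_SIGINFO;
    sigemptyset(&act.sa_mask);
    return sigaction(sig, &act, &saved_action);
}
```

Because saved_action is written once at installation and only read
afterwards, this avoids the re-registration race, though it does nothing
about faults that occur while the unwinder itself is running.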

David Daney

