From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-112732-listarch-gdb-patches=sources.redhat.com@sourceware.org>
Received: (qmail 30837 invoked by alias); 22 May 2014 20:42:49 -0000
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
Received: (qmail 30821 invoked by uid 89); 22 May 2014 20:42:48 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=1.2 required=5.0 tests=AWL,BAYES_00,KAM_STOCKTIP,RP_MATCHES_RCVD,SPF_HELO_PASS,SPF_PASS autolearn=no version=3.3.2
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 22 May 2014 20:42:46 +0000
Received: from int-mx14.intmail.prod.int.phx2.redhat.com (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27])	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s4MKggrG024528	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);	Thu, 22 May 2014 16:42:42 -0400
Received: from blade.nx (ovpn-116-100.ams2.redhat.com [10.36.116.100])	by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id s4MKgdSZ022171;	Thu, 22 May 2014 16:42:40 -0400
Received: by blade.nx (Postfix, from userid 1000)	id B0EC5262416; Thu, 22 May 2014 21:42:36 +0100 (BST)
Date: Thu, 22 May 2014 20:42:00 -0000
From: Gary Benson <gbenson@redhat.com>
To: Mark Kettenis <mark.kettenis@xs4all.nl>
Cc: tromey@redhat.com, palves@redhat.com, fw@deneb.enyo.de,        gdb-patches@sourceware.org
Subject: Re: [PATCH 0/2] Demangler crash handler
Message-ID: <20140522204236.GC28232@blade.nx>
References: <20140509100656.GA4760@blade.nx> <201405091120.s49BKO1f010622@glazunov.sibelius.xs4all.nl> <87fvkhjqvs.fsf@mid.deneb.enyo.de> <53737737.2030901@redhat.com> <87ppj8s7my.fsf@fleche.redhat.com> <20140522140904.GD15598@blade.nx> <201405221440.s4MEeEbx021165@glazunov.sibelius.xs4all.nl>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201405221440.s4MEeEbx021165@glazunov.sibelius.xs4all.nl>
X-IsSubscribed: yes
X-SW-Source: 2014-05/txt/msg00570.txt.bz2

Mark Kettenis wrote:
> > Date: Thu, 22 May 2014 15:09:04 +0100
> > From: Gary Benson <gbenson@redhat.com>
> > 
> > Tom Tromey wrote:
> > > Pedro> Then stealing a signal handler always has multi-threading
> > > Pedro> considerations.  E.g., gdb Python code could well spawn a
> > > Pedro> thread that happens to call something that wants its own
> > > Pedro> SIGSEGV handler...  Signal handlers are per-process, not
> > > Pedro> per-thread.
> > > 
> > > That is true in theory but I think it is unlikely in practice.
> > > And, should it happen -- well, the onus is on folks writing
> > > extensions not to mess things up.  That's the nature of the
> > > beast.  And, sure, it is messy, particularly if we ever upstream
> > > "import gdb", but even so, signals are just fraught and this is
> > > not an ordinary enough usage to justify preventing gdb from
> > > doing it.
> > 
> > GDB installs handlers for INT, TERM, QUIT, HUP, FPE, WINCH, CONT,
> > TTOU, TRAP, ALRM and TSTP, and some other platform-specific ones
> > I didn't recognise.  Is there anything that means SIGSEGV should
> > be treated differently to all these other signals?
> 
> From that list SIGFPE is probably a bogosity.  I don't think the
> SIGFPE handler will do the right thing on many OSes and
> architectures supported by GDB, since it is unspecified whether the
> trapping instruction will be re-executed upon return from the signal
> handler.  I'd argue that the SIGFPE handler is just as unhelpful as
> the SIGSEGV handler you're proposing.  Luckily, we don't seem to
> have a lot of division-by-zero bugs in the code base.

Fair enough.

> > > The choice is really between SEGV catching and "somebody else
> > > down the road fixes more demangler bugs".
> > 
> > The demangler bugs will get fixed one way or another.  The choice
> > is: do we allow users to continue to use GDB while the bug they've
> > hit is fixed, or, do we make them wait?  In the expectation that
> > they will put their own work aside while they fix GDB instead?
> 
> Unless there is a way to force a core dump (like internal_error()
> offers) with the state at the point of the SIGSEGV in it, yes, we
> need to make them wait or fix it themselves.
> 
> I'd really like to avoid adding a SIGSEGV handler altogether.  But
> I'm willing to compromise if the signal handler offers the
> opportunity to create a core dump.  Now doing so in a signal-safe
> way will be a bit tricky of course.

Thank you for your offer to compromise.  I appreciate that you want to
avoid a SIGSEGV handler, but I don't know of any other way to regain
control of the process after the fault.

Getting core files...

With a disabled-by-default SIGSEGV handler it's easy: there will be a
core file from the initial crash.  With enabled-by-default you'd have
to ask the user to disable the handler and repeat whatever they did.
This does admit the possibility that a bug might one day occur only
once and be caught by the handler, so that no core file could ever be
created.

I could put something like this in the signal handler to get a core
file with the correct state from an enabled-by-default handler:

  #ifdef HAVE_WORKING_FORK
    static int core_dumped = 0;

    if (!core_dumped)
      {
        if (fork () == 0)
          dump_core ();

        core_dumped = 1;
      }
  #endif

I could even wrap the entire signal handler and its installation with
#ifdef HAVE_WORKING_FORK, so no attempt was made to handle the signal
if a core file could not be made.

fork is async-signal-safe.  setrlimit and abort are not.  I don't
know what happens after the fork--will the child process be in
the signal handler or not?--but if this is a possible way forward
for you then I will do some research into this.

I don't think you could *ask* the user if they wanted a core file.
No printing code is safe.  With the code snippet above the user
would get a core file regardless, but that's no different from the
current situation.

As an aside, while calling non-signal-safe functions technically
results in undefined behaviour, I understand the common failure modes
are segfault and deadlock. In either case the signal handler will
still be on the stack, with the demangler failure just below ready to
debug, though for the deadlock case the user would have to intervene
to create the core file, for example by attaching another GDB and
aborting the inferior manually.

I have to say that in my experience fixing these bugs the core dump
(when provided) has only been of use to discover the symbol that
caused the failure.  At that point I've switched to live debugging;
there's some pretty useful debug code in the demangler but you need
to compile with -DCP_DEMANGLE_DEBUG to use it.

Thanks,
Gary

-- 
http://gbenson.net/