From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <f9ac93d9fe0062a870c2ebe53d21b435c04cf4d1.camel@gnu.org>
Subject: Repro case!  Re: GDB 10.1: Backtrace goes into infinite loop
From: Paul Smith <psmith@gnu.org>
Reply-To: psmith@gnu.org
To: gdb@sourceware.org
Date: Mon, 01 Feb 2021 22:21:27 -0500
In-Reply-To: <503bd54a619aa2781d6b1385cbd3db20634addaa.camel@gnu.org>
References: <503bd54a619aa2781d6b1385cbd3db20634addaa.camel@gnu.org>
Organization: GNU's Not UNIX!
Content-Type: text/plain; charset="UTF-8"
User-Agent: Evolution 3.36.4-0ubuntu1 
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

Hi all.  I'm back!  I've still not seen this myself, but someone
(finally!) provided me with a reproducible test case (see the quoted
message below for all details).  After examining it I was able to
boil it down to a small repro case.

I've discovered that there's something wonky in how the embedded Python
interpreter handles listing threads while pretty-printing stack
variables.  If I don't load our custom Python macros (so that none of
our pretty-printers are defined) then GDB works fine.  If I load our
macros, Badness Happens.

FYI, I'm linking Python 2.7.18 statically into GDB, if that matters.

After spelunking through my macros I've discovered that I can reproduce
the problem by replacing all our macros with a single pretty-printer
that does nothing but fetch the list of threads.

Here is the stupid code I'm using:

// ----- foo.cpp
#include <stdlib.h>
#include <iostream>

class Foo
{
public:
    const char* s;
};

void foo(Foo& f)
{
    std::cout << f.s << "\n";
    abort();
}

int main(int argc, const char** argv)
{
    Foo f;
    f.s = argv[1] ? argv[1] : "hi";
    foo(f);
    return 0;
}
// -----

Then I do this:

$ g++ -g -ggdb3 -pthread -o foo foo.cpp

(-pthread is required!!)

Then I run it and get a core:

$ ./foo hiya
hiya
Aborted (core dumped)

Then if I run a simple GDB bt with no Python macros it's fine:

$ gdb -q -batch -ex 'bt' -c core.* foo
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./foo hiya'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f28af56a18b in raise () from /lib/x86_64-linux-gnu/libc.so.6
#0  0x00007f28af56a18b in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f28af549859 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x000055e3b4dc4203 in foo (f=...) at foo.cpp:13
#3  0x000055e3b4dc4256 in main (argc=2, argv=0x7ffe1045c1c8) at foo.cpp:20

Now I add in a pretty-printer for the Foo class, with this file:

# ----- foo.py
import gdb
import gdb.printing  # RegexpCollectionPrettyPrinter, register_pretty_printer

class SyncPrinter(object):
    def __init__(self, val): self.val = val

    def to_string(self):
        gdb.selected_inferior().threads()  # <---- NOTE!!
        return 'hello'

    def display_hint(self): return "Foo"

def mkpp():
    pp = gdb.printing.RegexpCollectionPrettyPrinter("test")
    pp.add_printer('Foo', r'^Foo$', SyncPrinter)
    return pp

gdb.printing.register_pretty_printer(gdb.current_objfile(),
                                     mkpp(), replace=True)
# ----- foo.py

Then I run GDB with this:

$ gdb -q -x foo.py -batch -ex 'bt' -c core.* foo

Now the backtrace loops quite a few times; sometimes it then finishes,
and sometimes GDB dumps core with:

gdb/frame.c:2467: internal-error: bool get_frame_pc_if_available(frame_info*,
CORE_ADDR*): Assertion `frame->next != NULL' failed.
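
For anyone who wants to script this check rather than eyeball it, here is
a rough sketch, run outside GDB with Python 3.5 or later (not the embedded
2.7 interpreter).  It builds the same gdb command lines shown above, runs
them with a timeout in case the backtrace never finishes, and counts how
often the output starts over at frame #0.  The helper names, the 60 second
timeout, and the assumption that a healthy batch run prints the "#0" line
exactly twice (as in the good output above) are illustrative guesses, not
anything GDB itself provides.

```python
import re
import subprocess

# Assumption: a healthy "gdb -batch -ex bt" run prints the "#0" line twice,
# as in the good output earlier in this message.
EXPECTED_FRAME0_LINES = 2

# Start of a frame-0 line, e.g. "#0  0x00007f28af56a18b in raise () ...".
FRAME0_RE = re.compile(r"^#0\s", re.MULTILINE)


def gdb_bt_argv(exe, core, script=None):
    """Build the gdb command line used above, optionally adding -x SCRIPT."""
    argv = ["gdb", "-q"]
    if script is not None:
        argv += ["-x", script]
    return argv + ["-batch", "-ex", "bt", "-c", core, exe]


def run_bt(exe, core, script=None, timeout=60):
    """Return gdb's combined output, or None if it ran past the timeout."""
    try:
        done = subprocess.run(gdb_bt_argv(exe, core, script),
                              stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              universal_newlines=True,
                              timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return done.stdout


def count_bt_restarts(output, expected=EXPECTED_FRAME0_LINES):
    """Count how many extra times the backtrace started over at frame #0."""
    return max(0, len(FRAME0_RE.findall(output)) - expected)
```

The idea would be to call run_bt("foo", <core file>) once without a script
and once with script="foo.py", then compare count_bt_restarts() on the two
outputs.  Note that the core file name has to be spelled out, since no
shell is involved to expand core.*.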

If I modify the pretty-printer to run gdb.selected_inferior() without
the threads() call, then it works again.
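
For comparison, a control version of the printer along those lines might
look like the sketch below.  It is meant to be identical to foo.py apart
from dropping .threads().  The file name foo_control.py, the class name
ControlPrinter and the collection name "test-control" are made up for
illustration, and the try/except around the import is only there so the
file can also be loaded outside GDB without registering anything.

```python
# ----- foo_control.py (hypothetical file name)
try:
    import gdb
    import gdb.printing
except ImportError:
    gdb = None  # not running inside GDB; nothing gets registered


class ControlPrinter(object):
    """Same shape as SyncPrinter, minus the threads() call."""

    def __init__(self, val):
        self.val = val

    def to_string(self):
        gdb.selected_inferior()  # kept; only .threads() is dropped
        return 'hello'

    def display_hint(self):
        return "Foo"  # unchanged from foo.py


def mkpp():
    pp = gdb.printing.RegexpCollectionPrettyPrinter("test-control")
    pp.add_printer('Foo', r'^Foo$', ControlPrinter)
    return pp


if gdb is not None:
    gdb.printing.register_pretty_printer(gdb.current_objfile(),
                                         mkpp(), replace=True)
# ----- foo_control.py
```

Loading this with -x in place of foo.py in the gdb command above should
give the A/B comparison: same printer registration, same regexp, and the
only difference is whether the thread list is requested.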

If anyone has any advice or thoughts I'm interested!!  Thx.


On Fri, 2020-11-13 at 17:16 -0500, Paul Smith via Gdb wrote:
> Hi all;
> 
> I just upgraded our users from a toolset using GCC 8.1.0, binutils
> 2.30, and GDB 8.2.1, to a new one using GCC 10.2, binutils 2.35.1, and
> GDB 10.1 (on GNU/Linux x86_64).
> 
> Now some of my users are running into a problem where they run the "bt"
> command and it shows some subset of the stack frames, then jumps back
> and starts over printing from frame 0, and does this forever until you
> use ^C to stop it.
> 
> Apparently this doesn't happen every time, and the number of frames
> that are shown is variable (but usually a small number like 2 to 5
> frames).  By "not every time" I mean after a breakpoint sometimes we
> get a good bt and sometimes it recurses, but if it recurses for a given
> bt it will always recurse (that is if you use ^C to stop then "bt"
> again it recurses again).
> 
> If we do the same thing with the older GDB (keeping the newer
> compiler/binutils) then we don't see this behavior.
> 
> FWIW, the code in question is C++ code and was compiled with -ggdb3 and
> no optimization.
> 
> Just wondering if anyone has seen something like this, and/or how to
> try to collect more details.
>