From: Paul Smith
Reply-To: psmith@gnu.org
To: gdb@sourceware.org
Subject: Repro case! Re: GDB 10.1: Backtrace goes into infinite loop
Date: Mon, 01 Feb 2021 22:21:27 -0500
In-Reply-To: <503bd54a619aa2781d6b1385cbd3db20634addaa.camel@gnu.org>

Hi all.  I'm back!

I've still not seen this myself but someone (finally!) provided me with
a reproducible test case (see the quoted message below for all
details).  After examining it I am able to come up with a repro case.

I've discovered that there's something wonky with the embedded Python
interpreter and checking threads while pretty-printing stack variables.

If I don't load our custom Python macros (so that none of our
pretty-printers are defined) then GDB works fine.  If I load our
macros, Badness Happens.

FYI I'm linking Python 2.7.18 statically with GDB, if that matters.

After spelunking my macros I've discovered that I can reproduce the
problem by replacing all our macros with one that does nothing but show
all threads.

Here is the stupid code I'm using:

// ----- foo.cpp
#include <iostream>
#include <stdlib.h>

class Foo
{
public:
    const char* s;
};

void foo(Foo& f)
{
    std::cout << f.s << "\n";
    abort();
}

int main(int argc, const char** argv)
{
    Foo f;
    f.s = argv[1] ? argv[1] : "hi";
    foo(f);
    return 0;
}
// -----

Then I do this:

  $ g++ -g -ggdb3 -pthread -o foo foo.cpp

(-pthread is required!!)

Then I run it and get a core:

  $ ./foo hiya
  hiya
  Aborted (core dumped)

Then if I run a simple GDB bt with no Python macros it's fine:

  $ gdb -q -batch -ex 'bt' -c core.* foo
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
  Core was generated by `./foo hiya'.
  Program terminated with signal SIGABRT, Aborted.
  #0  0x00007f28af56a18b in raise () from /lib/x86_64-linux-gnu/libc.so.6
  #0  0x00007f28af56a18b in raise () from /lib/x86_64-linux-gnu/libc.so.6
  #1  0x00007f28af549859 in abort () from /lib/x86_64-linux-gnu/libc.so.6
  #2  0x000055e3b4dc4203 in foo (f=...) at foo.cpp:13
  #3  0x000055e3b4dc4256 in main (argc=2, argv=0x7ffe1045c1c8) at foo.cpp:20

Now I add in a pretty-printer for the Foo class, with this file:

# ----- foo.py
import gdb
import gdb.printing

class SyncPrinter(object):
    def __init__(self, val):
        self.val = val

    def to_string(self):
        gdb.selected_inferior().threads()   # <---- NOTE!!
        return 'hello'

    def display_hint(self):
        return "Foo"

def mkpp():
    pp = gdb.printing.RegexpCollectionPrettyPrinter("test")
    pp.add_printer('Foo', r'^Foo$', SyncPrinter)
    return pp

gdb.printing.register_pretty_printer(gdb.current_objfile(), mkpp(), replace=True)
# ----- foo.py

Then I run GDB with this:

  $ gdb -q -x foo.py -batch -ex 'bt' -c core.* foo

Now the backtrace loops quite a few times, then sometimes it finishes,
sometimes it dumps core with:

  gdb/frame.c:2467: internal-error: bool get_frame_pc_if_available(frame_info*, CORE_ADDR*): Assertion `frame->next != NULL' failed.

If I modify the pretty-printer to run gdb.selected_inferior() without
the threads() call, then it works again.

If anyone has any advice or thoughts I'm interested!!  Thx.


On Fri, 2020-11-13 at 17:16 -0500, Paul Smith via Gdb wrote:
> Hi all;
>
> I just upgraded our users from a toolset using GCC 8.1.0, binutils
> 2.30, and GDB 8.2.1, to a new one using GCC 10.2, binutils 2.35.1, and
> GDB 10.1 (on GNU/Linux x86_64).
>
> Now some of my users are running into a problem where they run the "bt"
> command and it shows some subset of the stack frames, then jumps back
> and starts over printing from frame 0, and does this forever until you
> use ^C to stop it.
>
> Apparently this doesn't happen every time, and the number of frames
> that are shown are variable (but usually a smaller number like 2 to 5
> frames).  By "not every time" I mean after a breakpoint sometimes we
> get a good bt and sometimes it recurses, but if it recurses for a given
> bt it will always recurse (that is if you use ^C to stop then "bt"
> again it recurses again).
>
> If we do the same thing with the older GDB (keeping the newer
> compiler/binutils) then we don't see this behavior.
>
> FWIW, the code in question is C++ code and was compiled with -ggdb3 and
> no optimization.
>
> Just wondering if anyone has seen something like this, and/or how to
> try to collect more details.
>