From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm1-x342.google.com (mail-wm1-x342.google.com [IPv6:2a00:1450:4864:20::342]) by sourceware.org (Postfix) with ESMTPS id 5A2613860017 for ; Thu, 18 Jun 2020 17:41:02 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 5A2613860017 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=embecosm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=andrew.burgess@embecosm.com Received: by mail-wm1-x342.google.com with SMTP id l26so5982491wme.3 for ; Thu, 18 Jun 2020 10:41:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=embecosm.com; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=8LshtYL4OU7uDi7lacQQGyE5MvwmA7WmUulHMqaeKpQ=; b=HwHMYKr4/1dxe5mM/TBx6V/Y6EUuAwQNgjlWAS09wM3xJxo/YWyne+3SArWhaynsxZ b3BoUkqBAhQxfi3A80jxRwEoNCS2GXWu9N8G/KKGyCyzOuU/PBLx7ZU2pwZRWpmuYI9Z BmsALqXjzeGsLThG7dxPkiSArIO08x6Hxb7Rc+AxruqgF/YRgB+j942PGtRnybm4/oJp 3togm6ExMswLRMN3Ha18ChfOfxz9qgL5lgYd24AbX6SQ2dgauTuEsHmLBb9vWBFH0mR/ UE0x9qJ1t5Oiq1fqfFfq1G/UV9ZnnDc4VvutVqRJFyxOa7RNvgdk4mJhyv4PSBHc82PU y9GQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=8LshtYL4OU7uDi7lacQQGyE5MvwmA7WmUulHMqaeKpQ=; b=SjwXrla0ry5UosaUz+uJDHCi6wSty+/xMMp7UyA8ZZWEIp2PPkhX4xSoukX291r/lI BHM2AXrcuh+qNFw89zKRlgihGfn7v/oS6bvWPUwJxc404uK5U60faAMKyTERrPhSEwhn TMi7YGKhC9dIbI6EEqwHLFodjhumtI6h0wwKDrtPxIUiphrVXiPYjlYHzUWW4eJ3obbB R/+DmZmMw5x8oQ1WYUodzUz5TNxHSmHcGz23OZJOil4zpYDlMBB9REhzAbRk4r2Cj5JW 5ABn5CP/723B30u9WvJ9hvhEWjzvnY6c/8HSgg6KOrIxrjJjFCFYa52LoMpwigTnpV8B 8vUA== X-Gm-Message-State: AOAM533A0bSj64d3UYi8tErafRWnl7g+yMv9/qjKQK/PupDy9vlbL9TV yb42i6Bu/Bl+JU6Pv6v9kqvQaF66E0g= X-Google-Smtp-Source: ABdhPJw04okmtDtQ9OpdnsLqAxAc28EMaZYRS4BhMLm+3nzT1djAdPsUWoWAQiWPy8g/cdVfbmPpkQ== X-Received: by 2002:a1c:1b13:: with SMTP id b19mr5067122wmb.84.1592502061274; Thu, 18 Jun 2020 10:41:01 -0700 (PDT) Received: from localhost (host86-128-12-16.range86-128.btcentralplus.com. [86.128.12.16]) by smtp.gmail.com with ESMTPSA id n204sm4747275wma.5.2020.06.18.10.41.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Jun 2020 10:41:00 -0700 (PDT) Date: Thu, 18 Jun 2020 18:40:59 +0100 From: Andrew Burgess To: Luis Machado Cc: gdb-patches@sourceware.org, tromey@adacore.com Subject: Re: [PATCH] Fix inline frame unwinding breakage Message-ID: <20200618174059.GA2737@embecosm.com> References: <20200414213137.24015-1-luis.machado@linaro.org> <20200414213836.24370-1-luis.machado@linaro.org> <20200618165855.GY2737@embecosm.com> <20200618172922.GZ2737@embecosm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200618172922.GZ2737@embecosm.com> X-Operating-System: Linux/5.6.15-200.fc31.x86_64 (x86_64) X-Uptime: 18:31:51 up 10 days, 7:38, X-Editor: GNU Emacs [ http://www.gnu.org/software/emacs ] X-Spam-Status: No, score=-9.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_BARRACUDACENTRAL, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Jun 2020 17:41:04 -0000 * Andrew Burgess [2020-06-18 18:29:22 +0100]: > * Andrew Burgess [2020-06-18 17:58:55 +0100]: > > > * Luis Machado via Gdb-patches [2020-04-14 18:38:36 -0300]: > > > > > *** re-sending due to the poor choice of characters for the backtrace > > > annotations. GIT swallowed parts of it. > > > > > > There has been some breakage for aarch64-linux, arm-linux and s390-linux in > > > terms of inline frame unwinding. There may be other targets, but these are > > > the ones i'm aware of. > > > > > > The following testcases started to show numerous failures and trigger internal > > > errors in GDB after commit 1009d92fc621bc4d017029b90a5bfab16e17fde5, > > > "Find tailcall frames before inline frames". > > > > > > gdb.opt/inline-break.exp > > > gdb.opt/inline-cmds.exp > > > gdb.python/py-frame-inline.exp > > > gdb.reverse/insn-reverse.exp > > > > > > The internal errors were of this kind: > > > > > > binutils-gdb/gdb/frame.c:579: internal-error: frame_id get_frame_id(frame_info*): Assertion `fi->level == 0' failed. > > > > > > After a lengthy investigation to try and find the cause of these assertions, > > > it seems we're dealing with some fragile/poorly documented code to handle inline > > > frames and we are attempting to unwind from this fragile section of code. > > > > > > Before commit 1009d92fc621bc4d017029b90a5bfab16e17fde5, the tailcall sniffer > > > was invoked from dwarf2_frame_prev_register. By the time we invoke the > > > dwarf2_frame_prev_register function, we've probably already calculated the > > > frame id (via compute_frame_id). > > > > > > After said commit, the call to dwarf2_tailcall_sniffer_first was moved to > > > dwarf2_frame_cache. This is very early in a frame creation process, and > > > we're still calculating the frame ID (so compute_frame_id is in the call > > > stack). > > > > > > This would be fine for regular frames, but the above testcases all deal > > > with some inline frames. > > > > > > The particularity of inline frames is that their frame ID's depend on > > > the previous frame's ID, and the previous frame's ID relies in the inline > > > frame's registers. So it is a bit of a messy situation. > > > > > > We have comments in various parts of the code warning about some of these > > > particularities. > > > > > > In the case of dwarf2_tailcall_sniffer_first, we attempt to unwind the PC, > > > which goes through various functions until we eventually invoke > > > frame_unwind_got_register. This function will eventually attempt to create > > > a lazy value for a particular register, and this lazy value will require > > > a valid frame ID. Since the inline frame doesn't have a valid frame ID > > > yet (remember we're still calculating the previous frame's ID so we can tell > > > what the inline frame ID is) we will call compute_frame_id for the inline > > > frame (level 0). > > > > > > We'll eventually hit the assertion above, inside get_frame_id: > > > > > > -- > > > /* If we haven't computed the frame id yet, then it must be that > > > this is the current frame. Compute it now, and stash the > > > result. The IDs of other frames are computed as soon as > > > they're created, in order to detect cycles. See > > > get_prev_frame_if_no_cycle. */ > > > gdb_assert (fi->level == 0); > > > -- > > > > > > It seems to me we shouldn't have reached this assertion without having the > > > inline frame ID already calculated. In fact, it seems we even start recursing > > > a bit when we invoke get_prev_frame_always within inline_frame_this_id. But > > > a check makes us quit the recursion and proceed to compute the id. > > > > > > Here's the call stack for context: > > > > > > #0 get_prev_frame_always_1 (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2109 > > > RECURSION - #1 0x0000aaaaaae1d098 in get_prev_frame_always (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2124 > > > #2 0x0000aaaaaae95768 in inline_frame_this_id (this_frame=0xaaaaab85a670, this_cache=0xaaaaab85a688, this_id=0xaaaaab85a6d0) > > > at ../../../repos/binutils-gdb/gdb/inline-frame.c:165 > > > #3 0x0000aaaaaae1916c in compute_frame_id (fi=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:550 > > > #4 0x0000aaaaaae19318 in get_frame_id (fi=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:582 > > > #5 0x0000aaaaaae13480 in value_of_register_lazy (frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/findvar.c:296 > > > #6 0x0000aaaaaae16c00 in frame_unwind_got_register (frame=0xaaaaab85a730, regnum=30, new_regnum=30) at ../../../repos/binutils-gdb/gdb/frame-unwind.c:268 > > > #7 0x0000aaaaaad52604 in dwarf2_frame_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=30) > > > at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1296 > > > #8 0x0000aaaaaae1ae68 in frame_unwind_register_value (next_frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/frame.c:1229 > > > #9 0x0000aaaaaae1b304 in frame_unwind_register_unsigned (next_frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/frame.c:1320 > > > #10 0x0000aaaaaab76574 in aarch64_dwarf2_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=32) > > > at ../../../repos/binutils-gdb/gdb/aarch64-tdep.c:1114 > > > #11 0x0000aaaaaad52724 in dwarf2_frame_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=32) > > > at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1316 > > > #12 0x0000aaaaaae1ae68 in frame_unwind_register_value (next_frame=0xaaaaab85a730, regnum=32) at ../../../repos/binutils-gdb/gdb/frame.c:1229 > > > #13 0x0000aaaaaae1b304 in frame_unwind_register_unsigned (next_frame=0xaaaaab85a730, regnum=32) at ../../../repos/binutils-gdb/gdb/frame.c:1320 > > > #14 0x0000aaaaaae16a84 in default_unwind_pc (gdbarch=0xaaaaab81edc0, next_frame=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/frame-unwind.c:223 > > > #15 0x0000aaaaaae32124 in gdbarch_unwind_pc (gdbarch=0xaaaaab81edc0, next_frame=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/gdbarch.c:3074 > > > #16 0x0000aaaaaad4f15c in dwarf2_tailcall_sniffer_first (this_frame=0xaaaaab85a730, tailcall_cachep=0xaaaaab85a830, entry_cfa_sp_offsetp=0x0) > > > at ../../../repos/binutils-gdb/gdb/dwarf2/frame-tailcall.c:388 > > > #17 0x0000aaaaaad520c0 in dwarf2_frame_cache (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748) at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1190 > > > #18 0x0000aaaaaad52204 in dwarf2_frame_this_id (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, this_id=0xaaaaab85a790) > > > at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1218 > > > #19 0x0000aaaaaae1916c in compute_frame_id (fi=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/frame.c:550 > > > #20 0x0000aaaaaae1c958 in get_prev_frame_if_no_cycle (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:1927 > > > #21 0x0000aaaaaae1cc44 in get_prev_frame_always_1 (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2006 > > > FIRST CALL - #22 0x0000aaaaaae1d098 in get_prev_frame_always (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2124 > > > #23 0x0000aaaaaae18f68 in skip_artificial_frames (frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:495 > > > #24 0x0000aaaaaae193e8 in get_stack_frame_id (next_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:596 > > > #25 0x0000aaaaaae87a54 in process_event_stop_test (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:6857 > > > #26 0x0000aaaaaae86bdc in handle_signal_stop (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:6381 > > > #27 0x0000aaaaaae84fd0 in handle_inferior_event (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:5578 > > > #28 0x0000aaaaaae81588 in fetch_inferior_event (client_data=0x0) at ../../../repos/binutils-gdb/gdb/infrun.c:4020 > > > #29 0x0000aaaaaae5f7fc in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at ../../../repos/binutils-gdb/gdb/inf-loop.c:43 > > > #30 0x0000aaaaaae8d768 in infrun_async_inferior_event_handler (data=0x0) at ../../../repos/binutils-gdb/gdb/infrun.c:9377 > > > #31 0x0000aaaaaabff970 in check_async_event_handlers () at ../../../repos/binutils-gdb/gdb/async-event.c:291 > > > #32 0x0000aaaaab27cbec in gdb_do_one_event () at ../../../repos/binutils-gdb/gdbsupport/event-loop.cc:194 > > > #33 0x0000aaaaaaef1894 in start_event_loop () at ../../../repos/binutils-gdb/gdb/main.c:356 > > > #34 0x0000aaaaaaef1a04 in captured_command_loop () at ../../../repos/binutils-gdb/gdb/main.c:416 > > > #35 0x0000aaaaaaef3338 in captured_main (data=0xfffffffff1f0) at ../../../repos/binutils-gdb/gdb/main.c:1254 > > > #36 0x0000aaaaaaef33a0 in gdb_main (args=0xfffffffff1f0) at ../../../repos/binutils-gdb/gdb/main.c:1269 > > > #37 0x0000aaaaaab6e0dc in main (argc=6, argv=0xfffffffff348) at ../../../repos/binutils-gdb/gdb/gdb.c:32 > > > > > > The following patch addresses this by using a function that unwinds the PC > > > from the next (inline) frame directly as opposed to creating a lazy value > > > that is bound to the next frame's ID (still not computed). > > > > > > I've validated this for aarch64-linux and x86_64-linux by running the > > > testsuite. > > > > > > Tromey, would you mind checking if this suits your problematic core file > > > tailcall scenario? > > > > > > gdb/ChangeLog: > > > > > > 2020-04-14 Luis Machado > > > > > > * dwarf2/frame-tailcall.c (dwarf2_tailcall_sniffer_first): Use > > > get_frame_register instead of gdbarch_unwind_pc. > > > --- > > > gdb/dwarf2/frame-tailcall.c | 4 +++- > > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > > > diff --git a/gdb/dwarf2/frame-tailcall.c b/gdb/dwarf2/frame-tailcall.c > > > index 2d219f13f9..01bb134a5c 100644 > > > --- a/gdb/dwarf2/frame-tailcall.c > > > +++ b/gdb/dwarf2/frame-tailcall.c > > > @@ -385,7 +385,9 @@ dwarf2_tailcall_sniffer_first (struct frame_info *this_frame, > > > prev_gdbarch = frame_unwind_arch (this_frame); > > > > > > /* Simulate frame_unwind_pc without setting this_frame->prev_pc.p. */ > > > - prev_pc = gdbarch_unwind_pc (prev_gdbarch, this_frame); > > > + get_frame_register (this_frame, gdbarch_pc_regnum (prev_gdbarch), > > > + (gdb_byte *) &prev_pc); > > > + prev_pc = gdbarch_addr_bits_remove (prev_gdbarch, prev_pc); > > > > > > /* call_site_find_chain can throw an exception. */ > > > chain = call_site_find_chain (prev_gdbarch, prev_pc, this_pc); > > > > I'm now no longer convinced that this patch is correct, and I'd like > > to reopen the discussion. > > > > Here's what concerns me, we used to make the following call-chain: > > > > gdbarch_unwind_pc --> frame_unwind_register_unsigned --> frame_unwind_register_value > > > > Now we do this: > > > > get_frame_register --> frame_unwind_register --> frame_register_unwind --> frame_unwind_register_value > > > > The problem is that gdbarch_unwind_pc' takes an argument 'next_frame', > > while, get_frame_register takes an argument called frame', but is > > really 'this_frame', it then passes 'frame->next' to > > 'frame_unwind_register'. > > > > What this means is that if we have a call stack like this: > > > > #3 --> #2 --> #1 --> #0 > > > > And we invoke the tail-call sniffer in frame #1, previously we figured > > out the $pc value in frame #2, while now we figure out the $pc value > > in frame #1. > > > > I'm even more convinced that this is an error based on the fix patch > > you applied later: > > > > commit 991a3e2e9944a4b3a27bd989ac03c18285bd545d > > Date: Sat Apr 25 00:32:44 2020 -0300 > > > > Fix remaining inline/tailcall unwinding breakage for x86_64 > > After this commit the code looks like this: /* Simulate frame_unwind_pc without setting this_frame->prev_pc.p. */ if (next_frame != nullptr && get_frame_type (next_frame) == INLINE_FRAME && !frame_id_computed_p (next_frame)) { /* The next frame is an inline frame and its frame id has not been computed yet. */ get_frame_register (this_frame, gdbarch_pc_regnum (prev_gdbarch), (gdb_byte *) &prev_pc); prev_pc = gdbarch_addr_bits_remove (prev_gdbarch, prev_pc); } else prev_pc = gdbarch_unwind_pc (prev_gdbarch, this_frame); Now the point of this function is to decide if the frame we are in _right_now_ was tail called into by it's "apparent" caller. That is when we unwind we have: main --> foo --> bar And in foo we ask, can we build a tailcall chain that gets us from main to foo. So in this case, we get the answer yes, and the chain returned represents: main --> test_func --> foo However, my thinking is that we know we're not in the position of 'foo' (that is reached by a tail call) if, foo is an inline frame - right? So, I wonder if it's as simple as saying: /* We know that THIS_FRAME was not reached by a tail call if THIS_FRAME is an inline frame. */ if (get_frame_type (this_frame) == INLINE_FRAME) return; That's totally untested, just a random thought.... I wonder if such a change would fix the original failures you saw? Thanks, Andrew