public inbox for gdb-prs@sourceware.org
* [Bug gdb/26676] New: amd64: runaway execution when nexti would stop on ret after call
@ 2020-09-29 9:36 basileclement06+gdb at gmail dot com
From: basileclement06+gdb at gmail dot com @ 2020-09-29 9:36 UTC
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=26676
Bug ID: 26676
Summary: amd64: runaway execution when nexti would stop on ret
after call
Product: gdb
Version: 9.2
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: gdb
Assignee: unassigned at sourceware dot org
Reporter: basileclement06+gdb at gmail dot com
Target Milestone: ---
In a course I TA, we use x86 as a target to teach compilation, and we make
students generate assembly files which are then compiled using `gcc`. We
encourage them to use `gdb` to debug their programs, which works well.
Students have complained that in some cases, the `next` command in `gdb`
fails to work properly: when stepping over a call, `gdb` never stops on the
next instruction, and instead continues execution until the end of the program
(or the next breakpoint).
After some investigation, this happens when the instruction immediately after
the `call` is a `ret`. I also noted that setting an explicit breakpoint on the
`ret` instruction works as expected. Here is a minimal example:
```
# Filename: asm.s
    .text
    .globl main
main:
    call f
    ret
f:
    mov $0, %rax
    ret
```
This example can be compiled using `gcc -g -no-pie asm.s -o asm`; running `gdb`
and stepping over the `call f` instruction on line 4 runs the program to
completion:
    $ gdb asm
    GNU gdb (Ubuntu 9.1-0ubuntu1) 9.1
    Copyright (C) 2020 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later
    <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.
    Type "show copying" and "show warranty" for details.
    This GDB was configured as "x86_64-linux-gnu".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>.
    Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.
    For help, type "help".
    Type "apropos word" to search for commands related to "word"...
    Reading symbols from asm...
    (gdb) break main
    Breakpoint 1 at 0x401106: file asm.s, line 4.
    (gdb) run
    Starting program: /tmp/gdbug/asm
    Breakpoint 1, main () at asm.s:4
    4 call f
    (gdb) n
    [Inferior 1 (process 3373584) exited normally]
The expected behavior would be to stop execution on the `ret` instruction on
line 5, after returning from the `call` on line 4:
    ...
    (gdb) n
    5 ret
If any other instruction is inserted between the `call` and `ret` instructions,
`gdb` behaves properly; for instance, with an extra `nop` instruction between
the `call f` and `ret`, I get (as expected):
    ...
    (gdb) n
    5 nop
I tried to get the same behavior when compiling from C:
```
int __attribute__ ((noinline)) f() { return 0; }
int main() {
    return f();
}
```
compiling with `gcc -O3 -fno-optimize-sibling-calls asm.c` reproduces the issue
(using `nexti`). Adding `-g` makes `gdb` properly stop on the `ret`
instruction in that case; so somehow embedding debug information makes the
problem go away… This piqued my curiosity, so I decided to dig a bit deeper.
I removed some of the debug information to try to figure out what was causing
the change in behavior, and noticed that the `ret` seems to be skipped
whenever I change the DW_AT_producer attribute to anything which doesn't start
with "GNU C17 X.Y". This led me to look for DW_AT_producer in the gdb
codebase, where it seems mostly used in `gdb/dwarf2read.c`. Notably there is this
check in `process_full_comp_unit`:
```
int gcc_4_minor = producer_is_gcc_ge_4 (cu->producer);
...
if (gcc_4_minor >= 5)
  cust->epilogue_unwind_valid = 1;
```
Ah-ha! I checked that indeed the issue reproduces with a version < 4.5 in
DW_AT_producer, which means it is probably related to `epilogue_unwind_valid`
(there is also `locations_valid` which is set in that case, but I am having
issues on `ret`, so something related to the epilogue seems more likely).
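To make the version gate concrete, here is a minimal standalone sketch of that kind of check. It is not gdb's implementation: the names `parse_gnu_producer` and `producer_allows_epilogue_unwind` are made up for illustration, and the assumed producer layout ("GNU", a language token, then MAJOR.MINOR) is only what the experiments above suggest.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper (not a gdb function): extract MAJOR.MINOR from a
   producer string shaped like "GNU <language> MAJOR.MINOR...".
   Returns true only when both numbers were parsed.  */
static bool
parse_gnu_producer (const char *producer, int *major, int *minor)
{
  if (producer == NULL)
    return false;
  /* "%*s" skips the language token ("C17", "C++14", ...) without
     storing it, so a successful parse yields exactly 2 conversions.  */
  return sscanf (producer, "GNU %*s %d.%d", major, minor) == 2;
}

/* Mirrors the `gcc_4_minor >= 5` test for the 4.x series, and assumes
   that any later major version qualifies as well.  Anything that does
   not parse as a GNU producer is rejected.  */
static bool
producer_allows_epilogue_unwind (const char *producer)
{
  int major, minor;

  if (!parse_gnu_producer (producer, &major, &minor))
    return false;
  return major > 4 || (major == 4 && minor >= 5);
}
```

With this sketch, a producer such as "GNU C17 9.3.0 -mtune=generic" is accepted, while "GNU C 4.4.7" or a string that does not start with "GNU" is rejected, which matches the behavior I observed when editing DW_AT_producer.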
`epilogue_unwind_valid` is used (through the `COMPUNIT_EPILOGUE_UNWIND_VALID`
macro) in `amd64-tdep.c` and notably in `amd64_stack_frame_destroyed_p`...
which indeed checks that either `epilogue_unwind_valid` is true or the current
instruction is not `ret`.
Moving upward from there, this means that `frame_unwind_try_unwinder` fails on
the epilogue unwinder when on a `ret` instruction in the current frame (because
it assumes the frame has been destroyed by a previous `leave` or `pop %rbp`),
*unless* there is DWARF information saying we were compiled with GCC version ≥
4.5.
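As I understand it, the decision made by `amd64_stack_frame_destroyed_p` boils down to the following simplified model. This is a sketch under my own assumptions, not the gdb source: `frame_destroyed_model` and `X86_RET_OPCODE` are illustrative names, and the real function reads the opcode from target memory rather than taking it as a parameter.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_RET_OPCODE 0xc3 /* one-byte near `ret` */

/* Simplified model: the frame is treated as already destroyed only
   when the PC sits on a `ret` and the compilation unit's epilogue
   unwind information is not marked as valid.  */
static bool
frame_destroyed_model (bool epilogue_unwind_valid, uint8_t opcode_at_pc)
{
  if (epilogue_unwind_valid)
    return false; /* trust the DWARF unwinder, even on `ret` */
  return opcode_at_pc == X86_RET_OPCODE;
}
```

In this model, the hand-written assembly above falls in the one case that returns true: the PC is on a `ret` and there is no GCC ≥ 4.5 producer to set the flag.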
Now, this is starting to get way over my head, and I already spent too much
time digging into this — but it seems that there are two independent issues at
play here:
- The documentation for `epilogue_unwind_valid` (in `gdb/symtab.h`) mentions
that this indicates whether the "DWARF unwinder for this CU is valid even
for epilogues (PC at the return instruction)". `gdb` is perfectly able to
handle breaking on an instruction without valid unwinding information, so
whether the DWARF unwinder is valid or not on an instruction shouldn't
influence whether `nexti` can stop on that instruction or not. This is the
actual underlying issue.
- `gdb` assumes that GCC version ≥ 4.5 is the *only* compiler able to generate
valid unwinding information on a `ret` instruction. This seems overly
conservative, and it might make sense to adopt a blacklist instead of a
whitelist approach (i.e. trust the DWARF debug information, unless it is
known to be invalid — e.g. for GCC versions < 4.5). Fixing this would
actually solve my problem, but I believe it would only be hiding the
underlying issue described above, which should be fixed independently.
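To illustrate the difference between the two policies in the second point, here is a small hedged sketch. The `struct producer_info` type and both function names are hypothetical and only serve the comparison; they do not exist in gdb.

```c
#include <stdbool.h>

/* Hypothetical summary of a parsed DW_AT_producer value.  */
struct producer_info
{
  bool is_gcc; /* producer was recognized as GCC */
  int major;
  int minor;
};

/* True only for producers positively identified as GCC older than 4.5.  */
static bool
known_old_gcc (const struct producer_info *p)
{
  return p->is_gcc && (p->major < 4 || (p->major == 4 && p->minor < 5));
}

/* Whitelist (current behavior as described above): trust the epilogue
   unwind information only for GCC >= 4.5.  */
static bool
trust_epilogue_whitelist (const struct producer_info *p)
{
  return p->is_gcc && !known_old_gcc (p);
}

/* Blacklist (the suggested alternative): trust it unless the producer
   is known to be a GCC older than 4.5.  */
static bool
trust_epilogue_blacklist (const struct producer_info *p)
{
  return !known_old_gcc (p);
}
```

The two policies only disagree for producers that are not recognized as GCC at all, such as another compiler or hand-written assembly: the whitelist rejects them, the blacklist accepts them.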