[Bug gdb/26676] New: amd64: runaway execution when nexti would stop on ret after call

public inbox for gdb-prs@sourceware.org
help / color / mirror / Atom feed

* [Bug gdb/26676] New: amd64: runaway execution when nexti would stop on ret after call
@ 2020-09-29  9:36 basileclement06+gdb at gmail dot com
  0 siblings, 0 replies; only message in thread
From: basileclement06+gdb at gmail dot com @ 2020-09-29  9:36 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=26676

            Bug ID: 26676
           Summary: amd64: runaway execution when nexti would stop on ret
                    after call
           Product: gdb
           Version: 9.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: gdb
          Assignee: unassigned at sourceware dot org
          Reporter: basileclement06+gdb at gmail dot com
  Target Milestone: ---

In a course I TA, we use x86 as a target to teach compilation, and we make
students generate assembly files which are then compiled using `gcc`.  We
encourage them to use `gdb` to debug their programs, which works well.

Students have complained that in some cases, the `next` instruction in `gdb`
fails to work properly: when stepping over a call, `gdb` never stops on the
next instruction, and instead continues execution until the end of the program
(or the next breakpoint).

After some investigation, this happens when the instruction immediately after
the `call` is a `ret`.  I also noted that setting an explicit breakpoint on the
`ret` instruction works as expected.  Here is a minimal example:

```
# Filename: asm.s
        .text
        .globl  main
main:
        call f
        ret
f:
        mov     $0, %rax
        ret
```

This example can be compiled using `gcc -g -no-pie asm.s -o asm`; running `gdb`
and stepping over the `call f` instruction on line 4 runs the program to
completion:

        $ gdb asm
        GNU gdb (Ubuntu 9.1-0ubuntu1) 9.1
        Copyright (C) 2020 Free Software Foundation, Inc.
        License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
        This is free software: you are free to change and redistribute it.
        There is NO WARRANTY, to the extent permitted by law.
        Type "show copying" and "show warranty" for details.
        This GDB was configured as "x86_64-linux-gnu".
        Type "show configuration" for configuration details.
        For bug reporting instructions, please see:
        <http://www.gnu.org/software/gdb/bugs/>.
        Find the GDB manual and other documentation resources online at:
            <http://www.gnu.org/software/gdb/documentation/>.

        For help, type "help".
        Type "apropos word" to search for commands related to "word"...
        Reading symbols from asm...
        (gdb) break main
        Breakpoint 1 at 0x401106: file asm.s, line 4.
        (gdb) run
        Starting program: /tmp/gdbug/asm 

        Breakpoint 1, main () at asm.s:4
        4               call f
        (gdb) n
        [Inferior 1 (process 3373584) exited normally]

The expected behavior would be to stop execution on the `ret` instruction on
line 5, after returning from the `call` on line 4:

        ...
        (gdb) n
        5               ret

If any other instruction is inserted between the `call` and `ret` instructions,
`gdb` behaves properly; for instance, with an extra `nop` instruction between
the `call f` and `ret`, I get (as expected):

        ...
        (gdb) n
        5               nop

I tried to get the same behavior when compiling from C:

```
int __attribute__ ((noinline)) f() { return 0; }

int main() {
        return f();
}
```

compiling with `gcc -O3 -fno-optimize-sibling-calls asm.c` reproduces the issue
(using `nexti`).  Adding `-g` makes `gdb` properly stop on the `ret`
instruction in that case; so somehow embedding debug information makes the
problem go away…  This piqued my curiosity, so I decided to dig a bit deeper.

I removed some of the debug information to try to figure out what was causing
the change in behavior, and remarked that the `ret` seems to be skipped
whenever I change the DW_AT_producer flag to anything which doesn't start with
"GNU C17 X.Y".  This lead me to looking for DW_AT_producer in the gdb
codebase, which seems mostly used in `gdb/dwarf2read.c`.  Notably there is this
check in `process_full_comp_unit`:

```
      int gcc_4_minor = producer_is_gcc_ge_4 (cu->producer);
      ...
      if (gcc_4_minor >= 5)
        cust->epilogue_unwind_valid = 1;
```

Ah-ha!  I checked that indeed the issue reproduces with a version < 4.5 in
DW_AT_producer, which means it is probably related to `epilogue_unwind_valid`
(there is also `locations_valid` which is set in that case, but I am having
issues on `ret`, so something related to the epilogue seems more likely).

`epilogue_unwind_valid` is used (through the `COMPUNIT_EPILOGUE_UNWIND_VALID`
macro) in `amd64-tdep.c` and notably in `amd64_stack_frame_destroyed_p`...
which indeeds check that either `epilogue_unwind_valid` is true or the epilogue
instruction is not `ret`.

Moving upward from there, this means that `frame_unwind_try_unwinder` fails on
the epilogue unwinder when on a `ret` instruction in the current frame (because
it assumes the frame has been destroyed by a previous `leave` or `pop %rbp`),
*unless* there is DWARF information saying we were compiled with GCC version ≥
4.5.

Now, this is starting to get way over my head, and I already spent too much
time digging into this — but it seems that there are two independent issues at
play here:

 - The documentation for `epilogue_unwind_valid` (in `gdb/symtab.h`) mentions
   that this indicates whether the "DWARF unwinder for this CU is valid even
   for epilogues (PC at the return instruction)".  `gdb` is perfectly able to
   handle breaking on an instruction without invalid unwinding information, so
   whether the DWARF unwinder is valid or not on an instruction shouldn't
   influence whether `nexti` can stop on that instruction or not.  This is the
   actual underlying issue.

 - `gdb` assumes that GCC version ≥ 4.5 is the *only* compiler able to generate
   valid unwinding information on a `ret` instruction.  This seems overly
   conservative, and it might make sense to adopt a blacklist instead of a
   whitelist approach (i.e. trust the DWARF debug information, unless it is
   known to be invalid — e.g. for GCC versions < 4.5).  Fixing this would
   actually solve my problem, but I believe it would only be hiding the
   underlying issue described above, which should be fixed independently.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2020-09-29  9:36 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-29  9:36 [Bug gdb/26676] New: amd64: runaway execution when nexti would stop on ret after call basileclement06+gdb at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).