[Bug debug/94474] Incorrect DWARF range information for inlined function

From: "andrew.burgess at embecosm dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug debug/94474] Incorrect DWARF range information for inlined function
Date: Tue, 28 Apr 2020 09:56:41 +0000
Message-ID: <bug-94474-4-KG5xz4xQqB@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-94474-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94474

--- Comment #10 from Andrew Burgess <andrew.burgess at embecosm dot com> ---
Bernd,

Not a problem, always happy to expand on things.  This might get a
little long, but hopefully it gives you an idea of what I think is
wrong.

I have not updated the reproducer.  I don't see why I would need to,
as the existing example demonstrates the issues.  My original comment
did use an older version of binutils though, which is why I saw
=<8e8>   Unknown AT value: 2138: 3=, which was unfortunate.  I'm now
using binutils 2.34.

To reduce the length of this post I'm only looking at the first
inlined subroutine.  This is the worst offender; the issues with the
other two are just repeats of this one.

From the reproducer I'm only looking at the *-v3 output files.

Here's the DWARF for the first inlined subroutine:

#+BEGIN_EXAMPLE
   <2><8dd>: Abbrev Number: 38 (DW_TAG_inlined_subroutine)
      <8de>   DW_AT_abstract_origin: <0x9ce>
      <8e2>   DW_AT_entry_pc    : 0x401165
      <8ea>   DW_AT_GNU_entry_view: 0
      <8eb>   DW_AT_ranges      : 0x30
      <8ef>   DW_AT_call_file   : 1
      <8f0>   DW_AT_call_line   : 52
      <8f1>   DW_AT_call_column : 10
      <8f2>   DW_AT_sibling     : <0x931>
#+END_EXAMPLE

It references the range information at offset 0x30, which can be seen
here.  The duplication is just an objdump oddity; the ranges we care
about are the first four lines starting with =00000030=:

#+BEGIN_EXAMPLE
  Contents of the .debug_ranges section:

      Offset   Begin    End
      00000000 0000000000401160 00000000004011ca
      00000000 0000000000401040 0000000000401045
      00000000 <End of list>
      00000030 0000000000401165 0000000000401165 (start == end)
      00000030 0000000000401169 0000000000401173
      00000030 0000000000401040 0000000000401045
      00000030 <End of list>
      00000030 0000000000401165 0000000000401165 (start == end)
      00000030 0000000000401169 0000000000401173
      00000030 0000000000401040 0000000000401045
      00000030 <End of list>
      00000070 0000000000401160 00000000004011ca
      00000070 0000000000401040 0000000000401045
      00000070 0000000000401050 0000000000401065
      00000070 <End of list>
#+END_EXAMPLE

Next, here's a snippet of disassembly.  It covers two of the ranges
from the above: the empty one, =0x401165= to =0x401165=, and then
=0x401169= to =0x401173=.  I'm ignoring the range =0x401040= to
=0x401045= for now; it's just some out-of-line cold code and not
important for this discussion:

#+BEGIN_EXAMPLE
  0000000000401160 <_Z13get_alias_setP4tree>:
    401160:       48 85 ff                test   %rdi,%rdi
    401163:       74 4b                   je     4011b0 <_Z13get_alias_setP4tree+0x50>
    401165:       48 83 ec 08             sub    $0x8,%rsp
    401169:       8b 07                   mov    (%rdi),%eax
    40116b:       85 c0                   test   %eax,%eax
    40116d:       0f 85 cd fe ff ff       jne    401040 <_Z13get_alias_setP4tree.cold>
    401173:       8b 47 04                mov    0x4(%rdi),%eax
    401176:       83 f8 01                cmp    $0x1,%eax
    401179:       74 28                   je     4011a3 <_Z13get_alias_setP4tree+0x43>
    40117b:       8b 07                   mov    (%rdi),%eax
    40117d:       85 c0                   test   %eax,%eax
#+END_EXAMPLE

Finally, a snippet of the line table covering the same disassembler
region as above:

#+BEGIN_EXAMPLE
  test-v3:     file format elf64-x86-64

  Contents of the .debug_line section:

  CU: ./step-and-next-inline.cc:
  File name                            Line number    Starting address    View    Stmt
  step-and-next-inline.cc                       50            0x401160               x
  step-and-next-inline.cc                       51            0x401160       1       x
  step-and-next-inline.cc                       54            0x401160       2

  ./step-and-next-inline.h:[++]
  step-and-next-inline.h                        32            0x401165               x
  step-and-next-inline.h                        34            0x401165       1       x

  ./step-and-next-inline.cc:[++]
  step-and-next-inline.cc                       50            0x401165       2

  ./step-and-next-inline.h:[++]
  step-and-next-inline.h                        34            0x401169
  step-and-next-inline.h                        34            0x40116b
  step-and-next-inline.h                        36            0x401173               x
  step-and-next-inline.h                        37            0x401173       1       x
  step-and-next-inline.h                        37            0x401173       2

  ./step-and-next-inline.cc:[++]
  step-and-next-inline.cc                       52            0x401173       3
  step-and-next-inline.cc                       52            0x401176
#+END_EXAMPLE

Here is what I think GCC could have produced for the range
information instead:

#+BEGIN_EXAMPLE
      00000030 0000000000401165 0000000000401176
      00000030 0000000000401040 0000000000401045
      00000030 <End of list>
#+END_EXAMPLE

The changes are:

 (1) The empty range at =0x401165= is now folded into the following
     range, which coincidentally started at the very next instruction
     anyway.

 (2) The range previously ending at =0x401173= now ends at =0x401176=,
     extending by one instruction.
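
To make the effect of those two changes concrete, here is a small
sketch in Python.  It assumes the usual DWARF convention that a range
covers begin <= addr < end, and it takes the instruction addresses
from the disassembly above.  The names (=INSN_ADDRS=, =covered=, and
so on) are mine, purely for illustration:

```python
# Instruction start addresses copied from the disassembly snippet.
INSN_ADDRS = [0x401160, 0x401163, 0x401165, 0x401169, 0x40116b,
              0x40116d, 0x401173, 0x401176, 0x401179, 0x40117b, 0x40117d]

# Range lists at offset 0x30: what GCC emitted, and the proposal.
ORIGINAL = [(0x401165, 0x401165), (0x401169, 0x401173), (0x401040, 0x401045)]
PROPOSED = [(0x401165, 0x401176), (0x401040, 0x401045)]


def covered(addr, ranges):
    """True if addr falls in any half-open [begin, end) range."""
    return any(begin <= addr < end for begin, end in ranges)


def covered_insns(ranges):
    """Instruction addresses from the snippet that the ranges cover."""
    return [addr for addr in INSN_ADDRS if covered(addr, ranges)]


# Instructions that only the proposed ranges attribute to the callee.
newly_covered = [hex(addr) for addr in covered_insns(PROPOSED)
                 if not covered(addr, ORIGINAL)]
print(newly_covered)  # ['0x401165', '0x401173']
```

The two addresses it prints are the instruction at the entry pc and
the one instruction gained by moving the end from =0x401173= to
=0x401176=.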

The reasoning is that:

 (1) The DW_AT_entry_pc is now within a range attributed to the
 inlined subroutine.

 (2) All lines marked as is-stmt for the subroutine are within a
 range attributed to the subroutine.  A debugger stopping at one of
 those lines will now understand that:
     (a) It is at a particular line, and
     (b) It is within a particular subroutine.
 Notice that at =0x401173= we have lines from both the inlined
 subroutine and the enclosing caller function.  However, it is
 the inlined callee that is marked is-stmt for this address, so I
 believe this address should be included in the range for the
 inlined subroutine.
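
As a rough check of both points, the sketch below tests the
DW_AT_entry_pc and the is-stmt rows from step-and-next-inline.h
against each range list.  The rows are copied from the line table
above; the half-open treatment of ranges and all the names are my own
assumptions for illustration, not anything GCC or GDB provides:

```python
ENTRY_PC = 0x401165

# (line, address) for the is-stmt rows of step-and-next-inline.h.
CALLEE_STMT_ROWS = [(32, 0x401165), (34, 0x401165),
                    (36, 0x401173), (37, 0x401173)]

ORIGINAL_RANGES = [(0x401165, 0x401165), (0x401169, 0x401173),
                   (0x401040, 0x401045)]
PROPOSED_RANGES = [(0x401165, 0x401176), (0x401040, 0x401045)]


def in_ranges(addr, ranges):
    """True if addr falls in any half-open [begin, end) range."""
    return any(begin <= addr < end for begin, end in ranges)


def report(ranges):
    """Return (entry pc covered?, is-stmt rows outside the ranges)."""
    missing = [(line, addr) for line, addr in CALLEE_STMT_ROWS
               if not in_ranges(addr, ranges)]
    return in_ranges(ENTRY_PC, ranges), missing


for name, ranges in (("original", ORIGINAL_RANGES),
                     ("proposed", PROPOSED_RANGES)):
    entry_ok, missing = report(ranges)
    print(name, "entry_pc covered:", entry_ok,
          "uncovered is-stmt lines:", [line for line, _ in missing])
```

With the original ranges the entry pc and all four is-stmt rows fall
outside the subroutine; with the proposed ranges nothing does.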

While I'm writing this I want to take a moment to address views.  I'm
still reading and learning about views, but I think they are relevant
here.

None of the proposed view specifications I have seen so far goes far
enough to address applying the views technique to ranges.  Each range could
potentially be improved by having an entry and exit view.  When a
DW_TAG_inlined_subroutine references an external ranges list, then
each sub-range in that list needs to be able to have its own entry and
exit view.  Currently, in the example above we have one
DW_AT_GNU_entry_view for 3 ranges (though one is empty and can be
ignored).  Without some really clever engineering, one entry view is
never going to be correct for multiple sub-ranges.

Further, I've seen no mention of exit views anywhere, and I think they
would also be needed.

With the addition of these two features we would be able to support (I
believe) fully merged callers and callees, with lines marked as
is-stmt from both the caller and callee appearing at the same address.

For now, without this, I think GCC needs to restrict itself when
inlining.  When an address represents a line from both the caller and
callee, then that address should only be is-stmt true for EITHER lines
from the caller, or lines from the callee.  If the address is is-stmt
true for a line from the caller, then the address should NOT be within
the callee's range(s), and if the address is is-stmt true for a line
from the callee, then the address MUST be within the callee's
range(s).
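
The restriction above can be written down as a small consistency
check.  This is only a sketch of the rule as I have stated it: the
rows come from the line table above, I'm assuming that rows from
step-and-next-inline.h belong to the callee and rows from
step-and-next-inline.cc to the caller, and =check= is a hypothetical
helper, not part of any existing tool:

```python
from collections import defaultdict

# (origin, line, address, is_stmt), copied from the line table.
ROWS = [
    ("caller", 50, 0x401160, True),
    ("caller", 51, 0x401160, True),
    ("caller", 54, 0x401160, False),
    ("callee", 32, 0x401165, True),
    ("callee", 34, 0x401165, True),
    ("caller", 50, 0x401165, False),
    ("callee", 34, 0x401169, False),
    ("callee", 34, 0x40116b, False),
    ("callee", 36, 0x401173, True),
    ("callee", 37, 0x401173, True),
    ("callee", 37, 0x401173, False),
    ("caller", 52, 0x401173, False),
    ("caller", 52, 0x401176, False),
]


def check(rows, callee_ranges):
    """Return (address, problem) pairs that break the rule."""
    stmt_origins = defaultdict(set)
    for origin, _line, addr, is_stmt in rows:
        if is_stmt:
            stmt_origins[addr].add(origin)

    problems = []
    for addr in sorted(stmt_origins):
        origins = stmt_origins[addr]
        # Ranges are treated as half-open: begin <= addr < end.
        inside = any(b <= addr < e for b, e in callee_ranges)
        if origins == {"caller", "callee"}:
            problems.append((addr, "is-stmt for both caller and callee"))
        elif origins == {"callee"} and not inside:
            problems.append((addr, "callee is-stmt outside callee ranges"))
        elif origins == {"caller"} and inside:
            problems.append((addr, "caller is-stmt inside callee ranges"))
    return problems


original = [(0x401165, 0x401165), (0x401169, 0x401173), (0x401040, 0x401045)]
proposed = [(0x401165, 0x401176), (0x401040, 0x401045)]

print([hex(addr) for addr, _ in check(ROWS, original)])  # ['0x401165', '0x401173']
print(check(ROWS, proposed))                             # []
```

In this example no address is is-stmt for both caller and callee, so
the only violations in the original output are the two callee
addresses that sit outside the callee's ranges.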

On a final note: these are just my personal thoughts from the
perspective of a debug consumer.  Though I use words like "must" or
"should" above, this reflects how I believe the debug information
should appear, and is not an attempt to prescribe how GCC should
behave.  I know there are limitations to what GCC can achieve, and
also, I could be totally wrong in my understanding of DWARF.  I'm
always happy to be corrected!
