public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Andrew Burgess <aburgess@redhat.com>
To: Bernd Edlinger <bernd.edlinger@hotmail.de>, gdb-patches@sourceware.org
Subject: Re: [PATCHv5] gdb: fix handling of DW_AT_entry_pc of inlined subroutines
Date: Tue, 05 Nov 2024 11:25:44 +0000	[thread overview]
Message-ID: <87h68lsz0n.fsf@redhat.com> (raw)
In-Reply-To: <DU2PR08MB10263176E1B7E4BD051B7B0FEE4562@DU2PR08MB10263.eurprd08.prod.outlook.com>

Bernd Edlinger <bernd.edlinger@hotmail.de> writes:

> On 10/31/24 16:00, Andrew Burgess wrote:
>> In v5:
>> 
>>   - If DW_AT_entry_pc is outside of a block's start()..end() range,
>>     then the DW_AT_entry_pc is now ignored.
>> 
>>   - The entry-pc is stored as an offset from block::start() within the
>>     block class, this means we no longer need to worry about special
>>     address values, i.e. and entry-pc of zero is fine.  This also
>>     means that we no longer need to relocate the entry-pc in
>>     objfile_relocate1.  Credit for this idea to Bernd.
>> 
>> In v4:
>> 
>>   - I still have one failed being reported from Linaro CI,
>>     gdb.stabs/gdb11479.exp.  I've been unable to reproduce this
>>     failure.  However, on re-reading this patch I did notice that, if
>>     the m_entry_pc field was not set then we would still end up giving
>>     it a value during relocation, this would result in m_entry_pc
>>     holding an address which had been relocated twice.
>> 
>>     Fixed this by moving the block's address relocation logic into the
>>     block class, and only relocating the m_entry_pc variable if it
>>     actually has been assigned a value.
>> 
>> In v3:
>> 
>>   - Entry PC values are now clamped to the block's start/end range.
>>     This was causing failures in the self-tests when GDB was compiled
>>     with GCC using default optimisation, as GCC emits an (invalid?)
>>     entry-pc which is outside the blocks start/end range.
>> 
>> In v2:
>> 
>>   - Don't use std::optional<CORE_ADDR> for block::m_entry_pc, instead
>>     just use CORE_ADDR.  This does mean that targets where 0 is a
>>     valid code address might run into problems, but those targets
>>     likely have plenty of other problems, and I'd rather land this fix
>>     than get hung up on this edge case.
>> 
>>     In the future it might be possible to remove block::m_start and
>>     block::m_end as Klaus suggested, but this doesn't look like a
>>     simple task.  If we did drop those fields then it might be
>>     possible to space a word for some flag bits which would allow us
>>     to remove the 0 address special case.
>> 
>>   - Fixes to the DWARF assembler testcase to handle compiling with
>>     '-pie', the Linaro CI exposed these issues.  The changes are
>>     pretty minor, mostly using label names rather than addresses when
>>     building the DWARF.
>> 
>> ---
>> 
>> The entry PC for a DIE, e.g. an inline function, might not be the base
>> address of the DIE.  Currently though, in block::entry_pc(), GDB
>> always returns the base address (low-pc or the first address of the
>> first range) as the entry PC.
>> 
>> This commit extends the block class to carry the entry PC as a
>> separate member variable.  Then the DWARF reader is extended to read
>> and set the entry PC for the block.  Now in block::entry_pc(), if the
>> entry PC has been set, then this is the value returned.
>> 
>> I have not removed the old code in block::entry_pc().  Not every DIE
>> will set the entry-pc, but GDB still needs to have an entry-pc for
>> every block, so the existing logic supplies the entry-pc for any block
>> where the entry-pc was not set.
>> 
>> The DWARF-5 spec for reading the entry PC is a super-set of the spec
>> as found in DWARF-4.  For example, if there is no DW_AT_entry_pc then
>> DWARF-4 says to use DW_AT_low_pc while DWARF-5 says to use the base
>> address, which is DW_AT_low_pc or the first address in the first range
>> specified by DW_AT_ranges if there is no DW_AT_low_pc.
>> 
>> I have taken the approach of just implementing the DWARF-5 spec for
>> everyone.  There doesn't seem to be any benefit to deliberately
>> ignoring a ranges based entry PC value for DWARF-4.  If some naughty
>> compiler has emitted that, then lets use it.
>> 
>> Similarly, DWARF-4 says that DW_AT_entry_pc is an address.  DWARF-5
>> allows an address or a constant, where the constant is an offset from
>> the base address.  I allow both approaches for all DWARF versions.
>> There doesn't seem to be any downsides to this approach.
>> 
>> I ran into an issue when testing this patch where GCC would have the
>> DW_AT_entry_pc point to an empty range.  When GDB parses the ranges
>> any empty ranges are ignored.  As a consequence, the entry-pc appears
>> to be outside the address range of a block.
>> 
>> The empty range problem is certainly something that we can, and should
>> address, but that is not the focus of this patch, so for now I'm
>> ignoring that problem.  What I have done is added a check: if the
>> DW_AT_entry_pc is outside the range of a block then the entry-pc is
>> ignored, GDB will then fall-back to its default algorithm for
>> computing the entry-pc.
>> 
>> If/when in the future we address the empty range problem, these
>> DW_AT_entry_pc attributes will suddenly become valid and GDB will
>> start using them.  Until then, GDB continues to operate as it always
>> had.
>> 
>> An early version of this patch stored the entry-pc within the block
>> like this:
>> 
>>   std::optional<CORE_ADDR> m_entry_pc;
>> 
>> However, a concern was raised that this, on a 64-bit host, effectively
>> increases the size of block by 16-bytes (8-bytes for the CORE_ADDR,
>> and 8-bytes for the std::optional's bool plus padding).
>> 
>> If we remove the std::optional part and just use a CORE_ADDR then we
>> need to have a "special" address to indicate if m_entry_pc is in use
>> or not.  I don't really like using special addresses; different
>> targets can access different address ranges, even zero is a valid
>> address on some targets.
>> 
>> However, Bernd Edlinger suggested storing the entry-pc as an offset,
>> and I think that will resolve my concerns.  So, we store the entry-pc
>> as an offset from the block::start() value.  If this offset is zero
>> then the entry-pc is block::start(), or was unset and we should just
>> use the default algorithm.  If the entry-pc was set, and is not equal
>> to block::start() then the offset will be non-zero, and we can
>> recalculate the entry-pc when needed.
>> 
>> With this done, on a 64-bit host, block only needs to increase by
>> 8-bytes.
>> 
>> The inline-entry.exp test was originally contributed by Bernd here:
>> 
>>   https://inbox.sourceware.org/gdb-patches/AS1PR01MB94659E4D9B3F4A6006CC605FE4922@AS1PR01MB9465.eurprd01.prod.exchangelabs.com
>> 
>> though I have made some edits, making more use of lib/gdb.exp
>> functions, making the gdb_test output patterns a little tighter, and
>> updating the test to run with Clang.  I also moved the test to
>> gdb.opt/ as that seemed like a better home for it.
>> 
>> Co-Authored-By: Bernd Edlinger <bernd.edlinger@hotmail.de>
>> ---
>>  gdb/block.h                               |  47 ++-
>>  gdb/dwarf2/read.c                         | 102 ++++-
>>  gdb/testsuite/gdb.dwarf2/dw2-entry-pc.c   |  51 +++
>>  gdb/testsuite/gdb.dwarf2/dw2-entry-pc.exp | 487 ++++++++++++++++++++++
>>  gdb/testsuite/gdb.opt/inline-entry.c      |  41 ++
>>  gdb/testsuite/gdb.opt/inline-entry.exp    |  51 +++
>>  6 files changed, 766 insertions(+), 13 deletions(-)
>>  create mode 100644 gdb/testsuite/gdb.dwarf2/dw2-entry-pc.c
>>  create mode 100644 gdb/testsuite/gdb.dwarf2/dw2-entry-pc.exp
>>  create mode 100644 gdb/testsuite/gdb.opt/inline-entry.c
>>  create mode 100644 gdb/testsuite/gdb.opt/inline-entry.exp
>> 
>> diff --git a/gdb/block.h b/gdb/block.h
>> index c3babad52f3..e284d67f550 100644
>> --- a/gdb/block.h
>> +++ b/gdb/block.h
>> @@ -179,27 +179,37 @@ struct block : public allocate_on_obstack<block>
>>  
>>    /* Return the "entry PC" of this block.
>>  
>> -     The entry PC is the lowest (start) address for the block when all addresses
>> -     within the block are contiguous.  If non-contiguous, then use the start
>> -     address for the first range in the block.
>> -
>> -     At the moment, this almost matches what DWARF specifies as the entry
>> -     pc.  (The missing bit is support for DW_AT_entry_pc which should be
>> -     preferred over range data and the low_pc.)
>> +     If the entry PC has been set to a specific value then this is
>> +     returned.  Otherwise, the entry PC is the lowest (start) address for
>> +     the block when all addresses within the block are contiguous.  If
>> +     non-contiguous, then use the start address for the first range in the
>> +     block.
>>  
>> -     Once support for DW_AT_entry_pc is added, I expect that an entry_pc
>> -     field will be added to one of these data structures.  Once that's done,
>> -     the entry_pc field can be set from the dwarf reader (and other readers
>> -     too).  ENTRY_PC can then be redefined to be less DWARF-centric.  */
>> +     This almost matches what DWARF specifies as the entry pc, except that
>> +     the final case, using the first address of the first range, is a GDB
>> +     extension.  However, the DWARF reader sets the specific entry PC
>> +     wherever possible, so this non-standard fallback case is only used as
>> +     a last resort.  */
>>  
>>    CORE_ADDR entry_pc () const
>>    {
>> -    if (this->is_contiguous ())
>> +    if (m_entry_pc_offset != 0)
>> +      return this->start () + m_entry_pc_offset;
>> +    else if (this->is_contiguous ())
>>        return this->start ();
>>      else
>>        return this->ranges ()[0].start ();
>>    }
>
> I wonder if it can happen that this->start is different
> than ranges[0].start, because if that is possible,
> then it would not work to set_entry_pc(this->start)
> because the offset will be 0 and the else path will
> pick the ranges[0],start.
>
> Actually I thought of something like this:
>     if (this->is_contiguous ())
>        pc = this->start ();
>     else
>        pc = this->ranges ()[0].start ();
>     pc += m_entry_pc_offset;
>     return pc;
>
> so without any additional branches,
> is there a reason why that cannot work?

You're correct.  I've adopted this approach in v6 which I just posted.

It is possible, and valid, for a block's sub-ranges to not be in
ascending address order.  In this case the start() can be lower than the
start of the first range.

>
>>  
>> +  /* Set this block's entry-pc to ADDR, which must lie between start() and
>> +     end().  */
>> +
>> +  void set_entry_pc (CORE_ADDR addr)
>> +  {
>> +    gdb_assert (addr >= this->start () && addr < this->end ());
>
> note that my patch series part 3/3 has an example where
> the inline funcion has an empty instruction range, I wonder
> if this assertion will break, maybe addr <= this->end ?

I ran the tests from the full optimised debug series.  Obviously there's
plenty of failures at this point, but there are no assertions triggered.

Using 'addr <= this->end' would, I think not be correct, as the block's
end() address is not inclusive.  That is, if 'addr == this->end ()' then
the entry-pc would be the first address outside the block's range, which
is clearly not correct.

> So, when this function is called, are the ranges already set,
> or will that happen later? 

The ranges are already set by the time a block's entry-pc is set.

Thanks,
Andrew


  reply	other threads:[~2024-11-05 11:25 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-16 15:47 [PATCH] " Andrew Burgess
2024-10-17 20:03 ` Tom Tromey
2024-10-18 10:06   ` Andrew Burgess
2024-10-18 13:57     ` Andrew Burgess
2024-10-18 10:26   ` Gerlicher, Klaus
2024-10-18 13:55     ` Andrew Burgess
2024-10-18 13:53 ` [PATCHv2] " Andrew Burgess
2024-10-28 13:45   ` [PATCHv3] " Andrew Burgess
2024-10-29 14:49     ` [PATCHv4] " Andrew Burgess
2024-10-29 15:28       ` Bernd Edlinger
2024-10-31 10:57         ` Andrew Burgess
2024-10-31 14:01           ` Bernd Edlinger
2024-10-31 14:56             ` Andrew Burgess
2024-10-29 16:34       ` Bernd Edlinger
2024-10-31 10:59         ` Andrew Burgess
2024-10-31 15:00       ` [PATCHv5] " Andrew Burgess
2024-11-01 18:13         ` Tom Tromey
2024-11-01 20:27         ` Bernd Edlinger
2024-11-05 11:25           ` Andrew Burgess [this message]
2024-11-05 15:26             ` Bernd Edlinger
2024-11-05 16:52               ` Andrew Burgess
2024-11-05 19:57                 ` Bernd Edlinger
2024-11-05 11:21         ` [PATCHv6] " Andrew Burgess
2024-11-13 13:49           ` Andrew Burgess
2024-11-13 16:59             ` Andrew Burgess
2024-11-14  9:20               ` Tom de Vries
2024-11-14 19:33                 ` Andrew Burgess
2024-11-15  8:50                   ` Tom de Vries
2024-11-15 10:53                     ` Bernd Edlinger
2024-11-15 14:00                       ` Andrew Burgess
2024-11-15 14:30                         ` Tom de Vries
2024-11-15 16:46                           ` Andrew Burgess
2024-11-15 19:24                           ` Andrew Burgess
2024-11-17 23:52                             ` Bernd Edlinger
2024-11-19  9:29                               ` Andrew Burgess
2024-11-15 17:00                         ` Bernd Edlinger
2024-11-15 13:45                     ` Andrew Burgess
2024-10-29 15:29     ` [PATCHv3] " Sam James
2024-10-31  9:48       ` Andrew Burgess
2024-10-18 14:24 ` [PATCH] " Bernd Edlinger
2024-10-28 13:26   ` Andrew Burgess

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h68lsz0n.fsf@redhat.com \
    --to=aburgess@redhat.com \
    --cc=bernd.edlinger@hotmail.de \
    --cc=gdb-patches@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).