public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Simon Marchi <simark@simark.ca>
To: Caroline Tice <cmtice@google.com>
Cc: Eric Christopher <echristo@google.com>,
	Tom Tromey <tom@tromey.com>,
	Caroline Tice via Gdb-patches <gdb-patches@sourceware.org>
Subject: Re: [PATCH V3] Fix issues with reading rnglists, especially from dwo files, for DWARF v5
Date: Sat, 4 Jul 2020 01:11:13 -0400	[thread overview]
Message-ID: <5ccfe911-6049-e8f3-4874-9991b2649512@simark.ca> (raw)
In-Reply-To: <CABtf2+Szbtug4JzWvrAp-3MFnZKwUpwVVptuDoFGus9wu111oA@mail.gmail.com>

On 2020-07-03 6:47 p.m., Caroline Tice wrote:
> "
> 
> I have made most of your requested changes, which are in the attached
> patch.  I disagree with one of your comments, and I had a question
> about another:
> 
> First the disagreement.  You said:
> 
>>> @@ -18684,7 +18764,8 @@ partial_die_info::read (const struct die_reader_specs *reader,
>>>           /* It would be nice to reuse dwarf2_get_pc_bounds here,
>>>              but that requires a full DIE, so instead we just
>>>              reimplement it.  */
>>> -         int need_ranges_base = tag != DW_TAG_compile_unit;
>>> +         int need_ranges_base = (tag != DW_TAG_compile_unit
>>> +                                 && attr.form != DW_FORM_rnglistx);
>>
>> It looks like this fix is unrelated from the rest of the patch, which deals
>> about reading rnglists from dwo files.  It's better to have on fix per patch.
>> So I think this fix would deserve its own patch, with a commit message that
>> clearly explains the issue and the fix.
>>
> This is actually a required fix for dealing with rnglists.  Without
> it, when the DW_AT_ranges uses the form
> DW_FORM_rnglistx, the wrong value is calculated and the entire rnglist
> is ignored (if you turn on complaints this will happen).  So without
> this fix, I can't test any of the other code for fixing rnglists, with
> or without .dwo files.

Ah ok, I must have understood it wrong then.  I find all of this really confusing.

I'll summarize what I think I understand, please let me know if I say anything wrong.
We are dealing with two similar ways of dealing with address ranges, one is the
pre-standard GNU version here, which was using DW_AT_GNU_ranges_base:

  https://gcc.gnu.org/wiki/DebugFission

and the version in DWARF 5.

In the pre-standard version, only one range section was present, and it was in the
main linked file.  DW_AT_ranges attributes in the dwo, the form DW_FORM_sec_offset,
are actually pointing at the section in the main file, at offset

  CU's DW_AT_GNU_ranges_base value (found in the skeleton, in the main file) + DIE's DW_AT_ranges

If there is a DW_AT_ranges attribute in the skeleton CU, in the main linked file,
then it is absolute, it is _not_ added to the DW_AT_GNU_ranges_base value.  Hence
the `needs_range_base` thingy.  The `ranges_offset` variable in this function is
the direct offset to the range list to read.

In DWARF 5, we have a ranges section in the main linked file and one in the dwo.  DW_AT_ranges
attributes refer to the ranges section of the file they are in.  The DW_AT_rnglists_base
points to the beginning of the range table for that CU.  The actual value of DW_AT_ranges
attribute (when using the DW_FORM_rnglistx form) is an index in the offset table.  However,
due to the reprocessing you added in this patch, the attribute value is now in fact the
offset with `DW_AT_rnglists_base` already added.  And _that's_ the reason why
DW_FORM_rnglistx attributes should not have the range base applied.

I think the comment above in the code is misleading and needs to be updated, this one:

	  /* DW_AT_rnglists_base does not apply to DIEs from the DWO skeleton.
	     We take advantage of the fact that DW_AT_ranges does not appear
	     in DW_TAG_compile_unit of DWO files.  */

In commit 18a8505e38fc ("Dwarf 5: Handle debug_str_offsets and indexed attributes
that have base offsets."), the it was changed like so:

-         /* DW_AT_ranges_base does not apply to DIEs from the DWO skeleton.
+         /* DW_AT_rnglists_base does not apply to DIEs from the DWO skeleton.

I think it was a wrong change, because it mixes two different things.  DW_AT_rnglists_base
is a DWARF 5 thing, and the range base not applying to the values from the skeleton is a
pre-standard thing.  I think it should say something like:

	  /* DW_AT_GNU_ranges_base does not apply to DIEs from the DWO skeleton.
	     We take advantage of the fact that DW_AT_ranges does not appear
	     in DW_TAG_compile_unit of DWO files.

	     Attributes of form DW_AT_rnglistx have already had their value changed
	     by read_rnglist_index and already include DW_AT_rnglists_base, so don't
	     add the ranges base either.  */

> 
>> I'm also not sure I understand it: when the form is DW_FORM_rnglistx, don't
>> we want to sometimes apply the base coming from DW_AT_rnglists_base?
> 
> It all depends on what form is being used. If the DW_AT_ranges is
> using DW_FORM_sec_offset (which is what
> GCC generates), then you do need to add the base value.But if you
> are using the DW_FORM_rnglisx (which is what clang generates), the
> base value has already been considered/incorporated into the attribute
> so adding it in (again) causes the value to be incorrect (again, turn
> on complaints and see what happens if you try to add the base into
> something with DW_FORM_rnglistx).  From the DWARF5 spec: "[A] rnglist
> is either represented as: An index into the .debug_rnglists section
> (DW_FORM_rnglistx). The ...operand identifies an offset location
> relative to the base of that section...[or] An offset into the
> .debug_rnglists section (DW_FORM_sec_offset).  The operand consists of
> a byte offset from the beginning of the .debug_rnglists section."
> Note the DW_FORM_rnglistx is already relative to the base.

Ok.  What was not clear to me in the first pass was that read_rnglist_index
already has modified the attribute value to include DW_AT_rnglists_base.

Now, about values of class rnglist and form DW_FORM_sec_offset in DWARF 5.
The spec says this about them:

  - An offset into the .debug_rnglists section (DW_FORM_sec_offset).
  The operand consists of a byte offset from the beginning of the
  .debug_rnglists section. It is relocatable in a relocatable object file,
  and relocated in an executable or shared object file. In the 32-bit
  DWARF format, this offset is a 4-byte unsigned value; in the 64-bit
  DWARF format, it is an 8-byte unsigned value (see Section 7.4 on
  page 196).

Since it says that the value is already relocated in an executable, does it
mean we should not add the base for these, in DWARF 5?

> Now, my question:  I'm not quite clear as to whether you really want
> me to change the code in
> cu_debug_rnglists_section or not.   I wrote the code mostly by copying
> the code for loclists as closely as possible, and making whatever
> changes to that were required to make it correct for rnglists.  Is it
> better to have the code more consistent with the loclists code?If
> you really want me to make the changes there you suggest, then I will;

I think the situation is a bit different than for `cu_debug_loc_section`:
when you have a dwo, I don't think there is the possibility of having an
attribute referring to a location list in the skeleton.  So if you have a
dwo, you'll always want the loc section from the dwo.  So there, the pattern

  if (there is a dwo)
    return the dwo section;

  return the main file section;

makes more sense.  But in the case of range lists, we have attributes
in both the skeleton and the dwo that require range lists, and the right
section to return depends on where the attribute is.  That's why, when I
see this pattern:

  if (there is a dwo)
    return the dwo section;

  return the main file section;

that pattern, I find it looks wrong, because the right section to return
does not depend (only) on whether there is dwo.  As we saw, it only works
"by chance" because this is only called once for the skeleton (when
reading DW_AT_ranges) and the `cu->dwo_unit` pointer happens not to be set
up yet.  It is also one refactor away from breaking, if for some reason we
reorder some code and the `cu->dwo_unit` pointer gets initialized earlier.

> I just wanted confirmation that you really want that change (i.e.
> adding the dwarf tag and checking vs. DW_TAG_compile_unit, & calling
> it from dwarf2_rnglists_process).

I think that checking DW_TAG_compile_unit is still a bit of a hack - ideally,
the caller could just tell us if the DIE comes from a dwo or not.  But at
least it's logical and based on some DWARF knowledge.  It would certainly
require a comment to explain it, because it wouldn't be obvious.

> @@ -1379,11 +1382,16 @@ static void read_call_site_scope (struct die_info *die, struct dwarf2_cu *cu);
>  static void read_variable (struct die_info *die, struct dwarf2_cu *cu);
>
>  static int dwarf2_ranges_read (unsigned, CORE_ADDR *, CORE_ADDR *,
> -			       struct dwarf2_cu *, dwarf2_psymtab *);
> +			       struct dwarf2_cu *, dwarf2_psymtab *,
> +			       dwarf_tag);

This declaration is unneeded and can simply be removed.  Could you push an
obvious patch that does this?.


> @@ -13917,8 +13969,20 @@ dwarf2_rnglists_process (unsigned offset, struct dwarf2_cu *cu,
>        if (range_beginning == range_end)
>  	continue;
>
> -      range_beginning += *base;
> -      range_end += *base;
> +      /* Only DW_RLE_offset_pair needs the base address added.  */
> +      if (rlet == DW_RLE_offset_pair)
> +	{
> +	  if (!base.has_value ())
> +	    {
> +	      /* We have no valid base address for the ranges
> +		 data.  */
> +	      complaint (_("Invalid .debug_rnglists data (no base address)"));

The comment fits on a single line.  But it can probably be changed to be more
specific to DW_RLE_offset_pair.  Maybe the complaint could be more specific too,
and mention DW_RLE_offset_pair.  It would be valid to have .debug_rnglists without
a base address (I think?), but here it's invalid because there is not base address
_and_ we have encountered a DW_RLE_offset_pair.

> @@ -19094,6 +19167,57 @@ read_loclist_index (struct dwarf2_cu *cu, ULONGEST loclist_index)
>      return bfd_get_64 (abfd, info_ptr) + loclist_base;
>  }
>
> +/* Given a DW_FORM_rnglistx value rnglist_index, fetch the offset from the
> +   array of offsets in the .debug_rnglists section.  */
> +static CORE_ADDR
> +read_rnglist_index (struct dwarf2_cu *cu, ULONGEST rnglist_index)
> +{
> +  struct dwarf2_per_objfile *dwarf2_per_objfile = cu->per_objfile;
> +  struct objfile *objfile = dwarf2_per_objfile->objfile;
> +  bfd *abfd = objfile->obfd;
> +  ULONGEST rnglist_header_size =
> +    (cu->header.initial_length_size == 4 ? LOCLIST_HEADER_SIZE32
> +     : LOCLIST_HEADER_SIZE64);

Should this really refer to loclist header sizes?  If they are the same
sizes as range list table headers, I suppose it's just a coincidence.

> +  ULONGEST rnglist_base =
> +      (cu->dwo_unit) ? rnglist_header_size : cu->ranges_base;

cu->dwo_unit != nullptr

> +  ULONGEST start_offset =
> +    rnglist_base + rnglist_index * cu->header.offset_size;
> +
> +  /* Get rnglists section.  */
> +  struct dwarf2_section_info *section = cu_debug_rnglists_section (cu);
> +  if (section == nullptr)
> +    error (_("Cannot find .debug_rnglists section [in module %s]"),
> +	   objfile_name(objfile));

Space before paren.

> +
> +  /* Read the rnglists section content.  */
> +  section->read (objfile);
> +  if (section->buffer == NULL)

In new code, I suggest using nullptr instead of NULL, just for consistency.

> +    error (_("DW_FORM_rnglistx used without .debug_rnglists section "
> +	     "[in module %s]"),
> +       objfile_name (objfile));
> +
> +  /* Verify the rnglist header is valid (same as loclist header).  */
> +  struct loclist_header header;
> +  read_loclist_header (&header, section);

Ok well, if we are going to take advantage that they are the same, the name
of the function and types should reflect it.  `read_loclist_header` should
become `read_loclists_rnglists_header` (using the plural to match the section
names).

> +  if (rnglist_index >= header.offset_entry_count)
> +    error (_("DW_FORM_rnglistx index pointing outside of "
> +	     ".debug_rnglists offset array [in module %s]"),
> +	    objfile_name(objfile));

Space before paren.

> +
> +  /* Validate that the offset is within the section's range.  */
> +  if (start_offset >= section->size)
> +    error (_("DW_FORM_rnglistx pointing outside of "
> +             ".debug_rnglists section [in module %s]"),
> +          objfile_name(objfile));
> +
> +  const gdb_byte *info_ptr = section->buffer + start_offset;
> +
> +  if (cu->header.offset_size == 4)
> +    return read_4_bytes (abfd, info_ptr) + rnglist_base;
> +  else
> +    return read_8_bytes (abfd, info_ptr) + rnglist_base;

An idea for a subsequent cleanup, we could have a `read_offset` function
that does

  if (cu->header.offset_size == 4)
    return read_4_bytes (abfd, info_ptr);
  else
    return read_8_bytes (abfd, info_ptr);

There are a few spots that could use it.

Simon

  reply	other threads:[~2020-07-04  5:11 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-01 17:16 [PATCH] " Caroline Tice
2020-06-01 20:33 ` Tom Tromey
2020-06-02 17:04   ` Caroline Tice
2020-06-03 14:49     ` Tom Tromey
2020-06-04 21:39       ` Caroline Tice
2020-06-09 23:32         ` Caroline Tice
2020-06-16 15:37           ` Caroline Tice
2020-06-18 20:27           ` Tom Tromey
2020-06-23 19:04             ` Caroline Tice
2020-07-01  0:09               ` Caroline Tice
2020-07-01  0:34               ` Simon Marchi
2020-07-01  0:36                 ` Simon Marchi
2020-07-01 19:57                   ` Caroline Tice
2020-07-02  5:41                     ` Simon Marchi
2020-07-03 22:47                       ` [PATCH V3] " Caroline Tice
2020-07-04  5:11                         ` Simon Marchi [this message]
2020-07-09 15:48                           ` [PATCH V4] " Caroline Tice
2020-07-11 17:54                             ` Simon Marchi
2020-07-14 15:47                               ` [PATCH V5] " Caroline Tice
2020-07-15  2:04                                 ` Simon Marchi
2020-07-15  3:15                                   ` Simon Marchi
2020-07-15 16:57                                     ` Caroline Tice
2020-07-15 17:04                                       ` H.J. Lu
2020-07-15 22:35                                         ` Caroline Tice
2020-07-16  2:34                                           ` Simon Marchi
2020-07-16  4:46                                             ` Caroline Tice
2020-07-16 15:41                                               ` Simon Marchi
2020-07-16 15:46                                                 ` Caroline Tice
2020-07-16 16:09                                                   ` Simon Marchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5ccfe911-6049-e8f3-4874-9991b2649512@simark.ca \
    --to=simark@simark.ca \
    --cc=cmtice@google.com \
    --cc=echristo@google.com \
    --cc=gdb-patches@sourceware.org \
    --cc=tom@tromey.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).