Re: [PATCH v2] gdb: Cache line headers in DWARF per CU

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

From: Will Hawkins <hawkinsw@obs.cr>
To: Simon Marchi <simark@simark.ca>
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH v2] gdb: Cache line headers in DWARF per CU
Date: Tue, 30 Apr 2024 01:19:05 -0400	[thread overview]
Message-ID: <CADx9qWiSW52itGWv_CXZHuSBNEdvOr+XrrOXLBWo9G1pOiuqmw@mail.gmail.com> (raw)
In-Reply-To: <CADx9qWjb6oFTwXrrWqmdjtBvKUfqz9a++yYZryzoZq9ru49qFQ@mail.gmail.com>

On Mon, Apr 29, 2024 at 11:50 AM Will Hawkins <hawkinsw@obs.cr> wrote:
>
> On Mon, Apr 29, 2024 at 10:37 AM Simon Marchi <simark@simark.ca> wrote:
> >
> > On 4/29/24 9:59 AM, Will Hawkins wrote:
> > > When the line headers of a DWARF unit are read, make sure that they are
> > > available for reuse by storing them in the per CU. A view of that data
> > > structure will be given to each of the dwarf2_cus as they are
> > > instantiated for reading debugging information. As a result, we can
> > > simplify the scoped RAII object (by removing code for deallocating a
> > > line header upon the completion of reading DWARF debugging information).
> >
> > My understanding is that prior to this change, the lifetime of the
> > line_header objects that were not for partial units was approximately
> > equal to the lifetime of the dwarf2_cu object.  After this change, the
> > lifetime of those line_header objects becomes tied to the
> > dwarf2_per_cu_data object, which lives essentially as long as the
> > dwarf2_per_objfile object.
> >
> > If so, could we simplify things and store all line_header objects in
> > dwarf2_per_objfile::line_header_hash?
>
> I *sincerely* thought about this! If you think that it is a good idea,
> then I can incorporate this into a v3. Based on my understanding of
> the code, I think that this is reasonable. The only problem is that
> (as I understand it), there could be a situation where the hash does
> not have space and we could lose the reuse. (see
> https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/dwarf2/read.c;h=7eacafc25b5e5f5202769b18175ed510a6787489;hb=43ca4ec2996a6bfbb8726f1c753aeb85fe0f92a4#l7401)

I have a v3 that is almost ready that combines all the caching into
the line_header_hash. I hope that I will have it ready to send out
tomorrow! Thank you for your patience!

>
> >
> > Also, could you provide an explanation of why this is useful, when does
> > the re-use happens?  Naively, I would think that the line header is used
> > once when expanding the CU, creating the symtabs, and not used later.
> > What am I missing?
>
> You are missing nothing ... at this point. However, the goal of this
> is to make it easier to read the line header earlier in the decoding
> process. We will need to read the line header earlier as part of the
> work on supporting embedded source code in the dwarf file. There is
> some discussion about that here:
> https://sourceware.org/pipermail/gdb-patches/2024-April/208614.html
>
> If we have this "ensure" function then we can simply call it at
> find_file_and_directory and we will ensure that we do not have
> multiple decodings.
>
> Does that make sense? As I said, I am only trying to help and do not
> want to upset anything!
>
> >
> > I noted a few random comments below.
>
> Thank you!! I will absolutely make these changes in a v3.
>
> I sincerely appreciate the feedback and hope that what I am offering
> is helpful! I am really enjoying the process of contributing!
>
> Will
>
>
> >
> > > @@ -7312,29 +7302,38 @@ find_file_and_directory (struct die_info *die, struct dwarf2_cu *cu)
> > >    return *cu->per_cu->fnd;
> > >  }
> > >
> > > -/* Handle DW_AT_stmt_list for a compilation unit.
> > > -   DIE is the DW_TAG_compile_unit die for CU.
> > > -   COMP_DIR is the compilation directory.  LOWPC is passed to
> > > -   dwarf_decode_lines.  See dwarf_decode_lines comments about it.  */
> > > -
> > >  static void
> > > -handle_DW_AT_stmt_list (struct die_info *die, struct dwarf2_cu *cu,
> > > -                     const file_and_directory &fnd, unrelocated_addr lowpc,
> > > -                     bool have_code) /* ARI: editCase function */
> > > +ensure_line_header_read (struct dwarf2_cu *cu, struct die_info *die,
> >
> > I would suggest dropping the `struct ` keyword where possible.  DIE
> > could be const I think.
> >
> > > +                      const file_and_directory &fnd,
> > > +                      sect_offset line_offset)
> > >  {
> > >    dwarf2_per_objfile *per_objfile = cu->per_objfile;
> > > -  struct attribute *attr;
> > > +  dwarf2_per_cu_data *per_cu = cu->per_cu;
> > >    hashval_t line_header_local_hash;
> > >    void **slot;
> > > -  int decode_mapping;
> > > -
> > > -  gdb_assert (! cu->per_cu->is_debug_types);
> > > -
> > > -  attr = dwarf2_attr (die, DW_AT_stmt_list, cu);
> > > -  if (attr == NULL || !attr->form_is_unsigned ())
> > > -    return;
> > >
> > > -  sect_offset line_offset = (sect_offset) attr->as_unsigned ();
> > > +  /* There are two places for storing/finding line headers:
> > > +     1. Per CU: When DIE is not a partial unit, the decoded
> > > +     line header is owned by the per CU. In this case, the
> >
> > We use two spaces after periods in text.
> >
> > > +     job of this function is to simply give out a copy of
> > > +     that pointer to CU -- the per cu outlives CU so this
> > > +     lending is safe.
> > > +     2. Line Header Hash: When the DIE is a partial unit, the
> > > +     decoded line header is owned by the line header hash.
> > > +     In this case, the job of this function is to simply
> > > +     give out a copy of the appropriate entry in the hash.
> > > +     Caveat: If a decoded line header of a partial unit
> > > +     does not fit in the line header hash, it will be
> > > +     stored in/owned by the per cu.
> > > +    Of course, the first time that the line header is decoded
> > > +    it must be put in the proper place according to the rules
> > > +    above. That is the other job of this function.  */
> > > +
> > > +  if (per_cu->line_headers != NULL)
> >
> > NULL -> nullptr
> >
> > > +    {
> > > +      cu->line_header = per_cu->line_headers.get ();
> > > +      return;
> > > +    }
> > >
> > >    /* The line header hash table is only created if needed (it exists to
> > >       prevent redundant reading of the line table for partial_units).
> > > @@ -7378,9 +7377,6 @@ handle_DW_AT_stmt_list (struct die_info *die, struct dwarf2_cu *cu,
> > >    if (lh == NULL)
> > >      return;
> > >
> > > -  cu->line_header = lh.release ();
> > > -  cu->line_header_die_owner = die;
> > > -
> > >    if (per_objfile->line_header_hash == NULL)
> > >      slot = NULL;
> > >    else
> > > @@ -7394,18 +7390,43 @@ handle_DW_AT_stmt_list (struct die_info *die, struct dwarf2_cu *cu,
> > >      {
> > >        /* This newly decoded line number information unit will be owned
> > >        by line_header_hash hash table.  */
> > > -      *slot = cu->line_header;
> > > -      cu->line_header_die_owner = NULL;
> > > +      *slot = cu->line_header = lh.release ();
> > >      }
> > >    else
> > >      {
> > >        /* We cannot free any current entry in (*slot) as that struct line_header
> > > -      may be already used by multiple CUs.  Create only temporary decoded
> > > -      line_header for this CU - it may happen at most once for each line
> > > -      number information unit.  And if we're not using line_header_hash
> > > -      then this is what we want as well.  */
> > > +      may be already used by multiple CUs.  Therefore, this newly read line
> > > +      header will be owned by the per_cu.  */
> > >        gdb_assert (die->tag != DW_TAG_partial_unit);
> > > +
> > > +      per_cu->line_headers = std::move (lh);
> > > +      cu->line_header = per_cu->line_headers.get ();
> > >      }
> >
> > If the design stayed as-is, I would try to put:
> >
> >   cu->line_header = lh.get ();
> >
> > ... before the condition, since that part doesn't depend on the
> > condition.
> >
> > > @@ -222,6 +223,14 @@ struct dwarf2_per_cu_data
> > >       have one.  */
> > >    std::unique_ptr<file_and_directory> fnd;
> > >
> > > +  /* The decoded line headers for this CU.  This is cached so that
> > > +     there is no need to refetch it repeatedly. ensure_line_headers_read
> > > +     in read.c is responsible for transferring a view of this structure
> > > +     to dwarf2_cus as they are instantiated. This may be nullptr when
> > > +     the decoded line header is owned by the line header hash associated
> > > +     with the per objfile in the presence of a partial unit.  */
> > > +  std::unique_ptr<struct line_header> line_headers;
> >
> > Use line_header_up as the type.
> >
> > Simon

     prev parent reply	other threads:[~2024-04-30  5:19 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-29 13:59 Will Hawkins
2024-04-29 14:36 ` Simon Marchi
2024-04-29 15:50   ` Will Hawkins
2024-04-30  5:19     ` Will Hawkins [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CADx9qWiSW52itGWv_CXZHuSBNEdvOr+XrrOXLBWo9G1pOiuqmw@mail.gmail.com \
    --to=hawkinsw@obs.cr \
    --cc=gdb-patches@sourceware.org \
    --cc=simark@simark.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).