[Bug debug/99178] New: Emit .debug

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug debug/99178] New: Emit .debug_names
@ 2021-02-20  0:20 tromey at gcc dot gnu.org
  2021-02-22  8:44 ` [Bug debug/99178] " rguenth at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: tromey at gcc dot gnu.org @ 2021-02-20  0:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99178

            Bug ID: 99178
           Summary: Emit .debug_names
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: debug
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tromey at gcc dot gnu.org
  Target Milestone: ---

DWARF 5 includes a new index section, .debug_names.
GCC should emit this with -gdwarf-5

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug debug/99178] Emit .debug_names
  2021-02-20  0:20 [Bug debug/99178] New: Emit .debug_names tromey at gcc dot gnu.org
@ 2021-02-22  8:44 ` rguenth at gcc dot gnu.org
  2021-02-22  9:00 ` jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-22  8:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99178

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-02-22
           Severity|normal                      |enhancement
            Version|unknown                     |11.0

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug debug/99178] Emit .debug_names
  2021-02-20  0:20 [Bug debug/99178] New: Emit .debug_names tromey at gcc dot gnu.org
  2021-02-22  8:44 ` [Bug debug/99178] " rguenth at gcc dot gnu.org
@ 2021-02-22  9:00 ` jakub at gcc dot gnu.org
  2021-02-22 21:40 ` mark at gcc dot gnu.org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-02-22  9:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99178

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |mark at gcc dot gnu.org,
                   |                            |vries at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I'm not sure if it is a good idea.
The indexes are useful when they cover the whole binary or library, when each
TU has its own index, then I don't see how it is helpful for the consumer to
search each TU's index compared to walking the DIEs.
I think .debug_names should be created during linking or post-linking, so ld,
dwz or gdb.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug debug/99178] Emit .debug_names
  2021-02-20  0:20 [Bug debug/99178] New: Emit .debug_names tromey at gcc dot gnu.org
  2021-02-22  8:44 ` [Bug debug/99178] " rguenth at gcc dot gnu.org
  2021-02-22  9:00 ` jakub at gcc dot gnu.org
@ 2021-02-22 21:40 ` mark at gcc dot gnu.org
  2024-01-10  1:20 ` dblaikie at gmail dot com
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: mark at gcc dot gnu.org @ 2021-02-22 21:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99178

--- Comment #3 from Mark Wielaard <mark at gcc dot gnu.org> ---
So if the compiler would emit the .debug_name index would that make any
link/post-processing steps easier or more efficient?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug debug/99178] Emit .debug_names
  2021-02-20  0:20 [Bug debug/99178] New: Emit .debug_names tromey at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2021-02-22 21:40 ` mark at gcc dot gnu.org
@ 2024-01-10  1:20 ` dblaikie at gmail dot com
  2024-01-10  2:47 ` tromey at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: dblaikie at gmail dot com @ 2024-01-10  1:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99178

David Blaikie <dblaikie at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dblaikie at gmail dot com

--- Comment #4 from David Blaikie <dblaikie at gmail dot com> ---
(In reply to Mark Wielaard from comment #3)
> So if the compiler would emit the .debug_name index would that make any
> link/post-processing steps easier or more efficient?

Right - that's the intent. You can omit the hash table part of .debug_names -
in which case it's just like a newer pubnames/pubtypes - maybe with the
opportunity to have more guaranteed contents (the lack of those guarantees I
think is why debug_gnu_pubnames/types came to be, yeah?).

At least on the lld side, we're working on adding the requisite merging - like
`-Wl,--gdb-index`, except instead of merging debug_gnu_pubnames/pubtypes ->
gdb_index, it merges debug_names -> debug_names.

This is relevant/important/necessary for Split DWARF in particular, where the
linker wouldn't have access to the DWARF to index it anyway (& you don't always
want to run the dwp tool, which would have access to all the DWARF to index it
- but it'd be nice to avoid that in iterative developer scenarios, and save it
only for archival situations) - and even if you do have all the DWARF, it's
certainly faster to merge some tables than to reparse all the DWARF from
scratch.

It'd be great to get GCC/GDB folks take on the name tables - get some practical
experience with their contents, file any bugs about missing elements, etc. It's
possible they're leaning too heavily towards lldb's style of name lookup since
they derived from an existing apple implementation there & it'd be good to
generalize them where needed.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug debug/99178] Emit .debug_names
  2021-02-20  0:20 [Bug debug/99178] New: Emit .debug_names tromey at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2024-01-10  1:20 ` dblaikie at gmail dot com
@ 2024-01-10  2:47 ` tromey at gcc dot gnu.org
  2024-01-10 18:51 ` dblaikie at gmail dot com
  2024-01-10 20:31 ` tromey at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: tromey at gcc dot gnu.org @ 2024-01-10  2:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99178

--- Comment #5 from Tom Tromey <tromey at gcc dot gnu.org> ---
(In reply to David Blaikie from comment #4)

I don't remember filing this bug.  At the time maybe I thought it
would be worthwhile to have "end to end" .debug_names generation,
that is, to try to have the index and also not slow down
compilation or link times too much.  Not sure how I feel about it now.

> It'd be great to get GCC/GDB folks take on the name tables - get some
> practical experience with their contents, file any bugs about missing
> elements, etc. It's possible they're leaning too heavily towards lldb's
> style of name lookup since they derived from an existing apple
> implementation there & it'd be good to generalize them where needed.

gdb has long done the wrong thing with .debug_names.
https://sourceware.org/bugzilla/show_bug.cgi?id=24820

I don't really know how/why that happened.  However, I wrote patches to
fix it:

https://sourceware.org/pipermail/gdb-patches/2023-December/204949.html

This version of gdb will look at the augmentation string and only
allow certain indexes to be used.  This is done to avoid known bugs --
mainly coming from the "old" (current) gdb, but also clang has some
issues (from memory, it doesn't include parent info).  Also, gdb relies
on its extensions (see below).

When writing the new scanner, I found a few bugs in DWARF related to
which names appear in the index.  I don't recall offhand what these are,
and I didn't file them due to the late unpleasantness (sorry).  They
could probably be dug up by reading the scanner and comparing to the spec.

gdb will also emit some extensions.  You can see these in the docs patch:

https://sourceware.org/pipermail/gdb-patches/2023-December/204947.html

Generally the goal of these is to avoid having to do any DIE reading
in order to reject a lookup.  Note that this means gdb relies on
template parameters being in the name -- something we discussed in
gdb bugzilla a bit.

With these patches, gdb will not generate or use the hash table.
This is explained here:

https://sourceware.org/pipermail/gdb-patches/2023-December/204963.html

The only other thing I can think of offhand is that the reliance on
.debug_str means that gdb may have to augment the string table when
DW_FORM_string is in use.  This is also caused by the "(anonymous namespace)"
special case.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug debug/99178] Emit .debug_names
  2021-02-20  0:20 [Bug debug/99178] New: Emit .debug_names tromey at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2024-01-10  2:47 ` tromey at gcc dot gnu.org
@ 2024-01-10 18:51 ` dblaikie at gmail dot com
  2024-01-10 20:31 ` tromey at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: dblaikie at gmail dot com @ 2024-01-10 18:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99178

--- Comment #6 from David Blaikie <dblaikie at gmail dot com> ---
(In reply to Tom Tromey from comment #5)
> (In reply to David Blaikie from comment #4)
> 
> I don't remember filing this bug.  At the time maybe I thought it
> would be worthwhile to have "end to end" .debug_names generation,
> that is, to try to have the index and also not slow down
> compilation or link times too much.  Not sure how I feel about it now.

Certainly what's been possible with .debug_gnu_pubnames/types + -Wl,--gdb-index
today. It'd be nice to have that same workflow, but in a more portable form.

> > It'd be great to get GCC/GDB folks take on the name tables - get some
> > practical experience with their contents, file any bugs about missing
> > elements, etc. It's possible they're leaning too heavily towards lldb's
> > style of name lookup since they derived from an existing apple
> > implementation there & it'd be good to generalize them where needed.
> 
> gdb has long done the wrong thing with .debug_names.
> https://sourceware.org/bugzilla/show_bug.cgi?id=24820

Ah, thanks for the link - I followed up there with some context/thoughts.

> I don't really know how/why that happened.  However, I wrote patches to
> fix it:
> 
> https://sourceware.org/pipermail/gdb-patches/2023-December/204949.html
> 
> This version of gdb will look at the augmentation string and only
> allow certain indexes to be used.  This is done to avoid known bugs --
> mainly coming from the "old" (current) gdb, but also clang has some
> issues (from memory, it doesn't include parent info).

Ideally that'd be detected by looking at the abbreviation table, rather than
the augmentation string - if parent info is necessary for a usage of the table,
that'd be a generic way to check for it & ensure the unusable indexes are
ignored while not ignoring usable ones.

> Also, gdb relies on its extensions (see below).

Ah, but yeah, if you need extensions, then positive augmentation string
checking seems likely necessary.

(though this starts to feel like websites checking browser versions,
unfortunately :/ )

> When writing the new scanner, I found a few bugs in DWARF related to
> which names appear in the index.  I don't recall offhand what these are,
> and I didn't file them due to the late unpleasantness (sorry).  

No worries - and totally understandable. If they happen to come to mind at any
point, I'd love to hear about them.

> They could probably be dug up by reading the scanner and comparing to the spec.
> 
> gdb will also emit some extensions.  You can see these in the docs patch:
> 
> https://sourceware.org/pipermail/gdb-patches/2023-December/204947.html

Awesome - appreciate the documentation!

> Generally the goal of these is to avoid having to do any DIE reading
> in order to reject a lookup.  Note that this means gdb relies on
> template parameters being in the name -- something we discussed in
> gdb bugzilla a bit.

Yeah, I'd love to figure out how to deal with that better, but don't have
immediate suggestions.

Any sense of how bad the performance is if names without template parameters
(strawman: this could be communicated via another flag on the entry in the
index) did require DWARF parsing to check template parameters? Is that
something that'd be an option? (especially with a flag in the entry, then it'd
only be a runtime cost to those using this naming mechanism - as much as I'd
like to move to that mechanism being normal/the default, perhaps this would be
a safe transition path)

But I guess Google's probably the only one super interested in the simplified
template names at the moment (& we're mostly investing in lldb these days), so
might be unlikely anyone would be signing up to do that work in gdb.

> With these patches, gdb will not generate or use the hash table.
> This is explained here:
> 
> https://sourceware.org/pipermail/gdb-patches/2023-December/204963.html

Oh, that's got some good details/answers some of my questions - thanks!

> I consider this hash table to be essentially useless in general, due to the name canonicalization problem -- while DWARF says that writers should use the system demangling style, (1) this style varies across systems, so it can't truly be relied on; and (2) at least GCC and one other compiler don't actually follow this part of the spec anyway.

Hmm, I missed a step here - perhaps you can help me understand. Maybe,
ultimately, I agree with you here - I've pushed back on the lldb folks relying
on character identical name lookup in the index due to the problems you've
described (there's no real canonical demangling) - but where does DWARF say
that writers should "use the system demangling style"?

> It's important to note, though, that even if the hash was somehow useful, GDB probably still would not use it -- a sorted list of names is needed for completion and performs reasonably well for other lookups, so a hash table is just overhead, IMO.

Oh, that makes loads of sense - yeah, beats me how lldb deals with tab
completion using the hash table... maybe it builds some other side table or
something. That's something I've wondered about for a while and it's good to
know how GDB deals with this, and why its index looks different.

Does that mean you want the entries in the table to be sorted? Do you emit them
that way and then, based on augmentation string, rely on them being sorted? Or
do a quick scan at startup and build a sorted list in memory regardless of the
order in .debug_names? (.debug_names entry list isn't suited to random access,
is it? The records aren't all the same size so I don't think you could binary
search through them)

> The only other thing I can think of offhand is that the reliance on
> .debug_str means that gdb may have to augment the string table when
> DW_FORM_string is in use.  This is also caused by the "(anonymous namespace)"
> special case.

And Split DWARF, I guess? The strings wouldn't otherwise be in the executables
.debug_str if not for the index - they'd only appear in the dwo/dwp
.debug_str.dwo sections.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug debug/99178] Emit .debug_names
  2021-02-20  0:20 [Bug debug/99178] New: Emit .debug_names tromey at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2024-01-10 18:51 ` dblaikie at gmail dot com
@ 2024-01-10 20:31 ` tromey at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: tromey at gcc dot gnu.org @ 2024-01-10 20:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99178

--- Comment #7 from Tom Tromey <tromey at gcc dot gnu.org> ---
(In reply to David Blaikie from comment #6)

> Ideally that'd be detected by looking at the abbreviation table, rather than
> the augmentation string - if parent info is necessary for a usage of the
> table, that'd be a generic way to check for it & ensure the unusable indexes
> are ignored while not ignoring usable ones.

Good idea.

> Ah, but yeah, if you need extensions, then positive augmentation string
> checking seems likely necessary.

It's possible we could accept indexes without the extensions and sort of
limp along or have bugs.  Not sure offhand.

> (though this starts to feel like websites checking browser versions,
> unfortunately :/ )

https://github.com/rust-lang/rust/issues/41252#issuecomment-293676579

> Any sense of how bad the performance is if names without template parameters
> (strawman: this could be communicated via another flag on the entry in the
> index) did require DWARF parsing to check template parameters?

It can be bad -- gdb actually has a related bug right now:

https://sourceware.org/bugzilla/show_bug.cgi?id=30520

(There's a few possible dups of this.)

Maybe it would be possible to do some kind of 2-phase expansion.
But we already have 2 DWARF scanners and this would be adding a 3rd...

> Hmm, I missed a step here - perhaps you can help me understand. Maybe,
> ultimately, I agree with you here - I've pushed back on the lldb folks
> relying on character identical name lookup in the index due to the problems
> you've described (there's no real canonical demangling) - but where does
> DWARF say that writers should "use the system demangling style"?

https://wiki.dwarfstd.org/Best_Practices.md

See also bug#94845.

> Oh, that makes loads of sense - yeah, beats me how lldb deals with tab
> completion using the hash table... maybe it builds some other side table or
> something. That's something I've wondered about for a while and it's good to
> know how GDB deals with this, and why its index looks different.

FWIW you can't really go by what gdb does today: its .debug_names it just
super wrong, and .gdb_index is a hash table but explicitly relies on the
name canonicalization that gdb does.  Which itself is unsafe (like what if
that changes between versions) but we didn't really think through all the
problems back then.

> Does that mean you want the entries in the table to be sorted? Do you emit
> them that way and then, based on augmentation string, rely on them being
> sorted? Or do a quick scan at startup and build a sorted list in memory
> regardless of the order in .debug_names? (.debug_names entry list isn't
> suited to random access, is it? The records aren't all the same size so I
> don't think you could binary search through them)

The new reader just reads the entries and creates the same internal data
structures that the new DWARF scanner creates.  It handles sorting,
canonicalization, etc, during the scan.  This work is done in a worker
thread to hide it from the user (although I think it's reasonably quick
anyway).

> > The only other thing I can think of offhand is that the reliance on
> > .debug_str means that gdb may have to augment the string table when
> > DW_FORM_string is in use.  This is also caused by the "(anonymous namespace)"
> > special case.
> 
> And Split DWARF, I guess? The strings wouldn't otherwise be in the
> executables .debug_str if not for the index - they'd only appear in the
> dwo/dwp .debug_str.dwo sections.

Yeah, I haven't really looked at this too much.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-01-10 20:31 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-20  0:20 [Bug debug/99178] New: Emit .debug_names tromey at gcc dot gnu.org
2021-02-22  8:44 ` [Bug debug/99178] " rguenth at gcc dot gnu.org
2021-02-22  9:00 ` jakub at gcc dot gnu.org
2021-02-22 21:40 ` mark at gcc dot gnu.org
2024-01-10  1:20 ` dblaikie at gmail dot com
2024-01-10  2:47 ` tromey at gcc dot gnu.org
2024-01-10 18:51 ` dblaikie at gmail dot com
2024-01-10 20:31 ` tromey at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).