* Decl/def matching with templates without template parameters in the DW_AT_name @ 2023-01-06 17:37 David Blaikie 2023-01-11 18:24 ` Simon Marchi 0 siblings, 1 reply; 11+ messages in thread From: David Blaikie @ 2023-01-06 17:37 UTC (permalink / raw) To: gdb I've been working on reducing debug info size, especially for expression-template heavy code like in Tensorflow and Eigen. To that end, I've implemented the flag -gsimple-template-names in Clang that simplifies the DW_AT_name for templates by omitting template parameters (eg: "vector" instead of "vector<int, std::allocator<int>>") when the DW_TAG_template_*_parameters contain sufficient information to reconstruct the original name. This generally seems to work in GDB - that looks intentional (perhaps because someone else implemented this feature elsewhere, or just for canonicalization reasons (the full string with template parameters might have different whitespace, parentheses, other formatting)), it seems unlikely that GDB would accidentally be able to connect two "vector" declarations up to the right "vector" definitions based on the template parameters without intentional code to support this scenario. But one place I found a problem was in pretty printers. I have situations where a type declaration isn't resolved to a definition when working within a pretty printer (resulting in the pretty printer being unable to find member variables in the object/of that type) - but if I print out the variable/member it works correctly - or if I "list" the source of the file where the definition of the type is, then the pretty printer works correctly. Does this ring a bell for anyone/sound like something sufficient to file a bug, or would I need to dig in further to create an isolated reproduction to help communicate the issue I'm seeing? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Decl/def matching with templates without template parameters in the DW_AT_name 2023-01-06 17:37 Decl/def matching with templates without template parameters in the DW_AT_name David Blaikie @ 2023-01-11 18:24 ` Simon Marchi 2023-01-11 23:50 ` David Blaikie 0 siblings, 1 reply; 11+ messages in thread From: Simon Marchi @ 2023-01-11 18:24 UTC (permalink / raw) To: David Blaikie, gdb On 1/6/23 12:37, David Blaikie via Gdb wrote: > I've been working on reducing debug info size, especially for > expression-template heavy code like in Tensorflow and Eigen. > > To that end, I've implemented the flag -gsimple-template-names in > Clang that simplifies the DW_AT_name for templates by omitting > template parameters (eg: "vector" instead of "vector<int, > std::allocator<int>>") when the DW_TAG_template_*_parameters contain > sufficient information to reconstruct the original name. This reduces the length of strings, and more importantly it allows more sharing, is that it? It doesn't change the layout of the DIE tree? > This generally seems to work in GDB - that looks intentional (perhaps > because someone else implemented this feature elsewhere, or just for > canonicalization reasons (the full string with template parameters > might have different whitespace, parentheses, other formatting)), it > seems unlikely that GDB would accidentally be able to connect two > "vector" declarations up to the right "vector" definitions based on > the template parameters without intentional code to support this > scenario. Probably this code: https://gitlab.com/gnutools/binutils-gdb/-/blob/1b9af5b949bff0c750ededb459400c1857fec416/gdb/dwarf2/read.c#L9002 > But one place I found a problem was in pretty printers. I have > situations where a type declaration isn't resolved to a definition > when working within a pretty printer (resulting in the pretty printer > being unable to find member variables in the object/of that type) - > but if I print out the variable/member it works correctly - or if I > "list" the source of the file where the definition of the type is, > then the pretty printer works correctly. > > Does this ring a bell for anyone/sound like something sufficient to > file a bug, or would I need to dig in further to create an isolated > reproduction to help communicate the issue I'm seeing? I have no idea about what happens without digging into the code. I just remember some bug where GDB would generate the templated name with let's say "<5u>" for an integer value template parameter, so if you entered "<5>", it didn't match textually. Maybe something like that is happening. I also found: https://sourceware.org/bugzilla/show_bug.cgi?id=17762 There's the possibility that the "partial symbol" and "expanded symbol" don't have the same name. So you could see a difference in behavior depending on whether the CU has been expanded by GDB or not (I'm just speculating). Is it valid DWARF (5) for DW_AT_name of a templated struct instantiation to omit the template parameters? I don't see DWARF mandating one or the other, so I assume that both including them or not are valid. If so, I think we can consider it a GDB bug if it doesn't process the "without" version correctly. Feel free to file a bug with the relevant background information. It would be useful to include some binaries, one compiled with -gsimple-tempalte-names and one without, so it's easy to compare the two. Simon ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Decl/def matching with templates without template parameters in the DW_AT_name 2023-01-11 18:24 ` Simon Marchi @ 2023-01-11 23:50 ` David Blaikie 2023-01-12 1:46 ` Simon Marchi 2023-01-12 2:32 ` Simon Marchi 0 siblings, 2 replies; 11+ messages in thread From: David Blaikie @ 2023-01-11 23:50 UTC (permalink / raw) To: Simon Marchi; +Cc: gdb On Wed, Jan 11, 2023 at 10:24 AM Simon Marchi <simark@simark.ca> wrote: > > > > On 1/6/23 12:37, David Blaikie via Gdb wrote: > > I've been working on reducing debug info size, especially for > > expression-template heavy code like in Tensorflow and Eigen. > > > > To that end, I've implemented the flag -gsimple-template-names in > > Clang that simplifies the DW_AT_name for templates by omitting > > template parameters (eg: "vector" instead of "vector<int, > > std::allocator<int>>") when the DW_TAG_template_*_parameters contain > > sufficient information to reconstruct the original name. > > This reduces the length of strings, and more importantly it allows more > sharing, is that it? Yep. Especially for complex expression templates - the fully expanded template names can be massive (some examples I had involve 50k character names... ) - they don't have the benefits of mangled names that include redundancy elimination/deduplication at least within a single name (admittedly the mangled names for these things are massive too - and at some point I might start some discussions about alternative mangling/symbol naming schemes, for those who can diverge from the Itanium ABI) > It doesn't change the layout of the DIE tree? Correct... well, in one case it does change the DIE tree - template declarations. GCC and Clang produce template declarations without template parameter DIEs, eg: $ cat > test.cpp template<typename T> struct t1 { }; t1<int> *v; $ clang++-tot test.cpp -g -c && llvm-dwarfdump-tot test.o ... 0x0000002e: DW_TAG_structure_type DW_AT_name ("t1<int>") DW_AT_declaration (true) 0x00000030: NULL So if we dropped the `<int>` from that name, there'd be no hope to connect the declaration with the right definition. So when using -gsimple-template-names, we enable template parameter DIEs on declarations. (Sony implemented this previously, presumably because their debugger needed it for something like this - but it wasn't enabled by defaulte because GDB and LLDB didn't seem to need these extra DIEs so it wasn't worth the extra size - but maybe it's not so much to be a real problem/maybe we should switch that on by default at some point if we're going further in this direction). > > This generally seems to work in GDB - that looks intentional (perhaps > > because someone else implemented this feature elsewhere, or just for > > canonicalization reasons (the full string with template parameters > > might have different whitespace, parentheses, other formatting)), it > > seems unlikely that GDB would accidentally be able to connect two > > "vector" declarations up to the right "vector" definitions based on > > the template parameters without intentional code to support this > > scenario. > > Probably this code: > > https://gitlab.com/gnutools/binutils-gdb/-/blob/1b9af5b949bff0c750ededb459400c1857fec416/gdb/dwarf2/read.c#L9002 Something like that. Though the comment here is interesting: https://gitlab.com/gnutools/binutils-gdb/-/blob/1b9af5b949bff0c750ededb459400c1857fec416/gdb/dwarf2/read.c#L8944 /* Template parameters may be specified in the DIE's DW_AT_name, or as children with DW_TAG_template_type_param or DW_TAG_value_type_param. If the latter, add them to the name here. If the name already has template parameters, then skip this step; some versions of GCC emit both, and it is more efficient to use the pre-computed name. At least usually GCC and Clang produce both (DW_AT_name with parameters, and DIEs for the parameters) for definitions, and name-only, no DIEs for declarations. I wonder which compiler has previously produced simplified names - it'd be nice as a proof-of-concept/existence proof/prior art for the new Clang feature. It's also interesting that it says to use the name if it's already got them - that suggests no canonicalization, but I'm pretty sure GDB does do some canonicalization to allow fuzzy matching of decls/defs between Clang and GCC (& whatever other compilers) which might have minor differences between how they print template parameters. But yeah, that code that specifically says "if there's no '<' in the name, go and rebuild the name" is very explicit support for the -gsimple-template-names naming... someone must've done this before. > > But one place I found a problem was in pretty printers. I have > > situations where a type declaration isn't resolved to a definition > > when working within a pretty printer (resulting in the pretty printer > > being unable to find member variables in the object/of that type) - > > but if I print out the variable/member it works correctly - or if I > > "list" the source of the file where the definition of the type is, > > then the pretty printer works correctly. > > > > Does this ring a bell for anyone/sound like something sufficient to > > file a bug, or would I need to dig in further to create an isolated > > reproduction to help communicate the issue I'm seeing? > > I have no idea about what happens without digging into the code. I just > remember some bug where GDB would generate the templated name with let's > say "<5u>" for an integer value template parameter, so if you entered > "<5>", it didn't match textually. Maybe something like that is > happening. > > I also found: > > https://sourceware.org/bugzilla/show_bug.cgi?id=17762 Yeah, I certainly hit a few cases like that when testing -gsimple-template-names canonicalization/roundtripping to make sure it wasn't lossy. > There's the possibility that the "partial symbol" and "expanded symbol" > don't have the same name. So you could see a difference in behavior > depending on whether the CU has been expanded by GDB or not (I'm just > speculating). > > Is it valid DWARF (5) for DW_AT_name of a templated struct instantiation > to omit the template parameters? I don't see DWARF mandating one or the > other, so I assume that both including them or not are valid. Yeah, this is a case where DWARF is like "here are some tools you could use to express some language features, have at!" and doesn't say "to describe this particular language feature you must use DWARF in this particular way" > If so, I > think we can consider it a GDB bug if it doesn't process the "without" > version correctly. Feel free to file a bug with the relevant background > information. It would be useful to include some binaries, one compiled > with -gsimple-tempalte-names and one without, so it's easy to compare > the two. Yeah, for now I'm only seeing the problem with pretty printers, so to provide repro steps will be a bit more involved. (I've also got a crash that seems to happen when using the feature without -gsplit-dwarf (it's probably reproducable with -gsplit-dwarf too, but we enable -Wl,-gdb-index in that case and so GDB doesn't parse as much DWARF on startup - but if we tickled it into loading more DWARF it'd probably hit the same crash, I'd wager)) I'll see what I can do. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Decl/def matching with templates without template parameters in the DW_AT_name 2023-01-11 23:50 ` David Blaikie @ 2023-01-12 1:46 ` Simon Marchi 2023-01-14 20:28 ` Tom Tromey 2023-01-18 22:01 ` David Blaikie 2023-01-12 2:32 ` Simon Marchi 1 sibling, 2 replies; 11+ messages in thread From: Simon Marchi @ 2023-01-12 1:46 UTC (permalink / raw) To: David Blaikie; +Cc: gdb On 1/11/23 18:50, David Blaikie wrote: > On Wed, Jan 11, 2023 at 10:24 AM Simon Marchi <simark@simark.ca> wrote: >> >> >> >> On 1/6/23 12:37, David Blaikie via Gdb wrote: >>> I've been working on reducing debug info size, especially for >>> expression-template heavy code like in Tensorflow and Eigen. >>> >>> To that end, I've implemented the flag -gsimple-template-names in >>> Clang that simplifies the DW_AT_name for templates by omitting >>> template parameters (eg: "vector" instead of "vector<int, >>> std::allocator<int>>") when the DW_TAG_template_*_parameters contain >>> sufficient information to reconstruct the original name. >> >> This reduces the length of strings, and more importantly it allows more >> sharing, is that it? > > Yep. Especially for complex expression templates - the fully expanded > template names can be massive (some examples I had involve 50k > character names... ) - they don't have the benefits of mangled names > that include redundancy elimination/deduplication at least within a > single name (admittedly the mangled names for these things are massive > too - and at some point I might start some discussions about > alternative mangling/symbol naming schemes, for those who can diverge > from the Itanium ABI) > >> It doesn't change the layout of the DIE tree? > > Correct... > > well, in one case it does change the DIE tree - template declarations. > > GCC and Clang produce template declarations without template parameter DIEs, eg: > > $ cat > test.cpp > template<typename T> > struct t1 { }; > t1<int> *v; > $ clang++-tot test.cpp -g -c && llvm-dwarfdump-tot test.o > ... > 0x0000002e: DW_TAG_structure_type > DW_AT_name ("t1<int>") > DW_AT_declaration (true) > > 0x00000030: NULL > > So if we dropped the `<int>` from that name, there'd be no hope to > connect the declaration with the right definition. > > So when using -gsimple-template-names, we enable template parameter > DIEs on declarations. (Sony implemented this previously, presumably > because their debugger needed it for something like this - but it > wasn't enabled by defaulte because GDB and LLDB didn't seem to need > these extra DIEs so it wasn't worth the extra size - but maybe it's > not so much to be a real problem/maybe we should switch that on by > default at some point if we're going further in this direction). > >>> This generally seems to work in GDB - that looks intentional (perhaps >>> because someone else implemented this feature elsewhere, or just for >>> canonicalization reasons (the full string with template parameters >>> might have different whitespace, parentheses, other formatting)), it >>> seems unlikely that GDB would accidentally be able to connect two >>> "vector" declarations up to the right "vector" definitions based on >>> the template parameters without intentional code to support this >>> scenario. >> >> Probably this code: >> >> https://gitlab.com/gnutools/binutils-gdb/-/blob/1b9af5b949bff0c750ededb459400c1857fec416/gdb/dwarf2/read.c#L9002 > > Something like that. Though the comment here is interesting: > > https://gitlab.com/gnutools/binutils-gdb/-/blob/1b9af5b949bff0c750ededb459400c1857fec416/gdb/dwarf2/read.c#L8944 > > /* Template parameters may be specified in the DIE's DW_AT_name, or > as children with DW_TAG_template_type_param or > DW_TAG_value_type_param. If the latter, add them to the name > here. If the name already has template parameters, then > skip this step; some versions of GCC emit both, and > it is more efficient to use the pre-computed name. > > At least usually GCC and Clang produce both (DW_AT_name with > parameters, and DIEs for the parameters) for definitions, and > name-only, no DIEs for declarations. > > I wonder which compiler has previously produced simplified names - > it'd be nice as a proof-of-concept/existence proof/prior art for the > new Clang feature. Digging in the history leads me to: https://inbox.sourceware.org/gdb-patches/201007302017.41074.pedro@codesourcery.com/ So RVCT, the RealView compiler. I don't have access to that, unfortunately. It seems obsolete, also. > It's also interesting that it says to use the name if it's already got > them - that suggests no canonicalization, but I'm pretty sure GDB does > do some canonicalization to allow fuzzy matching of decls/defs between > Clang and GCC (& whatever other compilers) which might have minor > differences between how they print template parameters. I don't know, unfortunately. > > But yeah, that code that specifically says "if there's no '<' in the > name, go and rebuild the name" is very explicit support for the > -gsimple-template-names naming... someone must've done this before. > > >>> But one place I found a problem was in pretty printers. I have >>> situations where a type declaration isn't resolved to a definition >>> when working within a pretty printer (resulting in the pretty printer >>> being unable to find member variables in the object/of that type) - >>> but if I print out the variable/member it works correctly - or if I >>> "list" the source of the file where the definition of the type is, >>> then the pretty printer works correctly. >>> >>> Does this ring a bell for anyone/sound like something sufficient to >>> file a bug, or would I need to dig in further to create an isolated >>> reproduction to help communicate the issue I'm seeing? >> >> I have no idea about what happens without digging into the code. I just >> remember some bug where GDB would generate the templated name with let's >> say "<5u>" for an integer value template parameter, so if you entered >> "<5>", it didn't match textually. Maybe something like that is >> happening. >> >> I also found: >> >> https://sourceware.org/bugzilla/show_bug.cgi?id=17762 > > Yeah, I certainly hit a few cases like that when testing > -gsimple-template-names canonicalization/roundtripping to make sure it > wasn't lossy. In order to avoid any bug like this, for you tests, maybe you can use the -readnow switch to bypass the partial symbol steps. > >> There's the possibility that the "partial symbol" and "expanded symbol" >> don't have the same name. So you could see a difference in behavior >> depending on whether the CU has been expanded by GDB or not (I'm just >> speculating). >> >> Is it valid DWARF (5) for DW_AT_name of a templated struct instantiation >> to omit the template parameters? I don't see DWARF mandating one or the >> other, so I assume that both including them or not are valid. > > Yeah, this is a case where DWARF is like "here are some tools you > could use to express some language features, have at!" and doesn't say > "to describe this particular language feature you must use DWARF in > this particular way" Typical "DWARF is a permissive standard, not a prescriptive one" thing. > >> If so, I >> think we can consider it a GDB bug if it doesn't process the "without" >> version correctly. Feel free to file a bug with the relevant background >> information. It would be useful to include some binaries, one compiled >> with -gsimple-tempalte-names and one without, so it's easy to compare >> the two. > > Yeah, for now I'm only seeing the problem with pretty printers, so to > provide repro steps will be a bit more involved. (I've also got a > crash that seems to happen when using the feature without > -gsplit-dwarf (it's probably reproducable with -gsplit-dwarf too, but > we enable -Wl,-gdb-index in that case and so GDB doesn't parse as much > DWARF on startup - but if we tickled it into loading more DWARF it'd > probably hit the same crash, I'd wager)) > > I'll see what I can do. If you end up merging this, it will be interesting to run the full GDB testsuite against clang with that flag, it would cover a lot of things. Simon ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Decl/def matching with templates without template parameters in the DW_AT_name 2023-01-12 1:46 ` Simon Marchi @ 2023-01-14 20:28 ` Tom Tromey 2023-01-16 21:18 ` Simon Marchi 2023-01-18 22:12 ` David Blaikie 2023-01-18 22:01 ` David Blaikie 1 sibling, 2 replies; 11+ messages in thread From: Tom Tromey @ 2023-01-14 20:28 UTC (permalink / raw) To: Simon Marchi via Gdb; +Cc: David Blaikie, Simon Marchi >>>>> "Simon" == Simon Marchi via Gdb <gdb@sourceware.org> writes: Simon> Digging in the history leads me to: Simon> https://inbox.sourceware.org/gdb-patches/201007302017.41074.pedro@codesourcery.com/ Simon> So RVCT, the RealView compiler. I don't have access to that, Simon> unfortunately. It seems obsolete, also. I don't like that code. It calls into type_print from the reader, which seems very weird. An approach based on purely traversing the DIE tree seems preferable to me. Anyway, making it work again seems possible. And this time it could have tests. The main thing I would want to avoid here is trying to put this extra name-construction into the indexer. That will just slow it down -- but this is normally the most user-visible slow thing in gdb, and most CUs are of no interest anyway. The downside of this decision is that expansion may expand too many CUs. So for example if there are a million instantiation of template X and the user types "break X<int>::method", gdb might expand every CU referencing X and then still only set one breakpoint. However if this is an issue I think the solution could be to be more selective at expansion time. That is, let the user input "X<int>" match X, but then actually examine the DIE tree to decide if this match should result in an expansion. >>> Is it valid DWARF (5) for DW_AT_name of a templated struct instantiation >>> to omit the template parameters? I don't see DWARF mandating one or the >>> other, so I assume that both including them or not are valid. >> Yeah, this is a case where DWARF is like "here are some tools you >> could use to express some language features, have at!" and doesn't say >> "to describe this particular language feature you must use DWARF in >> this particular way" Simon> Typical "DWARF is a permissive standard, not a prescriptive one" thing. For Rust, my view was that a language ought to also have a "binding" to DWARF, to write down how DWARF features are in fact used by the language. DWARF does not really take this view, though, which is why there are a tags with different names but vaguely similar meanings... just one of the many ways that DWARF is bad. Simon> I just found this: Simon> http://wiki.dwarfstd.org/index.php?title=Best_Practices#Names_of_Program_Entities This says it "should have a canonical representation" but neglects to say what that representation should be, so IMO it can't really be relied upon by debuggers. It would be a real improvement to debug reading if the canonical form were in fact reliable across environments -- i.e., proscribed. gdb could avoid all name canonicalization during debug reading, which is a major point of serialization. This affects other languages as well, for example if Fortran and Ada specified a canonical case folding... while this would make gdb output slightly inconsistent with the source, it would also mean we could perhaps sanely handle some situations that are messy today -- see the recent discussion of strcasecmp and Unicode. Though note that DWARF also neglects to specify a Unicode normalization. Tom ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Decl/def matching with templates without template parameters in the DW_AT_name 2023-01-14 20:28 ` Tom Tromey @ 2023-01-16 21:18 ` Simon Marchi 2023-01-18 22:08 ` David Blaikie 2023-01-18 22:12 ` David Blaikie 1 sibling, 1 reply; 11+ messages in thread From: Simon Marchi @ 2023-01-16 21:18 UTC (permalink / raw) To: Tom Tromey, Simon Marchi via Gdb; +Cc: David Blaikie > The main thing I would want to avoid here is trying to put this extra > name-construction into the indexer. That will just slow it down -- but > this is normally the most user-visible slow thing in gdb, and most CUs > are of no interest anyway. > > The downside of this decision is that expansion may expand too many > CUs. So for example if there are a million instantiation of template X > and the user types "break X<int>::method", gdb might expand every CU > referencing X and then still only set one breakpoint. > > However if this is an issue I think the solution could be to be more > selective at expansion time. That is, let the user input "X<int>" match > X, but then actually examine the DIE tree to decide if this match should > result in an expansion. This is my understanding of what you are saying. Save the name without the template part in the cooked index, but attach to it a data structure that describes the template parameters. When the user types, let's say, "b my_class<int, 2>::my_method", "my_class<int, 2>" gets translated to the name "my_class" plus a description of the concrete arguments (the type argument "int" and the value argument 2). Then, when checking if a given CU should expanded, and we have a match for the "my_class" name, we compare the data structures describing the parameters to the one describing the arguments, see if it's really a match. Does that sound right? I'm just a bit worried that it might be difficult to implement this "is there a match function", given the complex rules of C++ template deduction. But maybe it's not so bad, or we already have that logic somewhere. Simon ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Decl/def matching with templates without template parameters in the DW_AT_name 2023-01-16 21:18 ` Simon Marchi @ 2023-01-18 22:08 ` David Blaikie 0 siblings, 0 replies; 11+ messages in thread From: David Blaikie @ 2023-01-18 22:08 UTC (permalink / raw) To: Simon Marchi; +Cc: Tom Tromey, Simon Marchi via Gdb On Mon, Jan 16, 2023 at 1:18 PM Simon Marchi <simark@simark.ca> wrote: > > > The main thing I would want to avoid here is trying to put this extra > > name-construction into the indexer. That will just slow it down -- but > > this is normally the most user-visible slow thing in gdb, and most CUs > > are of no interest anyway. > > > > The downside of this decision is that expansion may expand too many > > CUs. So for example if there are a million instantiation of template X > > and the user types "break X<int>::method", gdb might expand every CU > > referencing X and then still only set one breakpoint. > > > > However if this is an issue I think the solution could be to be more > > selective at expansion time. That is, let the user input "X<int>" match > > X, but then actually examine the DIE tree to decide if this match should > > result in an expansion. > > This is my understanding of what you are saying. Save the name without > the template part in the cooked index, but attach to it a data structure > that describes the template parameters. When the user types, let's say, > "b my_class<int, 2>::my_method", "my_class<int, 2>" gets translated to > the name "my_class" plus a description of the concrete arguments (the > type argument "int" and the value argument 2). Then, when checking if a > given CU should expanded, and we have a match for the "my_class" name, > we compare the data structures describing the parameters to the one > describing the arguments, see if it's really a match. Does that sound > right? That's more or less what we did in lldb - though we do do both lookups (try "t1<int>" because that's what we have on hand (need to be able to create that from the DIEs to show something good to the user anyway) and if that doesn't find any results, try "t1" then filter out the results looking for the equivalent of "t1<int>" based on the DIEs/some processed data structure from the DIEs). It'd be really good/important for us to all agree on what goes in the index so that .debug_names can be effectively portable. I think the goal should be that if the template has a simplified DW_AT_name, then the index entry should be similarly simplified (that's what Clang does for now, at least & what the DWARF spec says to do, I think/assume (even if it doesn't speak about template naming)). > I'm just a bit worried that it might be difficult to implement this "is > there a match function", given the complex rules of C++ template > deduction. But maybe it's not so bad, or we already have that logic > somewhere. Yeah, in lldb I think we're doing that based on the string we would use to show to a user - which in lldb's case is from Clang's AST generated from DWARF DIEs, but it doesn't matter too much how you do it, as you say, there's already some logic to do that in the debugger to show users a name string - so check that between instantiations to check they're the same entity. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Decl/def matching with templates without template parameters in the DW_AT_name 2023-01-14 20:28 ` Tom Tromey 2023-01-16 21:18 ` Simon Marchi @ 2023-01-18 22:12 ` David Blaikie 1 sibling, 0 replies; 11+ messages in thread From: David Blaikie @ 2023-01-18 22:12 UTC (permalink / raw) To: Tom Tromey; +Cc: Simon Marchi via Gdb, Simon Marchi On Sat, Jan 14, 2023 at 12:28 PM Tom Tromey <tom@tromey.com> wrote: > > >>>>> "Simon" == Simon Marchi via Gdb <gdb@sourceware.org> writes: > > Simon> Digging in the history leads me to: > > Simon> https://inbox.sourceware.org/gdb-patches/201007302017.41074.pedro@codesourcery.com/ > > Simon> So RVCT, the RealView compiler. I don't have access to that, > Simon> unfortunately. It seems obsolete, also. > > I don't like that code. It calls into type_print from the reader, which > seems very weird. An approach based on purely traversing the DIE tree > seems preferable to me. > > Anyway, making it work again seems possible. And this time it could > have tests. > > The main thing I would want to avoid here is trying to put this extra > name-construction into the indexer. That will just slow it down -- but > this is normally the most user-visible slow thing in gdb, and most CUs > are of no interest anyway. > > The downside of this decision is that expansion may expand too many > CUs. So for example if there are a million instantiation of template X > and the user types "break X<int>::method", gdb might expand every CU > referencing X and then still only set one breakpoint. > > However if this is an issue I think the solution could be to be more > selective at expansion time. That is, let the user input "X<int>" match > X, but then actually examine the DIE tree to decide if this match should > result in an expansion. > > >>> Is it valid DWARF (5) for DW_AT_name of a templated struct instantiation > >>> to omit the template parameters? I don't see DWARF mandating one or the > >>> other, so I assume that both including them or not are valid. > > >> Yeah, this is a case where DWARF is like "here are some tools you > >> could use to express some language features, have at!" and doesn't say > >> "to describe this particular language feature you must use DWARF in > >> this particular way" > > Simon> Typical "DWARF is a permissive standard, not a prescriptive one" thing. > > For Rust, my view was that a language ought to also have a "binding" to > DWARF, to write down how DWARF features are in fact used by the > language. DWARF does not really take this view, though, which is why > there are a tags with different names but vaguely similar > meanings... just one of the many ways that DWARF is bad. Agreed. DWARF ends up being more like XML with a library of tags with suggestions at best, but certainly not "this means this and only this" & so debuggers/compilers end up with ad-hoc agreements about how certain features should be encoded. Mostly GCC and GDB get to set that pseudostandard and Clang/lldb for the most part follow suit, though some amount of Clang+LLDB do some things together, moreso on MacOS. I'm always happy to chat more about these things/help set direction on the Clang side, at least. > > Simon> I just found this: > Simon> http://wiki.dwarfstd.org/index.php?title=Best_Practices#Names_of_Program_Entities > > This says it "should have a canonical representation" but neglects to > say what that representation should be, so IMO it can't really be relied > upon by debuggers. > > It would be a real improvement to debug reading if the canonical form > were in fact reliable across environments -- i.e., proscribed. gdb > could avoid all name canonicalization during debug reading, which is a > major point of serialization. > > This affects other languages as well, for example if Fortran and Ada > specified a canonical case folding... while this would make gdb output > slightly inconsistent with the source, it would also mean we could > perhaps sanely handle some situations that are messy today -- see the > recent discussion of strcasecmp and Unicode. Though note that DWARF > also neglects to specify a Unicode normalization. > > Tom ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Decl/def matching with templates without template parameters in the DW_AT_name 2023-01-12 1:46 ` Simon Marchi 2023-01-14 20:28 ` Tom Tromey @ 2023-01-18 22:01 ` David Blaikie 1 sibling, 0 replies; 11+ messages in thread From: David Blaikie @ 2023-01-18 22:01 UTC (permalink / raw) To: Simon Marchi; +Cc: gdb On Wed, Jan 11, 2023 at 5:46 PM Simon Marchi <simark@simark.ca> wrote: > Digging in the history leads me to: > > https://inbox.sourceware.org/gdb-patches/201007302017.41074.pedro@codesourcery.com/ > > So RVCT, the RealView compiler. I don't have access to that, > unfortunately. It seems obsolete, also. Oh, that's satisfying/interesting to know - figured /someone/ had done this before, judging by gdb's behavior. > If you end up merging this, it will be interesting to run the full GDB > testsuite against clang with that flag, it would cover a lot of things. Oh, it's already in Clang, under `-gsimple-template-names`, I think even Clang 15's functionality is well baked enough - I finished up the work about a year ago now. Here's a very small example: https://godbolt.org/z/r8zMf5qvr showing the shortened name in the .debug_info, and in the .debug_gdb_pubnames (which ends up in the index, which turns out was the bug I came across... filed here: https://sourceware.org/bugzilla/show_bug.cgi?id=30023 ) As I mentioned, though, there's a bunch of cases where Clang doesn't simplify names because they don't roundtrip well/DWARF is lossy/problematic in some way in the way it describes the template parameters. Some of the interesting (positive and negative/simplified and unsimplified) test cases I encountered were: https://github.com/llvm/llvm-project/blob/main/cross-project-tests/debuginfo-tests/clang_llvm_roundtrip/simplified_template_names.cpp - this checks that I could easily rebuild the original name from llvm-dwarfdump. So, things like templates with lambda parameters aren't simplified because there isn't necessarily enough info to rebuild the name as clang generates them (also GCC generates those names differently - and those are probably harder to roundtrip, but more likely to be canonical/unique/unambiguous), operator overloads were not simplified because it was hard to tell if they were/weren't simplified with all the <> involved in some operator names even without template parameters - etc... ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Decl/def matching with templates without template parameters in the DW_AT_name 2023-01-11 23:50 ` David Blaikie 2023-01-12 1:46 ` Simon Marchi @ 2023-01-12 2:32 ` Simon Marchi 2023-01-18 22:04 ` David Blaikie 1 sibling, 1 reply; 11+ messages in thread From: Simon Marchi @ 2023-01-12 2:32 UTC (permalink / raw) To: David Blaikie; +Cc: gdb > Yeah, this is a case where DWARF is like "here are some tools you > could use to express some language features, have at!" and doesn't say > "to describe this particular language feature you must use DWARF in > this particular way" I just found this: http://wiki.dwarfstd.org/index.php?title=Best_Practices#Names_of_Program_Entities For template instantiations, the DW_AT_name attribute should contain both the source language name of the object and the template parameters that distinguish one instantiation from another. The resulting string should be in the natural form for the language, and should have a canonical representation (i.e., different producers should generate the same representation). For C++, the string should match that produced by the target platform's canonical demangler; spaces should only be inserted where syntactically required by the compiler. Of course, it's not normative. According to the wiki history, this particular bit was added by Cary Coutant, in case you want to ask him why it is considered best practice. I Googled bits of the previous quote, and found this bug about template string canonicalization, and not including template parameters in the string: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94845#c8 And this one, that sounds like the 2 vs 2u thing I talked about earler: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81932 Ah, and the thread that was on the GDB list about that: https://inbox.sourceware.org/gdb/CAATAM3ED7B3wEJFmaZA_MaOtN5EGKSGFmusAf-Mg5hX35D2r6A@mail.gmail.com/ And the corresponding one on llvm-dev, to which you participated: https://lists.llvm.org/pipermail/llvm-dev/2018-March/121541.html It's not all relevant, but I re-read those to remember what the trouble with the name strings with template parameters was. My thought on all this is that if the name strings with template parameters are not reliable and take a lot of space, and the same information is described in a more structured way (as DIEs), then it makes sense to drop the parameters from the string. Simon ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Decl/def matching with templates without template parameters in the DW_AT_name 2023-01-12 2:32 ` Simon Marchi @ 2023-01-18 22:04 ` David Blaikie 0 siblings, 0 replies; 11+ messages in thread From: David Blaikie @ 2023-01-18 22:04 UTC (permalink / raw) To: Simon Marchi; +Cc: gdb On Wed, Jan 11, 2023 at 6:32 PM Simon Marchi <simark@simark.ca> wrote: > > > Yeah, this is a case where DWARF is like "here are some tools you > > could use to express some language features, have at!" and doesn't say > > "to describe this particular language feature you must use DWARF in > > this particular way" > I just found this: > > http://wiki.dwarfstd.org/index.php?title=Best_Practices#Names_of_Program_Entities > > For template instantiations, the DW_AT_name attribute should contain > both the source language name of the object and the template > parameters that distinguish one instantiation from another. The > resulting string should be in the natural form for the language, and > should have a canonical representation (i.e., different producers > should generate the same representation). For C++, the string should > match that produced by the target platform's canonical demangler; > spaces should only be inserted where syntactically required by the > compiler. > > Of course, it's not normative. According to the wiki history, this > particular bit was added by Cary Coutant, in case you want to ask him > why it is considered best practice. > > I Googled bits of the previous quote, and found this bug about template > string canonicalization, and not including template parameters in the > string: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94845#c8 > > And this one, that sounds like the 2 vs 2u thing I talked about earler: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81932 > > Ah, and the thread that was on the GDB list about that: > > https://inbox.sourceware.org/gdb/CAATAM3ED7B3wEJFmaZA_MaOtN5EGKSGFmusAf-Mg5hX35D2r6A@mail.gmail.com/ > > And the corresponding one on llvm-dev, to which you participated: > > https://lists.llvm.org/pipermail/llvm-dev/2018-March/121541.html > > It's not all relevant, but I re-read those to remember what the trouble > with the name strings with template parameters was. > > My thought on all this is that if the name strings with template > parameters are not reliable and take a lot of space, and the same > information is described in a more structured way (as DIEs), then it > makes sense to drop the parameters from the string. Yeah, though I'll be the first to admit that the DIEs aren't perfect either - there's a bunch of cases I opted out of the name simplification (keeping their full name) because the DIEs weren't adequate to reproduce the original name (in an effort to be conservatively correct - in some cases it's the original name that's the problem (some of those I fixed, some were too complex for me to fix immediately, so I punted on them and kept the unsimplified name) - lambdas, operator overloads come to mind, but there were some others). It'd be good to get those DIE issues fixed, imho & then be able to rely on simplified names uniformly. (also as I mentioned in the bug I filed about the simplified names + index, there's a bunch of inconsistencies here with both Clang and GCC about alias templates and variable templates, as to whether they get simplified or unsimplified names and whether they get DIEs or not... :/ ) ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2023-01-18 22:12 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-01-06 17:37 Decl/def matching with templates without template parameters in the DW_AT_name David Blaikie 2023-01-11 18:24 ` Simon Marchi 2023-01-11 23:50 ` David Blaikie 2023-01-12 1:46 ` Simon Marchi 2023-01-14 20:28 ` Tom Tromey 2023-01-16 21:18 ` Simon Marchi 2023-01-18 22:08 ` David Blaikie 2023-01-18 22:12 ` David Blaikie 2023-01-18 22:01 ` David Blaikie 2023-01-12 2:32 ` Simon Marchi 2023-01-18 22:04 ` David Blaikie
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).