From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oa1-x2e.google.com (mail-oa1-x2e.google.com [IPv6:2001:4860:4864:20::2e]) by sourceware.org (Postfix) with ESMTPS id 6B94B385B529 for ; Wed, 11 Jan 2023 23:50:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6B94B385B529 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-oa1-x2e.google.com with SMTP id 586e51a60fabf-15085b8a2f7so17284681fac.2 for ; Wed, 11 Jan 2023 15:50:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=rsuZyhQjnyD3TX4adB0wKGYuJO/pjXUD710ZTC8B+Ds=; b=h8KZkdXeWab1+kvDejJdCua6N+WjuWoXiM724wb2mgZ0dXbc6PE7XGlp25om4xdBE6 pHuIVuwkL+cJJx8WXNjexzos3Z2wvl2vIZa/TvdNVguGthdlY9g5EFMRnVY/xE601LsM 88KTGxh1alsDQVNk6ijBU5YsXIupomKh5t3lSYvASQi38bzGNn0H+Rlr0XBFE2vxOKiq jdj4OzbZXv5QjH37xUV4+w+K2LtVEHduL7bn4teU/IdtwZplJ4e2Rzi+GmeVWP5Rfiho sdw6b0C4obQPhPy/VV377c2gFJ3lmbvirIP6R0yzxr/E1CJhVSBiihl2A0kK3lqyhWOI gFng== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=rsuZyhQjnyD3TX4adB0wKGYuJO/pjXUD710ZTC8B+Ds=; b=D86NKan6Dt6o7NLNVQ9xnigrAycnM0xmC1d5xmIglun8sIlQ2LBt7GRI9AWhIYt5Hq lRJZnlRun6YexKuOqmb44R3UNbHE9eiGQEJLnx3wKOBepyiaC6fZyXGKKieMoK2o0vMn 2/JR9pzrwBMuBShU/gDuHcCSjFczLXPNPFDNHaISoyTtollYPr6AwCUs/QLM2oFRsgq0 luQq8pIvFOnLPHarNavPJXufoI56zORVjI5rxkE6IsnjLhQK9ep//FppTmtOL8EuJ0r/ fdFxOP7kF1aCiFYjDYXzi5qHwnM69i3ydhNZTpT9Nw3oqwVAS5GTnS9AIoQe+q+02DGC nSsQ== X-Gm-Message-State: AFqh2kopGbhkDqnzfp7Vo8dn577xq+CQfhf7vHSa5kGvujugB/yEf0Ka 1RQgQy+94Bq7C1CS9rylUtBkA+B03R1u+nhefeZBtEkubDg= X-Google-Smtp-Source: AMrXdXuHsSY7EohN9yQm1UZ4YeXVJHZuQRrgnYyjcW9f/zXIpY7H8Zu+3N4YZPNluAEd07a990b9w5oyYBq6VJoXtSg= X-Received: by 2002:a05:6870:c226:b0:150:2665:677a with SMTP id z38-20020a056870c22600b001502665677amr2525544oae.177.1673481023725; Wed, 11 Jan 2023 15:50:23 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: David Blaikie Date: Wed, 11 Jan 2023 15:50:11 -0800 Message-ID: Subject: Re: Decl/def matching with templates without template parameters in the DW_AT_name To: Simon Marchi Cc: gdb Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=0.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Wed, Jan 11, 2023 at 10:24 AM Simon Marchi wrote: > > > > On 1/6/23 12:37, David Blaikie via Gdb wrote: > > I've been working on reducing debug info size, especially for > > expression-template heavy code like in Tensorflow and Eigen. > > > > To that end, I've implemented the flag -gsimple-template-names in > > Clang that simplifies the DW_AT_name for templates by omitting > > template parameters (eg: "vector" instead of "vector > std::allocator>") when the DW_TAG_template_*_parameters contain > > sufficient information to reconstruct the original name. > > This reduces the length of strings, and more importantly it allows more > sharing, is that it? Yep. Especially for complex expression templates - the fully expanded template names can be massive (some examples I had involve 50k character names... ) - they don't have the benefits of mangled names that include redundancy elimination/deduplication at least within a single name (admittedly the mangled names for these things are massive too - and at some point I might start some discussions about alternative mangling/symbol naming schemes, for those who can diverge from the Itanium ABI) > It doesn't change the layout of the DIE tree? Correct... well, in one case it does change the DIE tree - template declarations. GCC and Clang produce template declarations without template parameter DIEs, eg: $ cat > test.cpp template struct t1 { }; t1 *v; $ clang++-tot test.cpp -g -c && llvm-dwarfdump-tot test.o ... 0x0000002e: DW_TAG_structure_type DW_AT_name ("t1") DW_AT_declaration (true) 0x00000030: NULL So if we dropped the `` from that name, there'd be no hope to connect the declaration with the right definition. So when using -gsimple-template-names, we enable template parameter DIEs on declarations. (Sony implemented this previously, presumably because their debugger needed it for something like this - but it wasn't enabled by defaulte because GDB and LLDB didn't seem to need these extra DIEs so it wasn't worth the extra size - but maybe it's not so much to be a real problem/maybe we should switch that on by default at some point if we're going further in this direction). > > This generally seems to work in GDB - that looks intentional (perhaps > > because someone else implemented this feature elsewhere, or just for > > canonicalization reasons (the full string with template parameters > > might have different whitespace, parentheses, other formatting)), it > > seems unlikely that GDB would accidentally be able to connect two > > "vector" declarations up to the right "vector" definitions based on > > the template parameters without intentional code to support this > > scenario. > > Probably this code: > > https://gitlab.com/gnutools/binutils-gdb/-/blob/1b9af5b949bff0c750ededb459400c1857fec416/gdb/dwarf2/read.c#L9002 Something like that. Though the comment here is interesting: https://gitlab.com/gnutools/binutils-gdb/-/blob/1b9af5b949bff0c750ededb459400c1857fec416/gdb/dwarf2/read.c#L8944 /* Template parameters may be specified in the DIE's DW_AT_name, or as children with DW_TAG_template_type_param or DW_TAG_value_type_param. If the latter, add them to the name here. If the name already has template parameters, then skip this step; some versions of GCC emit both, and it is more efficient to use the pre-computed name. At least usually GCC and Clang produce both (DW_AT_name with parameters, and DIEs for the parameters) for definitions, and name-only, no DIEs for declarations. I wonder which compiler has previously produced simplified names - it'd be nice as a proof-of-concept/existence proof/prior art for the new Clang feature. It's also interesting that it says to use the name if it's already got them - that suggests no canonicalization, but I'm pretty sure GDB does do some canonicalization to allow fuzzy matching of decls/defs between Clang and GCC (& whatever other compilers) which might have minor differences between how they print template parameters. But yeah, that code that specifically says "if there's no '<' in the name, go and rebuild the name" is very explicit support for the -gsimple-template-names naming... someone must've done this before. > > But one place I found a problem was in pretty printers. I have > > situations where a type declaration isn't resolved to a definition > > when working within a pretty printer (resulting in the pretty printer > > being unable to find member variables in the object/of that type) - > > but if I print out the variable/member it works correctly - or if I > > "list" the source of the file where the definition of the type is, > > then the pretty printer works correctly. > > > > Does this ring a bell for anyone/sound like something sufficient to > > file a bug, or would I need to dig in further to create an isolated > > reproduction to help communicate the issue I'm seeing? > > I have no idea about what happens without digging into the code. I just > remember some bug where GDB would generate the templated name with let's > say "<5u>" for an integer value template parameter, so if you entered > "<5>", it didn't match textually. Maybe something like that is > happening. > > I also found: > > https://sourceware.org/bugzilla/show_bug.cgi?id=17762 Yeah, I certainly hit a few cases like that when testing -gsimple-template-names canonicalization/roundtripping to make sure it wasn't lossy. > There's the possibility that the "partial symbol" and "expanded symbol" > don't have the same name. So you could see a difference in behavior > depending on whether the CU has been expanded by GDB or not (I'm just > speculating). > > Is it valid DWARF (5) for DW_AT_name of a templated struct instantiation > to omit the template parameters? I don't see DWARF mandating one or the > other, so I assume that both including them or not are valid. Yeah, this is a case where DWARF is like "here are some tools you could use to express some language features, have at!" and doesn't say "to describe this particular language feature you must use DWARF in this particular way" > If so, I > think we can consider it a GDB bug if it doesn't process the "without" > version correctly. Feel free to file a bug with the relevant background > information. It would be useful to include some binaries, one compiled > with -gsimple-tempalte-names and one without, so it's easy to compare > the two. Yeah, for now I'm only seeing the problem with pretty printers, so to provide repro steps will be a bit more involved. (I've also got a crash that seems to happen when using the feature without -gsplit-dwarf (it's probably reproducable with -gsplit-dwarf too, but we enable -Wl,-gdb-index in that case and so GDB doesn't parse as much DWARF on startup - but if we tickled it into loading more DWARF it'd probably hit the same crash, I'd wager)) I'll see what I can do.