From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 9CD593858C3A for ; Thu, 1 Dec 2022 23:23:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 9CD593858C3A Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1669937033; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=dDEKew4kHd/DwypTmIq6RjI4kvDKY7SrQ6F0Tky7xpc=; b=CCLhKI1GxIL+PV2BbVt04r+JWPVzPT53u54EgI2U+OahEXW8/osRb/pxR4E3dKPaoyDalU cBneKiqKu5FjdK5IoCQHg+j2glnJ4/a8Xo7rDJodta3iNMlBcARqua7ylS6HlvhexXaw60 pdeuJEweEempWkwPaMdLr4+pbs6bHhQ= Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-463--SShdTAGPnuk6pn-oyNxcg-1; Thu, 01 Dec 2022 18:23:51 -0500 X-MC-Unique: -SShdTAGPnuk6pn-oyNxcg-1 Received: by mail-wm1-f70.google.com with SMTP id 8-20020a05600c228800b003d0376e42deso1243228wmf.4 for ; Thu, 01 Dec 2022 15:23:51 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=kQ7UVfM+Gc6n/vNSedwqArCLRYR1fS8bt1MBssDU/o0=; b=ioY6jwVYfOjqK4XUV1edO09rub6kSWr+eRbmwLDJMuYjSJZDkE7/uu94E8JYzqTqfs ny1jepX76QXpyrE8h4IkstWxc/xWXw+TbYv6nhkOwAWbecY2JLfisjRA4a9Wtf5X4c2J xgRCDmdVC8XOrHdxX9rXR5T9D6BZkgnmdG5hzIK1Q87NvikDHTP2ZC8drsK1xPyRywot nnx06n26aZJB9x81ZmWz3TRE4o7UL0O2jfn2KnrKJr6MyyMEUCv8z3CjIGNfvwud7rs8 yXFeq7Unz5TsD0jRnFhED0WfQU1svivPr0mPfz4H5RL4xExR/nWfgmD2UeQk1qzg0zD4 4IkA== X-Gm-Message-State: ANoB5pkrbAksbzRpdHxQnEDNH9iwoqpXqmuulqSCpS9apSP/dBBQevoG Ij57kRQ52tmhhtrYG4T5UCPAHfGs8SZADTxd8tGSwl7sD1r8gw0/lOjg5tP/1+ilQ+feGjNKrq7 gCyJwlLmgenvo0kjSZS8dLg== X-Received: by 2002:a05:600c:5388:b0:3cf:37b1:e581 with SMTP id hg8-20020a05600c538800b003cf37b1e581mr48560049wmb.96.1669937030388; Thu, 01 Dec 2022 15:23:50 -0800 (PST) X-Google-Smtp-Source: AA0mqf44rLqTnRvMfL8a3DxpHQY0pb/5niMmmliBwAdTQ+2mrdi2YBsM378BUbFW5yXokmB4AUJkmg== X-Received: by 2002:a05:600c:5388:b0:3cf:37b1:e581 with SMTP id hg8-20020a05600c538800b003cf37b1e581mr48560041wmb.96.1669937029968; Thu, 01 Dec 2022 15:23:49 -0800 (PST) Received: from localhost ([31.111.84.238]) by smtp.gmail.com with ESMTPSA id h130-20020a1c2188000000b003b4fdbb6319sm10366394wmh.21.2022.12.01.15.23.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 01 Dec 2022 15:23:49 -0800 (PST) Date: Thu, 1 Dec 2022 23:23:48 +0000 From: Andrew Burgess To: Tom Tromey Cc: gdb-patches@sourceware.org Subject: Re: [PATCH 3/3] Add name canonicalization for C Message-ID: <20221201232348.GD2415@redhat.com> References: <20221107162356.3175221-1-tromey@adacore.com> <20221107162356.3175221-4-tromey@adacore.com> MIME-Version: 1.0 In-Reply-To: <20221107162356.3175221-4-tromey@adacore.com> X-Operating-System: Linux/5.8.18-100.fc31.x86_64 (x86_64) X-Uptime: 23:22:02 up 35 days, 7:23, X-Editor: GNU Emacs [ http://www.gnu.org/software/emacs ] X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00,DKIM_INVALID,DKIM_SIGNED,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: * Tom Tromey via Gdb-patches [2022-11-07 09:23:56 -0700]: > PR symtab/29105 shows a number of situations where symbol lookup can > result in the expansion of too many CUs. > > What happens is that lookup_signed_typename will try to look up a type > like "signed int". In cooked_index_functions::expand_symtabs_matching, > when looping over languages, the C++ case will canonicalize this type > name to be "int" instead. Then this method will proceed to expand > every CU that has an entry for "int" -- i.e., nearly all of them. A > crucial component of this is that the caller, objfile::lookup_symbol, > does not do this canonicalization, so when it tries to find the symbol > for "signed int", it fails -- causing the loop to continue. > > This patch fixes the problem by introducing name canonicalization for > C. The idea here is that, by making C and C++ agree on the canonical > name when a symbol name can have multiple spellings, we avoid the bad > behavior in objfile::lookup_symbol (and any other such code -- I don't > know if there is any). > > Unlike C++, C only has a few situations where canonicalization is > needed. And, in particular, due to the lack of overloading (thus > avoiding any issues in linespec) and due to the way c-exp.y works, I > think that no canonicalization is needed during symbol lookup -- only > during symtab construction. This explains why lookup_name_info is not > touched. > > The stabs reader is modified on a "best effort" basis. > > The DWARF reader needed one small tweak in dwarf2_name to avoid a > regression in dw2-unusual-field-names.exp. I think this is adequately > explained by the comment, but basically this is a scenario that should > not occur in real code, only the gdb test suite. > > lookup_signed_typename is simplified. It used to search for two > different type names, but now gdb can search just for the canonical > form. > > gdb.dwarf2/enum-type.exp needed a small tweak, because the > canonicalizer turns "unsigned integer" into "unsigned int integer". > It seems better here to use the correct C type name. > > Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29105 > --- > gdb/c-lang.c | 14 ++++++++++++ > gdb/c-lang.h | 5 +++++ > gdb/dbxread.c | 13 +++++++++++ > gdb/dwarf2/cooked-index.c | 8 +++++-- > gdb/dwarf2/read.c | 18 +++++++++++++++- > gdb/gdbtypes.c | 12 +++-------- > gdb/stabsread.c | 30 ++++++++++++++++---------- > gdb/testsuite/gdb.dwarf2/enum-type.exp | 4 ++-- > 8 files changed, 79 insertions(+), 25 deletions(-) > > diff --git a/gdb/c-lang.c b/gdb/c-lang.c > index 36b4d1ae3dd..bfad7aeee60 100644 > --- a/gdb/c-lang.c > +++ b/gdb/c-lang.c > @@ -727,6 +727,20 @@ c_is_string_type_p (struct type *type) > > > > +/* See c-lang.h. */ > + > +gdb::unique_xmalloc_ptr > +c_canonicalize_name (const char *name) > +{ > + if (strchr (name, ' ') != nullptr > + || streq (name, "signed") > + || streq (name, "unsigned")) > + return cp_canonicalize_string (name); > + return nullptr; > +} > + > + > + > void > c_language_arch_info (struct gdbarch *gdbarch, > struct language_arch_info *lai) > diff --git a/gdb/c-lang.h b/gdb/c-lang.h > index 93515671d80..652f147f656 100644 > --- a/gdb/c-lang.h > +++ b/gdb/c-lang.h > @@ -167,4 +167,9 @@ extern std::string cplus_compute_program (compile_instance *inst, > const struct block *expr_block, > CORE_ADDR expr_pc); > > +/* Return the canonical form of the C symbol NAME. If NAME is already > + canonical, return nullptr. */ > + > +extern gdb::unique_xmalloc_ptr c_canonicalize_name (const char *name); > + > #endif /* !defined (C_LANG_H) */ > diff --git a/gdb/dbxread.c b/gdb/dbxread.c > index b0047cf0e79..ae726bdfcc6 100644 > --- a/gdb/dbxread.c > +++ b/gdb/dbxread.c > @@ -48,6 +48,7 @@ > #include "complaints.h" > #include "cp-abi.h" > #include "cp-support.h" > +#include "c-lang.h" > #include "psympriv.h" > #include "block.h" > #include "aout/aout64.h" > @@ -1444,6 +1445,18 @@ read_dbx_symtab (minimal_symbol_reader &reader, > new_name.get ()); > } > } > + else if (psymtab_language == language_c) > + { > + std::string name (namestring, p - namestring); > + gdb::unique_xmalloc_ptr new_name > + = c_canonicalize_name (name.c_str ()); > + if (new_name != nullptr) > + { > + sym_len = strlen (new_name.get ()); > + sym_name = obstack_strdup (&objfile->objfile_obstack, > + new_name.get ()); > + } > + } > > if (sym_len == 0) > { > diff --git a/gdb/dwarf2/cooked-index.c b/gdb/dwarf2/cooked-index.c > index a580d549d0d..0aa026c7779 100644 > --- a/gdb/dwarf2/cooked-index.c > +++ b/gdb/dwarf2/cooked-index.c > @@ -21,6 +21,7 @@ > #include "dwarf2/cooked-index.h" > #include "dwarf2/read.h" > #include "cp-support.h" > +#include "c-lang.h" > #include "ada-lang.h" > #include "split-name.h" > #include > @@ -210,14 +211,17 @@ cooked_index::do_finalize () > m_names.push_back (std::move (canon_name)); > } > } > - else if (entry->per_cu->lang () == language_cplus) > + else if (entry->per_cu->lang () == language_cplus > + || entry->per_cu->lang () == language_c) > { > void **slot = htab_find_slot (seen_names.get (), entry, > INSERT); > if (*slot == nullptr) > { > gdb::unique_xmalloc_ptr canon_name > - = cp_canonicalize_string (entry->name); > + = (entry->per_cu->lang () == language_cplus > + ? cp_canonicalize_string (entry->name) > + : c_canonicalize_name (entry->name)); > if (canon_name == nullptr) > entry->canonical = entry->name; > else > diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c > index 978dd4d0bb9..0826f907800 100644 > --- a/gdb/dwarf2/read.c > +++ b/gdb/dwarf2/read.c > @@ -22001,7 +22001,10 @@ static const char * > dwarf2_canonicalize_name (const char *name, struct dwarf2_cu *cu, > struct objfile *objfile) > { > - if (name && cu->lang () == language_cplus) > + if (name == nullptr) > + return name; > + > + if (cu->lang () == language_cplus) > { > gdb::unique_xmalloc_ptr canon_name > = cp_canonicalize_string (name); > @@ -22009,6 +22012,14 @@ dwarf2_canonicalize_name (const char *name, struct dwarf2_cu *cu, > if (canon_name != nullptr) > name = objfile->intern (canon_name.get ()); > } > + else if (cu->lang () == language_c) > + { > + gdb::unique_xmalloc_ptr canon_name > + = c_canonicalize_name (name); > + > + if (canon_name != nullptr) > + name = objfile->intern (canon_name.get ()); > + } > > return name; > } > @@ -22037,6 +22048,11 @@ dwarf2_name (struct die_info *die, struct dwarf2_cu *cu) > > switch (die->tag) > { > + /* A member's name should not be canonicalized. This is a bit > + of a hack, in that normally it should not be possible to run > + into this situation; however, the dw2-unusual-field-names.exp > + test creates custom DWARF that does. */ > + case DW_TAG_member: > case DW_TAG_compile_unit: > case DW_TAG_partial_unit: > /* Compilation units have a DW_AT_name that is a filename, not > diff --git a/gdb/gdbtypes.c b/gdb/gdbtypes.c > index a43d9265ad2..d6e8109a95c 100644 > --- a/gdb/gdbtypes.c > +++ b/gdb/gdbtypes.c > @@ -1721,15 +1721,9 @@ lookup_unsigned_typename (const struct language_defn *language, > struct type * > lookup_signed_typename (const struct language_defn *language, const char *name) > { > - struct type *t; > - char *uns = (char *) alloca (strlen (name) + 8); > - > - strcpy (uns, "signed "); > - strcpy (uns + 7, name); > - t = lookup_typename (language, uns, NULL, 1); > - /* If we don't find "signed FOO" just try again with plain "FOO". */ > - if (t != NULL) > - return t; > + /* In C and C++, "char" and "signed char" are distinct types. */ > + if (streq (name, "char")) > + name = "signed char"; I wondered why this "char" -> "signed char" conversion is done unconditionally for all languages, when the comment hints that the conversion only applies for C/C++? I guess I would have expected a language check here. Thanks, Andrew > return lookup_typename (language, name, NULL, 0); > } > > diff --git a/gdb/stabsread.c b/gdb/stabsread.c > index 612443557b5..74d0885fa71 100644 > --- a/gdb/stabsread.c > +++ b/gdb/stabsread.c > @@ -736,11 +736,13 @@ define_symbol (CORE_ADDR valu, const char *string, int desc, int type, > > if (sym->language () == language_cplus) > { > - char *name = (char *) alloca (p - string + 1); > - > - memcpy (name, string, p - string); > - name[p - string] = '\0'; > - new_name = cp_canonicalize_string (name); > + std::string name (string, p - string); > + new_name = cp_canonicalize_string (name.c_str ()); > + } > + else if (sym->language () == language_c) > + { > + std::string name (string, p - string); > + new_name = c_canonicalize_name (name.c_str ()); > } > if (new_name != nullptr) > sym->compute_and_set_names (new_name.get (), true, objfile->per_bfd); > @@ -1592,12 +1594,18 @@ read_type (const char **pp, struct objfile *objfile) > type_name = NULL; > if (get_current_subfile ()->language == language_cplus) > { > - char *name = (char *) alloca (p - *pp + 1); > - > - memcpy (name, *pp, p - *pp); > - name[p - *pp] = '\0'; > - > - gdb::unique_xmalloc_ptr new_name = cp_canonicalize_string (name); > + std::string name (*pp, p - *pp); > + gdb::unique_xmalloc_ptr new_name > + = cp_canonicalize_string (name.c_str ()); > + if (new_name != nullptr) > + type_name = obstack_strdup (&objfile->objfile_obstack, > + new_name.get ()); > + } > + else if (get_current_subfile ()->language == language_c) > + { > + std::string name (*pp, p - *pp); > + gdb::unique_xmalloc_ptr new_name > + = c_canonicalize_name (name.c_str ()); > if (new_name != nullptr) > type_name = obstack_strdup (&objfile->objfile_obstack, > new_name.get ()); > diff --git a/gdb/testsuite/gdb.dwarf2/enum-type.exp b/gdb/testsuite/gdb.dwarf2/enum-type.exp > index ed8e3a35d69..6ebaefa6fb1 100644 > --- a/gdb/testsuite/gdb.dwarf2/enum-type.exp > +++ b/gdb/testsuite/gdb.dwarf2/enum-type.exp > @@ -43,7 +43,7 @@ Dwarf::assemble $asm_file { > uinteger_label: DW_TAG_base_type { > {DW_AT_byte_size 4 DW_FORM_sdata} > {DW_AT_encoding @DW_ATE_unsigned} > - {DW_AT_name {unsigned integer}} > + {DW_AT_name {unsigned int}} > } > > DW_TAG_enumeration_type { > @@ -79,5 +79,5 @@ gdb_test "print sizeof(enum E)" " = 4" > gdb_test "ptype enum EU" "type = enum EU {TWO = 2}" \ > "ptype EU in enum C" > gdb_test_no_output "set lang c++" > -gdb_test "ptype enum EU" "type = enum EU : unsigned integer {TWO = 2}" \ > +gdb_test "ptype enum EU" "type = enum EU : unsigned int {TWO = 2}" \ > "ptype EU in C++" > -- > 2.34.3 >