From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tromey@sourceware.org>
Received: by sourceware.org (Postfix, from userid 2126)
	id 199613858CDA; Thu,  1 Dec 2022 18:18:52 +0000 (GMT)
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
From: Tom Tromey <tromey@sourceware.org>
To: gdb-cvs@sourceware.org
Subject: [binutils-gdb] Add name canonicalization for C
X-Act-Checkin: binutils-gdb
X-Git-Author: Tom Tromey <tromey@adacore.com>
X-Git-Refname: refs/heads/master
X-Git-Oldrev: bed34ce7058b56d3a1e171de31df2a0a30afb8fd
X-Git-Newrev: 55fc1623f942fba10362cb199f9356d75ca5835b
Message-Id: <20221201181852.199613858CDA@sourceware.org>
Date: Thu,  1 Dec 2022 18:18:52 +0000 (GMT)
List-Id: <gdb-cvs.sourceware.org>

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=55fc1623f942fba10362cb199f9356d75ca5835b

commit 55fc1623f942fba10362cb199f9356d75ca5835b
Author: Tom Tromey <tromey@adacore.com>
Date:   Thu Nov 3 13:49:17 2022 -0600

    Add name canonicalization for C
    
    PR symtab/29105 shows a number of situations where symbol lookup can
    result in the expansion of too many CUs.
    
    What happens is that lookup_signed_typename will try to look up a type
    like "signed int".  In cooked_index_functions::expand_symtabs_matching,
    when looping over languages, the C++ case will canonicalize this type
    name to be "int" instead.  Then this method will proceed to expand
    every CU that has an entry for "int" -- i.e., nearly all of them.  A
    crucial component of this is that the caller, objfile::lookup_symbol,
    does not do this canonicalization, so when it tries to find the symbol
    for "signed int", it fails -- causing the loop to continue.
    
    This patch fixes the problem by introducing name canonicalization for
    C.  The idea here is that, by making C and C++ agree on the canonical
    name when a symbol name can have multiple spellings, we avoid the bad
    behavior in objfile::lookup_symbol (and any other such code -- I don't
    know if there is any).
    
    Unlike C++, C only has a few situations where canonicalization is
    needed.  And, in particular, due to the lack of overloading (thus
    avoiding any issues in linespec) and due to the way c-exp.y works, I
    think that no canonicalization is needed during symbol lookup -- only
    during symtab construction.  This explains why lookup_name_info is not
    touched.
    
    The stabs reader is modified on a "best effort" basis.
    
    The DWARF reader needed one small tweak in dwarf2_name to avoid a
    regression in dw2-unusual-field-names.exp.  I think this is adequately
    explained by the comment, but basically this is a scenario that should
    not occur in real code, only the gdb test suite.
    
    lookup_signed_typename is simplified.  It used to search for two
    different type names, but now gdb can search just for the canonical
    form.
    
    gdb.dwarf2/enum-type.exp needed a small tweak, because the
    canonicalizer turns "unsigned integer" into "unsigned int integer".
    It seems better here to use the correct C type name.
    
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29105
    Tested-by: Simon Marchi <simark@simark.ca>
    Reviewed-by: Andrew Burgess <aburgess@redhat.com>
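
To illustrate the gate used by c_canonicalize_name (see the gdb/c-lang.c
hunk), here is a minimal standalone sketch.  It is not gdb code:
toy_canonicalize is a hypothetical stand-in for cp_canonicalize_string
that knows only a handful of spellings, every table entry other than
"signed int" -> "int" is an assumption made for the example, and an
empty string plays the role of the nullptr "already canonical" result.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for gdb's cp_canonicalize_string.  It knows only
// the spellings in this table; an empty result means "already canonical"
// (the real function signals that by returning nullptr).
static std::string
toy_canonicalize (const std::string &name)
{
  // Illustrative entries only.  "signed int" -> "int" is the case named
  // in the commit message; the other rows are assumptions.
  static const std::map<std::string, std::string> table = {
    { "signed int", "int" },
    { "signed", "int" },
    { "unsigned", "unsigned int" },
    { "long int", "long" },
  };

  auto it = table.find (name);
  return it == table.end () ? std::string () : it->second;
}

// Mirrors the shape of c_canonicalize_name: only names that contain a
// space, or that are exactly "signed" or "unsigned", are handed to the
// canonicalizer.  Every other name is treated as already canonical.
std::string
c_canonicalize_name_sketch (const std::string &name)
{
  if (name.find (' ') != std::string::npos
      || name == "signed"
      || name == "unsigned")
    return toy_canonicalize (name);
  return std::string ();
}
```

Under these assumptions, c_canonicalize_name_sketch ("signed int")
yields "int", while a single-word name such as "int" or "size_t" skips
the canonicalizer entirely and yields the empty "no change" result.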

Diff:
---
 gdb/c-lang.c                           | 14 ++++++++++++++
 gdb/c-lang.h                           |  5 +++++
 gdb/dbxread.c                          | 13 +++++++++++++
 gdb/dwarf2/cooked-index.c              |  8 ++++++--
 gdb/dwarf2/read.c                      | 18 +++++++++++++++++-
 gdb/gdbtypes.c                         | 12 +++---------
 gdb/stabsread.c                        | 30 +++++++++++++++++++-----------
 gdb/testsuite/gdb.dwarf2/enum-type.exp |  6 +++---
 8 files changed, 80 insertions(+), 26 deletions(-)

diff --git a/gdb/c-lang.c b/gdb/c-lang.c
index e15541f8175..46c0da0ff79 100644
--- a/gdb/c-lang.c
+++ b/gdb/c-lang.c
@@ -727,6 +727,20 @@ c_is_string_type_p (struct type *type)
=20
 =0C
=20
+/* See c-lang.h.  */
+
+gdb::unique_xmalloc_ptr<char>
+c_canonicalize_name (const char *name)
+{
+  if (strchr (name, ' ') != nullptr
+      || streq (name, "signed")
+      || streq (name, "unsigned"))
+    return cp_canonicalize_string (name);
+  return nullptr;
+}
+
+=0C
+
 void
 c_language_arch_info (struct gdbarch *gdbarch,
 		      struct language_arch_info *lai)
diff --git a/gdb/c-lang.h b/gdb/c-lang.h
index 93515671d80..652f147f656 100644
--- a/gdb/c-lang.h
+++ b/gdb/c-lang.h
@@ -167,4 +167,9 @@ extern std::string cplus_compute_program (compile_instance *inst,
 					  const struct block *expr_block,
 					  CORE_ADDR expr_pc);
 
+/* Return the canonical form of the C symbol NAME.  If NAME is already
+   canonical, return nullptr.  */
+
+extern gdb::unique_xmalloc_ptr<char> c_canonicalize_name (const char *name);
+
 #endif /* !defined (C_LANG_H) */
diff --git a/gdb/dbxread.c b/gdb/dbxread.c
index b0047cf0e79..ae726bdfcc6 100644
--- a/gdb/dbxread.c
+++ b/gdb/dbxread.c
@@ -48,6 +48,7 @@
 #include "complaints.h"
 #include "cp-abi.h"
 #include "cp-support.h"
+#include "c-lang.h"
 #include "psympriv.h"
 #include "block.h"
 #include "aout/aout64.h"
@@ -1444,6 +1445,18 @@ read_dbx_symtab (minimal_symbol_reader &reader,
 					     new_name.get ());
 		}
 	    }
+	  else if (psymtab_language == language_c)
+	    {
+	      std::string name (namestring, p - namestring);
+	      gdb::unique_xmalloc_ptr<char> new_name
+		= c_canonicalize_name (name.c_str ());
+	      if (new_name != nullptr)
+		{
+		  sym_len = strlen (new_name.get ());
+		  sym_name = obstack_strdup (&objfile->objfile_obstack,
+					     new_name.get ());
+		}
+	    }
 
 	  if (sym_len == 0)
 	    {
diff --git a/gdb/dwarf2/cooked-index.c b/gdb/dwarf2/cooked-index.c
index a580d549d0d..0aa026c7779 100644
--- a/gdb/dwarf2/cooked-index.c
+++ b/gdb/dwarf2/cooked-index.c
@@ -21,6 +21,7 @@
 #include "dwarf2/cooked-index.h"
 #include "dwarf2/read.h"
 #include "cp-support.h"
+#include "c-lang.h"
 #include "ada-lang.h"
 #include "split-name.h"
 #include <algorithm>
@@ -210,14 +211,17 @@ cooked_index::do_finalize ()
 	      m_names.push_back (std::move (canon_name));
 	    }
 	}
-      else if (entry->per_cu->lang () == language_cplus)
+      else if (entry->per_cu->lang () == language_cplus
+	       || entry->per_cu->lang () == language_c)
 	{
 	  void **slot = htab_find_slot (seen_names.get (), entry,
 					INSERT);
 	  if (*slot == nullptr)
 	    {
 	      gdb::unique_xmalloc_ptr<char> canon_name
-		= cp_canonicalize_string (entry->name);
+		= (entry->per_cu->lang () == language_cplus
+		   ? cp_canonicalize_string (entry->name)
+		   : c_canonicalize_name (entry->name));
 	      if (canon_name == nullptr)
 		entry->canonical = entry->name;
 	      else
diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index aa13d42ad77..032e20af93a 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -22014,7 +22014,10 @@ static const char *
 dwarf2_canonicalize_name (const char *name, struct dwarf2_cu *cu,
 			  struct objfile *objfile)
 {
-  if (name && cu->lang () == language_cplus)
+  if (name == nullptr)
+    return name;
+
+  if (cu->lang () == language_cplus)
     {
       gdb::unique_xmalloc_ptr<char> canon_name
 	= cp_canonicalize_string (name);
@@ -22022,6 +22025,14 @@ dwarf2_canonicalize_name (const char *name, struct dwarf2_cu *cu,
       if (canon_name != nullptr)
 	name = objfile->intern (canon_name.get ());
     }
+  else if (cu->lang () == language_c)
+    {
+      gdb::unique_xmalloc_ptr<char> canon_name
+	= c_canonicalize_name (name);
+
+      if (canon_name != nullptr)
+	name = objfile->intern (canon_name.get ());
+    }
 
   return name;
 }
@@ -22050,6 +22061,11 @@ dwarf2_name (struct die_info *die, struct dwarf2_cu *cu)
 
   switch (die->tag)
     {
+      /* A member's name should not be canonicalized.  This is a bit
+	 of a hack, in that normally it should not be possible to run
+	 into this situation; however, the dw2-unusual-field-names.exp
+	 test creates custom DWARF that does.  */
+    case DW_TAG_member:
     case DW_TAG_compile_unit:
     case DW_TAG_partial_unit:
       /* Compilation units have a DW_AT_name that is a filename, not
diff --git a/gdb/gdbtypes.c b/gdb/gdbtypes.c
index 5e8a486d28f..2166257f71e 100644
--- a/gdb/gdbtypes.c
+++ b/gdb/gdbtypes.c
@@ -1729,15 +1729,9 @@ lookup_unsigned_typename (const struct language_defn *language,
 struct type *
 lookup_signed_typename (const struct language_defn *language, const char *name)
 {
-  struct type *t;
-  char *uns = (char *) alloca (strlen (name) + 8);
-
-  strcpy (uns, "signed ");
-  strcpy (uns + 7, name);
-  t = lookup_typename (language, uns, NULL, 1);
-  /* If we don't find "signed FOO" just try again with plain "FOO".  */
-  if (t != NULL)
-    return t;
+  /* In C and C++, "char" and "signed char" are distinct types.  */
+  if (streq (name, "char"))
+    name = "signed char";
   return lookup_typename (language, name, NULL, 0);
 }
 
diff --git a/gdb/stabsread.c b/gdb/stabsread.c
index 612443557b5..74d0885fa71 100644
--- a/gdb/stabsread.c
+++ b/gdb/stabsread.c
@@ -736,11 +736,13 @@ define_symbol (CORE_ADDR valu, const char *string, int desc, int type,
 
       if (sym->language () == language_cplus)
 	{
-	  char *name = (char *) alloca (p - string + 1);
-
-	  memcpy (name, string, p - string);
-	  name[p - string] = '\0';
-	  new_name = cp_canonicalize_string (name);
+	  std::string name (string, p - string);
+	  new_name = cp_canonicalize_string (name.c_str ());
+	}
+      else if (sym->language () == language_c)
+	{
+	  std::string name (string, p - string);
+	  new_name = c_canonicalize_name (name.c_str ());
 	}
       if (new_name != nullptr)
 	sym->compute_and_set_names (new_name.get (), true, objfile->per_bfd);
@@ -1592,12 +1594,18 @@ again:
 	  type_name = NULL;
 	  if (get_current_subfile ()->language == language_cplus)
 	    {
-	      char *name = (char *) alloca (p - *pp + 1);
-
-	      memcpy (name, *pp, p - *pp);
-	      name[p - *pp] = '\0';
-
-	      gdb::unique_xmalloc_ptr<char> new_name = cp_canonicalize_string (name);
+	      std::string name (*pp, p - *pp);
+	      gdb::unique_xmalloc_ptr<char> new_name
+		= cp_canonicalize_string (name.c_str ());
+	      if (new_name != nullptr)
+		type_name = obstack_strdup (&objfile->objfile_obstack,
+					    new_name.get ());
+	    }
+	  else if (get_current_subfile ()->language == language_c)
+	    {
+	      std::string name (*pp, p - *pp);
+	      gdb::unique_xmalloc_ptr<char> new_name
+		= c_canonicalize_name (name.c_str ());
 	      if (new_name != nullptr)
 		type_name = obstack_strdup (&objfile->objfile_obstack,
 					    new_name.get ());
diff --git a/gdb/testsuite/gdb.dwarf2/enum-type.exp b/gdb/testsuite/gdb.dwarf2/enum-type.exp
index ed8e3a35d69..983b415bfdb 100644
--- a/gdb/testsuite/gdb.dwarf2/enum-type.exp
+++ b/gdb/testsuite/gdb.dwarf2/enum-type.exp
@@ -37,13 +37,13 @@ Dwarf::assemble $asm_file {
             integer_label: DW_TAG_base_type {
                 {DW_AT_byte_size 4 DW_FORM_sdata}
                 {DW_AT_encoding  @DW_ATE_signed}
-                {DW_AT_name      integer}
+                {DW_AT_name      int}
             }
 
             uinteger_label: DW_TAG_base_type {
                 {DW_AT_byte_size 4 DW_FORM_sdata}
                 {DW_AT_encoding  @DW_ATE_unsigned}
-                {DW_AT_name      {unsigned integer}}
+		{DW_AT_name      {unsigned int}}
             }
 
 	    DW_TAG_enumeration_type {
@@ -79,5 +79,5 @@ gdb_test "print sizeof(enum E)" " = 4"
 gdb_test "ptype enum EU" "type = enum EU {TWO = 2}" \
     "ptype EU in enum C"
 gdb_test_no_output "set lang c++"
-gdb_test "ptype enum EU" "type = enum EU : unsigned integer {TWO = 2}" \
+gdb_test "ptype enum EU" "type = enum EU : unsigned int {TWO = 2}" \
     "ptype EU in C++"