[PATCH] Allow DW_ATE_UTF for Rust characters

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

* [PATCH] Allow DW_ATE_UTF for Rust characters
@ 2021-10-31 17:17 Tom Tromey
  2021-11-10 19:57 ` Tom Tromey
  2021-11-21 13:33 ` Joel Brobecker
  0 siblings, 2 replies; 5+ messages in thread
From: Tom Tromey @ 2021-10-31 17:17 UTC (permalink / raw)
  To: gdb-patches; +Cc: Tom Tromey

The Rust compiler plans to change the encoding of a Rust 'char' type
to use DW_ATE_UTF.  You can see the discussion here:

    https://github.com/rust-lang/rust/pull/89887

However, this fails in gdb.  I looked into this, and it turns out that
the handling of DW_ATE_UTF is currently fairly specific to C++.  In
particular, the code here assumes the C++ type names, and it creates
an integer type.

This comes from commit 53e710acd ("GDB thinks char16_t and char32_t
are signed in C++").  The message says:

    Both places need fixing.  But since I couldn't tell why dwarf2read.c
    needs to create a new type, I've made it use the per-arch built-in
    types instead, so that the types are only created once per arch
    instead of once per objfile.  That seems to work fine.

... which is fine, but it seems to me that it's also correct to make a
new character type; and this approach is better because it preserves
the type name as well.  This does use more memory, but first we
shouldn't be too concerned about the memory use of types coming from
debuginfo; and second, if we are, we should implement type interning
anyway.

Changing this code to use a character type revealed a couple of
oddities in the C/C++ handling of TYPE_CODE_CHAR.  This patch fixes
these as well.
---
 gdb/c-lang.c                          |  2 +-
 gdb/c-valprint.c                      |  2 +-
 gdb/dwarf2/read.c                     | 15 ++----
 gdb/testsuite/gdb.dwarf2/utf-rust.exp | 69 +++++++++++++++++++++++++++
 4 files changed, 75 insertions(+), 13 deletions(-)
 create mode 100644 gdb/testsuite/gdb.dwarf2/utf-rust.exp

diff --git a/gdb/c-lang.c b/gdb/c-lang.c
index 2a7dd4dd194..6c6d1603d46 100644
--- a/gdb/c-lang.c
+++ b/gdb/c-lang.c
@@ -88,7 +88,7 @@ classify_type (struct type *elttype, struct gdbarch *gdbarch,
     {
       const char *name = elttype->name ();
 
-      if (elttype->code () == TYPE_CODE_CHAR || !name)
+      if (name == nullptr)
 	{
 	  result = C_CHAR;
 	  goto done;
diff --git a/gdb/c-valprint.c b/gdb/c-valprint.c
index daf24538f95..feca0a7b227 100644
--- a/gdb/c-valprint.c
+++ b/gdb/c-valprint.c
@@ -438,6 +438,7 @@ c_value_print_inner (struct value *val, struct ui_file *stream, int recurse,
       c_value_print_struct (val, stream, recurse, options);
       break;
 
+    case TYPE_CODE_CHAR:
     case TYPE_CODE_INT:
       c_value_print_int (val, stream, options);
       break;
@@ -458,7 +459,6 @@ c_value_print_inner (struct value *val, struct ui_file *stream, int recurse,
     case TYPE_CODE_ERROR:
     case TYPE_CODE_UNDEF:
     case TYPE_CODE_COMPLEX:
-    case TYPE_CODE_CHAR:
     default:
       generic_value_print (val, stream, recurse, options, &c_decorations);
       break;
diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index 48fb55c308c..ae56724e44b 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -18256,16 +18256,7 @@ read_base_type (struct die_info *die, struct dwarf2_cu *cu)
 	break;
       case DW_ATE_UTF:
 	{
-	  if (bits == 16)
-	    type = builtin_type (arch)->builtin_char16;
-	  else if (bits == 32)
-	    type = builtin_type (arch)->builtin_char32;
-	  else
-	    {
-	      complaint (_("unsupported DW_ATE_UTF bit size: '%d'"),
-			 bits);
-	      type = dwarf2_init_integer_type (cu, objfile, bits, 1, name);
-	    }
+	  type = init_character_type (objfile, bits, 1, name);
 	  return set_die_type (die, type, cu);
 	}
 	break;
@@ -18285,7 +18276,9 @@ read_base_type (struct die_info *die, struct dwarf2_cu *cu)
 	break;
     }
 
-  if (name && strcmp (name, "char") == 0)
+  if (type->code () == TYPE_CODE_INT
+      && name != nullptr
+      && strcmp (name, "char") == 0)
     type->set_has_no_signedness (true);
 
   maybe_set_alignment (cu, die, type);
diff --git a/gdb/testsuite/gdb.dwarf2/utf-rust.exp b/gdb/testsuite/gdb.dwarf2/utf-rust.exp
new file mode 100644
index 00000000000..3a2d944dd6e
--- /dev/null
+++ b/gdb/testsuite/gdb.dwarf2/utf-rust.exp
@@ -0,0 +1,69 @@
+# Copyright 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test DW_ATE_UTF for Rust.
+
+load_lib dwarf.exp
+
+# This test can only be run on targets which support DWARF-2 and use
+# gas.
+if {![dwarf2_support]} {
+    return 0
+}
+
+standard_testfile main.c .S
+
+# Make some DWARF for the test.
+set asm_file [standard_output_file $srcfile2]
+Dwarf::assemble $asm_file {
+    upvar cu_lang cu_lang
+
+    declare_labels char_label
+
+    # Creating a CU with 4-byte addresses lets this test link on
+    # both 32- and 64-bit machines.
+    cu { addr_size 4 } {
+	compile_unit {
+	    {name file1.txt}
+	    {language @DW_LANG_Rust}
+	} {
+            char_label: DW_TAG_base_type {
+                {DW_AT_byte_size 4 DW_FORM_sdata}
+                {DW_AT_encoding @DW_ATE_UTF}
+                {DW_AT_name char}
+            }
+
+	    DW_TAG_variable {
+		{name cvalue}
+		{type :$char_label}
+		{const_value 97 DW_FORM_udata}
+	    }
+	}
+    }
+}
+
+if {[prepare_for_testing "failed to prepare" ${testfile} \
+	 [list $srcfile $asm_file] debug]} {
+    return -1
+}
+
+if {![runto main]} {
+    return -1
+}
+
+gdb_test "set language rust" \
+    "Warning: the current language does not match this frame."
+# Get the values into history so we can use it from Rust.
+gdb_test "print cvalue" "\\\$1 = 97 'a'"
-- 
2.31.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Allow DW_ATE_UTF for Rust characters
  2021-10-31 17:17 [PATCH] Allow DW_ATE_UTF for Rust characters Tom Tromey
@ 2021-11-10 19:57 ` Tom Tromey
  2021-11-21 13:33 ` Joel Brobecker
  1 sibling, 0 replies; 5+ messages in thread
From: Tom Tromey @ 2021-11-10 19:57 UTC (permalink / raw)
  To: Tom Tromey; +Cc: gdb-patches

>>>>> "Tom" == Tom Tromey <tom@tromey.com> writes:

Tom> The Rust compiler plans to change the encoding of a Rust 'char' type
Tom> to use DW_ATE_UTF.  You can see the discussion here:

Tom>     https://github.com/rust-lang/rust/pull/89887

[...]

I plan to land this relatively soon, but I wanted to note that I would
also like to backport this to gdb 11.  The Rust compiler developers are
discussing whether to land this change to the compiler, and I think it
would be helpful if a fix were in the next gdb release.  I will file a
bugzilla ticket for this, per protocol; please take a look at the patch
and let me know what you think.

Tom

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Allow DW_ATE_UTF for Rust characters
  2021-10-31 17:17 [PATCH] Allow DW_ATE_UTF for Rust characters Tom Tromey
  2021-11-10 19:57 ` Tom Tromey
@ 2021-11-21 13:33 ` Joel Brobecker
  2021-11-29 18:04   ` Tom Tromey
  1 sibling, 1 reply; 5+ messages in thread
From: Joel Brobecker @ 2021-11-21 13:33 UTC (permalink / raw)
  To: Tom Tromey; +Cc: gdb-patches, Joel Brobecker

Hi Tom,

On Sun, Oct 31, 2021 at 11:17:44AM -0600, Tom Tromey wrote:
> The Rust compiler plans to change the encoding of a Rust 'char' type
> to use DW_ATE_UTF.  You can see the discussion here:
> 
>     https://github.com/rust-lang/rust/pull/89887
> 
> However, this fails in gdb.  I looked into this, and it turns out that
> the handling of DW_ATE_UTF is currently fairly specific to C++.  In
> particular, the code here assumes the C++ type names, and it creates
> an integer type.
> 
> This comes from commit 53e710acd ("GDB thinks char16_t and char32_t
> are signed in C++").  The message says:
> 
>     Both places need fixing.  But since I couldn't tell why dwarf2read.c
>     needs to create a new type, I've made it use the per-arch built-in
>     types instead, so that the types are only created once per arch
>     instead of once per objfile.  That seems to work fine.
> 
> ... which is fine, but it seems to me that it's also correct to make a
> new character type; and this approach is better because it preserves
> the type name as well.  This does use more memory, but first we
> shouldn't be too concerned about the memory use of types coming from
> debuginfo; and second, if we are, we should implement type interning
> anyway.
> 
> Changing this code to use a character type revealed a couple of
> oddities in the C/C++ handling of TYPE_CODE_CHAR.  This patch fixes
> these as well.
> ---
>  gdb/c-lang.c                          |  2 +-
>  gdb/c-valprint.c                      |  2 +-
>  gdb/dwarf2/read.c                     | 15 ++----
>  gdb/testsuite/gdb.dwarf2/utf-rust.exp | 69 +++++++++++++++++++++++++++
>  4 files changed, 75 insertions(+), 13 deletions(-)
>  create mode 100644 gdb/testsuite/gdb.dwarf2/utf-rust.exp

I don't see any real problem with this patch, and in particular
the change in dwarf2/read.c looks good to me. But I couldn't
really grasp what the oddities you are referring to were, and
so I don't understand the changes in c-lang.c and c-valprint.c
Can you tell us more about them?

No problem for backporting this patch to gdb-11-branch once
all the changes are explained. But do remember to create a PR
for it. I looked at the bugzilla database, and couldn't find
one that matches.

A small question about your test as well:

> diff --git a/gdb/c-lang.c b/gdb/c-lang.c
> index 2a7dd4dd194..6c6d1603d46 100644
> --- a/gdb/c-lang.c
> +++ b/gdb/c-lang.c
> @@ -88,7 +88,7 @@ classify_type (struct type *elttype, struct gdbarch *gdbarch,
>      {
>        const char *name = elttype->name ();
>  
> -      if (elttype->code () == TYPE_CODE_CHAR || !name)
> +      if (name == nullptr)
>  	{
>  	  result = C_CHAR;
>  	  goto done;
> diff --git a/gdb/c-valprint.c b/gdb/c-valprint.c
> index daf24538f95..feca0a7b227 100644
> --- a/gdb/c-valprint.c
> +++ b/gdb/c-valprint.c
> @@ -438,6 +438,7 @@ c_value_print_inner (struct value *val, struct ui_file *stream, int recurse,
>        c_value_print_struct (val, stream, recurse, options);
>        break;
>  
> +    case TYPE_CODE_CHAR:
>      case TYPE_CODE_INT:
>        c_value_print_int (val, stream, options);
>        break;
> @@ -458,7 +459,6 @@ c_value_print_inner (struct value *val, struct ui_file *stream, int recurse,
>      case TYPE_CODE_ERROR:
>      case TYPE_CODE_UNDEF:
>      case TYPE_CODE_COMPLEX:
> -    case TYPE_CODE_CHAR:
>      default:
>        generic_value_print (val, stream, recurse, options, &c_decorations);
>        break;
> diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
> index 48fb55c308c..ae56724e44b 100644
> --- a/gdb/dwarf2/read.c
> +++ b/gdb/dwarf2/read.c
> @@ -18256,16 +18256,7 @@ read_base_type (struct die_info *die, struct dwarf2_cu *cu)
>  	break;
>        case DW_ATE_UTF:
>  	{
> -	  if (bits == 16)
> -	    type = builtin_type (arch)->builtin_char16;
> -	  else if (bits == 32)
> -	    type = builtin_type (arch)->builtin_char32;
> -	  else
> -	    {
> -	      complaint (_("unsupported DW_ATE_UTF bit size: '%d'"),
> -			 bits);
> -	      type = dwarf2_init_integer_type (cu, objfile, bits, 1, name);
> -	    }
> +	  type = init_character_type (objfile, bits, 1, name);
>  	  return set_die_type (die, type, cu);
>  	}
>  	break;
> @@ -18285,7 +18276,9 @@ read_base_type (struct die_info *die, struct dwarf2_cu *cu)
>  	break;
>      }
>  
> -  if (name && strcmp (name, "char") == 0)
> +  if (type->code () == TYPE_CODE_INT
> +      && name != nullptr
> +      && strcmp (name, "char") == 0)
>      type->set_has_no_signedness (true);
>  
>    maybe_set_alignment (cu, die, type);
> diff --git a/gdb/testsuite/gdb.dwarf2/utf-rust.exp b/gdb/testsuite/gdb.dwarf2/utf-rust.exp
> new file mode 100644
> index 00000000000..3a2d944dd6e
> --- /dev/null
> +++ b/gdb/testsuite/gdb.dwarf2/utf-rust.exp
> @@ -0,0 +1,69 @@
> +# Copyright 2021 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# Test DW_ATE_UTF for Rust.
> +
> +load_lib dwarf.exp
> +
> +# This test can only be run on targets which support DWARF-2 and use
> +# gas.
> +if {![dwarf2_support]} {
> +    return 0
> +}
> +
> +standard_testfile main.c .S
> +
> +# Make some DWARF for the test.
> +set asm_file [standard_output_file $srcfile2]
> +Dwarf::assemble $asm_file {
> +    upvar cu_lang cu_lang
> +
> +    declare_labels char_label
> +
> +    # Creating a CU with 4-byte addresses lets this test link on
> +    # both 32- and 64-bit machines.
> +    cu { addr_size 4 } {
> +	compile_unit {
> +	    {name file1.txt}
> +	    {language @DW_LANG_Rust}
> +	} {
> +            char_label: DW_TAG_base_type {
> +                {DW_AT_byte_size 4 DW_FORM_sdata}
> +                {DW_AT_encoding @DW_ATE_UTF}
> +                {DW_AT_name char}
> +            }
> +
> +	    DW_TAG_variable {
> +		{name cvalue}
> +		{type :$char_label}
> +		{const_value 97 DW_FORM_udata}

I'm wondering if there might be thinko here or not (not sure
I interpret the description correctly): In the base type DIE,
we say DW_FORM_sdata, but then for the value, we say
DW_FORM_udata. Should these two match, are they separate
entities?

> +	    }
> +	}
> +    }
> +}
> +
> +if {[prepare_for_testing "failed to prepare" ${testfile} \
> +	 [list $srcfile $asm_file] debug]} {
> +    return -1
> +}
> +
> +if {![runto main]} {
> +    return -1
> +}
> +
> +gdb_test "set language rust" \
> +    "Warning: the current language does not match this frame."
> +# Get the values into history so we can use it from Rust.
> +gdb_test "print cvalue" "\\\$1 = 97 'a'"
> -- 
> 2.31.1
> 

-- 
Joel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Allow DW_ATE_UTF for Rust characters
  2021-11-21 13:33 ` Joel Brobecker
@ 2021-11-29 18:04   ` Tom Tromey
  2021-12-04 11:23     ` Joel Brobecker
  0 siblings, 1 reply; 5+ messages in thread
From: Tom Tromey @ 2021-11-29 18:04 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: Tom Tromey, gdb-patches

>> Changing this code to use a character type revealed a couple of
>> oddities in the C/C++ handling of TYPE_CODE_CHAR.  This patch fixes
>> these as well.

Joel> I don't see any real problem with this patch, and in particular
Joel> the change in dwarf2/read.c looks good to me. But I couldn't
Joel> really grasp what the oddities you are referring to were, and
Joel> so I don't understand the changes in c-lang.c and c-valprint.c
Joel> Can you tell us more about them?

Sure thing, will do so inline, below.

>> @@ -88,7 +88,7 @@ classify_type (struct type *elttype, struct gdbarch *gdbarch,
>> {
>> const char *name = elttype->name ();
>> 
>> -      if (elttype->code () == TYPE_CODE_CHAR || !name)
>> +      if (name == nullptr)
>> {

Here the code assumes that any TYPE_CODE_CHAR type that reaches this
point must be the C "char" type.  That seems unwarranted, and also
causes a test failure if it is left out of the patch, because all
DW_ATE_UTF types will now end up as TYPE_CODE_CHAR.  Previously they
used TYPE_CODE_INT because they used one of these:

  builtin_type->builtin_char16
    = arch_integer_type (gdbarch, 16, 1, "char16_t");
  builtin_type->builtin_char32
    = arch_integer_type (gdbarch, 32, 1, "char32_t");

>> @@ -438,6 +438,7 @@ c_value_print_inner (struct value *val, struct ui_file *stream, int recurse,
>> c_value_print_struct (val, stream, recurse, options);
>> break;
>> 
>> +    case TYPE_CODE_CHAR:
>> case TYPE_CODE_INT:
>> c_value_print_int (val, stream, options);

This change just arranges for TYPE_CODE_CHAR to use the C-specific print
function (which already handles C character types).  Without this, the
change to use TYPE_CODE_CHAR would cause the code to use the generic
printing functions, which IIRC don't use the C charset decoding code.

>> +            char_label: DW_TAG_base_type {
>> +                {DW_AT_byte_size 4 DW_FORM_sdata}
>> +                {DW_AT_encoding @DW_ATE_UTF}
>> +                {DW_AT_name char}
>> +            }
>> +
>> +	    DW_TAG_variable {
>> +		{name cvalue}
>> +		{type :$char_label}
>> +		{const_value 97 DW_FORM_udata}

Joel> I'm wondering if there might be thinko here or not (not sure
Joel> I interpret the description correctly): In the base type DIE,
Joel> we say DW_FORM_sdata, but then for the value, we say
Joel> DW_FORM_udata. Should these two match, are they separate
Joel> entities?

They are separate.  The DW_FORM_sdata just describes the form of the
size value for the base type.  Either sdata or udata would be fine
there; this doesn't affect the signed-ness of the type itself, which is
what matters for the value of the variable.

Tom

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Allow DW_ATE_UTF for Rust characters
  2021-11-29 18:04   ` Tom Tromey
@ 2021-12-04 11:23     ` Joel Brobecker
  0 siblings, 0 replies; 5+ messages in thread
From: Joel Brobecker @ 2021-12-04 11:23 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Joel Brobecker, gdb-patches

Hi Tom,

> >> Changing this code to use a character type revealed a couple of
> >> oddities in the C/C++ handling of TYPE_CODE_CHAR.  This patch fixes
> >> these as well.
> 
> Joel> I don't see any real problem with this patch, and in particular
> Joel> the change in dwarf2/read.c looks good to me. But I couldn't
> Joel> really grasp what the oddities you are referring to were, and
> Joel> so I don't understand the changes in c-lang.c and c-valprint.c
> Joel> Can you tell us more about them?
> 
> Sure thing, will do so inline, below.

Thanks for the extra explanations. I understand the patch, now, and
it looks to me.

-- 
Joel

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-12-04 11:23 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-31 17:17 [PATCH] Allow DW_ATE_UTF for Rust characters Tom Tromey
2021-11-10 19:57 ` Tom Tromey
2021-11-21 13:33 ` Joel Brobecker
2021-11-29 18:04   ` Tom Tromey
2021-12-04 11:23     ` Joel Brobecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).