Re: [PATCH] Allow DW_ATE_UTF for Rust characters

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

From: Joel Brobecker <brobecker@adacore.com>
To: Tom Tromey <tom@tromey.com>
Cc: gdb-patches@sourceware.org, Joel Brobecker <brobecker@adacore.com>
Subject: Re: [PATCH] Allow DW_ATE_UTF for Rust characters
Date: Sun, 21 Nov 2021 17:33:42 +0400	[thread overview]
Message-ID: <YZpKtt3VpiIwkny7@adacore.com> (raw)
In-Reply-To: <20211031171744.1746609-1-tom@tromey.com>

Hi Tom,

On Sun, Oct 31, 2021 at 11:17:44AM -0600, Tom Tromey wrote:
> The Rust compiler plans to change the encoding of a Rust 'char' type
> to use DW_ATE_UTF.  You can see the discussion here:
> 
>     https://github.com/rust-lang/rust/pull/89887
> 
> However, this fails in gdb.  I looked into this, and it turns out that
> the handling of DW_ATE_UTF is currently fairly specific to C++.  In
> particular, the code here assumes the C++ type names, and it creates
> an integer type.
> 
> This comes from commit 53e710acd ("GDB thinks char16_t and char32_t
> are signed in C++").  The message says:
> 
>     Both places need fixing.  But since I couldn't tell why dwarf2read.c
>     needs to create a new type, I've made it use the per-arch built-in
>     types instead, so that the types are only created once per arch
>     instead of once per objfile.  That seems to work fine.
> 
> ... which is fine, but it seems to me that it's also correct to make a
> new character type; and this approach is better because it preserves
> the type name as well.  This does use more memory, but first we
> shouldn't be too concerned about the memory use of types coming from
> debuginfo; and second, if we are, we should implement type interning
> anyway.
> 
> Changing this code to use a character type revealed a couple of
> oddities in the C/C++ handling of TYPE_CODE_CHAR.  This patch fixes
> these as well.
> ---
>  gdb/c-lang.c                          |  2 +-
>  gdb/c-valprint.c                      |  2 +-
>  gdb/dwarf2/read.c                     | 15 ++----
>  gdb/testsuite/gdb.dwarf2/utf-rust.exp | 69 +++++++++++++++++++++++++++
>  4 files changed, 75 insertions(+), 13 deletions(-)
>  create mode 100644 gdb/testsuite/gdb.dwarf2/utf-rust.exp

I don't see any real problem with this patch, and in particular
the change in dwarf2/read.c looks good to me. But I couldn't
really grasp what the oddities you are referring to were, and
so I don't understand the changes in c-lang.c and c-valprint.c
Can you tell us more about them?

No problem for backporting this patch to gdb-11-branch once
all the changes are explained. But do remember to create a PR
for it. I looked at the bugzilla database, and couldn't find
one that matches.

A small question about your test as well:

> diff --git a/gdb/c-lang.c b/gdb/c-lang.c
> index 2a7dd4dd194..6c6d1603d46 100644
> --- a/gdb/c-lang.c
> +++ b/gdb/c-lang.c
> @@ -88,7 +88,7 @@ classify_type (struct type *elttype, struct gdbarch *gdbarch,
>      {
>        const char *name = elttype->name ();
>  
> -      if (elttype->code () == TYPE_CODE_CHAR || !name)
> +      if (name == nullptr)
>  	{
>  	  result = C_CHAR;
>  	  goto done;
> diff --git a/gdb/c-valprint.c b/gdb/c-valprint.c
> index daf24538f95..feca0a7b227 100644
> --- a/gdb/c-valprint.c
> +++ b/gdb/c-valprint.c
> @@ -438,6 +438,7 @@ c_value_print_inner (struct value *val, struct ui_file *stream, int recurse,
>        c_value_print_struct (val, stream, recurse, options);
>        break;
>  
> +    case TYPE_CODE_CHAR:
>      case TYPE_CODE_INT:
>        c_value_print_int (val, stream, options);
>        break;
> @@ -458,7 +459,6 @@ c_value_print_inner (struct value *val, struct ui_file *stream, int recurse,
>      case TYPE_CODE_ERROR:
>      case TYPE_CODE_UNDEF:
>      case TYPE_CODE_COMPLEX:
> -    case TYPE_CODE_CHAR:
>      default:
>        generic_value_print (val, stream, recurse, options, &c_decorations);
>        break;
> diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
> index 48fb55c308c..ae56724e44b 100644
> --- a/gdb/dwarf2/read.c
> +++ b/gdb/dwarf2/read.c
> @@ -18256,16 +18256,7 @@ read_base_type (struct die_info *die, struct dwarf2_cu *cu)
>  	break;
>        case DW_ATE_UTF:
>  	{
> -	  if (bits == 16)
> -	    type = builtin_type (arch)->builtin_char16;
> -	  else if (bits == 32)
> -	    type = builtin_type (arch)->builtin_char32;
> -	  else
> -	    {
> -	      complaint (_("unsupported DW_ATE_UTF bit size: '%d'"),
> -			 bits);
> -	      type = dwarf2_init_integer_type (cu, objfile, bits, 1, name);
> -	    }
> +	  type = init_character_type (objfile, bits, 1, name);
>  	  return set_die_type (die, type, cu);
>  	}
>  	break;
> @@ -18285,7 +18276,9 @@ read_base_type (struct die_info *die, struct dwarf2_cu *cu)
>  	break;
>      }
>  
> -  if (name && strcmp (name, "char") == 0)
> +  if (type->code () == TYPE_CODE_INT
> +      && name != nullptr
> +      && strcmp (name, "char") == 0)
>      type->set_has_no_signedness (true);
>  
>    maybe_set_alignment (cu, die, type);
> diff --git a/gdb/testsuite/gdb.dwarf2/utf-rust.exp b/gdb/testsuite/gdb.dwarf2/utf-rust.exp
> new file mode 100644
> index 00000000000..3a2d944dd6e
> --- /dev/null
> +++ b/gdb/testsuite/gdb.dwarf2/utf-rust.exp
> @@ -0,0 +1,69 @@
> +# Copyright 2021 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# Test DW_ATE_UTF for Rust.
> +
> +load_lib dwarf.exp
> +
> +# This test can only be run on targets which support DWARF-2 and use
> +# gas.
> +if {![dwarf2_support]} {
> +    return 0
> +}
> +
> +standard_testfile main.c .S
> +
> +# Make some DWARF for the test.
> +set asm_file [standard_output_file $srcfile2]
> +Dwarf::assemble $asm_file {
> +    upvar cu_lang cu_lang
> +
> +    declare_labels char_label
> +
> +    # Creating a CU with 4-byte addresses lets this test link on
> +    # both 32- and 64-bit machines.
> +    cu { addr_size 4 } {
> +	compile_unit {
> +	    {name file1.txt}
> +	    {language @DW_LANG_Rust}
> +	} {
> +            char_label: DW_TAG_base_type {
> +                {DW_AT_byte_size 4 DW_FORM_sdata}
> +                {DW_AT_encoding @DW_ATE_UTF}
> +                {DW_AT_name char}
> +            }
> +
> +	    DW_TAG_variable {
> +		{name cvalue}
> +		{type :$char_label}
> +		{const_value 97 DW_FORM_udata}

I'm wondering if there might be thinko here or not (not sure
I interpret the description correctly): In the base type DIE,
we say DW_FORM_sdata, but then for the value, we say
DW_FORM_udata. Should these two match, are they separate
entities?

> +	    }
> +	}
> +    }
> +}
> +
> +if {[prepare_for_testing "failed to prepare" ${testfile} \
> +	 [list $srcfile $asm_file] debug]} {
> +    return -1
> +}
> +
> +if {![runto main]} {
> +    return -1
> +}
> +
> +gdb_test "set language rust" \
> +    "Warning: the current language does not match this frame."
> +# Get the values into history so we can use it from Rust.
> +gdb_test "print cvalue" "\\\$1 = 97 'a'"
> -- 
> 2.31.1
> 

-- 
Joel

next prev parent reply	other threads:[~2021-11-21 13:33 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-31 17:17 Tom Tromey
2021-11-10 19:57 ` Tom Tromey
2021-11-21 13:33 ` Joel Brobecker [this message]
2021-11-29 18:04   ` Tom Tromey
2021-12-04 11:23     ` Joel Brobecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YZpKtt3VpiIwkny7@adacore.com \
    --to=brobecker@adacore.com \
    --cc=gdb-patches@sourceware.org \
    --cc=tom@tromey.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).