Date: Sun, 21 Nov 2021 17:33:42 +0400
From: Joel Brobecker <brobecker@adacore.com>
To: Tom Tromey <tom@tromey.com>
Cc: gdb-patches@sourceware.org, Joel Brobecker <brobecker@adacore.com>
Subject: Re: [PATCH] Allow DW_ATE_UTF for Rust characters
Message-ID: <YZpKtt3VpiIwkny7@adacore.com>
References: <20211031171744.1746609-1-tom@tromey.com>
In-Reply-To: <20211031171744.1746609-1-tom@tromey.com>

Hi Tom,

On Sun, Oct 31, 2021 at 11:17:44AM -0600, Tom Tromey wrote:
> The Rust compiler plans to change the encoding of a Rust 'char' type
> to use DW_ATE_UTF.  You can see the discussion here:
> 
>     https://github.com/rust-lang/rust/pull/89887
> 
> However, this fails in gdb.  I looked into this, and it turns out that
> the handling of DW_ATE_UTF is currently fairly specific to C++.  In
> particular, the code here assumes the C++ type names, and it creates
> an integer type.
> 
> This comes from commit 53e710acd ("GDB thinks char16_t and char32_t
> are signed in C++").  The message says:
> 
>     Both places need fixing.  But since I couldn't tell why dwarf2read.c
>     needs to create a new type, I've made it use the per-arch built-in
>     types instead, so that the types are only created once per arch
>     instead of once per objfile.  That seems to work fine.
> 
> ... which is fine, but it seems to me that it's also correct to make a
> new character type; and this approach is better because it preserves
> the type name as well.  This does use more memory, but first we
> shouldn't be too concerned about the memory use of types coming from
> debuginfo; and second, if we are, we should implement type interning
> anyway.
> 
> Changing this code to use a character type revealed a couple of
> oddities in the C/C++ handling of TYPE_CODE_CHAR.  This patch fixes
> these as well.
> ---
>  gdb/c-lang.c                          |  2 +-
>  gdb/c-valprint.c                      |  2 +-
>  gdb/dwarf2/read.c                     | 15 ++----
>  gdb/testsuite/gdb.dwarf2/utf-rust.exp | 69 +++++++++++++++++++++++++++
>  4 files changed, 75 insertions(+), 13 deletions(-)
>  create mode 100644 gdb/testsuite/gdb.dwarf2/utf-rust.exp

I don't see any real problem with this patch, and in particular
the change in dwarf2/read.c looks good to me. But I couldn't
quite work out which oddities you are referring to, so I don't
understand the changes in c-lang.c and c-valprint.c.
Can you tell us more about them?

No problem for backporting this patch to gdb-11-branch once
all the changes are explained. But do remember to create a PR
for it. I looked at the Bugzilla database, and couldn't find
one that matches.

A small question about your test as well:

> diff --git a/gdb/c-lang.c b/gdb/c-lang.c
> index 2a7dd4dd194..6c6d1603d46 100644
> --- a/gdb/c-lang.c
> +++ b/gdb/c-lang.c
> @@ -88,7 +88,7 @@ classify_type (struct type *elttype, struct gdbarch *gdbarch,
>      {
>        const char *name = elttype->name ();
>  
> -      if (elttype->code () == TYPE_CODE_CHAR || !name)
> +      if (name == nullptr)
>  	{
>  	  result = C_CHAR;
>  	  goto done;
> diff --git a/gdb/c-valprint.c b/gdb/c-valprint.c
> index daf24538f95..feca0a7b227 100644
> --- a/gdb/c-valprint.c
> +++ b/gdb/c-valprint.c
> @@ -438,6 +438,7 @@ c_value_print_inner (struct value *val, struct ui_file *stream, int recurse,
>        c_value_print_struct (val, stream, recurse, options);
>        break;
>  
> +    case TYPE_CODE_CHAR:
>      case TYPE_CODE_INT:
>        c_value_print_int (val, stream, options);
>        break;
> @@ -458,7 +459,6 @@ c_value_print_inner (struct value *val, struct ui_file *stream, int recurse,
>      case TYPE_CODE_ERROR:
>      case TYPE_CODE_UNDEF:
>      case TYPE_CODE_COMPLEX:
> -    case TYPE_CODE_CHAR:
>      default:
>        generic_value_print (val, stream, recurse, options, &c_decorations);
>        break;
> diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
> index 48fb55c308c..ae56724e44b 100644
> --- a/gdb/dwarf2/read.c
> +++ b/gdb/dwarf2/read.c
> @@ -18256,16 +18256,7 @@ read_base_type (struct die_info *die, struct dwarf2_cu *cu)
>  	break;
>        case DW_ATE_UTF:
>  	{
> -	  if (bits == 16)
> -	    type = builtin_type (arch)->builtin_char16;
> -	  else if (bits == 32)
> -	    type = builtin_type (arch)->builtin_char32;
> -	  else
> -	    {
> -	      complaint (_("unsupported DW_ATE_UTF bit size: '%d'"),
> -			 bits);
> -	      type = dwarf2_init_integer_type (cu, objfile, bits, 1, name);
> -	    }
> +	  type = init_character_type (objfile, bits, 1, name);
>  	  return set_die_type (die, type, cu);
>  	}
>  	break;
> @@ -18285,7 +18276,9 @@ read_base_type (struct die_info *die, struct dwarf2_cu *cu)
>  	break;
>      }
>  
> -  if (name && strcmp (name, "char") == 0)
> +  if (type->code () == TYPE_CODE_INT
> +      && name != nullptr
> +      && strcmp (name, "char") == 0)
>      type->set_has_no_signedness (true);
>  
>    maybe_set_alignment (cu, die, type);
> diff --git a/gdb/testsuite/gdb.dwarf2/utf-rust.exp b/gdb/testsuite/gdb.dwarf2/utf-rust.exp
> new file mode 100644
> index 00000000000..3a2d944dd6e
> --- /dev/null
> +++ b/gdb/testsuite/gdb.dwarf2/utf-rust.exp
> @@ -0,0 +1,69 @@
> +# Copyright 2021 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# Test DW_ATE_UTF for Rust.
> +
> +load_lib dwarf.exp
> +
> +# This test can only be run on targets which support DWARF-2 and use
> +# gas.
> +if {![dwarf2_support]} {
> +    return 0
> +}
> +
> +standard_testfile main.c .S
> +
> +# Make some DWARF for the test.
> +set asm_file [standard_output_file $srcfile2]
> +Dwarf::assemble $asm_file {
> +    upvar cu_lang cu_lang
> +
> +    declare_labels char_label
> +
> +    # Creating a CU with 4-byte addresses lets this test link on
> +    # both 32- and 64-bit machines.
> +    cu { addr_size 4 } {
> +	compile_unit {
> +	    {name file1.txt}
> +	    {language @DW_LANG_Rust}
> +	} {
> +            char_label: DW_TAG_base_type {
> +                {DW_AT_byte_size 4 DW_FORM_sdata}
> +                {DW_AT_encoding @DW_ATE_UTF}
> +                {DW_AT_name char}
> +            }
> +
> +	    DW_TAG_variable {
> +		{name cvalue}
> +		{type :$char_label}
> +		{const_value 97 DW_FORM_udata}

I'm wondering whether there might be a thinko here (I'm not sure
I'm interpreting the description correctly): in the base type DIE,
we say DW_FORM_sdata, but then for the value, we say
DW_FORM_udata. Should these two match, or are they separate
entities?
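
For reference, here is a minimal sketch of how the two forms differ
on the wire, assuming the standard DWARF definitions (DW_FORM_sdata
is signed LEB128, DW_FORM_udata is unsigned LEB128). The helper names
are hypothetical and only for illustration; they are not GDB
functions.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration only; not part of GDB.

// Unsigned LEB128, the encoding used by DW_FORM_udata.
std::vector<uint8_t> encode_uleb128(uint64_t value) {
    std::vector<uint8_t> out;
    do {
        uint8_t byte = static_cast<uint8_t>(value & 0x7f);
        value >>= 7;
        if (value != 0)
            byte |= 0x80;  // continuation bit: more bytes follow
        out.push_back(byte);
    } while (value != 0);
    return out;
}

// Signed LEB128, the encoding used by DW_FORM_sdata.
std::vector<uint8_t> encode_sleb128(int64_t value) {
    std::vector<uint8_t> out;
    bool more = true;
    while (more) {
        uint8_t byte = static_cast<uint8_t>(value & 0x7f);
        value >>= 7;  // assumes an arithmetic right shift
        bool sign_bit = (byte & 0x40) != 0;
        // Stop once the remaining bits are pure sign extension.
        if ((value == 0 && !sign_bit) || (value == -1 && sign_bit))
            more = false;
        else
            byte |= 0x80;  // continuation bit: more bytes follow
        out.push_back(byte);
    }
    return out;
}
```

With these helpers, a byte size of 4 encodes as the single byte 0x04
under either form, while the value 97 is 0x61 as udata and the two
bytes 0xE1 0x00 as sdata. If that reading is right, each attribute
carries its own form, so the two would not need to match for the
debug info to be decodable.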

> +	    }
> +	}
> +    }
> +}
> +
> +if {[prepare_for_testing "failed to prepare" ${testfile} \
> +	 [list $srcfile $asm_file] debug]} {
> +    return -1
> +}
> +
> +if {![runto main]} {
> +    return -1
> +}
> +
> +gdb_test "set language rust" \
> +    "Warning: the current language does not match this frame."
> +# Get the values into history so we can use it from Rust.
> +gdb_test "print cvalue" "\\\$1 = 97 'a'"
> -- 
> 2.31.1
> 

-- 
Joel