Date: Sun, 21 Nov 2021 17:33:42 +0400
From: Joel Brobecker
To: Tom Tromey
Cc: gdb-patches@sourceware.org, Joel Brobecker
Subject: Re: [PATCH] Allow DW_ATE_UTF for Rust characters
References: <20211031171744.1746609-1-tom@tromey.com>
In-Reply-To: <20211031171744.1746609-1-tom@tromey.com>
List-Id: Gdb-patches mailing list

Hi Tom,

On Sun, Oct 31, 2021 at 11:17:44AM -0600, Tom Tromey wrote:
> The Rust compiler plans to change the encoding of a Rust 'char' type
> to use DW_ATE_UTF.  You can see the discussion here:
>
>     https://github.com/rust-lang/rust/pull/89887
>
> However, this fails in gdb.  I looked into this, and it turns out that
> the handling of DW_ATE_UTF is currently fairly specific to C++.  In
> particular, the code here assumes the C++ type names, and it creates
> an integer type.
>
> This comes from commit 53e710acd ("GDB thinks char16_t and char32_t
> are signed in C++").  The message says:
>
>     Both places need fixing.  But since I couldn't tell why dwarf2read.c
>     needs to create a new type, I've made it use the per-arch built-in
>     types instead, so that the types are only created once per arch
>     instead of once per objfile.  That seems to work fine.
>
> ... which is fine, but it seems to me that it's also correct to make a
> new character type; and this approach is better because it preserves
> the type name as well.  This does use more memory, but first we
> shouldn't be too concerned about the memory use of types coming from
> debuginfo; and second, if we are, we should implement type interning
> anyway.
>
> Changing this code to use a character type revealed a couple of
> oddities in the C/C++ handling of TYPE_CODE_CHAR.  This patch fixes
> these as well.
> ---
>  gdb/c-lang.c                          |  2 +-
>  gdb/c-valprint.c                      |  2 +-
>  gdb/dwarf2/read.c                     | 15 ++----
>  gdb/testsuite/gdb.dwarf2/utf-rust.exp | 69 +++++++++++++++++++++++++++
>  4 files changed, 75 insertions(+), 13 deletions(-)
>  create mode 100644 gdb/testsuite/gdb.dwarf2/utf-rust.exp

I don't see any real problem with this patch, and in particular
the change in dwarf2/read.c looks good to me. But I couldn't really
grasp what the oddities you are referring to were, and so I don't
understand the changes in c-lang.c and c-valprint.c. Can you tell us
more about them?

No problem for backporting this patch to gdb-11-branch once
all the changes are explained. But do remember to create a PR
for it. I looked at the bugzilla database, and couldn't find
one that matches.

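As a reading aid for the dwarf2/read.c change (not part of the patch), here is a minimal standalone sketch of how the DW_ATE_UTF branch appears to behave before and after, going only by the commit message and the quoted hunk. `toy_type`, `toy_code`, `old_utf_type` and `new_utf_type` are hypothetical names, not GDB APIs, and the sketch assumes the per-arch built-ins are integer types named `char16_t` and `char32_t`, as the commit message describes.

```cpp
#include <string>

// Toy stand-ins for GDB's type codes; not the real definitions.
enum class toy_code { integer, character };

struct toy_type
{
  toy_code code;
  int bits;
  bool is_unsigned;
  std::string name;
};

// Before the patch, as the commit message describes it: 16- and 32-bit
// UTF types map to shared built-ins carrying C++ names, and any other
// size becomes a fresh unsigned integer type (after a complaint).
toy_type
old_utf_type (int bits, const std::string &name)
{
  if (bits == 16)
    return { toy_code::integer, 16, true, "char16_t" };
  if (bits == 32)
    return { toy_code::integer, 32, true, "char32_t" };
  return { toy_code::integer, bits, true, name };
}

// After the patch: always a new unsigned character type that keeps
// the name found in the debug info.
toy_type
new_utf_type (int bits, const std::string &name)
{
  return { toy_code::character, bits, true, name };
}
```

For the Rust `char` DIE in the test (4 bytes, named `char`), `old_utf_type (32, "char")` yields an integer type named `char32_t`, while `new_utf_type (32, "char")` yields a character type still named `char`.
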
A small question about your test as well:

> diff --git a/gdb/c-lang.c b/gdb/c-lang.c
> index 2a7dd4dd194..6c6d1603d46 100644
> --- a/gdb/c-lang.c
> +++ b/gdb/c-lang.c
> @@ -88,7 +88,7 @@ classify_type (struct type *elttype, struct gdbarch *gdbarch,
>      {
>        const char *name = elttype->name ();
>  
> -      if (elttype->code () == TYPE_CODE_CHAR || !name)
> +      if (name == nullptr)
>          {
>            result = C_CHAR;
>            goto done;
> diff --git a/gdb/c-valprint.c b/gdb/c-valprint.c
> index daf24538f95..feca0a7b227 100644
> --- a/gdb/c-valprint.c
> +++ b/gdb/c-valprint.c
> @@ -438,6 +438,7 @@ c_value_print_inner (struct value *val, struct ui_file *stream, int recurse,
>        c_value_print_struct (val, stream, recurse, options);
>        break;
>  
> +    case TYPE_CODE_CHAR:
>      case TYPE_CODE_INT:
>        c_value_print_int (val, stream, options);
>        break;
> @@ -458,7 +459,6 @@ c_value_print_inner (struct value *val, struct ui_file *stream, int recurse,
>      case TYPE_CODE_ERROR:
>      case TYPE_CODE_UNDEF:
>      case TYPE_CODE_COMPLEX:
> -    case TYPE_CODE_CHAR:
>      default:
>        generic_value_print (val, stream, recurse, options, &c_decorations);
>        break;
> diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
> index 48fb55c308c..ae56724e44b 100644
> --- a/gdb/dwarf2/read.c
> +++ b/gdb/dwarf2/read.c
> @@ -18256,16 +18256,7 @@ read_base_type (struct die_info *die, struct dwarf2_cu *cu)
>          break;
>        case DW_ATE_UTF:
>          {
> -          if (bits == 16)
> -            type = builtin_type (arch)->builtin_char16;
> -          else if (bits == 32)
> -            type = builtin_type (arch)->builtin_char32;
> -          else
> -            {
> -              complaint (_("unsupported DW_ATE_UTF bit size: '%d'"),
> -                         bits);
> -              type = dwarf2_init_integer_type (cu, objfile, bits, 1, name);
> -            }
> +          type = init_character_type (objfile, bits, 1, name);
>            return set_die_type (die, type, cu);
>          }
>          break;
> @@ -18285,7 +18276,9 @@ read_base_type (struct die_info *die, struct dwarf2_cu *cu)
>        break;
>      }
>  
> -  if (name && strcmp (name, "char") == 0)
> +  if (type->code () == TYPE_CODE_INT
> +      && name != nullptr
> +      && strcmp (name, "char") == 0)
>      type->set_has_no_signedness (true);
>  
>    maybe_set_alignment (cu, die, type);
> diff --git a/gdb/testsuite/gdb.dwarf2/utf-rust.exp b/gdb/testsuite/gdb.dwarf2/utf-rust.exp
> new file mode 100644
> index 00000000000..3a2d944dd6e
> --- /dev/null
> +++ b/gdb/testsuite/gdb.dwarf2/utf-rust.exp
> @@ -0,0 +1,69 @@
> +# Copyright 2021 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see .
> +
> +# Test DW_ATE_UTF for Rust.
> +
> +load_lib dwarf.exp
> +
> +# This test can only be run on targets which support DWARF-2 and use
> +# gas.
> +if {![dwarf2_support]} {
> +    return 0
> +}
> +
> +standard_testfile main.c .S
> +
> +# Make some DWARF for the test.
> +set asm_file [standard_output_file $srcfile2]
> +Dwarf::assemble $asm_file {
> +    upvar cu_lang cu_lang
> +
> +    declare_labels char_label
> +
> +    # Creating a CU with 4-byte addresses lets this test link on
> +    # both 32- and 64-bit machines.
> +    cu { addr_size 4 } {
> +        compile_unit {
> +            {name file1.txt}
> +            {language @DW_LANG_Rust}
> +        } {
> +            char_label: DW_TAG_base_type {
> +                {DW_AT_byte_size 4 DW_FORM_sdata}
> +                {DW_AT_encoding @DW_ATE_UTF}
> +                {DW_AT_name char}
> +            }
> +
> +            DW_TAG_variable {
> +                {name cvalue}
> +                {type :$char_label}
> +                {const_value 97 DW_FORM_udata}

I'm wondering if there might be a thinko here or not (not sure
I interpret the description correctly): In the base type DIE,
we say DW_FORM_sdata, but then for the value, we say DW_FORM_udata.
Should these two match, or are they separate entities?

> +            }
> +        }
> +    }
> +}
> +
> +if {[prepare_for_testing "failed to prepare" ${testfile} \
> +         [list $srcfile $asm_file] debug]} {
> +    return -1
> +}
> +
> +if {![runto main]} {
> +    return -1
> +}
> +
> +gdb_test "set language rust" \
> +    "Warning: the current language does not match this frame."
> +# Get the values into history so we can use it from Rust.
> +gdb_test "print cvalue" "\\\$1 = 97 'a'"
> -- 
> 2.31.1
> 

-- 
Joel
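
On the DW_FORM_sdata / DW_FORM_udata question: in DWARF each attribute carries its own form, and the form only selects how that one attribute's constant is encoded (signed versus unsigned LEB128). Below is a minimal standalone sketch, assuming the standard LEB128 definitions. `encode_uleb128` and `encode_sleb128` are hypothetical helper names, not part of GDB or of the testsuite's Dwarf assembler. It illustrates the two encodings only and says nothing about which form the test intended.

```cpp
#include <cstdint>
#include <vector>

// Unsigned LEB128, the encoding used by DW_FORM_udata.
std::vector<std::uint8_t>
encode_uleb128 (std::uint64_t value)
{
  std::vector<std::uint8_t> out;
  do
    {
      std::uint8_t byte = value & 0x7f;
      value >>= 7;
      if (value != 0)
        byte |= 0x80;   // More groups follow.
      out.push_back (byte);
    }
  while (value != 0);
  return out;
}

// Signed LEB128, the encoding used by DW_FORM_sdata.
std::vector<std::uint8_t>
encode_sleb128 (std::int64_t value)
{
  std::vector<std::uint8_t> out;
  bool more = true;
  while (more)
    {
      std::uint8_t byte = value & 0x7f;
      value >>= 7;   // Assumes an arithmetic shift for negative values.
      bool sign_bit = (byte & 0x40) != 0;
      // Stop once the remaining bits are pure sign extension.
      if ((value == 0 && !sign_bit) || (value == -1 && sign_bit))
        more = false;
      else
        byte |= 0x80;
      out.push_back (byte);
    }
  return out;
}
```

For the values in the test, `encode_sleb128 (4)` gives the single byte 0x04 and `encode_uleb128 (97)` gives 0x61. Had 97 been emitted as sdata it would take two bytes, 0xe1 0x00, because bit 6 of a lone 0x61 would read back as a sign bit.
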