From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from gateway31.websitewelcome.com (gateway31.websitewelcome.com [192.185.143.40])
    by sourceware.org (Postfix) with ESMTPS id 7B9603858422
    for ; Sun, 31 Oct 2021 17:17:50 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 7B9603858422
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=tromey.com
Authentication-Results: sourceware.org; spf=fail smtp.mailfrom=tromey.com
Received: from cm14.websitewelcome.com (cm14.websitewelcome.com [100.42.49.7])
    by gateway31.websitewelcome.com (Postfix) with ESMTP id 05AEF2C334
    for ; Sun, 31 Oct 2021 12:17:50 -0500 (CDT)
Received: from box5379.bluehost.com ([162.241.216.53])
    by cmsmtp with SMTP
    id hESrmqeSfIWzGhESrmz7Uc; Sun, 31 Oct 2021 12:17:50 -0500
X-Authority-Reason: nr=8
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
    d=tromey.com; s=default; h=Content-Transfer-Encoding:MIME-Version:Message-Id:Date:Subject:
    Cc:To:From:Sender:Reply-To:Content-Type:Content-ID:Content-Description:
    Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:
    In-Reply-To:References:List-Id:List-Help:List-Unsubscribe:List-Subscribe:
    List-Post:List-Owner:List-Archive;
    bh=QUlKiZdvJ8Wb+Z7dDDf7ZCUGgwERU6NgZ4l79KiqdT4=; b=eM4bAZxy9KLiBb4rn55gMosQ+4
    HBh1ciOSL1/rhG3kQJoFvZN0eld95Uv9BddBYS0GhblZ8BKWnr7FWmLXCMZiER7X/KMCxey4P+9ya
    yM08rOTWljSkAfPtLSKqEwwpg;
Received: from 75-166-134-234.hlrn.qwest.net ([75.166.134.234]:51952 helo=localhost.localdomain)
    by box5379.bluehost.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
    (Exim 4.94.2)
    (envelope-from )
    id 1mhESr-002fPb-P8; Sun, 31 Oct 2021 11:17:49 -0600
From: Tom Tromey
To: gdb-patches@sourceware.org
Cc: Tom Tromey
Subject: [PATCH] Allow DW_ATE_UTF for Rust characters
Date: Sun, 31 Oct 2021 11:17:44 -0600
Message-Id: <20211031171744.1746609-1-tom@tromey.com>
X-Mailer: git-send-email 2.31.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - box5379.bluehost.com
X-AntiAbuse: Original Domain - sourceware.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - tromey.com
X-BWhitelist: no
X-Source-IP: 75.166.134.234
X-Source-L: No
X-Exim-ID: 1mhESr-002fPb-P8
X-Source:
X-Source-Args:
X-Source-Dir:
X-Source-Sender: 75-166-134-234.hlrn.qwest.net (localhost.localdomain) [75.166.134.234]:51952
X-Source-Auth: tom+tromey.com
X-Email-Count: 1
X-Source-Cap: ZWx5bnJvYmk7ZWx5bnJvYmk7Ym94NTM3OS5ibHVlaG9zdC5jb20=
X-Local-Domain: yes
X-Spam-Status: No, score=-3031.6 required=5.0 tests=BAYES_00, DKIM_SIGNED,
    DKIM_VALID, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_SHORT, RCVD_IN_DNSWL_NONE,
    RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_NEUTRAL, TXREP autolearn=ham
    autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org
X-BeenThere: gdb-patches@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gdb-patches mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 31 Oct 2021 17:17:53 -0000

The Rust compiler plans to change the encoding of a Rust 'char' type
to use DW_ATE_UTF. You can see the discussion here:

    https://github.com/rust-lang/rust/pull/89887

However, this fails in gdb. I looked into this, and it turns out that
the handling of DW_ATE_UTF is currently fairly specific to C++. In
particular, the code here assumes the C++ type names, and it creates
an integer type.

This comes from commit 53e710acd ("GDB thinks char16_t and char32_t
are signed in C++"). The message says:

    Both places need fixing. But since I couldn't tell why dwarf2read.c
    needs to create a new type, I've made it use the per-arch built-in
    types instead, so that the types are only created once per arch
    instead of once per objfile. That seems to work fine.

...
which is fine, but it seems to me that it's also correct to make a new
character type; and this approach is better because it preserves the
type name as well. This does use more memory, but first we shouldn't
be too concerned about the memory use of types coming from debuginfo;
and second, if we are, we should implement type interning anyway.

Changing this code to use a character type revealed a couple of
oddities in the C/C++ handling of TYPE_CODE_CHAR. This patch fixes
these as well.
---
 gdb/c-lang.c                          |  2 +-
 gdb/c-valprint.c                      |  2 +-
 gdb/dwarf2/read.c                     | 15 ++----
 gdb/testsuite/gdb.dwarf2/utf-rust.exp | 69 +++++++++++++++++++++++++++
 4 files changed, 75 insertions(+), 13 deletions(-)
 create mode 100644 gdb/testsuite/gdb.dwarf2/utf-rust.exp

diff --git a/gdb/c-lang.c b/gdb/c-lang.c
index 2a7dd4dd194..6c6d1603d46 100644
--- a/gdb/c-lang.c
+++ b/gdb/c-lang.c
@@ -88,7 +88,7 @@ classify_type (struct type *elttype, struct gdbarch *gdbarch,
     {
       const char *name = elttype->name ();
 
-      if (elttype->code () == TYPE_CODE_CHAR || !name)
+      if (name == nullptr)
         {
           result = C_CHAR;
           goto done;
diff --git a/gdb/c-valprint.c b/gdb/c-valprint.c
index daf24538f95..feca0a7b227 100644
--- a/gdb/c-valprint.c
+++ b/gdb/c-valprint.c
@@ -438,6 +438,7 @@ c_value_print_inner (struct value *val, struct ui_file *stream, int recurse,
       c_value_print_struct (val, stream, recurse, options);
       break;
 
+    case TYPE_CODE_CHAR:
     case TYPE_CODE_INT:
       c_value_print_int (val, stream, options);
       break;
@@ -458,7 +459,6 @@ c_value_print_inner (struct value *val, struct ui_file *stream, int recurse,
     case TYPE_CODE_ERROR:
     case TYPE_CODE_UNDEF:
     case TYPE_CODE_COMPLEX:
-    case TYPE_CODE_CHAR:
     default:
       generic_value_print (val, stream, recurse, options, &c_decorations);
       break;
diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index 48fb55c308c..ae56724e44b 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -18256,16 +18256,7 @@ read_base_type (struct die_info *die, struct dwarf2_cu *cu)
         break;
       case DW_ATE_UTF:
         {
-          if (bits == 16)
-            type = builtin_type (arch)->builtin_char16;
-          else if (bits == 32)
-            type = builtin_type (arch)->builtin_char32;
-          else
-            {
-              complaint (_("unsupported DW_ATE_UTF bit size: '%d'"),
-                         bits);
-              type = dwarf2_init_integer_type (cu, objfile, bits, 1, name);
-            }
+          type = init_character_type (objfile, bits, 1, name);
           return set_die_type (die, type, cu);
         }
         break;
@@ -18285,7 +18276,9 @@ read_base_type (struct die_info *die, struct dwarf2_cu *cu)
         break;
     }
 
-  if (name && strcmp (name, "char") == 0)
+  if (type->code () == TYPE_CODE_INT
+      && name != nullptr
+      && strcmp (name, "char") == 0)
     type->set_has_no_signedness (true);
 
   maybe_set_alignment (cu, die, type);
diff --git a/gdb/testsuite/gdb.dwarf2/utf-rust.exp b/gdb/testsuite/gdb.dwarf2/utf-rust.exp
new file mode 100644
index 00000000000..3a2d944dd6e
--- /dev/null
+++ b/gdb/testsuite/gdb.dwarf2/utf-rust.exp
@@ -0,0 +1,69 @@
+# Copyright 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see .
+
+# Test DW_ATE_UTF for Rust.
+
+load_lib dwarf.exp
+
+# This test can only be run on targets which support DWARF-2 and use
+# gas.
+if {![dwarf2_support]} {
+    return 0
+}
+
+standard_testfile main.c .S
+
+# Make some DWARF for the test.
+set asm_file [standard_output_file $srcfile2]
+Dwarf::assemble $asm_file {
+    upvar cu_lang cu_lang
+
+    declare_labels char_label
+
+    # Creating a CU with 4-byte addresses lets this test link on
+    # both 32- and 64-bit machines.
+    cu { addr_size 4 } {
+        compile_unit {
+            {name file1.txt}
+            {language @DW_LANG_Rust}
+        } {
+            char_label: DW_TAG_base_type {
+                {DW_AT_byte_size 4 DW_FORM_sdata}
+                {DW_AT_encoding @DW_ATE_UTF}
+                {DW_AT_name char}
+            }
+
+            DW_TAG_variable {
+                {name cvalue}
+                {type :$char_label}
+                {const_value 97 DW_FORM_udata}
+            }
+        }
+    }
+}
+
+if {[prepare_for_testing "failed to prepare" ${testfile} \
+         [list $srcfile $asm_file] debug]} {
+    return -1
+}
+
+if {![runto main]} {
+    return -1
+}
+
+gdb_test "set language rust" \
+    "Warning: the current language does not match this frame."
+# Get the values into history so we can use it from Rust.
+gdb_test "print cvalue" "\\\$1 = 97 'a'"
-- 
2.31.1