From: Tom Tromey
To: Joel Brobecker
Cc: Tom Tromey, gdb-patches@sourceware.org
Subject: Re: [PATCH] Allow DW_ATE_UTF for Rust characters
Date: Mon, 29 Nov 2021 11:04:11 -0700
In-Reply-To: Joel Brobecker's message of "Sun, 21 Nov 2021 17:33:42 +0400"

>> Changing this code to use a character type revealed a couple of
>> oddities in the C/C++ handling of TYPE_CODE_CHAR. This patch fixes
>> these as well.

Joel> I don't see any real problem with this patch, and in particular
Joel> the change in dwarf2/read.c looks good to me. But I couldn't
Joel> really grasp what the oddities you are referring to were, and
Joel> so I don't understand the changes in c-lang.c and c-valprint.c.
Joel> Can you tell us more about them?

Sure thing, will do so inline, below.
>> @@ -88,7 +88,7 @@ classify_type (struct type *elttype, struct gdbarch *gdbarch,
>>      {
>>        const char *name = elttype->name ();
>>
>> -      if (elttype->code () == TYPE_CODE_CHAR || !name)
>> +      if (name == nullptr)
>>          {

Here the code assumes that any TYPE_CODE_CHAR type that reaches this
point must be the C "char" type. That seems unwarranted. It also
causes a test failure if this hunk is left out of the patch, because
all DW_ATE_UTF types will now end up as TYPE_CODE_CHAR. Previously
they used TYPE_CODE_INT, because they used one of these:

      builtin_type->builtin_char16 = arch_integer_type (gdbarch, 16, 1, "char16_t");
      builtin_type->builtin_char32 = arch_integer_type (gdbarch, 32, 1, "char32_t");

>> @@ -438,6 +438,7 @@ c_value_print_inner (struct value *val, struct ui_file *stream, int recurse,
>>        c_value_print_struct (val, stream, recurse, options);
>>        break;
>>
>> +    case TYPE_CODE_CHAR:
>>      case TYPE_CODE_INT:
>>        c_value_print_int (val, stream, options);

This change just arranges for TYPE_CODE_CHAR to use the C-specific
print function, which already handles C character types. Without it,
switching to TYPE_CODE_CHAR would send these values through the
generic printing functions, which IIRC don't use the C charset
decoding code.

>> +            char_label: DW_TAG_base_type {
>> +                {DW_AT_byte_size 4 DW_FORM_sdata}
>> +                {DW_AT_encoding @DW_ATE_UTF}
>> +                {DW_AT_name char}
>> +            }
>> +
>> +            DW_TAG_variable {
>> +                {name cvalue}
>> +                {type :$char_label}
>> +                {const_value 97 DW_FORM_udata}

Joel> I'm wondering if there might be a thinko here or not (not sure
Joel> I interpret the description correctly): in the base type DIE,
Joel> we say DW_FORM_sdata, but then for the value, we say
Joel> DW_FORM_udata. Should these two match, or are they separate
Joel> entities?

They are separate. The DW_FORM_sdata just describes the form of the
size value for the base type.
Either sdata or udata would be fine there; the form doesn't affect the
signedness of the type itself, which is what matters for the value of
the variable.

Tom