From: Tom Tromey
To: Tom Tromey
Cc: Patrick Monnerat, Patrick Monnerat via Gdb-patches
Subject: Re: [PATCH] gdb: add UTF16/UTF32 target charsets in phony_iconv
Date: Sat, 15 Oct 2022 19:50:31 -0600
Message-ID: <878rlgedug.fsf@tromey.com>
In-Reply-To: <87zge3irph.fsf@tromey.com> (Tom Tromey's message of "Mon, 10 Oct 2022 10:11:38 -0600")
References: <20221002140010.106238-1-patrick@monnerat.net> <87k05bs8c5.fsf@tromey.com> <0a978271-3085-8bf3-f5fd-6a0b3f9f3ea2@monnerat.net> <874jwejgbb.fsf@tromey.com> <2f10efe4-1095-b620-ea1c-08cc047c45c4@monnerat.net> <87zge3irph.fsf@tromey.com>
List-Id: Gdb-patches mailing list

[ UTF-32 as intermediate encoding ]

Patrick> I nevertheless don't have any idea what is the amount of work required
Patrick> to change this.

Tom> I did most of it already.

After digging deeper into this, I think it can't work.

iswprint can take the locale into account, and without this, we can end
up in a situation where the host-wide-char (or in this case utf-32)
conversion to the host multibyte charset ends up having to emit
unnecessary escapes.

For example in wchar.exp, \242 ends up being emitted as '\242\0\0\0',
because the \242 can't be converted to ASCII.  I think this happens
because gdb_iswprint in the utf-32 formulation disagrees with the iconv
conversion.

However, I think there's a better way to fix all this.  It's very simple
and offhand I don't know why I didn't think of it before... the use of
wchar_t depends on knowing the encoding of wchar_t -- and I think we do
know this on mingw, as it's a form of UTF-16.

So, maybe all that's needed is a host-is-windows check in gdb_wchar.h,
combined with a suitable definition of INTERMEDIATE_ENCODING.

Can you try this?  Or if you'd prefer, I can send a patch for you to
try.

thanks,
Tom
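
For illustration only, here is a rough sketch of the kind of
host-is-windows check described above.  It is not the actual contents of
gdb_wchar.h: the function name sketch_intermediate_encoding is
hypothetical, the use of _WIN32 as the host test is an assumption, and
so is the choice of "UTF-16LE" (little-endian UTF-16) as the iconv name
on Windows hosts.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: return the iconv charset name that describes
   how the host stores wchar_t.  This only sketches the idea; the real
   header may structure the check quite differently.  */
const char *
sketch_intermediate_encoding (void)
{
#if defined (_WIN32)
  /* Assumption: on Windows hosts (including mingw) wchar_t is 16 bits
     wide and holds UTF-16 code units in little-endian order.  */
  assert (sizeof (wchar_t) == 2);
  return "UTF-16LE";
#else
  /* Elsewhere, let iconv resolve the host's own wchar_t encoding
     (assumes the iconv in use accepts the name "wchar_t").  */
  return "wchar_t";
#endif
}
```

The point of the sketch is just that the encoding name is fixed at
compile time from what is known about the host, so no UTF-32
intermediate step is involved and iswprint keeps operating on real
host wchar_t values.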