From: Tom Tromey
To: Eli Zaretskii
Cc: Tom Tromey, gdb-patches@sourceware.org
Subject: Re: [PATCH] gdb: add UTF16/UTF32 target charsets in phony_iconv
Date: Mon, 17 Oct 2022 17:10:55 -0600
In-Reply-To: <837d105lr4.fsf@gnu.org> (Eli Zaretskii's message of "Sun, 16 Oct 2022 09:24:31 +0300")
Message-ID: <87v8oicagw.fsf@tromey.com>

>> However, I think there's a better way to fix all this. It's very simple
>> and offhand I don't know why I didn't think of it before...
>> the use of wchar_t depends on knowing the encoding of wchar_t -- and I
>> think we do know this on mingw, as it's a form of UTF-16.

Eli> Beware: wchar_t on MS-Windows _is_ indeed UTF-16, but that means a
Eli> single wchar_t character can only represent characters within the BMP;
Eli> anything beyond the BMP will need a 'wchar_t *' string whose length is
Eli> at least 2. So if the code converts single characters, on Windows it
Eli> can only do that with BMP codepoints.

Eli> (Ignore me if what I say makes no sense or is not useful: I wasn't
Eli> tracking this discussion.)

Yes, it is helpful. This may be an issue with this approach; we'd have
to make sure there is a test that covers this case. If there is a
problem, I think it's fixable with a bit of work in the
character-printing code, though.

thanks,
Tom
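
To illustrate Eli's point about the BMP, here is a minimal sketch in C++.
It is not GDB code: `utf16_units` and `encode_utf16` are hypothetical
names, and it uses `char16_t` instead of `wchar_t` so that it behaves the
same on every host. It assumes the input is a valid Unicode scalar value.
A code point at or above U+10000 needs two UTF-16 code units (a surrogate
pair), so it cannot fit in a single 16-bit `wchar_t`.

```cpp
#include <cstddef>

// Number of UTF-16 code units needed for one Unicode code point.
constexpr std::size_t utf16_units(char32_t cp)
{
  return cp < 0x10000 ? 1 : 2;
}

// Encode CP as UTF-16 into OUT (room for two units) and return the
// number of units written.  Assumption: CP is a valid scalar value,
// i.e. not itself a surrogate and not above U+10FFFF.
inline std::size_t encode_utf16(char32_t cp, char16_t out[2])
{
  if (cp < 0x10000)
    {
      // BMP code point: one unit is enough.
      out[0] = static_cast<char16_t>(cp);
      return 1;
    }

  // Beyond the BMP: split the 20-bit offset into a surrogate pair.
  cp -= 0x10000;
  out[0] = static_cast<char16_t>(0xD800 + (cp >> 10));    // high surrogate
  out[1] = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));  // low surrogate
  return 2;
}

// Compile-time check of the unit counts for a BMP and a non-BMP code point.
static_assert(utf16_units(U'A') == 1, "BMP code point fits in one unit");
static_assert(utf16_units(0x1F600) == 2, "non-BMP code point needs a pair");
```

A test for the case discussed above could feed a non-BMP code point such
as U+1F600 through the character-printing path and check that both
halves of the pair survive.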