From: Tom de Vries
To: Eli Zaretskii
Cc: gdb-patches@sourceware.org, tom@tromey.com
Date: Fri, 9 Jun 2023 11:34:28 +0200
Subject: Re: [PATCH] [gdb/tui] Handle unicode chars in prompt

On 5/26/23 15:56, Eli Zaretskii wrote:
>> Cc: Tom Tromey
>> Date: Fri, 26 May 2023 15:25:12 +0200
>> From: Tom de Vries via Gdb-patches
>>
>> +/* Return true if STRING starts with a multi-byte char. Return the length of
>> +   the multi-byte char in LEN, or 0 in case it's a multi-byte null char.
>> +   Implementation based on _rl_read_mbchar.  */
>> +
>> +static bool
>> +is_mb_char (const char *string, int &len)
>> +{
>> +  for (len = 1; len <= MB_CUR_MAX; len++)
>> +    {
>> +      size_t res;
>> +
>> +      {
>> +        wchar_t wc;  <<<<<<<<<<<<<<<<<<<<<<<
>> +        mbstate_t ps;
>> +        memset (&ps, 0, sizeof (mbstate_t));
>> +        res = mbrtowc (&wc, string, len, &ps);
>
> The above assumes each call to mbrtowc produces only one wchar_t
> value.
> But that's non-portable: on MS-Windows wchar_t is a 16-bit
> wide data type, and wchar_t "wide characters" are actually encoded in
> UTF-16.  So characters beyond the BMP will yield 2 wchar_t values, not
> one.
>

Hi Eli,

I see, thanks for pointing that out.

I've fixed this by using nullptr instead of &wc.

> One additional caveat: "multibyte" != "UTF-8".  There's more than one
> multibyte encoding, and the current locale could use some non-UTF-8
> encoding instead.  For example, some encoding of the ISO-2022 family.
>

I'm not sure what this means for the issue at hand.

AFAIU, interpreting the current locale and encoding correctly is up to
mbrtowc, so as long as it does that correctly I think there's no problem.

> Yet another consideration is whether tui_puts_internal is used for
> outputting text in the target charset, in which case you may have
> problems with using mbrtowc, because AFAIK that supports only the
> current locale's codeset.  If the target charset is different from the
> locale's (basically, the host) charset, and we don't convert one to
> the other before calling tui_puts_internal, mbrtowc will fail.
>

[ Addressed by Tom Tromey in this thread. ]

> Yes, this is a mess.
>

Indeed :)

V2 posted here (
https://sourceware.org/pipermail/gdb-patches/2023-June/200181.html ).

Thanks,
- Tom
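The fix mentioned in the reply (passing nullptr instead of &wc to mbrtowc) can be illustrated with a minimal C++ sketch. Only the first half of is_mb_char is quoted above, so the handling of mbrtowc's return values below, and the choice to set LEN to 1 on failure, are assumptions of this sketch rather than the code from the V2 patch. It keeps the contract of the quoted comment: LEN is set to the character's length in bytes, or to 0 for the null character.

```cpp
#include <cassert>
#include <clocale>   // setlocale, used by callers to select the locale
#include <cstdlib>   // MB_CUR_MAX
#include <cstring>   // memset
#include <cwchar>    // mbrtowc, mbstate_t, size_t

/* Sketch only.  Return true if STRING starts with a complete, valid
   character in the current locale.  Set LEN to its length in bytes, or
   to 0 if it is the null character.  On failure return false and set
   LEN to 1 (a choice made for this sketch, not taken from the patch).  */

static bool
is_mb_char (const char *string, int &len)
{
  assert (string != nullptr);

  for (len = 1; len <= static_cast<int> (MB_CUR_MAX); len++)
    {
      /* Start from the initial conversion state on every attempt, since
         each call re-reads STRING from its first byte.  */
      std::mbstate_t ps;
      std::memset (&ps, 0, sizeof (ps));

      /* Pass nullptr instead of &wc: only the number of bytes consumed
         is used, so no decoded wchar_t value is ever read back.  */
      std::size_t res = std::mbrtowc (nullptr, string, len, &ps);

      if (res == static_cast<std::size_t> (-2))
        continue;   /* Incomplete so far: try one more byte.  */
      if (res == static_cast<std::size_t> (-1))
        break;      /* Invalid sequence in the current locale.  */

      /* RES is the byte count, or 0 for the null character.  */
      len = static_cast<int> (res);
      return true;
    }

  len = 1;
  return false;
}
```

For example, after selecting a UTF-8 locale, is_mb_char ("\xc3\xa9", len) returns true with len == 2, because the first call (one byte) reports an incomplete sequence and the second call (two bytes) completes U+00E9.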