Date: Fri, 26 May 2023 16:56:24 +0300
Message-Id: <83pm6njiwn.fsf@gnu.org>
From: Eli Zaretskii
To: Tom de Vries
Cc: gdb-patches@sourceware.org, tom@tromey.com
In-Reply-To: <20230526132512.29496-1-tdevries@suse.de> (message from Tom de Vries via Gdb-patches on Fri, 26 May 2023 15:25:12 +0200)
Subject: Re: [PATCH] [gdb/tui] Handle unicode chars in prompt
References: <20230526132512.29496-1-tdevries@suse.de>

> Cc: Tom Tromey
> Date: Fri, 26 May 2023 15:25:12 +0200
> From: Tom de Vries via Gdb-patches
>
> +/* Return true if STRING starts with a multi-byte char.  Return the length of
> +   the multi-byte char in LEN, or 0 in case it's a multi-byte null char.
> +   Implementation based on _rl_read_mbchar.  */
> +
> +static bool
> +is_mb_char (const char *string, int &len)
> +{
> +  for (len = 1; len <= MB_CUR_MAX; len++)
> +    {
> +      size_t res;
> +
> +      {
> +        wchar_t wc;    <<<<<<<<<<<<<<<<<<<<<<<
> +        mbstate_t ps;
> +        memset (&ps, 0, sizeof (mbstate_t));
> +        res = mbrtowc (&wc, string, len, &ps);

The above assumes each call to mbrtowc produces only one wchar_t
value.  But that's non-portable: on MS-Windows wchar_t is a 16-bit
wide data type, and wchar_t "wide characters" are actually encoded in
UTF-16.  So characters beyond the BMP will yield 2 wchar_t values, not
one.

One additional caveat: "multibyte" != "UTF-8".  There's more than one
multibyte encoding, and the current locale could use some non-UTF-8
encoding instead.  For example, some encoding of the ISO-2022 family.
I'm not sure what this means for the issue at hand.

Yet another consideration is whether tui_puts_internal is used for
outputting text in the target charset, in which case you may have
problems with using mbrtowc, because AFAIK that supports only the
current locale's codeset.  If the target charset is different from the
locale's (basically, the host) charset, and we don't convert one to
the other before calling tui_puts_internal, mbrtowc will fail.

Yes, this is a mess.

Thanks.
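
For illustration only, and not taken from the patch or from GDB: a
minimal C++ sketch of calling mbrtowc once on the available bytes and
telling its documented return values apart.  The names
classify_mb_char, mb_status and mb_result are hypothetical.  The sketch
decodes in the current locale's codeset only, so it does nothing for
the target-charset problem described above, and it ignores the
resulting wchar_t value, so the UTF-16 question is left open as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

/* Hypothetical result type, not part of the patch under review.  */
enum class mb_status { null_char, complete, incomplete, invalid };

struct mb_result
{
  mb_status status;
  std::size_t len;   /* Bytes consumed; meaningful only for `complete'.  */
};

/* Classify the character at the start of S, looking at no more than N
   bytes.  Decoding uses the current locale's codeset, so bytes in any
   other charset may be reported as invalid.  */
static mb_result
classify_mb_char (const char *s, std::size_t n)
{
  assert (s != nullptr);

  /* Start from the initial conversion state on every call.  */
  std::mbstate_t ps;
  std::memset (&ps, 0, sizeof ps);

  wchar_t wc;   /* Value deliberately unused; only the length matters.  */
  std::size_t res = std::mbrtowc (&wc, s, n, &ps);

  if (res == static_cast<std::size_t> (-1))
    return { mb_status::invalid, 0 };      /* Encoding error.  */
  if (res == static_cast<std::size_t> (-2))
    return { mb_status::incomplete, 0 };   /* Needs more than N bytes.  */
  if (res == 0)
    return { mb_status::null_char, 0 };    /* Decoded the null char.  */
  return { mb_status::complete, res };
}
```

In the default "C" locale, classify_mb_char ("abc", 3) reports a
complete character of length 1.  A caller that wants real multibyte
decoding would first have to select a suitable locale with setlocale.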