From: Mark Wielaard
To: Thomas Schwinge, Raiki Tamura, Jakub Jelinek, Philip Herron
Cc: gcc@gcc.gnu.org, gcc-rust@gcc.gnu.org, David Edelsohn, Arthur Cohen, Arsen Arsenović
Subject: Re: [GSoC] gccrs Unicode support
Date: Thu, 16 Mar 2023 13:58:57 +0100
In-Reply-To: <87lejxujso.fsf@euler.schwinge.homeip.net>

Hi,

On Thu, 2023-03-16 at 10:28 +0100, Thomas Schwinge wrote:
> I'm now also putting Mark Wielaard in CC; he once also started discussing
> this topic, "thinking of importing a couple of gnulib modules to help
> with UTF-8 processing [unless] other gcc frontends handle [these things]
> already in a way that might be
> reusable".  See the thread starting at
>
> "rust frontend and UTF-8/unicode processing/properties".

Thanks. BTW. I am not currently working on this. Note the responses in
the above thread by Ian and Jason who pointed out that some of the
requirements of the gccrs frontend might be covered in the go frontend
and libcpp, but not really in a reusable way.

One other thing you might want to coordinate on is NFC normalization
and Confusable Detection for identifiers.
https://unicode.org/reports/tr39/#Confusable_Detection
There has been some work on this by David Malcolm and Marek Polacek
https://developers.redhat.com/articles/2022/01/12/prevent-trojan-source-attacks-gcc-12
But that is on a slightly higher source level (not specific to
identifiers).

You might want to research whether NFC normalization of identifiers is
required to be done by the lexer or parser in Rust and how it interacts
with proc macros.

Cheers,

Mark
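
[Editor's illustration, not part of the original message.] To make the
distinction between NFC normalization and confusable detection concrete,
here is a minimal sketch using Python's standard unicodedata module. The
helper name nfc_identifier is hypothetical, and nothing here reflects how
gccrs, libcpp, or the Go frontend actually handle identifiers. It only
shows that NFC unifies different encodings of the same character, while
look-alike characters from different scripts stay distinct and would need
a separate check along the lines of UTS #39.

```python
import unicodedata


def nfc_identifier(name: str) -> str:
    """Return the NFC form of an identifier (hypothetical helper for illustration)."""
    return unicodedata.normalize("NFC", name)


# The same visible identifier spelled two ways:
# precomposed U+00E9 versus "e" followed by combining acute accent U+0301.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

# Compared as raw code point sequences, the two spellings differ.
same_before = precomposed == decomposed

# After NFC normalization they compare equal.
same_after = nfc_identifier(precomposed) == nfc_identifier(decomposed)

# NFC does not merge look-alikes from different scripts:
# Latin "a" (U+0061) and Cyrillic "а" (U+0430) remain different identifiers.
latin_a = "a"
cyrillic_a = "\u0430"
still_distinct = nfc_identifier(latin_a) != nfc_identifier(cyrillic_a)
```

Where such a step belongs (lexer, parser, or later) and how it interacts
with proc macros is exactly the open question raised above; the sketch
takes no position on it.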