From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=MNdc=7I=klomp.org=mark@sourceware.org>
Received: from gnu.wildebeest.org (gnu.wildebeest.org [45.83.234.184])
	by sourceware.org (Postfix) with ESMTPS id 5459A3858D35;
	Thu, 16 Mar 2023 12:59:03 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 5459A3858D35
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=klomp.org
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=klomp.org
Received: from r6.localdomain (82-217-174-174.cable.dynamic.v4.ziggo.nl [82.217.174.174])
	(using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by gnu.wildebeest.org (Postfix) with ESMTPSA id 40C3230067C5;
	Thu, 16 Mar 2023 13:59:00 +0100 (CET)
Received: by r6.localdomain (Postfix, from userid 1000)
	id 82AC83401D9; Thu, 16 Mar 2023 13:58:57 +0100 (CET)
Message-ID: <a476e56f825e8570a3f885491f871469879305c6.camel@klomp.org>
Subject: Re: [GSoC] gccrs Unicode support
From: Mark Wielaard <mark@klomp.org>
To: Thomas Schwinge <thomas@codesourcery.com>, Raiki Tamura
	 <tamaron1203@gmail.com>, Jakub Jelinek <jakub@redhat.com>, Philip Herron
	 <herron.philip@googlemail.com>
Cc: gcc@gcc.gnu.org, gcc-rust@gcc.gnu.org, David Edelsohn
 <dje.gcc@gmail.com>,  Arthur Cohen <arthur.cohen@embecosm.com>, Arsen
 Arsenović <arsen@aarsen.me>
Date: Thu, 16 Mar 2023 13:58:57 +0100
In-Reply-To: <87lejxujso.fsf@euler.schwinge.homeip.net>
References: <YPQrMBHyu3wRpT5o@wildebeest.org>
	 <d5e7434b-80e8-2817-ed87-a23ef2ac0cbb@uma.es>
	 <CAOWUKr0Sd3RRSy2cuqMLj--KTWqOz=nQMxmx7ahM8YunrFzEig@mail.gmail.com>
	 <CAEvRbepBFGf00CCbxNtk0yBb1RntYM0k6j8TQ86rYb+UDqiMLg@mail.gmail.com>
	 <ZBHhutUTOl26A13z@tucnak> <87lejxujso.fsf@euler.schwinge.homeip.net>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
User-Agent: Evolution 3.46.4 (3.46.4-1.fc37) 
MIME-Version: 1.0
X-Spam-Status: No, score=-3029.3 required=5.0 tests=BAYES_00,JMQ_SPF_NEUTRAL,KAM_DMARC_STATUS,RCVD_IN_BARRACUDACENTRAL,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-rust.gcc.gnu.org>

Hi,

On Thu, 2023-03-16 at 10:28 +0100, Thomas Schwinge wrote:
> I'm now also putting Mark Wielaard in CC; he once also started discussing
> this topic, "thinking of importing a couple of gnulib modules to help
> with UTF-8 processing [unless] other gcc frontends handle [these things]
> already in a way that might be reusable".  See the thread starting at
> <https://inbox.sourceware.org/gcc/YPQrMBHyu3wRpT5o@wildebeest.org>
> "rust frontend and UTF-8/unicode processing/properties".

Thanks. BTW, I am not currently working on this.
Note the responses in the above thread by Ian and Jason, who pointed out
that some of the requirements of the gccrs frontend might be covered in
the Go frontend and libcpp, but not really in a reusable way.

One other thing you might want to coordinate on is NFC normalization
and Confusable Detection for identifiers:
https://unicode.org/reports/tr39/#Confusable_Detection
There has been some work on this by David Malcolm and Marek Polacek
https://developers.redhat.com/articles/2022/01/12/prevent-trojan-source-attacks-gcc-12
But that is on a slightly higher source level (not specific to
identifiers).

You might want to research whether NFC normalization of identifiers is
required to be done by the lexer or parser in Rust and how it interacts
with proc macros.
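
To illustrate why the normalization question matters, here is a rough
sketch in Python (not gccrs code; normalize_identifier is a hypothetical
helper name). It only shows that two visually identical spellings of an
identifier differ as code point sequences until they are NFC normalized.
It says nothing about where in the frontend Rust requires this to happen.

```python
import unicodedata


def normalize_identifier(name: str) -> str:
    """Return the NFC form of an identifier (hypothetical helper)."""
    return unicodedata.normalize("NFC", name)


# "é" as a single code point versus "e" followed by a combining acute accent.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

# The raw strings differ, so a lexer comparing code points directly
# would treat them as two different identifiers.
print(composed == decomposed)  # False

# After NFC normalization both spellings compare equal.
print(normalize_identifier(composed) == normalize_identifier(decomposed))  # True
```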

Cheers,

Mark