public inbox for gcc-rust@gcc.gnu.org
From: Jakub Jelinek <jakub@redhat.com>
To: Raiki Tamura <tamaron1203@gmail.com>
Cc: "Jonathan Wakely" <jwakely.gcc@gmail.com>,
	"Mark Wielaard" <mark@klomp.org>,
	"Thomas Schwinge" <thomas@codesourcery.com>,
	"Philip Herron" <herron.philip@googlemail.com>,
	"gcc@gcc.gnu.org" <gcc@gcc.gnu.org>,
	gcc-rust@gcc.gnu.org, "David Edelsohn" <dje.gcc@gmail.com>,
	"Arthur Cohen" <arthur.cohen@embecosm.com>,
	"Arsen Arsenović" <arsen@aarsen.me>
Subject: Re: [GSoC] gccrs Unicode support
Date: Sat, 18 Mar 2023 10:28:13 +0100
Message-ID: <ZBWELaOXtmnBHrxs@tucnak>
In-Reply-To: <CAOWUKr3YZsXJkjEHHx42BB=R0JQO5NorRDRD5-AcT-NXYUC3uw@mail.gmail.com>

On Sat, Mar 18, 2023 at 05:59:34PM +0900, Raiki Tamura wrote:
> On Sat, Mar 18, 2023 at 17:47, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
> 
> > On Sat, 18 Mar 2023, 08:32 Raiki Tamura via Gcc, <gcc@gcc.gnu.org> wrote:
> >
> >> Thank you everyone for your advice.
> >> Some kinds of names are restricted to Unicode alphabetic/numeric characters in Rust.
> >>
> >
> > Doesn't it use the same rules as C++, based on XID_Start and XID_Continue?
> > That should already be supported.
> >
> 
> Yes, C++ and Rust use the same rules for identifiers (described in UAX#31)
> and we can reuse that support in the lexer of gccrs.
> I was talking about the values of Rust's crate_name attribute, which only
> allow Unicode alphabetic/numeric characters.
> (Ref:
> https://doc.rust-lang.org/reference/crates-and-source-files.html#the-crate_name-attribute
> )

That is a pretty simple thing, so no need to use an extra library for that.
As is documented in contrib/unicode/README, the Unicode *.txt files are
already checked in and there are several generators of tables.
libcpp/makeucnid.cc already creates tables based on the UnicodeData.txt,
DerivedNormalizationProps.txt and DerivedCoreProperties.txt files,
including NFC/NFKC.  It is true that it doesn't currently compute whether
a character is alphanumeric.  A character is alphanumeric if it either has
the Alphabetic property in DerivedCoreProperties.txt, or is numeric, i.e.
has the Nd, Nl or No category (3rd column) in UnicodeData.txt.  It should
take only a few lines to add that support to libcpp/makeucnid.cc; the only
question is whether the ucnranges array would get much larger if it
differentiates based on another ALPHANUM flag.  If it doesn't grow too
much, let's put it there; if it would grow too much, perhaps we should
emit it in a separate table.
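
The computation described above can be sketched as follows. This is only an illustration in Python, not the code in libcpp/makeucnid.cc (which is a C++ generator). All function names, the "OTHER" flag and the tiny sample inputs are assumptions made for the example; the samples merely follow the line formats of UnicodeData.txt and DerivedCoreProperties.txt. The last part shows why an extra ALPHANUM flag can enlarge a range table: a run of code points that shared one set of flags may have to be split.

```python
# Hypothetical sketch; none of these names exist in libcpp/makeucnid.cc.
NUMERIC_CATEGORIES = {"Nd", "Nl", "No"}


def parse_codepoints(field):
    """Parse 'XXXX' or 'XXXX..YYYY' into an inclusive (first, last) pair."""
    first, _, last = field.strip().partition("..")
    return int(first, 16), int(last or first, 16)


def numeric_from_unicode_data(lines):
    """Code points whose general category (3rd column) is Nd, Nl or No."""
    result = set()
    range_start = None
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 3:
            continue
        cp, name, category = int(fields[0], 16), fields[1], fields[2]
        if name.endswith(", First>"):      # start of a "<..., First>" range
            range_start = cp
            continue
        first = cp
        if name.endswith(", Last>") and range_start is not None:
            first = range_start
        range_start = None
        if category in NUMERIC_CATEGORIES:
            result.update(range(first, cp + 1))
    return result


def alphabetic_from_derived_core(lines):
    """Code points listed with the Alphabetic property."""
    result = set()
    for line in lines:
        data = line.split("#", 1)[0].strip()   # drop trailing comments
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) >= 2 and fields[1] == "Alphabetic":
            first, last = parse_codepoints(fields[0])
            result.update(range(first, last + 1))
    return result


def add_flag(flags_by_cp, codepoints, flag):
    """Return a copy of the flag map with `flag` set on `codepoints`."""
    return {cp: (flags | {flag} if cp in codepoints else flags)
            for cp, flags in flags_by_cp.items()}


def coalesce_by_flags(flags_by_cp):
    """Merge adjacent code points with identical flags into runs."""
    runs = []
    for cp in sorted(flags_by_cp):
        flags = flags_by_cp[cp]
        if runs and runs[-1][1] == cp - 1 and runs[-1][2] == flags:
            runs[-1] = (runs[-1][0], cp, flags)
        else:
            runs.append((cp, cp, flags))
    return runs


# Abridged sample input in the format of the two Unicode data files.
UNICODE_DATA_SAMPLE = """\
0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;
0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
005F;LOW LINE;Pc;0;ON;;;;;N;SPACING UNDERSCORE;;;;
00B2;SUPERSCRIPT TWO;No;0;EN;<super> 0032;;2;2;N;SUPERSCRIPT DIGIT TWO;;;;
2160;ROMAN NUMERAL ONE;Nl;0;L;<compat> 0049;;;1;N;;;;2170;
""".splitlines()

DERIVED_CORE_SAMPLE = """\
0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Alphabetic # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; Alphabetic # Lo       FEMININE ORDINAL INDICATOR
0041..005A    ; Uppercase # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
""".splitlines()

numeric = numeric_from_unicode_data(UNICODE_DATA_SAMPLE)
alphabetic = alphabetic_from_derived_core(DERIVED_CORE_SAMPLE)
alphanumeric = alphabetic | numeric

# Three adjacent code points that share one made-up flag form a single run;
# adding ALPHANUM to only some of them splits that run.
base_flags = {cp: frozenset({"OTHER"}) for cp in (0x40, 0x41, 0x42)}
with_alnum = add_flag(base_flags, alphanumeric, "ALPHANUM")
runs_before = coalesce_by_flags(base_flags)
runs_after = coalesce_by_flags(with_alnum)
print(len(runs_before), len(runs_after))
```

Comparing the number of runs before and after adding the flag on the real data files would be one way to decide between extending the existing table and emitting a separate one.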

	Jakub

