public inbox for gcc-rust@gcc.gnu.org
* rust frontend and UTF-8/unicode processing/properties
@ 2021-07-18 13:22 Mark Wielaard
  2021-07-18 20:12 ` Ian Lance Taylor
       [not found] ` <d5e7434b-80e8-2817-ed87-a23ef2ac0cbb@uma.es>
  0 siblings, 2 replies; 17+ messages in thread
From: Mark Wielaard @ 2021-07-18 13:22 UTC (permalink / raw)
  To: gcc, gcc-rust

Hi,

For the gcc rust frontend I was thinking of importing a couple of
gnulib modules to help with UTF-8 processing, conversion to/from
unicode codepoints and determining various properties of those
codepoints. But it seems gcc doesn't yet have any gnulib modules
imported, and maybe other frontends already have helpers for this that
the gcc rust frontend could reuse.

Rust only accepts valid UTF-8 encoded source files, which may or may
not start with a UTF-8 BOM character. Whitespace is any codepoint with
the Pattern_White_Space property. Identifiers start with a codepoint
that has the XID_Start property, followed by zero or more codepoints
with the XID_Continue property. It isn't required, but it is highly
desirable, to detect confusable identifiers according to
tr39/Confusable_Detection.
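
As a rough illustration of the first two requirements, here is a
minimal sketch in Rust using only the standard library. The function
names decode_source and is_pattern_white_space are made up for this
example and are not taken from gccrs or gnulib. The XID_Start and
XID_Continue checks are left out because they need generated Unicode
tables that the standard library does not expose.

```rust
use std::str::Utf8Error;

/// Validate that `bytes` is well-formed UTF-8 and drop one optional
/// leading BOM (U+FEFF). Hypothetical helper, for illustration only.
fn decode_source(bytes: &[u8]) -> Result<&str, Utf8Error> {
    let text = std::str::from_utf8(bytes)?;
    Ok(text.strip_prefix('\u{FEFF}').unwrap_or(text))
}

/// True for the code points that carry the Pattern_White_Space
/// property. The set is small and fixed, so a match is enough here.
fn is_pattern_white_space(c: char) -> bool {
    matches!(
        c,
        '\u{0009}'..='\u{000D}' // TAB, LF, VT, FF, CR
            | '\u{0020}'        // SPACE
            | '\u{0085}'        // NEXT LINE
            | '\u{200E}'        // LEFT-TO-RIGHT MARK
            | '\u{200F}'        // RIGHT-TO-LEFT MARK
            | '\u{2028}'        // LINE SEPARATOR
            | '\u{2029}'        // PARAGRAPH SEPARATOR
    )
}
```

A lexer would call decode_source once per file and then use
is_pattern_white_space while skipping whitespace between tokens. Note
that U+00A0 (NO-BREAK SPACE) has White_Space but not
Pattern_White_Space, so it is rejected here.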

Other names might be constrained to Alphabetic and/or Number categories
(Nd, Nl, No), textual types can only contain Unicode Scalar Values
(any Unicode codepoint except high-surrogates and low-surrogates),
strings in source code can contain unicode escapes (24 bit, up to 6
hex digits per codepoint) but are internally stored as UTF-8 (and must
not encode any surrogates).
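
The escape handling described above could look roughly like the
following Rust sketch. The names decode_unicode_escape and push_escape
are hypothetical, and the input is assumed to be just the hex digits of
an escape, with any surrounding syntax already stripped by the lexer.
It relies on char::from_u32, which returns None for surrogates and for
values above U+10FFFF, so only Unicode Scalar Values get through.

```rust
/// Turn the hex digits of a unicode escape into a Unicode Scalar Value.
/// Returns None for empty input, more than 6 digits, non-hex
/// characters, surrogates, and values above U+10FFFF.
fn decode_unicode_escape(hex_digits: &str) -> Option<char> {
    if hex_digits.is_empty() || hex_digits.len() > 6 {
        return None;
    }
    // from_str_radix would accept a leading '+', so check digits first.
    if !hex_digits.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let value = u32::from_str_radix(hex_digits, 16).ok()?;
    char::from_u32(value)
}

/// Append the decoded escape to `out`. A Rust String is always UTF-8,
/// so the code point is stored in its UTF-8 encoded form.
/// Returns false and leaves `out` untouched if the escape is invalid.
fn push_escape(out: &mut String, hex_digits: &str) -> bool {
    match decode_unicode_escape(hex_digits) {
        Some(c) => {
            out.push(c);
            true
        }
        None => false,
    }
}
```

For example, the digits "E9" decode to U+00E9 and are stored as the two
bytes 0xC3 0xA9, while "D800" is rejected because it is a
high-surrogate.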

Do other gcc frontends handle any of the above already in a way that
might be reusable for other frontends?

Thanks,

Mark


* [GSoC] gccrs Unicode Support
@ 2023-03-31 10:27 E M
  0 siblings, 0 replies; 17+ messages in thread
From: E M @ 2023-03-31 10:27 UTC (permalink / raw)
  To: gcc-rust

Hi everyone!

My name is Emanuele, and I’m a software engineer currently in the last year of my MSc.

I am writing to request your feedback on a proposal I have written for this year’s GSoC.

I understand that my proposal may have come late, but I have tried to follow the feedback and recommendations of previous proposals to create as complete a version as possible.

I can commit to this full-time: I have finished my exams early, so I will be able to devote all of my time to this project and work towards a well-aligned solution.

https://docs.google.com/document/d/1Xlupkr6n973s9PCGjDsC03aTKQr0hn3MzltwYbK9AJY/edit#

Thank you very much!

Emanuele


Thread overview: 17+ messages
2021-07-18 13:22 rust frontend and UTF-8/unicode processing/properties Mark Wielaard
2021-07-18 20:12 ` Ian Lance Taylor
2021-07-18 22:23   ` Jason Merrill
2021-07-23 11:29     ` Philip Herron
     [not found] ` <d5e7434b-80e8-2817-ed87-a23ef2ac0cbb@uma.es>
     [not found]   ` <CAOWUKr0Sd3RRSy2cuqMLj--KTWqOz=nQMxmx7ahM8YunrFzEig@mail.gmail.com>
2023-03-15 11:00     ` [GSoC] gccrs Unicode support Philip Herron
2023-03-15 14:53       ` Arsen Arsenović
2023-03-15 15:18       ` Jakub Jelinek
2023-03-16  8:57         ` Raiki Tamura
2023-03-16  9:28         ` Thomas Schwinge
2023-03-16 12:58           ` Mark Wielaard
2023-03-16 13:07             ` Jakub Jelinek
2023-03-18  8:31             ` Raiki Tamura
2023-03-18  8:47               ` Jonathan Wakely
2023-03-18  8:59                 ` Raiki Tamura
2023-03-18  9:28                   ` Jakub Jelinek
2023-03-20 10:19                     ` Raiki Tamura
2023-03-31 10:27 [GSoC] gccrs Unicode Support E M
