From: Mark Wielaard <mark@klomp.org>
To: The Other <simplytheother@gmail.com>
Cc: gcc-rust@gcc.gnu.org
Subject: Re: Fwd: New contributor tasks
Date: Sat, 17 Jul 2021 23:23:15 +0200
Message-ID: <YPNKQ4eeHmfht2w5@wildebeest.org>
In-Reply-To: <CADYxmzRc1UEhkQVK5=Qvd90OAFpGj=B=AYjet=zoi9jHaxf1uA@mail.gmail.com>

Hi Joel,

On Sat, Jul 17, 2021 at 10:25:48PM +0800, The Other wrote:
> > - Full unicode/utf8 support in the lexer. Currently the lexer only
> >   explicitly interprets the input as UTF8 for string parsing. It
> >   should really treat all input as UTF-8. gnulib has some handy
> >   modules we could use to read/convert from/to utf8 (unistr/u8-to-u32,
> >   unistr/u32-to-u8) and test various unicode properties
> >   (unictype/property-white-space, unictype/property-xid-continue,
> >   unictype/property-xid-start). I don't know if we can import those or
> >   if gcc already has these kind of UTF-8/unicode support functions for
> >   other languages?
> 
> At the time of writing the lexer, I was under the impression that Rust only
> supported UTF-8 in strings. The Rust Reference seems to have changed now to
> show that it supports UTF-8 in identifiers as well. I believe that the C++
> frontend, at least, has its own specific hardcoded UTF-8 handling for
> identifiers and strings (rather than using a library).
> 
> There could be issues with lookahead of several bytes (which the lexer uses
> liberally) if using UTF-8 in strings, depending on the exact implementation
> of whatever library you use (or function you write).

The whole source file should be valid UTF-8. You can use it in
comments too. And any invalid UTF-8 encoding means the file isn't a
valid Rust source file. So the simplest approach is to make the lexer
decode UTF-8 and handle one codepoint (UCS-4, 32 bits) at a time.
Lookahead then also simply works per codepoint. We would still store
strings as UTF-8. gnulib contains various helpers to convert to/from
UTF-8/UCS-4 and to test various Unicode properties of codepoints. I'll
ask on the gcc mailing list whether to use the C++ frontend support or
import the gnulib helpers.
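
To make the per-codepoint idea concrete, here is a rough, self-contained
sketch. It is not gccrs code and it does not use gnulib or the C++
frontend helpers; decode_utf8 and CodepointSource are hypothetical names
made up for illustration. It decodes the whole input up front, rejects
invalid UTF-8, and then lets the lexer look ahead by codepoint instead
of by byte.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a gccrs or gnulib function): decode a UTF-8
// buffer into 32-bit codepoints. Returns std::nullopt if the input is
// not valid UTF-8, in which case the file is not a valid Rust source.
std::optional<std::vector<char32_t>> decode_utf8(const std::string &src)
{
  std::vector<char32_t> out;
  std::size_t i = 0;
  while (i < src.size())
    {
      unsigned char b0 = static_cast<unsigned char>(src[i]);
      std::size_t len;
      char32_t cp;
      if (b0 < 0x80) { len = 1; cp = b0; }
      else if (b0 >= 0xC2 && b0 <= 0xDF) { len = 2; cp = b0 & 0x1F; }
      else if (b0 >= 0xE0 && b0 <= 0xEF) { len = 3; cp = b0 & 0x0F; }
      else if (b0 >= 0xF0 && b0 <= 0xF4) { len = 4; cp = b0 & 0x07; }
      else return std::nullopt; // stray continuation or invalid lead byte

      if (i + len > src.size())
        return std::nullopt; // sequence truncated at end of input

      for (std::size_t k = 1; k < len; ++k)
        {
          unsigned char b = static_cast<unsigned char>(src[i + k]);
          if ((b & 0xC0) != 0x80)
            return std::nullopt; // not a continuation byte
          cp = (cp << 6) | (b & 0x3F);
        }

      // Reject overlong encodings, surrogates and out-of-range values.
      if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000)
          || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return std::nullopt;

      out.push_back(cp);
      i += len;
    }
  return out;
}

// Hypothetical input cursor: lookahead counts codepoints, not bytes.
class CodepointSource
{
public:
  static constexpr char32_t eof = 0xFFFFFFFF; // sentinel, not a codepoint

  explicit CodepointSource(std::vector<char32_t> cps)
    : cps_(std::move(cps)) {}

  // Return the codepoint n positions ahead, or eof past the end.
  char32_t peek(std::size_t n = 0) const
  {
    return pos_ + n < cps_.size() ? cps_[pos_ + n] : eof;
  }

  // Advance by one codepoint (no-op at end of input).
  void skip()
  {
    if (pos_ < cps_.size())
      ++pos_;
  }

private:
  std::vector<char32_t> cps_;
  std::size_t pos_ = 0;
};
```

Decoding everything up front is just the simplest way to show the idea;
a real lexer would more likely decode lazily into a small lookahead
buffer, and string literals would still be stored as UTF-8.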

> >> - rust-macro-expand tries to handle both macros and attributes, is
> >>  this by design?  Should we handle different passes for different
> >>  (inert or not) attributes that run before or after macro expansion?
> > As for macro and cfg expansion, Joel has some stuff already in place,
> > but I do think they need to be separated into distinct passes, which
> > would be a good first start with the expand folder.
> 
> That is a good question. Technically, rust-macro-expand only handles cfg
> expansion at the moment. You can read and discuss more about that here:
> https://github.com/Rust-GCC/gccrs/issues/563

I have to think about whether it makes sense to handle the cfg
attribute and the cfg! macro in the same pass/expansion. The cfg!
macro seems so simple that it could be handled immediately by the
parser, since it only relies on the compiler/host attributes and simply
generates a true or false token.
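
As a rough illustration of how little the cfg! macro needs, here is a
sketch. The names CfgOptions, TokenKind, expand_cfg_name and
expand_cfg_key_value are all made up for this example and are not
existing gccrs interfaces. It only covers the two simplest predicate
forms, a bare name and key = "value"; the all(), any() and not()
combinators are left out.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical token kinds; a real parser has many more.
enum class TokenKind { True, False };

// Hypothetical description of the configuration the compiler was
// invoked with, e.g. the name "unix" or the pair target_os = "linux".
struct CfgOptions
{
  std::set<std::string> names;
  std::set<std::pair<std::string, std::string>> key_values;
};

// cfg!(name): true token if the option is set, false token otherwise.
TokenKind expand_cfg_name(const CfgOptions &opts, const std::string &name)
{
  return opts.names.count(name) != 0 ? TokenKind::True : TokenKind::False;
}

// cfg!(key = "value"): true token if that exact pair is set.
TokenKind expand_cfg_key_value(const CfgOptions &opts,
                               const std::string &key,
                               const std::string &value)
{
  return opts.key_values.count(std::make_pair(key, value)) != 0
           ? TokenKind::True
           : TokenKind::False;
}
```

Since the result depends only on the options and not on anything in the
surrounding AST, the parser could substitute the token as soon as it
sees the macro invocation.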

In general it seems attribute expansion cannot simply be done by one
AttributeVisitor pass, because the effect can be at different stages of
parsing (and attributes can even affect what the lexer accepts,
e.g. whether non-ASCII Unicode identifiers are accepted). For example,
the various lint attributes can warn/error/etc. when lowering the final
AST (CamelCaseStructs for example), after type checking, or after
liveness analysis. So maybe we need to design a pass for each
different attribute and not try to combine them (except maybe to
recognize and validate the attribute syntax).

Cheers,

Mark
