From: The Other
Date: Sat, 17 Jul 2021 22:25:48 +0800
Subject: Fwd: New contributor tasks
To: mark@klomp.org
Cc: gcc-rust@gcc.gnu.org

Sorry, pressed the wrong button. I meant to "reply all".

---------- Forwarded message ---------
From: The Other <simplytheother@gmail.com>
Date: Sat, Jul 17, 2021 at 10:20 PM
Subject: Re: New contributor tasks
To: Philip Herron <philip.herron@embecosm.com>


> The AST dump (--rust-dump-parse) was actually useful for checking the
> comment doc strings, but it could certainly be improved. Ideally it
> would be structured in a way that can easily be used in tests.

Yes, I agree. It has its mismatched style because I originally intended it
to be basically "to_string" in the most literal sense possible, but then
realised this would be infeasible for some of the more complicated parts.
Ideally, I would like it to be in a format similar to clang's AST dump.
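As a rough illustration of what a more test-friendly dump could look like
(a standalone sketch only, not existing gccrs code, and every class name in
it is made up), the main point is a uniform printer where each node emits
its kind, its ids and its children in a predictable shape, for example an
S-expression style:

// Illustrative only: a uniform S-expression dump so tests can match on
// structure instead of free-form text.
#include <iostream>
#include <sstream>
#include <string>

class SExprDump
{
public:
  void begin (const std::string &kind, int node_id)
  {
    out << indent () << "(" << kind << " (node-id " << node_id << ")\n";
    depth++;
  }

  void field (const std::string &name, const std::string &value)
  {
    out << indent () << "(" << name << " " << value << ")\n";
  }

  void end ()
  {
    depth--;
    out << indent () << ")\n";
  }

  std::string str () const { return out.str (); }

private:
  std::string indent () const { return std::string (depth * 2, ' '); }

  std::ostringstream out;
  int depth = 0;
};

int
main ()
{
  // What dumping `let _c = S * a as usize + b as usize;` might look like.
  SExprDump d;
  d.begin ("LetStmt", 12);
  d.field ("pattern", "_c");
  d.begin ("ArithmeticOrLogicalExpr", 13);
  d.field ("op", "+");
  d.end ();
  d.end ();
  std::cout << d.str ();
}

A shape like this would also make it easy to print the node mappings next
to each node, which is what would make the dump usable from tests.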
> - Full unicode/utf8 support in the lexer. Currently the lexer only
>   explicitly interprets the input as UTF-8 for string parsing. It
>   should really treat all input as UTF-8. gnulib has some handy
>   modules we could use to read/convert from/to utf8 (unistr/u8-to-u32,
>   unistr/u32-to-u8) and test various unicode properties
>   (unictype/property-white-space, unictype/property-xid-continue,
>   unictype/property-xid-start). I don't know if we can import those or
>   if gcc already has these kinds of UTF-8/unicode support functions for
>   other languages?

At the time of writing the lexer, I was under the impression that Rust only
supported UTF-8 in strings. The Rust Reference seems to have been updated
since to show that it supports UTF-8 in identifiers as well. I believe that
the C++ frontend, at least, has its own hardcoded UTF-8 handling for
identifiers and strings (rather than using a library).

There could be issues with lookahead of several bytes (which the lexer uses
liberally) if using UTF-8 in strings, depending on the exact implementation
of whatever library you use (or function you write).
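For what it's worth, assuming the gnulib modules Mark lists could be
imported, identifier scanning on UTF-8 input could look roughly like the
following (a standalone sketch against libunistring, not gccrs code; a real
lexer would still need the Rust-specific rules for keywords, a lone '_',
raw identifiers and so on):

// Sketch: scan one identifier from UTF-8 input using gnulib/libunistring
// (the unistr and unictype modules).  Link with -lunistring.
#include <unistr.h>    /* u8_mbtouc */
#include <unictype.h>  /* uc_is_property_xid_start / _continue */
#include <cstdio>
#include <cstring>

/* Return the byte length of the identifier starting at P, or 0 if the
   first scalar value is not a valid identifier start.  Invalid UTF-8 is
   decoded as U+FFFD, which fails both property checks and therefore ends
   the identifier.  */
static size_t
scan_identifier (const uint8_t *p, size_t len)
{
  ucs4_t cp;
  int n = u8_mbtouc (&cp, p, len);
  if (n <= 0 || !(uc_is_property_xid_start (cp) || cp == '_'))
    return 0;

  size_t consumed = n;
  while (consumed < len)
    {
      n = u8_mbtouc (&cp, p + consumed, len - consumed);
      if (n <= 0 || !uc_is_property_xid_continue (cp))
        break;
      consumed += n;
    }
  return consumed;
}

int
main ()
{
  const char *src = "größe = 1";
  size_t n = scan_identifier ((const uint8_t *) src, strlen (src));
  std::printf ("identifier is %zu bytes long\n", n);  /* prints 7 */
}

Decoding one codepoint at a time would also ease the lookahead problem
somewhat: the lexer can peek N codepoints ahead by decoding forward from
the current offset rather than converting the whole buffer first.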
>> - Error handling using rich locations in the lexer and parser. It
>>   seems some support is already there, but it isn't totally clear to
>>   me what is already in place and what could/should be added, e.g. how
>>   to add notes to an Error.
> I've made a wrapper over RichLocation; I had some crashes when I added
> methods for annotations. Overall my understanding is that the Location we
> have at the moment is a single character location in the source code, but
> rustc uses Spans, which might be an abstraction we could think about
> implementing instead of the Location wrapper we are reusing from GCCGO.

The Error class may need to be redesigned. It was a quick fix I made to
allow parse errors to be ignored (since macro expansion would cause parse
errors with non-matching macro matchers). Instead of having the
"emit_error" and "emit_fatal_error" methods, it may be better to store a
"kind" of error upon construction and then have a single "emit" method
that emits whichever kind was specified. Similarly, Error may have to be
rewritten to use RichLocation instead of Location.
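Concretely, the shape I have in mind is something like the sketch below
(illustrative only, with made-up names; a real version would route emission
through GCC's diagnostics machinery rather than printf):

// Sketch of an Error whose severity ("kind") is fixed at construction and
// which has a single emit() method, so errors can be created speculatively
// (e.g. while trying a macro matcher) and only emitted if the surrounding
// parse decides to keep them.
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

struct Location { int line; int column; };  // stand-in for a real location/span type

class Error
{
public:
  enum class Kind { Err, FatalErr };

  Error (Kind kind, Location loc, std::string message)
    : kind (kind), loc (loc), message (std::move (message)) {}

  // With a rich-location/span based design this could also collect extra
  // ranges and fix-it hints before emission.
  void add_note (Location note_loc, std::string note)
  { notes.emplace_back (note_loc, std::move (note)); }

  void emit () const
  {
    const char *prefix = kind == Kind::FatalErr ? "fatal error" : "error";
    std::printf ("%d:%d: %s: %s\n", loc.line, loc.column, prefix,
                 message.c_str ());
    for (const auto &n : notes)
      std::printf ("%d:%d: note: %s\n", n.first.line, n.first.column,
                   n.second.c_str ());
  }

private:
  Kind kind;
  Location loc;
  std::string message;
  std::vector<std::pair<Location, std::string> > notes;
};

int
main ()
{
  Error err (Error::Kind::Err, {7, 27},
             "type param bounds are not allowed here");
  err.add_note ({7, 18}, "while parsing this cast expression");
  err.emit ();  // a speculative parse would simply never call emit ()
}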
>> - I noticed some expressions didn't parse because of what look to me
>>   like operator precedence issues, e.g. the following:
>>
>>   const S: usize = 64;
>>
>>   pub fn main ()
>>   {
>>     let a:u8 = 1;
>>     let b:u8 = 2;
>>     let _c = S * a as usize + b as usize;
>>   }
>>
>>   $ gcc/gccrs -Bgcc as.rs
>>
>>   as.rs:7:27: error: type param bounds (in TraitObjectType) are not allowed as TypeNoBounds
>>      7 |   let _c = S * a as usize + b as usize;
>>        |                          ^
>>
>>   How does one fix such operator precedence issues in the parser?

> Off the top of my head it looks as though parse_type_cast_expr has a
> FIXME for the precedence issue. The Pratt parser uses the notion of
> binding powers to handle this, and I think it needs to follow a similar
> style to the ::parse_expr piece.

Yes, this is probably a precedence issue. The actual issue is that while
expressions have precedence, types (such as "usize", which is what is being
parsed here) do not, and greedily parse tokens like "+". Additionally, how
types and expressions interact, and what precedence applies between them,
is something I have no idea how to approach.

I believe that this specific issue could be fixed by modifying the
parse_type_no_bounds method: if, instead of erroring when it finds a plus,
it simply returned (treating the plus much as an expression treats a
semicolon), then it would have the desired behaviour. I don't believe that
parse_type_no_bounds (TypeNoBounds do not have '+' in them) would ever be
called in a position where a Type (which allows bounds) is permitted, so
this change should hopefully not cause any correct programs to parse
incorrectly.
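To make the binding-power idea concrete, here is a toy standalone Pratt
parser (not the gccrs one; the token handling and the power values are made
up). The loop only keeps consuming operators whose binding power is higher
than the caller's, which is exactly what lets "as" bind tighter than "+"
and "*" without rewriting the grammar:

// Toy Pratt parser for illustration only.  parse_expr (rbp) consumes
// operators only while their binding power exceeds the right binding power
// of the operator that called it.
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

struct Token { std::string text; };

static int
binding_power (const std::string &op)
{
  if (op == "+") return 10;
  if (op == "*") return 20;
  if (op == "as") return 30;  // casts bind tighter than arithmetic
  return 0;                   // not an infix operator: stop
}

class ToyParser
{
public:
  explicit ToyParser (std::vector<Token> tokens) : toks (std::move (tokens)) {}

  std::string parse_expr (int rbp = 0)
  {
    std::string left = next ().text;  // operand (or the type name after "as")
    while (binding_power (peek ()) > rbp)
      {
        std::string op = next ().text;
        std::string right = parse_expr (binding_power (op));
        left = "(" + left + " " + op + " " + right + ")";
      }
    return left;
  }

private:
  std::string peek () const
  { return pos < toks.size () ? toks[pos].text : ""; }

  Token next ()
  { assert (pos < toks.size ()); return toks[pos++]; }

  std::vector<Token> toks;
  size_t pos = 0;
};

int
main ()
{
  ToyParser p ({{"S"}, {"*"}, {"a"}, {"as"}, {"usize"},
                {"+"}, {"b"}, {"as"}, {"usize"}});
  std::printf ("%s\n", p.parse_expr ().c_str ());
  // Prints: ((S * (a as usize)) + (b as usize))
}

In the gccrs parser the same idea would presumably mean giving the cast
position an appropriate power in ::parse_expr, or, as described above,
simply making parse_type_no_bounds stop at '+' instead of erroring.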
>> - rust-macro-expand tries to handle both macros and attributes, is
>>   this by design? Should we handle different passes for different
>>   (inert or not) attributes that run before or after macro expansion?
> As for macro and cfg expansion, Joel, some stuff is already in place, but
> I do think they need to be separated into distinct passes, which would be
> a good first start with the expand folder.

That is a good question. Technically, rust-macro-expand only handles cfg
expansion at the moment. You can read and discuss more about that here:
https://github.com/Rust-GCC/gccrs/issues/563

Thanks,
Joel


On Tue, Jul 13, 2021 at 9:16 PM Philip Herron <philip.herron@embecosm.com>
wrote:

> On 12/07/2021 23:44, Mark Wielaard wrote:
> > On Mon, Jul 12, 2021 at 11:06:01AM +0100, Philip Herron wrote:
> >> Great work once again. I am aiming to spend some time towards the end
> >> of the week to add more tickets and info for new contributors to get
> >> involved, and I will post the interesting ones onto the mailing list
> >> as well. I think it should be interesting to contributors of all
> >> levels. The main one that sticks out in my mind is the AST and HIR
> >> dumps, which are a bit of a mess at the moment.
> > The AST dump (--rust-dump-parse) was actually useful for checking the
> > comment doc strings, but it could certainly be improved. Ideally it
> > would be structured in a way that can easily be used in tests.
> I think a really good project would be to update our HIR dump; it should
> really be an S-expression format so we can emit the Analysis::NodeMapping
> information in a way that looks good. At the moment it is a mess.
> > Some (random) notes I made on issues that might be nice to explain
> > and/or work on.
> >
> > - Full unicode/utf8 support in the lexer. Currently the lexer only
> >   explicitly interprets the input as UTF-8 for string parsing. It
> >   should really treat all input as UTF-8. gnulib has some handy
> >   modules we could use to read/convert from/to utf8 (unistr/u8-to-u32,
> >   unistr/u32-to-u8) and test various unicode properties
> >   (unictype/property-white-space, unictype/property-xid-continue,
> >   unictype/property-xid-start). I don't know if we can import those or
> >   if gcc already has these kinds of UTF-8/unicode support functions
> >   for other languages?
> GCCGO supports utf-8 identifiers, but I think it has its own
> implementation to do this. I think pulling in gnulib sounds like a good
> idea; I assume we should ask about this on the GCC mailing list, but I
> would prefer to reuse a library for utf8 support. The piece about
> creating the strings in GENERIC will need to be updated as part of that
> work.
> > - Error handling using rich locations in the lexer and parser. It
> >   seems some support is already there, but it isn't totally clear to
> >   me what is already in place and what could/should be added, e.g. how
> >   to add notes to an Error.
> I've made a wrapper over RichLocation; I had some crashes when I added
> methods for annotations. Overall my understanding is that the Location we
> have at the moment is a single character location in the source code, but
> rustc uses Spans, which might be an abstraction we could think about
> implementing instead of the Location wrapper we are reusing from GCCGO.
> > - I noticed some expressions didn't parse because of what look to me
> >   like operator precedence issues, e.g. the following:
> >
> >   const S: usize = 64;
> >
> >   pub fn main ()
> >   {
> >     let a:u8 = 1;
> >     let b:u8 = 2;
> >     let _c = S * a as usize + b as usize;
> >   }
> >
> >   $ gcc/gccrs -Bgcc as.rs
> >
> >   as.rs:7:27: error: type param bounds (in TraitObjectType) are not allowed as TypeNoBounds
> >      7 |   let _c = S * a as usize + b as usize;
> >        |                          ^
> >
> >   How does one fix such operator precedence issues in the parser?
>
> Off the top of my head it looks as though parse_type_cast_expr has a
> FIXME for the precedence issue. The Pratt parser uses the notion of
> binding powers to handle this, and I think it needs to follow a similar
> style to the ::parse_expr piece.
>
> > - Related, TypeCastExprs like the above aren't lowered from AST to HIR.
> >   I believe I know how to do it, but a small description of the visitor
> >   pattern used and in which files one does such lowering would be
> >   helpful.
> The AST->HIR lowering does need some documentation, since it must go
> through name resolution first, but there is no documentation on how any
> of this works yet. I will put this on my todo list; it has come up a few
> times, and the naming of some of the classes, like ResolveItemToplevel vs
> ResolveItem, is confusing. Some of this will get cleaned up as part of
> traits, such as the forward-declared items within a block bug.
>
> Basically the idea is that we always perform a toplevel scan for all
> items and create long canonical names in the topmost scope, such that
> we can resolve their names at any point without requiring prototypes or
> lookahead. This means we have one pass to look for the names and then a
> pass to resolve each structure's fields, each function's parameters,
> return types and blocks of code. So if a block calls a function declared
> further ahead, we can still resolve it to its NodeId. It is when we
> ResolveItem that we push new contexts onto the stack to get lexical
> scoping for names. It is worth noting that Rust also supports shadowing
> of variables within a block, so these do not cause a duplicate-name error
> and simply add a new declaration to that context (or what rustc calls
> Ribs), such that further resolution will reference the new declaration
> and the previous one is shadowed correctly.
>
> > - And of course, how to lower HIR to GENERIC? For TypeCastExpr you
> >   said on irc we need traits first, but the semantics for primitive
> >   types is actually spelled out in The Reference. Can we already
> >   handle them for primitive types (like in the above example having an
> >   u8 as usize)?
> Lowering HIR to GENERIC documentation is on my todo list as well, though
> there are a bunch of cleanups I have in progress which should also help
> here.
> > - rust-macro-expand tries to handle both macros and attributes, is
> >   this by design? Should we handle different passes for different
> >   (inert or not) attributes that run before or after macro expansion?
> As for macro and cfg expansion, Joel, some stuff is already in place, but
> I do think they need to be separated into distinct passes, which would be
> a good first start with the expand folder.
> >
> > Cheers,
> >
> > Mark
> >
> Great summary mail; I think this sums up a lot of the common issues. Note
> I added in Joel, who wrote the parser; he might be able to provide more
> insight.
>
> I added some comments inline to each point. I think I can take away from
> this that we are missing some useful pieces of architecture
> documentation, which is becoming important. I think it will be easier for
> me to get this done in a few weeks, as there are changes in the areas
> referenced which will affect the documentation.
>
> Overall I do really like the visitor pattern for this work since it
> isolates the code for each AST or HIR node, but it is more difficult to
> follow the flow of the pipeline.
>
> Sorry this does not contain all of the answers yet, but I will work on
> them. Thanks
>
> --Phil
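As a small addendum to the name-resolution walkthrough quoted above, the
rib/shadowing behaviour Philip describes can be sketched standalone like
this (illustrative only, not gccrs code):

// Sketch of rib-based name resolution with shadowing: re-declaring a name
// in the same rib does not error, it simply becomes what later lookups see.
#include <cstdio>
#include <map>
#include <string>
#include <vector>

typedef int NodeId;

struct Rib
{
  std::map<std::string, NodeId> decls;  // name -> most recent declaration
};

class Scope
{
public:
  void push () { ribs.push_back (Rib ()); }
  void pop () { ribs.pop_back (); }

  // Insert (or shadow) a declaration in the innermost rib.
  void insert (const std::string &name, NodeId id)
  { ribs.back ().decls[name] = id; }

  // Walk ribs innermost-first, so block-local names win over outer ones.
  bool lookup (const std::string &name, NodeId *out) const
  {
    for (auto it = ribs.rbegin (); it != ribs.rend (); ++it)
      {
        auto found = it->decls.find (name);
        if (found != it->decls.end ())
          {
            *out = found->second;
            return true;
          }
      }
    return false;
  }

private:
  std::vector<Rib> ribs;
};

int
main ()
{
  Scope scope;
  scope.push ();              // toplevel: items are inserted up front, so
  scope.insert ("main", 1);   // later blocks can call them without
  scope.insert ("helper", 2); // prototypes or lookahead
  scope.push ();              // a block
  scope.insert ("a", 3);
  scope.insert ("a", 4);      // `let a = ...;` again: shadows, no error

  NodeId id = 0;
  if (scope.lookup ("a", &id))
    std::printf ("a resolves to NodeId %d\n", id);  // prints 4
  scope.pop ();
  scope.pop ();
}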