To: Mark Wielaard
Cc: gcc-rust@gcc.gnu.org, simplytheother@gmail.com
References: <20210711201018.389798-1-mark@klomp.org>
From: Philip Herron
Subject: Re: New contributor tasks
Message-ID: <991da98e-d432-9e5c-feb1-66cf7e8bf6a0@embecosm.com>
Date: Tue, 13 Jul 2021 14:16:34 +0100
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature";
 boundary="atbM1NCMYq3jPf5V4Fw0vbJXmD6FLsBM5"

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--atbM1NCMYq3jPf5V4Fw0vbJXmD6FLsBM5
Content-Type: multipart/mixed; boundary="Ugu3yXIk6Ua3ZNsFfjIfzOuXpj9SV3KTF";
 protected-headers="v1"

--Ugu3yXIk6Ua3ZNsFfjIfzOuXpj9SV3KTF
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Content-Language:
en-US

On 12/07/2021 23:44, Mark Wielaard wrote:
> On Mon, Jul 12, 2021 at 11:06:01AM +0100, Philip Herron wrote:
>> Great work once again. I am aiming to spend some time towards the end of
>> the week to add more tickets and info for new contributors to get
>> involved, which I will post the interesting ones onto the mailing list
>> as well. I think it should be interesting to contributors of all levels.
>> The main one that sticks out in my mind is the AST, HIR dumps which are
>> a bit of a mess at the moment.
> The AST dump (--rust-dump-parse) was actually useful for checking the
> comment doc strings, but it could certainly be improved. Ideally it
> would be structured in a way that can easily be used in tests.

I think a really good project would be to update our HIR dump. It should
really be in an S-expression format, so we can emit the
Analysis::NodeMapping information in a way that looks good; at the moment
it's a mess.

> Some (random) notes I made on issues that might be nice to explain
> and/or work on.
>
> - Full unicode/utf8 support in the lexer. Currently the lexer only
>   explicitly interprets the input as UTF8 for string parsing. It
>   should really treat all input as UTF-8. gnulib has some handy
>   modules we could use to read/convert from/to utf8 (unistr/u8-to-u32,
>   unistr/u32-to-u8) and test various unicode properties
>   (unictype/property-white-space, unictype/property-xid-continue,
>   unictype/property-xid-start). I don't know if we can import those or
>   if gcc already has these kind of UTF-8/unicode support functions for
>   other languages?

GCCGO supports UTF-8 for identifiers, but I think it has its own
implementation to do this. I think pulling in gnulib sounds like a good
idea. I assume we should ask about this on the GCC mailing list, but I
would prefer to reuse a library for UTF-8 support. The piece about
creating the strings in GENERIC will need to be updated as part of that
work.

> - Error handling using rich locations in the lexer and parser.
>   It seems some support is already there, but it isn't totally clear to
>   me what is already in place and what could/should be added. e.g. how
>   to add notes to an Error.

I've made a wrapper over RichLocation, but I had some crashes when I
added methods for annotations. Overall, my understanding is that a
Location, as we have it at the moment, is a single character location in
the source code, whereas rustc uses Spans. That might be an abstraction
we could think about implementing instead of the Location wrapper we are
reusing from GCCGO.

> - I noticed some expressions didn't parse because of what looks to me
>   operator precedence issues. e.g the following:
>
>   const S: usize = 64;
>
>   pub fn main ()
>   {
>     let a:u8 = 1;
>     let b:u8 = 2;
>     let _c = S * a as usize + b as usize;
>   }
>
>   $ gcc/gccrs -Bgcc as.rs
>
>   as.rs:7:27: error: type param bounds (in TraitObjectType) are not allowed as TypeNoBounds
>       7 |   let _c = S * a as usize + b as usize;
>         |                           ^
>
>   How does one fix such operator precedence issues in the parser?

Off the top of my head, it looks as though parse_type_cast_expr has a
FIXME for this precedence issue. The Pratt parser uses the notion of
binding powers to handle this, and I think it needs to follow a similar
style to the ::parse_expr piece.

> - Related, TypeCastExpr as the above aren't lowered from AST to HIR.
>   I believe I know how to do it, but a small description of the visitor
>   pattern used and in which files one does such lowering would be helpful.

The AST->HIR lowering does need some documentation, since it must go
through name resolution first, but there is no documentation on how any
of this works yet. I will put this on my todo list; it has come up a few
times, and the naming of some of the classes, like ResolveItemToplevel vs
ResolveItem, makes things confusing.
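
On the operator precedence question above, here is a minimal sketch of
the binding-power idea, written in Rust for illustration only. It is not
gccrs code (gccrs itself is C++); Token, Parser, binding_power and parse
are hypothetical names, and the numeric powers are arbitrary apart from
their ordering, which follows the Rust Reference (`as` binds tighter than
`*`, which binds tighter than `+`). The error in the quoted example seems
to suggest that the `+` was consumed while parsing the type after `as`,
so the sketch reads only a single type name there and lets the
expression loop see the `+`.

```rust
// Hypothetical token type; gccrs uses its own token classes.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    Star,
    Plus,
    As,
}

/// Split a whitespace-separated expression into tokens.
fn lex(src: &str) -> Vec<Token> {
    src.split_whitespace()
        .map(|w| match w {
            "*" => Token::Star,
            "+" => Token::Plus,
            "as" => Token::As,
            other => Token::Ident(other.to_string()),
        })
        .collect()
}

/// Left binding power of an infix token; higher binds tighter.
fn binding_power(tok: &Token) -> Option<u8> {
    match tok {
        Token::Plus => Some(10),
        Token::Star => Some(20),
        Token::As => Some(30),
        Token::Ident(_) => None,
    }
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn peek(&self) -> Option<&Token> {
        self.tokens.get(self.pos)
    }

    fn advance(&mut self) -> Option<Token> {
        let tok = self.tokens.get(self.pos).cloned();
        self.pos += 1;
        tok
    }

    /// Parse an expression, consuming only operators that bind tighter
    /// than `min_bp`. Returns an S-expression so the grouping is visible.
    fn parse_expr(&mut self, min_bp: u8) -> Result<String, String> {
        let mut lhs = match self.advance() {
            Some(Token::Ident(name)) => name,
            other => return Err(format!("expected operand, found {:?}", other)),
        };
        while let Some(op) = self.peek().cloned() {
            let bp = match binding_power(&op) {
                Some(bp) => bp,
                None => return Err(format!("expected operator, found {:?}", op)),
            };
            if bp <= min_bp {
                break; // the caller's operator binds at least as tightly
            }
            self.advance();
            lhs = match op {
                // The right-hand side of `as` is a type, not an expression:
                // read one type name and leave any following `+` alone.
                Token::As => match self.advance() {
                    Some(Token::Ident(ty)) => format!("(as {} {})", lhs, ty),
                    other => return Err(format!("expected type, found {:?}", other)),
                },
                Token::Star => format!("(* {} {})", lhs, self.parse_expr(bp)?),
                Token::Plus => format!("(+ {} {})", lhs, self.parse_expr(bp)?),
                Token::Ident(_) => unreachable!("identifiers have no binding power"),
            };
        }
        Ok(lhs)
    }
}

/// Parse a whole expression starting from the lowest binding power.
fn parse(src: &str) -> Result<String, String> {
    Parser { tokens: lex(src), pos: 0 }.parse_expr(0)
}
```

With this, parse("S * a as usize + b as usize") groups the expression as
(+ (* S (as a usize)) (as b usize)), which is the grouping the Rust
Reference's precedence table calls for.
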
Some of this will get cleaned up as part of traits, such as the forward
declared items within a block bug:

Basically, the idea is that we always perform a toplevel scan for all
items and create long canonical names in the topmost scope, such that we
can resolve their names at any point without requiring prototypes or
lookahead. This means we have one pass to look for the names, then
another pass to resolve each structure's fields, each function's
parameters and return types, and blocks of code. So if a block calls a
function declared further ahead, we can still resolve it to its NodeId.
It is in ResolveItem that we push new contexts onto the stack to get
lexical scoping for names. It's worth noting that Rust also supports
shadowing of variables within a block, so these do not cause a duplicate
name error; they simply add a new declaration to that context (what
rustc calls Ribs), such that further resolution will reference the new
declaration and the previous one is shadowed correctly.

> - And of course, how to lower HIR to GENERIC? For TypeCastExpr you
>   said on irc we need traits first, but the semantics for primitive
>   types is actually spelled out in The Reference. Can we already
>   handle them for primitive types (like in the above example having an
>   u8 as usize)?

Documentation on lowering HIR to GENERIC is on my todo list as well,
though there are a bunch of cleanups I have in progress which should
also help here.

> - rust-macro-expand tries to handle both macros and attributes, is
>   this by design? Should we handle different passes for different
>   (inert or not) attributes that run before or after macro expansion?

As for macro and cfg expansion, Joel has some stuff already in place, but
I do think they need to be separated into distinct passes, which would be
a good first start with the expand folder.

>
> Cheers,
>
> Mark
>

Great summary mail; I think this sums up a lot of the common issues.
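
To make the two-pass name resolution described earlier in this mail a bit
more concrete, here is a minimal sketch in Rust. It is only an
illustration under simplifying assumptions, not the gccrs implementation
(which is C++): Resolver, declare_item, declare_local, lookup and demo
are hypothetical names, plain names stand in for the long canonical
names, and a NodeId is just a u32 here.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the compiler's node identifier.
type NodeId = u32;

/// A stack of lexical scopes ("ribs"), innermost last.
struct Resolver {
    ribs: Vec<HashMap<String, NodeId>>,
}

impl Resolver {
    /// Start with a single top-most scope for items.
    fn new() -> Self {
        Resolver { ribs: vec![HashMap::new()] }
    }

    /// Pass 1 (toplevel scan): record an item in the top-most scope.
    /// Two items with the same name are reported as an error.
    fn declare_item(&mut self, name: &str, id: NodeId) -> Result<(), String> {
        let top = &mut self.ribs[0];
        if top.contains_key(name) {
            return Err(format!("duplicate item `{}`", name));
        }
        top.insert(name.to_string(), id);
        Ok(())
    }

    /// Enter a new lexical scope, e.g. a function body or block.
    fn push_rib(&mut self) {
        self.ribs.push(HashMap::new());
    }

    /// Leave the innermost scope; the top-most scope is never popped.
    fn pop_rib(&mut self) {
        if self.ribs.len() > 1 {
            self.ribs.pop();
        }
    }

    /// Declare a local variable. Re-declaring a name is not an error:
    /// the new declaration shadows the previous one.
    fn declare_local(&mut self, name: &str, id: NodeId) {
        self.ribs
            .last_mut()
            .expect("the top-most rib always exists")
            .insert(name.to_string(), id);
    }

    /// Resolve a name by searching from the innermost scope outwards.
    fn lookup(&self, name: &str) -> Option<NodeId> {
        self.ribs.iter().rev().find_map(|rib| rib.get(name).copied())
    }
}

/// Resolve a call to a function declared later, then a shadowed local.
fn demo() -> (Option<NodeId>, Option<NodeId>, Option<NodeId>) {
    let mut r = Resolver::new();

    // Pass 1: every item is known before any body is resolved.
    r.declare_item("main", 1).unwrap();
    r.declare_item("helper", 2).unwrap();

    // Pass 2: resolve the body of `main`, which calls `helper`.
    r.push_rib();
    let call = r.lookup("helper"); // declared "ahead", still resolves
    r.declare_local("x", 10);
    r.declare_local("x", 11); // shadows the first `x`
    let local = r.lookup("x");
    r.pop_rib();

    let after_block = r.lookup("x"); // `x` is out of scope again
    (call, local, after_block)
}
```

Because the item names are collected before any body is visited, the call
to helper resolves without a prototype, and the second declaration of x
simply replaces the first for later lookups.
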
Note I added in Joel, who wrote the parser; he might be able to provide
more insight.

I added some comments inline to each point. I think I can take away from
this that we are missing some useful pieces of architecture
documentation, which is becoming important. I think it will be easier
for me to get this done in a few weeks, as there are changes in the areas
referenced which will affect the documentation. Overall, I do really like
the visitor pattern for this work, since it isolates the code for each
AST or HIR node, but it does make the flow of the pipeline more difficult
to follow.

Sorry this does not contain all of the answers yet, but I will work on
them.

Thanks

--Phil

--Ugu3yXIk6Ua3ZNsFfjIfzOuXpj9SV3KTF--

--atbM1NCMYq3jPf5V4Fw0vbJXmD6FLsBM5--