Date: Sun, 14 May 2023 18:31:59 +0200
From: Michal Jankovič
To: Iain Sandoe
Cc: GCC Patches
Subject: Re: [PATCH] c++: coroutines - Overlap variables in frame [PR105989]

Hi Iain,

I do not currently have metrics for this, but I can look into generating them. However, I do not know of any large open-source projects that use coroutines and that I could use for this; I was thinking about using the cppcoro unit tests, but they mostly contain very simple coroutines.
I have the source of a ~20k LoC proprietary project that relies heavily on coroutines (using boost::asio::awaitable), but I cannot share the source along with the numbers. Would this be enough, or should I look for more open-source projects?

Thanks,
Michal

On May 14 2023, at 6:07 pm, Iain Sandoe wrote:
> Hi Michal,
>
>> On 14 May 2023, at 16:36, Michal Jankovič wrote:
>>
>> Rebased the patch to GCC 14 trunk. Bootstrapped and regression tested again on x86_64-pc-linux-gnu; the only difference is the new test, which fails without the patch.
>
> (as previously noted, I am much in favour of this optimisation)
>
> Do you have any metrics on the reductions in frame size for realistic coroutines?
>
> thanks
> Iain
>
>>
>> On Jul 13 2022, at 2:54 pm, Michal Jankovic wrote:
>>
>>> Hi Iain,
>>>
>>> thanks for the info. I have some follow-up questions.
>>>
>>> On Jul 12 2022, at 7:11 pm, Iain Sandoe wrote:
>>>
>>>> Hi Michal,
>>>>
>>>>> On 12 Jul 2022, at 16:14, Michal Jankovič wrote:
>>>>
>>>>> One other related thing I would like to investigate is reducing the number of compiler-generated variables in the frame, particularly _Coro_destroy_fn and _Coro_self_handle.
>>>>>
>>>>> As I understand it, _Coro_destroy_fn just sets a flag in _Coro_resume_index and calls _Coro_resume_fn; it should be possible to move this logic to __builtin_coro_destroy, so that only _Coro_resume_fn is stored in the frame;
>>>>
>>>> That is a particular point about GCC's implementation … (it is not necessarily, or even likely to be, the same for other implementations) - see below.
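A minimal hand-written model of the proposal above, assuming GCC's lowering works as described here (destroy sets a flag in the resume index and then calls the resume function). `SmallFrame`, `actor` and `small_frame_destroy` are hypothetical names, not GCC internals, and a frame like this would not follow the cross-vendor ABI discussed in this thread:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, hand-written model of a lowered coroutine frame that keeps a
// single function pointer. Not GCC's generated code; names are illustrative.
struct SmallFrame {
  void (*resume_fn)(SmallFrame*);  // the only function pointer in the frame
  std::uint16_t resume_index;      // even: next suspend point; low bit set: destroy requested
  int counter;                     // stand-in for user state kept in the frame
  bool destroyed;                  // lets callers observe that the destroy path ran
};

// The single entry point: checks the destroy flag, otherwise runs the next
// segment of the coroutine body.
void actor(SmallFrame* f) {
  if (f->resume_index & 1u) {  // destroy requested
    f->destroyed = true;       // a real frame would run destructors and free itself here
    return;
  }
  ++f->counter;                // body code between two suspend points
  f->resume_index += 2;        // advance to the next suspend point (keeps the low bit clear)
}

// What a destroy builtin could do when no destroy pointer is stored:
// set the flag in the resume index, then call through the resume pointer.
void small_frame_destroy(SmallFrame* f) {
  f->resume_index |= 1u;
  f->resume_fn(f);
}
```

Resuming twice and then calling `small_frame_destroy` leaves `counter == 2` and `destroyed == true`. The saving is one pointer per frame, paid for by a branch on the flag at the top of every resume.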
>>>>
>>>> I was intending to experiment with making the ramp/resume/destroy value a parameter to the actor function so that we would have something like -
>>>>
>>>> ramp calls actor(frame, 0)
>>>> resume calls actor(frame, 1)
>>>> destroy calls actor(frame, 2)
>>>> - the token values are illustrative, not intended to be a final version.
>>>>
>>>> I think that should allow for more inlining opportunities and possibly a way forward to frame elision (a.k.a. HALO).
>>>>
>>>>> this would however change the coroutine ABI - I don't know if that's a problem.
>>>>
>>>> The external ABI for the coroutine is the resume and destroy pointers and the promise, and that one can find each of these from the frame pointer.
>>>>
>>>> This was agreed between the interested "vendors" so that one compiler could invoke coroutines built by another. So I do not think this is a particularly useful area to explore.
>>>>
>>>
>>> I understand. I still want to try to implement a more lightweight frame layout with just one function pointer; would it be possible to merge such a change if it was made opt-in via a compiler flag, e.g. `-fsmall-coroutine-frame`? My use case for this is embedded environments with very limited memory, and I do not care about interoperability with other compilers there.
>>>
>>>> Also, the intent is that an indirect call through the frame pointer is the most frequent operation, so it should be the most efficient: resume() might be called many times, destroy() just once, thus it is a cold code path - space can be important too - but interoperability was the goal here.
>>>>
>>>>> The _Coro_self_handle should be constructible on-demand from the frame address.
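A minimal sketch of the token-passing idea, assuming the agreed prefix (resume pointer, destroy pointer, promise) is kept so that another compiler can still find each of them from the frame address. `Op`, `Frame`, `actor` and the entry functions are hypothetical, and 0/1/2 are only the illustrative token values from the thread:

```cpp
#include <cassert>

// Hypothetical model of "pass the operation to the actor"; the token values
// mirror the illustrative 0/1/2 from the thread and are not a real ABI.
enum class Op { Ramp = 0, Resume = 1, Destroy = 2 };

struct Frame;
using FrameFn = void (*)(Frame*);

struct Promise { int value = 0; };

// ABI-visible prefix first (resume pointer, destroy pointer, promise), so code
// that only holds the frame address can still find all three.
struct Frame {
  FrameFn resume;
  FrameFn destroy;
  Promise promise;
  int state = 0;      // stand-in for the resume index
  bool alive = true;  // cleared on the destroy path
};

// One actor; the operation arrives as a parameter instead of being encoded
// in which function pointer was called.
void actor(Frame* f, Op op) {
  switch (op) {
    case Op::Ramp:    f->state = 1; break;                    // run to the first suspend point
    case Op::Resume:  f->promise.value += f->state++; break;  // next segment of the body
    case Op::Destroy: f->alive = false; break;                // destructors + deallocation
  }
}

// Thin wrappers stored in the ABI slots; a direct call with a constant token
// is what could make the actor easier to inline.
void resume_entry(Frame* f)  { actor(f, Op::Resume); }
void destroy_entry(Frame* f) { actor(f, Op::Destroy); }

// A caller that only has the frame address (e.g. code from another compiler).
void resume_from_address(void* addr) {
  Frame* f = static_cast<Frame*>(addr);
  f->resume(f);
}
```

After a ramp and two resumes through the frame address, `promise.value` is 3 (1 + 2), and `f.destroy(&f)` takes the destroy branch. The externally visible layout stays the same; only the internal split into functions changes.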
>>>>
>>>> Yes, and in the header the relevant items are all constexpr - so that should happen in the user's code. I elected to have that value in the frame to avoid recreating it each time - I suppose that is a trade-off of one optimisation cf. another …
>>>
>>> If the handle construction cannot be optimized out, and it is thus a trade-off between frame size and number of instructions, then this could also be enabled by a hypothetical `-fsmall-coroutine-frame`.
>>>
>>> Coming back to this:
>>>
>>>>>> (the other related optimisation is to eliminate frame entries for scopes without any suspend points - which has the potential to save even more space for code with sparse use of co_xxxx)
>>>
>>> This would be nice, although it could be encompassed by a more general optimization: eliminating frame entries for all variables that are not accessed (directly or via pointer / reference) beyond a suspend point. To be fair, I do not know how to get started on such an optimization, or if it is even possible to do in the frontend. This would, however, be immensely useful for reducing the frame size taken up by complicated co_await expressions (among other things). For example, if I have a composed operation:
>>>
>>> co_await when_either(get_leaf_awaitable_1(), get_leaf_awaitable_2());
>>>
>>> Right now, this creates space in the frame for the temporary 'leaf' awaitables, which were already moved into the composed awaitable. If the awaitable has an operator co_await that returns the real awaiter, the original awaitable is also stored in the frame, even if it is not referenced by the awaiter; another unused object gets stored if the .await_transform() customization point was used.
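As a rough model of the more general optimisation (an assumption about how such an analysis could look, not an existing GCC pass), number the steps of the coroutine body and record, for each local, the first and last step that touches it. A local needs a frame slot only if some suspend point falls strictly between those two steps. `LocalVar`, `needs_frame_slot` and `frame_resident` are hypothetical names. The `when_either` numbers used below assume trivially destructible leaf awaitables: a non-trivial destructor runs at the end of the full-expression, after resumption, and would count as an access that keeps the leaves in the frame.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of liveness-based frame slot selection (not GCC code).
// Steps of the coroutine body are numbered; a suspend point is a step number.
struct LocalVar {
  std::string name;
  int first_access;  // first step touching the variable (directly or via pointer/reference)
  int last_access;   // last step touching it, including any destructor call
};

// True if the variable is still accessed after a suspend point that follows its
// first access, i.e. its value must survive while the coroutine is suspended.
bool needs_frame_slot(const LocalVar& v, const std::vector<int>& suspend_points) {
  for (int s : suspend_points)
    if (v.first_access < s && s < v.last_access) return true;
  return false;
}

// Names of the locals that must be allocated in the coroutine frame; all others
// could live in ordinary stack slots of the actor function.
std::vector<std::string> frame_resident(const std::vector<LocalVar>& vars,
                                        const std::vector<int>& suspend_points) {
  std::vector<std::string> out;
  for (const auto& v : vars)
    if (needs_frame_slot(v, suspend_points)) out.push_back(v.name);
  return out;
}
```

For the `when_either` case, numbering the steps as 0 (create the two leaves), 1 (move them into the composed awaitable), 2 (suspend) and 3 (`await_resume`) gives `{"leaf_1", 0, 1}`, `{"leaf_2", 0, 1}` and `{"composed", 1, 3}` with suspend point 2, and `frame_resident` keeps only `composed`. The hard part in a real compiler is computing these ranges once a local's address escapes.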
>>>
>>> What are your thoughts on the feasibility / difficulty of implementing such an optimization?
>>>
>>> Michal
>>>
>>>>>
>>>>> Do you have any advice / opinions on this before I try to implement it?
>>>>
>>>> Hopefully, the notes above help.
>>>>
>>>> I will rebase my latest code changes as soon as I have a chance and put them somewhere for you to look at - basically, these are to try and address the correctness issues we face.
>>>>
>>>> Iain
>>>>
>>>>
>>>>>
>>>>> Michal
>>>>>
>>>>> On Jul 12 2022, at 4:08 pm, Iain Sandoe wrote:
>>>>>
>>>>>> Hi Michal,
>>>>>>
>>>>>>> On 12 Jul 2022, at 14:35, Michal Jankovič via Gcc-patches wrote:
>>>>>>>
>>>>>>> Currently, coroutine frames store all variables of a coroutine separately, even if their lifetimes do not overlap (they are in distinct scopes). This patch implements overlapping distinct variable scopes in the coroutine frame, by storing the frame fields in nested unions of structs. This lowers the size of the frame for larger coroutines significantly, and makes them more usable on systems with limited memory.
>>>>>>
>>>>>> not a review (I will try to take a look at the weekend).
>>>>>>
>>>>>> but … this is one of the two main optimisations on my TODO - so cool for doing it.
>>>>>>
>>>>>> (the other related optimisation is to eliminate frame entries for scopes without any suspend points - which has the potential to save even more space for code with sparse use of co_xxxx)
>>>>>>
>>>>>> Iain
>>>>>>
>>>>>>> Bootstrapped and regression tested on x86_64-pc-linux-gnu; the new test fails before the patch and succeeds after, with no regressions.
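A minimal sketch of the layout idea, assuming a coroutine with two sibling scopes that each hold one array and suspend once; the struct and member names are hypothetical and the frame type GCC actually builds will differ. A local in the overlapped frame is reached through a chain of member accesses (here `f.scopes.scope1.a`), which appears to be what the ChangeLog's `field_access_path` records:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical hand-written frames for a coroutine shaped like:
//   { int a[16];   co_await x; }   // scope 1
//   { double b[8]; co_await y; }   // scope 2
// Names and layout are illustrative, not what GCC emits.

struct CommonPart {
  void (*resume)(void*);
  void (*destroy)(void*);
  unsigned short resume_index;
};

// Flat layout: every local gets its own field, even though the
// two scopes are never live at the same time.
struct FlatFrame {
  CommonPart common;
  int a[16];
  double b[8];
};

// Overlapped layout: locals of sibling scopes share storage through
// a union of per-scope structs.
struct OverlappedFrame {
  CommonPart common;
  union {
    struct { int a[16]; } scope1;
    struct { double b[8]; } scope2;
  } scopes;
};

// Storage for the user locals: the flat frame pays for both scopes,
// the overlapped frame only for the larger one (plus any padding).
constexpr std::size_t flat_locals = sizeof(FlatFrame) - sizeof(CommonPart);
constexpr std::size_t overlapped_locals = sizeof(OverlappedFrame) - sizeof(CommonPart);
```

Since `scope1` and `scope2` are never live at the same time, the union needs only as much storage as the larger of the two, so `overlapped_locals` is smaller than `flat_locals` on any common target. The savings grow with the number of disjoint scopes.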
>>>>>>>
>>>>>>> PR c++/105989
>>>>>>>
>>>>>>> gcc/cp/ChangeLog:
>>>>>>>
>>>>>>> 	* coroutines.cc (struct local_var_info): Add field_access_path.
>>>>>>> 	(build_local_var_frame_access_expr): New.
>>>>>>> 	(transform_local_var_uses): Use build_local_var_frame_access_expr.
>>>>>>> 	(coro_make_frame_entry_id): New.
>>>>>>> 	(coro_make_frame_entry): Delegate to coro_make_frame_entry_id.
>>>>>>> 	(struct local_vars_frame_data): Add orig, field_access_path.
>>>>>>> 	(register_local_var_uses): Generate new frame layout. Create access
>>>>>>> 	paths to vars.
>>>>>>> 	(morph_fn_to_coro): Set new fields in local_vars_frame_data.
>>>>>>>
>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>
>>>>>>> 	* g++.dg/coroutines/pr105989.C: New test.