From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp001-out.apm-internet.net (smtp001-out.apm-internet.net [85.119.248.222]) by sourceware.org (Postfix) with ESMTPS id 68DB03857BBF for ; Tue, 12 Jul 2022 17:11:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 68DB03857BBF Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=sandoe.co.uk Authentication-Results: sourceware.org; spf=none smtp.mailfrom=sandoe.co.uk Received: (qmail 69422 invoked from network); 12 Jul 2022 17:11:40 -0000 X-APM-Out-ID: 16576459006942 X-APM-Authkey: 257869/1(257869/1) 7 Received: from unknown (HELO ?192.168.1.95?) (81.138.1.83) by smtp001.apm-internet.net with SMTP; 12 Jul 2022 17:11:40 -0000 Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.7\)) Subject: Re: [PATCH] c++: coroutines - Overlap variables in frame [PR105989] From: Iain Sandoe In-Reply-To: <0334F8DB-C49F-4EB2-98C5-91A44A851F70@getmailspring.com> Date: Tue, 12 Jul 2022 18:11:39 +0100 Cc: GCC Patches Content-Transfer-Encoding: quoted-printable Message-Id: References: <236E416B-F744-42C1-806C-4701DB1BAB55@sandoe.co.uk> <0334F8DB-C49F-4EB2-98C5-91A44A851F70@getmailspring.com> To: =?utf-8?Q?Michal_Jankovi=C4=8D?= X-Mailer: Apple Mail (2.3608.120.23.2.7) X-Spam-Status: No, score=-8.5 required=5.0 tests=BAYES_00, KAM_COUK, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Jul 2022 17:11:43 -0000 Hi Michal, > On 12 Jul 2022, at 16:14, Michal Jankovi=C4=8D = wrote: > One other related thing I would like to investigate is reducing the > number of compiler generated variables in the frame, particularly > _Coro_destroy_fn and _Coro_self_handle. =20 >=20 > As I understand it, _Coro_destroy_fn just sets a flag in > _Coro_resume_index and calls _Coro_resume_fn; it should be possible to > move this logic to __builtin_coro_destroy, so that only = _Coro_resume_fn > is stored in the frame; That is a particular point about GCC=E2=80=99s implementation =E2=80=A6 = (it is not neccesarily, or even likely to be the same for other implementations) - see below. I was intending to do experiment with making the ramp/resume/destroy = value a parameter to the actor function so that we would have something like - ramp calls actor(frame, 0) resume calls actor(frame, 1) destroy calls actor(frame, 2)=20 - the token values are illustrative, not intended to be a final version. I think that should allow for more inlining opportunites and possibly a = way forward to frame elision (a.k.a halo). > this would however change the coroutine ABI - I don't know if that's a = problem. The external ABI for the coroutine is the=20 resume, destroy pointers=20 and the promise=20 and that one can find each of these from the frame pointer. This was agreed between the interested =E2=80=9Cvendors=E2=80=9D so that = one compiler could invoke coroutines built by another. So I do not think this is so much a useful = area to explore. Also the intent is that an indirect call through the frame pointer is = the most frequent operation so should be the most efficient. =20 resume() might be called many times,=20 destroy() just once thus it is a cold code path=20 - space can be important too - but interoperability was the goal here. > The _Coro_self_handle should be constructible on-demand from the frame = address. Yes, and in the header the relevant items are all constexpr - so that = should happen in the user=E2=80=99s code. I elected to have that value in the frame to avoid = recreating it each time - I suppose that is a trade-off of one oiptimisation c.f. another =E2=80=A6=20= >=20 > Do you have any advice / opinions on this before I try to implement = it? Hopefully, the notes above help. I will rebase my latest code changes as soon as I have a chance and put = them somewhere for you to look at - basically, these are to try and address the = correctness issues we face, Iain >=20 > Michal >=20 > On Jul 12 2022, at 4:08 pm, Iain Sandoe wrote: >=20 >> Hi Michal, >>=20 >>> On 12 Jul 2022, at 14:35, Michal Jankovi=C4=8D via Gcc-patches >>> wrote: >>>=20 >>> Currently, coroutine frames store all variables of a coroutine = separately, >>> even if their lifetime does not overlap (they are in distinct >>> scopes). This >>> patch implements overlapping distinct variable scopes in the >>> coroutine frame, >>> by storing the frame fields in nested unions of structs. This lowers >>> the size >>> of the frame for larger coroutines significantly, and makes them = more usable >>> on systems with limited memory. >>=20 >> not a review (I will try to take a look at the weekend). >>=20 >> but =E2=80=A6 this is one of the two main optimisations on my TODO - = so cool >> for doing it. >>=20 >> (the other related optimisation is to eliminate frame entries for >> scopes without any suspend >> points - which has the potential to save even more space for code = with >> sparse use of co_xxxx) >>=20 >> Iain >>=20 >>> Bootstrapped and regression tested on x86_64-pc-linux-gnu; new test = fails >>> before the patch and succeeds after with no regressions. >>>=20 >>> PR c++/105989 >>>=20 >>> gcc/cp/ChangeLog: >>>=20 >>> * coroutines.cc (struct local_var_info): Add field_access_path. >>> (build_local_var_frame_access_expr): New. >>> (transform_local_var_uses): Use = build_local_var_frame_access_expr. >>> (coro_make_frame_entry_id): New. >>> (coro_make_frame_entry): Delegate to coro_make_frame_entry_id. >>> (struct local_vars_frame_data): Add orig, field_access_path. >>> (register_local_var_uses): Generate new frame layout. Create = access >>> paths to vars. >>> (morph_fn_to_coro): Set new fields in local_vars_frame_data. =20 >>>=20 >>> gcc/testsuite/ChangeLog: >>>=20 >>> * g++.dg/coroutines/pr105989.C: New test. >>>=20 >>> >>=20 >>=20