From: Iain Sandoe <iain@sandoe.co.uk>
To: "Michal Jankovič" <michal.jankovic59@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] c++: coroutines - Overlap variables in frame [PR105989]
Date: Sun, 14 May 2023 17:07:32 +0100 [thread overview]
Message-ID: <4229A144-BA40-4653-A37D-171E28EAD6FF@sandoe.co.uk> (raw)
In-Reply-To: <412A866E-BE4E-4DA8-A10E-4410C01FB9FF@getmailspring.com>
Hi Michal,
> On 14 May 2023, at 16:36, Michal Jankovič <michal.jankovic59@gmail.com> wrote:
>
> Rebased the patch to GCC 14 trunk. Bootstrapped and regression tested
> again on x86_64-pc-linux-gnu, only difference is the new test failing
> without the patch.
(as previously noted, I am much in favour of this optimisation)
Do you have any metrics on the reductions in frame size for realistic coroutines?
thanks
Iain
>
> On Jul 13 2022, at 2:54 pm, Michal Jankovic
> <michal.jankovic59@gmail.com> wrote:
>
>> Hi Iain,
>>
>> thanks for the info. I have some follow-up questions.
>>
>> On Jul 12 2022, at 7:11 pm, Iain Sandoe <iain@sandoe.co.uk> wrote:
>>
>>> Hi Michal,
>>>
>>>> On 12 Jul 2022, at 16:14, Michal Jankovič
>>>> <michal.jankovic59@gmail.com> wrote:
>>>
>>>> One other related thing I would like to investigate is reducing the
>>>> number of compiler generated variables in the frame, particularly
>>>> _Coro_destroy_fn and _Coro_self_handle.
>>>>
>>>> As I understand it, _Coro_destroy_fn just sets a flag in
>>>> _Coro_resume_index and calls _Coro_resume_fn; it should be possible to
>>>> move this logic to __builtin_coro_destroy, so that only _Coro_resume_fn
>>>> is stored in the frame;
>>>
>>> That is a particular point about GCC’s implementation … (it is not
>>> neccesarily, or even
>>> likely to be the same for other implementations) - see below.
>>>
>>> I was intending to do experiment with making the ramp/resume/destroy
>>> value a parameter
>>> to the actor function so that we would have something like -
>>>
>>> ramp calls actor(frame, 0)
>>> resume calls actor(frame, 1)
>>> destroy calls actor(frame, 2)
>>> - the token values are illustrative, not intended to be a final version.
>>>
>>> I think that should allow for more inlining opportunites and possibly
>>> a way forward to
>>> frame elision (a.k.a halo).
>>>
>>>> this would however change the coroutine ABI - I don't know if that's
>>>> a problem.
>>>
>>> The external ABI for the coroutine is the
>>> resume,
>>> destroy pointers
>>> and the promise
>>> and that one can find each of these from the frame pointer.
>>>
>>> This was agreed between the interested “vendors” so that one compiler
>>> could invoke
>>> coroutines built by another. So I do not think this is so much a
>>> useful area to explore.
>>>
>>
>> I understand. I still want to try to implement a more light-weight frame
>> layout with just one function pointer; would it be possible to merge
>> such a change if it was made opt-in via a compiler flag, eg
>> `-fsmall-coroutine-frame`? My use-case for this is embedded environments
>> with very limited memory, and I do not care about interoperability with
>> other compilers there.
>>
>>> Also the intent is that an indirect call through the frame pointer is
>>> the most frequent
>>> operation so should be the most efficient.
>>> resume() might be called many times,
>>> destroy() just once thus it is a cold code path
>>> - space can be important too - but interoperability was the goal here.
>>>
>>>> The _Coro_self_handle should be constructible on-demand from the
>>>> frame address.
>>>
>>> Yes, and in the header the relevant items are all constexpr - so that
>>> should happen in the
>>> user’s code. I elected to have that value in the frame to avoid
>>> recreating it each time - I
>>> suppose that is a trade-off of one oiptimisation c.f. another …
>>
>> If the handle construction cannot be optimized out, and its thus
>> a tradeoff between frame size and number of instructions, then this
>> could also be enabled by a hypothetical `-fsmall-coroutine-frame`.
>>
>> Coming back to this:
>>
>>>>> (the other related optimisation is to eliminate frame entries for
>>>>> scopes without any suspend
>>>>> points - which has the potential to save even more space for code with
>>>>> sparse use of co_xxxx)
>>
>> This would be nice; although it could encompassed by a more general
>> optimization - eliminate frame entries for all variables which are not
>>
>> accessed (directly or via pointer / reference) beyond a suspend point.
>> To be fair, I do not know how to get started on such an optimization,
>> or if it is even possible to do on the frontend. This would however be
>> immensely useful for reducing the frame size taken-up by complicated
>> co_await expressions (among other things), for example, if I have a
>> composed operation:
>>
>> co_await when_either(get_leaf_awaitable_1(), get_leaf_awaitable_2());
>>
>> Right now, this creates space in the frame for the temporary 'leaf'
>> awaitables, which were already moved into the composed awaitable.
>> If the awaitable has an operator co_await that returns the real awaiter,
>> the original awaitable is also stored in the frame, even if it
>> is not referenced by the awaiter; another unused object gets stored if
>>
>> the .await_transform() customization point was used.
>>
>> What are your thoughts on the feasibility / difficulty of implementing
>> such an optimization?
>>
>> Michal
>>
>>>>
>>>> Do you have any advice / opinions on this before I try to implement it?
>>>
>>> Hopefully, the notes above help.
>>>
>>> I will rebase my latest code changes as soon as I have a chance and
>>> put them somewhere
>>> for you to look at - basically, these are to try and address the
>>> correctness issues we face,
>>>
>>> Iain
>>>
>>>
>>>>
>>>> Michal
>>>>
>>>> On Jul 12 2022, at 4:08 pm, Iain Sandoe <iain@sandoe.co.uk> wrote:
>>>>
>>>>> Hi Michal,
>>>>>
>>>>>> On 12 Jul 2022, at 14:35, Michal Jankovič via Gcc-patches
>>>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>>
>>>>>> Currently, coroutine frames store all variables of a coroutine separately,
>>>>>> even if their lifetime does not overlap (they are in distinct
>>>>>> scopes). This
>>>>>> patch implements overlapping distinct variable scopes in the
>>>>>> coroutine frame,
>>>>>> by storing the frame fields in nested unions of structs. This lowers
>>>>>> the size
>>>>>> of the frame for larger coroutines significantly, and makes them
>>>>>> more usable
>>>>>> on systems with limited memory.
>>>>>
>>>>> not a review (I will try to take a look at the weekend).
>>>>>
>>>>> but … this is one of the two main optimisations on my TODO - so cool
>>>>> for doing it.
>>>>>
>>>>> (the other related optimisation is to eliminate frame entries for
>>>>> scopes without any suspend
>>>>> points - which has the potential to save even more space for code with
>>>>> sparse use of co_xxxx)
>>>>>
>>>>> Iain
>>>>>
>>>>>> Bootstrapped and regression tested on x86_64-pc-linux-gnu; new
>>>>>> test fails
>>>>>> before the patch and succeeds after with no regressions.
>>>>>>
>>>>>> PR c++/105989
>>>>>>
>>>>>> gcc/cp/ChangeLog:
>>>>>>
>>>>>> * coroutines.cc (struct local_var_info): Add field_access_path.
>>>>>> (build_local_var_frame_access_expr): New.
>>>>>> (transform_local_var_uses): Use build_local_var_frame_access_expr.
>>>>>> (coro_make_frame_entry_id): New.
>>>>>> (coro_make_frame_entry): Delegate to coro_make_frame_entry_id.
>>>>>> (struct local_vars_frame_data): Add orig, field_access_path.
>>>>>> (register_local_var_uses): Generate new frame layout. Create access
>>>>>> paths to vars.
>>>>>> (morph_fn_to_coro): Set new fields in local_vars_frame_data.
>>>>>>
>>>>>> gcc/testsuite/ChangeLog:
>>>>>>
>>>>>> * g++.dg/coroutines/pr105989.C: New test.
>>>>>>
>>>>>> <pr105989.patch>
>>>>>
>>>>>
>>>
>>>
> <pr105989.patch>
next prev parent reply other threads:[~2023-05-14 16:07 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-12 13:35 Michal Jankovič
2022-07-12 14:08 ` Iain Sandoe
2022-07-12 15:14 ` Michal Jankovič
2022-07-12 17:11 ` Iain Sandoe
2022-07-13 12:54 ` Michal Jankovic
2023-05-14 15:36 ` Michal Jankovič
2023-05-14 16:07 ` Iain Sandoe [this message]
2023-05-14 16:31 ` Michal Jankovič
2023-05-14 18:51 ` Iain Sandoe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4229A144-BA40-4653-A37D-171E28EAD6FF@sandoe.co.uk \
--to=iain@sandoe.co.uk \
--cc=gcc-patches@gcc.gnu.org \
--cc=michal.jankovic59@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).