public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Iain Sandoe <iain@sandoe.co.uk>
To: "Michal Jankovič" <michal.jankovic59@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] c++: coroutines - Overlap variables in frame [PR105989]
Date: Sun, 14 May 2023 17:07:32 +0100	[thread overview]
Message-ID: <4229A144-BA40-4653-A37D-171E28EAD6FF@sandoe.co.uk> (raw)
In-Reply-To: <412A866E-BE4E-4DA8-A10E-4410C01FB9FF@getmailspring.com>

Hi Michal,

> On 14 May 2023, at 16:36, Michal Jankovič <michal.jankovic59@gmail.com> wrote:
> 
> Rebased the patch to GCC 14 trunk. Bootstrapped and regression tested
> again on x86_64-pc-linux-gnu, only difference is the new test failing
> without the patch.

(as previously noted, I am much in favour of this optimisation)

Do you have any metrics on the reductions in frame size for realistic coroutines?

thanks
Iain

> 
> On Jul 13 2022, at 2:54 pm, Michal Jankovic
> <michal.jankovic59@gmail.com> wrote:
> 
>> Hi Iain,
>> 
>> thanks for the info. I have some follow-up questions.
>> 
>> On Jul 12 2022, at 7:11 pm, Iain Sandoe <iain@sandoe.co.uk> wrote:
>> 
>>> Hi Michal,
>>> 
>>>> On 12 Jul 2022, at 16:14, Michal Jankovič
>>>> <michal.jankovic59@gmail.com> wrote:
>>> 
>>>> One other related thing I would like to investigate is reducing the
>>>> number of compiler generated variables in the frame, particularly
>>>> _Coro_destroy_fn and _Coro_self_handle.    
>>>> 
>>>> As I understand it, _Coro_destroy_fn just sets a flag in
>>>> _Coro_resume_index and calls _Coro_resume_fn; it should be possible to
>>>> move this logic to __builtin_coro_destroy, so that only _Coro_resume_fn
>>>> is stored in the frame;
>>> 
>>> That is a particular point about GCC’s implementation … (it is not
>>> neccesarily, or even
>>> likely to be the same for other implementations) - see below.
>>> 
>>> I was intending to do experiment with making the ramp/resume/destroy
>>> value a parameter
>>> to the actor function so that we would have something like -
>>> 
>>> ramp calls      actor(frame, 0)
>>> resume calls  actor(frame, 1)
>>> destroy calls  actor(frame, 2)   
>>> - the token values are illustrative, not intended to be a final version.
>>> 
>>> I think that should allow for more inlining opportunites and possibly
>>> a way forward to
>>> frame elision (a.k.a halo).
>>> 
>>>> this would however change the coroutine ABI - I don't know if that's
>>>> a problem.
>>> 
>>> The external ABI for the coroutine is the   
>>> resume,
>>> destroy pointers   
>>> and the promise   
>>> and that one can find each of these from the frame pointer.
>>> 
>>> This was agreed between the interested “vendors” so that one compiler
>>> could invoke
>>> coroutines built by another.  So I do not think this is so much a
>>> useful area to explore.
>>> 
>> 
>> I understand. I still want to try to implement a more light-weight frame
>> layout with just one function pointer; would it be possible to merge
>> such a change if it was made opt-in via a compiler flag, eg
>> `-fsmall-coroutine-frame`? My use-case for this is embedded environments
>> with very limited memory, and I do not care about interoperability with
>> other compilers there.   
>> 
>>> Also the intent is that an indirect call through the frame pointer is
>>> the most frequent
>>> operation so should be the most efficient.    
>>> resume() might be called many times,   
>>> destroy() just once thus it is a cold code path   
>>> - space can be important too - but interoperability was the goal here.
>>> 
>>>> The _Coro_self_handle should be constructible on-demand from the
>>>> frame address.
>>> 
>>> Yes, and in the header the relevant items are all constexpr - so that
>>> should happen in the
>>> user’s code.  I elected to have that value in the frame to avoid
>>> recreating it each time - I
>>> suppose that is a trade-off of one oiptimisation c.f. another …   
>> 
>> If the handle construction cannot be optimized out, and its thus   
>> a tradeoff between frame size and number of instructions, then this
>> could also be enabled by a hypothetical `-fsmall-coroutine-frame`.
>> 
>> Coming back to this:
>> 
>>>>> (the other related optimisation is to eliminate frame entries for
>>>>> scopes without any suspend
>>>>> points - which has the potential to save even more space for code with
>>>>> sparse use of co_xxxx)
>> 
>> This would be nice; although it could encompassed by a more general   
>> optimization - eliminate frame entries for all variables which are not
>> 
>> accessed (directly or via pointer / reference) beyond a suspend point.
>> To be fair, I do not know how to get started on such an optimization,
>> or if it is even possible to do on the frontend. This would however be
>> immensely useful for reducing the frame size taken-up by complicated
>> co_await expressions (among other things), for example, if I have a
>> composed operation:
>> 
>> co_await when_either(get_leaf_awaitable_1(), get_leaf_awaitable_2());
>> 
>> Right now, this creates space in the frame for the temporary 'leaf'   
>> awaitables, which were already moved into the composed awaitable.
>> If the awaitable has an operator co_await that returns the real awaiter,
>> the original awaitable is also stored in the frame, even if it   
>> is not referenced by the awaiter; another unused object gets stored if
>> 
>> the .await_transform() customization point was used.
>> 
>> What are your thoughts on the feasibility / difficulty of implementing
>> such an optimization?
>> 
>> Michal
>> 
>>>> 
>>>> Do you have any advice / opinions on this before I try to implement it?
>>> 
>>> Hopefully, the notes above help.
>>> 
>>> I will rebase my latest code changes as soon as I have a chance and
>>> put them somewhere
>>> for you to look at - basically, these are to try and address the
>>> correctness issues we face,
>>> 
>>> Iain
>>> 
>>> 
>>>> 
>>>> Michal
>>>> 
>>>> On Jul 12 2022, at 4:08 pm, Iain Sandoe <iain@sandoe.co.uk> wrote:
>>>> 
>>>>> Hi Michal,
>>>>> 
>>>>>> On 12 Jul 2022, at 14:35, Michal Jankovič via Gcc-patches
>>>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>> 
>>>>>> Currently, coroutine frames store all variables of a coroutine separately,
>>>>>> even if their lifetime does not overlap (they are in distinct
>>>>>> scopes). This
>>>>>> patch implements overlapping distinct variable scopes in the
>>>>>> coroutine frame,
>>>>>> by storing the frame fields in nested unions of structs. This lowers
>>>>>> the size
>>>>>> of the frame for larger coroutines significantly, and makes them
>>>>>> more usable
>>>>>> on systems with limited memory.
>>>>> 
>>>>> not a review (I will try to take a look at the weekend).
>>>>> 
>>>>> but … this is one of the two main optimisations on my TODO - so cool
>>>>> for doing it.
>>>>> 
>>>>> (the other related optimisation is to eliminate frame entries for
>>>>> scopes without any suspend
>>>>> points - which has the potential to save even more space for code with
>>>>> sparse use of co_xxxx)
>>>>> 
>>>>> Iain
>>>>> 
>>>>>> Bootstrapped and regression tested on x86_64-pc-linux-gnu; new
>>>>>> test fails
>>>>>> before the patch and succeeds after with no regressions.
>>>>>> 
>>>>>> 	PR c++/105989
>>>>>> 
>>>>>> gcc/cp/ChangeLog:
>>>>>> 
>>>>>> 	* coroutines.cc (struct local_var_info): Add field_access_path.
>>>>>> 	(build_local_var_frame_access_expr): New.
>>>>>> 	(transform_local_var_uses): Use build_local_var_frame_access_expr.
>>>>>> 	(coro_make_frame_entry_id): New.
>>>>>> 	(coro_make_frame_entry): Delegate to coro_make_frame_entry_id.
>>>>>> 	(struct local_vars_frame_data): Add orig, field_access_path.
>>>>>> 	(register_local_var_uses): Generate new frame layout. Create access
>>>>>> 	paths to vars.
>>>>>> 	(morph_fn_to_coro): Set new fields in local_vars_frame_data.    
>>>>>> 
>>>>>> gcc/testsuite/ChangeLog:
>>>>>> 
>>>>>> 	* g++.dg/coroutines/pr105989.C: New test.
>>>>>> 
>>>>>> <pr105989.patch>
>>>>> 
>>>>> 
>>> 
>>> 
> <pr105989.patch>


  reply	other threads:[~2023-05-14 16:07 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-12 13:35 Michal Jankovič
2022-07-12 14:08 ` Iain Sandoe
2022-07-12 15:14   ` Michal Jankovič
2022-07-12 17:11     ` Iain Sandoe
2022-07-13 12:54       ` Michal Jankovic
2023-05-14 15:36         ` Michal Jankovič
2023-05-14 16:07           ` Iain Sandoe [this message]
2023-05-14 16:31             ` Michal Jankovič
2023-05-14 18:51               ` Iain Sandoe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4229A144-BA40-4653-A37D-171E28EAD6FF@sandoe.co.uk \
    --to=iain@sandoe.co.uk \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=michal.jankovic59@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).