public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: "Michal Jankovič" <michal.jankovic59@gmail.com>
To: Iain Sandoe <iain@sandoe.co.uk>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] c++: coroutines - Overlap variables in frame [PR105989]
Date: Sun, 14 May 2023 18:31:59 +0200	[thread overview]
Message-ID: <D8279CD3-952A-44BE-A730-2464822BE102@getmailspring.com> (raw)
In-Reply-To: <4229A144-BA40-4653-A37D-171E28EAD6FF@sandoe.co.uk>

Hi Iain,

I do not currently have metrics for this, but I can look into generating
them, however I currently do not know of any large open-source projects
using coroutines that I could use for this; I was thinking about using
cppcoro unit tests, but they mostly contain very simple coroutines. I
have source for ~20k LoC proprietary project relying heavily on
coroutines (using boost::asio::awaitable), but here I cannot show the
source along with the numbers - would this be enough or should I look
for more open source projects?

thanks,
Michal

On May 14 2023, at 6:07 pm, Iain Sandoe <iain@sandoe.co.uk> wrote:

> Hi Michal,
>  
>> On 14 May 2023, at 16:36, Michal Jankovič
>> <michal.jankovic59@gmail.com> wrote:
>>  
>> Rebased the patch to GCC 14 trunk. Bootstrapped and regression tested
>> again on x86_64-pc-linux-gnu, only difference is the new test failing
>> without the patch.
>  
> (as previously noted, I am much in favour of this optimisation)
>  
> Do you have any metrics on the reductions in frame size for realistic coroutines?
>  
> thanks
> Iain
>  
>>  
>> On Jul 13 2022, at 2:54 pm, Michal Jankovic
>> <michal.jankovic59@gmail.com> wrote:
>>  
>>> Hi Iain,
>>>  
>>> thanks for the info. I have some follow-up questions.
>>>  
>>> On Jul 12 2022, at 7:11 pm, Iain Sandoe <iain@sandoe.co.uk> wrote:
>>>  
>>>> Hi Michal,
>>>>  
>>>>> On 12 Jul 2022, at 16:14, Michal Jankovič
>>>>> <michal.jankovic59@gmail.com> wrote:
>>>>  
>>>>> One other related thing I would like to investigate is reducing the
>>>>> number of compiler generated variables in the frame, particularly
>>>>> _Coro_destroy_fn and _Coro_self_handle.     
>>>>>  
>>>>> As I understand it, _Coro_destroy_fn just sets a flag in
>>>>> _Coro_resume_index and calls _Coro_resume_fn; it should be
>>>>> possible to
>>>>> move this logic to __builtin_coro_destroy, so that only _Coro_resume_fn
>>>>> is stored in the frame;
>>>>  
>>>> That is a particular point about GCC’s implementation … (it is not
>>>> neccesarily, or even
>>>> likely to be the same for other implementations) - see below.
>>>>  
>>>> I was intending to do experiment with making the ramp/resume/destroy
>>>> value a parameter
>>>> to the actor function so that we would have something like -
>>>>  
>>>> ramp calls      actor(frame, 0)
>>>> resume calls  actor(frame, 1)
>>>> destroy calls  actor(frame, 2)    
>>>> - the token values are illustrative, not intended to be a final version.
>>>>  
>>>> I think that should allow for more inlining opportunites and possibly
>>>> a way forward to
>>>> frame elision (a.k.a halo).
>>>>  
>>>>> this would however change the coroutine ABI - I don't know if that's
>>>>> a problem.
>>>>  
>>>> The external ABI for the coroutine is the    
>>>> resume,
>>>> destroy pointers    
>>>> and the promise    
>>>> and that one can find each of these from the frame pointer.
>>>>  
>>>> This was agreed between the interested “vendors” so that one compiler
>>>> could invoke
>>>> coroutines built by another.  So I do not think this is so much a
>>>> useful area to explore.
>>>>  
>>>  
>>> I understand. I still want to try to implement a more light-weight frame
>>> layout with just one function pointer; would it be possible to merge
>>> such a change if it was made opt-in via a compiler flag, eg
>>> `-fsmall-coroutine-frame`? My use-case for this is embedded environments
>>> with very limited memory, and I do not care about interoperability with
>>> other compilers there.    
>>>  
>>>> Also the intent is that an indirect call through the frame pointer is
>>>> the most frequent
>>>> operation so should be the most efficient.     
>>>> resume() might be called many times,    
>>>> destroy() just once thus it is a cold code path    
>>>> - space can be important too - but interoperability was the goal here.
>>>>  
>>>>> The _Coro_self_handle should be constructible on-demand from the
>>>>> frame address.
>>>>  
>>>> Yes, and in the header the relevant items are all constexpr - so that
>>>> should happen in the
>>>> user’s code.  I elected to have that value in the frame to avoid
>>>> recreating it each time - I
>>>> suppose that is a trade-off of one oiptimisation c.f. another …    
>>>  
>>> If the handle construction cannot be optimized out, and its thus    
>>> a tradeoff between frame size and number of instructions, then this
>>> could also be enabled by a hypothetical `-fsmall-coroutine-frame`.
>>>  
>>> Coming back to this:
>>>  
>>>>>> (the other related optimisation is to eliminate frame entries for
>>>>>> scopes without any suspend
>>>>>> points - which has the potential to save even more space for code with
>>>>>> sparse use of co_xxxx)
>>>  
>>> This would be nice; although it could encompassed by a more general  
>>>   
>>> optimization - eliminate frame entries for all variables which are not
>>>  
>>> accessed (directly or via pointer / reference) beyond a suspend point.
>>> To be fair, I do not know how to get started on such an optimization,
>>> or if it is even possible to do on the frontend. This would however be
>>> immensely useful for reducing the frame size taken-up by complicated
>>> co_await expressions (among other things), for example, if I have a
>>> composed operation:
>>>  
>>> co_await when_either(get_leaf_awaitable_1(), get_leaf_awaitable_2());
>>>  
>>> Right now, this creates space in the frame for the temporary 'leaf'  
>>>   
>>> awaitables, which were already moved into the composed awaitable.
>>> If the awaitable has an operator co_await that returns the real awaiter,
>>> the original awaitable is also stored in the frame, even if it    
>>> is not referenced by the awaiter; another unused object gets stored if
>>>  
>>> the .await_transform() customization point was used.
>>>  
>>> What are your thoughts on the feasibility / difficulty of implementing
>>> such an optimization?
>>>  
>>> Michal
>>>  
>>>>>  
>>>>> Do you have any advice / opinions on this before I try to
>>>>> implement it?
>>>>  
>>>> Hopefully, the notes above help.
>>>>  
>>>> I will rebase my latest code changes as soon as I have a chance and
>>>> put them somewhere
>>>> for you to look at - basically, these are to try and address the
>>>> correctness issues we face,
>>>>  
>>>> Iain
>>>>  
>>>>  
>>>>>  
>>>>> Michal
>>>>>  
>>>>> On Jul 12 2022, at 4:08 pm, Iain Sandoe <iain@sandoe.co.uk> wrote:
>>>>>  
>>>>>> Hi Michal,
>>>>>>  
>>>>>>> On 12 Jul 2022, at 14:35, Michal Jankovič via Gcc-patches
>>>>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>>>  
>>>>>>> Currently, coroutine frames store all variables of a coroutine separately,
>>>>>>> even if their lifetime does not overlap (they are in distinct
>>>>>>> scopes). This
>>>>>>> patch implements overlapping distinct variable scopes in the
>>>>>>> coroutine frame,
>>>>>>> by storing the frame fields in nested unions of structs. This lowers
>>>>>>> the size
>>>>>>> of the frame for larger coroutines significantly, and makes them
>>>>>>> more usable
>>>>>>> on systems with limited memory.
>>>>>>  
>>>>>> not a review (I will try to take a look at the weekend).
>>>>>>  
>>>>>> but … this is one of the two main optimisations on my TODO - so cool
>>>>>> for doing it.
>>>>>>  
>>>>>> (the other related optimisation is to eliminate frame entries for
>>>>>> scopes without any suspend
>>>>>> points - which has the potential to save even more space for code with
>>>>>> sparse use of co_xxxx)
>>>>>>  
>>>>>> Iain
>>>>>>  
>>>>>>> Bootstrapped and regression tested on x86_64-pc-linux-gnu; new
>>>>>>> test fails
>>>>>>> before the patch and succeeds after with no regressions.
>>>>>>>  
>>>>>>> 	PR c++/105989
>>>>>>>  
>>>>>>> gcc/cp/ChangeLog:
>>>>>>>  
>>>>>>> 	* coroutines.cc (struct local_var_info): Add field_access_path.
>>>>>>> 	(build_local_var_frame_access_expr): New.
>>>>>>> 	(transform_local_var_uses): Use build_local_var_frame_access_expr.
>>>>>>> 	(coro_make_frame_entry_id): New.
>>>>>>> 	(coro_make_frame_entry): Delegate to coro_make_frame_entry_id.
>>>>>>> 	(struct local_vars_frame_data): Add orig, field_access_path.
>>>>>>> 	(register_local_var_uses): Generate new frame layout. Create access
>>>>>>> 	paths to vars.
>>>>>>> 	(morph_fn_to_coro): Set new fields in local_vars_frame_data.     
>>>>>>>  
>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>  
>>>>>>> 	* g++.dg/coroutines/pr105989.C: New test.
>>>>>>>  
>>>>>>> <pr105989.patch>
>>>>>>  
>>>>>>  
>>>>  
>>>>  
>> <pr105989.patch>
>  
>

  reply	other threads:[~2023-05-14 16:32 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-12 13:35 Michal Jankovič
2022-07-12 14:08 ` Iain Sandoe
2022-07-12 15:14   ` Michal Jankovič
2022-07-12 17:11     ` Iain Sandoe
2022-07-13 12:54       ` Michal Jankovic
2023-05-14 15:36         ` Michal Jankovič
2023-05-14 16:07           ` Iain Sandoe
2023-05-14 16:31             ` Michal Jankovič [this message]
2023-05-14 18:51               ` Iain Sandoe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=D8279CD3-952A-44BE-A730-2464822BE102@getmailspring.com \
    --to=michal.jankovic59@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=iain@sandoe.co.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).